Introduction à Pandas

Pandas est l’élément central de l’écosystème Python pour la data science. Le succès récent de Python dans l’analyse de données tient beaucoup à Pandas qui a permis d’importer la logique SQL dans le langage Python. Pandas embarque énormément de fonctionalités qui permettent d’avoir des chaînes de traitement efficaces pour traiter des données de volumétrie moyenne (jusqu’à quelques Gigas). Au-delà de cette volumétrie, il faudra se tourner vers d’autres solutions (DuckDB, Dask, Polars, Spark…).

Tutoriel
Manipulation
Auteur·rice

Lino Galiana

Date de publication

2024-11-20

La partie Pandas a évolué récemment. Vous pouvez retrouver les contenus liés à Pandas dans les chapitres suivants:

Informations additionnelles

environment files have been tested on.

Latest built version: 2024-11-20

Python version used:

'3.12.6 | packaged by conda-forge | (main, Sep 30 2024, 18:08:52) [GCC 13.3.0]'
Package Version
affine 2.4.0
aiobotocore 2.15.1
aiohappyeyeballs 2.4.3
aiohttp 3.10.8
aioitertools 0.12.0
aiosignal 1.3.1
alembic 1.13.3
altair 5.4.1
aniso8601 9.0.1
annotated-types 0.7.0
appdirs 1.4.4
archspec 0.2.3
asttokens 2.4.1
attrs 24.2.0
babel 2.16.0
bcrypt 4.2.0
beautifulsoup4 4.12.3
black 24.8.0
blinker 1.8.2
blis 0.7.11
bokeh 3.5.2
boltons 24.0.0
boto3 1.35.23
botocore 1.35.23
branca 0.7.2
Brotli 1.1.0
cachetools 5.5.0
cartiflette 0.0.2
Cartopy 0.24.1
catalogue 2.0.10
cattrs 24.1.2
certifi 2024.8.30
cffi 1.17.1
charset-normalizer 3.3.2
click 8.1.7
click-plugins 1.1.1
cligj 0.7.2
cloudpathlib 0.20.0
cloudpickle 3.0.0
colorama 0.4.6
comm 0.2.2
commonmark 0.9.1
conda 24.9.1
conda-libmamba-solver 24.7.0
conda-package-handling 2.3.0
conda_package_streaming 0.10.0
confection 0.1.5
contextily 1.6.2
contourpy 1.3.0
cryptography 43.0.1
cycler 0.12.1
cymem 2.0.8
cytoolz 1.0.0
dask 2024.9.1
dask-expr 1.1.15
databricks-sdk 0.33.0
debugpy 1.8.6
decorator 5.1.1
Deprecated 1.2.14
diskcache 5.6.3
distributed 2024.9.1
distro 1.9.0
docker 7.1.0
duckdb 0.10.1
en-core-web-sm 3.7.1
entrypoints 0.4
et_xmlfile 2.0.0
exceptiongroup 1.2.2
executing 2.1.0
fastexcel 0.11.6
fastjsonschema 2.20.0
fiona 1.10.1
Flask 3.0.3
folium 0.17.0
fontawesomefree 6.6.0
fonttools 4.54.1
frozendict 2.4.4
frozenlist 1.4.1
fsspec 2023.12.2
gensim 4.3.2
geographiclib 2.0
geopandas 1.0.1
geoplot 0.5.1
geopy 2.4.1
gitdb 4.0.11
GitPython 3.1.43
google-auth 2.35.0
graphene 3.3
graphql-core 3.2.4
graphql-relay 3.2.0
graphviz 0.20.3
great-tables 0.12.0
greenlet 3.1.1
gunicorn 22.0.0
h2 4.1.0
hpack 4.0.0
htmltools 0.6.0
hyperframe 6.0.1
idna 3.10
imageio 2.36.0
importlib_metadata 8.5.0
importlib_resources 6.4.5
inflate64 1.0.0
ipykernel 6.29.5
ipython 8.28.0
itsdangerous 2.2.0
jedi 0.19.1
Jinja2 3.1.4
jmespath 1.0.1
joblib 1.4.2
jsonpatch 1.33
jsonpointer 3.0.0
jsonschema 4.23.0
jsonschema-specifications 2024.10.1
jupyter-cache 1.0.0
jupyter_client 8.6.3
jupyter_core 5.7.2
kaleido 0.2.1
kiwisolver 1.4.7
langcodes 3.5.0
language_data 1.3.0
lazy_loader 0.4
libmambapy 1.5.9
locket 1.0.0
lxml 5.3.0
lz4 4.3.3
Mako 1.3.5
mamba 1.5.9
mapclassify 2.8.1
marisa-trie 1.2.1
Markdown 3.6
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.9.2
matplotlib-inline 0.1.7
mdurl 0.1.2
menuinst 2.1.2
mercantile 1.2.1
mizani 0.11.4
mlflow 2.16.2
mlflow-skinny 2.16.2
msgpack 1.1.0
multidict 6.1.0
multivolumefile 0.2.3
munkres 1.1.4
murmurhash 1.0.10
mypy-extensions 1.0.0
narwhals 1.14.1
nbclient 0.10.0
nbformat 5.10.4
nest_asyncio 1.6.0
networkx 3.3
nltk 3.9.1
numpy 1.26.4
opencv-python-headless 4.10.0.84
openpyxl 3.1.5
opentelemetry-api 1.16.0
opentelemetry-sdk 1.16.0
opentelemetry-semantic-conventions 0.37b0
OWSLib 0.28.1
packaging 24.1
pandas 2.2.3
paramiko 3.5.0
parso 0.8.4
partd 1.4.2
pathspec 0.12.1
patsy 0.5.6
Pebble 5.0.7
pexpect 4.9.0
pickleshare 0.7.5
pillow 10.4.0
pip 24.2
platformdirs 4.3.6
plotly 5.24.1
plotnine 0.13.6
pluggy 1.5.0
polars 1.8.2
preshed 3.0.9
prometheus_client 0.21.0
prometheus_flask_exporter 0.23.1
prompt_toolkit 3.0.48
protobuf 4.25.3
psutil 6.0.0
ptyprocess 0.7.0
pure_eval 0.2.3
py7zr 0.20.8
pyarrow 17.0.0
pyarrow-hotfix 0.6
pyasn1 0.6.1
pyasn1_modules 0.4.1
pybcj 1.0.2
pycosat 0.6.6
pycparser 2.22
pycryptodomex 3.21.0
pydantic 2.9.2
pydantic_core 2.23.4
Pygments 2.18.0
PyNaCl 1.5.0
pynsee 0.1.8
pyogrio 0.10.0
pyOpenSSL 24.2.1
pyparsing 3.1.4
pyppmd 1.1.0
pyproj 3.7.0
pyshp 2.3.1
PySocks 1.7.1
python-dateutil 2.9.0
python-dotenv 1.0.1
python-magic 0.4.27
pytz 2024.1
pyu2f 0.1.5
pywaffle 1.1.1
PyYAML 6.0.2
pyzmq 26.2.0
pyzstd 0.16.2
querystring_parser 1.2.4
rasterio 1.4.2
referencing 0.35.1
regex 2024.9.11
requests 2.32.3
requests-cache 1.2.1
retrying 1.3.4
rich 13.9.4
rpds-py 0.21.0
rsa 4.9
ruamel.yaml 0.18.6
ruamel.yaml.clib 0.2.8
s3fs 2023.12.2
s3transfer 0.10.2
scikit-image 0.24.0
scikit-learn 1.5.2
scipy 1.13.0
seaborn 0.13.2
setuptools 74.1.2
shapely 2.0.6
shellingham 1.5.4
six 1.16.0
smart-open 7.0.5
smmap 5.0.0
sortedcontainers 2.4.0
soupsieve 2.5
spacy 3.7.5
spacy-legacy 3.0.12
spacy-loggers 1.0.5
SQLAlchemy 2.0.35
sqlparse 0.5.1
srsly 2.4.8
stack-data 0.6.2
statsmodels 0.14.4
tabulate 0.9.0
tblib 3.0.0
tenacity 9.0.0
texttable 1.7.0
thinc 8.2.5
threadpoolctl 3.5.0
tifffile 2024.9.20
toolz 1.0.0
topojson 1.9
tornado 6.4.1
tqdm 4.66.5
traitlets 5.14.3
truststore 0.9.2
typer 0.13.1
typing_extensions 4.12.2
tzdata 2024.2
Unidecode 1.3.8
url-normalize 1.4.3
urllib3 1.26.20
wasabi 1.1.3
wcwidth 0.2.13
weasel 0.4.1
webdriver-manager 4.0.2
websocket-client 1.8.0
Werkzeug 3.0.4
wheel 0.44.0
wordcloud 1.9.3
wrapt 1.16.0
xgboost 2.1.1
xlrd 2.0.1
xyzservices 2024.9.0
yarl 1.13.1
yellowbrick 1.5
zict 3.0.0
zipp 3.20.2
zstandard 0.23.0

View file history

SHA Date Author Description
e0d615e 2024-05-03 11:15:29 Lino Galiana Restructure la partie Pandas (#497)
c9f9f8a 2024-04-24 15:09:35 Lino Galiana Dark mode and CSS improvements (#494)
d75641d 2024-04-22 18:59:01 Lino Galiana Editorialisation des chapitres de manipulation de données (#491)
c03aa61 2024-01-16 17:33:18 Lino Galiana Exercice sur les chemins relatifs (#483)
056c606 2023-12-20 20:08:25 linogaliana Change pandas image
005d89b 2023-12-20 17:23:04 Lino Galiana Finalise l’affichage des statistiques Git (#478)
3fba612 2023-12-17 18:16:42 Lino Galiana Remove some badges from python (#476)
1684220 2023-12-02 12:06:40 Antoine Palazzolo Première partie de relecture de fin du cours (#467)
1f23de2 2023-12-01 17:25:36 Lino Galiana Stockage des images sur S3 (#466)
a06a268 2023-11-23 18:23:28 Antoine Palazzolo 2ème relectures chapitres ML (#457)
09654c7 2023-11-14 15:16:44 Antoine Palazzolo Suggestions Git & Visualisation (#449)
cef6a0d 2023-10-18 13:18:46 Lino Galiana Allègement des actions github (#437)
97676f5 2023-10-14 17:56:44 Lino Galiana Du style pour le site (#434)
7221e7b 2023-10-10 14:00:44 Thomas Faria Relecture Thomas TD Pandas (#431)
a771183 2023-10-09 11:27:45 Antoine Palazzolo Relecture TD2 par Antoine (#418)
ac80862 2023-10-07 21:05:25 Lino Galiana Relecture antuki (#427)
7e03cea 2023-10-04 14:07:17 Lino Galiana Clean pandas tutorial and exercises (#417)
e8d0062 2023-09-26 15:54:49 Kim A Relecture KA 25/09/2023 (#412)
154f09e 2023-09-26 14:59:11 Antoine Palazzolo Des typos corrigées par Antoine (#411)
9a4e226 2023-08-28 17:11:52 Lino Galiana Action to check URL still exist (#399)
8082302 2023-08-25 17:48:36 Lino Galiana Mise à jour des scripts de construction des notebooks (#395)
3bdf3b0 2023-08-25 11:23:02 Lino Galiana Simplification de la structure 🤓 (#393)
c312bdc 2023-08-11 18:06:25 Lino Galiana A few controls for Quarto website (#389)
5d4874a 2023-08-11 15:09:33 Lino Galiana Pimp les introductions des trois premières parties (#387)
dde3e93 2023-07-21 22:22:05 Lino Galiana Fix bug on chapter order (#385)
3560f1f 2023-07-21 17:04:56 Lino Galiana Build on smaller sized image (#384)
f146354 2023-07-21 18:15:10 Lino Galiana Update index.qmd
f6dde33 2023-07-18 22:32:00 Lino Galiana Change badges (#376)
143e706 2023-07-18 19:37:28 Lino Galiana Améliore la navigation (#375)
130ed71 2023-07-18 19:37:11 Lino Galiana Restructure les titres (#374)
ef28fef 2023-07-07 08:14:42 Lino Galiana Listing pour la première partie (#369)
64baaf8 2023-07-03 17:05:53 Lino Galiana Script for branch deploy (#367)
f21a24d 2023-07-02 10:58:15 Lino Galiana Pipeline Quarto & Pages 🚀 (#365)
867325e 2023-06-11 13:56:43 Lino Galiana Add numeric_only argument (#359)
9918817 2022-12-30 15:10:59 Lino Galiana Retour sur le chapitre DallE / StableDiffusion (#344)
94e7c0a 2022-12-29 09:42:35 Lino Galiana pip install pynsee (#342)
a8dd720 2022-12-26 21:35:52 Lino Galiana Improve aesthetics on Github (#338)
e2b53ac 2022-09-28 17:09:31 Lino Galiana Retouche les chapitres pandas (#287)
eb8f922 2022-09-22 17:40:43 Lino Galiana Corrige bug TP pandas (#276)
fd439f0 2022-09-19 09:37:50 avouacr fix ssp cloud links
3056d41 2022-09-02 12:19:55 avouacr fix all SSP Cloud launcher links
8042a16 2022-08-24 16:23:36 Lino Galiana Box pour les notebooks :sparkles: (#256)
494a85a 2022-08-05 14:49:56 Lino Galiana Images featured ✨ (#252)
d201e3c 2022-08-03 15:50:34 Lino Galiana Pimp la homepage ✨ (#249)
2360ff7 2022-08-02 16:29:57 Lino Galiana Test wowchemy update (#247)
d3a5406 2022-06-27 17:44:30 Lino Galiana Utilisation test du système de référence de quarto (#240)
1239e3e 2022-06-21 14:05:15 Lino Galiana Enonces (#239)
48606dd 2022-05-31 19:05:11 Lino Galiana Amélioration rendu dataframe pandas (#229)
12965ba 2022-05-25 15:53:27 Lino Galiana :launch: Bascule vers quarto (#226)
9c71d6e 2022-03-08 10:34:26 Lino Galiana Plus d’éléments sur S3 (#218)
5cac236 2021-12-16 19:46:43 Lino Galiana un petit mot sur mercator (#201)
6777f03 2021-10-29 09:38:09 Lino Galiana Notebooks corrections (#171)
2a8809f 2021-10-27 12:05:34 Lino Galiana Simplification des hooks pour gagner en flexibilité et clarté (#166)
5ad057f 2021-10-10 15:13:16 Lino Galiana Relectures pandas & geopandas (#159)
4870662 2021-10-05 08:29:33 Romain Avouac fix and simplify pyinsee install (#157)
0677932 2021-10-03 15:32:51 Lino Galiana Ajoute un code pour download pynsee (#156)
2fa78c9 2021-09-27 11:24:19 Lino Galiana Relecture de la partie numpy/pandas (#152)
85ba119 2021-09-16 11:27:56 Lino Galiana Relectures des TP KA avant 1er cours (#142)
2f4d390 2021-09-02 15:12:29 Lino Galiana Utilise un shortcode github (#131)
2e4d586 2021-09-02 12:03:39 Lino Galiana Simplify badges generation (#130)
4a317e3 2021-08-31 12:38:17 Lino Galiana pynsee pour importer des données Insee 🚀 (#127)
2f7b52d 2021-07-20 17:37:03 Lino Galiana Improve notebooks automatic creation (#120)
6729a72 2021-06-22 18:07:05 Lino Galiana Mise à jour badge onyxia (#115)
4cdb759 2021-05-12 10:37:23 Lino Galiana :sparkles: :star2: Nouveau thème hugo :snake: :fire: (#105)
175d377 2021-05-04 18:29:26 Raphaele Adjerad Quelques manipulations supplémentaires pandas (#106)
7f9f97b 2021-04-30 21:44:04 Lino Galiana 🐳 + 🐍 New workflow (docker 🐳) and new dataset for modelization (2020 🇺🇸 elections) (#99)
0a0d034 2021-03-26 20:16:22 Lino Galiana Ajout d’une section sur S3 (#97)
6d010fa 2020-09-29 18:45:34 Lino Galiana Simplifie l’arborescence du site, partie 1 (#57)
66f9f87 2020-09-24 19:23:04 Lino Galiana Introduction des figures générées par python dans le site (#52)
76e206c 2020-09-09 18:02:08 Lino Galiana Finalisation du chapitre pandas (#24)
5c1e76d 2020-09-09 11:25:38 Lino Galiana Ajout des éléments webscraping, regex, API (#21)
d48e68f 2020-09-08 18:35:07 Lino Galiana Continuer la partie pandas (#13)
85365ca 2020-09-05 14:50:10 linogaliana ajout badges onyxia
611be4d 2020-09-05 14:27:47 linogaliana modifs marginales
0559398 2020-09-05 14:22:55 linogaliana modifs marginales
9c12c2c 2020-09-04 17:39:09 Lino Galiana Introduction à pandas (#11)
Retour au sommet

Citation

BibTeX
@book{galiana2023,
  author = {Galiana, Lino},
  title = {Python pour la data science},
  date = {2023},
  url = {https://pythonds.linogaliana.fr/},
  doi = {10.5281/zenodo.8229676},
  langid = {fr}
}
Veuillez citer ce travail comme suit :
Galiana, Lino. 2023. Python pour la data science. https://doi.org/10.5281/zenodo.8229676.