Pandas and Numpy, the first packages of our introductory journey,
are essential for manipulating data. However, it is important not to overlook the fundamentals of the Python language when discovering it. A good understanding of the fundamental elements of the language helps to better grasp the logic of data science packages, understand the errors encountered, and results in greater productivity and freedom.
To explore basic objects and the structure of the language, a series of notebooks is provided below. The course is flexible; you can work through these notebooks in any order or only complete parts of them if you are already familiar with some of the content.
Once you’ve reviewed the material, you will find summary exercises to help you put your knowledge of data structures into practice. Once you have done these, you will find the rest of the course in the “Handling data” section.
1 Review notebooks
2 Synthesis exercises
To make sure you are fluent with data structures and operations in Python across different problems, here is a series of exercises.
They require you to think hard about which data structure is appropriate for each answer; that’s normal!
For once, the exercises are done directly on this page rather than via notebooks [1]. The solutions will soon be available on the dedicated page.
2.1 Exercise 1
You have at your disposal a list of quotations from the magnificent French literary heritage:
See the list of quotations that will be useful
| Quotation | Author |
|---|---|
| “Rien ne sert de courir ; il faut partir à point.” | La Fontaine |
| “Selon que vous serez puissant ou misérable, les jugements de cour vous rendront blanc ou noir.” | La Fontaine |
| “Heureux qui comme Ulysse a fait bon voyage” | Du Bellay |
| “L’homme est né libre, et partout il est dans les fers.” | Rousseau |
| “Parce que c’était lui, parce que c’était moi.” | Montaigne |
| “La première fois qu’Aurélien vit Bérénice, il la trouva franchement laide” | Aragon |
| “Aujourd’hui maman est morte. Ou peut-être hier, je ne sais pas.” | Camus |
Create a `citations` object that uses a suitable data structure in Python to:
- Easily retrieve all quotations associated with an author. Create a `citation_aragon` object that tests the validity of your approach with the Aragon example.
- For each quotation, count the number of words and the number of unique words. Create a `stats_phrases` object that lists these properties. Test on La Fontaine quotations.
Here are the quotes, in bulk, to get you started.
In this cell, create the appropriate data structure for this exercise.
What data structure would enable this type of code?
    citations.get("La Fontaine")
What data structure could make an object look like
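As one possible answer to these hints, here is a minimal sketch that assumes a dictionary mapping each author to a list of quotations. Only three rows of the table are used, and this layout is one reasonable choice among several, not necessarily the course's official solution.

```python
# Sketch: author -> list of quotations (subset of the table above)
citations = {
    "La Fontaine": [
        "Rien ne sert de courir ; il faut partir à point.",
        "Selon que vous serez puissant ou misérable, les jugements de cour vous rendront blanc ou noir.",
    ],
    "Aragon": [
        "La première fois qu’Aurélien vit Bérénice, il la trouva franchement laide",
    ],
}

# .get() returns the list for a known author, and None for an unknown one
citation_aragon = citations.get("Aragon")
print(citation_aragon)
```

Storing a list per author means that an author with several quotations, such as La Fontaine, needs no special handling.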
For question 1, you can use this cell.
For question 2, you can use this cell.
How do you iterate over the keys and values of your stats_phrases dict?
Adopt this general structure:
    for auteur, liste_citations in citations.items():
        # do something
How can I get the length of a sentence (sequence of words) and the unique words in it?
- A sentence (string) can be broken down into words (list) using the `split()` method.
- A `set` can be used to deduplicate a list.
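The two hints above can be combined in a short sketch. It assumes a dictionary of lists keyed by author, and the keys `n_words` and `n_unique_words` are hypothetical names chosen for illustration. Punctuation is not handled, so a token such as `misérable,` counts as a word.

```python
# Assumed structure: author -> list of quotations (La Fontaine only here)
citations = {
    "La Fontaine": [
        "Rien ne sert de courir ; il faut partir à point.",
        "Selon que vous serez puissant ou misérable, les jugements de cour vous rendront blanc ou noir.",
    ],
}

stats_phrases = {}
for auteur, liste_citations in citations.items():
    stats_auteur = []
    for citation in liste_citations:
        mots = citation.split()  # whitespace-separated tokens
        stats_auteur.append(
            {
                "n_words": len(mots),             # length of the sentence in tokens
                "n_unique_words": len(set(mots)),  # a set drops repeated tokens
            }
        )
    stats_phrases[auteur] = stats_auteur

print(stats_phrases["La Fontaine"])
```

In the second quotation, "vous" and "ou" each appear twice, which is why its unique-word count is lower than its word count.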
You should get this type of result:
Ideally, you should pay attention to punctuation.
But we have not yet learned how to make refined substitutions in strings - that will come in another chapter.
2.2 Exercise 2
Advent of Code challenges are excellent practical problems for learning algorithms. With one problem per day of increasing difficulty between 1 December and Christmas Day, you will quickly become comfortable with the many data structures offered by Python.
The next exercise proposes solving the first part of day 1 (year 2022) using only basic Python.
Numpy would be well suited to this problem, but showing how that package simplifies basic numerical operations is the purpose of the next chapter.
Here are the objects we will need for this exercise:
You can write your code for question 2 in this cell:
We will need to iterate over each value of elves_example: how can we transform the character string into a more manageable object?
Be careful with the type of object you will obtain. Don’t forget to convert it to a numeric format if you want to perform arithmetic operations.
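A minimal sketch of these two hints, using only basic Python. The input `elves_sketch` is a hypothetical string written in the puzzle's format (one number per line, a blank line between two elves); it is not the course's `elves_example` object, whose exact content is not shown here.

```python
# Hypothetical input: blank lines separate the items carried by each elf
elves_sketch = """1000
2000
3000

4000

5000
6000

7000
8000
9000

10000"""

totals = []
for block in elves_sketch.split("\n\n"):  # one block per elf
    # split() yields strings: convert to int before doing arithmetic
    calories = [int(line) for line in block.split("\n")]
    totals.append(sum(calories))

best_total = max(totals)
best_elf = totals.index(best_total) + 1  # 1-based position of the elf
print(f"Elf number {best_elf} carries the most calories ({best_total} calories).")
```

Without the `int()` conversion, `sum()` would raise a `TypeError`, because the pieces returned by `split()` are still strings.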
You can write your code for question 3 in this cell:
Finally, after generalizing, you should find the following values:
Elf number 217 carries the most calories (72602 calories).
2.3 Exercise 3: strings
This is the final output you should obtain in questions 1 and 2:
On 17 March 2022, Nadia (34) reported an income of $33,333.33.
2.4 Exercise 3
- Using the `format` method, try to obtain the expected result.
- Do the same with f-strings.
- Modify the message to report a monthly income rounded to one decimal place. Use a space as the thousands separator rather than the default separator (the comma).
- Here is a fictitious path to a file: `C:\Users\Nadia\Documents\cours\chapitre_03\notes.txt`. Print it correctly.
In the first part of this exercise, you should have these results.
Question 1:
On 17 March 2022, Nadia (34) reported an income of $33,333.33.
Question 2:
On 17 March 2022, Nadia (34) reported an income of $33,333.33.
Question 3:
On 17 March 2022 Nadia (34) reported an income of $2 777.8.
Try replicating these results using the interactive cell below.
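For reference, here is a hedged sketch of how the first three results might be produced. The variable names are hypothetical, and the monthly figure assumes that the reported income is annual and is simply divided by 12.

```python
# Hypothetical variable names; values are taken from the expected message
date = "17 March 2022"
name = "Nadia"
age = 34
income = 33333.33

# Question 1: str.format with a format specification (comma separator, 2 decimals)
template = "On {}, {} ({}) reported an income of ${:,.2f}."
msg_format = template.format(date, name, age, income)

# Question 2: the same message with an f-string
msg_fstring = f"On {date}, {name} ({age}) reported an income of ${income:,.2f}."

# Question 3: one decimal place, then swap the comma separator for a space
monthly = f"{income / 12:,.1f}".replace(",", " ")

print(msg_format)
print(msg_fstring)
print(monthly)
```

The replacement is applied to the formatted number only, so that other commas in the sentence are left untouched.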
In the second part of the exercise, if you use the wrong type of string, you should get an error:
    Cell In[14], line 2
        chemin_bad = "C:\Users\Nadia\Documents\cours\chapitre_03\notes.txt"
                     ^
    SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
Backslashes often cause problems because Python reads them as the start of an escape sequence (here `\U`). With raw strings, you should not encounter this problem:
File can be found here: C:\Users\Nadia\Documents\cours\chapitre_03\notes.txt
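A minimal sketch of the raw-string approach. The name `chemin` is an assumption made for illustration, by analogy with `chemin_bad` in the error message.

```python
# The r prefix makes Python keep backslashes as literal characters
chemin = r"C:\Users\Nadia\Documents\cours\chapitre_03\notes.txt"
message = f"File can be found here: {chemin}"
print(message)
```

Doubling each backslash (`"C:\\Users\\..."`) in a regular string would give the same result; the raw string is simply easier to read.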
Additional information
This site was built automatically through a Github action using Quarto.
The environment used to obtain the results is reproducible via uv. The pyproject.toml file used to build this environment is available on the linogaliana/python-datascientist repository.
pyproject.toml
[project]
name = "python-datascientist"
version = "0.1.0"
description = "Source code for Lino Galiana's Python for data science course"
readme = "README.md"
requires-python = ">=3.13,<3.14"
dependencies = [
"altair>=6.0.0",
"black==24.8.0",
"cartiflette",
"contextily==1.6.2",
"duckdb>=0.10.1",
"folium>=0.19.6",
"gdal!=3.11.1",
"geoplot==0.5.1",
"graphviz==0.20.3",
"great-tables>=0.12.0",
"gt-extras>=0.0.8",
"ipykernel>=6.29.5",
"jupyter>=1.1.1",
"jupyter-cache==1.0.0",
"kaleido==0.2.1",
"langchain-community>=0.3.27",
"loguru==0.7.3",
"markdown>=3.8",
"nbclient==0.10.0",
"nbformat==5.10.4",
"nltk>=3.9.1",
"pip>=25.1.1",
"plotly>=6.1.2",
"plotnine>=0.15",
"polars==1.8.2",
"pyarrow>=17.0.0",
"pynsee==0.1.8",
"python-dotenv==1.0.1",
"python-frontmatter>=1.1.0",
"pywaffle==1.1.1",
"requests>=2.32.3",
"scikit-image==0.24.0",
"scipy>=1.13.0",
"selenium<4.39.0",
"spacy>=3.8.4",
"webdriver-manager==4.0.2",
"wordcloud==1.9.3",
]
[tool.uv.sources]
cartiflette = { git = "https://github.com/inseefrlab/cartiflette" }
gdal = [
{ index = "gdal-wheels", marker = "sys_platform == 'linux'" },
{ index = "geospatial_wheels", marker = "sys_platform == 'win32'" },
]
[[tool.uv.index]]
name = "geospatial_wheels"
url = "https://nathanjmcdougall.github.io/geospatial-wheels-index/"
explicit = true
[[tool.uv.index]]
name = "gdal-wheels"
url = "https://gitlab.com/api/v4/projects/61637378/packages/pypi/simple"
explicit = true
[dependency-groups]
dev = [
"nb-clean>=4.0.1",
]
To use exactly the same environment (version of Python and packages), please refer to the documentation for uv.
| SHA | Date | Author | Description |
|---|---|---|---|
| 8cfd74d9 | 2025-12-26 13:57:40 | Lino Galiana | Adding fstring/rstring exercise (#671) |
| 086116d6 | 2025-12-23 13:35:15 | lgaliana | Fix problem with script location |
| 7b8b7f9b | 2025-12-15 11:01:54 | lgaliana | Traduction anglaise |
| 8e387110 | 2025-12-15 10:20:42 | Lino Galiana | Fix WASM build in GHA (#663) |
| 02abcf02 | 2025-12-14 22:52:19 | Lino Galiana | Intro à Python: des exercices sur les structures de données (#662) |
| 81837397 | 2025-09-26 15:13:01 | lgaliana | deployment url from vscode for intro notebooks |
| 91431fa2 | 2025-06-09 17:08:00 | Lino Galiana | Improve homepage hero banner (#612) |
| dac49604 | 2024-08-29 15:07:49 | linogaliana | Change URL on edit on github button |
| f8b04136 | 2024-08-28 15:15:04 | Lino Galiana | Révision complète de la partie introductive (#549) |
Footnotes
[1] It is possible to run Python in the browser thanks to its implementation in WASM via Pyodide. This method is not recommended when you need to perform heavy computations, as in the next chapters. But since we are working on small examples here, it is a practical way to benefit from the interactivity enabled by the browser, notably the hints available in case you get stuck.
Citation
@book{galiana2025,
author = {Galiana, Lino},
title = {Python Pour La Data Science},
date = {2025},
url = {https://pythonds.linogaliana.fr/},
doi = {10.5281/zenodo.8229676},
langid = {en}
}