A functional Python environment for data science
This chapter introduces the basics of the Python environment for data science, with an emphasis on the modularity of the language and the use of notebooks…
A complete course about data science.
Lino Galiana
2025-12-23
Python Second year engineering course from ENSAE (Master 1).
The entire content of this course is freely available here or on GitHub and can be tested as Jupyter notebooks.
| Title | Available in | Learning mode | Resource type |
|---|---|---|---|
| Introduction | Lesson #1 or self learning | Read from website | |
| A functional Python environment for data science | Lesson #1 or self learning | Read from website and exercise notebooks | |
| How to deal with a data set | Lesson #1 or self learning | Read from website | |
| A few refresher exercises to get back in the saddle | Lesson #1 or self learning | Read from website | |
| Title | Available in | Learning mode | Resource type |
|---|---|---|---|
| Introduction to part 1: data wrangling | Lesson #1 or self learning | Read from website | |
| Numpy, the foundation of data science | Lesson #1 or self learning | Read from website and exercise notebooks | |
| Introduction to Pandas | Lesson #2 or self learning | Read from website and exercise notebooks | |
| Data wrangling with Pandas | Lesson #2 or self learning | Read from website and exercise notebooks | |
| Introduction to spatial data with Geopandas | Lesson #4 or self learning | Read from website and exercise notebooks | |
| Web scraping with Python | Self learning | Read from website and exercise notebooks | |
| Retrieve data with APIs from Python | Lesson #4 or self learning | Read from website and exercise notebooks | |
| Mastering regular expressions | Self learning | Read from website and exercise notebooks | |
| New ways of accessing data: Parquet format and data in the cloud | Self learning | Read from website | |
| Title | Available in | Learning mode | Resource type |
|---|---|---|---|
| Introduction to part 2: communicating with data | Self learning | Read from website | |
| Building graphics with Python | Self learning | Read from website and exercise notebooks | |
| Introduction to cartography with Python | Self learning | Read from website and exercise notebooks | |
| Title | Available in | Learning mode | Resource type |
|---|---|---|---|
| Introduction to part 3: modeling | Lesson #5 or self learning | Read from website | |
| Preprocessing before building machine learning models | Lesson #5 or self learning | Read from website and exercise notebooks | |
| Evaluating model quality | Lesson #5 or self learning | Read from website and exercise notebooks | |
| Discovering classification with the SVM technique | Lesson #5 or self learning | Read from website and exercise notebooks | |
| An introduction to regression | Lesson #6 or self learning | Read from website and exercise notebooks | |
| Variable selection: an introduction | Lesson #6 or self learning | Read from website and exercise notebooks | |
| Clustering | Lesson #6 or self learning | Read from website and exercise notebooks | |
| First steps toward industrialization with scikit pipelines | Self learning | Read from website and exercise notebooks | |
| Serving a model through an API | Self learning | Read from website | |
| Title | Available in | Learning mode | Resource type |
|---|---|---|---|
| Introduction to part 4: Natural Language Processing (NLP) | Self learning | Read from website | |
| Cleaning and structuring information in textual data | Lesson #7 or self learning | Read from website and exercise notebooks | |
| Frequentist analysis using the bag-of-words approach: strengths and limitations | Lesson #7 or self learning | Read from website and exercise notebooks | |
| Synthesizing textual information with embeddings | Lesson #7 or self learning | Read from website and exercise notebooks | |
| Title | Available in | Learning mode | Resource type |
|---|---|---|---|
| Git: an essential tool for data scientists | Lesson #3 or self learning | Read from website | |
| Learning Git through practice: the daily workout | Lesson #3 or self learning | Read from website | |
| An exquisite corpse to discover collaborative work with Git | Lesson #3 or self learning | Read from website | |
@book{galiana2025,
author = {Galiana, Lino},
title = {Python Pour La Data Science},
date = {2025},
url = {https://pythonds.linogaliana.fr/},
doi = {10.5281/zenodo.8229676},
langid = {en}
}