A functional Python environment for data science
This chapter introduces the basics of the Python environment for data science, with an emphasis on the modularity of the language and the use of notebooks…
A complete course about data science.
Lino Galiana
2025-12-23
Python. Second-year course in the ENSAE engineering program (Master 1).
All of the course content is freely available here or on GitHub and can be tried out as Jupyter notebooks.

| Title | Available in | Learning mode | Resource type |
|---|---|---|---|
| Introduction | | Lesson #1 or self learning | Read from website |
| A functional Python environment for data science | | Lesson #1 or self learning | Read from website and exercise notebooks |
| How to deal with a data set | | Lesson #1 or self learning | Read from website |
| A few refresher exercises to get back in the saddle | | Lesson #1 or self learning | Read from website |

| Title | Available in | Learning mode | Resource type |
|---|---|---|---|
| Introduction to part 1: data wrangling | | Lesson #1 or self learning | Read from website |
| NumPy, the foundation of data science | | Lesson #1 or self learning | Read from website and exercise notebooks |
| Introduction to Pandas | | Lesson #2 or self learning | Read from website and exercise notebooks |
| Data wrangling with Pandas | | Lesson #2 or self learning | Read from website and exercise notebooks |
| Introduction to spatial data with GeoPandas | | Lesson #4 or self learning | Read from website and exercise notebooks |
| Web scraping with Python | | Self learning | Read from website and exercise notebooks |
| Retrieve data with APIs from Python | | Lesson #4 or self learning | Read from website and exercise notebooks |
| Mastering regular expressions | | Self learning | Read from website and exercise notebooks |
| New ways of accessing data: Parquet format and data in the cloud | | Self learning | Read from website |

| Title | Available in | Learning mode | Resource type |
|---|---|---|---|
| Introduction to part 2: communicating with data | | Self learning | Read from website |
| Building graphics with Python | | Self learning | Read from website and exercise notebooks |
| Introduction to cartography with Python | | Self learning | Read from website and exercise notebooks |

| Title | Available in | Learning mode | Resource type |
|---|---|---|---|
| Introduction to part 3: modeling | | Lesson #5 or self learning | Read from website |
| Preprocessing before building machine learning models | | Lesson #5 or self learning | Read from website and exercise notebooks |
| Evaluating model quality | | Lesson #5 or self learning | Read from website and exercise notebooks |
| Discovering classification with the SVM technique | | Lesson #5 or self learning | Read from website and exercise notebooks |
| An introduction to regression | | Lesson #6 or self learning | Read from website and exercise notebooks |
| Variable selection: an introduction | | Lesson #6 or self learning | Read from website and exercise notebooks |
| Clustering | | Lesson #6 or self learning | Read from website and exercise notebooks |
| First steps toward industrialization with scikit pipelines | | Self learning | Read from website and exercise notebooks |
| Serving a model through an API | | Self learning | Read from website |

| Title | Available in | Learning mode | Resource type |
|---|---|---|---|
| Introduction to part 4: Natural Language Processing (NLP) | | Self learning | Read from website |
| Cleaning and structuring information in textual data | | Lesson #7 or self learning | Read from website and exercise notebooks |
| Frequentist analysis using the bag-of-words approach: strengths and limitations | | Lesson #7 or self learning | Read from website and exercise notebooks |
| Synthesizing textual information with embeddings | | Lesson #7 or self learning | Read from website and exercise notebooks |

| Title | Available in | Learning mode | Resource type |
|---|---|---|---|
| Git: a necessary tool for data scientists | | Lesson #3 or self learning | Read from website |
| Discovering Git through practice: the daily workout | | Lesson #3 or self learning | Read from website |
| An exquisite corpse to discover collaborative work with Git | | Lesson #3 or self learning | Read from website |

@book{galiana2025,
author = {Galiana, Lino},
title = {Python pour la data science},
date = {2025},
url = {https://pythonds.linogaliana.fr/},
doi = {10.5281/zenodo.8229676},
langid = {fr}
}