Part 2: Communicating with data

Data scientists need to be able to synthesize the information contained in a dataset through graphical representation, because the human brain grasps information better through figures than through tables. Data visualization matters both as part of an exploratory approach, to understand the structure of the phenomena under study, and as part of the phase of communicating results to audiences who don't necessarily have access to the raw data and must make do with summaries. This part of the course is an introduction to this vast subject through the practical construction of descriptive graphs and maps.

Introduction
Visualisation

Author: Lino Galiana
Published: 2025-03-19

1 Introduction

An essential part of a data scientist's work is to synthesize the information contained in their datasets in order to distinguish the signal, which deserves attention, from the noise inherent in any dataset. During an exploratory phase, there is a constant back-and-forth between synthesized information and disaggregated datasets. It is therefore essential to know how to synthesize the information in a dataset in order to grasp its structure, which can then guide further analyses, whether for a modeling phase or for data correction (anomaly detection or fixing faulty data retrieval).

We have already explored a key part of this work, namely the construction of relevant and reliable descriptive statistics. However, if we were content to present information using raw outputs of the groupby and agg combo on a Pandas DataFrame, our understanding of the data would remain quite limited. Building stylized tables with great_tables was already a step forward in this process but, in truth, our brain processes information much more intuitively through simple graphical visualizations than through a table.
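To make this concrete, here is a minimal sketch (the dataset is made up for illustration) contrasting the raw output of the groupby and agg combo with the same information rendered as a simple bar chart:

```python
import pandas as pd

# Made-up data standing in for any real dataset
df = pd.DataFrame({
    "region": ["A", "A", "B", "B", "C", "C"],
    "sales": [10, 12, 30, 28, 5, 7],
})

# Raw output of the groupby/agg combo: accurate, but hard to scan
summary = df.groupby("region").agg(total_sales=("sales", "sum"))
print(summary)

# The same information as a bar chart: grasped at a glance
summary.plot(kind="bar")
```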

1.1 Data visualization, an essential part of communication work

As humans, our cognitive capacities are limited: we can only absorb so much information at once, whereas computers are capable of processing large volumes of it. For a data scientist, this means that using computational and statistical skills to obtain synthetic representations of our many datasets is essential to meet operational or scientific needs. The range of methods and tools that make up the data scientist's toolbox aims to simplify the understanding and subsequent exploitation of datasets whose volume exceeds our cognitive capacities.

This brings us to data visualization: a set of tools and principles for representing stylized facts or contextualizing individual data points in a synthetic manner. Data visualization is the art and science of visually representing complex and abstract information through visual elements. Its primary goal is to synthesize the information contained in a dataset to make its key features easier to understand for further analysis. Among other things, it makes it possible to highlight trends, correlations, or anomalies that would be difficult, or even impossible, to grasp from raw data alone, which needs some context to make sense of.

Data visualization plays a crucial role in the data analysis process by providing visual means to explore, interpret, and communicate information. It facilitates communication between data experts, decision-makers, and the general public, enabling the latter to benefit from the rigorous work of the former to make sense of the data without the need for deep conceptual knowledge that underpins the synthesized information.

1.2 The role of visualization in the data value creation process

Data visualization is not limited to the final phase of a project, the communication of results to an audience that does not have access to the data or the means to exploit it. Visualization plays a role at every stage of the data value creation process. It is an essential part of the transition from a record (a snapshot of a phenomenon) to data: a record that has value because it carries information, on its own or when combined with other records.

The daily work of a data scientist involves examining a dataset from every angle to identify key value extraction opportunities. Quickly knowing what statistics to represent, and how, is crucial for saving time during this exploratory phase. This is primarily a form of self-communication that can afford to be rough around the edges, as the goal is to sketch the work before refining certain aspects. The challenge at this stage of the process is not to overlook any dimension that could potentially bring value.

The truly time-consuming communication work comes when presenting to an audience with limited access to the data, unfamiliar with the sources, with a limited attention span, or without quantitative skills. These audiences cannot be satisfied with raw outputs such as a DataFrame in a notebook or a graph created in seconds with the plot method of Pandas. It is important to adapt to their evolving expectations and to the tools they are familiar with, which explains the growing importance of websites dedicated to data visualizations.

2 Communicating, an opening to data storytelling

Data visualization thus holds a special place among the various techniques of data science. It is involved at all stages of the data production process, from upstream (exploratory analysis) to downstream (presenting results to various audiences), and when well-constructed, it allows us to intuitively grasp the structure of the data or the key issues of its analysis.

As an art of synthesis, data visualization is also the art of storytelling, and when done well, it can even reach the level of artistic production. Data visualization is a profession in its own right, with more and more practitioners found in media outlets or specialized companies (Datawrapper, for example).

Without aiming to create visualizations as sophisticated as those produced by specialists, every data scientist should be able to quickly generate visualizations that synthesize the information in the datasets at hand. A clear and readable visualization, while remaining simple, can be more effective than a speech in conveying a message.

Just like a speech, a visualization is a form of communication in which a speaker (the person constructing the visualization) seeks to convey information to a recipient (potentially the same person as the speaker, since a visualization can be created for oneself during exploratory analysis). It is no surprise that during the period when semiology played a significant role in intellectual debates, especially around the figure of Roland Barthes, the concept of graphic semiology emerged, centered around Jacques Bertin (Bertin 1967; Palsky 2017). This approach offers a framework for assessing the relevance of the techniques used to convey a graphic message, and many visualizations could be improved at little cost by following a few of its rules.

Eric Mauvière, a French statistician and a successor to Bertin's school of graphic semiology, offers excellent content on the subject. Some of his presentations, notably the one for SSPHub presented in Note 2.1, should be viewed in all data science training programs, as they highlight the numerous pitfalls encountered by data scientists.

An example of two visualizations made from the same dataset by Eric Mauvière (see Note 2.1).

3 Communicating, an opening to app development

The goal of this course is to introduce the main tools and the approach that data scientists should adopt when working with various datasets. However, it is becoming increasingly common for data scientists to develop and provide interactive applications offering a range of explorations and automated data visualizations. These are more advanced topics than this course covers, but they often serve as an entry point to data science for audiences close to data scientists, such as data engineers, data analysts, or statisticians.

We will mention some of the preferred tools for doing this, especially ecosystems related to web applications and Javascript tools. This need, now fairly standard for data scientists, bridges the gap with production deployment, which is the main focus of a third-year ENSAE course designed by Romain Avouac and myself (course website ensae-reproductibilite.github.io/). This current website, for example, is built on this principle using tools that allow Python code to be reproducibly executed on standardized servers and then made available through a website.

4 The Python ecosystem

Returning to our course: in this section we will present some basic libraries and visualizations in Python that provide a good starting point. There are plenty of resources for deepening one's command of the art of visualization, such as Wilke (2019).

4.1 Data visualization packages

The Python ecosystem for data visualization is vast and diverse. Entire books could be dedicated to it (Dale 2022). Python offers numerous libraries to quickly and relatively easily produce data visualizations¹.

The graphical libraries are mainly divided into two families:

  • Libraries for static representations. These are primarily intended for integration into fixed publications such as PDFs or text documents. We will mainly present Matplotlib and Seaborn, but others are emerging, such as Plotnine, an adaptation of ggplot2 to the Python ecosystem.
  • Libraries for interactive representations. These are suited to web publication and allow readers to interact with the displayed graphical representation. Libraries offering these features usually rely on JavaScript, the language of the web ecosystem, with Python serving as the entry point. We will primarily discuss Plotly and Folium in this family, but many other frameworks exist in this field². A minimal sketch contrasting the two families follows this list.
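As announced above, here is a minimal sketch (with made-up data) drawing the same line chart once with Matplotlib, which produces a fixed image, and once with Plotly, which produces an HTML figure the reader can hover over and zoom into:

```python
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px

df = pd.DataFrame({"year": [2020, 2021, 2022, 2023], "value": [3.2, 4.1, 3.8, 4.5]})

# Static family: a fixed image, suited to PDFs and printed documents
fig, ax = plt.subplots()
ax.plot(df["year"], df["value"])
ax.set_title("A static line chart")
plt.show()

# Interactive family: an HTML/JavaScript figure with hover and zoom
px.line(df, x="year", y="value", title="An interactive line chart").show()
```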

It is entirely possible to create sophisticated visualizations with an end-to-end Python workflow since it is a versatile language with a very rich ecosystem. However, Python is not a cure-all, and sometimes it can be useful to finalize a perfectly polished product with other languages, such as JavaScript for interactive visualizations or QGIS for cartographic work. This course will provide the basic tools to quickly and enjoyably produce work, but as the saying goes, the devil is in the details, so one should not insist on using Python for every task.

In the realm of visualization, this course takes the approach of exploring a few central libraries through a limited number of examples by replicating charts found on the open data website of the city of Paris. The best training for visualization remains practicing on datasets, so it is recommended to explore the richness of the open data ecosystem to experiment with visualizations.

4.2 Visualization applications

This part of the course focuses on simple synthetic representations. It does not (yet?) cover the construction of data visualization applications where a set of graphs update synchronously based on user interactions.

This indeed exceeds the scope of an introductory course, as building these applications requires mastering more complex concepts such as the interaction between a web page and a server, some knowledge of Linux, etc. The concepts necessary to understand these tools are at the heart of the course “Deploying Data Science Projects” that Romain Avouac and I teach in the third year at ENSAE.

Nevertheless, since data value creation in the form of applications is very common, it is useful, at a minimum, to mention the distinction between static sites and dynamic applications to provide the right approach and point to the appropriate tools. In the world of applications, it is important to distinguish between the front (the page visible to the application’s users) and the back office (the engine that performs actions based on parameters chosen by the user on the page).

There are primarily two paradigms for making these two elements interact, and the key difference between them is the kind of server they rely on: a static site runs on a web server, whereas a reactive application such as Streamlit relies on a standard backend server. The two types of servers differ in function and usage:

  • A web server is specifically designed to store, process, and deliver web pages (the front) to clients: HTML, CSS, and JavaScript files, images, etc. Web servers listen for HTTP/HTTPS requests from user browsers and respond by sending the requested data. This does not preclude complex data processing steps or reactivity obtained by embedding JavaScript in the application, but any Python processing is done before the application is made available. For Python users, several static site generators are available, with deployment typically via hosting on GitHub Pages. The two most common ecosystems are Quarto Markdown and Django, the former (a true static site generator) being far simpler to use and maintain than the latter (a full web framework). This site, for example, is built using Quarto, which ensures reproducibility of the presented examples and an ergonomic, customizable formatting of the results.
  • A standard backend server is designed to perform operations in response to a front, in this case a web page. For an application built with Python, this is a server with an appropriate Python environment to execute the code required to respond to any action taken by an application user. The code is executed on demand rather than once and for all, as in the previous approach. This paradigm allows for more application complexity but represents an additional challenge during the deployment phase. In the Python ecosystem, the two main tools for building such applications are Streamlit and Dash, with the former being quicker to implement than the latter; a minimal Streamlit sketch follows this list. More recently, Shiny, the dominant equivalent ecosystem in R, has been adapted for Python by Posit.
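The sketch below illustrates this on-demand execution model with a hypothetical Streamlit app (made-up data, hypothetical file name app.py): moving the slider re-runs the Python code on the backend server and redraws the chart. It would be launched with streamlit run app.py.

```python
# app.py: a minimal Streamlit sketch with made-up data
import numpy as np
import pandas as pd
import streamlit as st

st.title("A minimal reactive application")

# The slider is rendered on the front; moving it triggers Python code on the backend
n = st.slider("Number of points", min_value=10, max_value=200, value=50)

# Re-executed on every interaction, rather than once at build time
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": np.arange(n), "y": rng.normal(size=n).cumsum()})
st.line_chart(df, x="x", y="y")
```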

The ecosystems presented above for reactive applications are web frameworks. They are distinct from heavy clients such as tkinter, the historical tool for building graphical user interfaces. Beyond the more rudimentary look of tkinter interfaces compared with those of Streamlit, Dash, or Shiny, there are strong reasons to prefer these web frameworks over tkinter.

Tkinter is a heavy client, meaning it is tied to an operating system and requires pre-installation of packages before the interface can run. While it is certainly possible to make it portable, as discussed in the production course, there are many reasons why this approach may lead to errors or unexpected bugs. Web frameworks have the advantage of simplifying this deployment process by separating the front (HTML and CSS pages) from the back (the Python code). They have naturally become more popular, even though many dated online resources still exist for developing applications with tkinter.

When it comes to building applications, the first instinct should be: “Do I need to build a reactive application, or will a static site suffice?” The latter is much easier to implement and has minimal maintenance overhead, making it a rational choice in many cases. If building a static site becomes complex, for example because of sophisticated calculations that would be difficult to implement without JavaScript skills, you can then consider separating the front from the back by delegating the calculations to an API built, for example, with FastAPI; a minimal sketch of this approach follows. This can be a practical way to deploy a machine learning model, as will be discussed in the final chapter of the modeling section. If implementing an API seems too complicated or overkill for the task, then you can turn to a reactive application like Streamlit.
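To fix ideas, here is a minimal FastAPI sketch of that front/back separation, with a hard-coded rule standing in for a real trained model (the file name api.py and the /predict route are hypothetical). A static front page could then call /predict?surface=42 with a simple JavaScript fetch.

```python
# api.py: a minimal FastAPI sketch; run with: uvicorn api:app
from fastapi import FastAPI

app = FastAPI()

@app.get("/predict")
def predict(surface: float):
    # Hypothetical rule standing in for a real trained model
    predicted_price = 3000.0 * surface
    # FastAPI parses and validates ?surface=... from the query string
    return {"surface": surface, "predicted_price": predicted_price}
```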

Again, building an application involves concepts that go beyond an introductory level in Python. However, being aware of the right practices can save significant time by avoiding pitfalls due to poor initial choices.

4.3 Summary of this section

Returning to the content of this section after this aside: it is divided into two parts, and each chapter has a dual nature, depending on whether we focus on static or dynamic representations:

  • First, we will discuss standard graphical representations (histograms, bar charts, etc.) to synthesize quantitative information;
    • Static representations will rely on Pandas, Matplotlib, and Seaborn
    • Reactive charts will be built using Plotly
  • Second, we will present cartographic representations:
    • Static maps created with Geopandas or plotnine
    • Reactive maps using Folium (a Python adaptation of the Leaflet.js library); a minimal Folium sketch follows this list
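As a foretaste of those cartographic chapters, here is a minimal Folium sketch producing an interactive map (the coordinates of Paris are used purely as an example):

```python
import folium

# An interactive map centered on Paris, rendered via Leaflet.js
m = folium.Map(location=[48.8566, 2.3522], zoom_start=12)
folium.Marker([48.8566, 2.3522], popup="Paris").add_to(m)

# In a notebook, displaying `m` renders the map; it can also be exported as HTML
m.save("map.html")
```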

4.4 Useful references

Data visualization is an art that is learned primarily through practice, especially at the beginning. However, it is not always easy to produce readable and ergonomic visualizations, so it is helpful to draw inspiration from examples by specialists (major media outlets offer excellent visualizations).

A few references mentioned in this introduction:

Bertin, Jacques. 1967. Sémiologie graphique. Paris: Mouton/Gauthier-Villars.
Dale, Kyran. 2022. Data Visualization with Python and JavaScript. O’Reilly Media.
Palsky, Gilles. 2017. “La sémiologie graphique de Jacques Bertin a cinquante ans.” Visions carto (en ligne).
Wilke, Claus O. 2019. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. O’Reilly Media.


Footnotes

  1. To be honest, for a long time Python was a bit less enjoyable in this regard than R, which benefits from the indispensable ggplot2 library.

    Matplotlib, the main graphical library in Python, is not built on the grammar of graphics and is more cumbersome to use than ggplot2.

    seaborn, which we will present, simplifies graphical representation somewhat but, again, it is difficult to find anything more flexible and universal than ggplot2.

    The plotnine library aims to provide a similar implementation to ggplot for Python users. Its development is worth following.↩︎

  2. In this regard, I highly recommend keeping up with data visualization news on the platform Observable, which tends to bring together the communities of dataviz specialists and data analysts. The library Plot could become a new standard in the coming years, a sort of intermediate between ggplot and d3.↩︎

Citation

BibTeX citation:
@book{galiana2023,
  author = {Galiana, Lino},
  title = {Python Pour La Data Science},
  date = {2023},
  url = {https://pythonds.linogaliana.fr/},
  doi = {10.5281/zenodo.8229676},
  langid = {en}
}
For attribution, please cite this work as:
Galiana, Lino. 2023. Python Pour La Data Science. https://doi.org/10.5281/zenodo.8229676.