1 Introduction
An essential part of a data scientist's work is synthesizing the information contained in a dataset in order to distinguish the signal, which deserves attention, from the noise inherent in any data. During the exploratory phase, there is a constant back-and-forth between synthesized information and the disaggregated dataset. Knowing how to summarize a dataset is therefore essential to grasping its structure, which can then guide further analyses, whether a modeling phase or data correction (anomaly detection or retrieval of faulty records).
We have already explored a key part of this work, namely the construction of relevant and reliable descriptive statistics. However, if we were content to present information using raw outputs from the `groupby` and `agg` combo on a `Pandas` DataFrame, our understanding of the data would be quite limited. The implementation of stylized tables using `great_tables` was already a step forward in this process but, in truth, our brain processes information much more intuitively through simple graphical visualizations than through a table.
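As an illustration, here is a minimal sketch, on a hypothetical dataset, of the kind of raw `groupby`/`agg` output the previous paragraph refers to:

```python
import pandas as pd

# Hypothetical dataset: daily bike counts at two counting stations
df = pd.DataFrame({
    "station": ["A", "A", "B", "B", "A", "B"],
    "count": [120, 98, 340, 310, 150, 290],
})

# The raw groupby/agg output: accurate, but hardly engaging for a reader
summary = df.groupby("station").agg(
    mean_count=("count", "mean"),
    max_count=("count", "max"),
)
print(summary)
```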
1.1 Data visualization, an essential part of communication work
As humans, our cognitive capacities are limited, and we can only grasp a limited amount of information, whereas computers are capable of processing large volumes of information. For a data scientist, this means that using our computational and statistical skills to obtain synthetic representations of our many datasets is essential to meet operational or scientific needs. The range of methods and tools that make up the toolbox of data scientists aims to simplify the understanding and subsequent exploitation of datasets whose volume exceeds our cognitive capacities.
This brings us to data visualization, a set of tools and principles for representing stylized facts or contextualizing individual data in a synthetic manner. Data visualization is the art and science of visually representing complex and abstract information through visual elements. Its primary goal is to synthesize the information contained in a dataset in order to facilitate the understanding of its key issues for further analysis. Among other things, data visualization makes it possible to highlight trends, correlations, or anomalies that would be difficult, or even impossible, to grasp by looking at raw data, which requires some context to make sense of.
Data visualization plays a crucial role in the data analysis process by providing visual means to explore, interpret, and communicate information. It facilitates communication between data experts, decision-makers, and the general public, enabling the latter to benefit from the rigorous work of the former and make sense of the data without needing the deep conceptual knowledge that underpins the synthesized information.
1.2 The role of visualization in the data value creation process
Data visualization is not limited to the final phase of a project, namely the communication of results to an audience that does not have access to the data or the means to exploit it. Visualization plays a role at every stage of the data value creation process. It is, in fact, an essential part of the transition from a record (a snapshot of a phenomenon) to data: a record that has value because it carries information on its own or when combined with other records.
The daily work of a data scientist involves examining a dataset from every angle to identify key value extraction opportunities. Quickly knowing what statistics to represent, and how, is crucial for saving time during this exploratory phase. This is primarily a form of self-communication that can afford to be rough around the edges, as the goal is to sketch the work before refining certain aspects. The challenge at this stage of the process is not to overlook any dimension that could potentially bring value.
The truly time-consuming communication work comes when presenting to an audience with limited access to the data, unfamiliar with the sources, with a limited attention span, or without quantitative skills. These audiences cannot be satisfied with raw outputs such as a DataFrame in a notebook or a graph created in seconds with the `plot` method from `Pandas`. It is important to adapt to their evolving expectations and to the tools they are familiar with, which explains the growing importance of websites dedicated to data visualizations.
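To make the contrast concrete, here is the kind of seconds-long chart the paragraph above alludes to, reusing the `summary` table from the earlier sketch. It is perfectly adequate for self-communication during exploration, but too rough for a wider audience:

```python
import matplotlib.pyplot as plt

# A chart produced in seconds with the Pandas plot method:
# fine for exploration, too rough for external communication
summary.plot(kind="bar", title="Bike counts by station")
plt.show()
```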
2 Communicating, an opening to data storytelling
Data visualization thus holds a special place among the various techniques of data science. It is involved at all stages of the data production process, from upstream (exploratory analysis) to downstream (presenting results to various audiences), and when well-constructed, it allows us to intuitively grasp the structure of the data or the key issues of its analysis.
As an art of synthesis, data visualization is also the art of storytelling and, when done well, it can even reach the level of artistic production. Data visualization is a profession in its own right, with more and more practitioners found in media outlets or specialized companies (`Datawrapper`, for example).
Without aiming to create visualizations as sophisticated as those produced by specialists, every data scientist should be able to quickly generate visualizations that synthesize the information in the datasets at hand. A clear and readable visualization, while remaining simple, can be more effective than a speech in conveying a message.
Just like a speech, a visualization is a form of communication in which a speaker (the person constructing the visualization) seeks to convey information to a recipient (potentially the same person as the speaker, since a visualization can be created for oneself during exploratory analysis). It is no surprise that during the period when semiology played a significant role in intellectual debates, especially around the figure of Roland Barthes, the concept of graphic semiology emerged, centered around Jacques Bertin (Bertin 1967; Palsky 2017). This approach allows reflection on the relevance of the techniques used to convey a graphic message, and many visualizations could be improved at little cost by following some of these rules.
Eric Mauvière, a French statistician and a successor to Bertin's school of graphic semiology, offers excellent content on the subject. Some of his presentations, notably the one for the SSPHub presented in Note 2.1, should be viewed in all data science training programs, as they highlight the numerous pitfalls encountered by data scientists.
3 Communicating, an opening to app development
The goal of this course is to introduce the main tools and the approach that data scientists should adopt when working with various datasets. However, it is becoming increasingly common for data scientists to develop and provide interactive applications offering a range of explorations and automated data visualizations. These are more advanced topics than this course covers, but they often serve as an entry point to data science for audiences close to data scientists, such as data engineers, data analysts, or statisticians.
We will mention some of the preferred tools for doing this, especially ecosystems related to web applications and `JavaScript` tools. This need, now fairly standard for data scientists, bridges the gap with production deployment, which is the main focus of a third-year ENSAE course designed by Romain Avouac and myself (course website: ensae-reproductibilite.github.io/). This website, for example, is built on this principle, using tools that allow `Python` code to be reproducibly executed on standardized servers and then made available through a website.
4 The Python ecosystem
Returning to our course: in this section we will present some basic libraries and visualizations in `Python` that provide a good starting point. There are plenty of resources to deepen and advance in the art of visualization, such as the book by Wilke (2019).
4.1 Data visualization packages
The `Python` ecosystem for data visualization is vast and diverse. Entire books could be dedicated to it (Dale 2022). `Python` offers numerous libraries to quickly and relatively easily produce data visualizations¹.
The graphical libraries are mainly divided into two families:
- Libraries for static representations. These are primarily intended for integration into fixed publications such as PDFs or text documents. We will mainly present `Matplotlib` and `Seaborn`, but others are emerging, such as `Plotnine`, an adaptation of `ggplot2` to the `Python` ecosystem.
- Libraries for interactive representations. These are suited for web representations and allow readers to interact with the displayed graphical representation. Libraries offering these features usually rely on `JavaScript`, the web development ecosystem, with an entry point through `Python`. We will primarily discuss `Plotly` and `Folium` in this family, but many other frameworks exist in this field². A brief sketch contrasting the two families follows this list.
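As a minimal illustration (the dataset and chart choices here are hypothetical, not taken from the course), the following sketch renders the same data once with the static family and once with the interactive one:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px

# Hypothetical data: monthly ridership (in thousands) for three metro lines
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar"] * 3,
    "line": ["1"] * 3 + ["4"] * 3 + ["14"] * 3,
    "riders": [320, 300, 350, 210, 200, 230, 180, 190, 220],
})

# Static family: a Seaborn figure, suited to a PDF or printed report
sns.barplot(data=df, x="month", y="riders", hue="line")
plt.show()

# Interactive family: the Plotly equivalent, with hover and zoom in the browser
fig = px.bar(df, x="month", y="riders", color="line", barmode="group")
fig.show()
```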
It is entirely possible to create sophisticated visualizations with an end-to-end `Python` workflow, since it is a versatile language with a very rich ecosystem. However, `Python` is not a cure-all, and it can sometimes be useful to finalize a perfectly polished product with other languages, such as `JavaScript` for interactive visualizations or `QGIS` for cartographic work. This course will provide the basic tools to produce work quickly and enjoyably but, as the saying goes, the devil is in the details, so one should not insist on using `Python` for every task.
In the realm of visualization, this course takes the approach of exploring a few central libraries through a limited number of examples by replicating charts found on the open data website of the city of Paris. The best training for visualization remains practicing on datasets, so it is recommended to explore the richness of the open data ecosystem to experiment with visualizations.
4.2 Visualization applications
This part of the course focuses on simple synthetic representations. It does not (yet?) cover the construction of data visualization applications where a set of graphs update synchronously based on user interactions.
This indeed exceeds the scope of an introductory course, as building these applications requires mastering more complex concepts, such as the interaction between a web page and a server, some knowledge of `Linux`, etc. The concepts necessary to understand these tools are at the heart of the course "Deploying Data Science Projects" that Romain Avouac and I teach in the third year at ENSAE.
Nevertheless, since data value creation in the form of applications is very common, it is useful, at a minimum, to mention the distinction between static sites and dynamic applications to provide the right approach and point to the appropriate tools. In the world of applications, it is important to distinguish between the front (the page visible to the application’s users) and the back office (the engine that performs actions based on parameters chosen by the user on the page).
There are primarily two paradigms for making these two elements interact, and the key difference between them is the servers they rely on: a static site runs on a web server, whereas a reactive application such as one built with `Streamlit` relies on a standard backend server. The main difference between these two types of servers lies in their function and usage:
- A web server is specifically designed to store, process, and deliver web pages (the front) to clients. This includes HTML, CSS, and JavaScript files, images, etc. Web servers listen for HTTP/HTTPS requests from users' browsers and respond by sending the requested data. This doesn't preclude complex data processing steps or reactivity, by embedding `JavaScript` in the application, but the `Python` processing steps are done before the application is made available. For `Python` users, there are several static site generators for building pages before deployment via hosting on `GitHub Pages`. The two most common ecosystems are `Quarto Markdown` and `Django`, the former being simpler to use and maintain than the latter. This site, for example, is built using `Quarto`, which ensures the reproducibility of the presented examples and an ergonomic, customizable formatting of the results.
- A standard backend server is designed to perform operations in response to a front, in this case a web page. In the context of an application built with `Python`, this is a server with an appropriate `Python` environment that executes the code required to respond to any action taken by an application user. The code is executed on demand rather than once and for all, as in the previous approach (a minimal sketch follows this list). This paradigm allows for more application complexity, but it represents an additional challenge during the deployment phase. In the `Python` ecosystem, the two main tools for building such applications are `Streamlit` and `Dash`, the former being quicker to implement than the latter. More recently, `Shiny`, the dominant equivalent ecosystem in `R`, has been adapted for `Python` by `Posit`.
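The following minimal sketch (a hypothetical `app.py`, not an example from the course) illustrates the on-demand execution model of the second paradigm: every interaction with the widget re-runs the `Python` script on the backend server.

```python
# app.py - a minimal Streamlit sketch (run with: streamlit run app.py)
import pandas as pd
import streamlit as st

st.title("A minimal reactive app")

# Hypothetical data; in practice this would come from a real dataset
df = pd.DataFrame(
    {"city": ["Paris", "Lyon", "Marseille"], "population_millions": [2.1, 0.5, 0.9]}
)

# Each time the user changes this widget, the code below re-executes
city = st.selectbox("Choose a city", df["city"])

population = float(df.loc[df["city"] == city, "population_millions"].iloc[0])
st.metric("Population (millions)", population)
st.bar_chart(df.set_index("city"))
```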
Is `tkinter` still used?
The ecosystems presented above for reactive applications are web frameworks. They are distinct from heavier clients like `tkinter`, the historical tool for building graphical user interfaces. Beyond the more rudimentary look of `tkinter` interfaces compared to those of `Streamlit`, `Dash`, or `Shiny`, there are strong reasons to prefer web frameworks over `tkinter`.
`Tkinter` is a heavy client, meaning it is tied to an operating system and requires packages to be pre-installed before the interface can run. While it is certainly possible to make it portable, as discussed in the production course, there are many reasons why this approach may lead to errors or unexpected bugs. Web frameworks have the advantage of simplifying this deployment process by separating the front (HTML and CSS pages) from the back (the `Python` code). They have naturally become more popular, even though many dated online resources still exist for developing applications with `tkinter`.
When it comes to building applications, the first instinct should be: "Do I need to build a reactive application, or will a static site suffice?" The latter is much easier to implement and has minimal maintenance overhead, making it a rational choice in many cases. If building a static site becomes complex, for example due to sophisticated calculations that would be difficult to implement without `JavaScript` skills, you can then consider separating the front from the back by delegating the calculations to an API, for example built with `FastAPI`. This can be a practical way to deploy a machine learning model, as will be discussed in the final chapter of the modeling section. If implementing an API seems too complicated or overkill for the task, then you can turn to a reactive application like `Streamlit`.
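As a minimal sketch of that middle option (the endpoint and the affine formula are placeholders, not an actual model from the course), a `FastAPI` back could look like this:

```python
# api.py - a minimal FastAPI sketch (run with: uvicorn api:app)
from fastapi import FastAPI

app = FastAPI()

@app.get("/predict")
def predict(x: float) -> dict:
    # Placeholder for a real model call, e.g. model.predict([[x]])
    return {"prediction": 2 * x + 1}
```

A static front can then query `GET /predict?x=3` over HTTP, with no `Python` running on the page itself.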
Again, building an application involves concepts that go beyond an introductory level in `Python`. However, being aware of the right practices can save significant time by avoiding pitfalls due to poor initial choices.
4.3 Summary of this section
Returning to the content of this section after this aside, it is divided into two parts, and each chapter is dual in nature, depending on whether we are focused on static or dynamic representations:
- First, we will discuss standard graphical representations (histograms, bar charts, etc.) to synthesize quantitative information:
  - static representations will rely on `Pandas`, `Matplotlib`, and `Seaborn`;
  - reactive charts will be built using `Plotly`.
- Second, we will present cartographic representations:
  - static maps created with `Geopandas` or `plotnine`;
  - reactive maps using `Folium` (a `Python` adaptation of the `Leaflet.js` library), as in the brief sketch below.
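To give a first taste of the reactive-map family, here is a minimal sketch (the coordinates are simply those of central Paris, chosen for illustration):

```python
import folium

# A minimal interactive map centred on Paris
m = folium.Map(location=[48.8566, 2.3522], zoom_start=12)
folium.Marker([48.8566, 2.3522], popup="Paris").add_to(m)
m.save("map.html")  # open this file in a browser to pan and zoom
```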
4.4 Useful references
Data visualization is an art that is learned primarily through practice, especially at the beginning. However, it is not always easy to produce readable and ergonomic visualizations, so it is helpful to draw inspiration from examples by specialists (major media outlets offer excellent visualizations).
Here are some useful resources on these topics:
- `Datawrapper` offers an excellent blog on best practices for visualization, particularly the articles by Lisa Charlotte Muth. I especially recommend this article on colors and this one on text;
- the blog of Eric Mauvière;
- "La Sémiologie graphique de Jacques Bertin a cinquante ans";
- the trending visualizations on `Observable`;
- The New York Times (masters of dataviz) reviews the best visualizations of the year annually, often in the vein of data scrollytelling. See, for example, the 2022 retrospective.
And a few additional references mentioned in this introduction:

- Bertin, Jacques. 1967. *Sémiologie graphique*. Paris: Mouton/Gauthier-Villars.
- Dale, Kyran. 2022. *Data Visualization with Python and JavaScript*. 2nd ed. O'Reilly Media.
- Palsky, Gilles. 2017. "La sémiologie graphique de Jacques Bertin a cinquante ans." *Visionscarto*.
- Wilke, Claus O. 2019. *Fundamentals of Data Visualization*. O'Reilly Media.
Footnotes

1. To be honest, for a long time `Python` was a bit less enjoyable in this regard compared to `R`, which benefits from the indispensable `ggplot2` library. Not being built on the grammar of graphics, the main graphical library in `Python`, `Matplotlib`, is more cumbersome to use than `ggplot2`. `Seaborn`, which we will present, simplifies graphical representation somewhat but, again, it is difficult to find something more flexible and universal than `ggplot2`. The `plotnine` library aims to provide a similar implementation to `ggplot` for `Python` users. Its development is worth following.↩︎
2. In this regard, I highly recommend keeping up with data visualization news on the platform `Observable`, which tends to bring together the communities of dataviz specialists and data analysts. The `Plot` library could become a new standard in the coming years, a sort of intermediate between `ggplot` and `d3`.↩︎