Numpy, the foundation of data science

Numpy is the cornerstone of the data science ecosystem in Python. All data manipulation, modeling, and visualization libraries rely, directly or indirectly, on Numpy. It is therefore essential to review some concepts of this package before moving forward.

Author

Lino Galiana

Published

2025-03-19


1 Introduction

This chapter serves as an introduction to Numpy to ensure that the basics of vector calculations with Python are mastered. The first part of the chapter presents small exercises to practice some basic functions of Numpy. The end of the chapter presents more in-depth practical exercises using Numpy.

It is recommended to regularly refer to the numpy cheatsheet and the official documentation if you have any doubts about a function.

In this chapter, we will adhere to the convention of importing Numpy as follows:

import numpy as np

We will also set the seed of the random number generator to obtain reproducible results:

np.random.seed(12345)

2 Concept of array

In the world of data science, as will be discussed in more depth in the upcoming chapters, the central object is the two-dimensional data table. The first dimension corresponds to rows and the second to columns. If we only consider one dimension, we refer to a variable (a column) of our data table. It is therefore natural to link data tables to the mathematical objects of matrices and vectors.

NumPy (Numerical Python) is the foundational building block for handling lists of numbers or strings as vectors and matrices. It provides this type of object, along with the associated standardized operations, neither of which exists in the base Python language.

The central object of NumPy is the array, which is a multidimensional data table. A Numpy array can be one-dimensional and considered as a vector (1d-array), two-dimensional and considered as a matrix (2d-array), or, more generally, take the form of a multidimensional object (Nd-array), a sort of nested table.

Simple arrays (one- or two-dimensional) are easy to represent and cover most Numpy use cases. We will discover in the next chapter on Pandas that, in practice, we usually don’t use Numpy directly since it is a low-level library. A Pandas DataFrame is built from a collection of one-dimensional arrays (the variables of the table), which makes it possible to perform operations that are consistent with, and optimized for, the type of each variable. Some Numpy knowledge is useful for understanding the logic of vector manipulation, which makes data processing more readable, efficient, and reliable.

Compared to a list:

  • an array can only contain one type of data (integer, string, etc.), whereas a list can mix types;
  • operations implemented by Numpy are more efficient and require less memory.
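As an illustration of these two points, here is a minimal sketch (the variable names are chosen for this example only). It shows that Numpy converts mixed values to a single common type, and that the raw data of an array occupies a fixed number of bytes per element:

```python
import numpy as np

# A list can mix types freely
mixed_list = [1, 2.5, "a"]

# An array holds a single dtype: here every value is converted to a string
mixed_array = np.array(mixed_list)
print(mixed_array.dtype.kind)  # 'U' stands for fixed-width unicode strings

# Integers and floats stored together are promoted to floats
promoted = np.array([1, 2, 3.5])
print(promoted.dtype)  # float64

# Raw data of 1000 64-bit integers: 8 bytes per element
ints = np.arange(1000, dtype=np.int64)
print(ints.nbytes)  # 8000
```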

Geographical data will constitute a slightly more complex construction than a traditional DataFrame. The geographical dimension takes the form of a deeper table, at least two-dimensional (coordinates of a point). However, geographical data manipulation libraries will handle this increased complexity.

2.1 Creating an array

We can create an array in several ways. To create an array from a list, simply use the np.array function:

np.array([1,2,5])
array([1, 2, 5])

It is possible to add a dtype argument to constrain the array type:

np.array([["a","z","e"],["r","t"],["y"]], dtype="object")
array([list(['a', 'z', 'e']), list(['r', 't']), list(['y'])], dtype=object)

There are also convenient functions for creating arrays:

  • Logical sequences: np.arange (sequence) or np.linspace (linear interpolation between two bounds)
  • Constant arrays: array filled with zeros, ones, or a chosen value: np.zeros, np.ones, or np.full
  • Random sequences: random number generation functions: np.random.uniform, np.random.normal, etc.
  • Identity matrix: np.eye

This gives, for logical sequences:

np.arange(0,10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.arange(0,10,3)
array([0, 3, 6, 9])
np.linspace(0, 1, 5)
array([0.  , 0.25, 0.5 , 0.75, 1.  ])

For an array initialized to 0:

np.zeros(10, dtype=int)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

or initialized to 1:

np.ones((3, 5), dtype=float)
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

or even initialized to 3.14:

np.full((3, 5), 3.14)
array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

Finally, to create the matrix \(I_3\):

np.eye(3)
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
Exercise 1

Generate:

  • \(X\) a random variable, 1000 repetitions of a \(U(0,1)\) distribution
  • \(Y\) a random variable, 1000 repetitions of a normal distribution with zero mean and variance equal to 2
  • Verify the variance of \(Y\) with np.var
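
One possible way to approach this exercise is sketched below. It is not necessarily the intended solution. The point to watch is that np.random.normal takes the standard deviation (scale), not the variance, so a variance of 2 corresponds to scale = sqrt(2):

```python
import numpy as np

np.random.seed(12345)  # same seed as in the introduction

# 1000 draws from a uniform distribution on [0, 1)
X = np.random.uniform(0, 1, size=1000)

# 1000 draws from a normal distribution with mean 0 and variance 2
# (scale is the standard deviation, hence the square root)
Y = np.random.normal(loc=0, scale=np.sqrt(2), size=1000)

# The empirical variance should be close to 2, up to sampling noise
print(np.var(Y))
```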

3 Indexing and slicing

3.1 Logic illustrated with a one-dimensional array

The simplest structure is the one-dimensional array:

x = np.arange(10)
print(x)
[0 1 2 3 4 5 6 7 8 9]

Indexing in this case is similar to that of a list:

  • The first element is at index 0
  • The nth element is accessible at position \(n-1\)

The logic for accessing elements is as follows:

x[start:stop:step]

With a one-dimensional array, the slicing operation (keeping a slice of the array) is very simple. Since the stop index is excluded, keeping the first K elements of an array is done with:

x[:K]

Similarly, you select the K\(^{th}\) element using:

x[K-1]

To select only one element, you would do:

x = np.arange(10)
x[2]
np.int64(2)

The slicing syntax used to select particular indices of a list also works with arrays.
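
To make the x[start:stop:step] logic concrete, here is a short sketch on the same array. The value K = 3 is arbitrary and only used for illustration:

```python
import numpy as np

x = np.arange(10)
K = 3  # arbitrary value for the example

# The stop index is excluded, so x[:K] returns exactly K elements
first_k = x[:K]
print(first_k)  # [0 1 2]

# The K-th element sits at position K - 1
kth = x[K - 1]
print(kth)  # 2

# start=1, stop=9 (excluded), step=3
every_third = x[1:9:3]
print(every_third)  # [1 4 7]

# Negative indices count from the end
last_two = x[-2:]
print(last_two)  # [8 9]

# A negative step walks the array backwards
reversed_x = x[::-1]
print(reversed_x)  # [9 8 7 6 5 4 3 2 1 0]
```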

Exercise 2

Take x = np.arange(10) and…

  • Select elements 0, 3, 5 from x
  • Select even elements
  • Select all elements except the first
  • Select the first 5 elements
np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

3.2 Regarding performance

A key element in the performance of Numpy compared to lists, when it comes to slicing, is that slicing an array does not return a copy of the selected elements (a copy that costs memory and time) but simply a view of them.

When it is necessary to make a copy, for example to avoid altering the underlying array, you can use the copy method:

x_sub_copy = x[:2].copy()
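
The difference between a view and a copy can be checked with a small experiment. The variable names below are illustrative. Modifying a view changes the original array, whereas modifying a copy does not:

```python
import numpy as np

x = np.arange(10)

view = x[:3]                # a slice is a view on the same memory
independent = x[:3].copy()  # an explicit copy owns its own memory

view[0] = 100               # this also changes x
independent[1] = -1         # this leaves x untouched

print(x[0])  # 100
print(x[1])  # 1

# np.shares_memory tells whether two arrays overlap in memory
print(np.shares_memory(x, view))         # True
print(np.shares_memory(x, independent))  # False
```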

It is also possible, and more practical, to select data based on logical conditions (an operation called a boolean mask). This functionality will mainly be used to perform data filtering operations.

For simple comparison operations, logical comparators may be sufficient. These comparisons also work on multidimensional arrays thanks to broadcasting, which we will discuss later:

x = np.arange(10)
x2 = np.array([[-1,1,-2],[-3,2,0]])
print(x)
print(x2)
[0 1 2 3 4 5 6 7 8 9]
[[-1  1 -2]
 [-3  2  0]]
x==2
x2<0
array([[ True, False,  True],
       [ True, False, False]])

To select the observations that satisfy a logical condition, pass the boolean array as an index, exactly as you would pass a slice: only the elements for which the condition is True are kept.
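
Here is a brief sketch of boolean masking on the two arrays defined above. The conditions are chosen only for illustration:

```python
import numpy as np

x = np.arange(10)
x2 = np.array([[-1, 1, -2], [-3, 2, 0]])

# The mask has the same shape as x2
mask = x2 < 0

# Indexing with the mask returns the selected values as a 1-D array
negatives = x2[mask]
print(negatives)  # [-1 -2 -3]

# Conditions are combined with & (and) and | (or); parentheses are required
evens_above_3 = x[(x > 3) & (x % 2 == 0)]
print(evens_above_3)  # [4 6 8]

# Summing a mask counts the True values
print(mask.sum())  # 3
```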

Exercise 3

Given

x = np.random.normal(size=10000)
  1. Keep only the values whose absolute value is greater than 1.96
  2. Count the number of values greater than 1.96 in absolute value and their proportion in the whole set
  3. Sum the absolute values of all observations greater (in absolute value) than 1.96 and relate them to the sum of the values of x (in absolute value)

Whenever possible, it is recommended to use Numpy’s logical functions, which are optimized and handle dimensions properly. Among them are:

  • np.count_nonzero;
  • np.isnan;
  • np.any or np.all, especially with the axis argument;
  • np.array_equal to check element-by-element equality.

Let’s create x, a two-dimensional array, and y, a one-dimensional array with a missing value.

x = np.random.normal(0, size=(3, 4))
y = np.array([np.nan, 0, 1])
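
As a sketch of how the functions listed above behave on these two arrays (the names of the results are illustrative):

```python
import numpy as np

np.random.seed(12345)
x = np.random.normal(0, size=(3, 4))
y = np.array([np.nan, 0, 1])

# Number of strictly positive values in x
n_positive = np.count_nonzero(x > 0)

# Element-wise test for missing values
missing = np.isnan(y)
print(missing)  # [ True False False]

# axis=0 collapses the rows: one result per column
any_positive_by_column = np.any(x > 0, axis=0)
# axis=1 collapses the columns: one result per row
all_positive_by_row = np.all(x > 0, axis=1)

# NaN is never equal to itself, so the comparison fails by default
print(np.array_equal(y, y))                  # False
print(np.array_equal(y, y, equal_nan=True))  # True
```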

4 Manipulating an array

4.1 Manipulation functions

Numpy provides standardized methods and functions for modifying an array. Here are some of them:

Operation | Implementation
Flatten an array | x.flatten() (method)
Transpose an array | x.T (attribute) or np.transpose(x) (function)
Append elements to the end | np.append(x, [1,2])
Insert elements at given positions (here the value 3, at positions 1 and 2) | np.insert(x, [1,2], 3)
Delete elements (at positions 0 and 3) | np.delete(x, [0,3])

To combine arrays, you can use, depending on the case: np.concatenate, np.vstack or the np.r_ object for row-wise concatenation; np.hstack, np.column_stack or the np.c_ object for column-wise concatenation.

x = np.random.normal(size = 10)
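
The combining functions can be compared on two small one-dimensional arrays. The arrays a and b below are made up for this sketch. The resulting shapes show which functions extend the array and which ones stack the inputs as rows or as columns:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# End-to-end concatenation: one array of length 6
joined = np.concatenate([a, b])
joined_h = np.hstack([a, b])
joined_r = np.r_[a, b]

# Each input becomes a row: shape (2, 3)
as_rows = np.vstack([a, b])

# Each input becomes a column: shape (3, 2)
as_columns = np.column_stack([a, b])
as_columns_c = np.c_[a, b]

print(joined)            # [1 2 3 4 5 6]
print(as_rows.shape)     # (2, 3)
print(as_columns.shape)  # (3, 2)
```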

To sort an array, use np.sort:

x = np.array([7, 2, 3, 1, 6, 5, 4])

np.sort(x)
array([1, 2, 3, 4, 5, 6, 7])

If you want to perform a partial reordering that brings the k smallest values to the front of an array without fully sorting it, use np.partition:

np.partition(x, 3)
array([1, 2, 3, 4, 5, 6, 7])
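
What np.partition guarantees can be checked with the short sketch below (k = 3 as above). Only the position of the element at index k is guaranteed. The values before it are smaller or equal, but their order is not specified, so the exact output may differ between Numpy versions:

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])
k = 3

part = np.partition(x, k)

# The first k values are the k smallest, in no guaranteed order
smallest = part[:k]
print(sorted(smallest.tolist()))  # [1, 2, 3]

# The element at index k is the one a full sort would put there
print(part[k])  # 4

# np.argpartition returns positions instead of values
idx = np.argpartition(x, k)[:k]
print(sorted(x[idx].tolist()))  # [1, 2, 3]
```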

For classical descriptive statistics, Numpy offers a number of already implemented functions, which can be combined with the axis argument.

x = np.random.normal(0, size=(3, 4))
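
To see what the axis argument does, it helps to use a small deterministic array rather than random values. The array m below is introduced only for this sketch:

```python
import numpy as np

# 3 rows and 4 columns containing 0, 1, ..., 11
m = np.arange(12).reshape(3, 4)

# Without axis, the statistic is computed over all elements
total = m.sum()
print(total)  # 66

# axis=0 collapses the rows: one value per column
by_column = m.sum(axis=0)
print(by_column)  # [12 15 18 21]

# axis=1 collapses the columns: one value per row
by_row = m.sum(axis=1)
print(by_row)  # [ 6 22 38]

# The same argument works for the other descriptive statistics
row_means = m.mean(axis=1)
print(row_means)  # [1.5 5.5 9.5]
```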
Exercise 5
  1. Sum all the elements of an array, the elements by row, and the elements by column. Verify the consistency.
  2. Write a function statdesc to return the following values: mean, median, standard deviation, minimum, and maximum. Apply it to x using the axis argument.

5 Broadcasting

Broadcasting refers to a set of rules for applying operations to arrays of different dimensions. In practice, it generally consists of applying a single operation to all members of a numpy array.

The idea can be understood from the following example. Broadcasting allows the scalar 5 to be treated as a one-dimensional array of length 3, so that both additions below give the same result:

a = np.array([0, 1, 2])
b = np.array([5, 5, 5])

a + b
a + 5
array([5, 6, 7])

Broadcasting can be very practical for efficiently performing operations on data with a complex structure. For more details, see the official Numpy documentation on broadcasting.
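
Broadcasting is not limited to scalars. The sketch below, with illustrative names, adds a one-dimensional array to a matrix and then combines a column with a row to build a full table:

```python
import numpy as np

a = np.array([0, 1, 2])
M = np.ones((3, 3))

# a (shape (3,)) is stretched over each row of M (shape (3, 3))
row_added = M + a
print(row_added)
# [[1. 2. 3.]
#  [1. 2. 3.]
#  [1. 2. 3.]]

# np.newaxis turns a into a column of shape (3, 1)
column = a[:, np.newaxis]

# (3, 1) + (3,) is broadcast to (3, 3): every pair (i, j) is combined
table = column + a
print(table)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]
```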

5.1 Application: programming your own k-nearest neighbors

Exercise (a bit more challenging)
  1. Create X, a two-dimensional array (i.e., a matrix) with 10 rows and 2 columns. The numbers in the array are random.
  2. Import the matplotlib.pyplot module as plt. Use plt.scatter to plot the data as a scatter plot.
  3. Construct a 10x10 matrix storing, at element \((i,j)\), the Euclidean distance between points \(X[i,:]\) and \(X[j,:]\). To do this, you will need to work with dimensions by creating nested arrays using np.newaxis:
  • First, use X1 = X[:, np.newaxis, :] to transform the matrix into a nested array. Check the dimensions.
  • Create X2 of dimension (1, 10, 2) using the same logic.
  • Deduce, for each point, the distance with other points for each coordinate. Square this distance.
  • At this stage, you should have an array of dimension (10, 10, 2). The reduction to a matrix is obtained by summing over the last axis. Check the help of np.sum on how to sum over the last axis.
  • Finally, apply the square root to obtain a proper Euclidean distance.
  4. Verify that the diagonal elements are zero (distance of a point to itself…).
  5. Now, sort for each point the points with the most similar values. Use np.argsort to get the ranking of the closest points for each row.
  6. We are interested in the k-nearest neighbors. For now, set k=2. Use argpartition to reorder each row so that the 2 closest neighbors of each point come first, followed by the rest of the row.
  7. Use the code snippet below to graphically represent the nearest neighbors.
A hint for graphically representing the nearest neighbors
plt.scatter(X[:, 0], X[:, 1], s=100)

# draw lines from each point to its two nearest neighbors
K = 2

for i in range(X.shape[0]):
    for j in nearest_partition[i, :K+1]:
        # plot a line from X[i] to X[j]
        # use some zip magic to make it happen:
        plt.plot(*zip(X[j], X[i]), color='black')

[Figure: result of question 7, a scatter plot in which each point is linked to its two nearest neighbors]
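
Steps 1 to 6 of the exercise can be sketched as follows. This is one possible implementation under stated assumptions, not necessarily the reference solution. The random generator and its seed are arbitrary choices, and the name nearest_partition is the one expected by the plotting hint:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen for this sketch
X = rng.random((10, 2))         # 10 points in 2 dimensions

# Shapes (10, 1, 2) and (1, 10, 2) broadcast to (10, 10, 2):
# the coordinate-wise difference between every pair of points
X1 = X[:, np.newaxis, :]
X2 = X[np.newaxis, :, :]
squared_diff = (X1 - X2) ** 2

# Summing over the last axis gives squared distances, shape (10, 10)
dist = np.sqrt(squared_diff.sum(axis=-1))

# A point is at distance 0 from itself
assert np.allclose(np.diag(dist), 0)

# Full ranking of neighbors for each point (column 0 is the point itself)
nearest = np.argsort(dist, axis=1)

# Partial ranking: the first K + 1 columns hold the point itself
# and its K nearest neighbors, in no guaranteed order
K = 2
nearest_partition = np.argpartition(dist, K + 1, axis=1)
```

The slice nearest_partition[i, :K+1] used in the hint therefore contains the point i itself together with its K nearest neighbors, which is why the loop runs up to K+1.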

Did I invent this challenging exercise? Not at all, it comes from the book Python Data Science Handbook. But if I had told you this immediately, would you have tried to answer the questions?

Moreover, it would not be a good idea to generalize this algorithm to large datasets. The complexity of our approach is \(O(N^2)\). The algorithm implemented by Scikit-Learn is \(O(N \log N)\).

Additionally, computing distance matrices on a GPU (graphics card) would be faster. In this regard, the library faiss, or dedicated frameworks for computing distances between high-dimensional vectors such as ChromaDB, offer much more satisfactory performance than Numpy for this specific problem.

6 Additional Exercises

Google became famous thanks to its PageRank algorithm. Starting from the links between websites, this algorithm assigns each site an importance score that is used to evaluate its centrality in a network. The objective of this exercise is to use Numpy to implement such an algorithm from an adjacency matrix that links the sites together.

Understanding the principle of the PageRank algorithm


  1. Create the following matrix with numpy. Call it M:

\[ \begin{bmatrix} 0 & 0 & 0 & 0 & 1 \\ 0.5 & 0 & 0 & 0 & 0 \\ 0.5 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0.5 & 0 & 0 \\ 0 & 0 & 0.5 & 1 & 0 \end{bmatrix} \]

  2. To visually represent this minimalist web, convert it to a networkx object (a library specialized in network analysis) and use the draw function of this package.

This is the transpose of the adjacency matrix that links the sites together: \(M_{ij}\) is the probability of moving from site \(j\) to site \(i\), so each column sums to 1. For example, site 1 (first column) references sites 2 and 3, and it is itself referenced only by site 5 (first row).

  3. Using the English Wikipedia page on PageRank, test the algorithm on your matrix.

Site 5 is the most central because it is referenced by sites 3 and 4. Site 1 is also quite central: it is the only site referenced by site 5 and therefore inherits most of its score.

array([[0.25419178],
       [0.13803151],
       [0.13803151],
       [0.20599017],
       [0.26375504]])
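
The scores above can be reproduced with a power iteration, in the spirit of the implementation given on the English Wikipedia page. The sketch below is an adaptation with two assumptions of its own: the damping factor d = 0.85 and the number of iterations are conventional defaults, and the starting vector is uniform rather than random:

```python
import numpy as np

# Column j holds the probabilities of moving from site j + 1 to each site
M = np.array([
    [0,   0, 0,   0, 1],
    [0.5, 0, 0,   0, 0],
    [0.5, 0, 0,   0, 0],
    [0,   1, 0.5, 0, 0],
    [0,   0, 0.5, 1, 0],
])


def pagerank(M, d=0.85, num_iterations=100):
    """Approximate PageRank scores by power iteration.

    M is a column-stochastic matrix and d is the damping factor.
    """
    N = M.shape[1]
    v = np.ones(N) / N           # start from uniform scores
    M_hat = d * M + (1 - d) / N  # follow a link with probability d, else jump anywhere
    for _ in range(num_iterations):
        v = M_hat @ v
    return v


scores = pagerank(M)
print(scores.round(4))
```

Because every column of M sums to 1, the scores keep summing to 1 at each iteration and can be read as shares of importance.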

Additional information

Environment used to build and test this chapter:

Latest built version: 2025-03-19

Python version used:

'3.12.6 | packaged by conda-forge | (main, Sep 30 2024, 18:08:52) [GCC 13.3.0]'
Package Version
matplotlib 3.10.1
networkx 3.4.2
numpy 2.2.3
scikit-learn 1.6.1


Citation

BibTeX citation:
@book{galiana2023,
  author = {Galiana, Lino},
  title = {Python Pour La Data Science},
  date = {2023},
  url = {https://pythonds.linogaliana.fr/},
  doi = {10.5281/zenodo.8229676},
  langid = {en}
}
For attribution, please cite this work as:
Galiana, Lino. 2023. Python Pour La Data Science. https://doi.org/10.5281/zenodo.8229676.