Data Science Tools
Jupyter Notebook Tutorial
Project.ipynb
Everything is a cell!
And here are some empty cells:
In [ ]:
In [ ]:
In [ ]:
Multi mode:
In [1]:
2 + 2
Out[1]:
4
Cell Types
This is a markdown cell. It can contain different formatting options, such as bold or italic text, and code blocks:
def my_function(x, y):
    pass
Or even images:
In [2]:
# this is a code cell
2 + 2
Out[2]:
4
Now try it yourself:
In [3]:
Turn this cell into a markdown cell
  Cell In[3], line 1
    Turn this cell into a markdown cell
         ^
SyntaxError: invalid syntax
Turn this cell into a code cell
Working with code
Execute the following cells in order:
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [ ]:
x = np.linspace(0, 10, 500)
y = np.cumsum(np.random.randn(500, 6), 0)
In [ ]:
plt.figure(figsize=(12, 7))
plt.plot(x, y)
plt.legend('ABCDEF', ncol=2, loc='upper left')
Working with data
In [ ]:
import pandas as pd
In [ ]:
btc = pd.read_csv('btc.csv', index_col='timestamp', parse_dates=True)
In [ ]:
eth = pd.read_csv('eth.csv', index_col='timestamp', parse_dates=True)
Bitcoin:
In [ ]:
btc.head()
In [ ]:
btc.tail()
In [ ]:
btc.info()
In [ ]:
btc['close'].plot(figsize=(15, 7))
Ether:
In [ ]:
eth.head()
In [ ]:
eth['close'].plot(figsize=(15, 7))
In [ ]:
eth.head()
Activities
1) Fill in your name in the placeholder cell and execute it
In [5]:
name = "Ashish Khare"
name
Out[5]:
'Ashish Khare'
2) Complete the function add to produce the right result
In [7]:
def add(x, y):
    return x + y

add(2, 3)
Out[7]:
5
Untitled.ipynb
In [ ]:
In [1]:
import pandas as pd
In [22]:
%%time
df = pd.read_csv(
    "/Users/santiagobasulto/code/datawars/datasources/crypto/ethusd_historic.csv",  # engine='pyarrow', dtype_backend='pyarrow',
    index_col='timestamp', parse_dates=True)
CPU times: user 41.7 s, sys: 1.09 s, total: 42.8 s
Wall time: 42.8 s
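The load above takes about 43 seconds, and much of that may be spent parsing the timestamp column. One option that might be worth trying is to read the timestamps as plain text and convert them explicitly with `pd.to_datetime` and a fixed `format`. This is a minimal sketch under that assumption, not a measured improvement: the two-row `sample_csv` is made up from the rows shown in this notebook, and `raw` and `sample_df` are illustrative names.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the real file (hypothetical sample, same column layout).
sample_csv = io.StringIO(
    "timestamp,open,high,low,close,volume\n"
    "2017-09-01 00:00:00+00:00,387.98,387.98,387.98,387.98,16.283653\n"
    "2017-09-01 00:01:00+00:00,387.99,388.00,387.98,387.98,6.020751\n"
)

# Read the timestamp column as text first, without parse_dates.
raw = pd.read_csv(sample_csv)

# Convert with an explicit format so pandas does not have to infer it.
raw["timestamp"] = pd.to_datetime(
    raw["timestamp"], format="%Y-%m-%d %H:%M:%S%z", utc=True
)

sample_df = raw.set_index("timestamp")
print(sample_df.index.dtype)
```

Wrapping each variant in `%%time`, as in the cell above, is the simplest way to check whether this helps on the full file.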
In [23]:
df.head()
Out[23]:
timestamp | open | high | low | close | volume |
---|---|---|---|---|---|
2017-09-01 00:00:00+00:00 | 387.98 | 387.98 | 387.98 | 387.98 | 16.283653 |
2017-09-01 00:01:00+00:00 | 387.99 | 388.00 | 387.98 | 387.98 | 6.020751 |
2017-09-01 00:02:00+00:00 | 387.27 | 388.00 | 386.80 | 388.00 | 32.201542 |
2017-09-01 00:03:00+00:00 | 388.00 | 388.00 | 388.00 | 388.00 | 16.804457 |
2017-09-01 00:04:00+00:00 | 388.00 | 388.00 | 388.00 | 388.00 | 0.391802 |
In [24]:
df.tail()
Out[24]:
timestamp | open | high | low | close | volume |
---|---|---|---|---|---|
2023-12-09 15:23:00+00:00 | 2354.4 | 2354.4 | 2353.0 | 2353.0 | 1.010736 |
2023-12-09 15:24:00+00:00 | 2352.7 | 2352.9 | 2352.4 | 2352.4 | 1.481674 |
2023-12-09 15:25:00+00:00 | 2353.1 | 2353.1 | 2352.8 | 2352.8 | 0.248465 |
2023-12-09 15:26:00+00:00 | 2353.6 | 2354.1 | 2353.6 | 2354.1 | 0.880320 |
2023-12-09 15:27:00+00:00 | 2354.5 | 2354.7 | 2353.7 | 2353.7 | 0.144378 |
In [25]:
df.shape
Out[25]:
(3298528, 5)
In [26]:
df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3298528 entries, 2017-09-01 00:00:00+00:00 to 2023-12-09 15:27:00+00:00
Data columns (total 5 columns):
 #   Column  Dtype
---  ------  -----
 0   open    float64
 1   high    float64
 2   low     float64
 3   close   float64
 4   volume  float64
dtypes: float64(5)
memory usage: 151.0 MB
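The reported memory usage can be checked by hand. This is a small sketch of the arithmetic, assuming each `float64` value and each timestamp in the index takes 8 bytes, and that pandas reports binary megabytes (1024 × 1024 bytes). The variable names are illustrative.

```python
ROWS = 3_298_528          # number of entries reported by df.info()
FLOAT_COLUMNS = 5         # open, high, low, close, volume
BYTES_PER_VALUE = 8       # float64 values and datetime index entries are 8 bytes each

column_bytes = ROWS * FLOAT_COLUMNS * BYTES_PER_VALUE
index_bytes = ROWS * BYTES_PER_VALUE

# Convert bytes to binary megabytes.
estimate_mib = (column_bytes + index_bytes) / 1024**2
print(f"{estimate_mib:.1f} MB")  # prints 151.0 MB
```

The estimate matches the `memory usage: 151.0 MB` line above.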
In [27]:
df.head()
Out[27]:
timestamp | open | high | low | close | volume |
---|---|---|---|---|---|
2017-09-01 00:00:00+00:00 | 387.98 | 387.98 | 387.98 | 387.98 | 16.283653 |
2017-09-01 00:01:00+00:00 | 387.99 | 388.00 | 387.98 | 387.98 | 6.020751 |
2017-09-01 00:02:00+00:00 | 387.27 | 388.00 | 386.80 | 388.00 | 32.201542 |
2017-09-01 00:03:00+00:00 | 388.00 | 388.00 | 388.00 | 388.00 | 16.804457 |
2017-09-01 00:04:00+00:00 | 388.00 | 388.00 | 388.00 | 388.00 | 0.391802 |
In [28]:
df.loc["2023-12-01":].to_csv('eth.csv')
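The slice `df.loc["2023-12-01":]` relies on pandas partial string indexing: on a sorted `DatetimeIndex`, a date string selects every row from that date onward. This is a minimal sketch on made-up data. The `prices` frame and its values are assumptions for illustration only.

```python
import pandas as pd

# Five one-minute rows that straddle midnight on 2023-12-01 (made-up values).
index = pd.to_datetime(
    [
        "2023-11-30 23:58",
        "2023-11-30 23:59",
        "2023-12-01 00:00",
        "2023-12-01 00:01",
        "2023-12-01 00:02",
    ],
    utc=True,
)
prices = pd.DataFrame({"close": [2050.0, 2051.0, 2053.8, 2052.9, 2051.2]}, index=index)

# Keep everything from the start of December 1 onward.
december = prices.loc["2023-12-01":]
print(december)
```

Only the three December rows remain, which is how the full history is trimmed before being written to `eth.csv`.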
In [21]:
!ls -lah btc.csv
-rw-r--r-- 1 santiagobasulto staff 914K Dec 12 18:34 btc.csv
In [4]:
!head btcusd_historic.csv
timestamp,open,high,low,close,volume
2020-01-01 00:00:00+00:00,7160.69,7160.69,7159.64,7159.64,5.50169101
2020-01-01 00:01:00+00:00,7161.51,7161.51,7155.09,7161.2,3.77692446
2020-01-01 00:02:00+00:00,7158.82,7158.82,7158.82,7158.82,0.02927792
2020-01-01 00:03:00+00:00,7158.82,7158.82,7156.9,7156.9,0.06581935
2020-01-01 00:04:00+00:00,7158.5,7158.5,7154.97,7157.2,0.97138671
2020-01-01 00:05:00+00:00,7156.52,7159.51,7150.1,7158.5,0.88693164
2020-01-01 00:06:00+00:00,7157.22,7158.54,7153.01,7153.01,0.10981518
2020-01-01 00:07:00+00:00,7154.11,7154.11,7154.11,7154.11,0.9111215
2020-01-01 00:08:00+00:00,7153.67,7159.25,7153.67,7159.25,0.35482069
In [5]:
!tail btcusd_historic.csv
2023-12-07 14:32:00+00:00,43413.0,43437.0,43413.0,43437.0,1.4995582
2023-12-07 14:33:00+00:00,43425.0,43437.0,43425.0,43427.0,0.10291892
2023-12-07 14:34:00+00:00,43431.0,43443.0,43429.0,43443.0,0.21622808
2023-12-07 14:35:00+00:00,43449.0,43475.0,43449.0,43475.0,0.33804039
2023-12-07 14:36:00+00:00,43484.0,43491.0,43448.0,43449.0,0.27382568
2023-12-07 14:37:00+00:00,43447.0,43447.0,43413.0,43413.0,0.08949208
2023-12-07 14:38:00+00:00,43435.0,43442.0,43422.0,43434.0,0.05946911
2023-12-07 14:39:00+00:00,43433.0,43433.0,43415.0,43415.0,0.04544965
2023-12-07 14:40:00+00:00,43441.0,43441.0,43423.0,43423.0,0.02397361
2023-12-07 14:42:00+00:00,43434.0,43434.0,43434.0,43434.0,0.0
In [ ]:
In [16]:
df = pd.read_csv("ethusd_historic.csv", dtype_backend="pyarrow", parse_dates=['timestamp'])
df.head()
Out[16]:
 | timestamp | open | high | low | close | volume |
---|---|---|---|---|---|---|
0 | 2017-09-01 00:00:00+00:00 | 387.98 | 387.98 | 387.98 | 387.98 | 16.283653 |
1 | 2017-09-01 00:01:00+00:00 | 387.99 | 388.00 | 387.98 | 387.98 | 6.020751 |
2 | 2017-09-01 00:02:00+00:00 | 387.27 | 388.00 | 386.80 | 388.00 | 32.201542 |
3 | 2017-09-01 00:03:00+00:00 | 388.00 | 388.00 | 388.00 | 388.00 | 16.804457 |
4 | 2017-09-01 00:04:00+00:00 | 388.00 | 388.00 | 388.00 | 388.00 | 0.391802 |
In [6]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2068722 entries, 0 to 2068721
Data columns (total 6 columns):
 #   Column     Dtype
---  ------     -----
 0   timestamp  datetime64[ns, UTC]
 1   open       double[pyarrow]
 2   high       double[pyarrow]
 3   low        double[pyarrow]
 4   close      double[pyarrow]
 5   volume     double[pyarrow]
dtypes: datetime64[ns, UTC](1), double[pyarrow](5)
memory usage: 94.7 MB
In [7]:
df.head()
Out[7]:
 | timestamp | open | high | low | close | volume |
---|---|---|---|---|---|---|
0 | 2020-01-01 00:00:00+00:00 | 7160.69 | 7160.69 | 7159.64 | 7159.64 | 5.501691 |
1 | 2020-01-01 00:01:00+00:00 | 7161.51 | 7161.51 | 7155.09 | 7161.20 | 3.776924 |
2 | 2020-01-01 00:02:00+00:00 | 7158.82 | 7158.82 | 7158.82 | 7158.82 | 0.029278 |
3 | 2020-01-01 00:03:00+00:00 | 7158.82 | 7158.82 | 7156.90 | 7156.90 | 0.065819 |
4 | 2020-01-01 00:04:00+00:00 | 7158.50 | 7158.50 | 7154.97 | 7157.20 | 0.971387 |
In [8]:
df.set_index('timestamp', inplace=True)
In [9]:
df.head()
Out[9]:
timestamp | open | high | low | close | volume |
---|---|---|---|---|---|
2020-01-01 00:00:00+00:00 | 7160.69 | 7160.69 | 7159.64 | 7159.64 | 5.501691 |
2020-01-01 00:01:00+00:00 | 7161.51 | 7161.51 | 7155.09 | 7161.20 | 3.776924 |
2020-01-01 00:02:00+00:00 | 7158.82 | 7158.82 | 7158.82 | 7158.82 | 0.029278 |
2020-01-01 00:03:00+00:00 | 7158.82 | 7158.82 | 7156.90 | 7156.90 | 0.065819 |
2020-01-01 00:04:00+00:00 | 7158.50 | 7158.50 | 7154.97 | 7157.20 | 0.971387 |
In [11]:
df.index.is_monotonic_increasing
Out[11]:
True
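Checking `is_monotonic_increasing` confirms that the timestamps are already in chronological order, which date-based slicing depends on. If the check returned `False`, sorting the index would be one way to fix it. This is a small sketch on a made-up, deliberately shuffled series. The names `shuffled` and `ordered` are illustrative.

```python
import pandas as pd

# Three one-minute observations stored out of order (made-up values).
shuffled = pd.Series(
    [3.0, 1.0, 2.0],
    index=pd.to_datetime(
        ["2020-01-01 00:02", "2020-01-01 00:00", "2020-01-01 00:01"], utc=True
    ),
)

# The index is not sorted yet, so this check fails.
was_sorted = shuffled.index.is_monotonic_increasing

# sort_index() returns a new series ordered by timestamp.
ordered = shuffled.sort_index()
print(was_sorted, ordered.index.is_monotonic_increasing)
```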
In [15]:
df['high'].resample("D").max().plot(figsize=(14, 7))
Out[15]:
<Axes: xlabel='timestamp'>
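The call `resample("D").max()` groups the one-minute rows into calendar days and keeps the largest value of each day. This is a minimal sketch of that step on a made-up four-row series. The `highs` values are assumptions chosen so the result is easy to verify by eye.

```python
import pandas as pd

# Two observations per day over two days (made-up values).
highs = pd.Series(
    [1.0, 3.0, 2.0, 5.0],
    index=pd.to_datetime(
        [
            "2020-01-01 00:00",
            "2020-01-01 12:00",
            "2020-01-02 00:00",
            "2020-01-02 12:00",
        ],
        utc=True,
    ),
    name="high",
)

# One row per day, holding the maximum of that day's values.
daily_high = highs.resample("D").max()
print(daily_high)
```

On the full dataset this reduces millions of one-minute rows to one point per day, which keeps the plot readable.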
Exercises.ipynb
Example 1: Solve the add function
Complete the code of the function add, which receives two numbers and should return their sum:
In [ ]:
def add(x, y):
    pass
You can use the + operator to sum numbers.
In [ ]:
def add(x, y):
    return x + y
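A quick way to confirm the solution is to run a few assertions in a new cell. This is only a sketch: apart from `add(2, 3)`, which appears in the project notebook, the test values are arbitrary choices.

```python
def add(x, y):
    """Return the sum of x and y."""
    return x + y


# Each assert raises an AssertionError if the function returns the wrong value.
assert add(2, 3) == 5
assert add(-1, 1) == 0
assert add(0.5, 0.25) == 0.75
print("all checks passed")
```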
Original.ipynb
Welcome to our Jupyter Tutorial. This is an interactive tutorial focused on getting you familiarized with Notebooks and interactive labs.
Part 1: everything is a cell
Jupyter Notebooks are organized as a set of "cells". Each cell can contain a different type of content, such as Python code (or R, Julia, etc.), images, or even human-readable text (markdown) like what you're currently reading.
I've left a couple of empty cells below for you to see them:
In [ ]:
In [ ]:
In [ ]:
This is another cell containing Markdown (human readable) code. And below, another empty cell:
In [ ]:
You can edit these cells just by double clicking on them. Try editing the following cell:
👉 Double click on me 👈
When you double click the cell, it should open an "edit mode", and you should see something similar to:
If you're seeing those asterisks, it's because you've correctly entered "Edit Mode". Once you've made the changes, you have to "execute", or "run" the cell to reflect the changes. To do that just click on the little play button on the top menu bar:
Jupyter notebooks are optimized for an efficient workflow. There are many keyboard shortcuts that will let you interact with your documents, run code and make other changes; mastering these shortcuts will speed up your work. For example, there are two shortcuts to execute a cell:
shift + return: Run cell and advance to the next one.
ctrl + return: Run the cell but don't change focus.
Try them with the following cell:
In [ ]:
2 + 2
You can try executing these cells as many times as you want. It won't break anything.
ctrl + Return effect:
As you can see in the following animation, the code is correctly executed (it returns 4) and the focus (the blue line at the left side of the cell) stays in the same cell.
Now compare it to the next shortcut, shift + return:
shift + Return effect:
As you can see, every time I execute code the focus changes to the cell below.
Part 2: Working with code
Jupyter notebooks have amazing features to include text and images and create beautiful, human-readable documents, as you've just seen. But their main benefit is working with code. Now we're going to import a few libraries and start experimenting with Python code. We've already done the simple 2 + 2 before, so let's do something a little more interesting. First, we need to import numpy and matplotlib:
In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
Notebooks.ai includes the most popular Data Science and Deep Learning libraries already installed. And even if one is missing, you can always install it in your own environment (more on that later). We've just imported these two libraries:
numpy: the most popular Python library for array manipulation and numeric computing.
matplotlib: the most popular visualization library in the Python ecosystem.
Let's now execute a few lines of code and generate some plots:
In [2]:
x = np.linspace(0, 10, 500)
y = np.cumsum(np.random.randn(500, 6), 0)
In [3]:
plt.figure(figsize=(12, 7))
plt.plot(x, y)
plt.legend('ABCDEF', ncol=2, loc='upper left')
Out[3]:
<matplotlib.legend.Legend at 0x12187e690>
But what is that 😱? Just randomly generated data points, but you can clearly see how simple it is to do numeric processing and plotting with Notebooks.ai.
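To see what those random data points are, note that `np.cumsum(..., 0)` adds up the random steps row by row, so each of the six columns is a random walk. This is a small sketch of that idea. It uses `np.random.default_rng` with a fixed seed, which is an assumption added here for repeatability and differs from the unseeded `np.random.randn` call above.

```python
import numpy as np

# Seeded generator so the result is repeatable (the seed value is arbitrary).
rng = np.random.default_rng(seed=0)

# 500 random steps for each of 6 independent series.
steps = rng.standard_normal((500, 6))

# Running total down the rows: row k holds the sum of steps 0..k for each series.
walks = np.cumsum(steps, axis=0)

print(walks.shape)
```

The first row of `walks` equals the first row of `steps`, and the last row equals the total of all 500 steps in each column.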
Part 3: Interacting with data
Jupyter and Python make it really simple to interact with files in your local storage. These files are securely stored in the cloud and you can access them from anywhere in the world.
As an example, we're going to load two CSV files (that you can preview in this public spreadsheet) into our notebook and play with them.
The CSVs contain minute-by-minute prices of Bitcoin and Ether for early December 2023 (the Bitcoin file loaded below spans December 1 to December 10). Each row holds what's usually referred to as OHLC (Open, High, Low, Close) prices, plus the traded volume.
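To make the OHLC idea concrete, here is a minimal sketch of how one-minute bars could be built from individual trade prices using `resample(...).ohlc()` in pandas. The `trades` series is made up for illustration. How the CSV files in this tutorial were actually produced is not stated.

```python
import pandas as pd

# Six made-up trade prices spread over two consecutive minutes.
trades = pd.Series(
    [100.0, 102.0, 99.0, 101.0, 101.5, 103.0],
    index=pd.to_datetime(
        [
            "2023-12-01 00:00:05",
            "2023-12-01 00:00:20",
            "2023-12-01 00:00:40",
            "2023-12-01 00:00:55",
            "2023-12-01 00:01:10",
            "2023-12-01 00:01:50",
        ],
        utc=True,
    ),
)

# For each minute: first price (open), max (high), min (low), last price (close).
bars = trades.resample("1min").ohlc()
print(bars)
```

The first minute opens at 100.0, peaks at 102.0, dips to 99.0 and closes at 101.0, which is the same layout as the `open`, `high`, `low` and `close` columns in the files.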
In [12]:
import pandas as pd
In [13]:
btc = pd.read_csv('btc.csv', index_col='timestamp', parse_dates=True)
In [16]:
eth = pd.read_csv('eth.csv', index_col='timestamp', parse_dates=True)
Bitcoin:
In [17]:
btc.head()
Out[17]:
timestamp | open | high | low | close | volume |
---|---|---|---|---|---|
2023-12-01 00:00:00+00:00 | 37731.0 | 37741.0 | 37728.0 | 37741.0 | 4.349545 |
2023-12-01 00:01:00+00:00 | 37739.0 | 37739.0 | 37724.0 | 37724.0 | 1.000053 |
2023-12-01 00:02:00+00:00 | 37721.0 | 37721.0 | 37701.0 | 37701.0 | 0.169017 |
2023-12-01 00:03:00+00:00 | 37703.0 | 37704.0 | 37703.0 | 37704.0 | 0.099991 |
2023-12-01 00:04:00+00:00 | 37703.0 | 37703.0 | 37697.0 | 37697.0 | 0.135919 |
In [18]:
btc.tail()
Out[18]:
timestamp | open | high | low | close | volume |
---|---|---|---|---|---|
2023-12-10 10:49:00+00:00 | 43683.0 | 43683.0 | 43683.0 | 43683.0 | 0.000000 |
2023-12-10 10:50:00+00:00 | 43671.0 | 43675.0 | 43671.0 | 43675.0 | 0.008716 |
2023-12-10 10:51:00+00:00 | 43694.0 | 43694.0 | 43694.0 | 43694.0 | 0.003141 |
2023-12-10 10:52:00+00:00 | 43711.0 | 43711.0 | 43709.0 | 43709.0 | 0.029046 |
2023-12-10 10:53:00+00:00 | 43709.0 | 43709.0 | 43709.0 | 43709.0 | 0.000000 |
In [19]:
btc.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 13698 entries, 2023-12-01 00:00:00+00:00 to 2023-12-10 10:53:00+00:00
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   open    13698 non-null  float64
 1   high    13698 non-null  float64
 2   low     13698 non-null  float64
 3   close   13698 non-null  float64
 4   volume  13698 non-null  float64
dtypes: float64(5)
memory usage: 642.1 KB
In [20]:
btc['close'].plot(figsize=(15, 7))
Out[20]:
<Axes: xlabel='timestamp'>
Ether:
In [21]:
eth.head()
Out[21]:
timestamp | open | high | low | close | volume |
---|---|---|---|---|---|
2023-12-01 00:00:00+00:00 | 2053.1 | 2054.3 | 2053.1 | 2053.8 | 98.816206 |
2023-12-01 00:01:00+00:00 | 2054.0 | 2054.0 | 2052.9 | 2052.9 | 101.136155 |
2023-12-01 00:02:00+00:00 | 2052.6 | 2052.6 | 2051.2 | 2051.2 | 284.303968 |
2023-12-01 00:03:00+00:00 | 2051.5 | 2052.3 | 2051.3 | 2052.3 | 67.405196 |
2023-12-01 00:04:00+00:00 | 2052.2 | 2052.6 | 2052.1 | 2052.1 | 96.918102 |
In [22]:
eth['close'].plot(figsize=(15, 7))
Out[22]:
<Axes: xlabel='timestamp'>
As you can see, we're able to load data from CSV files with just a few lines, create a DataFrame and plot it, all within Jupyter Lab.
In [23]:
eth.head()
Out[23]:
timestamp | open | high | low | close | volume |
---|---|---|---|---|---|
2023-12-01 00:00:00+00:00 | 2053.1 | 2054.3 | 2053.1 | 2053.8 | 98.816206 |
2023-12-01 00:01:00+00:00 | 2054.0 | 2054.0 | 2052.9 | 2052.9 | 101.136155 |
2023-12-01 00:02:00+00:00 | 2052.6 | 2052.6 | 2051.2 | 2051.2 | 284.303968 |
2023-12-01 00:03:00+00:00 | 2051.5 | 2052.3 | 2051.3 | 2052.3 | 67.405196 |
2023-12-01 00:04:00+00:00 | 2052.2 | 2052.6 | 2052.1 | 2052.1 | 96.918102 |