Ashish Khare has successfully completed this project.

Data Science Tools

easy

4.63

Jupyter Notebook Tutorial

Finished

February 28, 2025 3:12 PM


Project.ipynb

Notebook

Everything is a cell!

And here are some empty cells:

In [ ]:

Multi mode:

In [1]:

2 + 2

Out[1]:

4

Cell Types

This is a markdown cell. It can contain different formatting options, such as bold or italic text, or code blocks:

def my_function(x, y):
    pass

Or even images:

In [2]:

# this is a code cell
2 + 2

Out[2]:

4

Now try it yourself:

In [3]:

Turn this cell into a markdown cell

  Cell In[3], line 1
    Turn this cell into a markdown cell
         ^
SyntaxError: invalid syntax

Turn this cell into a code cell

Working with code

Execute the following cells in order:

In [ ]:

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [ ]:

x = np.linspace(0, 10, 500)
y = np.cumsum(np.random.randn(500, 6), 0)

In [ ]:

plt.figure(figsize=(12, 7))
plt.plot(x, y)
plt.legend(list('ABCDEF'), ncol=2, loc='upper left')  # one label per line: A..F

Working with data

In [ ]:

import pandas as pd

In [ ]:

btc = pd.read_csv('btc.csv', index_col='timestamp', parse_dates=True)

In [ ]:

eth = pd.read_csv('eth.csv', index_col='timestamp', parse_dates=True)

Bitcoin:

In [ ]:

btc.head()

In [ ]:

btc.tail()

In [ ]:

btc.info()

In [ ]:

btc['close'].plot(figsize=(15, 7))

Ether:

In [ ]:

eth.head()

In [ ]:

eth['close'].plot(figsize=(15, 7))

In [ ]:

eth.head()

Activities

1) Fill your name in the placeholder cell and execute it

In [5]:

name = "Ashish Khare"
name

Out[5]:

'Ashish Khare'

2) Complete the function `add` to produce the right result

In [7]:

def add(x, y):
    return x + y

add(2, 3)

Out[7]:

5

Untitled.ipynb


In [ ]:

In [1]:

import pandas as pd

In [22]:

%%time
df = pd.read_csv(
    "/Users/santiagobasulto/code/datawars/datasources/crypto/ethusd_historic.csv", #engine='pyarrow', dtype_backend='pyarrow',
    index_col='timestamp', parse_dates=True)

CPU times: user 41.7 s, sys: 1.09 s, total: 42.8 s
Wall time: 42.8 s
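`%%time` is an IPython cell magic, so it only works inside Jupyter/IPython. A rough plain-Python equivalent is sketched below: it wraps the load in `time.perf_counter()`, which measures wall-clock time like the "Wall time" line. The CSV is a small synthetic in-memory stand-in (an assumption), not the `ethusd_historic.csv` file used above.

```python
import io
import time

import pandas as pd

# Hypothetical stand-in for the minute-level CSV: 60 identical one-minute rows
csv_text = "timestamp,open,high,low,close,volume\n" + "".join(
    f"2023-12-01 00:{m:02d}:00+00:00,100.0,101.0,99.0,100.5,1.0\n" for m in range(60)
)

start = time.perf_counter()
df = pd.read_csv(io.StringIO(csv_text), index_col="timestamp", parse_dates=True)
elapsed = time.perf_counter() - start  # wall-clock seconds

print(f"Loaded {len(df)} rows in {elapsed:.4f} s")
```

`time.process_time()` would approximate the CPU-time line instead.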

In [23]:

df.head()

Out[23]:

	open	high	low	close	volume
timestamp
2017-09-01 00:00:00+00:00	387.98	387.98	387.98	387.98	16.283653
2017-09-01 00:01:00+00:00	387.99	388.00	387.98	387.98	6.020751
2017-09-01 00:02:00+00:00	387.27	388.00	386.80	388.00	32.201542
2017-09-01 00:03:00+00:00	388.00	388.00	388.00	388.00	16.804457
2017-09-01 00:04:00+00:00	388.00	388.00	388.00	388.00	0.391802

In [24]:

df.tail()

Out[24]:

	open	high	low	close	volume
timestamp
2023-12-09 15:23:00+00:00	2354.4	2354.4	2353.0	2353.0	1.010736
2023-12-09 15:24:00+00:00	2352.7	2352.9	2352.4	2352.4	1.481674
2023-12-09 15:25:00+00:00	2353.1	2353.1	2352.8	2352.8	0.248465
2023-12-09 15:26:00+00:00	2353.6	2354.1	2353.6	2354.1	0.880320
2023-12-09 15:27:00+00:00	2354.5	2354.7	2353.7	2353.7	0.144378

In [25]:

df.shape

Out[25]:

(3298528, 5)

In [26]:

df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3298528 entries, 2017-09-01 00:00:00+00:00 to 2023-12-09 15:27:00+00:00
Data columns (total 5 columns):
 #   Column  Dtype  
---  ------  -----  
 0   open    float64
 1   high    float64
 2   low     float64
 3   close   float64
 4   volume  float64
dtypes: float64(5)
memory usage: 151.0 MB
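The `memory usage: 151.0 MB` figure can be checked by hand. Each `float64` value takes 8 bytes, and the `DatetimeIndex` stores one 8-byte timestamp per row. The sketch below assumes that 1 MB means 1024² bytes, which is how `info()` formats this number. It also ignores small fixed overheads such as a `RangeIndex`.

```python
def estimate_mb(rows: int, cols: int, index_bytes_per_row: int = 8) -> float:
    """Rough DataFrame size in MB (1024**2 bytes) for 8-byte columns.

    cols: number of 8-byte columns (float64, double[pyarrow] without nulls, datetime64).
    index_bytes_per_row: 8 for a DatetimeIndex, 0 to ignore a RangeIndex.
    """
    total_bytes = rows * cols * 8 + rows * index_bytes_per_row
    return total_bytes / 1024**2


# ETH frame from df.info() above: 5 float64 columns + DatetimeIndex
print(round(estimate_mb(3_298_528, 5), 1))  # 151.0

# pyarrow-backed frame later in this notebook: 6 columns (timestamp + 5 doubles), RangeIndex
print(round(estimate_mb(2_068_722, 6, index_bytes_per_row=0), 1))  # 94.7
```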

In [27]:

df.head()

Out[27]:

	open	high	low	close	volume
timestamp
2017-09-01 00:00:00+00:00	387.98	387.98	387.98	387.98	16.283653
2017-09-01 00:01:00+00:00	387.99	388.00	387.98	387.98	6.020751
2017-09-01 00:02:00+00:00	387.27	388.00	386.80	388.00	32.201542
2017-09-01 00:03:00+00:00	388.00	388.00	388.00	388.00	16.804457
2017-09-01 00:04:00+00:00	388.00	388.00	388.00	388.00	0.391802

In [28]:

df.loc["2023-12-01":].to_csv('eth.csv')
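`df.loc["2023-12-01":]` relies on partial-string indexing on a `DatetimeIndex`. Pandas interprets the string as a timestamp, and the open-ended slice keeps every row from that moment on. The toy illustration below uses made-up hourly data (an assumption, not the real dataset):

```python
import pandas as pd

# Hypothetical hourly prices that cross midnight into 2023-12-01 (UTC)
idx = pd.date_range("2023-11-30 22:00", periods=5, freq="60min", tz="UTC")
prices = pd.DataFrame({"close": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

# Keep everything from 2023-12-01 onward
december = prices.loc["2023-12-01":]
print(december)
```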

In [21]:

!ls -lah btc.csv

-rw-r--r--  1 santiagobasulto  staff   914K Dec 12 18:34 btc.csv

In [4]:

!head btcusd_historic.csv

timestamp,open,high,low,close,volume
2020-01-01 00:00:00+00:00,7160.69,7160.69,7159.64,7159.64,5.50169101
2020-01-01 00:01:00+00:00,7161.51,7161.51,7155.09,7161.2,3.77692446
2020-01-01 00:02:00+00:00,7158.82,7158.82,7158.82,7158.82,0.02927792
2020-01-01 00:03:00+00:00,7158.82,7158.82,7156.9,7156.9,0.06581935
2020-01-01 00:04:00+00:00,7158.5,7158.5,7154.97,7157.2,0.97138671
2020-01-01 00:05:00+00:00,7156.52,7159.51,7150.1,7158.5,0.88693164
2020-01-01 00:06:00+00:00,7157.22,7158.54,7153.01,7153.01,0.10981518
2020-01-01 00:07:00+00:00,7154.11,7154.11,7154.11,7154.11,0.9111215
2020-01-01 00:08:00+00:00,7153.67,7159.25,7153.67,7159.25,0.35482069

In [5]:

!tail btcusd_historic.csv

2023-12-07 14:32:00+00:00,43413.0,43437.0,43413.0,43437.0,1.4995582
2023-12-07 14:33:00+00:00,43425.0,43437.0,43425.0,43427.0,0.10291892
2023-12-07 14:34:00+00:00,43431.0,43443.0,43429.0,43443.0,0.21622808
2023-12-07 14:35:00+00:00,43449.0,43475.0,43449.0,43475.0,0.33804039
2023-12-07 14:36:00+00:00,43484.0,43491.0,43448.0,43449.0,0.27382568
2023-12-07 14:37:00+00:00,43447.0,43447.0,43413.0,43413.0,0.08949208
2023-12-07 14:38:00+00:00,43435.0,43442.0,43422.0,43434.0,0.05946911
2023-12-07 14:39:00+00:00,43433.0,43433.0,43415.0,43415.0,0.04544965
2023-12-07 14:40:00+00:00,43441.0,43441.0,43423.0,43423.0,0.02397361
2023-12-07 14:42:00+00:00,43434.0,43434.0,43434.0,43434.0,0.0
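The `!head`/`!tail` shell commands inspect the file without loading it into pandas. You can do something similar from Python with `pd.read_csv` and its `nrows` parameter, which parses only the first rows. That avoids a 40-second full load when you just want a preview. The sketch below uses an in-memory CSV built from the first lines of the `!head` output above:

```python
import io

import pandas as pd

# First lines of btcusd_historic.csv, copied from the `!head` output above
csv_text = """timestamp,open,high,low,close,volume
2020-01-01 00:00:00+00:00,7160.69,7160.69,7159.64,7159.64,5.50169101
2020-01-01 00:01:00+00:00,7161.51,7161.51,7155.09,7161.2,3.77692446
2020-01-01 00:02:00+00:00,7158.82,7158.82,7158.82,7158.82,0.02927792
"""

# Only the first two data rows end up in the DataFrame
preview = pd.read_csv(io.StringIO(csv_text), nrows=2, index_col="timestamp", parse_dates=True)
print(preview)
```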

In [ ]:

In [16]:

df = pd.read_csv("ethusd_historic.csv", dtype_backend="pyarrow", parse_dates=['timestamp'])
df.head()

Out[16]:

	timestamp	open	high	low	close	volume
0	2017-09-01 00:00:00+00:00	387.98	387.98	387.98	387.98	16.283653
1	2017-09-01 00:01:00+00:00	387.99	388.00	387.98	387.98	6.020751
2	2017-09-01 00:02:00+00:00	387.27	388.00	386.80	388.00	32.201542
3	2017-09-01 00:03:00+00:00	388.00	388.00	388.00	388.00	16.804457
4	2017-09-01 00:04:00+00:00	388.00	388.00	388.00	388.00	0.391802

In [6]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2068722 entries, 0 to 2068721
Data columns (total 6 columns):
 #   Column     Dtype              
---  ------     -----              
 0   timestamp  datetime64[ns, UTC]
 1   open       double[pyarrow]    
 2   high       double[pyarrow]    
 3   low        double[pyarrow]    
 4   close      double[pyarrow]    
 5   volume     double[pyarrow]    
dtypes: datetime64[ns, UTC](1), double[pyarrow](5)
memory usage: 94.7 MB

In [7]:

df.head()

Out[7]:

	timestamp	open	high	low	close	volume
0	2020-01-01 00:00:00+00:00	7160.69	7160.69	7159.64	7159.64	5.501691
1	2020-01-01 00:01:00+00:00	7161.51	7161.51	7155.09	7161.20	3.776924
2	2020-01-01 00:02:00+00:00	7158.82	7158.82	7158.82	7158.82	0.029278
3	2020-01-01 00:03:00+00:00	7158.82	7158.82	7156.90	7156.90	0.065819
4	2020-01-01 00:04:00+00:00	7158.50	7158.50	7154.97	7157.20	0.971387

In [8]:

df.set_index('timestamp', inplace=True)

In [9]:

df.head()

Out[9]:

	open	high	low	close	volume
timestamp
2020-01-01 00:00:00+00:00	7160.69	7160.69	7159.64	7159.64	5.501691
2020-01-01 00:01:00+00:00	7161.51	7161.51	7155.09	7161.20	3.776924
2020-01-01 00:02:00+00:00	7158.82	7158.82	7158.82	7158.82	0.029278
2020-01-01 00:03:00+00:00	7158.82	7158.82	7156.90	7156.90	0.065819
2020-01-01 00:04:00+00:00	7158.50	7158.50	7154.97	7157.20	0.971387

In [11]:

df.index.is_monotonic_increasing

Out[11]:

True
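Parsing `timestamp` as a regular column and then calling `set_index` ends up with the same time-indexed frame as `index_col='timestamp'`. The `is_monotonic_increasing` check is worth doing because date-range slicing with `.loc` expects a sorted index. If the check fails, `sort_index()` fixes it. A minimal sketch with made-up, deliberately unsorted rows:

```python
import io

import pandas as pd

# Hypothetical rows, deliberately out of order
csv_text = """timestamp,close
2020-01-01 00:02:00+00:00,7158.82
2020-01-01 00:00:00+00:00,7159.64
2020-01-01 00:01:00+00:00,7161.20
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])
df = df.set_index("timestamp")  # same effect as set_index('timestamp', inplace=True)

print(df.index.is_monotonic_increasing)  # False: rows are out of order
df = df.sort_index()
print(df.index.is_monotonic_increasing)  # True
```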

In [15]:

df['high'].resample("D").max().plot(figsize=(14, 7))

Out[15]:

<Axes: xlabel='timestamp'>

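`resample("D")` buckets the minute-level rows into calendar days, using the index's UTC timezone. `.max()` then keeps the highest `high` in each day. The toy example below uses two days of made-up data:

```python
import pandas as pd

# Hypothetical highs: three readings on Dec 1 and two on Dec 2 (UTC)
idx = pd.to_datetime([
    "2023-12-01 00:00", "2023-12-01 12:00", "2023-12-01 23:59",
    "2023-12-02 00:00", "2023-12-02 06:00",
]).tz_localize("UTC")
high = pd.Series([10.0, 15.0, 12.0, 20.0, 18.0], index=idx, name="high")

daily_high = high.resample("D").max()  # one row per calendar day
print(daily_high)
```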

Exercises.ipynb



Exercises

This is a quick demonstration of how exercises work on Notebooks.ai.


Example 1: Solve the add function

Complete the code of the function `add`, which receives 2 numbers and should return their sum:

In [ ]:

def add(x, y):
    pass

You can use the + operator to sum numbers.

In [ ]:

def add(x, y):
    return x + y
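This page doesn't show how Notebooks.ai actually grades exercises. As a hypothetical sketch, you can check a completed `add` with plain `assert` statements. The test values below are made up:

```python
def add(x, y):
    return x + y


# Hypothetical checks a grader might run against the solution
assert add(2, 3) == 5
assert add(-1, 1) == 0
assert add(0.5, 0.25) == 0.75
print("All checks passed")
```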


Original.ipynb


Welcome to our Jupyter Tutorial. This is an interactive tutorial focused on getting you familiar with Notebooks and interactive labs.

Part 1: Everything is a cell

Jupyter Notebooks are organized as a set of "cells". Each cell can contain a different type of content: Python code (or R, Julia, etc.), images, or even human-readable text (Markdown), like the one you're currently reading.

I've left a couple of empty cells below so you can see what they look like:

In [ ]:

This is another cell containing Markdown (human-readable text). And below it, another empty cell:

In [ ]:

You can edit these cells just by double-clicking on them. Try editing the following cell:

👉 Double click on me 👈

When you double-click the cell, it should switch to "Edit Mode", and you should see something similar to:

If you're seeing those asterisks, it's because you've correctly entered "Edit Mode". Once you've made your changes, you have to "execute" (or "run") the cell to render them. To do that, click the little play button in the top menu bar:

Jupyter notebooks are optimized for an efficient workflow. There are many keyboard shortcuts that will let you interact with your documents, run code and make other changes; mastering these shortcuts will speed up your work. For example, there are two shortcuts to execute a cell:

shift + return: Run cell and advance to the next one.
ctrl + return: Run the cell but don't change focus.

Try them with the following cell:

In [ ]:

2 + 2

You can execute these cells as many times as you want; it won't break anything.

`ctrl + Return` effect:

As you can see in the following animation, the code is correctly executed (it returns 4) and the focus (the blue line on the left side of the cell) stays on the same cell.

ctrl+enter effect

Now compare it to the next shortcut, shift + return:

`shift + Return` effect:

shift+enter effect

As you can see, every time I execute a cell, the focus moves to the cell below.


Part 2: Working with code

Jupyter notebooks have amazing features for including text and images and creating beautiful, human-readable documents, as you've just seen. But their main benefit is working with code. Now we're going to import a few libraries and start experimenting with Python code. We've already done the simple 2 + 2, so let's try something a little more interesting. First, we need to import numpy and matplotlib:

In [1]:

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Notebooks.ai comes with the most popular Data Science and Deep Learning libraries already installed. And even if one is missing, you can always install it in your own environment (more on that later). We've just imported these two libraries:

numpy: the most popular Python library for array manipulation and numeric computing.
matplotlib: the most popular visualization library in the Python ecosystem.

Let's now execute a few lines of code and generate some plots:

In [2]:

x = np.linspace(0, 10, 500)
y = np.cumsum(np.random.randn(500, 6), 0)

In [3]:

plt.figure(figsize=(12, 7))
plt.plot(x, y)
plt.legend(list('ABCDEF'), ncol=2, loc='upper left')  # one label per line: A..F

Out[3]:

<matplotlib.legend.Legend at 0x12187e690>

But what is that 😱? Just randomly generated data points (six random walks), but you can clearly see how simple it is to do numeric processing and plotting with Notebooks.ai.
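Each column of `y` is a random walk. `np.random.randn(500, 6)` draws 500 standard-normal steps for each of six series, and `np.cumsum(..., 0)` accumulates them down the rows (axis 0). The sketch below is reproducible because it uses a seeded generator; the seed and the `default_rng` API are choices made here, not part of the tutorial:

```python
import numpy as np

rng = np.random.default_rng(42)        # seeded generator (assumption, for reproducibility)
steps = rng.standard_normal((500, 6))  # 500 steps for each of 6 series
walks = np.cumsum(steps, axis=0)       # running total down each column

x = np.linspace(0, 10, 500)  # 500 evenly spaced points, both endpoints included

print(walks.shape)  # (500, 6)
print(x[0], x[-1])  # 0.0 10.0
```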


Part 3: Interacting with data

Jupyter and Python make it really simple to interact with files in your local storage. These files are securely stored in the cloud, and you can access them from anywhere in the world.

As an example, we're going to load two CSV files (that you can preview in this public spreadsheet) into our notebook and play with them.

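The CSVs contain Bitcoin and Ether prices for roughly the first ten days of December 2023 (the Bitcoin file, for example, runs from December 1 to December 10), at a granularity of one minute. This format is usually referred to as OHLC (Open, High, Low, Close); each row here also includes the traded volume.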
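One-minute OHLC rows can be rolled up into coarser candles by resampling. Take the first `open`, the max `high`, the min `low` and the last `close`. Summing `volume` is an assumption about how you'd want volume aggregated. The sketch below uses made-up minute rows:

```python
import pandas as pd

# Hypothetical one-minute candles
idx = pd.date_range("2023-12-01 00:00", periods=4, freq="min", tz="UTC")
minute = pd.DataFrame(
    {
        "open": [100.0, 101.0, 102.0, 101.5],
        "high": [101.5, 102.5, 103.0, 102.0],
        "low": [99.5, 100.5, 101.0, 100.0],
        "close": [101.0, 102.0, 101.5, 100.5],
        "volume": [1.0, 2.0, 0.5, 1.5],
    },
    index=idx,
)

# Aggregate into 2-minute candles
candles = minute.resample("2min").agg(
    {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
)
print(candles)
```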

In [12]:

import pandas as pd

In [13]:

btc = pd.read_csv('btc.csv', index_col='timestamp', parse_dates=True)

In [16]:

eth = pd.read_csv('eth.csv', index_col='timestamp', parse_dates=True)
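`index_col='timestamp'` makes the timestamp column the row index, and `parse_dates=True` asks pandas to parse that index as datetimes. That is what lets `head()`, `tail()` and `.plot()` treat the rows as a time series. The sketch below uses an in-memory CSV holding the first two Bitcoin rows from the `btc.head()` output in this notebook:

```python
import io

import pandas as pd

csv_text = """timestamp,open,high,low,close,volume
2023-12-01 00:00:00+00:00,37731.0,37741.0,37728.0,37741.0,4.349545
2023-12-01 00:01:00+00:00,37739.0,37739.0,37724.0,37724.0,1.000053
"""

btc = pd.read_csv(io.StringIO(csv_text), index_col="timestamp", parse_dates=True)
print(type(btc.index).__name__)  # DatetimeIndex
print(btc)
```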

Bitcoin:

In [17]:

btc.head()

Out[17]:

	open	high	low	close	volume
timestamp
2023-12-01 00:00:00+00:00	37731.0	37741.0	37728.0	37741.0	4.349545
2023-12-01 00:01:00+00:00	37739.0	37739.0	37724.0	37724.0	1.000053
2023-12-01 00:02:00+00:00	37721.0	37721.0	37701.0	37701.0	0.169017
2023-12-01 00:03:00+00:00	37703.0	37704.0	37703.0	37704.0	0.099991
2023-12-01 00:04:00+00:00	37703.0	37703.0	37697.0	37697.0	0.135919

In [18]:

btc.tail()

Out[18]:

	open	high	low	close	volume
timestamp
2023-12-10 10:49:00+00:00	43683.0	43683.0	43683.0	43683.0	0.000000
2023-12-10 10:50:00+00:00	43671.0	43675.0	43671.0	43675.0	0.008716
2023-12-10 10:51:00+00:00	43694.0	43694.0	43694.0	43694.0	0.003141
2023-12-10 10:52:00+00:00	43711.0	43711.0	43709.0	43709.0	0.029046
2023-12-10 10:53:00+00:00	43709.0	43709.0	43709.0	43709.0	0.000000

In [19]:

btc.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 13698 entries, 2023-12-01 00:00:00+00:00 to 2023-12-10 10:53:00+00:00
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   open    13698 non-null  float64
 1   high    13698 non-null  float64
 2   low     13698 non-null  float64
 3   close   13698 non-null  float64
 4   volume  13698 non-null  float64
dtypes: float64(5)
memory usage: 642.1 KB

In [20]:

btc['close'].plot(figsize=(15, 7))

Out[20]:

<Axes: xlabel='timestamp'>

Ether:

In [21]:

eth.head()

Out[21]:

	open	high	low	close	volume
timestamp
2023-12-01 00:00:00+00:00	2053.1	2054.3	2053.1	2053.8	98.816206
2023-12-01 00:01:00+00:00	2054.0	2054.0	2052.9	2052.9	101.136155
2023-12-01 00:02:00+00:00	2052.6	2052.6	2051.2	2051.2	284.303968
2023-12-01 00:03:00+00:00	2051.5	2052.3	2051.3	2052.3	67.405196
2023-12-01 00:04:00+00:00	2052.2	2052.6	2052.1	2052.1	96.918102

In [22]:

eth['close'].plot(figsize=(15, 7))

Out[22]:

<Axes: xlabel='timestamp'>

As you can see, with just a few lines of code we're able to load data from files, create a DataFrame and plot it, all within Jupyter Lab.

In [23]:

eth.head()

Out[23]:

	open	high	low	close	volume
timestamp
2023-12-01 00:00:00+00:00	2053.1	2054.3	2053.1	2053.8	98.816206
2023-12-01 00:01:00+00:00	2054.0	2054.0	2052.9	2052.9	101.136155
2023-12-01 00:02:00+00:00	2052.6	2052.6	2051.2	2051.2	284.303968
2023-12-01 00:03:00+00:00	2051.5	2052.3	2051.3	2052.3	67.405196
2023-12-01 00:04:00+00:00	2052.2	2052.6	2052.1	2052.1	96.918102

Statement of Completion#6ed2b9b2
