Statement of Completion#07bbfb13
Intro to Pandas for Data Analysis
medium
Exploring DataFrames with Currency Data
Project.ipynb
In [1]:
import pandas as pd
df = pd.read_csv("currencies.csv")
df
Out[1]:
| | Name | Symbol | Code | Countries | Digits | Number |
---|---|---|---|---|---|---|
0 | UK Pound | £ | GBP | Guernsey,Isle Of Man,Jersey,United Kingdom Of ... | 2.0 | 826.0 |
1 | Czech Koruna | Kč | CZK | Czechia | 2.0 | 203.0 |
2 | Latvian Lat | Ls | LVL | NaN | NaN | NaN |
3 | Swiss Franc | CHF | CHF | Liechtenstein,Switzerland | 2.0 | 756.0 |
4 | Croatian Kruna | kn | HRK | Croatia | 2.0 | 191.0 |
5 | Danish Krone | kr | DKK | Denmark,Faroe Islands (The),Greenland | 2.0 | 208.0 |
6 | Korean Won | ₩ | KRW | Korea (The Republic Of) | 0.0 | 410.0 |
7 | Swedish Krona | kr | SEK | Sweden | 2.0 | 752.0 |
8 | Turkish Lira | ₤ | TRY | Turkey | 2.0 | 949.0 |
9 | Hungarian Forint | Ft | HUF | Hungary | 2.0 | 348.0 |
10 | Brazilian Real | R$ | BRL | Brazil | 2.0 | 986.0 |
11 | Lithuanian Litas | Lt | LTL | NaN | NaN | NaN |
12 | Bulgarian Lev | лB | BGN | Bulgaria | 2.0 | 975.0 |
13 | Polish Zloty | zł | PLN | Poland | 2.0 | 985.0 |
14 | US Dollar | $ | USD | American Samoa,Bonaire, Sint Eustatius And Sab... | 2.0 | 840.0 |
15 | Russian Ruble | руб | RUB | Russian Federation (The) | 2.0 | 643.0 |
16 | Japanese Yen | ¥ | JPY | Japan | 0.0 | 392.0 |
17 | Romanian New Leu | lei | RON | Romania | 2.0 | 946.0 |
18 | Norwegian Krone | kr | NOK | Bouvet Island,Norway,Svalbard And Jan Mayen | 2.0 | 578.0 |
19 | Australian Dollar | $ | AUD | Australia,Christmas Island,Cocos (Keeling) Isl... | 2.0 | 36.0 |
20 | Israeli Shekel | ₪ | ILS | Israel | 2.0 | 376.0 |
21 | New Zealand Dollar | $ | NZD | Cook Islands (The),New Zealand,Niue,Pitcairn,T... | 2.0 | 554.0 |
22 | Indonesian Rupiah | Rp | IDR | Indonesia | 2.0 | 360.0 |
23 | Philippine Peso | ₱ | PHP | Philippines (The) | 2.0 | 608.0 |
24 | Euro | € | EUR | Andorra,Austria,Belgium,Cyprus,Estonia,Europea... | 2.0 | 978.0 |
25 | Canadian Dollar | $ | CAD | Canada | 2.0 | 124.0 |
26 | Chinese Yuan | ¥ | CNY | China | 2.0 | 156.0 |
27 | Hong Kong Dollar | $ | HKD | Hong Kong | 2.0 | 344.0 |
28 | Indian Rupee | ₨ | INR | Bhutan,India | 2.0 | 356.0 |
29 | Mexican Peso | $ | MXN | Mexico | 2.0 | 484.0 |
30 | Malaysian Ringgit | RM | MYR | Malaysia | 2.0 | 458.0 |
31 | South African Rand | R | ZAR | Lesotho,Namibia,South Africa | 2.0 | 710.0 |
32 | Thai Baht | ฿ | THB | Thailand | 2.0 | 764.0 |
33 | Singaporean Dollar | $ | SGD | Singapore | 2.0 | 702.0 |
Activities
1. Getting an Overview of the dataset
In [2]:
# try your code here
df.dtypes
Out[2]:
Name          object
Symbol        object
Code          object
Countries     object
Digits       float64
Number       float64
dtype: object
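As a rough illustration of what an overview can include beyond `dtypes`, the sketch below builds a small hypothetical `sample` frame from four rows shown in Out[1] (it stands in for `currencies.csv`, which is not reproduced here) and reads its shape, first rows, and column types.

```python
import pandas as pd

# Hypothetical sample: four rows copied from Out[1], standing in for currencies.csv.
sample = pd.DataFrame({
    "Name": ["UK Pound", "Czech Koruna", "Latvian Lat", "Korean Won"],
    "Code": ["GBP", "CZK", "LVL", "KRW"],
    "Digits": [2.0, 2.0, None, 0.0],
    "Number": [826.0, 203.0, None, 410.0],
})

n_rows, n_cols = sample.shape   # (number of rows, number of columns)
first_rows = sample.head(2)     # first two rows only
column_types = sample.dtypes    # one dtype per column

print(n_rows, n_cols)
print(first_rows)
print(column_types)
```

The missing values in `Digits` and `Number` are why both columns are stored as floats rather than integers.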
2. Select the correct answers for dataset information
In [3]:
# try your code here
df['Digits'].mean(), df['Digits'].std()
Out[3]:
(1.875, 0.4918693768379647)
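The two figures above can be checked by hand. The sketch below rebuilds the `Digits` column from what Out[1] shows (30 values of 2.0, two of 0.0 for KRW and JPY, and two missing for LVL and LTL); the variable name `digits` is only for illustration. It relies on pandas skipping NaN by default and on `std()` using the sample formula (`ddof=1`).

```python
import math

import pandas as pd

# Digits column as shown in Out[1]: 30 x 2.0, 2 x 0.0, 2 x NaN.
digits = pd.Series([2.0] * 30 + [0.0] * 2 + [float("nan")] * 2)

valid = digits.count()  # non-missing values only
mean = digits.mean()    # NaN skipped: 60 / 32
std = digits.std()      # sample standard deviation: sqrt(7.5 / 31)

print(valid, mean, std)
```

The sum of squared deviations is 30 × 0.125² + 2 × 1.875² = 7.5, and dividing by 31 (not 32) before taking the square root gives the 0.4919 reported in Out[3].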
3. Get the statistics of the dataset
In [4]:
# try your code here
Out[4]:
34
4. Count the number of unique currencies in the dataset
In [ ]:
# try your code here
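One possible approach to this activity, sketched on a hypothetical four-row `sample` taken from Out[1], is `Series.nunique()`. Which column the exercise expects to be counted is an assumption here; the example shows that the choice matters, because several currencies share a symbol.

```python
import pandas as pd

# Hypothetical sample: four rows copied from Out[1].
sample = pd.DataFrame({
    "Name": ["Danish Krone", "Swedish Krona", "Norwegian Krone", "US Dollar"],
    "Symbol": ["kr", "kr", "kr", "$"],
    "Code": ["DKK", "SEK", "NOK", "USD"],
})

unique_codes = sample["Code"].nunique()      # distinct currency codes
unique_symbols = sample["Symbol"].nunique()  # distinct symbols (fewer)

print(unique_codes, unique_symbols)
```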
5. Identify the number of missing values in each column
In [6]:
# try your code here
df.isna().sum()
Out[6]:
Name         0
Symbol       0
Code         0
Countries    2
Digits       2
Number       2
dtype: int64
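The activity asks for a count per column, which is different from a single grand total. A minimal sketch of the distinction, using a hypothetical three-row `sample` copied from Out[1]:

```python
import pandas as pd

# Hypothetical sample: three rows copied from Out[1].
sample = pd.DataFrame({
    "Name": ["Czech Koruna", "Latvian Lat", "Lithuanian Litas"],
    "Countries": ["Czechia", None, None],
    "Digits": [2.0, None, None],
})

per_column = sample.isna().sum()  # Series: one missing-value count per column
total = int(per_column.sum())     # single number across the whole frame

print(per_column)
print(total)
```

Wrapping the per-column Series in Python's built-in `sum()` collapses it to the total, so the column breakdown is lost.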
6. Determine the highest currency number in the dataset
In [7]:
# try your code here
df['Number'].max()
Out[7]:
986.0
7. Select Currency Names
In [8]:
# try your code here
names = df['Name']
8. Get the details of the 3rd row
In [10]:
df.head()
Out[10]:
| | Name | Symbol | Code | Countries | Digits | Number |
---|---|---|---|---|---|---|
0 | UK Pound | £ | GBP | Guernsey,Isle Of Man,Jersey,United Kingdom Of ... | 2.0 | 826.0 |
1 | Czech Koruna | Kč | CZK | Czechia | 2.0 | 203.0 |
2 | Latvian Lat | Ls | LVL | NaN | NaN | NaN |
3 | Swiss Franc | CHF | CHF | Liechtenstein,Switzerland | 2.0 | 756.0 |
4 | Croatian Kruna | kn | HRK | Croatia | 2.0 | 191.0 |
In [11]:
# try your code here
row_3 = df.iloc[2]
row_3
Out[11]:
Name         Latvian Lat
Symbol                Ls
Code                 LVL
Countries            NaN
Digits               NaN
Number               NaN
Name: 2, dtype: object
9. Select the 10th to 15th rows (inclusive) from the DataFrame
In [13]:
# try your code here
rows = df.iloc[9:15]
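The 10th to 15th rows sit at positions 9 to 14. A short sketch on a hypothetical frame with the default integer index shows how the two indexers treat the end of a slice: `iloc` excludes the stop position, while `loc` includes the stop label.

```python
import pandas as pd

# Hypothetical frame with the default RangeIndex 0..19.
sample = pd.DataFrame({"value": range(100, 120)})

by_position = sample.iloc[9:15]  # positions 9..14, stop excluded
by_label = sample.loc[9:14]      # labels 9..14, stop included

print(len(by_position), len(by_label))
print(by_position.equals(by_label))
```

Both selections hold the same six rows here only because the index labels coincide with the positions; on a frame with a different index they would diverge.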
10. Extract Alternating Rows from DataFrame
In [15]:
# try your code here
rows_every_other = df.iloc[0:len(df):2]
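The slice follows Python's `start:stop:step` convention, so the start and stop can be left out when they cover the whole frame. A minimal sketch on a hypothetical five-row `sample` taken from Out[1]:

```python
import pandas as pd

# Hypothetical sample: the first five codes from Out[1].
sample = pd.DataFrame({"Code": ["GBP", "CZK", "LVL", "CHF", "HRK"]})

every_other = sample.iloc[::2]             # rows at positions 0, 2, 4
explicit = sample.iloc[0:len(sample):2]    # same selection, spelled out

print(list(every_other["Code"]))
print(every_other.equals(explicit))
```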
11. Select specific columns
In [19]:
df.head(7)
Out[19]:
| | Name | Symbol | Code | Countries | Digits | Number |
---|---|---|---|---|---|---|
0 | UK Pound | £ | GBP | Guernsey,Isle Of Man,Jersey,United Kingdom Of ... | 2.0 | 826.0 |
1 | Czech Koruna | Kč | CZK | Czechia | 2.0 | 203.0 |
2 | Latvian Lat | Ls | LVL | NaN | NaN | NaN |
3 | Swiss Franc | CHF | CHF | Liechtenstein,Switzerland | 2.0 | 756.0 |
4 | Croatian Kruna | kn | HRK | Croatia | 2.0 | 191.0 |
5 | Danish Krone | kr | DKK | Denmark,Faroe Islands (The),Greenland | 2.0 | 208.0 |
6 | Korean Won | ₩ | KRW | Korea (The Republic Of) | 0.0 | 410.0 |
In [27]:
# try your code here
cols = df.iloc[:, [2,4,5]]
cols
Out[27]:
| | Code | Digits | Number |
---|---|---|---|
0 | GBP | 2.0 | 826.0 |
1 | CZK | 2.0 | 203.0 |
2 | LVL | NaN | NaN |
3 | CHF | 2.0 | 756.0 |
4 | HRK | 2.0 | 191.0 |
5 | DKK | 2.0 | 208.0 |
6 | KRW | 0.0 | 410.0 |
7 | SEK | 2.0 | 752.0 |
8 | TRY | 2.0 | 949.0 |
9 | HUF | 2.0 | 348.0 |
10 | BRL | 2.0 | 986.0 |
11 | LTL | NaN | NaN |
12 | BGN | 2.0 | 975.0 |
13 | PLN | 2.0 | 985.0 |
14 | USD | 2.0 | 840.0 |
15 | RUB | 2.0 | 643.0 |
16 | JPY | 0.0 | 392.0 |
17 | RON | 2.0 | 946.0 |
18 | NOK | 2.0 | 578.0 |
19 | AUD | 2.0 | 36.0 |
20 | ILS | 2.0 | 376.0 |
21 | NZD | 2.0 | 554.0 |
22 | IDR | 2.0 | 360.0 |
23 | PHP | 2.0 | 608.0 |
24 | EUR | 2.0 | 978.0 |
25 | CAD | 2.0 | 124.0 |
26 | CNY | 2.0 | 156.0 |
27 | HKD | 2.0 | 344.0 |
28 | INR | 2.0 | 356.0 |
29 | MXN | 2.0 | 484.0 |
30 | MYR | 2.0 | 458.0 |
31 | ZAR | 2.0 | 710.0 |
32 | THB | 2.0 | 764.0 |
33 | SGD | 2.0 | 702.0 |
12. Select first three columns
In [26]:
# try your code here
cols_first_three = df.iloc[:, :3]
cols_first_three
Out[26]:
| | Name | Symbol | Code |
---|---|---|---|
0 | UK Pound | £ | GBP |
1 | Czech Koruna | Kč | CZK |
2 | Latvian Lat | Ls | LVL |
3 | Swiss Franc | CHF | CHF |
4 | Croatian Kruna | kn | HRK |
5 | Danish Krone | kr | DKK |
6 | Korean Won | ₩ | KRW |
7 | Swedish Krona | kr | SEK |
8 | Turkish Lira | ₤ | TRY |
9 | Hungarian Forint | Ft | HUF |
10 | Brazilian Real | R$ | BRL |
11 | Lithuanian Litas | Lt | LTL |
12 | Bulgarian Lev | лB | BGN |
13 | Polish Zloty | zł | PLN |
14 | US Dollar | $ | USD |
15 | Russian Ruble | руб | RUB |
16 | Japanese Yen | ¥ | JPY |
17 | Romanian New Leu | lei | RON |
18 | Norwegian Krone | kr | NOK |
19 | Australian Dollar | $ | AUD |
20 | Israeli Shekel | ₪ | ILS |
21 | New Zealand Dollar | $ | NZD |
22 | Indonesian Rupiah | Rp | IDR |
23 | Philippine Peso | ₱ | PHP |
24 | Euro | € | EUR |
25 | Canadian Dollar | $ | CAD |
26 | Chinese Yuan | ¥ | CNY |
27 | Hong Kong Dollar | $ | HKD |
28 | Indian Rupee | ₨ | INR |
29 | Mexican Peso | $ | MXN |
30 | Malaysian Ringgit | RM | MYR |
31 | South African Rand | R | ZAR |
32 | Thai Baht | ฿ | THB |
33 | Singaporean Dollar | $ | SGD |
The End!
Strip Solutions.ipynb
In [1]:
import re
import json
from pathlib import Path
In [2]:
pattern = re.compile(r"(#*)\s*")
In [3]:
def is_control_cell(source):
    control_lines = ["solution", "assertion"]
    for control_line in control_lines:
        if source.lower().startswith(control_line):
            return True
    return False
Configurations:
In [4]:
CHECK_OVERWRITE = True
SOLUTION_NOTEBOOK_NAME = 'Solution.ipynb'
NEW_NOTEBOOK_NAME = 'Project.ipynb'
In [5]:
assert Path(SOLUTION_NOTEBOOK_NAME).exists(), f"The solution notebook '{SOLUTION_NOTEBOOK_NAME}' doesn't exist"
with open(SOLUTION_NOTEBOOK_NAME) as fp:
    notebook = json.load(fp)
cells = notebook['cells']
new_cells = []
control_cut = False
for cell in cells:
    cell_type = cell['cell_type']
    if control_cut:
        if cell_type == 'markdown' and cell['source'] and pattern.match(cell['source'][0]):
            control_cut = False
        else:
            continue
    if cell_type == 'markdown' and cell['source']:
        if is_control_cell(cell['source'][0]):
            control_cut = True
            continue
    new_cells.append(cell)
notebook['cells'] = new_cells
if Path(NEW_NOTEBOOK_NAME).exists():
    answer = input(f"You're about to overwrite {NEW_NOTEBOOK_NAME}. Are you sure? (y/N)") or 'n'
    if answer.lower() != 'y':
        assert False, "Cancelled."
with open(NEW_NOTEBOOK_NAME, 'w') as fp:
    json.dump(notebook, fp, indent=2)
print(f"\nNew notebook saved in: {NEW_NOTEBOOK_NAME}")
New notebook saved in: Project.ipynb
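The filtering idea in this notebook can be tried without touching any files. The sketch below is a simplified variant, not the script itself: the function name `strip_solutions`, the `HEADING` pattern, and the `###` heading markers in the sample cells are assumptions made for illustration, and the heading pattern is deliberately stricter than the one compiled above.

```python
import re

# Assumption: a section heading is a markdown line starting with one or more '#'.
# This is stricter than the pattern used in the script above.
HEADING = re.compile(r"#+\s+")
CONTROL_PREFIXES = ("solution", "assertion")


def strip_solutions(cells):
    """Return the cells with the Solution/Assertions sections left out."""
    kept = []
    cutting = False
    for cell in cells:
        source = cell["source"]
        first_line = source[0] if source else ""
        is_markdown = cell["cell_type"] == "markdown" and bool(source)
        if cutting:
            if is_markdown and HEADING.match(first_line):
                cutting = False  # the next heading ends the cut
            else:
                continue  # still inside a solution or assertion block
        if is_markdown and first_line.lower().startswith(CONTROL_PREFIXES):
            cutting = True
            continue
        kept.append(cell)
    return kept


# Hypothetical notebook fragment mirroring the layout of Solution.ipynb.
cells = [
    {"cell_type": "markdown", "source": ["### 1. Getting an Overview of the dataset"]},
    {"cell_type": "code", "source": ["# try your code here"]},
    {"cell_type": "markdown", "source": ["Solution:"]},
    {"cell_type": "code", "source": ["df.head()"]},
    {"cell_type": "markdown", "source": ["Assertions:"]},
    {"cell_type": "code", "source": ["# Multiple Choice Question"]},
    {"cell_type": "markdown", "source": ["### 2. Select the correct answers"]},
]

kept = strip_solutions(cells)
print([cell["source"][0] for cell in kept])
```

Only the two headings and the empty exercise cell survive, which matches the shape of the activities in Project.ipynb.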
In [ ]:
Solution.ipynb
In [1]:
import pandas as pd
df = pd.read_csv("currencies.csv")
df
Out[1]:
| | Name | Symbol | Code | Countries | Digits | Number |
---|---|---|---|---|---|---|
0 | UK Pound | £ | GBP | Guernsey,Isle Of Man,Jersey,United Kingdom Of ... | 2.0 | 826.0 |
1 | Czech Koruna | Kč | CZK | Czechia | 2.0 | 203.0 |
2 | Latvian Lat | Ls | LVL | NaN | NaN | NaN |
3 | Swiss Franc | CHF | CHF | Liechtenstein,Switzerland | 2.0 | 756.0 |
4 | Croatian Kruna | kn | HRK | Croatia | 2.0 | 191.0 |
5 | Danish Krone | kr | DKK | Denmark,Faroe Islands (The),Greenland | 2.0 | 208.0 |
6 | Korean Won | ₩ | KRW | Korea (The Republic Of) | 0.0 | 410.0 |
7 | Swedish Krona | kr | SEK | Sweden | 2.0 | 752.0 |
8 | Turkish Lira | ₤ | TRY | Turkey | 2.0 | 949.0 |
9 | Hungarian Forint | Ft | HUF | Hungary | 2.0 | 348.0 |
10 | Brazilian Real | R$ | BRL | Brazil | 2.0 | 986.0 |
11 | Lithuanian Litas | Lt | LTL | NaN | NaN | NaN |
12 | Bulgarian Lev | лB | BGN | Bulgaria | 2.0 | 975.0 |
13 | Polish Zloty | zł | PLN | Poland | 2.0 | 985.0 |
14 | US Dollar | $ | USD | American Samoa,Bonaire, Sint Eustatius And Sab... | 2.0 | 840.0 |
15 | Russian Ruble | руб | RUB | Russian Federation (The) | 2.0 | 643.0 |
16 | Japanese Yen | ¥ | JPY | Japan | 0.0 | 392.0 |
17 | Romanian New Leu | lei | RON | Romania | 2.0 | 946.0 |
18 | Norwegian Krone | kr | NOK | Bouvet Island,Norway,Svalbard And Jan Mayen | 2.0 | 578.0 |
19 | Australian Dollar | $ | AUD | Australia,Christmas Island,Cocos (Keeling) Isl... | 2.0 | 36.0 |
20 | Israeli Shekel | ₪ | ILS | Israel | 2.0 | 376.0 |
21 | New Zealand Dollar | $ | NZD | Cook Islands (The),New Zealand,Niue,Pitcairn,T... | 2.0 | 554.0 |
22 | Indonesian Rupiah | Rp | IDR | Indonesia | 2.0 | 360.0 |
23 | Philippine Peso | ₱ | PHP | Philippines (The) | 2.0 | 608.0 |
24 | Euro | € | EUR | Andorra,Austria,Belgium,Cyprus,Estonia,Europea... | 2.0 | 978.0 |
25 | Canadian Dollar | $ | CAD | Canada | 2.0 | 124.0 |
26 | Chinese Yuan | ¥ | CNY | China | 2.0 | 156.0 |
27 | Hong Kong Dollar | $ | HKD | Hong Kong | 2.0 | 344.0 |
28 | Indian Rupee | ₨ | INR | Bhutan,India | 2.0 | 356.0 |
29 | Mexican Peso | $ | MXN | Mexico | 2.0 | 484.0 |
30 | Malaysian Ringgit | RM | MYR | Malaysia | 2.0 | 458.0 |
31 | South African Rand | R | ZAR | Lesotho,Namibia,South Africa | 2.0 | 710.0 |
32 | Thai Baht | ฿ | THB | Thailand | 2.0 | 764.0 |
33 | Singaporean Dollar | $ | SGD | Singapore | 2.0 | 702.0 |
Activities
1. Getting an Overview of the dataset
In [2]:
# try your code here
Solution:
In [3]:
df.head()
Out[3]:
| | Name | Symbol | Code | Countries | Digits | Number |
---|---|---|---|---|---|---|
0 | UK Pound | £ | GBP | Guernsey,Isle Of Man,Jersey,United Kingdom Of ... | 2.0 | 826.0 |
1 | Czech Koruna | Kč | CZK | Czechia | 2.0 | 203.0 |
2 | Latvian Lat | Ls | LVL | NaN | NaN | NaN |
3 | Swiss Franc | CHF | CHF | Liechtenstein,Switzerland | 2.0 | 756.0 |
4 | Croatian Kruna | kn | HRK | Croatia | 2.0 | 191.0 |
Assertions:
In [4]:
# Multiple Choice Question
2. Select the correct answers for dataset information
In [5]:
# try your code here
Solution:
In [16]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 34 entries, 0 to 33
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Name       34 non-null     object
 1   Symbol     34 non-null     object
 2   Code       34 non-null     object
 3   Countries  32 non-null     object
 4   Digits     32 non-null     float64
 5   Number     32 non-null     float64
dtypes: float64(2), object(4)
memory usage: 1.7+ KB
Assertions:
In [ ]:
3. Get the statistics of the dataset
In [3]:
# try your code here
Solution:
In [17]:
df.describe()
Out[17]:
| | Digits | Number |
---|---|---|
count | 32.000000 | 32.000000 |
mean | 1.875000 | 562.437500 |
std | 0.491869 | 290.540313 |
min | 0.000000 | 36.000000 |
25% | 2.000000 | 354.000000 |
50% | 2.000000 | 566.000000 |
75% | 2.000000 | 779.500000 |
max | 2.000000 | 986.000000 |
Assertions:
In [ ]:
4. Count the number of unique currencies in the dataset
In [4]:
# try your code here
Solution:
In [ ]:
Assertions:
In [ ]:
5. Identify the number of missing values in each column
In [5]:
# try your code here
Solution:
In [6]:
df.isnull().sum()
Out[6]:
Name         0
Symbol       0
Code         0
Countries    2
Digits       2
Number       2
dtype: int64
Assertions:
In [ ]:
6. Determine the highest currency number in the dataset
In [6]:
# try your code here
Solution:
In [ ]:
Assertions:
In [ ]:
7. Select Currency Names
In [7]:
# try your code here
names = ...
Solution:
In [ ]:
Assertions:
In [ ]:
8. Get the details of the 3rd row
In [8]:
# try your code here
row_3 = ...
Solution:
In [ ]:
Assertions:
In [ ]:
9. Select the 10th to 15th rows (inclusive) from the DataFrame
In [9]:
# try your code here
rows = ...
Solution:
In [7]:
df
Out[7]:
| | Name | Symbol | Code | Countries | Digits | Number |
---|---|---|---|---|---|---|
0 | UK Pound | £ | GBP | Guernsey,Isle Of Man,Jersey,United Kingdom Of ... | 2.0 | 826.0 |
1 | Czech Koruna | Kč | CZK | Czechia | 2.0 | 203.0 |
2 | Latvian Lat | Ls | LVL | NaN | NaN | NaN |
3 | Swiss Franc | CHF | CHF | Liechtenstein,Switzerland | 2.0 | 756.0 |
4 | Croatian Kruna | kn | HRK | Croatia | 2.0 | 191.0 |
5 | Danish Krone | kr | DKK | Denmark,Faroe Islands (The),Greenland | 2.0 | 208.0 |
6 | Korean Won | ₩ | KRW | Korea (The Republic Of) | 0.0 | 410.0 |
7 | Swedish Krona | kr | SEK | Sweden | 2.0 | 752.0 |
8 | Turkish Lira | ₤ | TRY | Turkey | 2.0 | 949.0 |
9 | Hungarian Forint | Ft | HUF | Hungary | 2.0 | 348.0 |
10 | Brazilian Real | R$ | BRL | Brazil | 2.0 | 986.0 |
11 | Lithuanian Litas | Lt | LTL | NaN | NaN | NaN |
12 | Bulgarian Lev | лB | BGN | Bulgaria | 2.0 | 975.0 |
13 | Polish Zloty | zł | PLN | Poland | 2.0 | 985.0 |
14 | US Dollar | $ | USD | American Samoa,Bonaire, Sint Eustatius And Sab... | 2.0 | 840.0 |
15 | Russian Ruble | руб | RUB | Russian Federation (The) | 2.0 | 643.0 |
16 | Japanese Yen | ¥ | JPY | Japan | 0.0 | 392.0 |
17 | Romanian New Leu | lei | RON | Romania | 2.0 | 946.0 |
18 | Norwegian Krone | kr | NOK | Bouvet Island,Norway,Svalbard And Jan Mayen | 2.0 | 578.0 |
19 | Australian Dollar | $ | AUD | Australia,Christmas Island,Cocos (Keeling) Isl... | 2.0 | 36.0 |
20 | Israeli Shekel | ₪ | ILS | Israel | 2.0 | 376.0 |
21 | New Zealand Dollar | $ | NZD | Cook Islands (The),New Zealand,Niue,Pitcairn,T... | 2.0 | 554.0 |
22 | Indonesian Rupiah | Rp | IDR | Indonesia | 2.0 | 360.0 |
23 | Philippine Peso | ₱ | PHP | Philippines (The) | 2.0 | 608.0 |
24 | Euro | € | EUR | Andorra,Austria,Belgium,Cyprus,Estonia,Europea... | 2.0 | 978.0 |
25 | Canadian Dollar | $ | CAD | Canada | 2.0 | 124.0 |
26 | Chinese Yuan | ¥ | CNY | China | 2.0 | 156.0 |
27 | Hong Kong Dollar | $ | HKD | Hong Kong | 2.0 | 344.0 |
28 | Indian Rupee | ₨ | INR | Bhutan,India | 2.0 | 356.0 |
29 | Mexican Peso | $ | MXN | Mexico | 2.0 | 484.0 |
30 | Malaysian Ringgit | RM | MYR | Malaysia | 2.0 | 458.0 |
31 | South African Rand | R | ZAR | Lesotho,Namibia,South Africa | 2.0 | 710.0 |
32 | Thai Baht | ฿ | THB | Thailand | 2.0 | 764.0 |
33 | Singaporean Dollar | $ | SGD | Singapore | 2.0 | 702.0 |
In [10]:
rows = df.loc[9:14]
In [11]:
rows
Out[11]:
| | Name | Symbol | Code | Countries | Digits | Number |
---|---|---|---|---|---|---|
9 | Hungarian Forint | Ft | HUF | Hungary | 2.0 | 348.0 |
10 | Brazilian Real | R$ | BRL | Brazil | 2.0 | 986.0 |
11 | Lithuanian Litas | Lt | LTL | NaN | NaN | NaN |
12 | Bulgarian Lev | лB | BGN | Bulgaria | 2.0 | 975.0 |
13 | Polish Zloty | zł | PLN | Poland | 2.0 | 985.0 |
14 | US Dollar | $ | USD | American Samoa,Bonaire, Sint Eustatius And Sab... | 2.0 | 840.0 |
Assertions:
In [12]:
assert 'rows' in globals(), "Variable 'rows' does not exist."
assert isinstance(rows, pd.DataFrame), "Variable 'rows' is not a DataFrame."
expected_rows = df.loc[9:14]
try:
    pd.testing.assert_frame_equal(rows, expected_rows)
except AssertionError:
    raise AssertionError("Variable 'rows' does not contain the correct values.")
finally:
    del expected_rows
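The check above depends on `pd.testing.assert_frame_equal`, which compares the index as well as the values. The sketch below uses a hypothetical three-row `sample` (codes and numbers copied from rows 9 to 11 of the output) to show that a selection with the right values but a reset index would be rejected.

```python
import pandas as pd

# Hypothetical sample: rows 9..11 of the dataset, keeping their original labels.
sample = pd.DataFrame(
    {"Code": ["HUF", "BRL", "LTL"], "Number": [348.0, 986.0, None]},
    index=[9, 10, 11],
)

# An identical copy passes silently; NaN in the same position counts as equal.
pd.testing.assert_frame_equal(sample, sample.copy())

# Same values, but labels 0..2 instead of 9..11.
reindexed = sample.reset_index(drop=True)
try:
    pd.testing.assert_frame_equal(sample, reindexed)
    index_matters = False
except AssertionError:
    index_matters = True

print(index_matters)
```

This is why a plain positional or label slice of `df` satisfies the assertion, while a result rebuilt with a fresh index would not.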
10. Extract Alternating Rows from DataFrame
In [10]:
# try your code here
rows_every_other = ...
Solution:
In [ ]:
Assertions:
In [ ]:
11. Select specific columns
In [11]:
# try your code here
cols = ...
Solution:
In [ ]:
Assertions:
In [ ]:
12. Select first three columns
In [12]:
# try your code here
cols_first_three = ...
Solution:
In [18]:
cols_first_three = df.iloc[:, :3]
Assertions:
In [19]:
assert 'cols_first_three' in globals(), "Variable 'cols_first_three' does not exist."
assert isinstance(cols_first_three, pd.DataFrame), "Variable 'cols_first_three' is not a DataFrame."
expected_cols_first_three = df.iloc[:, :3]
try:
    pd.testing.assert_frame_equal(cols_first_three, expected_cols_first_three)
except AssertionError:
    raise AssertionError("Variable 'cols_first_three' does not contain the correct values.")
finally:
    del expected_cols_first_three
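The positional slice in the solution is one of several equivalent spellings. A minimal sketch on a hypothetical two-row `sample` from the dataset compares it with selection by column name; note that a label slice with `loc` includes its end column.

```python
import pandas as pd

# Hypothetical sample: two rows copied from the dataset, four of its columns.
sample = pd.DataFrame({
    "Name": ["Czech Koruna", "Korean Won"],
    "Symbol": ["Kč", "₩"],
    "Code": ["CZK", "KRW"],
    "Digits": [2.0, 0.0],
})

by_position = sample.iloc[:, :3]                      # columns at positions 0, 1, 2
by_names = sample.loc[:, ["Name", "Symbol", "Code"]]  # explicit list of labels
by_label_slice = sample.loc[:, "Name":"Code"]         # label slice, end included

print(list(by_position.columns))
print(by_position.equals(by_names), by_position.equals(by_label_slice))
```

A list with only two names would return two columns and fail the assertion above, since the expected frame has three.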