Take a look at the raw data

In [1]:

!head world_data.csv

Country Name,Region Code,Country Code,"GDP, PPP (current international $)"," Population, total ",Population CGR 1960-2015,Internet users (per 100 people),Popltn Largest City % of Urban Pop,"2014 Life expectancy at birth, total (years)","Literacy rate, adult female (% of females ages 15 and above)",Exports of goods and services (% of GDP)
Aruba,MA,ABW,," 103,889 ",1.19%,88.7,,75.5,97.5139617919922,
Andorra,EU,AND,," 70,473 ",3.06%,96.9,,,,
Afghanistan,ME,AFG," 62,912,669,167 "," 32,526,562 ",2.36%,8.3,53.4%,60.4,23.8738498687744,0.073278411818003
Angola,AF,AGO," 184,437,662,368 "," 25,021,974 ",2.87%,12.4,50.0%,52.3,60.744800567627,0.373074223085945
Albania,EU,ALB," 32,663,238,936 "," 2,889,167 ",1.07%,63.3,27.3%,77.8,96.7696914672852,0.271049844901716
Arab World,,ARB," 6,435,291,560,152 "," 392,022,276 ",2.66%,39.5,29.8%,70.6,,
United Arab Emirates,ME,ARE," 643,166,288,737 "," 9,156,963 ",8.71%,91.2,30.8%,77.4,95.0763397216797,
Argentina,SA,ARG," 882,358,844,160 "," 43,416,755 ",1.36%,69.4,38.1%,76.2,98.1347808837891,0.110578189784346
Armenia,RU,ARM," 25,329,201,238 "," 3,017,712 ",0.88%,58.2,55.2%,74.7,99.73046875,0.297333847463774

In [2]:

import pandas as pd
df = pd.read_csv('world_data.csv')
df

Out[2]:

	Country Name	Region Code	Country Code	GDP, PPP (current international $)	Population, total	Population CGR 1960-2015	Internet users (per 100 people)	Popltn Largest City % of Urban Pop	2014 Life expectancy at birth, total (years)	Literacy rate, adult female (% of females ages 15 and above)	Exports of goods and services (% of GDP)
0	Aruba	MA	ABW	NaN	103,889	1.19%	88.7	NaN	75.5	97.513962	NaN
1	Andorra	EU	AND	NaN	70,473	3.06%	96.9	NaN	NaN	NaN	NaN
2	Afghanistan	ME	AFG	62,912,669,167	32,526,562	2.36%	8.3	53.4%	60.4	23.873850	0.073278
3	Angola	AF	AGO	184,437,662,368	25,021,974	2.87%	12.4	50.0%	52.3	60.744801	0.373074
4	Albania	EU	ALB	32,663,238,936	2,889,167	1.07%	63.3	27.3%	77.8	96.769691	0.271050
...	...	...	...	...	...	...	...	...	...	...	...
259	Yemen, Rep.	ME	YEM	NaN	26,832,215	3.04%	25.1	31.9%	63.8	54.850632	NaN
260	South Africa	AF	ZAF	723,515,991,686	54,956,920	2.11%	51.9	26.4%	57.2	93.428932	0.308972
261	Congo, Dem. Rep.	AF	COD	60,482,256,092	77,266,814	2.99%	3.8	35.3%	58.7	65.897346	0.294904
262	Zambia	AF	ZMB	62,458,409,612	16,211,767	3.08%	21.0	32.9%	60.0	80.566971	NaN
263	Zimbabwe	AF	ZWE	27,984,877,195	15,602,751	2.62%	16.4	29.7%	57.5	85.285133	0.262450

264 rows × 11 columns

In [3]:

df.columns

Out[3]:

Index(['Country Name', 'Region Code', 'Country Code',
       'GDP, PPP (current international $)', ' Population, total ',
       'Population CGR 1960-2015', 'Internet users (per 100 people)',
       'Popltn Largest City % of Urban Pop',
       '2014 Life expectancy at birth, total (years)',
       'Literacy rate, adult female (% of females ages 15 and above)',
       'Exports of goods and services (% of GDP)'],
      dtype='object')
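The outputs above show that some numeric columns are stored as text (for example " 103,889 " and "1.19%"), and that the population column name carries leading and trailing spaces. The following is a minimal cleaning sketch. It assumes a few rows copied from the raw CSV with the columns trimmed, and uses the hypothetical name `sample` so it does not overwrite the notebook's `df`. Note that it strips the column names, so the cleaned name no longer matches the `' Population, total '` key used later in this notebook.

```python
import io
import pandas as pd

# A few rows copied from the raw CSV shown above (columns trimmed for brevity)
raw = io.StringIO(
    'Country Name,Country Code," Population, total ",Population CGR 1960-2015\n'
    'Aruba,ABW," 103,889 ",1.19%\n'
    'Andorra,AND," 70,473 ",3.06%\n'
    'Afghanistan,AFG," 32,526,562 ",2.36%\n'
)
sample = pd.read_csv(raw)

# Remove the stray spaces around column names such as ' Population, total '
sample.columns = sample.columns.str.strip()

# Drop thousands separators and surrounding spaces, then convert to numbers
sample['Population, total'] = pd.to_numeric(
    sample['Population, total'].str.replace(',', '', regex=False).str.strip()
)

# Drop the percent sign and express the growth rate as a fraction
sample['Population CGR 1960-2015'] = (
    pd.to_numeric(sample['Population CGR 1960-2015'].str.rstrip('%')) / 100
)

print(sample.dtypes)
```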

Creating pandas Series from the columns of the DataFrame df

In [4]:

# Selecting one column with df[...] already returns a pandas Series,
# so wrapping it in pd.Series(...) is unnecessary
country_name = df['Country Name']
country_code = df['Country Code']
population = df[' Population, total ']  # the original column name has surrounding spaces
gdp = df['GDP, PPP (current international $)']
internet_users = df['Internet users (per 100 people)']
life_expectancy = df['2014 Life expectancy at birth, total (years)']
literacy_rate = df['Literacy rate, adult female (% of females ages 15 and above)']
exports = df['Exports of goods and services (% of GDP)']

In [5]:

country_name_sorted = country_name.sort_values()

In [10]:

literacy_rate_sorted = literacy_rate.sort_values()
# Reorder the country names by the index labels of the sorted literacy rates
country_name_sorted_by_literacy_rate = country_name.loc[literacy_rate_sorted.index]

In [ ]:

country_name.head()

In [ ]:

country_code.head()

In [ ]:

population.head()

In [ ]:

gdp.head()

In [ ]:

internet_users.head()

In [ ]:

life_expectancy.head()

In [ ]:

literacy_rate.head()

In [ ]:

exports.head()

Activities

1: What is the data type?

In [ ]:

# try to find the dtype of `country_name`
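A sketch on a toy Series holding a few names copied from the 'Country Name' column. Text columns are typically stored with the generic `object` dtype; depending on the pandas version, a dedicated string dtype may be reported instead, so the check at the end is written to work either way.

```python
import pandas as pd

# A few values copied from the 'Country Name' column
names = pd.Series(['Aruba', 'Andorra', 'Afghanistan'])

# .dtype reports the single type shared by all elements of the Series
print(names.dtype)  # object in most pandas versions; newer ones may show a string dtype

# Version-independent check that the Series holds text
print(pd.api.types.is_string_dtype(names))
```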

2: What is the size of the series?

In [ ]:

# try to get the shape of `gdp`
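A sketch of the difference between the size of a Series and the number of values it actually holds, using GDP figures copied from the output above plus two missing entries (as for Aruba and Andorra). For the full data, which the notebook reports as 264 rows, `gdp.shape` should be `(264,)`.

```python
import numpy as np
import pandas as pd

# GDP-like values with missing entries (NaN), as in the 'GDP, PPP' column
gdp = pd.Series([np.nan, np.nan, 62912669167.0, 184437662368.0, 32663238936.0])

print(gdp.shape)    # a one-element tuple: (number of rows,)
print(gdp.size)     # number of elements, NaN included
print(len(gdp))     # same as .size for a Series
print(gdp.count())  # number of non-missing values only
```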

3: What is the data type?

In [ ]:

# try to get the dtype of `internet_users`
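Unlike the name column, 'Internet users (per 100 people)' is parsed as numbers, while columns containing thousands separators stay as text. A small sketch contrasting the two, with values copied from the raw data; the variable `population_raw` is a hypothetical name for the uncleaned population column.

```python
import pandas as pd

# Internet users parse as plain numbers...
internet_users = pd.Series([88.7, 96.9, 8.3])
# ...while population values keep their thousands separators and spaces
population_raw = pd.Series([' 103,889 ', ' 70,473 ', ' 32,526,562 '])

print(internet_users.dtype)  # float64 (NaN values, if present, are floats too)
print(pd.api.types.is_numeric_dtype(internet_users))
print(pd.api.types.is_numeric_dtype(population_raw))
```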

4: What is the value of the first element?

In [ ]:

# try to get the value of the first element in the `population` series
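A sketch of position-based access with `.iloc`, on a toy copy of the first population values. Because the raw CSV quotes the numbers with spaces around them, the first element is expected to be a string rather than an integer unless the column has been cleaned.

```python
import pandas as pd

population = pd.Series([' 103,889 ', ' 70,473 ', ' 32,526,562 '])

# .iloc[0] always returns the element at position 0, whatever the index labels are
first = population.iloc[0]
print(repr(first))  # repr makes the surrounding spaces visible
```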

5: What is the value of the last element?

In [ ]:

# try your code here
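A sketch for the last element, using the final country names from the output above with their original row labels. `.iloc[-1]` counts from the end by position. By contrast, with an integer index `names[-1]` is treated as a label lookup and would be expected to fail, since no row is labelled -1.

```python
import pandas as pd

# Last few values of 'Country Name' as shown in the output above
names = pd.Series(['South Africa', 'Congo, Dem. Rep.', 'Zambia', 'Zimbabwe'],
                  index=[260, 261, 262, 263])

# Negative positions count from the end
last = names.iloc[-1]
print(last)
```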

6: What is the value of the element with index 29?

In [ ]:

# try your code here
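"Index 29" can mean the label 29 (`.loc[29]`) or position 29 (`.iloc[29]`). With the default integer index of `df` the two coincide, but they diverge once a Series has been reordered, as this toy sketch shows.

```python
import pandas as pd

s = pd.Series([30, 10, 20])   # default index labels 0, 1, 2
s_sorted = s.sort_values()    # values 10, 20, 30 keep their labels 1, 2, 0

print(s_sorted.loc[0])   # the element whose label is 0
print(s_sorted.iloc[0])  # the element at position 0
```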

7: What is the value of the last element in the series?

In [ ]:

# try your code here

8: What is the mean of the series?

In [ ]:

# try your code here
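A sketch of `.mean()` on life expectancy values copied from the first five rows (Andorra has no value). pandas skips missing values by default; passing `skipna=False` makes a single NaN propagate to the result.

```python
import numpy as np
import pandas as pd

life_expectancy = pd.Series([75.5, np.nan, 60.4, 52.3, 77.8])

# Default skipna=True: average of the four known values
print(life_expectancy.mean())
# With skipna=False the missing value makes the result NaN
print(life_expectancy.mean(skipna=False))
```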

9: What is the standard deviation?

In [ ]:

# try your code here
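A sketch of `.std()` on internet-user values from the first five rows. pandas computes the sample standard deviation (`ddof=1`) by default, whereas NumPy's `np.std` defaults to the population version (`ddof=0`), so the two results differ slightly for the same data.

```python
import numpy as np
import pandas as pd

values = pd.Series([88.7, 96.9, 8.3, 12.4, 63.3])  # internet users, first five rows

# pandas: divides by n - 1 (sample standard deviation)
sample_std = values.std()
# NumPy: divides by n (population standard deviation)
population_std = np.std(values.to_numpy())

print(sample_std, population_std)
```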

10: What is the median of the series?

In [ ]:

# try your code here
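A sketch of `.median()` on GDP, PPP values for four countries copied from the output above. With an even number of values the median is the mean of the two middle ones, and it is less affected than the mean by one very large economy.

```python
import pandas as pd

# GDP, PPP (current international $) for Afghanistan, Angola, Albania, United Arab Emirates
gdp = pd.Series([62912669167, 184437662368, 32663238936, 643166288737])

print(gdp.median())  # mean of the two middle values after sorting
print(gdp.mean())    # pulled upward by the largest value
```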

11: What is the minimum value of the series?

In [ ]:

# try your code here
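A sketch pairing `.min()` with `.idxmin()`, which returns the index label of the smallest value. Using country names as the index is an assumption for readability; the notebook's Series use the default integer index, so there `idxmin()` would return a row number that can be passed to `country_name.loc[...]`.

```python
import pandas as pd

internet_users = pd.Series([88.7, 96.9, 8.3, 12.4, 63.3],
                           index=['Aruba', 'Andorra', 'Afghanistan', 'Angola', 'Albania'])

print(internet_users.min())     # smallest value
print(internet_users.idxmin())  # label of the row holding it
```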

12: What is the average literacy rate?

In [ ]:

# try to find the average literacy rate
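A sketch using the adult female literacy rates from the first five rows (Andorra has no value). `.mean()` gives an unweighted average over countries with data, not a population-weighted one. Also note that the full table appears to mix countries with regional aggregates such as Arab World, which has no Region Code; whether to exclude such rows is a modelling choice.

```python
import numpy as np
import pandas as pd

# Adult female literacy rates (%) for the first five rows
literacy_rate = pd.Series([97.5139617919922, np.nan, 23.8738498687744,
                           60.744800567627, 96.7696914672852])

average = literacy_rate.mean()  # NaN is skipped, so this averages four values
print(round(average, 2))
```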

Sorting

13: Sort the series in ascending order

In [ ]:

# try your code here
country_name_sorted = ...
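A sketch of `sort_values()` on the first five country names. It returns a new Series and leaves the original unchanged, and each value keeps its original index label, which is what makes the next exercise possible.

```python
import pandas as pd

names = pd.Series(['Aruba', 'Andorra', 'Afghanistan', 'Angola', 'Albania'])

ascending = names.sort_values()                  # new Series; names is unchanged
descending = names.sort_values(ascending=False)  # reverse order

print(ascending.tolist())
print(ascending.index.tolist())  # original labels travel with their values
```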

14: Sort multiple series at once

In [1]:

# try your code here
literacy_rate_sorted = ...
country_name_sorted_by_literacy_rate = ...
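One way to keep two Series aligned is to sort one of them and then reuse its index to reorder the other. A second option is to put both into a DataFrame and sort its rows by one column. The sketch below shows both on the first five rows (literacy values rounded to two decimals). Missing values are placed last by default.

```python
import numpy as np
import pandas as pd

country_name = pd.Series(['Aruba', 'Andorra', 'Afghanistan', 'Angola', 'Albania'])
literacy_rate = pd.Series([97.51, np.nan, 23.87, 60.74, 96.77])

# Option 1: sort one Series, then select the other by the sorted index labels
literacy_rate_sorted = literacy_rate.sort_values()  # NaN goes last by default
country_name_sorted_by_literacy_rate = country_name.loc[literacy_rate_sorted.index]

# Option 2: combine both in a DataFrame and sort its rows by one column
both = pd.DataFrame({'country': country_name, 'literacy': literacy_rate})
both_sorted = both.sort_values('literacy')

print(country_name_sorted_by_literacy_rate.tolist())
```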


Intro to Pandas for Data Analysis

Series Practice with World Bank's data
