Take a look at raw data¶

In [1]:

!head world_data.csv

Country Name,Region Code,Country Code,"GDP, PPP (current international $)"," Population, total ",Population CGR 1960-2015,Internet users (per 100 people),Popltn Largest City % of Urban Pop,"2014 Life expectancy at birth, total (years)","Literacy rate, adult female (% of females ages 15 and above)",Exports of goods and services (% of GDP)
Aruba,MA,ABW,," 103,889 ",1.19%,88.7,,75.5,97.5139617919922,
Andorra,EU,AND,," 70,473 ",3.06%,96.9,,,,
Afghanistan,ME,AFG," 62,912,669,167 "," 32,526,562 ",2.36%,8.3,53.4%,60.4,23.8738498687744,0.073278411818003
Angola,AF,AGO," 184,437,662,368 "," 25,021,974 ",2.87%,12.4,50.0%,52.3,60.744800567627,0.373074223085945
Albania,EU,ALB," 32,663,238,936 "," 2,889,167 ",1.07%,63.3,27.3%,77.8,96.7696914672852,0.271049844901716
Arab World,,ARB," 6,435,291,560,152 "," 392,022,276 ",2.66%,39.5,29.8%,70.6,,
United Arab Emirates,ME,ARE," 643,166,288,737 "," 9,156,963 ",8.71%,91.2,30.8%,77.4,95.0763397216797,
Argentina,SA,ARG," 882,358,844,160 "," 43,416,755 ",1.36%,69.4,38.1%,76.2,98.1347808837891,0.110578189784346
Armenia,RU,ARM," 25,329,201,238 "," 3,017,712 ",0.88%,58.2,55.2%,74.7,99.73046875,0.297333847463774

In [2]:

import pandas as pd
df = pd.read_csv('world_data.csv')
df

Out[2]:

	Country Name	Region Code	Country Code	GDP, PPP (current international $)	Population, total	Population CGR 1960-2015	Internet users (per 100 people)	Popltn Largest City % of Urban Pop	2014 Life expectancy at birth, total (years)	Literacy rate, adult female (% of females ages 15 and above)	Exports of goods and services (% of GDP)
0	Aruba	MA	ABW	NaN	103,889	1.19%	88.7	NaN	75.5	97.513962	NaN
1	Andorra	EU	AND	NaN	70,473	3.06%	96.9	NaN	NaN	NaN	NaN
2	Afghanistan	ME	AFG	62,912,669,167	32,526,562	2.36%	8.3	53.4%	60.4	23.873850	0.073278
3	Angola	AF	AGO	184,437,662,368	25,021,974	2.87%	12.4	50.0%	52.3	60.744801	0.373074
4	Albania	EU	ALB	32,663,238,936	2,889,167	1.07%	63.3	27.3%	77.8	96.769691	0.271050
...	...	...	...	...	...	...	...	...	...	...	...
259	Yemen, Rep.	ME	YEM	NaN	26,832,215	3.04%	25.1	31.9%	63.8	54.850632	NaN
260	South Africa	AF	ZAF	723,515,991,686	54,956,920	2.11%	51.9	26.4%	57.2	93.428932	0.308972
261	Congo, Dem. Rep.	AF	COD	60,482,256,092	77,266,814	2.99%	3.8	35.3%	58.7	65.897346	0.294904
262	Zambia	AF	ZMB	62,458,409,612	16,211,767	3.08%	21.0	32.9%	60.0	80.566971	NaN
263	Zimbabwe	AF	ZWE	27,984,877,195	15,602,751	2.62%	16.4	29.7%	57.5	85.285133	0.262450

264 rows × 11 columns

In [3]:

df.columns

Out[3]:

Index(['Country Name', 'Region Code', 'Country Code',
       'GDP, PPP (current international $)', ' Population, total ',
       'Population CGR 1960-2015', 'Internet users (per 100 people)',
       'Popltn Largest City % of Urban Pop',
       '2014 Life expectancy at birth, total (years)',
       'Literacy rate, adult female (% of females ages 15 and above)',
       'Exports of goods and services (% of GDP)'],
      dtype='object')
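Note that one label in the output above, `' Population, total '`, carries a leading and a trailing space, so it has to be typed exactly that way when selecting the column. A minimal sketch of one way to normalize the labels, assuming a two-column excerpt of the header and first rows shown above (the names `csv_text` and `sample` are made up for this illustration):

```python
import io
import pandas as pd

# Two columns copied from the raw file shown above; the second header is padded.
csv_text = (
    'Country Name," Population, total "\n'
    'Aruba," 103,889 "\n'
    'Andorra," 70,473 "\n'
)
sample = pd.read_csv(io.StringIO(csv_text))

# Strip leading/trailing whitespace from every column label.
sample.columns = sample.columns.str.strip()
print(list(sample.columns))
```

The rest of this notebook keeps the original, padded label, so this step is optional.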

Creating pandas Series from the columns of the DataFrame `df`

In [4]:

# Selecting a column (df['...']) already returns a pandas Series;
# the pd.Series(...) call just wraps it explicitly
country_name = pd.Series(df['Country Name'])
country_code = pd.Series(df['Country Code'])
population = pd.Series(df[' Population, total '])
gdp = pd.Series(df['GDP, PPP (current international $)'])
internet_users = pd.Series(df['Internet users (per 100 people)'])
life_expectancy = pd.Series(df['2014 Life expectancy at birth, total (years)'])
literacy_rate = pd.Series(df['Literacy rate, adult female (% of females ages 15 and above)'])
exports = pd.Series(df['Exports of goods and services (% of GDP)'])

In [5]:

country_name.head()

Out[5]:

0          Aruba
1        Andorra
2    Afghanistan
3         Angola
4        Albania
Name: Country Name, dtype: object

In [6]:

country_code.head()

Out[6]:

0    ABW
1    AND
2    AFG
3    AGO
4    ALB
Name: Country Code, dtype: object

In [7]:

population.head()

Out[7]:

0        103,889 
1         70,473 
2     32,526,562 
3     25,021,974 
4      2,889,167 
Name:  Population, total , dtype: object

In [8]:

gdp.head()

Out[8]:

0                  NaN
1                  NaN
2      62,912,669,167 
3     184,437,662,368 
4      32,663,238,936 
Name: GDP, PPP (current international $), dtype: object
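Both `population` and `gdp` come back with `dtype: object` because the values are strings with thousands separators and padding spaces. A possible way to turn them into numbers is sketched below, using a few values copied from the outputs above; `to_number` is a hypothetical helper written for this illustration, not part of pandas.

```python
import pandas as pd

# Values copied from the `population.head()` and `gdp.head()` outputs above.
population_sample = pd.Series([' 103,889 ', ' 70,473 ', ' 32,526,562 '])
gdp_sample = pd.Series([float('nan'), float('nan'), ' 62,912,669,167 '])


def to_number(series):
    """Hypothetical helper: remove commas and padding, then parse as numbers."""
    cleaned = series.str.replace(',', '', regex=False).str.strip()
    return pd.to_numeric(cleaned)


population_num = to_number(population_sample)
gdp_num = to_number(gdp_sample)  # missing entries stay NaN

print(population_num.dtype, gdp_num.dtype)
```

Once converted, numeric methods such as `.mean()` or `.max()` become meaningful on these series.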

In [9]:

internet_users.head()

Out[9]:

0    88.7
1    96.9
2     8.3
3    12.4
4    63.3
Name: Internet users (per 100 people), dtype: float64

In [10]:

life_expectancy.head()

Out[10]:

0    75.5
1     NaN
2    60.4
3    52.3
4    77.8
Name: 2014 Life expectancy at birth, total (years), dtype: float64

In [11]:

literacy_rate.head()

Out[11]:

0    97.513962
1          NaN
2    23.873850
3    60.744801
4    96.769691
Name: Literacy rate, adult female (% of females ages 15 and above), dtype: float64

In [12]:

exports.head()

Out[12]:

0         NaN
1         NaN
2    0.073278
3    0.373074
4    0.271050
Name: Exports of goods and services (% of GDP), dtype: float64

Activities¶

1: What is the data type?¶

In [ ]:

# try to find the dtype of `country_name`
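One possible approach, sketched on the five names shown in `country_name.head()` rather than on the full file (`country_name_sample` is a stand-in name, and the dtype is set explicitly to match the output above):

```python
import pandas as pd

# First five country names, copied from the `country_name.head()` output.
country_name_sample = pd.Series(
    ['Aruba', 'Andorra', 'Afghanistan', 'Angola', 'Albania'],
    name='Country Name',
    dtype=object,
)

# `.dtype` is an attribute, so it is read without parentheses.
name_dtype = country_name_sample.dtype
print(name_dtype)
```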

2: What is the size of the series?¶

In [ ]:

# try to get the shape of `gdp`
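A small sketch of the attributes that describe a series' size, assuming the five values shown in `gdp.head()` (`gdp_sample` is a stand-in for the full `gdp` series):

```python
import pandas as pd

# Values copied from the `gdp.head()` output above; two entries are missing.
gdp_sample = pd.Series(
    [float('nan'), float('nan'),
     ' 62,912,669,167 ', ' 184,437,662,368 ', ' 32,663,238,936 '],
    name='GDP, PPP (current international $)',
)

gdp_shape = gdp_sample.shape    # a one-element tuple for a Series
gdp_size = gdp_sample.size      # number of elements, NaN included
gdp_count = gdp_sample.count()  # number of non-missing elements

print(gdp_shape, gdp_size, gdp_count)
```

`shape` and `size` include missing values, whereas `count()` does not, which is worth keeping in mind for a column with as many gaps as this one.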

3: What is the data type?¶

In [ ]:

# try to get the dtype of `internet_users`
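For a numeric column the same attribute reports a numeric type. The sketch below uses the five values from `internet_users.head()` and also shows `pd.api.types.is_numeric_dtype`, which can serve as a quick programmatic check (`internet_users_sample` is a stand-in name):

```python
import pandas as pd

# Values copied from the `internet_users.head()` output above.
internet_users_sample = pd.Series(
    [88.7, 96.9, 8.3, 12.4, 63.3],
    name='Internet users (per 100 people)',
)

users_dtype = internet_users_sample.dtype
is_numeric = pd.api.types.is_numeric_dtype(internet_users_sample)

print(users_dtype, is_numeric)
```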

4: What is the value of the first element?¶

In [ ]:

# try to get the value of the first element in the `population` series
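One way to read the first element is positional indexing with `.iloc`. The sketch below assumes the values shown in `population.head()` (`population_sample` is a stand-in name):

```python
import pandas as pd

# Values copied from the `population.head()` output above.
population_sample = pd.Series(
    [' 103,889 ', ' 70,473 ', ' 32,526,562 ', ' 25,021,974 ', ' 2,889,167 '],
    name=' Population, total ',
)

# .iloc selects by position, so 0 is always the first element.
first_value = population_sample.iloc[0]
print(repr(first_value))
```

Because this column was read as text, the result is a string that still contains the padding spaces and the comma.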

5: What is the value of the last element?¶

In [ ]:

# try to get the value of the last element in the `life_expectancy` series
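A possible approach is again `.iloc`, this time with a negative position. The sketch uses the five values from `life_expectancy.head()`, so "last" here means the last of that sample, not of the full series:

```python
import pandas as pd

# Values copied from the `life_expectancy.head()` output above.
life_expectancy_sample = pd.Series(
    [75.5, float('nan'), 60.4, 52.3, 77.8],
    name='2014 Life expectancy at birth, total (years)',
)

# .iloc[-1] counts from the end, independent of the index labels.
last_value = life_expectancy_sample.iloc[-1]
print(last_value)

# With the default integer index, plain [-1] is treated as a label lookup
# and raises KeyError, because no row is labelled -1.
try:
    life_expectancy_sample[-1]
    plain_lookup_failed = False
except KeyError:
    plain_lookup_failed = True
```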

6: What is the value of the element with index 29?¶

In [7]:

import pandas as pd
df = pd.read_csv('world_data.csv')
# .loc selects by label: the row labelled 29 and the literacy-rate column
df.loc[29,'Literacy rate, adult female (% of females ages 15 and above)']

Out[7]:

95.4420318603516

7: What is the value of the last element in the series?¶

In [13]:

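# tail(1) keeps the last row; selecting with a list of column names
# returns a one-row DataFrame rather than a bare value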
import pandas as pd
df = pd.read_csv('world_data.csv')
df.tail(1)[['GDP, PPP (current international $)']]

Out[13]:

	GDP, PPP (current international $)
263	27,984,877,195

8: What is the mean of the series?¶

In [5]:

# describe() reports the mean along with other summary statistics
import pandas as pd
df = pd.read_csv('world_data.csv')
df['Internet users (per 100 people)'].describe()

Out[5]:

count    248.000000
mean      47.557258
std       27.690496
min        1.100000
25%       21.950000
50%       46.850000
75%       71.475000
max       98.300000
Name: Internet users (per 100 people), dtype: float64

9: What is the standard deviation?¶

In [ ]:

# try your code here
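A minimal sketch of computing a standard deviation with `Series.std()`, assuming the five values from `internet_users.head()` (so the result differs from the `std` row of the `describe()` output above, which covers the full column):

```python
import pandas as pd

# Values copied from the `internet_users.head()` output above.
internet_users_sample = pd.Series([88.7, 96.9, 8.3, 12.4, 63.3])

# std() uses ddof=1 by default, i.e. the sample standard deviation.
sample_std = internet_users_sample.std()

# ddof=0 gives the population standard deviation instead.
population_std = internet_users_sample.std(ddof=0)

print(sample_std, population_std)
```

Missing values are skipped by default, so the method can be called directly on a column that contains `NaN`.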

10: What is the median of the series?¶

In [10]:

# median() skips missing values (NaN) by default
import pandas as pd
df = pd.read_csv('world_data.csv')
df['Exports of goods and services (% of GDP)'].median()

Out[10]:

0.30183071080490154

11: What is the minimum value of the series?¶

In [13]:

# min() skips missing values (NaN) by default
import pandas as pd
df = pd.read_csv('world_data.csv')
df['2014 Life expectancy at birth, total (years)'].min()

Out[13]:

48.9

12: What is the average literacy rate?¶

In [14]:

# try to find the average literacy rate
import pandas as pd
df = pd.read_csv('world_data.csv')
df['Literacy rate, adult female (% of females ages 15 and above)'].mean()

Out[14]:

80.91936549162253

Sorting¶

13: Sort the series in ascending order¶

In [5]:

# sort_values() returns a new, sorted Series; the original is left unchanged
import pandas as pd
df = pd.read_csv('world_data.csv')
country_name = df['Country Name']
country_name_sorted = country_name.sort_values(ascending=True)

14: Sort multiple series at once¶

In [8]:

# Sort the literacy rates, then reorder the country names
# using the index labels of the sorted series
import pandas as pd
df = pd.read_csv('world_data.csv')
literacy_rate = df['Literacy rate, adult female (% of females ages 15 and above)']
country_name = df['Country Name']
literacy_rate_sorted = literacy_rate.sort_values()
country_name_sorted_by_literacy_rate = country_name[literacy_rate_sorted.index]
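The cell above works because both series share the DataFrame's index: the sorted literacy series keeps its original row labels, and those labels are then used to pick the country names in the same order. A small sketch of that idea on the first five rows shown earlier (`sample` is a stand-in DataFrame built for this illustration), together with the equivalent single-step sort on the DataFrame:

```python
import pandas as pd

literacy_col = 'Literacy rate, adult female (% of females ages 15 and above)'

# First five rows, copied from the outputs shown earlier in the notebook.
sample = pd.DataFrame({
    'Country Name': ['Aruba', 'Andorra', 'Afghanistan', 'Angola', 'Albania'],
    literacy_col: [97.513962, float('nan'), 23.873850, 60.744801, 96.769691],
})

# Step 1: sort one series; NaN values are placed last by default.
literacy_sorted = sample[literacy_col].sort_values()

# Step 2: reuse the sorted index labels to reorder the other series.
names_by_literacy = sample['Country Name'].loc[literacy_sorted.index]

# Alternative: sort the whole DataFrame by the literacy column.
names_via_dataframe = sample.sort_values(by=literacy_col)['Country Name']

print(names_by_literacy.tolist())
```

Both routes give the same ordering; sorting the DataFrame is usually the shorter option when several columns need to stay aligned.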

The End!¶

In [ ]:


Intro to Pandas for Data Analysis

Series Practice with World Bank's data
