Intro to Pandas for Data Analysis
Series Practice with World Bank's data
Project.ipynb
Take a look at the raw data¶
In [1]:
!head world_data.csv
Country Name,Region Code,Country Code,"GDP, PPP (current international $)"," Population, total ",Population CGR 1960-2015,Internet users (per 100 people),Popltn Largest City % of Urban Pop,"2014 Life expectancy at birth, total (years)","Literacy rate, adult female (% of females ages 15 and above)",Exports of goods and services (% of GDP)
Aruba,MA,ABW,," 103,889 ",1.19%,88.7,,75.5,97.5139617919922,
Andorra,EU,AND,," 70,473 ",3.06%,96.9,,,,
Afghanistan,ME,AFG," 62,912,669,167 "," 32,526,562 ",2.36%,8.3,53.4%,60.4,23.8738498687744,0.073278411818003
Angola,AF,AGO," 184,437,662,368 "," 25,021,974 ",2.87%,12.4,50.0%,52.3,60.744800567627,0.373074223085945
Albania,EU,ALB," 32,663,238,936 "," 2,889,167 ",1.07%,63.3,27.3%,77.8,96.7696914672852,0.271049844901716
Arab World,,ARB," 6,435,291,560,152 "," 392,022,276 ",2.66%,39.5,29.8%,70.6,,
United Arab Emirates,ME,ARE," 643,166,288,737 "," 9,156,963 ",8.71%,91.2,30.8%,77.4,95.0763397216797,
Argentina,SA,ARG," 882,358,844,160 "," 43,416,755 ",1.36%,69.4,38.1%,76.2,98.1347808837891,0.110578189784346
Armenia,RU,ARM," 25,329,201,238 "," 3,017,712 ",0.88%,58.2,55.2%,74.7,99.73046875,0.297333847463774
In [2]:
import pandas as pd
df = pd.read_csv('world_data.csv')
df
Out[2]:
 | Country Name | Region Code | Country Code | GDP, PPP (current international $) | Population, total | Population CGR 1960-2015 | Internet users (per 100 people) | Popltn Largest City % of Urban Pop | 2014 Life expectancy at birth, total (years) | Literacy rate, adult female (% of females ages 15 and above) | Exports of goods and services (% of GDP) |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Aruba | MA | ABW | NaN | 103,889 | 1.19% | 88.7 | NaN | 75.5 | 97.513962 | NaN |
1 | Andorra | EU | AND | NaN | 70,473 | 3.06% | 96.9 | NaN | NaN | NaN | NaN |
2 | Afghanistan | ME | AFG | 62,912,669,167 | 32,526,562 | 2.36% | 8.3 | 53.4% | 60.4 | 23.873850 | 0.073278 |
3 | Angola | AF | AGO | 184,437,662,368 | 25,021,974 | 2.87% | 12.4 | 50.0% | 52.3 | 60.744801 | 0.373074 |
4 | Albania | EU | ALB | 32,663,238,936 | 2,889,167 | 1.07% | 63.3 | 27.3% | 77.8 | 96.769691 | 0.271050 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
259 | Yemen, Rep. | ME | YEM | NaN | 26,832,215 | 3.04% | 25.1 | 31.9% | 63.8 | 54.850632 | NaN |
260 | South Africa | AF | ZAF | 723,515,991,686 | 54,956,920 | 2.11% | 51.9 | 26.4% | 57.2 | 93.428932 | 0.308972 |
261 | Congo, Dem. Rep. | AF | COD | 60,482,256,092 | 77,266,814 | 2.99% | 3.8 | 35.3% | 58.7 | 65.897346 | 0.294904 |
262 | Zambia | AF | ZMB | 62,458,409,612 | 16,211,767 | 3.08% | 21.0 | 32.9% | 60.0 | 80.566971 | NaN |
263 | Zimbabwe | AF | ZWE | 27,984,877,195 | 15,602,751 | 2.62% | 16.4 | 29.7% | 57.5 | 85.285133 | 0.262450 |
264 rows × 11 columns
In [3]:
df.columns
Out[3]:
Index(['Country Name', 'Region Code', 'Country Code', 'GDP, PPP (current international $)', ' Population, total ', 'Population CGR 1960-2015', 'Internet users (per 100 people)', 'Popltn Largest City % of Urban Pop', '2014 Life expectancy at birth, total (years)', 'Literacy rate, adult female (% of females ages 15 and above)', 'Exports of goods and services (% of GDP)'], dtype='object')
Creating pandas Series from the DataFrame df
In [4]:
# Selecting a single column with df['...'] already returns a pandas Series,
# so no explicit pd.Series(...) wrapper is needed.
# Note: the population column name includes leading and trailing spaces.
country_name = df['Country Name']
country_code = df['Country Code']
population = df[' Population, total ']
gdp = df['GDP, PPP (current international $)']
internet_users = df['Internet users (per 100 people)']
life_expectancy = df['2014 Life expectancy at birth, total (years)']
literacy_rate = df['Literacy rate, adult female (% of females ages 15 and above)']
exports = df['Exports of goods and services (% of GDP)']
In [5]:
country_name.head()
Out[5]:
0          Aruba
1        Andorra
2    Afghanistan
3         Angola
4        Albania
Name: Country Name, dtype: object
In [6]:
country_code.head()
Out[6]:
0    ABW
1    AND
2    AFG
3    AGO
4    ALB
Name: Country Code, dtype: object
In [7]:
population.head()
Out[7]:
0       103,889
1        70,473
2    32,526,562
3    25,021,974
4     2,889,167
Name: Population, total , dtype: object
In [8]:
gdp.head()
Out[8]:
0                NaN
1                NaN
2     62,912,669,167
3    184,437,662,368
4     32,663,238,936
Name: GDP, PPP (current international $), dtype: object
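Both `population` and `gdp` report `dtype: object` because their values are stored as text with thousands separators. The following is a minimal sketch of one way to convert such a column to numbers. It uses only the five `gdp` values shown above as a stand-in for the real file, and the names `gdp_raw` and `gdp_numeric` are illustrative, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Stand-in data: the first five GDP values as they appear in the raw CSV
# (text with surrounding spaces and thousands separators, plus missing values)
gdp_raw = pd.Series(
    [np.nan, np.nan, " 62,912,669,167 ", " 184,437,662,368 ", " 32,663,238,936 "],
    name="GDP, PPP (current international $)",
)

# Remove the commas and surrounding spaces, then parse the text as numbers.
# Missing values stay missing (NaN), so the result is a float series.
gdp_numeric = pd.to_numeric(
    gdp_raw.str.replace(",", "", regex=False).str.strip()
)

print(gdp_numeric)
```

Once the column is numeric, methods such as `mean()` or `min()` work on it the same way they do on `internet_users`.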
In [9]:
internet_users.head()
Out[9]:
0    88.7
1    96.9
2     8.3
3    12.4
4    63.3
Name: Internet users (per 100 people), dtype: float64
In [10]:
life_expectancy.head()
Out[10]:
0    75.5
1     NaN
2    60.4
3    52.3
4    77.8
Name: 2014 Life expectancy at birth, total (years), dtype: float64
In [11]:
literacy_rate.head()
Out[11]:
0    97.513962
1          NaN
2    23.873850
3    60.744801
4    96.769691
Name: Literacy rate, adult female (% of females ages 15 and above), dtype: float64
In [12]:
exports.head()
Out[12]:
0         NaN
1         NaN
2    0.073278
3    0.373074
4    0.271050
Name: Exports of goods and services (% of GDP), dtype: float64
Activities¶
1: What is the data type?¶
In [ ]:
# try to find the dtype of `country_name`
2: What is the size of the series?¶
In [ ]:
# try to get the shape of `gdp`
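One possible way to approach this activity is sketched below. It uses the first five `gdp` values shown earlier as a stand-in for the full series, so the numbers it prints describe the sample, not the real file. Since the DataFrame has 264 rows, the same `shape` call on the full series would return `(264,)`.

```python
import numpy as np
import pandas as pd

# Stand-in for `gdp`: the first five values from the output above
gdp = pd.Series(
    [np.nan, np.nan, "62,912,669,167", "184,437,662,368", "32,663,238,936"],
    name="GDP, PPP (current international $)",
)

print(gdp.shape)    # one-element tuple: (number of rows,)
print(gdp.size)     # total number of elements, missing values included
print(gdp.count())  # number of non-missing values only
```

The difference between `size` and `count()` is a quick way to see how many values are missing.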
3: What is the data type?¶
In [ ]:
# try to get the dtype of `internet_users`
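Activities 1 and 3 both come down to the `dtype` attribute of a series. A small sketch, assuming stand-in series built from the first five values shown earlier instead of the real file:

```python
import pandas as pd

# Stand-ins built from the first five rows shown earlier
country_name = pd.Series(
    ["Aruba", "Andorra", "Afghanistan", "Angola", "Albania"],
    name="Country Name",
)
internet_users = pd.Series(
    [88.7, 96.9, 8.3, 12.4, 63.3],
    name="Internet users (per 100 people)",
)

# Text column: reported as `object` in the outputs above
# (newer pandas versions may report a dedicated string dtype instead)
print(country_name.dtype)

# Numeric column with decimals: float64
print(internet_users.dtype)
```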
4: What is the value of the first element?¶
In [ ]:
# try to get the value of the first element in the `population` series
5: What is the value of the last element?¶
In [ ]:
# try to get the value of the last element in the `life_expectancy` series
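Activities 4 and 5 ask for elements by position. A hedged sketch using `iloc`, again with stand-in series made from the first five values shown earlier, so "last" here means the last of the sample and not the last row of the real file:

```python
import numpy as np
import pandas as pd

# Stand-ins built from the first five rows shown earlier
population = pd.Series(
    ["103,889", "70,473", "32,526,562", "25,021,974", "2,889,167"],
    name="Population, total",
)
life_expectancy = pd.Series(
    [75.5, np.nan, 60.4, 52.3, 77.8],
    name="2014 Life expectancy at birth, total (years)",
)

# iloc selects by position: 0 is the first element, -1 is the last
first_population = population.iloc[0]
last_life_expectancy = life_expectancy.iloc[-1]

print(first_population)
print(last_life_expectancy)

# With the default integer index, life_expectancy[-1] is treated as a
# lookup of the label -1 and raises a KeyError, so iloc is the safer choice.
```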
6: What is the value of the element with index 29?¶
In [7]:
import pandas as pd
df = pd.read_csv('world_data.csv')
df.loc[29,'Literacy rate, adult female (% of females ages 15 and above)']
Out[7]:
95.4420318603516
7: What is the value of the last element in the series?¶
In [13]:
# tail(1) returns the last row; the double brackets keep the result a one-column DataFrame
import pandas as pd
df = pd.read_csv('world_data.csv')
df.tail(1)[['GDP, PPP (current international $)']]
Out[13]:
 | GDP, PPP (current international $) |
---|---|
263 | 27,984,877,195 |
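The cell above returns a one-row DataFrame and not a single value. If only the value is wanted, one option is to select the column as a series and take its last position. The sketch below rebuilds the last five rows of the GDP column from the table shown earlier; `df_tail` and `gdp_column` are illustrative names, not part of the original notebook.

```python
import numpy as np
import pandas as pd

gdp_column = "GDP, PPP (current international $)"

# Stand-in for the last five rows of the DataFrame (index 259 to 263)
df_tail = pd.DataFrame(
    {
        gdp_column: [
            np.nan,
            "723,515,991,686",
            "60,482,256,092",
            "62,458,409,612",
            "27,984,877,195",
        ]
    },
    index=[259, 260, 261, 262, 263],
)

# Same approach as the cell above: a 1 x 1 DataFrame
last_row_frame = df_tail.tail(1)[[gdp_column]]

# Alternative: a single value taken from the series by position
last_value = df_tail[gdp_column].iloc[-1]

print(last_row_frame)
print(last_value)
```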
8: What is the mean of the series?¶
In [5]:
# describe() reports summary statistics for the series, including the mean
import pandas as pd
df = pd.read_csv('world_data.csv')
df['Internet users (per 100 people)'].describe()
Out[5]:
count    248.000000
mean      47.557258
std       27.690496
min        1.100000
25%       21.950000
50%       46.850000
75%       71.475000
max       98.300000
Name: Internet users (per 100 people), dtype: float64
9: What is the standard deviation?¶
In [ ]:
# try your code here
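The `describe()` output in activity 8 already reports both statistics for the full series (mean 47.557258, std 27.690496). To get each value on its own, `mean()` and `std()` can be called directly. A minimal sketch on the first five `internet_users` values shown earlier, so its results describe that sample only:

```python
import pandas as pd

# Stand-in for `internet_users`: the first five values shown earlier
internet_users = pd.Series(
    [88.7, 96.9, 8.3, 12.4, 63.3],
    name="Internet users (per 100 people)",
)

mean_value = internet_users.mean()

# pandas computes the sample standard deviation by default (ddof=1),
# which is also the value that describe() reports
sample_std = internet_users.std()

# Passing ddof=0 gives the population standard deviation instead
population_std = internet_users.std(ddof=0)

summary = internet_users.describe()

print(mean_value)
print(sample_std)
print(population_std)
```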
10: What is the median of the series?¶
In [10]:
# median of the exports series; missing values are skipped by default
import pandas as pd
df = pd.read_csv('world_data.csv')
df['Exports of goods and services (% of GDP)'].median()
Out[10]:
0.30183071080490154
11: What is the minimum value of the series?¶
In [13]:
# smallest life expectancy in the series; missing values are skipped by default
import pandas as pd
df = pd.read_csv('world_data.csv')
df['2014 Life expectancy at birth, total (years)'].min()
Out[13]:
48.9
12: What is the average literacy rate?¶
In [14]:
# try to find the average literacy rate
import pandas as pd
df = pd.read_csv('world_data.csv')
df['Literacy rate, adult female (% of females ages 15 and above)'].mean()
Out[14]:
80.91936549162253
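The literacy column contains missing values, and `mean()` leaves them out by default. The sketch below illustrates that behaviour on the first five `literacy_rate` values shown earlier; it is a stand-in sample, so its result differs from the full-dataset average above.

```python
import numpy as np
import pandas as pd

# Stand-in for `literacy_rate`: the first five values shown earlier
literacy_rate = pd.Series(
    [97.513962, np.nan, 23.873850, 60.744801, 96.769691],
    name="Literacy rate, adult female (% of females ages 15 and above)",
)

# Default behaviour (skipna=True): the NaN is ignored,
# so the average is taken over the four available values
mean_ignoring_missing = literacy_rate.mean()

# With skipna=False a single missing value makes the result NaN
mean_with_missing = literacy_rate.mean(skipna=False)

print(mean_ignoring_missing)
print(mean_with_missing)
```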
Sorting¶
13: Sort the series in ascending order¶
In [5]:
# sort the country names alphabetically (A to Z)
import pandas as pd
df = pd.read_csv('world_data.csv')
country_name = df['Country Name']
country_name_sorted = country_name.sort_values(ascending=True)
14: Sort multiple series at once¶
In [8]:
# sort the literacy rates, then reorder the country names by the sorted index
import pandas as pd
df = pd.read_csv('world_data.csv')
literacy_rate = df['Literacy rate, adult female (% of females ages 15 and above)']
country_name = df['Country Name']
literacy_rate_sorted = literacy_rate.sort_values()
country_name_sorted_by_literacy_rate = country_name[literacy_rate_sorted.index]
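The approach above works because both series share the DataFrame's index: sorting one series keeps the original labels attached to each value, and those labels can then be used to reorder the other. A small sketch on the first five rows shown earlier (a stand-in sample, not the real file):

```python
import numpy as np
import pandas as pd

# Stand-ins built from the first five rows shown earlier
country_name = pd.Series(
    ["Aruba", "Andorra", "Afghanistan", "Angola", "Albania"],
    name="Country Name",
)
literacy_rate = pd.Series(
    [97.513962, np.nan, 23.873850, 60.744801, 96.769691],
    name="Literacy rate, adult female (% of females ages 15 and above)",
)

# Ascending sort; missing values are placed at the end by default
literacy_rate_sorted = literacy_rate.sort_values()

# Reorder the country names with the index labels of the sorted series
country_name_sorted_by_literacy_rate = country_name.loc[literacy_rate_sorted.index]

print(country_name_sorted_by_literacy_rate)
```

In this sample, Andorra ends up last because its literacy rate is missing.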
The End!¶
In [ ]: