Intro to Pandas for Data Analysis
Series Practice with World Bank's data
Take a look at the raw data
In [1]:
!head world_data.csv
Country Name,Region Code,Country Code,"GDP, PPP (current international $)"," Population, total ",Population CGR 1960-2015,Internet users (per 100 people),Popltn Largest City % of Urban Pop,"2014 Life expectancy at birth, total (years)","Literacy rate, adult female (% of females ages 15 and above)",Exports of goods and services (% of GDP)
Aruba,MA,ABW,," 103,889 ",1.19%,88.7,,75.5,97.5139617919922,
Andorra,EU,AND,," 70,473 ",3.06%,96.9,,,,
Afghanistan,ME,AFG," 62,912,669,167 "," 32,526,562 ",2.36%,8.3,53.4%,60.4,23.8738498687744,0.073278411818003
Angola,AF,AGO," 184,437,662,368 "," 25,021,974 ",2.87%,12.4,50.0%,52.3,60.744800567627,0.373074223085945
Albania,EU,ALB," 32,663,238,936 "," 2,889,167 ",1.07%,63.3,27.3%,77.8,96.7696914672852,0.271049844901716
Arab World,,ARB," 6,435,291,560,152 "," 392,022,276 ",2.66%,39.5,29.8%,70.6,,
United Arab Emirates,ME,ARE," 643,166,288,737 "," 9,156,963 ",8.71%,91.2,30.8%,77.4,95.0763397216797,
Argentina,SA,ARG," 882,358,844,160 "," 43,416,755 ",1.36%,69.4,38.1%,76.2,98.1347808837891,0.110578189784346
Armenia,RU,ARM," 25,329,201,238 "," 3,017,712 ",0.88%,58.2,55.2%,74.7,99.73046875,0.297333847463774
In [2]:
import pandas as pd
df = pd.read_csv('world_data.csv')
df
Out[2]:
| Country Name | Region Code | Country Code | GDP, PPP (current international $) | Population, total | Population CGR 1960-2015 | Internet users (per 100 people) | Popltn Largest City % of Urban Pop | 2014 Life expectancy at birth, total (years) | Literacy rate, adult female (% of females ages 15 and above) | Exports of goods and services (% of GDP) |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Aruba | MA | ABW | NaN | 103,889 | 1.19% | 88.7 | NaN | 75.5 | 97.513962 | NaN |
1 | Andorra | EU | AND | NaN | 70,473 | 3.06% | 96.9 | NaN | NaN | NaN | NaN |
2 | Afghanistan | ME | AFG | 62,912,669,167 | 32,526,562 | 2.36% | 8.3 | 53.4% | 60.4 | 23.873850 | 0.073278 |
3 | Angola | AF | AGO | 184,437,662,368 | 25,021,974 | 2.87% | 12.4 | 50.0% | 52.3 | 60.744801 | 0.373074 |
4 | Albania | EU | ALB | 32,663,238,936 | 2,889,167 | 1.07% | 63.3 | 27.3% | 77.8 | 96.769691 | 0.271050 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
259 | Yemen, Rep. | ME | YEM | NaN | 26,832,215 | 3.04% | 25.1 | 31.9% | 63.8 | 54.850632 | NaN |
260 | South Africa | AF | ZAF | 723,515,991,686 | 54,956,920 | 2.11% | 51.9 | 26.4% | 57.2 | 93.428932 | 0.308972 |
261 | Congo, Dem. Rep. | AF | COD | 60,482,256,092 | 77,266,814 | 2.99% | 3.8 | 35.3% | 58.7 | 65.897346 | 0.294904 |
262 | Zambia | AF | ZMB | 62,458,409,612 | 16,211,767 | 3.08% | 21.0 | 32.9% | 60.0 | 80.566971 | NaN |
263 | Zimbabwe | AF | ZWE | 27,984,877,195 | 15,602,751 | 2.62% | 16.4 | 29.7% | 57.5 | 85.285133 | 0.262450 |
264 rows × 11 columns
In [3]:
df.columns
Out[3]:
Index(['Country Name', 'Region Code', 'Country Code', 'GDP, PPP (current international $)', ' Population, total ', 'Population CGR 1960-2015', 'Internet users (per 100 people)', 'Popltn Largest City % of Urban Pop', '2014 Life expectancy at birth, total (years)', 'Literacy rate, adult female (% of females ages 15 and above)', 'Exports of goods and services (% of GDP)'], dtype='object')
Creating pandas Series from the columns of the DataFrame df
In [4]:
# Extract individual columns as pandas Series
# (df[...] already returns a Series; pd.Series(...) only wraps it again)
country_name = pd.Series(df['Country Name'])
country_code = pd.Series(df['Country Code'])
population = pd.Series(df[' Population, total '])
gdp = pd.Series(df['GDP, PPP (current international $)'])
internet_users = pd.Series(df['Internet users (per 100 people)'])
life_expectancy = pd.Series(df['2014 Life expectancy at birth, total (years)'])
literacy_rate = pd.Series(df['Literacy rate, adult female (% of females ages 15 and above)'])
exports = pd.Series(df['Exports of goods and services (% of GDP)'])
In [5]:
country_name_sorted = country_name.sort_values()
In [10]:
# Sort literacy rates in ascending order, then reorder the country names with the same index labels
literacy_rate_sorted = literacy_rate.sort_values()
country_name_sorted_by_literacy_rate = country_name[literacy_rate_sorted.index]
In [ ]:
country_name.head()
In [ ]:
country_code.head()
In [ ]:
population.head()
In [ ]:
gdp.head()
In [ ]:
internet_users.head()
In [ ]:
life_expectancy.head()
In [ ]:
literacy_rate.head()
In [ ]:
exports.head()
Activities
1: What is the data type?
In [ ]:
# try to find the dtype of `country_name`
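One way to approach this, sketched on toy stand-ins rather than the real file: every Series exposes a `.dtype` attribute. The values below are copied from the first rows shown earlier; the two Series are small illustrative substitutes for the notebook's `country_name` and `internet_users`. Text columns usually report `object` in pandas 1.x and 2.x, and the exact name may differ in other versions.

```python
import pandas as pd

# Toy stand-ins (assumption): a few values copied from the first rows of the data
country_name = pd.Series(["Aruba", "Andorra", "Afghanistan"])
internet_users = pd.Series([88.7, 96.9, 8.3])

# .dtype is an attribute, not a method, so it takes no parentheses
name_dtype = country_name.dtype
users_dtype = internet_users.dtype

print(name_dtype)   # a text dtype; the exact name depends on the pandas version
print(users_dtype)  # float64
```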
2: What is the size of the series?
In [ ]:
# try to get the shape of `gdp`
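A minimal sketch of the size-related attributes, assuming a toy `gdp` Series built from the first five rows above (the raw file stores these values as padded strings, with two entries missing). `shape` returns a one-element tuple, `size` and `len()` return the number of elements, and `count()` returns only the non-missing ones.

```python
import pandas as pd

# Toy stand-in (assumption): GDP values of the first five rows, kept as raw strings
gdp = pd.Series([None, None, " 62,912,669,167 ", " 184,437,662,368 ", " 32,663,238,936 "])

shape = gdp.shape          # tuple with a single entry for a 1-D Series
n_elements = gdp.size      # total number of elements, missing ones included
n_len = len(gdp)           # same number via the built-in len()
n_present = gdp.count()    # number of non-missing elements only

print(shape, n_elements, n_len, n_present)
```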
3: What is the data type?
In [ ]:
# try to get the dtype of `internet_users`
4: What is the value of the first element?
In [ ]:
# try to get the value of the first element in the `population` series
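A possible approach, shown on a toy `population` Series with values copied from the raw file: `.iloc[0]` selects by position, so it returns the first element regardless of the index labels. Because the raw CSV stores population as padded strings with thousands separators, the result is a string; the conversion to `int` at the end is an optional cleanup step, not something the activity asks for.

```python
import pandas as pd

# Toy stand-in (assumption): population values as they appear in the raw CSV
population = pd.Series([" 103,889 ", " 70,473 ", " 32,526,562 "])

# Position-based access: element at position 0
first_value = population.iloc[0]

# Optional cleanup: drop the thousands separators, then parse as an integer
first_as_int = int(first_value.replace(",", ""))

print(repr(first_value), first_as_int)
```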
5: What is the value of the last element?
In [ ]:
# try your code here
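The activity does not say which series to use, so this sketch uses a hypothetical `sample` Series. Negative positions work with `.iloc`, just as with Python lists. Plain `sample[-1]` is a label lookup on the default integer index and fails because no label `-1` exists.

```python
import pandas as pd

# Hypothetical series (assumption): any Series with the default 0..n-1 index
sample = pd.Series([63.8, 57.2, 58.7, 60.0, 57.5])

# Position-based access from the end
last_value = sample.iloc[-1]

# Equivalent, using the length explicitly
last_value_alt = sample.iloc[len(sample) - 1]

print(last_value)
```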
6: What is the value of the element with index 29?
In [ ]:
# try your code here
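"Index 29" refers to the index label, which `.loc` looks up; `.iloc` counts positions instead. The two agree on an unsorted Series with the default index, but differ once the labels are reordered. A small sketch with a hypothetical Series whose labels are deliberately out of order:

```python
import pandas as pd

# Hypothetical series (assumption): labels chosen so that label and position differ
sample = pd.Series([10.0, 20.0, 30.0], index=[29, 5, 12])

by_label = sample.loc[29]      # element whose index label is 29
by_position = sample.iloc[2]   # element at position 2 (the third one)

print(by_label, by_position)
```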
7: What is the value of the last element in the series?
In [ ]:
# try your code here
8: What is the mean of the series?
In [ ]:
# try your code here
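A sketch of `Series.mean()` on a toy Series, using the life expectancy values of the first five rows (one of them missing). By default missing values are skipped; passing `skipna=False` makes the result `NaN` whenever any value is missing.

```python
import math

import numpy as np
import pandas as pd

# Toy stand-in (assumption): life expectancy of the first five rows, one value missing
life_expectancy = pd.Series([75.5, np.nan, 60.4, 52.3, 77.8])

mean_default = life_expectancy.mean()             # NaN is skipped: average of 4 values
mean_strict = life_expectancy.mean(skipna=False)  # NaN propagates into the result

print(mean_default, mean_strict)
```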
9: What is the standard deviation?
In [ ]:
# try your code here
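One detail worth knowing here: `Series.std()` computes the sample standard deviation (`ddof=1`) by default, whereas NumPy's `np.std` defaults to the population version (`ddof=0`). A sketch on hypothetical numbers chosen so the arithmetic is easy to check by hand:

```python
import math

import pandas as pd

# Hypothetical values (assumption): mean is 5, squared deviations sum to 32
sample = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

sample_std = sample.std()            # divides by n - 1 = 7
population_std = sample.std(ddof=0)  # divides by n = 8

print(sample_std, population_std)
```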
10: What is the median of the series?
In [ ]:
# try your code here
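A sketch of `Series.median()` on a toy Series with the internet-user values of the first five rows. The median is the middle value after sorting, and it equals the 0.5 quantile.

```python
import pandas as pd

# Toy stand-in (assumption): internet users per 100 people, first five rows
internet_users = pd.Series([88.7, 96.9, 8.3, 12.4, 63.3])

median_value = internet_users.median()         # middle of 8.3, 12.4, 63.3, 88.7, 96.9
quantile_value = internet_users.quantile(0.5)  # same statistic via quantile()

print(median_value, quantile_value)
```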
11: What is the minimum value of the series?
In [ ]:
# try your code here
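`Series.min()` returns the smallest value, and `Series.idxmin()` returns the index label where it occurs. The sketch below uses a toy Series indexed by country name (an assumption made for readability; the notebook's series use the default integer index).

```python
import pandas as pd

# Toy stand-in (assumption): internet users per 100 people, labelled by country
internet_users = pd.Series(
    [88.7, 96.9, 8.3, 12.4, 63.3],
    index=["Aruba", "Andorra", "Afghanistan", "Angola", "Albania"],
)

min_value = internet_users.min()      # smallest value in the series
min_label = internet_users.idxmin()   # index label of that smallest value

print(min_label, min_value)
```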
12: What is the average literacy rate?
In [ ]:
# try to find the average literacy rate
Sorting
13: Sort the series in ascending order
In [ ]:
# try your code here
country_name_sorted = ...
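A sketch of `sort_values()` on a toy `country_name` Series with the first five names from the data. The method returns a new Series in ascending order by default and leaves the original untouched; each value keeps its original index label.

```python
import pandas as pd

# Toy stand-in (assumption): the first five country names from the data
country_name = pd.Series(["Aruba", "Andorra", "Afghanistan", "Angola", "Albania"])

# Ascending alphabetical order; the original series is not modified
country_name_sorted = country_name.sort_values()

print(country_name_sorted)
```

The index of the result reads 2, 4, 1, 3, 0, which is what makes it possible to reorder other series in the same way.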
14: Sort multiple series at once
In [1]:
# try your code here
literacy_rate_sorted = ...
country_name_sorted_by_literacy_rate = ...
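The idea can be sketched as follows: sort one series, then use the index of the sorted result to reorder the other. The toy values below come from the first five rows of the data. This sketch uses `.loc` for the lookup; with the default integer index, plain square brackets select the same rows.

```python
import numpy as np
import pandas as pd

# Toy stand-ins (assumption): first five rows of the data, one literacy value missing
country_name = pd.Series(["Aruba", "Andorra", "Afghanistan", "Angola", "Albania"])
literacy_rate = pd.Series(
    [97.5139617919922, np.nan, 23.8738498687744, 60.744800567627, 96.7696914672852]
)

# Sort by literacy rate; missing values are placed last by default
literacy_rate_sorted = literacy_rate.sort_values()

# Reorder the names using the index labels of the sorted series
country_name_sorted_by_literacy_rate = country_name.loc[literacy_rate_sorted.index]

print(country_name_sorted_by_literacy_rate)
```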