Intro to Pandas for Data Analysis
Practicing Series Filtering with S&P500 and Census Data
In [1]:
import pandas as pd
In [2]:
# for visualizations, don't worry about these for now
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
Datasets
Age of First Marriage
In [3]:
age_marriage = pd.read_csv("age_at_mar.csv", index_col=0).squeeze("columns")
age_marriage.head()
Out[3]:
1    32
2    25
3    24
4    26
5    32
Name: age, dtype: int64
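`pd.read_csv` returns a DataFrame even when the file holds a single data column; `.squeeze("columns")` collapses that one column into a Series. A minimal sketch of the step, using a made-up in-memory CSV as a stand-in for `age_at_mar.csv` (the three rows are illustrative, not the real data):

```python
import io

import pandas as pd

# Hypothetical stand-in for age_at_mar.csv: an unnamed index column plus "age"
csv_text = ",age\n1,32\n2,25\n3,24\n"

# Without squeeze, read_csv gives a one-column DataFrame
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# squeeze("columns") turns the single column into a Series named after it
ages = df.squeeze("columns")

print(type(df).__name__)    # DataFrame
print(type(ages).__name__)  # Series
print(ages.name)            # age
```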
In [4]:
age_marriage.shape
Out[4]:
(5534,)
In [5]:
age_marriage.info()
<class 'pandas.core.series.Series'>
Index: 5534 entries, 1 to 5534
Series name: age
Non-Null Count  Dtype
--------------  -----
5534 non-null   int64
dtypes: int64(1)
memory usage: 86.5 KB
In [6]:
fig, ax = plt.subplots(figsize=(14, 7))
sns.histplot(age_marriage, ax=ax)
Out[6]:
<Axes: xlabel='age', ylabel='Count'>
S&P500 Returns, 1990s
In [7]:
sp500 = pd.read_csv('SP500.csv', index_col=0).squeeze("columns")
sp500.head()
Out[7]:
1   -0.258891
2   -0.865031
3   -0.980414
4    0.450432
5   -1.185667
Name: dat, dtype: float64
In [8]:
sp500.shape
Out[8]:
(2780,)
In [9]:
fig, ax = plt.subplots(figsize=(14, 7))
sns.histplot(sp500, ax=ax)
Out[9]:
<Axes: xlabel='dat', ylabel='Count'>
Activities
1. Rename the series
In [10]:
age_marriage.name = "Age of First Marriage"
sp500.name = "S&P500 Returns 90s"
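Assigning to `.name` changes the Series in place. `Series.rename` with a scalar is an alternative that returns a new Series and leaves the original untouched. A small sketch on made-up values (not the real dataset):

```python
import pandas as pd

# Made-up values; the starting name mirrors the column name in the CSV
ages = pd.Series([32, 25, 24], name="age")

# rename() with a scalar sets the name on a new Series
renamed = ages.rename("Age of First Marriage")

print(ages.name)     # age
print(renamed.name)  # Age of First Marriage
```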
Basic Analysis
2. What's the maximum age at first marriage?
In [11]:
age_marriage.max()
Out[11]:
43
3. What's the median age at first marriage?
In [12]:
age_marriage.median()
Out[12]:
23.0
4. What's the minimum return from S&P500?
In [13]:
sp500.min()
Out[13]:
-7.11274461287603
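The three summary questions above can also be answered in one call. As a sketch, `Series.agg` accepts a list of reduction names and returns a Series indexed by those names; the values below are made up for illustration:

```python
import pandas as pd

# Made-up ages, not the real dataset
ages = pd.Series([16, 19, 21, 23, 23, 27, 43], name="age")

# One pass over several reductions; the result is indexed by function name
summary = ages.agg(["min", "median", "max"])
print(summary)
```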
Simple Selection and Filtering
5. How many women marry at age 21?
In [14]:
age_marriage.loc[age_marriage == 21].shape
Out[14]:
(495,)
6. How many women marry at age 39 or older?
In [15]:
age_marriage.loc[age_marriage >= 39].shape
Out[15]:
(39,)
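Reading the count off `.shape` works, but the comparison itself already produces a boolean Series, and summing it counts the `True` values directly. A minimal sketch on made-up ages:

```python
import pandas as pd

# Made-up ages, not the real dataset
ages = pd.Series([21, 19, 21, 40, 39, 25, 21])

mask = ages == 21                           # boolean Series, True where age is 21
count_via_shape = ages.loc[mask].shape[0]   # filter first, then count rows
count_via_sum = int(mask.sum())             # True counts as 1, False as 0

print(count_via_shape, count_via_sum)       # 3 3

# Same idea for "39 or older"
count_39_plus = int((ages >= 39).sum())
print(count_39_plus)                        # 2
```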
7. How many positive S&P500 returns are there?
The following visualization shows a red vertical line at 0; we're looking for everything to the right of that line:
In [16]:
ax = sns.histplot(sp500)
ax.axvline(0, color='red')
Out[16]:
<matplotlib.lines.Line2D at 0x7f068fd3e710>
In [19]:
sp500.loc[sp500 > 0].shape
Out[19]:
(1474,)
8. How many returns are less than or equal to -2?
(To the left of the red line)
In [20]:
ax = sns.histplot(sp500)
ax.axvline(-2, color='red')
Out[20]:
<matplotlib.lines.Line2D at 0x7f068d2544d0>
In [21]:
sp500.loc[sp500 <= -2].shape
Out[21]:
(63,)
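The choice between a strict and a non-strict comparison only matters when a value sits exactly on the threshold. A small sketch with made-up returns that include the boundary values -2.0 and 0.0:

```python
import pandas as pd

# Made-up returns, chosen so that some values fall exactly on a threshold
returns = pd.Series([-3.1, -2.0, -1.9, 0.0, 0.4, 2.5])

at_most_minus_2 = returns.loc[returns <= -2]  # keeps -2.0
below_minus_2 = returns.loc[returns < -2]     # drops -2.0
positive = returns.loc[returns > 0]           # drops 0.0

print(len(at_most_minus_2), len(below_minus_2), len(positive))  # 2 1 2
```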
Advanced Selection with Boolean Operators
9. Select all women who married below age 20 or above age 39
The segments depicted below:
In [22]:
fig, ax = plt.subplots(figsize=(14, 7))
sns.histplot(age_marriage, ax=ax)
ax.add_patch(Rectangle((10, 0), 9, 450, alpha=.3, color='red'))
ax.add_patch(Rectangle((39, 0), 5, 450, alpha=.3, color='red'))
Out[22]:
<matplotlib.patches.Rectangle at 0x7f068d1f5e10>
In [25]:
age_20_39 = age_marriage.loc[(age_marriage < 20) | (age_marriage > 39)]
print(age_20_39)
14      40
35      19
74      19
76      17
84      19
        ..
5517    16
5520    18
5527    19
5531    19
5534    19
Name: Age of First Marriage, Length: 1206, dtype: int64
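The `|` operator combines two boolean Series element by element, and each comparison needs its own parentheses because `|` binds more tightly than `<` and `>`. As a sketch on made-up ages, the same selection can also be written as the negation of `Series.between`, which is inclusive on both ends by default:

```python
import pandas as pd

# Made-up ages that include the boundary values 20 and 39
ages = pd.Series([16, 19, 20, 25, 39, 40, 43])

# Element-wise OR of two conditions
outside = ages.loc[(ages < 20) | (ages > 39)]

# Equivalent: everything NOT in the closed interval [20, 39]
outside_alt = ages.loc[~ages.between(20, 39)]

print(outside.tolist())             # [16, 19, 40, 43]
print(outside.equals(outside_alt))  # True
```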
10. Select all women whose age is even and who are older than 30
In [27]:
age_30_even = age_marriage.loc[(age_marriage > 30) & (age_marriage % 2 == 0)]
print(age_30_even)
1       32
5       32
14      40
24      34
55      32
        ..
5477    32
5488    40
5510    32
5516    34
5528    34
Name: Age of First Marriage, Length: 172, dtype: int64
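When a filter combines several conditions, giving each mask a name can make the expression easier to read and removes the need for nested parentheses. A minimal sketch on made-up ages; the variable names `older_than_30` and `is_even` are illustrative only:

```python
import pandas as pd

# Made-up ages, not the real dataset
ages = pd.Series([28, 30, 31, 32, 34, 40, 41])

older_than_30 = ages > 30    # strict: 30 itself is excluded
is_even = ages % 2 == 0      # remainder 0 when divided by 2

# Element-wise AND of the two named masks
selected = ages.loc[older_than_30 & is_even]

print(selected.tolist())     # [32, 34, 40]
```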
11. Select the S&P500 returns between 1.5 and 3
The ones depicted below:
In [29]:
fig, ax = plt.subplots(figsize=(14, 7))
sns.histplot(sp500, ax=ax)
# Rectangle(xy, width, height): start at x=1.5 with width 1.5 to cover 1.5 to 3
ax.add_patch(Rectangle((1.5, 0), 1.5, 250, alpha=.3, color='red'))
Out[29]:
<matplotlib.patches.Rectangle at 0x7f068cb11510>
In [30]:
sp_15_to_3 = sp500.loc[(sp500 > 1.5) & (sp500 < 3)]
print(sp_15_to_3)
21      1.871048
91      2.351291
102     1.697397
188     1.673790
189     2.863366
          ...
2715    2.199155
2738    2.174014
2748    2.318141
2765    1.941508
2775    2.409438
Name: S&P500 Returns 90s, Length: 123, dtype: float64
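The filter above uses strict bounds, so values exactly equal to 1.5 or 3 are excluded. As a sketch, `Series.between` expresses the same range in one call; it is inclusive on both ends by default, and `inclusive="neither"` (assuming pandas 1.3 or later) matches the strict version. The returns below are made up:

```python
import pandas as pd

# Made-up returns that include the boundary values 1.5 and 3.0
returns = pd.Series([1.2, 1.5, 1.871, 2.35, 3.0, 3.4])

# Strict bounds, as in the notebook cell
strict = returns.loc[(returns > 1.5) & (returns < 3)]

# Same selection with between(); "neither" excludes both endpoints
strict_alt = returns.loc[returns.between(1.5, 3, inclusive="neither")]

# Default between() keeps the endpoints
closed = returns.loc[returns.between(1.5, 3)]

print(strict.tolist())   # [1.871, 2.35]
print(closed.tolist())   # [1.5, 1.871, 2.35, 3.0]
```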