Statement of Completion#b5a037c1
Intro to Pandas for Data Analysis
easy
DataFrames practice: working with English Words
In [1]:
import pandas as pd
In [2]:
df = pd.read_csv('words.csv', index_col='Word')
In [3]:
df.head()
Out[3]:
Word | Char Count | Value |
---|---|---|
aa | 2 | 2 |
aah | 3 | 10 |
aahed | 5 | 19 |
aahing | 6 | 40 |
aahs | 4 | 29 |
Activities¶
How many elements does this dataframe have?¶
In [5]:
df.shape
Out[5]:
(172821, 2)
In [6]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 172821 entries, aa to zyzzyvas
Data columns (total 2 columns):
 #   Column      Non-Null Count   Dtype
---  ------      --------------   -----
 0   Char Count  172821 non-null  int64
 1   Value       172821 non-null  int64
dtypes: int64(2)
memory usage: 4.0+ MB
In [7]:
df.count()
Out[7]:
Char Count    172821
Value         172821
dtype: int64
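The calls above all report the row count. As a minimal sketch of how `shape`, `len` and `size` differ, the block below uses the five rows shown by `df.head()` as a stand-in for `words.csv`; the name `sample` is an assumption made for illustration.

```python
import pandas as pd

# Stand-in for words.csv: the five rows shown by df.head() above.
sample = pd.DataFrame(
    {"Char Count": [2, 3, 5, 6, 4], "Value": [2, 10, 19, 40, 29]},
    index=pd.Index(["aa", "aah", "aahed", "aahing", "aahs"], name="Word"),
)

n_rows, n_cols = sample.shape   # (rows, columns)
n_cells = sample.size           # rows * columns
print(n_rows, n_cols, n_cells)  # 5 2 10
print(len(sample))              # 5, same as shape[0]
```

On the full dataset `df.shape[0]` is 172821. Whether "elements" means rows or individual cells depends on how the question is read, so it may be worth checking both.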
What is the value of the word microspectrophotometries?¶
In [12]:
df[df.index=='microspectrophotometries']
Out[12]:
Word | Char Count | Value |
---|---|---|
microspectrophotometries | 24 | 317 |
In [14]:
df.loc['microspectrophotometries', 'Value']
Out[14]:
317
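The boolean mask in In [12] returns a one-row DataFrame. When only the number is needed, a label lookup returns a scalar instead. A minimal sketch, assuming a three-row stand-in named `sample` built from rows shown in this notebook:

```python
import pandas as pd

# Stand-in for words.csv, using rows that appear in the outputs above.
sample = pd.DataFrame(
    {"Char Count": [2, 3, 24], "Value": [2, 10, 317]},
    index=pd.Index(["aa", "aah", "microspectrophotometries"], name="Word"),
)

word = "microspectrophotometries"
value = sample.loc[word, "Value"]      # row label first, then column label
same_value = sample.at[word, "Value"]  # .at is the scalar-only accessor
print(value, same_value)  # 317 317
```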
In [16]:
df.index.max()
Out[16]:
'zyzzyvas'
In [17]:
df['Value'].max()
Out[17]:
319
In [18]:
df.max()
Out[18]:
Char Count     28
Value         319
dtype: int64
In [19]:
df.describe()
Out[19]:
 | Char Count | Value |
---|---|---|
count | 172821.000000 | 172821.000000 |
mean | 9.087628 | 107.754179 |
std | 2.818285 | 39.317452 |
min | 2.000000 | 2.000000 |
25% | 7.000000 | 80.000000 |
50% | 9.000000 | 103.000000 |
75% | 11.000000 | 131.000000 |
max | 28.000000 | 319.000000 |
In [21]:
df.loc[['enfold',
'pinfish',
'superheterodyne',
'microbrew',
'glowing']]
Out[21]:
Word | Char Count | Value |
---|---|---|
enfold | 6 | 56 |
pinfish | 7 | 81 |
superheterodyne | 15 | 198 |
microbrew | 9 | 106 |
glowing | 7 | 87 |
What is the highest possible value of a word?¶
In [50]:
df['Value'].max()
Out[50]:
319
Which of the following words have a Char Count of 15?¶
In [20]:
df[df['Char Count']==15]
Out[20]:
Word | Char Count | Value |
---|---|---|
absorbabilities | 15 | 143 |
abstractionisms | 15 | 182 |
abstractionists | 15 | 189 |
acanthocephalan | 15 | 122 |
acceptabilities | 15 | 134 |
... | ... | ... |
worthlessnesses | 15 | 220 |
wrongheadedness | 15 | 161 |
xerographically | 15 | 174 |
xeroradiography | 15 | 184 |
zoogeographical | 15 | 158 |
3192 rows × 2 columns
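Scanning 3192 rows by eye is impractical, so it may be easier to filter only the candidate words. The sketch below assumes the answer options are the five words looked up in In [21]; that is a guess, since the option list itself is not part of this notebook.

```python
import pandas as pd

# Rows copied from Out[21]; assumed (not confirmed) to be the answer options.
sample = pd.DataFrame(
    {"Char Count": [6, 7, 15, 9, 7], "Value": [56, 81, 198, 106, 87]},
    index=pd.Index(
        ["enfold", "pinfish", "superheterodyne", "microbrew", "glowing"],
        name="Word",
    ),
)

candidates = ["enfold", "pinfish", "superheterodyne", "microbrew", "glowing"]

subset = sample.loc[candidates]
matches = subset.index[subset["Char Count"] == 15].tolist()
print(matches)  # ['superheterodyne']
```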
What is the highest possible length of a word?¶
In [22]:
len(max(df.index, key=len))
Out[22]:
28
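The longest word length can be read either from the index itself or from the `Char Count` column. A minimal sketch on a four-row stand-in (`sample` is an assumed name; the rows come from outputs in this notebook, so the maximum here is 24 rather than the 28 of the full data):

```python
import pandas as pd

# Stand-in for words.csv, using rows that appear in this notebook.
sample = pd.DataFrame(
    {"Char Count": [2, 3, 24, 23], "Value": [2, 10, 317, 319]},
    index=pd.Index(
        ["aa", "aah", "microspectrophotometries", "reinstitutionalizations"],
        name="Word",
    ),
)

lengths = sample.index.str.len()              # length of every word in the index
longest_len = int(lengths.max())
longest_word = sample["Char Count"].idxmax()  # label of the first maximum
print(longest_len, longest_word)  # 24 microspectrophotometries
```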
What is the word with the value of 319?¶
In [25]:
df[df['Value']==319]
Out[25]:
Word | Char Count | Value |
---|---|---|
reinstitutionalizations | 23 | 319 |
What is the most common value?¶
In [30]:
df['Value'].groupby(df['Value']).count().sort_values()
Out[30]:
Value
319       1
317       1
309       1
291       1
2         1
       ...
92     1902
99     1907
95     1915
100    1921
93     1965
Name: Value, Length: 303, dtype: int64
What is the shortest word with value 274?¶
In [ ]:
df.mode()
In [41]:
df[df['Value']==274].sort_values(by='Char Count')
Out[41]:
Word | Char Count | Value |
---|---|---|
overprotectivenesses | 20 | 274 |
countercountermeasure | 21 | 274 |
psychophysiologically | 21 | 274 |
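Sorting works, but the shortest word can also be picked out directly with `idxmin`. A minimal sketch, assuming a stand-in named `sample` that holds the three Value-274 rows above plus one other row from this notebook:

```python
import pandas as pd

# Stand-in for words.csv: the three Value-274 rows from Out[41] plus one other row.
sample = pd.DataFrame(
    {"Char Count": [21, 20, 21, 23], "Value": [274, 274, 274, 319]},
    index=pd.Index(
        ["countercountermeasure", "overprotectivenesses",
         "psychophysiologically", "reinstitutionalizations"],
        name="Word",
    ),
)

with_274 = sample[sample["Value"] == 274]
shortest_word = with_274["Char Count"].idxmin()  # index label of the minimum
print(shortest_word)  # overprotectivenesses
```

If several words tied for the minimum, `idxmin` would return only the first of them, so the sorted view remains useful for spotting ties.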
In [40]:
df['Value'].value_counts()
Out[40]:
Value
93     1965
100    1921
95     1915
99     1907
92     1902
       ...
317       1
304       1
300       1
319       1
278       1
Name: count, Length: 303, dtype: int64
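`value_counts()` sorts by count in descending order, so the most common value is the first index label. A minimal sketch on toy numbers (the series `values` is made up for illustration and is not taken from `words.csv`):

```python
import pandas as pd

# Toy values for illustration only; not taken from words.csv.
values = pd.Series([93, 100, 93, 95, 100, 93], name="Value")

counts = values.value_counts()   # sorted by count, descending
most_common = counts.idxmax()    # value with the highest count
modes = values.mode().tolist()   # every value tied for most common
print(most_common, counts.max(), modes)  # 93 3 [93]
```

On the full data, Out[40] shows 93 as the most common value with 1965 words.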
Create a column Ratio which represents the 'Value Ratio' of a word¶
In [ ]:
df['Ratio']=df['Value']/df['Char Count']
In [55]:
df.sort_values(by='Ratio')
Out[55]:
Word | Char Count | Value | Ratio |
---|---|---|---|
aa | 2 | 2 | 1.000000 |
aba | 3 | 4 | 1.333333 |
baa | 3 | 4 | 1.333333 |
ab | 2 | 3 | 1.500000 |
abba | 4 | 6 | 1.500000 |
... | ... | ... | ... |
pyx | 3 | 65 | 21.666667 |
xyst | 4 | 88 | 22.000000 |
wry | 3 | 66 | 22.000000 |
muzzy | 5 | 111 | 22.200000 |
xu | 2 | 45 | 22.500000 |
172821 rows × 3 columns
What is the maximum value of Ratio?¶
In [ ]:
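This cell was left empty. One way to get both the maximum Ratio and the word that has it is `max()` together with `idxmax()`. A minimal sketch, assuming a stand-in named `sample` built from rows shown in Out[55]:

```python
import pandas as pd

# Stand-in for words.csv: the first row and the last four rows of Out[55].
sample = pd.DataFrame(
    {"Char Count": [2, 3, 3, 5, 2], "Value": [2, 65, 66, 111, 45]},
    index=pd.Index(["aa", "pyx", "wry", "muzzy", "xu"], name="Word"),
)
sample["Ratio"] = sample["Value"] / sample["Char Count"]

max_ratio = sample["Ratio"].max()
top_word = sample["Ratio"].idxmax()  # index label of the maximum
print(max_ratio, top_word)  # 22.5 xu
```

This agrees with the ascending sort in Out[55], where xu is the last row with a Ratio of 22.5.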
What word is the one with the highest Ratio?¶
In [61]:
df[(df['Char Count']==17)&(df['Value']==260)]
Out[61]:
Word | Char Count | Value | Ratio |
---|---|---|---|
hydroxytryptamine | 17 | 260 | 15.294118 |
How many words have a Ratio of 10?¶
In [ ]:
What is the maximum Value of all the words with a Ratio of 10?¶
In [ ]:
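The two Ratio-of-10 questions were left unanswered in this notebook. No row with that ratio appears in the outputs, so the sketch below uses made-up placeholder rows (`w1` to `w4`, not real entries of `words.csv`) purely to show the filtering pattern.

```python
import pandas as pd

# Made-up rows for illustration only; they are not entries of words.csv.
toy = pd.DataFrame(
    {"Char Count": [4, 5, 3, 6], "Value": [40, 50, 31, 54]},
    index=pd.Index(["w1", "w2", "w3", "w4"], name="Word"),
)
toy["Ratio"] = toy["Value"] / toy["Char Count"]

ratio_10 = toy[toy["Ratio"] == 10]        # rows whose ratio is exactly 10
how_many = len(ratio_10)                  # how many words have Ratio 10
max_value = int(ratio_10["Value"].max())  # highest Value among them
print(how_many, max_value)  # 2 50
```

Comparing a float column with `==` is safe here because a quotient of exactly 10 is representable. An integer-only alternative would be `toy["Value"] == 10 * toy["Char Count"]`.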
Of those words with a Value of 260, what is the lowest Char Count found?¶
In [43]:
mean_char = df['Char Count'].mean()
mean_char
Out[43]:
9.087628239623657
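For the question about words with a Value of 260, one possible approach is to filter on Value and then take the minimum Char Count. The only Value-260 row shown anywhere in this notebook is hydroxytryptamine, so the stand-in `sample` below (an assumed name) contains just that one match; on the full data the filter may return more rows.

```python
import pandas as pd

# Stand-in for words.csv, using rows that appear in this notebook.
sample = pd.DataFrame(
    {"Char Count": [15, 17, 23], "Value": [198, 260, 319]},
    index=pd.Index(
        ["superheterodyne", "hydroxytryptamine", "reinstitutionalizations"],
        name="Word",
    ),
)

with_260 = sample.loc[sample["Value"] == 260, "Char Count"]
lowest_count = int(with_260.min())
lowest_word = with_260.idxmin()  # index label of the minimum
print(lowest_count, lowest_word)  # 17 hydroxytryptamine
```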
Based on the previous task, what word is it?¶
In [48]:
df.query("`Char Count`>@mean_char")
Out[48]:
Word | Char Count | Value |
---|---|---|
aardwolves | 10 | 120 |
abacterial | 10 | 72 |
abandoners | 10 | 93 |
abandoning | 10 | 81 |
abandonment | 11 | 103 |
... | ... | ... |
zygomorphies | 12 | 176 |
zygomorphy | 10 | 168 |
zygosities | 10 | 154 |
zygospores | 10 | 165 |
zymologies | 10 | 146 |
67582 rows × 2 columns
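In the `query` string above, backticks quote a column name that contains a space and `@` refers to a Python variable. A minimal sketch showing that it selects the same rows as a boolean mask, assuming a four-row stand-in named `sample` built from rows in this notebook:

```python
import pandas as pd

# Stand-in for words.csv, using rows that appear in this notebook.
sample = pd.DataFrame(
    {"Char Count": [2, 3, 10, 11], "Value": [2, 10, 120, 103]},
    index=pd.Index(["aa", "aah", "aardwolves", "abandonment"], name="Word"),
)

mean_char = sample["Char Count"].mean()  # 6.5 for this sample

# Backticks quote a column name containing a space; @ reads a Python variable.
by_query = sample.query("`Char Count` > @mean_char")
by_mask = sample[sample["Char Count"] > mean_char]
print(by_query.index.tolist())  # ['aardwolves', 'abandonment']
```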