Statement of Completion#62748e94
Intro to Pandas for Data Analysis
easy
DataFrames practice: working with English Words
Project.ipynb
In [1]:
import pandas as pd
In [4]:
df = pd.read_csv('words.csv', index_col='Word')
In [3]:
df.head()
Out[3]:
Word | Char Count | Value |
---|---|---|
aa | 2 | 2 |
aah | 3 | 10 |
aahed | 5 | 19 |
aahing | 6 | 40 |
aahs | 4 | 29 |
Activities¶
How many elements does this dataframe have?¶
In [10]:
df.shape
Out[10]:
(172821, 2)
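A short sketch of how shape relates to the other size attributes, since "elements" could mean rows or cells. The small frame below is a stand-in assumed for illustration, built from the five rows shown by df.head(); it is not the full words.csv.

```python
import pandas as pd

# Toy stand-in for words.csv (assumption: only the rows shown by df.head()).
df = pd.DataFrame(
    {"Char Count": [2, 3, 5, 6, 4], "Value": [2, 10, 19, 40, 29]},
    index=pd.Index(["aa", "aah", "aahed", "aahing", "aahs"], name="Word"),
)

n_rows, n_cols = df.shape  # (rows, columns)
n_words = len(df)          # rows only, i.e. one per word
n_cells = df.size          # rows * columns

print(n_rows, n_cols, n_words, n_cells)
```

On the real dataframe, the first item of df.shape (172821) is the number of words.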
What is the value of the word microspectrophotometries?¶
In [22]:
df.loc['microspectrophotometries','Value']
Out[22]:
317
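For a single cell, .at is an alternative to .loc, and a membership test on the index avoids a KeyError for unknown words. A minimal sketch on a toy frame assumed from the df.head() rows:

```python
import pandas as pd

# Toy stand-in for words.csv (assumption: only the rows shown by df.head()).
df = pd.DataFrame(
    {"Char Count": [2, 3, 5, 6, 4], "Value": [2, 10, 19, 40, 29]},
    index=pd.Index(["aa", "aah", "aahed", "aahing", "aahs"], name="Word"),
)

via_loc = df.loc["aahing", "Value"]  # label-based lookup: row label, column label
via_at = df.at["aahing", "Value"]    # scalar-only accessor, same result

has_word = "aahing" in df.index      # check before looking up
missing = "zzz" in df.index          # "zzz" is a made-up label for the demo

print(via_loc, via_at, has_word, missing)
```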
In [23]:
df.max()
Out[23]:
Char Count     28
Value         319
dtype: int64
In [24]:
df.describe()
Out[24]:
 | Char Count | Value |
---|---|---|
count | 172821.000000 | 172821.000000 |
mean | 9.087628 | 107.754179 |
std | 2.818285 | 39.317452 |
min | 2.000000 | 2.000000 |
25% | 7.000000 | 80.000000 |
50% | 9.000000 | 103.000000 |
75% | 11.000000 | 131.000000 |
max | 28.000000 | 319.000000 |
In [25]:
df['Value'].max()
Out[25]:
319
What is the highest possible value of a word?¶
In [26]:
df['Value'].max()
Out[26]:
319
Which of the following words have a Char Count of 15?¶
In [30]:
df.loc[[
"glowing",
"pinfish",
"superheterodyne",
"enfold",
"microbrew"
],'Char Count']
Out[30]:
Word
glowing             7
pinfish             7
superheterodyne    15
enfold              6
microbrew           9
Name: Char Count, dtype: int64
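Instead of reading the answer off the output, the selection can be filtered directly. A sketch, assuming a Series built from the five candidate words (their lengths match the Char Count output above); with the real dataframe the Series would come from df.loc[candidates, 'Char Count'].

```python
import pandas as pd

candidates = ["glowing", "pinfish", "superheterodyne", "enfold", "microbrew"]

# Stand-in for df.loc[candidates, 'Char Count'] (assumption: Char Count == len(word)).
char_count = pd.Series(
    [len(word) for word in candidates],
    index=pd.Index(candidates, name="Word"),
    name="Char Count",
)

# Keep only the labels whose count is exactly 15.
matches = char_count[char_count == 15].index.tolist()
print(matches)
```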
What is the highest possible length of a word?¶
In [29]:
df.describe()
Out[29]:
 | Char Count | Value |
---|---|---|
count | 172821.000000 | 172821.000000 |
mean | 9.087628 | 107.754179 |
std | 2.818285 | 39.317452 |
min | 2.000000 | 2.000000 |
25% | 7.000000 | 80.000000 |
50% | 9.000000 | 103.000000 |
75% | 11.000000 | 131.000000 |
max | 28.000000 | 319.000000 |
What is the word with the value of 319?¶
In [33]:
df.loc[df['Value']==319]
Out[33]:
Word | Char Count | Value |
---|---|---|
reinstitutionalizations | 23 | 319 |
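When only the label of the maximum is needed, idxmax returns it without hard-coding 319. A sketch on a toy frame assumed from rows that appear in this notebook; note that idxmax returns only the first label on ties, whereas the boolean mask above returns every matching row.

```python
import pandas as pd

# Toy stand-in (assumption: four rows copied from outputs in this notebook).
df = pd.DataFrame(
    {"Char Count": [2, 2, 6, 23], "Value": [2, 45, 40, 319]},
    index=pd.Index(["aa", "xu", "aahing", "reinstitutionalizations"], name="Word"),
)

top_value = df["Value"].max()    # the maximum itself
top_word = df["Value"].idxmax()  # index label of the first row holding it

print(top_word, top_value)
```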
What is the most common value?¶
In [37]:
df['Value'].mode()
Out[37]:
0    93
Name: Value, dtype: int64
In [39]:
df.loc[df['Value']==93]
Out[39]:
Word | Char Count | Value |
---|---|---|
abandoners | 10 | 93 |
ablations | 9 | 93 |
aboiteaus | 9 | 93 |
abridgment | 10 | 93 |
abstracted | 10 | 93 |
... | ... | ... |
zinkified | 9 | 93 |
zonule | 6 | 93 |
zoogleal | 8 | 93 |
zorilla | 7 | 93 |
zucchini | 8 | 93 |
1965 rows × 2 columns
In [41]:
df['Value'].value_counts().head()
Out[41]:
Value
93     1965
100    1921
95     1915
99     1907
92     1902
Name: count, dtype: int64
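mode() returns a Series rather than a scalar because several values can tie for most common. A small sketch with a deliberately tied toy frame (rows taken from outputs in this notebook; the tie itself is an artifact of the toy data, not of words.csv):

```python
import pandas as pd

# Toy stand-in (assumption): two words with Value 20 and two with Value 30.
df = pd.DataFrame(
    {"Char Count": [2, 2, 3, 3, 2], "Value": [20, 20, 30, 30, 45]},
    index=pd.Index(["as", "oe", "col", "bis", "xu"], name="Word"),
)

modes = df["Value"].mode()           # all tied modes, sorted ascending
counts = df["Value"].value_counts()  # frequency of every distinct Value

print(modes.tolist())
print(counts)
```

On the real data there is a single mode (93), so df['Value'].mode()[0] would give it as a scalar.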
What is the shortest word with value 274?¶
In [43]:
df.loc[df['Value']==274].sort_values(by='Char Count')
Out[43]:
Word | Char Count | Value |
---|---|---|
overprotectivenesses | 20 | 274 |
countercountermeasure | 21 | 274 |
psychophysiologically | 21 | 274 |
In [44]:
df.loc[df['Value']==274,'Char Count'].min()
Out[44]:
20
In [46]:
df.loc[
(df['Value']==274)
&
(df['Char Count']==20)
]
Out[46]:
Word | Char Count | Value |
---|---|---|
overprotectivenesses | 20 | 274 |
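The three cells above can be collapsed into one lookup. A sketch, assuming a toy frame with the three Value 274 rows shown above plus one unrelated row; idxmin gives the first shortest word, and nsmallest with keep="all" would also surface ties.

```python
import pandas as pd

# Toy stand-in (assumption: rows copied from outputs in this notebook).
df = pd.DataFrame(
    {"Char Count": [20, 21, 21, 23], "Value": [274, 274, 274, 319]},
    index=pd.Index(
        [
            "overprotectivenesses",
            "countercountermeasure",
            "psychophysiologically",
            "reinstitutionalizations",
        ],
        name="Word",
    ),
)

# Char Count of every word whose Value is 274.
lengths = df.loc[df["Value"] == 274, "Char Count"]

shortest_word = lengths.idxmin()                 # label of the first minimum
shortest_all = lengths.nsmallest(1, keep="all")  # every word tied for shortest

print(shortest_word)
print(shortest_all)
```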
Create a column Ratio which represents the 'Value Ratio' of a word¶
In [48]:
df['Ratio']= df['Value']/df['Char Count']
df['Ratio']
Out[48]:
Word
aa           1.000000
aah          3.333333
aahed        3.800000
aahing       6.666667
aahs         7.250000
               ...
zymotic     15.857143
zymurgies   15.888889
zymurgy     19.285714
zyzzyva     21.571429
zyzzyvas    21.250000
Name: Ratio, Length: 172821, dtype: float64
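The sample rows are consistent with Value being the sum of each letter's position in the alphabet (a=1 ... z=26). That is an inference from the outputs, not something the notebook states. Under that assumption, the columns and the Ratio can be rebuilt from the words alone; letter_value is a hypothetical helper written for this sketch.

```python
import pandas as pd


def letter_value(word: str) -> int:
    """Sum of alphabet positions (a=1 ... z=26); assumed scoring rule."""
    return sum(ord(ch) - ord("a") + 1 for ch in word)


words = ["aa", "aah", "aahed", "aahing", "aahs"]
df = pd.DataFrame(
    {
        "Char Count": [len(w) for w in words],
        "Value": [letter_value(w) for w in words],
    },
    index=pd.Index(words, name="Word"),
)

# Element-wise division: one Ratio per word.
df["Ratio"] = df["Value"] / df["Char Count"]
print(df)
```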
What is the maximum value of Ratio?¶
In [50]:
df['Ratio'].max()
Out[50]:
22.5
What word is the one with the highest Ratio?¶
In [51]:
df.loc[df['Ratio']==df['Ratio'].max()]
Out[51]:
Word | Char Count | Value | Ratio |
---|---|---|---|
xu | 2 | 45 | 22.5 |
How many words have a Ratio of 10?¶
In [54]:
df.loc[df['Ratio']==10].shape
Out[54]:
(2604, 3)
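Ratio is a float column, so an exact == comparison relies on the division landing exactly on 10.0. A sketch of three ways to count matches on a toy frame (rows assumed from outputs in this notebook): the exact mask, a tolerance-based check with numpy.isclose, and an integer-only comparison that avoids floats altogether.

```python
import numpy as np
import pandas as pd

# Toy stand-in (assumption: rows copied from outputs in this notebook).
df = pd.DataFrame(
    {"Char Count": [5, 6, 2, 2, 3], "Value": [50, 60, 20, 45, 10]},
    index=pd.Index(["zamia", "zebecs", "as", "xu", "aah"], name="Word"),
)
df["Ratio"] = df["Value"] / df["Char Count"]

exact = (df["Ratio"] == 10).sum()                      # True counts as 1
close = np.isclose(df["Ratio"], 10).sum()              # tolerant float comparison
by_ints = (df["Value"] == 10 * df["Char Count"]).sum() # no float division involved

print(exact, close, by_ints)
```

Summing the boolean mask gives the count directly, without building the filtered dataframe and reading .shape.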
What is the maximum Value of all the words with a Ratio of 10?¶
In [56]:
# Filter to the rows whose Ratio is 10, then take the maximum of their Value column
df.loc[df['Ratio']==10,'Value'].max()
Out[56]:
240
In [59]:
df.loc[df['Ratio']==10].sort_values(by='Value',ascending=False)
Out[59]:
Word | Char Count | Value | Ratio |
---|---|---|---|
electrocardiographically | 24 | 240 | 10.0 |
electroencephalographies | 24 | 240 | 10.0 |
electroencephalographer | 23 | 230 | 10.0 |
electrodesiccation | 18 | 180 | 10.0 |
phonocardiographic | 18 | 180 | 10.0 |
... | ... | ... | ... |
col | 3 | 30 | 10.0 |
bis | 3 | 30 | 10.0 |
sib | 3 | 30 | 10.0 |
as | 2 | 20 | 10.0 |
oe | 2 | 20 | 10.0 |
2604 rows × 3 columns
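The sorted view shows two words tied at the top. A sketch of pulling out just the tied rows with nlargest and keep="all", on a toy frame assumed from the rows shown above:

```python
import pandas as pd

# Toy stand-in (assumption: rows copied from outputs in this notebook).
df = pd.DataFrame(
    {"Char Count": [24, 24, 23, 2, 2], "Value": [240, 240, 230, 20, 45]},
    index=pd.Index(
        [
            "electrocardiographically",
            "electroencephalographies",
            "electroencephalographer",
            "as",
            "xu",
        ],
        name="Word",
    ),
)
df["Ratio"] = df["Value"] / df["Char Count"]

ratio_10 = df.loc[df["Ratio"] == 10]             # only the Ratio-10 words
max_value = ratio_10["Value"].max()              # the answer as a scalar
top = ratio_10.nlargest(1, "Value", keep="all")  # every row tied for that maximum

print(max_value)
print(top)
```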
Of those words with a Value of 260, what is the lowest Char Count found?¶
In [61]:
df.loc[df['Value']==260].sort_values(by='Char Count')
Out[61]:
Word | Char Count | Value | Ratio |
---|---|---|---|
hydroxytryptamine | 17 | 260 | 15.294118 |
neuropsychologists | 18 | 260 | 14.444444 |
psychophysiologist | 18 | 260 | 14.444444 |
revolutionarinesses | 19 | 260 | 13.684211 |
countermobilizations | 20 | 260 | 13.000000 |
underrepresentations | 20 | 260 | 13.000000 |
Based on the previous task, what word is it?¶
In [62]:
# Look the word up instead of typing it as a bare name, which raises NameError
df.loc[df['Value']==260,'Char Count'].idxmin()
Out[62]:
'hydroxytryptamine'
In [64]:
mean_char_count = df['Char Count'].mean()
In [65]:
mean_char_count
Out[65]:
9.087628239623657
In [71]:
df[df['Char Count']>df['Char Count'].mean()].sort_values(by='Char Count',ascending = False)
Out[71]:
Word | Char Count | Value | Ratio |
---|---|---|---|
ethylenediaminetetraacetates | 28 | 287 | 10.250000 |
electroencephalographically | 27 | 269 | 9.962963 |
ethylenediaminetetraacetate | 27 | 268 | 9.925926 |
phosphatidylethanolamines | 25 | 289 | 11.560000 |
immunoelectrophoretically | 25 | 307 | 12.280000 |
... | ... | ... | ... |
zoologists | 10 | 157 | 15.700000 |
zoometries | 10 | 145 | 14.500000 |
zoomorphic | 10 | 138 | 13.800000 |
zoophilies | 10 | 134 | 13.400000 |
zoophilous | 10 | 156 | 15.600000 |
67582 rows × 3 columns
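The same filter can reuse the stored mean rather than recomputing it inside the mask. A sketch on a toy frame assumed from the df.head() rows, where the mean Char Count happens to be exactly 4:

```python
import pandas as pd

# Toy stand-in for words.csv (assumption: only the rows shown by df.head()).
df = pd.DataFrame(
    {"Char Count": [2, 3, 5, 6, 4], "Value": [2, 10, 19, 40, 29]},
    index=pd.Index(["aa", "aah", "aahed", "aahing", "aahs"], name="Word"),
)

mean_char_count = df["Char Count"].mean()

# Words strictly longer than average, longest first.
longer = df[df["Char Count"] > mean_char_count].sort_values(
    by="Char Count", ascending=False
)
n_longer = len(longer)

print(mean_char_count, n_longer)
print(longer)
```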