Statement of Completion#478cc12c
Intro to Pandas for Data Analysis
easy
DataFrames practice: working with English Words
Resolution
Activities
In [61]:
import pandas as pd
In [62]:
df = pd.read_csv('words.csv', index_col='Word')
In [63]:
df.head()
Out[63]:
Word | Char Count | Value |
---|---|---|
aa | 2 | 2 |
aah | 3 | 10 |
aahed | 5 | 19 |
aahing | 6 | 40 |
aahs | 4 | 29 |
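The notebook never states how Value is computed. The five rows above are consistent with summing each letter's position in the alphabet (a=1, ..., z=26), so the sketch below assumes that rule. The helper name `word_value` is hypothetical and is not part of the dataset or of pandas.

```python
def word_value(word: str) -> int:
    """Sum of 1-based alphabet positions (a=1 ... z=26).

    Assumed scoring rule: inferred from the sample rows, not documented
    by the dataset itself.
    """
    return sum(ord(ch) - ord("a") + 1 for ch in word.lower() if "a" <= ch <= "z")


# The five words shown by df.head()
sample_words = ["aa", "aah", "aahed", "aahing", "aahs"]
values = {w: word_value(w) for w in sample_words}
print(values)
```

If the rule holds for the whole file, Char Count would simply be `len(word)`.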
Activities
How many elements does this DataFrame have?
In [64]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 172821 entries, aa to zyzzyvas
Data columns (total 2 columns):
 #   Column      Non-Null Count   Dtype
---  ------      --------------   -----
 0   Char Count  172821 non-null  int64
 1   Value       172821 non-null  int64
dtypes: int64(2)
memory usage: 4.0+ MB
In [65]:
df.shape
Out[65]:
(172821, 2)
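"Elements" is read here as rows (words). As a small sketch of the different counts pandas exposes, the block below rebuilds the five rows from df.head() instead of reading words.csv, so the name `sample` is an illustration only.

```python
import pandas as pd

# Five rows copied from the df.head() output; stands in for the full file.
sample = pd.DataFrame(
    {"Char Count": [2, 3, 5, 6, 4], "Value": [2, 10, 19, 40, 29]},
    index=pd.Index(["aa", "aah", "aahed", "aahing", "aahs"], name="Word"),
)

n_rows = len(sample)             # number of rows (words)
n_rows_alt = sample.shape[0]     # same number, taken from the shape tuple
n_cells = sample.size            # rows * columns, i.e. individual cells

print(n_rows, n_rows_alt, n_cells)
```

On the full DataFrame, `len(df)` would match the first entry of `df.shape`.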
What is the value of the word microspectrophotometries?
In [66]:
df.loc["microspectrophotometries"]
Out[66]:
Char Count     24
Value         317
Name: microspectrophotometries, dtype: int64
In [67]:
df.loc["microspectrophotometries", "Value"]  # df.loc[row_label, column_label]
Out[67]:
317
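When only one cell is needed, `DataFrame.at` is a label-based alternative to `.loc`. The sketch below uses two rows copied from this notebook's outputs. The variable `sample` is an assumption standing in for `df`.

```python
import pandas as pd

# Two rows copied from outputs shown in this notebook.
sample = pd.DataFrame(
    {"Char Count": [24, 23], "Value": [317, 319]},
    index=pd.Index(
        ["microspectrophotometries", "reinstitutionalizations"], name="Word"
    ),
)

# .at accesses exactly one cell by row label and column label.
value = sample.at["microspectrophotometries", "Value"]
print(value)
```

`.loc` accepts lists, slices and masks as well, while `.at` only accepts a single row label and a single column label.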
What is the highest possible value of a word?
In [68]:
df['Value'].max()
Out[68]:
319
In [69]:
df.describe()
Out[69]:
| | Char Count | Value |
---|---|---|
count | 172821.000000 | 172821.000000 |
mean | 9.087628 | 107.754179 |
std | 2.818285 | 39.317452 |
min | 2.000000 | 2.000000 |
25% | 7.000000 | 80.000000 |
50% | 9.000000 | 103.000000 |
75% | 11.000000 | 131.000000 |
max | 28.000000 | 319.000000 |
Which of the following words have a Char Count of 15?
In [70]:
df[df['Char Count'] == 15]
Out[70]:
Word | Char Count | Value |
---|---|---|
absorbabilities | 15 | 143 |
abstractionisms | 15 | 182 |
abstractionists | 15 | 189 |
acanthocephalan | 15 | 122 |
acceptabilities | 15 | 134 |
... | ... | ... |
worthlessnesses | 15 | 220 |
wrongheadedness | 15 | 161 |
xerographically | 15 | 174 |
xeroradiography | 15 | 184 |
zoogeographical | 15 | 158 |
3192 rows × 2 columns
In [71]:
df.loc[[
"superheterodyne",
"microbrew",
"enfold",
"glowing",
"pinfish"
]]
Out[71]:
Word | Char Count | Value |
---|---|---|
superheterodyne | 15 | 198 |
microbrew | 9 | 106 |
enfold | 6 | 56 |
glowing | 7 | 87 |
pinfish | 7 | 81 |
In [72]:
df.loc[[
"superheterodyne",
"microbrew",
"enfold",
"glowing",
"pinfish"
]].values
Out[72]:
array([[ 15, 198],
       [  9, 106],
       [  6,  56],
       [  7,  87],
       [  7,  81]])
In [73]:
df.loc[[
"superheterodyne",
"microbrew",
"enfold",
"glowing",
"pinfish"
],"Value"]
Out[73]:
Word
superheterodyne    198
microbrew          106
enfold              56
glowing             87
pinfish             81
Name: Value, dtype: int64
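The cells above look up the five candidate words and leave the comparison to the reader. One possible way to get the answer directly is to apply a boolean mask to the candidates. The frame `candidates` below is rebuilt from the output shown above, not loaded from words.csv.

```python
import pandas as pd

# The five candidate words with the counts shown in the output above.
candidates = pd.DataFrame(
    {"Char Count": [15, 9, 6, 7, 7], "Value": [198, 106, 56, 87, 81]},
    index=pd.Index(
        ["superheterodyne", "microbrew", "enfold", "glowing", "pinfish"],
        name="Word",
    ),
)

# Boolean mask: True where the word has exactly 15 characters.
has_15_chars = candidates["Char Count"] == 15

# Keep only the index labels (words) where the mask is True.
matches = candidates.index[has_15_chars].tolist()
print(matches)
```

Against the full DataFrame, the same mask could be applied to `df.loc[list_of_words]`.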
What is the highest possible length of a word?
In [74]:
df['Char Count'].max()
Out[74]:
28
What is the word with the value of 319?
In [75]:
df[df['Value'] == 319]
Out[75]:
Word | Char Count | Value |
---|---|---|
reinstitutionalizations | 23 | 319 |
In [76]:
df.sort_values(by=['Value'],ascending=False)
Out[76]:
Word | Char Count | Value |
---|---|---|
reinstitutionalizations | 23 | 319 |
microspectrophotometries | 24 | 317 |
microspectrophotometry | 22 | 309 |
microspectrophotometers | 23 | 308 |
immunoelectrophoretically | 25 | 307 |
... | ... | ... |
aba | 3 | 4 |
baa | 3 | 4 |
ba | 2 | 3 |
ab | 2 | 3 |
aa | 2 | 2 |
172821 rows × 2 columns
In [77]:
df.loc[df['Value']==319]
Out[77]:
Word | Char Count | Value |
---|---|---|
reinstitutionalizations | 23 | 319 |
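The filter and the sort above both depend on already knowing the maximum, or on reading the first row by eye. As a sketch of a shortcut, `Series.idxmax` returns the index label of the largest entry. The frame `top` holds the five highest-valued rows from the sorted output above.

```python
import pandas as pd

# The five highest-valued rows from the sorted output above.
top = pd.DataFrame(
    {"Char Count": [23, 24, 22, 23, 25], "Value": [319, 317, 309, 308, 307]},
    index=pd.Index(
        [
            "reinstitutionalizations",
            "microspectrophotometries",
            "microspectrophotometry",
            "microspectrophotometers",
            "immunoelectrophoretically",
        ],
        name="Word",
    ),
)

# idxmax returns the label of the first row holding the maximum.
best_word = top["Value"].idxmax()

# nlargest with keep="all" would also surface ties, if there were any.
best_rows = top.nlargest(1, "Value", keep="all")
print(best_word)
```

`idxmax` reports only the first match, so the boolean filter or `nlargest(..., keep="all")` is the safer choice when several words could share the top value.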
What is the most common value?
In [78]:
df.describe()
Out[78]:
| | Char Count | Value |
---|---|---|
count | 172821.000000 | 172821.000000 |
mean | 9.087628 | 107.754179 |
std | 2.818285 | 39.317452 |
min | 2.000000 | 2.000000 |
25% | 7.000000 | 80.000000 |
50% | 9.000000 | 103.000000 |
75% | 11.000000 | 131.000000 |
max | 28.000000 | 319.000000 |
In [79]:
df.mode()
Out[79]:
| | Char Count | Value |
---|---|---|
0 | 8 | 93 |
In [80]:
df["Value"].value_counts().head()
Out[80]:
Value
93     1965
100    1921
95     1915
99     1907
92     1902
Name: count, dtype: int64
In [81]:
df.loc[df['Value']==93].head()
Out[81]:
Word | Char Count | Value |
---|---|---|
abandoners | 10 | 93 |
ablations | 9 | 93 |
aboiteaus | 9 | 93 |
abridgment | 10 | 93 |
abstracted | 10 | 93 |
In [82]:
df.loc[df['Value']==93].sample(10)
Out[82]:
Word | Char Count | Value |
---|---|---|
crispened | 9 | 93 |
befriending | 11 | 93 |
nargilehs | 9 | 93 |
abstracted | 10 | 93 |
bobsledding | 11 | 93 |
unplait | 7 | 93 |
epoxies | 7 | 93 |
completed | 9 | 93 |
demerits | 8 | 93 |
hexyls | 6 | 93 |
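Both `mode()` and `value_counts()` were used above to find the most common value. The toy Series below is an illustration only, built from a handful of words that appear in this notebook, and shows how the two relate.

```python
import pandas as pd

# Toy data: five words taken from this notebook with their Value.
values = pd.Series(
    [93, 93, 93, 106, 87],
    index=pd.Index(
        ["abandoners", "ablations", "aboiteaus", "microbrew", "glowing"],
        name="Word",
    ),
    name="Value",
)

# mode() returns a Series because several values can tie for most common.
modes = values.mode()

# value_counts() sorts by frequency; idxmax() picks the most frequent value.
most_common = values.value_counts().idxmax()
print(modes.tolist(), most_common)
```

When ties are possible, `mode()` is the more complete answer because it lists every tied value.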
What is the shortest word with value 274?
In [83]:
df.loc[df['Value']==274]
Out[83]:
Word | Char Count | Value |
---|---|---|
countercountermeasure | 21 | 274 |
overprotectivenesses | 20 | 274 |
psychophysiologically | 21 | 274 |
In [84]:
df.loc[df['Value']==274].sort_values(by=['Char Count'])
Out[84]:
Word | Char Count | Value |
---|---|---|
overprotectivenesses | 20 | 274 |
countercountermeasure | 21 | 274 |
psychophysiologically | 21 | 274 |
In [85]:
df.loc[
(df['Value']==274) &
(df['Char Count']==df.loc[df['Value']==274,"Char Count"].min())
]
Out[85]:
Word | Char Count | Value |
---|---|---|
overprotectivenesses | 20 | 274 |
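The nested filter above can be hard to read. A possible alternative, sketched on the three rows with Value 274 shown above, is to filter once and then use `idxmin` or `nsmallest`. The name `sample` stands in for `df`.

```python
import pandas as pd

# The three words with Value 274, as listed in the output above.
sample = pd.DataFrame(
    {"Char Count": [21, 20, 21], "Value": [274, 274, 274]},
    index=pd.Index(
        [
            "countercountermeasure",
            "overprotectivenesses",
            "psychophysiologically",
        ],
        name="Word",
    ),
)

subset = sample[sample["Value"] == 274]

# idxmin gives the label of the first row with the smallest Char Count.
shortest_word = subset["Char Count"].idxmin()

# nsmallest with keep="all" keeps every row tied for the minimum.
shortest_rows = subset.nsmallest(1, "Char Count", keep="all")
print(shortest_word)
```

The two approaches agree here because only one word has the minimum length. With ties, `idxmin` would report only the first one.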
Create a column Ratio that represents the 'Value Ratio' of a word
In [86]:
df["Ratio"] = df["Value"] / df["Char Count"]
In [87]:
df.head()
Out[87]:
Word | Char Count | Value | Ratio |
---|---|---|---|
aa | 2 | 2 | 1.000000 |
aah | 3 | 10 | 3.333333 |
aahed | 5 | 19 | 3.800000 |
aahing | 6 | 40 | 6.666667 |
aahs | 4 | 29 | 7.250000 |
What is the maximum value of Ratio?
In [88]:
df["Ratio"].max()
Out[88]:
22.5
Which word has the highest Ratio?
In [89]:
df.loc[df['Ratio']==df["Ratio"].max()]
Out[89]:
Word | Char Count | Value | Ratio |
---|---|---|---|
xu | 2 | 45 | 22.5 |
How many words have a Ratio of 10?
In [107]:
df.loc[df['Ratio']==10.0].shape
Out[107]:
(2604, 3)
In [109]:
df.query("Ratio==10.0").shape
Out[109]:
(2604, 3)
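Ratio is a float column, so comparing it with `==` deserves a note. Exact equality works here because a Value that is exactly ten times the Char Count divides to exactly 10.0. For ratios that are not exactly representable, a tolerance-based comparison such as `numpy.isclose` is the more defensive choice. The sketch below uses four rows copied from this notebook. The name `sample` is an assumption standing in for `df`.

```python
import numpy as np
import pandas as pd

# Four rows copied from outputs in this notebook.
sample = pd.DataFrame(
    {"Char Count": [18, 4, 2, 17], "Value": [180, 29, 45, 260]},
    index=pd.Index(
        ["electrodesiccation", "aahs", "xu", "hydroxytryptamine"], name="Word"
    ),
)
sample["Ratio"] = sample["Value"] / sample["Char Count"]

# Tolerance-based comparison instead of exact float equality.
is_ten = np.isclose(sample["Ratio"], 10.0)

count = int(is_ten.sum())                    # how many rows match
matching_words = sample.index[is_ten].tolist()
print(count, matching_words)
```

Summing the mask gives the count directly, without building the filtered DataFrame just to read its shape.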
What is the maximum Value of all the words with a Ratio of 10?
In [96]:
df.loc[df['Ratio']==10.0].sort_values(by=['Value'],ascending=False).head()
Out[96]:
Word | Char Count | Value | Ratio |
---|---|---|---|
electrocardiographically | 24 | 240 | 10.0 |
electroencephalographies | 24 | 240 | 10.0 |
electroencephalographer | 23 | 230 | 10.0 |
electrodesiccation | 18 | 180 | 10.0 |
phonocardiographic | 18 | 180 | 10.0 |
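Sorting shows the top rows, but the question only asks for a single number. As a sketch, selecting the Value column under the same mask and calling `max()` returns it directly. The rows below are copied from this notebook, with xu added as a non-matching row. The name `sample` is illustrative.

```python
import pandas as pd

# Three Ratio-10 rows from the output above plus one row (xu) that does not match.
sample = pd.DataFrame(
    {"Char Count": [24, 23, 18, 2], "Value": [240, 230, 180, 45]},
    index=pd.Index(
        [
            "electrocardiographically",
            "electroencephalographer",
            "electrodesiccation",
            "xu",
        ],
        name="Word",
    ),
)
sample["Ratio"] = sample["Value"] / sample["Char Count"]

# Row mask plus column label in one .loc call, then reduce with max().
ratio_is_ten = sample["Ratio"] == 10.0
max_value = sample.loc[ratio_is_ten, "Value"].max()
print(max_value)
```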
Of those words with a Value of 260, what is the lowest Char Count found?
In [99]:
df.loc[df["Value"]==260].sort_values(by=['Char Count']).head()
Out[99]:
Word | Char Count | Value | Ratio |
---|---|---|---|
hydroxytryptamine | 17 | 260 | 15.294118 |
neuropsychologists | 18 | 260 | 14.444444 |
psychophysiologist | 18 | 260 | 14.444444 |
revolutionarinesses | 19 | 260 | 13.684211 |
countermobilizations | 20 | 260 | 13.000000 |
Based on the previous task, what word is it?
In [105]:
df.loc[(df["Value"]==260) & (df["Char Count"]==17)]
Out[105]:
Word | Char Count | Value | Ratio |
---|---|---|---|
hydroxytryptamine | 17 | 260 | 15.294118 |
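The same filter can also be written with `DataFrame.query`, which was used earlier for Ratio. Column names containing a space, such as Char Count, need backtick quoting inside the query string, a feature available in pandas 0.25 and later. The sketch below uses three rows copied from this notebook. The name `sample` stands in for `df`.

```python
import pandas as pd

# Three rows copied from outputs in this notebook.
sample = pd.DataFrame(
    {"Char Count": [17, 18, 20], "Value": [260, 260, 274]},
    index=pd.Index(
        ["hydroxytryptamine", "neuropsychologists", "overprotectivenesses"],
        name="Word",
    ),
)

# Backticks let query() refer to a column whose name contains a space.
result = sample.query("Value == 260 and `Char Count` == 17")
print(result.index.tolist())
```

To avoid hard-coding 17, the minimum could be computed first and passed in with the `@variable` syntax that `query` supports.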