Statement of Completion#688ba205
Intro to Pandas for Data Analysis
easy
DataFrames practice: working with English Words
Resolution
In [1]:
import pandas as pd
In [2]:
df = pd.read_csv('words.csv', index_col='Word')
In [3]:
df.head()
Out[3]:
Word | Char Count | Value |
---|---|---|
aa | 2 | 2 |
aah | 3 | 10 |
aahed | 5 | 19 |
aahing | 6 | 40 |
aahs | 4 | 29 |
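The notebook does not say how `Value` is derived. The five rows above are consistent with summing each letter's position in the alphabet (a=1, ..., z=26), so the sketch below checks that guess. The scoring rule and the helper name `word_value` are assumptions, not part of the original lab.

```python
def word_value(word: str) -> int:
    """Sum of alphabet positions (a=1, ..., z=26). Assumed scoring rule."""
    return sum(ord(ch) - ord("a") + 1 for ch in word.lower())

# Word/Value pairs copied from the df.head() output above
sample = {"aa": 2, "aah": 10, "aahed": 19, "aahing": 40, "aahs": 29}

# True for every word whose computed score matches the Value column
matches = {word: word_value(word) == value for word, value in sample.items()}
print(matches)
```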
Activities
How many elements does this dataframe have?
In [4]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 172821 entries, aa to zyzzyvas
Data columns (total 2 columns):
 #   Column      Non-Null Count   Dtype
---  ------      --------------   -----
 0   Char Count  172821 non-null  int64
 1   Value       172821 non-null  int64
dtypes: int64(2)
memory usage: 4.0+ MB
What is the value of the word microspectrophotometries?
In [5]:
df.shape
Out[5]:
(172821, 2)
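`df.info()` and `df.shape` both report the row count. As a small sketch on a three-row stand-in for `df` (the name `toy` is hypothetical), these are the usual ways to count rows and cells:

```python
import pandas as pd

# Three rows copied from df.head(); `toy` stands in for the full df
toy = pd.DataFrame(
    {"Char Count": [2, 3, 5], "Value": [2, 10, 19]},
    index=pd.Index(["aa", "aah", "aahed"], name="Word"),
)

n_rows = len(toy)        # number of rows (one per word)
rows_cols = toy.shape    # (rows, columns) tuple
n_cells = toy.size       # rows * columns
print(n_rows, rows_cols, n_cells)
```

On the full dataset, `len(df)` would give 172821, matching the first entry of `df.shape`.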
What is the highest possible value of a word?
In [6]:
df.loc['microspectrophotometries']
Out[6]:
Char Count     24
Value         317
Name: microspectrophotometries, dtype: int64
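Because `Word` is the index, `.loc` with a single label returns the whole row as a Series. If only one cell is needed, passing a column label as well returns a scalar. A minimal sketch on a one-row stand-in (`toy` is a hypothetical name):

```python
import pandas as pd

# One-row stand-in for df, using the figures from Out[6]
toy = pd.DataFrame(
    {"Char Count": [24], "Value": [317]},
    index=pd.Index(["microspectrophotometries"], name="Word"),
)

row = toy.loc["microspectrophotometries"]                 # whole row as a Series
value = toy.loc["microspectrophotometries", "Value"]      # single cell by row and column label
value_at = toy.at["microspectrophotometries", "Value"]    # scalar-only accessor
print(value, value_at)
```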
Which of the following words have a Char Count of 15?
In [7]:
df.max()
Out[7]:
Char Count     28
Value         319
dtype: int64
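`df.max()` reduces each column independently, so the two maxima need not come from the same word. The sketch below uses three rows that appear in the sorted output later in this notebook (`toy` is a hypothetical stand-in for `df`) and pairs `max()` with `idxmax()` to see which word supplies each maximum.

```python
import pandas as pd

# Rows copied from the sort_values output in this notebook
toy = pd.DataFrame(
    {"Char Count": [23, 24, 25], "Value": [319, 317, 307]},
    index=pd.Index(
        [
            "reinstitutionalizations",
            "microspectrophotometries",
            "immunoelectrophoretically",
        ],
        name="Word",
    ),
)

col_max = toy.max()       # per-column maximum
where_max = toy.idxmax()  # index label of each column's maximum
print(col_max)
print(where_max)
```

In this sample the longest word and the highest-valued word are different rows.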
What is the highest possible length of a word?
In [11]:
df.describe()
Out[11]:
| | Char Count | Value |
---|---|---|
count | 172821.000000 | 172821.000000 |
mean | 9.087628 | 107.754179 |
std | 2.818285 | 39.317452 |
min | 2.000000 | 2.000000 |
25% | 7.000000 | 80.000000 |
50% | 9.000000 | 103.000000 |
75% | 11.000000 | 131.000000 |
max | 28.000000 | 319.000000 |
What is the word with the value of 319?
In [12]:
df.sort_values(by=['Value'], ascending=False)
Out[12]:
Word | Char Count | Value |
---|---|---|
reinstitutionalizations | 23 | 319 |
microspectrophotometries | 24 | 317 |
microspectrophotometry | 22 | 309 |
microspectrophotometers | 23 | 308 |
immunoelectrophoretically | 25 | 307 |
... | ... | ... |
aba | 3 | 4 |
baa | 3 | 4 |
ba | 2 | 3 |
ab | 2 | 3 |
aa | 2 | 2 |
172821 rows × 2 columns
In [15]:
df.loc[df['Value'] == 319]
Out[15]:
Word | Char Count | Value |
---|---|---|
reinstitutionalizations | 23 | 319 |
In [16]:
(df['Value'] == 319)
Out[16]:
Word
aa          False
aah         False
aahed       False
aahing      False
aahs        False
            ...
zymotic     False
zymurgies   False
zymurgy     False
zyzzyva     False
zyzzyvas    False
Name: Value, Length: 172821, dtype: bool
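The comparison in `In [16]` produces a boolean Series aligned with the index, and `df.loc[...]` keeps only the rows where it is `True`. A small sketch of the same mechanism on three rows taken from the sorted output above (`toy` and `mask` are hypothetical names):

```python
import pandas as pd

toy = pd.DataFrame(
    {"Char Count": [23, 24, 2], "Value": [319, 317, 2]},
    index=pd.Index(
        ["reinstitutionalizations", "microspectrophotometries", "aa"],
        name="Word",
    ),
)

mask = toy["Value"] == 319             # boolean Series, one entry per word
n_hits = int(mask.sum())               # True counts as 1, so this counts matches
words = toy.loc[mask].index.tolist()   # labels of the matching rows
print(n_hits, words)
```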
What is the most common value?
In [17]:
df['Value'].describe()
Out[17]:
count    172821.000000
mean        107.754179
std          39.317452
min           2.000000
25%          80.000000
50%         103.000000
75%         131.000000
max         319.000000
Name: Value, dtype: float64
In [18]:
df['Value'].mode()
Out[18]:
0    93
Name: Value, dtype: int64
In [20]:
df['Value'].value_counts().head()
Out[20]:
Value
93     1965
100    1921
95     1915
99     1907
92     1902
Name: count, dtype: int64
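`mode()` returns a Series rather than a scalar because several values can tie for most frequent, while `value_counts()` shows the counts behind that answer. A sketch on made-up toy data (the numbers below are illustrative only, not taken from `words.csv`):

```python
import pandas as pd

values = pd.Series([93, 93, 100, 95], name="Value")  # made-up toy data

modes = values.mode()                   # Series of the most frequent value(s)
top = values.value_counts().idxmax()    # label with the highest count
tied = pd.Series([1, 1, 2, 2]).mode()   # a tie yields more than one mode
print(modes.tolist(), top, tied.tolist())
```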
What is the shortest word with value 274?
In [23]:
df.loc[df['Value'] == 274].sort_values(by="Char Count")
Out[23]:
Word | Char Count | Value |
---|---|---|
overprotectivenesses | 20 | 274 |
countercountermeasure | 21 | 274 |
psychophysiologically | 21 | 274 |
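Sorting and reading the first row works. As an alternative sketch, `nsmallest` with `keep="all"` returns the shortest word directly and would keep every row tied for the minimum. The rows below are copied from outputs in this notebook, and `toy` is a hypothetical stand-in for `df`.

```python
import pandas as pd

toy = pd.DataFrame(
    {"Char Count": [20, 21, 21, 17], "Value": [274, 274, 274, 260]},
    index=pd.Index(
        [
            "overprotectivenesses",
            "countercountermeasure",
            "psychophysiologically",
            "hydroxytryptamine",
        ],
        name="Word",
    ),
)

with_274 = toy.loc[toy["Value"] == 274]
# keep="all" retains every row tied for the smallest Char Count
shortest = with_274.nsmallest(1, "Char Count", keep="all")
print(shortest.index.tolist())
```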
Create a column Ratio which represents the 'Value Ratio' of a word
In [24]:
df.head()
Out[24]:
Word | Char Count | Value |
---|---|---|
aa | 2 | 2 |
aah | 3 | 10 |
aahed | 5 | 19 |
aahing | 6 | 40 |
aahs | 4 | 29 |
In [27]:
df['Ratio'] = df['Value'] / df['Char Count']
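The later outputs in this notebook carry an extra `Ratioi` column with the same numbers as `Ratio`. A likely cause, though the cell is not shown, is an earlier assignment with a mistyped name: assigning to a column label that does not exist silently creates it. The sketch below reproduces that on a three-row stand-in (`toy` and `cleaned` are hypothetical names) and drops the stray column.

```python
import pandas as pd

toy = pd.DataFrame(
    {"Char Count": [2, 3, 2], "Value": [2, 10, 45]},
    index=pd.Index(["aa", "aah", "xu"], name="Word"),
)

# A mistyped label does not raise; it just adds a new column
toy["Ratioi"] = toy["Value"] / toy["Char Count"]
toy["Ratio"] = toy["Value"] / toy["Char Count"]   # element-wise division, row by row

cleaned = toy.drop(columns=["Ratioi"])            # remove the stray column
print(cleaned)
```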
What is the maximum value of Ratio?
In [29]:
df['Ratio'].max()
Out[29]:
22.5
What word is the one with the highest Ratio?
In [34]:
df.sort_values(by='Ratio', ascending=False).head()
Out[34]:
Word | Char Count | Value | Ratioi | Ratio |
---|---|---|---|---|
xu | 2 | 45 | 22.500000 | 22.500000 |
muzzy | 5 | 111 | 22.200000 | 22.200000 |
wry | 3 | 66 | 22.000000 | 22.000000 |
xyst | 4 | 88 | 22.000000 | 22.000000 |
tux | 3 | 65 | 21.666667 | 21.666667 |
How many words have a Ratio of 10?
In [33]:
df.loc[df['Ratio'] == 10]
Out[33]:
Word | Char Count | Value | Ratioi | Ratio |
---|---|---|---|---|
aardwolf | 8 | 80 | 10.0 | 10.0 |
abatements | 10 | 100 | 10.0 | 10.0 |
abducts | 7 | 70 | 10.0 | 10.0 |
abetment | 8 | 80 | 10.0 | 10.0 |
abettals | 8 | 80 | 10.0 | 10.0 |
... | ... | ... | ... | ... |
ycleped | 7 | 70 | 10.0 | 10.0 |
yodeled | 7 | 70 | 10.0 | 10.0 |
zamia | 5 | 50 | 10.0 | 10.0 |
zebecs | 6 | 60 | 10.0 | 10.0 |
zwieback | 8 | 80 | 10.0 | 10.0 |
2604 rows × 4 columns
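The row count can also be read straight from the mask instead of the table footer. `Ratio` is a float column, so exact equality is safe here only because a value that is an exact multiple of the length divides to exactly 10.0. For ratios that are not exactly representable, `numpy.isclose` is the more cautious comparison. A sketch on three rows from this notebook (`toy` is a hypothetical stand-in):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {"Char Count": [8, 5, 2], "Value": [80, 50, 45]},
    index=pd.Index(["aardwolf", "zamia", "xu"], name="Word"),
)
toy["Ratio"] = toy["Value"] / toy["Char Count"]

exact_count = int((toy["Ratio"] == 10).sum())           # exact float comparison
close_count = int(np.isclose(toy["Ratio"], 10).sum())   # tolerance-based comparison
print(exact_count, close_count)
```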
What is the maximum Value of all the words with a Ratio of 10?
In [42]:
df.query('Ratio == 10').sort_values(by='Value', ascending=False).head()
Out[42]:
Word | Char Count | Value | Ratioi | Ratio |
---|---|---|---|---|
electrocardiographically | 24 | 240 | 10.0 | 10.0 |
electroencephalographies | 24 | 240 | 10.0 | 10.0 |
electroencephalographer | 23 | 230 | 10.0 | 10.0 |
electrodesiccation | 18 | 180 | 10.0 | 10.0 |
phonocardiographic | 18 | 180 | 10.0 | 10.0 |
In [44]:
df.loc[df['Ratio'] == 10, 'Value'].max()
Out[44]:
240
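`In [42]` and `In [44]` express the same filter in two ways: a `query` string and a boolean mask passed to `.loc`. The sketch below shows they agree on a small stand-in (`toy` is hypothetical, rows copied from this notebook), and that a column name containing a space, such as `Char Count`, needs backticks inside `query`.

```python
import pandas as pd

toy = pd.DataFrame(
    {"Char Count": [24, 23, 17], "Value": [240, 230, 260]},
    index=pd.Index(
        [
            "electrocardiographically",
            "electroencephalographer",
            "hydroxytryptamine",
        ],
        name="Word",
    ),
)
toy["Ratio"] = toy["Value"] / toy["Char Count"]

via_query = toy.query("Ratio == 10")["Value"].max()
via_loc = toy.loc[toy["Ratio"] == 10, "Value"].max()

# Backticks let query() refer to a column whose name contains a space
long_words = toy.query("`Char Count` >= 23").index.tolist()
print(via_query, via_loc, long_words)
```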
Of those words with a Value of 260, what is the lowest Char Count found?
In [45]:
df.loc[df['Value'] == 260].sort_values(by="Char Count")
Out[45]:
Word | Char Count | Value | Ratioi | Ratio |
---|---|---|---|---|
hydroxytryptamine | 17 | 260 | 15.294118 | 15.294118 |
neuropsychologists | 18 | 260 | 14.444444 | 14.444444 |
psychophysiologist | 18 | 260 | 14.444444 | 14.444444 |
revolutionarinesses | 19 | 260 | 13.684211 | 13.684211 |
countermobilizations | 20 | 260 | 13.000000 | 13.000000 |
underrepresentations | 20 | 260 | 13.000000 | 13.000000 |
Based on the previous task, what word is it?
In [ ]:
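The cell above was left empty. One possible way to fill it in, sketched on a few rows copied from `Out[45]` plus one row with a different value (`toy` is a hypothetical stand-in for `df`), is to filter on `Value` and take `idxmin()` of `Char Count`, which returns the index label of the smallest entry:

```python
import pandas as pd

toy = pd.DataFrame(
    {"Char Count": [17, 18, 18, 20], "Value": [260, 260, 260, 274]},
    index=pd.Index(
        [
            "hydroxytryptamine",
            "neuropsychologists",
            "psychophysiologist",
            "overprotectivenesses",
        ],
        name="Word",
    ),
)

# Char Count of the words worth 260, then the label of the smallest one
shortest_word = toy.loc[toy["Value"] == 260, "Char Count"].idxmin()
print(shortest_word)
```

Note that `idxmin()` returns only the first label when several rows tie for the minimum.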