Statement of Completion#315f9eeb
Intro to Pandas for Data Analysis
easy
DataFrames practice: working with English Words
In [2]:
import pandas as pd
In [3]:
df = pd.read_csv('words.csv', index_col='Word')
In [4]:
df.head()
Out[4]:
Word | Char Count | Value |
---|---|---|
aa | 2 | 2 |
aah | 3 | 10 |
aahed | 5 | 19 |
aahing | 6 | 40 |
aahs | 4 | 29 |
Activities¶
How many elements does this dataframe have?¶
In [7]:
df.shape
Out[7]:
(172821, 2)
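Whether "elements" means rows or individual cells depends on how the question is read. The sketch below, on a hypothetical three-row toy frame standing in for `words.csv`, shows the three related attributes side by side. For the shape reported above, the cell count would be 172821 × 2 = 345642.

```python
import pandas as pd

# Hypothetical toy frame standing in for words.csv (not the real data set).
toy = pd.DataFrame(
    {"Char Count": [2, 3, 5], "Value": [2, 10, 19]},
    index=pd.Index(["aa", "aah", "aahed"], name="Word"),
)

n_rows, n_cols = toy.shape  # (rows, columns)
n_cells = toy.size          # rows * columns
n_words = len(toy)          # rows only

print(n_rows, n_cols, n_cells, n_words)
```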
What is the value of the word microspectrophotometries?¶
In [9]:
df.loc['microspectrophotometries']
Out[9]:
Char Count     24
Value         317
Name: microspectrophotometries, dtype: int64
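Selecting the whole row returns a Series with both columns. When only the value is needed, passing a column label as well returns a scalar. A minimal sketch on a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy frame; the real notebook uses words.csv.
toy = pd.DataFrame(
    {"Char Count": [2, 3, 5], "Value": [2, 10, 19]},
    index=pd.Index(["aa", "aah", "aahed"], name="Word"),
)

# Row label plus column label yields a single scalar.
value = toy.loc["aah", "Value"]

# .at is the scalar-only accessor and gives the same result.
same = toy.at["aah", "Value"]

print(value, same)
```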
What is the highest possible value of a word?¶
In [13]:
df['Value'].max()
Out[13]:
319
In [14]:
df.describe()
Out[14]:
 | Char Count | Value |
---|---|---|
count | 172821.000000 | 172821.000000 |
mean | 9.087628 | 107.754179 |
std | 2.818285 | 39.317452 |
min | 2.000000 | 2.000000 |
25% | 7.000000 | 80.000000 |
50% | 9.000000 | 103.000000 |
75% | 11.000000 | 131.000000 |
max | 28.000000 | 319.000000 |
Which of the following words have a Char Count of 15?¶
In [16]:
# Pass the labels as a list. Without commas, Python joins adjacent
# string literals into one key, which raises a KeyError.
df.loc[[
    'enfold',
    'pinfish',
    'superheterodyne',
    'glowing',
    'microbrew',
]]
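Python concatenates adjacent string literals that are not separated by commas, so several labels written that way reach `.loc` as one long key. The sketch below shows the pitfall and a list-based lookup. It builds a hypothetical toy frame in which `Char Count` is assumed to equal `len(word)`, which matches the rows shown in `df.head()`.

```python
import pandas as pd

words = ["enfold", "pinfish", "superheterodyne", "glowing", "microbrew"]

# Hypothetical toy frame: Char Count is assumed to be the word length.
toy = pd.DataFrame(
    {"Char Count": [len(w) for w in words]},
    index=pd.Index(words, name="Word"),
)

# Adjacent literals without commas become a single string.
joined = ("enfold" "pinfish")

# A list of labels selects several rows at once.
subset = toy.loc[words]

# Keep only the words whose Char Count is 15.
fifteen = subset[subset["Char Count"] == 15].index.tolist()

print(joined)
print(fifteen)
```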
What is the highest possible length of a word?¶
In [17]:
df.loc[df['Value']==319]
Out[17]:
Word | Char Count | Value |
---|---|---|
reinstitutionalizations | 23 | 319 |
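Word length is stored in the `Char Count` column, so the length question can be answered with a column maximum. The `describe()` summary earlier reports that maximum as 28. A minimal sketch on a hypothetical toy frame, with `idxmax()` added to recover the word itself:

```python
import pandas as pd

# Hypothetical toy frame; rows copied from the df.head() output.
toy = pd.DataFrame(
    {"Char Count": [2, 5, 6], "Value": [2, 19, 40]},
    index=pd.Index(["aa", "aahed", "aahing"], name="Word"),
)

longest_len = toy["Char Count"].max()      # largest length
longest_word = toy["Char Count"].idxmax()  # index label of that row

print(longest_len, longest_word)
```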
What is the word with the value of 319?¶
In [18]:
df.loc[df['Value']==274].sort_values(by='Char Count')
Out[18]:
Word | Char Count | Value |
---|---|---|
overprotectivenesses | 20 | 274 |
countercountermeasure | 21 | 274 |
psychophysiologically | 21 | 274 |
What is the most common value?¶
In [20]:
df.loc[df['Value']==274,'Char Count'].min()
Out[20]:
20
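The most common value is the mode of the `Value` column. One way to get it, sketched here on a hypothetical Series rather than the real data, is `mode()` or the top entry of `value_counts()`:

```python
import pandas as pd

# Hypothetical values; the real column is df['Value'].
values = pd.Series([10, 19, 10, 40, 29], name="Value")

# mode() returns every value tied for most frequent, sorted ascending.
most_common = values.mode().iloc[0]

# value_counts() gives the frequencies, most frequent first.
counts = values.value_counts()
top_value = counts.idxmax()
top_count = counts.iloc[0]

print(most_common, top_value, top_count)
```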
What is the shortest word with value 274?¶
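A possible way to get the shortest word directly is to filter on `Value` and call `idxmin()` on `Char Count`. The toy frame below is hypothetical but reuses rows that appear in this notebook's outputs:

```python
import pandas as pd

# Hypothetical toy frame built from rows shown in the notebook outputs.
toy = pd.DataFrame(
    {"Char Count": [20, 21, 21, 17], "Value": [274, 274, 274, 260]},
    index=pd.Index(
        [
            "overprotectivenesses",
            "countercountermeasure",
            "psychophysiologically",
            "hydroxytryptamine",
        ],
        name="Word",
    ),
)

# Char Count of every word whose Value is 274.
matches = toy.loc[toy["Value"] == 274, "Char Count"]

shortest_len = matches.min()      # smallest length among the matches
shortest_word = matches.idxmin()  # first word with that length

print(shortest_word, shortest_len)
```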
Create a column Ratio which represents the 'Value Ratio' of a word¶
In [21]:
df.head()
Out[21]:
Word | Char Count | Value |
---|---|---|
aa | 2 | 2 |
aah | 3 | 10 |
aahed | 5 | 19 |
aahing | 6 | 40 |
aahs | 4 | 29 |
In [25]:
df['Ratio'] = df['Value']/df['Char Count']
df['Ratio'].max()
Out[25]:
22.5
What is the maximum value of Ratio?¶
In [28]:
df['Ratio'].max()
Out[28]:
22.5
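The `Ratio` column is an element-wise division of two columns, so no loop is needed. A minimal sketch on a hypothetical toy frame whose rows are taken from the outputs in this notebook:

```python
import pandas as pd

# Hypothetical toy frame; rows copied from the notebook outputs.
toy = pd.DataFrame(
    {"Char Count": [2, 3, 2], "Value": [45, 66, 2]},
    index=pd.Index(["xu", "wry", "aa"], name="Word"),
)

# Column arithmetic is vectorized: each Value is divided by its Char Count.
toy["Ratio"] = toy["Value"] / toy["Char Count"]

max_ratio = toy["Ratio"].max()

print(toy)
print(max_ratio)
```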
What word is the one with the highest Ratio?¶
In [35]:
df.sort_values(by='Ratio', ascending=False)
Out[35]:
Word | Char Count | Value | Ratio |
---|---|---|---|
xu | 2 | 45 | 22.500000 |
muzzy | 5 | 111 | 22.200000 |
wry | 3 | 66 | 22.000000 |
xyst | 4 | 88 | 22.000000 |
tux | 3 | 65 | 21.666667 |
... | ... | ... | ... |
ba | 2 | 3 | 1.500000 |
baba | 4 | 6 | 1.500000 |
aba | 3 | 4 | 1.333333 |
baa | 3 | 4 | 1.333333 |
aa | 2 | 2 | 1.000000 |

172821 rows × 3 columns
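Sorting the whole frame works, but when only the top word is needed, `idxmax()` or `nlargest()` may be a lighter alternative. A sketch on a hypothetical toy frame using rows from the table above:

```python
import pandas as pd

# Hypothetical toy frame; rows copied from the sorted output above.
toy = pd.DataFrame(
    {"Char Count": [3, 2, 5], "Value": [66, 45, 111]},
    index=pd.Index(["wry", "xu", "muzzy"], name="Word"),
)
toy["Ratio"] = toy["Value"] / toy["Char Count"]

# Index label of the row with the highest Ratio.
best_word = toy["Ratio"].idxmax()

# The two highest-Ratio rows, without sorting the full frame.
top_two = toy.nlargest(2, "Ratio").index.tolist()

print(best_word, top_two)
```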
How many words have a Ratio of 10?¶
In [37]:
df['Ratio'].value_counts()
df.loc[df['Ratio']==10].shape
Out[37]:
(2604, 3)
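Counting matches can also be done by summing the boolean mask, which avoids reading the row count off `shape`. Because `Ratio` is a float column, `numpy.isclose` is a more tolerant comparison if exact equality ever misses a row. The frame below is hypothetical, with made-up labels and numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with made-up words and values.
toy = pd.DataFrame(
    {"Char Count": [2, 3, 4], "Value": [20, 30, 44]},
    index=pd.Index(["w1", "w2", "w3"], name="Word"),
)
toy["Ratio"] = toy["Value"] / toy["Char Count"]

# True counts as 1, so the sum of the mask is the number of matches.
exact_count = int((toy["Ratio"] == 10).sum())

# Tolerant comparison for floating-point columns.
close_count = int(np.isclose(toy["Ratio"], 10).sum())

print(exact_count, close_count)
```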
What is the maximum Value of all the words with a Ratio of 10?¶
In [39]:
df.loc[df['Ratio']==10,'Value'].max()
Out[39]:
240
Of those words with a Value of 260, what is the lowest Char Count found?¶
In [40]:
df.query('Value==260').sort_values(by='Char Count')
Out[40]:
Word | Char Count | Value | Ratio |
---|---|---|---|
hydroxytryptamine | 17 | 260 | 15.294118 |
neuropsychologists | 18 | 260 | 14.444444 |
psychophysiologist | 18 | 260 | 14.444444 |
revolutionarinesses | 19 | 260 | 13.684211 |
countermobilizations | 20 | 260 | 13.000000 |
underrepresentations | 20 | 260 | 13.000000 |
Based on the previous task, what word is it?¶
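The sorted table in the previous task lists hydroxytryptamine first, with a Char Count of 17. The same answer can be pulled out programmatically with `idxmin()`. The sketch below uses a hypothetical toy frame built from three of those rows:

```python
import pandas as pd

# Hypothetical toy frame; rows copied from the previous task's output.
toy = pd.DataFrame(
    {"Char Count": [18, 17, 20], "Value": [260, 260, 260]},
    index=pd.Index(
        ["neuropsychologists", "hydroxytryptamine", "countermobilizations"],
        name="Word",
    ),
)

# Filter on Value, then find the label with the smallest Char Count.
candidates = toy.query("Value == 260")["Char Count"]
lowest_count = candidates.min()
word = candidates.idxmin()

print(word, lowest_count)
```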