Intro to Pandas for Data Analysis
easy
DataFrames practice: working with English Words
In [1]:
import pandas as pd
In [13]:
df = pd.read_csv('words.csv',index_col="Word")
In [14]:
df.head()
Out[14]:
Word | Char Count | Value |
---|---|---|
aa | 2 | 2 |
aah | 3 | 10 |
aahed | 5 | 19 |
aahing | 6 | 40 |
aahs | 4 | 29 |
Activities
How many elements does this dataframe have?
In [7]:
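# Out[7] was produced before words.csv was re-read with index_col="Word" (In [13]);
# with Word as the index, df.shape is (172821, 2).
df.shape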
Out[7]:
(172821, 3)
In [8]:
df.info  # without parentheses this only shows the bound method; call df.info() for the dtype/non-null summary
Out[8]:
<bound method DataFrame.info of           Word  Char Count  Value
0             aa           2      2
1            aah           3     10
2          aahed           5     19
3         aahing           6     40
4           aahs           4     29
...          ...         ...    ...
172816   zymotic           7    111
172817 zymurgies           9    143
172818   zymurgy           7    135
172819   zyzzyva           7    151
172820  zyzzyvas           8    170

[172821 rows x 3 columns]>
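The number of elements in a DataFrame is `df.size` (rows × columns), and it changes depending on whether Word is the index or an ordinary column. A minimal sketch, assuming only the five rows shown in Out[14] (the names `sample` and `indexed` are illustrative, not from the notebook):

```python
import pandas as pd

# Five rows copied from df.head() above; the real words.csv has 172,821 rows.
sample = pd.DataFrame(
    {"Word": ["aa", "aah", "aahed", "aahing", "aahs"],
     "Char Count": [2, 3, 5, 6, 4],
     "Value": [2, 10, 19, 40, 29]}
)

# With Word as an ordinary column there are 3 columns.
print(sample.shape, sample.size)  # (5, 3) 15

# With Word as the index (as with index_col="Word"), only 2 data columns remain.
indexed = sample.set_index("Word")
print(indexed.shape, indexed.size)  # (5, 2) 10
```

Applied to the full file, the same arithmetic gives 172821 × 2 = 345642 elements with Word as the index, or 172821 × 3 = 518463 if Word is kept as a column.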
What is the value of the word microspectrophotometries?
In [17]:
df.loc["microspectrophotometries"]
Out[17]:
Char Count     24
Value         317
Name: microspectrophotometries, dtype: int64
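`df.loc[word]` returns the whole row as a Series. Passing both the row label and the column label returns just the number, and `.at` is the accessor meant for single cells. A minimal sketch on a one-row frame rebuilt from Out[17] (in the notebook, `df` comes from words.csv):

```python
import pandas as pd

# One row reproduced from Out[17].
df = pd.DataFrame(
    {"Char Count": [24], "Value": [317]},
    index=pd.Index(["microspectrophotometries"], name="Word"),
)

# Row label + column label returns a scalar instead of a Series.
value = df.loc["microspectrophotometries", "Value"]
# .at is the fast accessor for a single cell.
same_value = df.at["microspectrophotometries", "Value"]
print(value, same_value)  # 317 317
```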
What is the highest possible value of a word?
In [18]:
df.max()
Out[18]:
Char Count     28
Value         319
dtype: int64
In [19]:
df.describe()
Out[19]:
 | Char Count | Value |
---|---|---|
count | 172821.000000 | 172821.000000 |
mean | 9.087628 | 107.754179 |
std | 2.818285 | 39.317452 |
min | 2.000000 | 2.000000 |
25% | 7.000000 | 80.000000 |
50% | 9.000000 | 103.000000 |
75% | 11.000000 | 131.000000 |
max | 28.000000 | 319.000000 |
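`df.max()` reports the maximum of each column separately, so it gives the highest Value (319) but not the word that holds it. A minimal sketch, assuming a few rows copied from outputs in this notebook rather than the full dataset, that returns both the number and the matching row(s):

```python
import pandas as pd

# A handful of rows copied from outputs in this notebook (not the full dataset).
df = pd.DataFrame(
    {"Char Count": [23, 24, 20, 2],
     "Value": [319, 317, 274, 2]},
    index=pd.Index(
        ["reinstitutionalizations", "microspectrophotometries",
         "overprotectivenesses", "aa"],
        name="Word",
    ),
)

# Highest Value on its own (a scalar).
print(df["Value"].max())  # 319

# Row(s) holding that Value; keep="all" retains ties instead of dropping them.
top = df.nlargest(1, "Value", keep="all")
print(top)
```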
Which of the following words have a Char Count of 7 and a Value of 87?
In [26]:
df.loc[["microbrew", "pinfish", "enfold", "superheterodyne", "glowing"]]
Out[26]:
Word | Char Count | Value |
---|---|---|
microbrew | 9 | 106 |
pinfish | 7 | 81 |
enfold | 6 | 56 |
superheterodyne | 15 | 198 |
glowing | 7 | 87 |
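Instead of reading the answer off the table, the two conditions can be combined into one boolean mask. A minimal sketch using the five candidate rows from Out[26]; on the full `df` the same mask would return every word matching both conditions:

```python
import pandas as pd

# The five candidate words and their figures, as listed in Out[26].
df = pd.DataFrame(
    {"Char Count": [9, 7, 6, 15, 7],
     "Value": [106, 81, 56, 198, 87]},
    index=pd.Index(
        ["microbrew", "pinfish", "enfold", "superheterodyne", "glowing"],
        name="Word",
    ),
)

# Each comparison goes in its own parentheses because & binds tighter than ==.
mask = (df["Char Count"] == 7) & (df["Value"] == 87)
print(df.loc[mask].index.tolist())  # ['glowing']
```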
What is the highest possible length of a word?
In [36]:
df.loc[df['Value'] == 319]  # word(s) with the maximum Value, 319 (not the maximum length)
Out[36]:
Word | Char Count | Value |
---|---|---|
reinstitutionalizations | 23 | 319 |
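The length question asks for the largest Char Count (Out[18] reports 28 for the full file), whereas the cell above looks up the largest Value. A minimal sketch, assuming a few sample rows from this notebook, that finds the maximum length and every word reaching it:

```python
import pandas as pd

# Sample rows from outputs in this notebook; the full file's maximum is 28.
df = pd.DataFrame(
    {"Char Count": [2, 23, 24, 24],
     "Value": [2, 319, 240, 317]},
    index=pd.Index(
        ["aa", "reinstitutionalizations",
         "electrocardiographically", "microspectrophotometries"],
        name="Word",
    ),
)

longest = df["Char Count"].max()
print(longest)  # 24

# Comparing against the maximum keeps ties (idxmax would return only the first).
longest_words = df.loc[df["Char Count"] == longest].index.tolist()
print(longest_words)  # ['electrocardiographically', 'microspectrophotometries']
```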
In [40]:
# Fixed: ['Value'] == 274 compares a list with an int and evaluates to False, so the
# original df.loc[['Value']==274] raised
# KeyError: 'False: boolean label can not be used without a boolean index'.
dt = df.loc[df['Value'] == 274]
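In `df.loc[['Value']==274]` the expression `['Value'] == 274` compares a one-item Python list with an integer, which is simply `False`, so `.loc` looks for a row labelled `False`. The mask has to be built from the column itself. A minimal sketch on the Value 274 rows from Out[46] plus one contrasting row:

```python
import pandas as pd

# Rows with Value 274 (from Out[46]) plus one other row for contrast.
df = pd.DataFrame(
    {"Char Count": [21, 20, 21, 2],
     "Value": [274, 274, 274, 2]},
    index=pd.Index(["countercountermeasure", "overprotectivenesses",
                    "psychophysiologically", "aa"], name="Word"),
)

# A bare list compared with an int is just a Python bool, not a mask.
print(['Value'] == 274)  # False

# df["Value"] == 274 yields a boolean Series aligned with the index.
mask = df["Value"] == 274
dt = df.loc[mask]
print(len(dt))  # 3
```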
What is the word with a Value of 319?
In [39]:
# Fixed: & binds tighter than ==, so the unparenthesized original raised
# ValueError: The truth value of a Series is ambiguous.
df.loc[(df['Value'] == 319) & (df['Char Count'] == 23)]
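Python evaluates `&` before `==`, so `df['Value']==319 & df['Char Count']==23` is read as the chained comparison `df['Value'] == (319 & df['Char Count']) == 23`. A chained comparison needs the truth value of an intermediate Series, which is what raises the ValueError. A minimal sketch on sample rows showing the error, the parenthesized fix, and an equivalent `DataFrame.query` string (backticks quote the column name containing a space):

```python
import pandas as pd

# Sample rows from outputs in this notebook.
df = pd.DataFrame(
    {"Char Count": [23, 24, 2],
     "Value": [319, 317, 2]},
    index=pd.Index(["reinstitutionalizations", "microspectrophotometries", "aa"],
                   name="Word"),
)

# Reproduce the original mistake: the chained comparison calls bool() on a Series.
try:
    df.loc[df["Value"] == 319 & df["Char Count"] == 23]
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Parenthesize each comparison before combining with &.
by_mask = df.loc[(df["Value"] == 319) & (df["Char Count"] == 23)]

# Equivalent query string.
by_query = df.query("Value == 319 and `Char Count` == 23")

print(by_mask.index.tolist())   # ['reinstitutionalizations']
print(by_query.index.tolist())  # ['reinstitutionalizations']
```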
What is the most common value?
In [46]:
df.loc[df["Value"]==274].sort_values(by="Char Count")
Out[46]:
Word | Char Count | Value |
---|---|---|
overprotectivenesses | 20 | 274 |
countercountermeasure | 21 | 274 |
psychophysiologically | 21 | 274 |
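The cell above filters on a single Value; finding the most common Value means counting how often each one occurs. A minimal sketch, assuming a short Series of Values taken from rows shown in this notebook (so its answer says nothing about the full dataset); in the notebook the same calls would be made on `df["Value"]`:

```python
import pandas as pd

# Values taken from rows shown in this notebook (Out[46], Out[64], Out[14]).
values = pd.Series([274, 274, 274, 260, 260, 260, 260, 260, 260, 2, 10], name="Value")

counts = values.value_counts()         # frequency of each Value, most frequent first
print(counts.idxmax(), counts.max())   # 260 6
print(values.mode().tolist())          # [260]  (mode() returns every tied value)
```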
What is the shortest word with a Value of 274?
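Out[46] answers this by sorting all Value 274 rows by Char Count. `DataFrame.nsmallest` returns only the shortest row(s), with `keep="all"` preserving ties. A minimal sketch on the rows from Out[46] plus one unrelated row:

```python
import pandas as pd

# Rows from Out[46] plus one row with a different Value.
df = pd.DataFrame(
    {"Char Count": [20, 21, 21, 2],
     "Value": [274, 274, 274, 2]},
    index=pd.Index(["overprotectivenesses", "countercountermeasure",
                    "psychophysiologically", "aa"], name="Word"),
)

# Filter to Value 274, then keep the row(s) with the smallest Char Count.
shortest = df.loc[df["Value"] == 274].nsmallest(1, "Char Count", keep="all")
print(shortest.index.tolist())  # ['overprotectivenesses']
```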
Create a column Ratio which represents the 'Value Ratio' of a word (its Value divided by its Char Count).
What is the maximum value of Ratio?
In [47]:
df.describe()
Out[47]:
 | Char Count | Value |
---|---|---|
count | 172821.000000 | 172821.000000 |
mean | 9.087628 | 107.754179 |
std | 2.818285 | 39.317452 |
min | 2.000000 | 2.000000 |
25% | 7.000000 | 80.000000 |
50% | 9.000000 | 103.000000 |
75% | 11.000000 | 131.000000 |
max | 28.000000 | 319.000000 |
In [48]:
df["Ratio"]=df["Value"]/df["Char Count"]
In [49]:
df
Out[49]:
Word | Char Count | Value | Ratio |
---|---|---|---|
aa | 2 | 2 | 1.000000 |
aah | 3 | 10 | 3.333333 |
aahed | 5 | 19 | 3.800000 |
aahing | 6 | 40 | 6.666667 |
aahs | 4 | 29 | 7.250000 |
... | ... | ... | ... |
zymotic | 7 | 111 | 15.857143 |
zymurgies | 9 | 143 | 15.888889 |
zymurgy | 7 | 135 | 19.285714 |
zyzzyva | 7 | 151 | 21.571429 |
zyzzyvas | 8 | 170 | 21.250000 |
172821 rows × 3 columns
Which word has the highest Ratio?
In [52]:
df["Ratio"].max()
Out[52]:
22.5
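`df["Ratio"].max()` returns the highest ratio (22.5) but not the word. `idxmax` returns the index label of the first maximum, and comparing against the maximum returns every tied word. A minimal sketch, assuming the sample rows from Out[49] rather than the full dataset (so it finds the best word within the sample only):

```python
import pandas as pd

# Sample rows from Out[49]; the full file's highest Ratio is 22.5.
df = pd.DataFrame(
    {"Char Count": [2, 3, 7, 7, 8],
     "Value": [2, 10, 135, 151, 170]},
    index=pd.Index(["aa", "aah", "zymurgy", "zyzzyva", "zyzzyvas"], name="Word"),
)
df["Ratio"] = df["Value"] / df["Char Count"]

best = df["Ratio"].idxmax()       # label of the first row with the highest Ratio
print(best)  # zyzzyva

# All words sharing the maximum Ratio, in case of ties.
tied = df.loc[df["Ratio"] == df["Ratio"].max()].index.tolist()
print(tied)  # ['zyzzyva']
```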
In [57]:
df.loc[df["Ratio"]==10]
Out[57]:
Word | Char Count | Value | Ratio |
---|---|---|---|
aardwolf | 8 | 80 | 10.0 |
abatements | 10 | 100 | 10.0 |
abducts | 7 | 70 | 10.0 |
abetment | 8 | 80 | 10.0 |
abettals | 8 | 80 | 10.0 |
... | ... | ... | ... |
ycleped | 7 | 70 | 10.0 |
yodeled | 7 | 70 | 10.0 |
zamia | 5 | 50 | 10.0 |
zebecs | 6 | 60 | 10.0 |
zwieback | 8 | 80 | 10.0 |
2604 rows × 3 columns
In [63]:
df.loc[df["Ratio"]==10].sort_values(by="Value",ascending=False)
Out[63]:
Word | Char Count | Value | Ratio |
---|---|---|---|
electrocardiographically | 24 | 240 | 10.0 |
electroencephalographies | 24 | 240 | 10.0 |
electroencephalographer | 23 | 230 | 10.0 |
phonocardiographic | 18 | 180 | 10.0 |
inconceivabilities | 18 | 180 | 10.0 |
... | ... | ... | ... |
web | 3 | 30 | 10.0 |
bug | 3 | 30 | 10.0 |
elm | 3 | 30 | 10.0 |
as | 2 | 20 | 10.0 |
oe | 2 | 20 | 10.0 |
2604 rows × 3 columns
In [64]:
df.loc[df["Value"]==260].sort_values(by="Char Count")
Out[64]:
Word | Char Count | Value | Ratio |
---|---|---|---|
hydroxytryptamine | 17 | 260 | 15.294118 |
neuropsychologists | 18 | 260 | 14.444444 |
psychophysiologist | 18 | 260 | 14.444444 |
revolutionarinesses | 19 | 260 | 13.684211 |
countermobilizations | 20 | 260 | 13.000000 |
underrepresentations | 20 | 260 | 13.000000 |
In [65]:
df["Char Count"].describe()
Out[65]:
count    172821.000000
mean          9.087628
std           2.818285
min           2.000000
25%           7.000000
50%           9.000000
75%          11.000000
max          28.000000
Name: Char Count, dtype: float64
How many words have a Ratio of 10?
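Out[57] answers this with the row count (2604 rows). The count can also be taken directly from the boolean mask, since `True` sums as 1. A minimal sketch on sample rows from this notebook; the integer-only comparison is an alternative that avoids testing floating-point ratios for equality:

```python
import pandas as pd

# Sample rows from Out[57] and Out[49]; not the full dataset.
df = pd.DataFrame(
    {"Char Count": [8, 10, 5, 2, 3],
     "Value": [80, 100, 50, 2, 10]},
    index=pd.Index(["aardwolf", "abatements", "zamia", "aa", "aah"], name="Word"),
)
df["Ratio"] = df["Value"] / df["Char Count"]

ratio_is_10 = df["Ratio"] == 10
print(ratio_is_10.sum())  # 3

# Integer-only equivalent: Value == 10 * Char Count, no float comparison needed.
int_count = (df["Value"] == 10 * df["Char Count"]).sum()
print(int_count)  # 3
```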
What is the maximum Value of all the words with a Ratio of 10?
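Out[63] reads this off the top of a descending sort (240). Selecting rows and a column in one `.loc` call returns the answer as a single number. A minimal sketch, assuming sample rows from this notebook:

```python
import pandas as pd

# Sample rows from Out[63] and Out[36]; not the full dataset.
df = pd.DataFrame(
    {"Char Count": [24, 23, 3, 23],
     "Value": [240, 230, 30, 319]},
    index=pd.Index(["electrocardiographically", "electroencephalographer",
                    "web", "reinstitutionalizations"], name="Word"),
)
df["Ratio"] = df["Value"] / df["Char Count"]

# Boolean mask for the rows, "Value" for the column, then reduce.
ratio10_values = df.loc[df["Ratio"] == 10, "Value"]
print(ratio10_values.max())     # 240
print(ratio10_values.idxmax())  # electrocardiographically
```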
Of those words with a Value of 260, what is the lowest Char Count found?
Based on the previous task, what word is it?
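Both of the last two questions can be answered from the Value 260 rows in Out[64]: `min()` gives the lowest Char Count and `idxmin()` gives the word that has it. A minimal sketch using those rows plus one unrelated row to show the filter at work:

```python
import pandas as pd

# Rows from Out[64] plus one row with a different Value.
df = pd.DataFrame(
    {"Char Count": [17, 18, 18, 19, 20, 20, 2],
     "Value": [260, 260, 260, 260, 260, 260, 2]},
    index=pd.Index(["hydroxytryptamine", "neuropsychologists", "psychophysiologist",
                    "revolutionarinesses", "countermobilizations",
                    "underrepresentations", "aa"], name="Word"),
)

# Char Count of every word whose Value is 260.
subset = df.loc[df["Value"] == 260, "Char Count"]
print(subset.min())     # 17
print(subset.idxmin())  # hydroxytryptamine
```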