Statement of Completion#a703ab06
Intro to Pandas for Data Analysis
easy
DataFrames practice: working with English Words
Project.ipynb
In [20]:
import pandas as pd
In [21]:
df = pd.read_csv('words.csv', index_col='Word')
In [22]:
df.head()
Out[22]:
Word | Char Count | Value |
---|---|---|
aa | 2 | 2 |
aah | 3 | 10 |
aahed | 5 | 19 |
aahing | 6 | 40 |
aahs | 4 | 29 |
Activities¶
How many elements does this dataframe have?¶
In [23]:
print(df.count())
Char Count    172821
Value         172821
dtype: int64
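Note that `count()` reports non-null entries per column, which equals the number of rows only when no values are missing. A minimal sketch of the more direct options, using a small stand-in frame (named `sample` here for illustration) built from the rows shown in `df.head()` rather than the real `words.csv`:

```python
import pandas as pd

# Stand-in for the real data: the five rows shown by df.head() above.
sample = pd.DataFrame(
    {"Char Count": [2, 3, 5, 6, 4], "Value": [2, 10, 19, 40, 29]},
    index=pd.Index(["aa", "aah", "aahed", "aahing", "aahs"], name="Word"),
)

n_rows, n_cols = sample.shape  # (rows, columns), independent of missing values
n_rows_alt = len(sample)       # rows only
n_cells = sample.size          # rows * columns

print(n_rows, n_cols, n_rows_alt, n_cells)
```

On the full dataframe, `df.shape` or `len(df)` would give the row count in a single number.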
What is the value of the word microspectrophotometries?¶
In [25]:
df.loc["microspectrophotometries"]
Out[25]:
Char Count     24
Value         317
Name: microspectrophotometries, dtype: int64
What is the highest possible value of a word?¶
In [26]:
df["Value"].max()
Out[26]:
319
Which of the following words have a Char Count of 15?¶
In [29]:
words = ["pinfish", "microbrew", "superheterodyne", "enfold", "glowing"]
for word in words:
    # Char Count matches len(word) in the rows shown above, so len() is used here
    if len(word) == 15:
        print(word)
superheterodyne
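The loop above relies on `len()` rather than on the dataframe itself. A sketch of the same check done against the `Char Count` column, using a stand-in frame (`sample`, a hypothetical name) whose counts are computed with `len()` as an assumption consistent with the rows shown earlier:

```python
import pandas as pd

words = ["pinfish", "microbrew", "superheterodyne", "enfold", "glowing"]

# Stand-in for the real data; "aa" is an extra row that is not a candidate.
all_words = words + ["aa"]
sample = pd.DataFrame(
    {"Char Count": [len(w) for w in all_words]},
    index=pd.Index(all_words, name="Word"),
)

# Keep rows that are both in the candidate list and 15 characters long.
mask = (sample["Char Count"] == 15) & sample.index.isin(words)
matches = sample[mask].index.tolist()

print(matches)
```

Using `isin` on the index avoids a `KeyError` if one of the candidate words is missing from the data.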
What is the highest possible length of a word?¶
In [30]:
df["Char Count"].max()
Out[30]:
28
What is the word with the value of 319?¶
In [38]:
df[df["Value"]==319]
Out[38]:
Word | Char Count | Value |
---|---|---|
reinstitutionalizations | 23 | 319 |
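Because `Word` is the index, the label of the maximum can also be read directly with `idxmax()`. A small sketch on a stand-in frame (`sample`) that uses two rows from the outputs above plus `aa`:

```python
import pandas as pd

# Stand-in rows taken from outputs shown earlier in this notebook.
sample = pd.DataFrame(
    {"Char Count": [23, 24, 2], "Value": [319, 317, 2]},
    index=pd.Index(
        ["reinstitutionalizations", "microspectrophotometries", "aa"],
        name="Word",
    ),
)

# idxmax returns the index label of the first maximum only.
top_word = sample["Value"].idxmax()

# A boolean filter keeps every row that ties for the maximum.
top_rows = sample[sample["Value"] == sample["Value"].max()]

print(top_word)
print(top_rows)
```

The filter form is the safer choice when several words could share the top value.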
What is the most common value?¶
In [43]:
df["Value"].value_counts().head()
Out[43]:
Value
93     1965
100    1921
95     1915
99     1907
92     1902
Name: count, dtype: int64
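`value_counts()` sorts by frequency, so the first entry is the most common value. A sketch of pulling that answer out programmatically, on a short made-up Series (the numbers below are illustrative, not taken from `words.csv`):

```python
import pandas as pd

# Made-up values for illustration only.
values = pd.Series([93, 100, 93, 95, 93, 100], name="Value")

counts = values.value_counts()
most_common = counts.idxmax()  # the value that occurs most often
how_often = counts.max()       # how many times it occurs
modes = values.mode().tolist() # every value tied for most common

print(most_common, how_often, modes)
```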
What is the shortest word with value 274?¶
In [45]:
df_specval = df[df["Value"]==274]
df_specval.sort_values("Char Count", ascending=True)
Out[45]:
Word | Char Count | Value |
---|---|---|
overprotectivenesses | 20 | 274 |
countercountermeasure | 21 | 274 |
psychophysiologically | 21 | 274 |
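Sorting works, but the shortest word can also be picked out without scanning the table by eye. A sketch using `idxmin()` on a stand-in frame (`sample`) holding the three rows from the output above:

```python
import pandas as pd

# Stand-in containing the three rows with Value 274 shown above.
sample = pd.DataFrame(
    {"Char Count": [20, 21, 21], "Value": [274, 274, 274]},
    index=pd.Index(
        ["overprotectivenesses", "countercountermeasure", "psychophysiologically"],
        name="Word",
    ),
)

subset = sample[sample["Value"] == 274]
shortest_word = subset["Char Count"].idxmin()  # label of the first minimum
shortest_len = subset["Char Count"].min()

print(shortest_word, shortest_len)
```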
Create a column Ratio which represents the 'Value Ratio' of a word¶
In [50]:
df.insert(2,"Ratio",df["Value"]/df["Char Count"])
df
Out[50]:
Word | Char Count | Value | Ratio |
---|---|---|---|
aa | 2 | 2 | 1.000000 |
aah | 3 | 10 | 3.333333 |
aahed | 5 | 19 | 3.800000 |
aahing | 6 | 40 | 6.666667 |
aahs | 4 | 29 | 7.250000 |
... | ... | ... | ... |
zymotic | 7 | 111 | 15.857143 |
zymurgies | 9 | 143 | 15.888889 |
zymurgy | 7 | 135 | 19.285714 |
zyzzyva | 7 | 151 | 21.571429 |
zyzzyvas | 8 | 170 | 21.250000 |
172821 rows × 3 columns
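One caveat with `df.insert`: running the cell a second time raises a `ValueError` because the column already exists. A sketch of the plain-assignment alternative, which can be re-run safely, on a stand-in frame (`sample`) built from the `df.head()` rows:

```python
import pandas as pd

# Stand-in for the real data: the five rows shown by df.head().
sample = pd.DataFrame(
    {"Char Count": [2, 3, 5, 6, 4], "Value": [2, 10, 19, 40, 29]},
    index=pd.Index(["aa", "aah", "aahed", "aahing", "aahs"], name="Word"),
)

# Assignment appends the column, or overwrites it if it already exists.
sample["Ratio"] = sample["Value"] / sample["Char Count"]

# insert() refuses to add a column name that is already present.
try:
    sample.insert(2, "Ratio", sample["Value"] / sample["Char Count"])
    insert_failed = False
except ValueError:
    insert_failed = True

print(sample)
print("insert raised ValueError:", insert_failed)
```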
What is the maximum value of Ratio?¶
In [52]:
df["Ratio"].describe()
Out[52]:
count    172821.000000
mean         11.825114
std           2.205677
min           1.000000
25%          10.400000
50%          11.875000
75%          13.285714
max          22.500000
Name: Ratio, dtype: float64
What word is the one with the highest Ratio?¶
In [54]:
df[df["Ratio"]==df["Ratio"].max()]
Out[54]:
Word | Char Count | Value | Ratio |
---|---|---|---|
xu | 2 | 45 | 22.5 |
How many words have a Ratio of 10?¶
In [57]:
df[df["Ratio"]==10].count()
Out[57]:
Char Count    2604
Value         2604
Ratio         2604
dtype: int64
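`Ratio` is a float column, and exact `==` comparisons on floats can be fragile in general. A sketch of two alternatives that should give the same count here: comparing the integer columns directly, or using `numpy.isclose`. The stand-in frame (`sample`) reuses four rows that appear in this notebook's outputs:

```python
import numpy as np
import pandas as pd

# Stand-in rows taken from outputs elsewhere in this notebook.
sample = pd.DataFrame(
    {"Char Count": [2, 3, 24, 2], "Value": [20, 30, 240, 45]},
    index=pd.Index(["as", "col", "electrocardiographically", "xu"], name="Word"),
)
sample["Ratio"] = sample["Value"] / sample["Char Count"]

# Integer comparison: Ratio is 10 exactly when Value == 10 * Char Count.
exact = sample["Value"] == 10 * sample["Char Count"]

# Tolerance-based comparison on the float column.
close = np.isclose(sample["Ratio"], 10)

n_exact = int(exact.sum())  # True counts as 1
n_close = int(close.sum())

print(n_exact, n_close)
```

Summing a boolean mask returns a single number, which is a little tidier than the per-column output of `count()`.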
What is the maximum Value of all the words with a Ratio of 10?¶
In [59]:
df[df["Ratio"]==10].sort_values("Value",ascending=False)
Out[59]:
Word | Char Count | Value | Ratio |
---|---|---|---|
electrocardiographically | 24 | 240 | 10.0 |
electroencephalographies | 24 | 240 | 10.0 |
electroencephalographer | 23 | 230 | 10.0 |
electrodesiccation | 18 | 180 | 10.0 |
phonocardiographic | 18 | 180 | 10.0 |
... | ... | ... | ... |
col | 3 | 30 | 10.0 |
bis | 3 | 30 | 10.0 |
sib | 3 | 30 | 10.0 |
as | 2 | 20 | 10.0 |
oe | 2 | 20 | 10.0 |
2604 rows × 3 columns
Of those words with a Value of 260, what is the lowest Char Count found?¶
In [64]:
df[df["Value"]==260].sort_values("Char Count", ascending=True)
Out[64]:
Word | Char Count | Value | Ratio |
---|---|---|---|
hydroxytryptamine | 17 | 260 | 15.294118 |
neuropsychologists | 18 | 260 | 14.444444 |
psychophysiologist | 18 | 260 | 14.444444 |
revolutionarinesses | 19 | 260 | 13.684211 |
countermobilizations | 20 | 260 | 13.000000 |
underrepresentations | 20 | 260 | 13.000000 |
Based on the previous task, what word is it?¶
In [66]:
df.loc[df["Value"]==260, "Char Count"].min()
Out[66]:
17
In [67]:
df.loc[df["Char Count"]==17 & df["Value"]==260, "Word"]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_17/738365183.py in ?()
----> 1 df.loc[df["Char Count"]==17 & df["Value"]==260, "Word"]

/usr/local/lib/python3.11/site-packages/pandas/core/generic.py in ?(self)
   1575     @final
   1576     def __nonzero__(self) -> NoReturn:
-> 1577         raise ValueError(
   1578             f"The truth value of a {type(self).__name__} is ambiguous. "
   1579             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1580         )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
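The error comes from operator precedence: `&` binds more tightly than `==`, so Python evaluates `17 & df["Value"]` first and then a chained comparison, which asks for the truth value of a Series. Each comparison needs its own parentheses. In addition, `Word` is the index (set through `index_col='Word'`), not a column, so the word is read from `.index` rather than selected as `"Word"`. A sketch of the corrected lookup on a stand-in frame (`sample`) holding three rows from the Value 260 table above:

```python
import pandas as pd

# Stand-in containing three of the Value == 260 rows shown earlier.
sample = pd.DataFrame(
    {"Char Count": [17, 18, 18], "Value": [260, 260, 260]},
    index=pd.Index(
        ["hydroxytryptamine", "neuropsychologists", "psychophysiologist"],
        name="Word",
    ),
)

# Parenthesize each comparison before combining with &.
mask = (sample["Char Count"] == 17) & (sample["Value"] == 260)

# Word is the index, so read the labels from .index.
words_found = sample[mask].index.tolist()

print(words_found)
```

On the full dataframe the same pattern would be `df[(df["Char Count"] == 17) & (df["Value"] == 260)].index`.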