In [61]:

import pandas as pd

In [62]:

df = pd.read_csv('words.csv', index_col='Word')

In [63]:

df.head()

Out[63]:

	Char Count	Value
Word
aa	2	2
aah	3	10
aahed	5	19
aahing	6	40
aahs	4	29
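The file does not say how `Value` is computed. The rows above look consistent with summing each letter's position in the alphabet (a=1 ... z=26), but that is an inference from the data shown, not a documented rule. A minimal sketch to check it against the `head()` rows, where `letter_value` is a hypothetical helper:

```python
import string

# Assumption: Value = sum of alphabet positions of the letters (a=1 ... z=26).
LETTER_SCORE = {ch: i for i, ch in enumerate(string.ascii_lowercase, start=1)}


def letter_value(word: str) -> int:
    """Hypothetical helper: sum the alphabet positions of the letters in `word`."""
    return sum(LETTER_SCORE[ch] for ch in word.lower())


# Rows copied from df.head() above: (Word, Char Count, Value)
head_rows = [
    ("aa", 2, 2),
    ("aah", 3, 10),
    ("aahed", 5, 19),
    ("aahing", 6, 40),
    ("aahs", 4, 29),
]

# True for a row when both the length and the letter sum agree with the table
matches = [len(w) == n and letter_value(w) == v for w, n, v in head_rows]
print(all(matches))
```

If the assumption holds for the whole file, both columns could be recomputed from the index alone.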

Activities

How many elements does this dataframe have?

In [64]:

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 172821 entries, aa to zyzzyvas
Data columns (total 2 columns):
 #   Column      Non-Null Count   Dtype
---  ------      --------------   -----
 0   Char Count  172821 non-null  int64
 1   Value       172821 non-null  int64
dtypes: int64(2)
memory usage: 4.0+ MB

In [65]:

df.shape

Out[65]:

(172821, 2)
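"Elements" can be read as rows or as individual cells. A minimal sketch of the three common counts, using a small stand-in frame built from the `head()` rows above rather than the full `words.csv`:

```python
import pandas as pd

# Stand-in for words.csv, built from the df.head() rows shown earlier.
sample = pd.DataFrame(
    {"Char Count": [2, 3, 5, 6, 4], "Value": [2, 10, 19, 40, 29]},
    index=pd.Index(["aa", "aah", "aahed", "aahing", "aahs"], name="Word"),
)

n_rows, n_cols = sample.shape  # (rows, columns)
n_rows_alt = len(sample)       # rows only
n_cells = sample.size          # rows * columns

print(n_rows, n_cols, n_rows_alt, n_cells)
```

For the activity, the row count (the first entry of `shape`) is the number of words.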

What is the value of the word `microspectrophotometries`?

In [66]:

df.loc["microspectrophotometries"]

Out[66]:

Char Count     24
Value         317
Name: microspectrophotometries, dtype: int64

In [67]:

df.loc["microspectrophotometries", "Value"]  # df.loc[row_label, column_label]

Out[67]:

317

What is the highest possible value of a word?

In [68]:

df['Value'].max()

Out[68]:

319

In [69]:

df.describe()

Out[69]:

	Char Count	Value
count	172821.000000	172821.000000
mean	9.087628	107.754179
std	2.818285	39.317452
min	2.000000	2.000000
25%	7.000000	80.000000
50%	9.000000	103.000000
75%	11.000000	131.000000
max	28.000000	319.000000

Which of the following words have a Char Count of `15`?

In [70]:

df[df['Char Count'] == 15]

Out[70]:

	Char Count	Value
Word
absorbabilities	15	143
abstractionisms	15	182
abstractionists	15	189
acanthocephalan	15	122
acceptabilities	15	134
...	...	...
worthlessnesses	15	220
wrongheadedness	15	161
xerographically	15	174
xeroradiography	15	184
zoogeographical	15	158

3192 rows × 2 columns

In [71]:

df.loc[[
    "superheterodyne",
    "microbrew",
    "enfold",
    "glowing",
    "pinfish"
]]

Out[71]:

	Char Count	Value
Word
superheterodyne	15	198
microbrew	9	106
enfold	6	56
glowing	7	87
pinfish	7	81

In [72]:

df.loc[[
    "superheterodyne",
    "microbrew",
    "enfold",
    "glowing",
    "pinfish"
]].values

Out[72]:

array([[ 15, 198],
       [  9, 106],
       [  6,  56],
       [  7,  87],
       [  7,  81]])

In [73]:

df.loc[[
    "superheterodyne",
    "microbrew",
    "enfold",
    "glowing",
    "pinfish"
],"Value"]

Out[73]:

Word
superheterodyne    198
microbrew          106
enfold              56
glowing             87
pinfish             81
Name: Value, dtype: int64
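The lookups above show the candidates' rows, leaving the reader to pick out the `15` by eye. A minimal sketch that filters the candidate list directly, using a stand-in frame built from the Out[71] rows:

```python
import pandas as pd

# Stand-in frame built from the Out[71] rows above.
sample = pd.DataFrame(
    {"Char Count": [15, 9, 6, 7, 7], "Value": [198, 106, 56, 87, 81]},
    index=pd.Index(
        ["superheterodyne", "microbrew", "enfold", "glowing", "pinfish"],
        name="Word",
    ),
)

candidates = ["superheterodyne", "microbrew", "enfold", "glowing", "pinfish"]

# Select only the candidates, then keep those whose Char Count is 15
char_counts = sample.loc[candidates, "Char Count"]
with_15 = char_counts[char_counts == 15].index.tolist()

print(with_15)
```

Note that `.loc` with a list raises `KeyError` if any candidate is missing from the index.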

What is the highest possible length of a word?

In [74]:

df['Char Count'].max()

Out[74]:

28

What is the word with the value of `319`?

In [75]:

df[df['Value'] == 319]

Out[75]:

	Char Count	Value
Word
reinstitutionalizations	23	319

In [76]:

df.sort_values(by=['Value'],ascending=False)

Out[76]:

	Char Count	Value
Word
reinstitutionalizations	23	319
microspectrophotometries	24	317
microspectrophotometry	22	309
microspectrophotometers	23	308
immunoelectrophoretically	25	307
...	...	...
aba	3	4
baa	3	4
ba	2	3
ab	2	3
aa	2	2

172821 rows × 2 columns

In [77]:

df.loc[df['Value']==319]

Out[77]:

	Char Count	Value
Word
reinstitutionalizations	23	319
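Because `319` is the maximum of `Value`, the word can also be found without typing the number. A minimal sketch with `idxmax` and `nlargest`, on a stand-in frame built from rows of the sorted output above:

```python
import pandas as pd

# Stand-in frame built from rows of the sort_values output above.
sample = pd.DataFrame(
    {"Char Count": [23, 24, 22, 2], "Value": [319, 317, 309, 2]},
    index=pd.Index(
        [
            "reinstitutionalizations",
            "microspectrophotometries",
            "microspectrophotometry",
            "aa",
        ],
        name="Word",
    ),
)

# idxmax returns the index label of the first row holding the maximum
top_word = sample["Value"].idxmax()

# nlargest returns the top-n rows without sorting the whole frame
top_rows = sample.nlargest(2, "Value")

print(top_word)
print(top_rows)
```

`idxmax` reports only the first label on ties, so the boolean filter used above remains the safer choice when several words may share the maximum.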

What is the most common value?

In [78]:

df.describe()

Out[78]:

	Char Count	Value
count	172821.000000	172821.000000
mean	9.087628	107.754179
std	2.818285	39.317452
min	2.000000	2.000000
25%	7.000000	80.000000
50%	9.000000	103.000000
75%	11.000000	131.000000
max	28.000000	319.000000

In [79]:

df.mode()

Out[79]:

	Char Count	Value
0	8	93

In [80]:

df["Value"].value_counts().head()

Out[80]:

Value
93     1965
100    1921
95     1915
99     1907
92     1902
Name: count, dtype: int64
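`df.mode()` computes the mode of each column independently, so its single row (`8`, `93`) is not a word from the dataset. To get just the most common `Value` and its frequency, a minimal sketch on a stand-in frame (three rows from the dataset with value `93`, plus two rows from `head()`):

```python
import pandas as pd

# Stand-in frame: three words with Value 93 plus two rows from df.head().
sample = pd.DataFrame(
    {"Char Count": [10, 9, 9, 2, 3], "Value": [93, 93, 93, 2, 10]},
    index=pd.Index(
        ["abandoners", "ablations", "aboiteaus", "aa", "aah"], name="Word"
    ),
)

# Series.mode returns a Series because several values can tie for most common
most_common = sample["Value"].mode()

# value_counts gives the frequency of every value
counts = sample["Value"].value_counts()
top_value = counts.idxmax()  # the value that occurs most often
top_count = counts.max()     # how many times it occurs

print(most_common.tolist(), top_value, top_count)
```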

In [81]:

df.loc[df['Value']==93].head()

Out[81]:

	Char Count	Value
Word
abandoners	10	93
ablations	9	93
aboiteaus	9	93
abridgment	10	93
abstracted	10	93

In [82]:

df.loc[df['Value']==93].sample(10)

Out[82]:

	Char Count	Value
Word
crispened	9	93
befriending	11	93
nargilehs	9	93
abstracted	10	93
bobsledding	11	93
unplait	7	93
epoxies	7	93
completed	9	93
demerits	8	93
hexyls	6	93
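`sample` draws rows at random, so rerunning the cell above will generally show different words. Passing `random_state` makes the draw repeatable. A minimal sketch on a stand-in frame built from the Out[81] rows; the seed `0` is arbitrary:

```python
import pandas as pd

# Stand-in frame built from the Out[81] rows above.
words_93 = pd.DataFrame(
    {"Char Count": [10, 9, 9, 10, 10], "Value": [93, 93, 93, 93, 93]},
    index=pd.Index(
        ["abandoners", "ablations", "aboiteaus", "abridgment", "abstracted"],
        name="Word",
    ),
)

# The same seed yields the same rows on every call
first_draw = words_93.sample(3, random_state=0)
second_draw = words_93.sample(3, random_state=0)

print(first_draw.index.tolist() == second_draw.index.tolist())
```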

What is the shortest word with value `274`?

In [83]:

df.loc[df['Value']==274]

Out[83]:

	Char Count	Value
Word
countercountermeasure	21	274
overprotectivenesses	20	274
psychophysiologically	21	274

In [84]:

df.loc[df['Value']==274].sort_values(by=['Char Count'])

Out[84]:

	Char Count	Value
Word
overprotectivenesses	20	274
countercountermeasure	21	274
psychophysiologically	21	274

In [85]:

df.loc[
    (df['Value'] == 274) &
    (df['Char Count'] == df.loc[df['Value'] == 274, "Char Count"].min())
]

Out[85]:

	Char Count	Value
Word
overprotectivenesses	20	274
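The combined mask above can be written more compactly. A minimal sketch with `idxmin` and `nsmallest`, on a stand-in frame built from the Out[83] rows:

```python
import pandas as pd

# Stand-in frame built from the Out[83] rows above.
sample = pd.DataFrame(
    {"Char Count": [21, 20, 21], "Value": [274, 274, 274]},
    index=pd.Index(
        [
            "countercountermeasure",
            "overprotectivenesses",
            "psychophysiologically",
        ],
        name="Word",
    ),
)

subset = sample.loc[sample["Value"] == 274]

# idxmin returns the label of the first row with the smallest Char Count
shortest_word = subset["Char Count"].idxmin()

# keep="all" retains every row tied for the smallest Char Count
shortest_all = subset.nsmallest(1, "Char Count", keep="all")

print(shortest_word)
print(shortest_all)
```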

Create a column `Ratio` which represents the 'Value Ratio' of a word

In [86]:

df["Ratio"] = df["Value"] / df["Char Count"]

In [87]:

df.head()

Out[87]:

	Char Count	Value	Ratio
Word
aa	2	2	1.000000
aah	3	10	3.333333
aahed	5	19	3.800000
aahing	6	40	6.666667
aahs	4	29	7.250000
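Assigning to `df["Ratio"]` modifies `df` in place. If the original frame should stay untouched, `assign` returns a new frame instead. A minimal sketch on a stand-in frame built from the `head()` rows:

```python
import pandas as pd

# Stand-in frame built from the df.head() rows shown earlier.
sample = pd.DataFrame(
    {"Char Count": [2, 3, 5, 6, 4], "Value": [2, 10, 19, 40, 29]},
    index=pd.Index(["aa", "aah", "aahed", "aahing", "aahs"], name="Word"),
)

# assign returns a copy with the extra column; `sample` itself is unchanged
with_ratio = sample.assign(Ratio=lambda d: d["Value"] / d["Char Count"])

print(with_ratio)
```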

What is the maximum value of `Ratio`?

In [88]:

df["Ratio"].max()

Out[88]:

22.5

What word is the one with the highest `Ratio`?

In [89]:

df.loc[df['Ratio']==df["Ratio"].max()]

Out[89]:

	Char Count	Value	Ratio
Word
xu	2	45	22.5

How many words have a `Ratio` of `10`?

In [107]:

df.loc[df['Ratio']==10.0].shape

Out[107]:

(2604, 3)

In [109]:

df.query("Ratio==10.0").shape

Out[109]:

(2604, 3)
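Comparing floats with `==` works here because a quotient that is exactly `10` is represented exactly. For targets that are not exactly representable, an integer comparison or a tolerance is more robust. A minimal sketch of both, on a stand-in frame mixing two rows with ratio `10` and two rows from `head()`:

```python
import numpy as np
import pandas as pd

# Stand-in frame: two rows with Ratio 10 and two rows from df.head().
sample = pd.DataFrame(
    {"Char Count": [18, 18, 3, 4], "Value": [180, 180, 10, 29]},
    index=pd.Index(
        ["electrodesiccation", "phonocardiographic", "aah", "aahs"],
        name="Word",
    ),
)
sample["Ratio"] = sample["Value"] / sample["Char Count"]

# Integer comparison avoids floating point entirely
count_exact = int((sample["Value"] == 10 * sample["Char Count"]).sum())

# np.isclose compares within a small tolerance
count_close = int(np.isclose(sample["Ratio"], 10.0).sum())

print(count_exact, count_close)
```

Summing a boolean mask counts the `True` entries, which gives the number of matching words directly instead of reading it from `shape`.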

What is the maximum `Value` of all the words with a `Ratio` of `10`?

In [96]:

df.loc[df['Ratio']==10.0].sort_values(by=['Value'],ascending=False).head()

Out[96]:

	Char Count	Value	Ratio
Word
electrocardiographically	24	240	10.0
electroencephalographies	24	240	10.0
electroencephalographer	23	230	10.0
electrodesiccation	18	180	10.0
phonocardiographic	18	180	10.0
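The sorted table shows the answer in its first row. To get the number itself, select the `Value` column under the same mask and take its maximum. A minimal sketch on a stand-in frame built from the Out[96] rows plus one row with a different ratio:

```python
import pandas as pd

# Stand-in frame: rows from Out[96] plus one row whose Ratio is not 10.
sample = pd.DataFrame(
    {"Char Count": [24, 24, 23, 18, 17], "Value": [240, 240, 230, 180, 260]},
    index=pd.Index(
        [
            "electrocardiographically",
            "electroencephalographies",
            "electroencephalographer",
            "electrodesiccation",
            "hydroxytryptamine",
        ],
        name="Word",
    ),
)
sample["Ratio"] = sample["Value"] / sample["Char Count"]

# Row mask and column label in one .loc call, then reduce to a scalar
max_value = sample.loc[sample["Ratio"] == 10.0, "Value"].max()

print(max_value)
```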

Of those words with a `Value` of `260`, what is the lowest `Char Count` found?

In [99]:

df.loc[df["Value"]==260].sort_values(by=['Char Count']).head()

Out[99]:

	Char Count	Value	Ratio
Word
hydroxytryptamine	17	260	15.294118
neuropsychologists	18	260	14.444444
psychophysiologist	18	260	14.444444
revolutionarinesses	19	260	13.684211
countermobilizations	20	260	13.000000

Based on the previous task, what word is it?

In [105]:

df.loc[(df["Value"]==260) & (df["Char Count"]==17)]

Out[105]:

	Char Count	Value	Ratio
Word
hydroxytryptamine	17	260	15.294118
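The cell above hard-codes `17`, which was read off the previous output. A minimal sketch that computes the lowest `Char Count` and the matching word in one pass, on a stand-in frame built from the Out[99] rows:

```python
import pandas as pd

# Stand-in frame built from the Out[99] rows above.
sample = pd.DataFrame(
    {"Char Count": [17, 18, 18, 19, 20], "Value": [260, 260, 260, 260, 260]},
    index=pd.Index(
        [
            "hydroxytryptamine",
            "neuropsychologists",
            "psychophysiologist",
            "revolutionarinesses",
            "countermobilizations",
        ],
        name="Word",
    ),
)

# Char Count of every word whose Value is 260
lengths = sample.loc[sample["Value"] == 260, "Char Count"]

lowest = lengths.min()                                # smallest Char Count
words = lengths[lengths == lowest].index.tolist()     # all words with it

print(lowest, words)
```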

Statement of Completion#478cc12c

Intro to Pandas for Data Analysis

DataFrames practice: working with English Words
