Statement of Completion#56f8a0a5
Intro to Pandas for Data Analysis
medium
How Much Do You Know About Anime?: Filtering, Selection and Sorting
Resolution
In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
df = pd.read_csv('anime_dataset.csv')
In [14]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7797 entries, 0 to 7796
Data columns (total 9 columns):
 #   Column                  Non-Null Count  Dtype
---  ------                  --------------  -----
 0   Title                   7797 non-null   object
 1   Genres                  7797 non-null   object
 2   Rank                    7797 non-null   int64
 3   Popularity              7797 non-null   int64
 4   Score                   7797 non-null   float64
 5   Episodes                7797 non-null   int64
 6   Release Date            3210 non-null   object
 7   Episode_Length(In min)  7797 non-null   int64
 8   Release_Year            7797 non-null   int64
dtypes: float64(1), int64(5), object(3)
memory usage: 548.4+ KB
Activities
Note: Please do not use string handling for solving activities 🙏. Solve them only with the specified methods.
Selection with .iloc[]
1) Select the first 7 rows of the dataset
In [3]:
first_seven_rows = df.iloc[:7]
2) Select the last nine records
In [5]:
last_nine_rows = df.iloc[-9:]
3) Select the Release_Year column
In [9]:
year_df = df.iloc[:,[8]]
Validate that your year_df variable is a DataFrame
In [11]:
type(year_df)
Out[11]:
pandas.core.frame.DataFrame
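The check returns a DataFrame because the column position was passed as a list. A minimal sketch of the difference, assuming a made-up two-column frame (`toy` and its values are illustrative and not taken from the anime dataset):

```python
import pandas as pd

# Toy frame standing in for the anime dataset (values are made up).
toy = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Release_Year": [2009, 2018, 2022],
})

as_frame = toy.iloc[:, [1]]   # list of positions -> one-column DataFrame
as_series = toy.iloc[:, 1]    # scalar position   -> Series

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series
```

The same rule applies to `.loc[]`: a list of labels keeps the result two-dimensional, a single label does not.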
4) Select rows 3 to 6 and extract only the Title and Genres columns
In [17]:
selected_rows_cols = df.iloc[2:6,[0,1]]
Selection with .loc[]
5) Select the Episodes column
In [22]:
episodes_df = df.loc[:,['Episodes']]
Validate that your episodes_df variable is a DataFrame
In [24]:
type(episodes_df)
Out[24]:
pandas.core.frame.DataFrame
6) Select the popularity scores of anime from rows 200 to 300, inclusive
In [25]:
pop_200_300_df = df.loc[199:299,['Popularity']]
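The slice 199:299 relies on two facts: the default RangeIndex starts at 0, so the 200th row has label 199, and `.loc[]` includes both endpoints of a label slice. The sketch below contrasts this with `.iloc[]`, which excludes the stop position. The frame `toy` is a hypothetical stand-in, not the anime dataset:

```python
import pandas as pd

# Ten made-up popularity values with default labels 0..9.
toy = pd.DataFrame({"Popularity": range(10, 20)})

by_label = toy.loc[2:5, ["Popularity"]]   # labels 2, 3, 4, 5 (stop included)
by_position = toy.iloc[2:5, [0]]          # positions 2, 3, 4 (stop excluded)

print(len(by_label))      # 4
print(len(by_position))   # 3
```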
What Is Special About the Year 2018?
The bar chart below shows that more anime were released in 2018 than in any other year.
In [ ]:
# Run the below cell to see the bar chart
In [27]:
year_counts = df[df['Release_Year'] != 0]['Release_Year'].value_counts().sort_index()
plt.figure(figsize=(10, 6), facecolor='white', edgecolor='black')
year_counts.plot(kind='bar', color='skyblue')
plt.title('Anime Releases by Year')
plt.xlabel('Release Year')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
7) Select the anime that were released in the year 2018 and store them in released_2018
In [29]:
released_2018 = df.loc[df['Release_Year']==2018]
Just For Exploration
Hey, did you know? In spring 2022, a bunch of new anime came out. Guess what genre most of them were? Comedy and romance! Maybe it's because spring gets people feeling romantic, who knows? But here's the cool part: I found out by making a bar chart. Shh, it's our little secret!
In [ ]:
# Run the below cell to see the plot about genres distribution for spring 2022
In [31]:
genre_counts = df[df['Release Date'] == 'Spring 2022']['Genres'].value_counts()
plt.figure(figsize=(10, 6))
sns.barplot(x=genre_counts.index, y=genre_counts.values, color='#FFF225')
plt.title('Genres Distribution for Spring 2022 Anime')
plt.xlabel('Genres')
plt.ylabel('Count')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
8) How many anime were released in Spring 2022?
In [35]:
df.loc[df['Release Date']=='Spring 2022'].shape
Out[35]:
(21, 9)
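The first element of the shape tuple is the row count, which answers the question. As a sketch of two equivalent ways to get that single number, assuming a made-up frame named `toy` (not the anime dataset):

```python
import pandas as pd

# Hypothetical release seasons for four titles.
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Release Date": ["Spring 2022", "Fall 2021", "Spring 2022", "Winter 2022"],
})

mask = toy["Release Date"] == "Spring 2022"

count_from_shape = toy.loc[mask].shape[0]  # rows in the filtered frame
count_from_sum = int(mask.sum())           # True counts as 1, False as 0

print(count_from_shape, count_from_sum)    # 2 2
```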
9) Which of the following anime have a single episode?
In [40]:
option_list = ['Gintama: The Final', 'Fullmetal Alchemist: Brotherhood', 'Violet Evergarden Movie', 'Chiisana Kyojin Microman']
option_series = pd.Series(option_list)
option_series.isin(df.loc[df['Episodes']==1,'Title'])
Out[40]:
0     True
1    False
2     True
3    False
dtype: bool
10) Select anime for which the Release Date is missing
In [41]:
missing_release_date = df.loc[df['Release Date'].isna()]
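`isna()` marks both `None` and `NaN` as missing, and `notna()` gives the complementary mask. A small sketch under the assumption of a three-row frame `toy` with invented values:

```python
import pandas as pd

# Hypothetical data: one real season, one None, one NaN.
toy = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Release Date": ["Spring 2022", None, float("nan")],
})

missing = toy.loc[toy["Release Date"].isna()]    # rows without a release date
present = toy.loc[toy["Release Date"].notna()]   # rows with a release date

print(missing["Title"].tolist())   # ['B', 'C']
print(present["Title"].tolist())   # ['A']
```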
11) Select the anime whose Genres value is exactly Action,Comedy,Sci-Fi; select only the Title and Score columns
In [43]:
specific_genre_df = df.loc[df['Genres']=='Action,Comedy,Sci-Fi',['Title','Score']]
12) What is the popularity value that marks the top 1% of anime?
In [46]:
df.shape
Out[46]:
(7797, 9)
In [48]:
df['Popularity'].quantile(0.99)
Out[48]:
14102.0
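`quantile(0.99)` returns the value below which roughly 99% of the observations fall, so about 1% of the rows lie above it. The sketch below illustrates this on a made-up Series of 101 evenly spaced values (`popularity` here is hypothetical and unrelated to the dataset column):

```python
import pandas as pd

# 101 made-up values: 1, 2, ..., 101.
popularity = pd.Series(range(1, 102))

cutoff = popularity.quantile(0.99)           # linear interpolation by default
share_above = (popularity > cutoff).mean()   # fraction of values above the cutoff

print(cutoff, share_above)
```

With only 101 values the share above the cutoff is close to, but not exactly, 1%; on larger data the approximation tightens.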
Top 1% Anime By Popularity
Popularity values are concentrated around the median of 5000, but some anime far exceed it: the top 1% have a popularity score of about 14,000 or higher.
Here is the distribution of the popularity of anime, with a reference line at the value 14,000.
In [ ]:
# Run the below cell for distribution
In [49]:
sns.boxplot(x="Popularity", data=df, showmeans=True, color='#33FF6B')
plt.axvline(x=14000, color='red', linestyle='--', label='Reference Line')
plt.xlabel("Popularity")
plt.ylabel("Data")
plt.title("Boxplot of Popularity")
plt.legend()
plt.show()
13) Select the anime that have a popularity score greater than 14000
In [50]:
high_popularity_df = df.loc[df['Popularity']>14000]
14) Select the anime that have an episode length less than or equal to 5
In [55]:
less_episode_length_df = df.loc[df['Episode_Length(In min)']<=5]
15) What percentage of anime episodes last half an hour or more?
In [57]:
df.loc[df['Episode_Length(In min)']>=30].shape[0]/(df.shape[0])*100
Out[57]:
24.624855713736054
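The percentage is the number of matching rows divided by the total. Because the mean of a boolean mask is the fraction of True values, the same number can be obtained directly from the mask. A sketch with four invented episode lengths:

```python
import pandas as pd

# Hypothetical episode lengths in minutes.
toy = pd.DataFrame({"Episode_Length(In min)": [5, 24, 30, 90]})

mask = toy["Episode_Length(In min)"] >= 30

pct_from_counts = toy.loc[mask].shape[0] / toy.shape[0] * 100
pct_from_mean = mask.mean() * 100   # mean of True/False is the matching fraction

print(pct_from_counts, pct_from_mean)   # 50.0 50.0
```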
16) Select the anime that were released after 2020; select only the Title and Release_Year columns
In [58]:
filtered_year = df.loc[df['Release_Year']>2020,['Title','Release_Year']]
Beginner Query Activities
17) What are the anime that were released in 2015?
In [60]:
anime_2015 = df.query('Release_Year == 2015')
Do you know?
The most common genre combination in this dataset is Comedy,Slice of Life, which accounts for 5.4 percent of the anime.
In [ ]:
# Run the below cell to see the pie chart
In [62]:
genre_counts = df['Genres'].value_counts()
total = genre_counts.sum()
genre_percentages = genre_counts / total
others_threshold = 0.02
main_genres = genre_counts[genre_counts / genre_counts.sum() >= others_threshold]
main_genres['Others'] = genre_counts[genre_counts / genre_counts.sum() < others_threshold].sum()
colors = ['#ff9999', '#66b3ff', '#99ff99', '#ffcc99', '#c2c2f0', '#ffb3e6']
plt.figure(figsize=(8, 8))
plt.pie(main_genres, labels=main_genres.index, autopct='%1.1f%%', startangle=140, colors=colors)
plt.title('Genre Distribution')
plt.axis('equal')
plt.show()
18) Filter the dataset to select the anime that belong to the genre Comedy,Slice of Life
In [66]:
comedy_slice_of_life_df = df.query("Genres == 'Comedy,Slice of Life'")
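Two `query()` features are useful with this kind of dataset: column names containing spaces can be wrapped in backticks, and local Python variables can be referenced with `@`. A sketch on an invented frame (`toy` and `target_year` are illustrative names):

```python
import pandas as pd

# Hypothetical rows; "Release Date" contains a space in its name.
toy = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Genres": ["Comedy,Slice of Life", "Action", "Comedy,Slice of Life"],
    "Release Date": ["Spring 2022", "Fall 2011", "Fall 2011"],
    "Release_Year": [2022, 2011, 2011],
})

target_year = 2011

by_variable = toy.query("Release_Year == @target_year")           # @ reads a local variable
by_backtick = toy.query("`Release Date` == 'Spring 2022'")        # backticks quote the name

print(by_variable["Title"].tolist())   # ['B', 'C']
print(by_backtick["Title"].tolist())   # ['A']
```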
19) How many anime consist of precisely ten episodes?
In [68]:
df.query('Episodes == 10')
Out[68]:
| | Title | Genres | Rank | Popularity | Score | Episodes | Release Date | Episode_Length(In min) | Release_Year |
---|---|---|---|---|---|---|---|---|---|
13 | Mushishi Zoku Shou 2nd Season | Adventure,Fantasy,Mystery,Slice of Life,Supern... | 41 | 827 | 8.74 | 10 | Fall 2014 | 23 | 2014 |
16 | Mushishi Zoku Shou | Adventure,Fantasy,Mystery,Slice of Life,Supern... | 47 | 731 | 8.71 | 10 | Spring 2014 | 24 | 2014 |
33 | Hellsing Ultimate | Action,Horror,Supernatural | 191 | 165 | 8.36 | 10 | NaN | 49 | 0 |
36 | Kotarou wa Hitorigurashi | Comedy,Slice of Life | 188 | 2262 | 8.37 | 10 | NaN | 27 | 0 |
127 | Grisaia no Rakuen | Drama,Romance | 898 | 498 | 7.79 | 10 | Spring 2015 | 23 | 2015 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7508 | Kono Subarashii Sekai ni Shukufuku wo! | Adventure,Comedy,Fantasy | 418 | 37 | 8.12 | 10 | Winter 2016 | 23 | 2016 |
7617 | Wu Liuqi Zhi Zui Qiang Fa Xing Shi | Action,Comedy,Drama,Mystery | 277 | 2716 | 8.25 | 10 | NaN | 16 | 0 |
7632 | Kono Subarashii Sekai ni Shukufuku wo! 2 | Adventure,Comedy,Fantasy | 259 | 75 | 8.28 | 10 | Winter 2017 | 23 | 2017 |
7689 | Wu Liuqi Zhi Xuanwu Guo Pian | Action,Adventure,Comedy,Drama,Mystery | 140 | 3228 | 8.45 | 10 | NaN | 16 | 0 |
7789 | Shingeki no Kyojin Season 3 Part 2 | Action,Drama | 3 | 31 | 9.08 | 10 | Spring 2019 | 23 | 2019 |
112 rows × 9 columns
20) Filter the dataset on the given condition
In [69]:
episodes_less_than_five = df.query('Episodes < 5')
21) Filter the dataset on the given condition
In [72]:
high_score_anime = df.query('Score >= 9')
22) Which of the following anime are in the top 10 based on Rank?
In [75]:
df.sort_values('Rank').head(10)
Out[75]:
| | Title | Genres | Rank | Popularity | Score | Episodes | Release Date | Episode_Length(In min) | Release_Year |
---|---|---|---|---|---|---|---|---|---|
7790 | Fullmetal Alchemist: Brotherhood | Action,Adventure,Drama,Fantasy | 1 | 3 | 9.14 | 64 | Spring 2009 | 24 | 2009 |
7788 | Spy x Family | Action,Comedy | 2 | 255 | 9.08 | 12 | Spring 2022 | 24 | 2022 |
7789 | Shingeki no Kyojin Season 3 Part 2 | Action,Drama | 3 | 31 | 9.08 | 10 | Spring 2019 | 23 | 2019 |
7794 | Steins;Gate | Drama,Sci-Fi,Suspense | 4 | 13 | 9.08 | 24 | Spring 2011 | 24 | 2011 |
7776 | Gintama° | Action,Comedy,Sci-Fi | 5 | 338 | 9.08 | 51 | Spring 2015 | 24 | 2015 |
7779 | Gintama' | Action,Comedy,Sci-Fi | 6 | 382 | 9.05 | 51 | Spring 2011 | 24 | 2011 |
7775 | Gintama: The Final | Action,Comedy,Drama,Sci-Fi | 7 | 1735 | 9.05 | 1 | NaN | 104 | 0 |
7774 | Hunter x Hunter (2011) | Action,Adventure,Fantasy | 8 | 10 | 9.05 | 148 | Fall 2011 | 23 | 2011 |
7784 | Fruits Basket: The Final | Drama,Romance,Supernatural | 9 | 547 | 9.04 | 13 | Spring 2021 | 23 | 2021 |
7772 | Gintama': Enchousen | Action,Comedy,Sci-Fi | 10 | 696 | 9.04 | 13 | Fall 2012 | 24 | 2012 |
23) Filter the dataset on the specified condition
In [76]:
filtered_popular_score = df.loc[(df['Popularity']>1000) & (df['Score'] > 8.5)]
24) Filtering Dataset for Recent Releases and High Popularity
In [78]:
recent_popular_anime = df.loc[(df['Release_Year'] >=2020) & (df['Popularity']>9000)]
25) Filtering Based on Three Conditions
In [80]:
filtered_data = df.loc[(df['Episodes'] > 20) & (df['Episodes'] < 30) & (df['Rank'] < 20)]
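Each comparison is wrapped in parentheses because `&` binds more tightly than `>` or `<` in Python. For an integer column, a strict range such as "more than 20 and fewer than 30" can also be written with `Series.between`, which is inclusive by default. A sketch on invented values (the frame `toy` is hypothetical):

```python
import pandas as pd

# Hypothetical episode counts and ranks.
toy = pd.DataFrame({
    "Episodes": [12, 20, 21, 24, 29, 30, 64],
    "Rank": [5, 1, 19, 4, 25, 3, 2],
})

with_operators = toy.loc[
    (toy["Episodes"] > 20) & (toy["Episodes"] < 30) & (toy["Rank"] < 20)
]

# between(21, 29) includes both ends, which matches > 20 and < 30 for integers.
with_between = toy.loc[toy["Episodes"].between(21, 29) & (toy["Rank"] < 20)]

print(with_operators.index.tolist())   # [2, 3]
```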
Beginner Sorting Activities
26) Sort the dataframe by a single column
In [82]:
sorted_rank_df = df.sort_values('Rank')
In the above activity, we saw that Fullmetal Alchemist: Brotherhood holds the top rank among anime. Did you know? This acclaimed series secured the Tokyo Anime Award for Best Television Series.
Source: Funimation
27) Sort in descending order
In [84]:
popularity_desc_df = df.sort_values('Popularity',ascending=False)
28) Sort by Multiple Columns
In [86]:
sorted_pop_score_df = df.sort_values(['Popularity','Score'])
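With a list of columns, the first column is the primary sort key and later columns only break ties. The `ascending` parameter also accepts a list, one flag per column. A sketch on four invented rows (`toy` is illustrative):

```python
import pandas as pd

# Hypothetical rows with tied Popularity values.
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Popularity": [2, 1, 2, 1],
    "Score": [7.5, 8.0, 9.0, 6.0],
})

# Popularity ascending, ties broken by Score ascending.
both_ascending = toy.sort_values(["Popularity", "Score"])

# Popularity ascending, ties broken by Score descending.
mixed = toy.sort_values(["Popularity", "Score"], ascending=[True, False])

print(both_ascending["Title"].tolist())   # ['D', 'B', 'A', 'C']
print(mixed["Title"].tolist())            # ['B', 'D', 'C', 'A']
```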
29) What is the title of the first anime when the dataframe is sorted by the Genres column?
In [88]:
df.sort_values('Genres')
Out[88]:
| | Title | Genres | Rank | Popularity | Score | Episodes | Release Date | Episode_Length(In min) | Release_Year |
---|---|---|---|---|---|---|---|---|---|
1452 | Explorer Woman Ray | Action,Adventure | 11450 | 10589 | 5.26 | 2 | NaN | 30 | 0 |
3692 | Chou Seimeitai Transformers Beast Wars Neo | Action,Adventure | 6666 | 9291 | 6.44 | 35 | Winter 1999 | 23 | 1999 |
2390 | Kaitouranma The Animation | Action,Adventure | 9369 | 9412 | 5.89 | 2 | NaN | 28 | 0 |
3073 | Ginga Reppuu Baxingar | Action,Adventure | 7912 | 11290 | 6.20 | 39 | Summer 1982 | 25 | 1982 |
679 | Yuusha Shirei Dagwon | Action,Adventure | 4646 | 9810 | 6.83 | 48 | Winter 1996 | 24 | 1996 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
638 | Keijo!!!!!!!! | Sports,Ecchi | 4094 | 512 | 6.95 | 12 | Fall 2016 | 23 | 2016 |
2868 | Gegege no Kitarou: Obake Nighter | Sports,Supernatural | 8324 | 11965 | 6.12 | 1 | NaN | 30 | 0 |
7044 | Death Note: Rewrite | Supernatural,Suspense | 1105 | 1040 | 7.70 | 2 | NaN | 112 | 0 |
7737 | Death Note | Supernatural,Suspense | 74 | 2 | 8.62 | 37 | Fall 2006 | 23 | 2006 |
5825 | Munou na Nana | Supernatural,Suspense | 2951 | 736 | 7.20 | 13 | Fall 2020 | 23 | 2020 |
7797 rows × 9 columns
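Many rows share the same Genres value, and the default sort algorithm does not promise to keep tied rows in their original order. As a hedged sketch, passing `kind="mergesort"` requests a stable sort, and the first title can then be read off directly. The frame `toy` is made up for illustration:

```python
import pandas as pd

# Hypothetical rows; "Y" and "Z" tie on Genres.
toy = pd.DataFrame({
    "Title": ["X", "Y", "Z"],
    "Genres": ["Drama", "Action,Adventure", "Action,Adventure"],
})

# A stable sort keeps tied rows in their original relative order.
sorted_toy = toy.sort_values("Genres", kind="mergesort")
first_title = sorted_toy["Title"].iloc[0]

print(first_title)   # Y
```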
Boolean Logic and Sorting Activities
30) Select anime released in 2017 that have more than 20 episodes
In [89]:
df_2017_more_than_20_episodes = df.loc[(df['Release_Year'] == 2017) & (df['Episodes'] > 20)]
Anime Release Distribution in 2009
The following pie chart shows the distribution of release dates in 2009. From it, we can infer that in 2009, most anime were released in the spring season, while the fewest were released in the summer season.
In [ ]:
# Run the below cell to see the pie chart
In [91]:
release_date_counts = df.loc[df['Release_Year'] == 2009]['Release Date'].value_counts()
custom_colors = sns.color_palette('Paired')
plt.figure(figsize=(8, 8))
sns.set_style("whitegrid")
plt.pie(release_date_counts, labels=release_date_counts.index, autopct='%1.1f%%', startangle=140, colors=custom_colors)
plt.axis('equal')
plt.title('Distribution of Anime Release Dates')
plt.show()
31) Choose anime that were released during either Spring 2009 or Summer 2009
In [92]:
spring_summer_2009_df = df.loc[(df['Release Date'] == 'Summer 2009') | (df['Release Date'] == 'Spring 2009')]
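When several equality tests on the same column are joined with `|`, `Series.isin` expresses the same filter with a list of accepted values. This is offered as an alternative sketch, not as the method the activity asks for; the frame `toy` is invented:

```python
import pandas as pd

# Hypothetical release seasons.
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Release Date": ["Spring 2009", "Summer 2009", "Fall 2009", "Spring 2010"],
})

with_or = toy.loc[
    (toy["Release Date"] == "Summer 2009") | (toy["Release Date"] == "Spring 2009")
]
with_isin = toy.loc[toy["Release Date"].isin(["Spring 2009", "Summer 2009"])]

print(with_isin["Title"].tolist())   # ['A', 'B']
```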
32) Find anime that do not belong to the genre Action,Adventure
In [94]:
not_act_adv_df = df.loc[df['Genres']!= 'Action,Adventure']
33) Popularity Range Sorting
In [96]:
pop_range_sorted_df = df.loc[(df['Popularity'] <= 8000) & (df['Popularity'] >= 7500)].sort_values('Popularity')
34) What is the score of the lowest-ranked anime whose genre is Drama,Sci-Fi?
In [98]:
df.loc[df['Genres']=='Drama,Sci-Fi'].sort_values('Rank',ascending=False)
Out[98]:
| | Title | Genres | Rank | Popularity | Score | Episodes | Release Date | Episode_Length(In min) | Release_Year |
---|---|---|---|---|---|---|---|---|---|
1603 | The Humanoid: Ai no Wakusei Lezeria | Drama,Sci-Fi | 12200 | 9408 | 4.77 | 1 | NaN | 47 | 0 |
1614 | Vie Durant | Drama,Sci-Fi | 12154 | 10392 | 4.82 | 8 | NaN | 7 | 0 |
1982 | Gekidol: Actidol Project | Drama,Sci-Fi | 10697 | 4915 | 5.52 | 12 | Winter 2021 | 23 | 2021 |
1397 | Moonrakers | Drama,Sci-Fi | 10496 | 10223 | 5.58 | 1 | NaN | 2 | 0 |
2176 | Kigyou Senshi Yamazaki: Long Distance Call | Drama,Sci-Fi | 9982 | 11500 | 5.73 | 1 | NaN | 43 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7272 | Densetsu Kyojin Ideon: Hatsudou-hen | Drama,Sci-Fi | 776 | 6009 | 7.86 | 1 | NaN | 98 | 0 |
7302 | Carole & Tuesday | Drama,Sci-Fi | 728 | 667 | 7.89 | 24 | Spring 2019 | 22 | 2019 |
7355 | Mobile Suit Gundam 0080: War in the Pocket | Drama,Sci-Fi | 641 | 2613 | 7.94 | 6 | NaN | 27 | 0 |
7724 | Steins;Gate Movie: Fuka Ryouiki no Déjà vu | Drama,Sci-Fi | 131 | 335 | 8.46 | 1 | NaN | 90 | 0 |
7 | Ginga Eiyuu Densetsu | Drama,Sci-Fi | 11 | 700 | 9.03 | 110 | NaN | 26 | 0 |
69 rows × 9 columns
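Here the lowest rank is read as the largest Rank number, so the answer is the Score in the first row of the descending sort. A sketch of pulling that single value out, either from the sorted frame or with `idxmax`, using a few invented rows (`toy` is hypothetical, although the numbers echo the table above):

```python
import pandas as pd

# Hypothetical subset of rows.
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Genres": ["Drama,Sci-Fi", "Drama,Sci-Fi", "Action", "Drama,Sci-Fi"],
    "Rank": [11, 12200, 1, 641],
    "Score": [9.03, 4.77, 9.14, 7.94],
})

drama_scifi = toy.loc[toy["Genres"] == "Drama,Sci-Fi"]

# Option 1: sort by Rank descending and take the first row.
score_from_sort = drama_scifi.sort_values("Rank", ascending=False)["Score"].iloc[0]

# Option 2: look up the row label holding the largest Rank.
score_from_idxmax = drama_scifi.loc[drama_scifi["Rank"].idxmax(), "Score"]

print(score_from_sort, score_from_idxmax)   # 4.77 4.77
```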
35) Anime Selection & Ranking (2017-2019)
In [99]:
anime_2017_2019_df = df.loc[(df['Release_Year'] >= 2017) & (df['Release_Year'] <= 2019)].sort_values('Score',ascending=False)
36) Select and sort the dataframe based on the given conditions, then store it in last_activity_df
In [108]:
last_activity_df = df.loc[(df['Genres'] == 'Action,Comedy') &
                          (df['Popularity'] > 50) &
                          (df['Episodes'] < 100) &
                          (df['Release_Year'] >= 2015) & (df['Release_Year'] <= 2019)].sort_values('Episode_Length(In min)')
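When a filter combines this many conditions, one readable option is to name each boolean mask before combining them. The sketch below applies the same four conditions to a made-up frame; `toy` and the mask names are illustrative assumptions:

```python
import pandas as pd

# Hypothetical rows; only "A" and "F" satisfy every condition.
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D", "E", "F"],
    "Genres": ["Action,Comedy", "Action,Comedy", "Action,Comedy",
               "Drama", "Action,Comedy", "Action,Comedy"],
    "Popularity": [255, 40, 300, 500, 900, 700],
    "Episodes": [12, 24, 150, 12, 25, 13],
    "Release_Year": [2016, 2017, 2018, 2019, 2022, 2019],
    "Episode_Length(In min)": [24, 23, 24, 23, 12, 5],
})

is_action_comedy = toy["Genres"] == "Action,Comedy"
is_popular = toy["Popularity"] > 50
is_short_series = toy["Episodes"] < 100
in_year_range = (toy["Release_Year"] >= 2015) & (toy["Release_Year"] <= 2019)

result = toy.loc[
    is_action_comedy & is_popular & is_short_series & in_year_range
].sort_values("Episode_Length(In min)")

print(result["Title"].tolist())   # ['F', 'A']
```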