Marcos Ycaro Barros Trindade has successfully completed this project.

Intro to Pandas for Data Analysis

Practice DataFrame Mutations using Good Reads Books and Reviews Data

Finished: September 24, 2024 12:00 AM

Elapsed time (min): 107

Import the libraries and load the dataset

In [1]:

import warnings 
# Ignore FutureWarning 
warnings.simplefilter(action='ignore', category=FutureWarning)

In [2]:

import pandas as pd

df = pd.read_csv('Best_Books_Ever.csv')

In [3]:

df.head()

Out[3]:

	bookId	title	series	author	rating	description	language	isbn	genres	characters	...	firstPublishDate	awards	numRatings	ratingsByStars	likedPercent	setting	coverImg	bbeScore	bbeVotes	price
0	2.Harry_Potter_and_the_Order_of_the_Phoenix	Harry Potter and the Order of the Phoenix	Harry Potter #5	J.K. Rowling, Mary GrandPré (Illustrator)	4.50	There is a door at the end of a silent corrido...	English	9780439358071	['Fantasy', 'Young Adult', 'Fiction', 'Magic',...	['Sirius Black', 'Draco Malfoy', 'Ron Weasley'...	...	2003-06-21	['Bram Stoker Award for Works for Young Reader...	2507623	['1593642', '637516', '222366', '39573', '14526']	98.0	['Hogwarts School of Witchcraft and Wizardry (...	https://i.gr-assets.com/images/S/compressed.ph...	2632233	26923	7.38
1	30.J_R_R_Tolkien_4_Book_Boxed_Set	J.R.R. Tolkien 4-Book Boxed Set: The Hobbit an...	The Lord of the Rings #0-3	J.R.R. Tolkien	4.60	This four-volume, boxed set contains J.R.R. To...	English	9780345538376	['Fantasy', 'Fiction', 'Classics', 'Adventure'...	['Frodo Baggins', 'Gandalf', 'Bilbo Baggins', ...	...	2055-10-20	[]	110146	['78217', '22857', '6628', '1477', '967']	98.0	['Middle-earth']	https://i.gr-assets.com/images/S/compressed.ph...	1159802	12111	21.15
2	375802.Ender_s_Game	Ender's Game	Ender's Saga #1	Orson Scott Card, Stefan Rudnicki (Narrator), ...	4.30	Andrew "Ender" Wiggin thinks he is playing com...	English	9780812550702	['Science Fiction', 'Fiction', 'Young Adult', ...	['Dink', 'Bernard', 'Valentine Wiggin', 'Peter...	...	1985-10-28	['Hugo Award for Best Novel (1986)', 'Nebula A...	1131303	['603209', '339819', '132305', '35667', '20303']	95.0	[]	https://i.gr-assets.com/images/S/compressed.ph...	720651	7515	4.60
3	17245.Dracula	Dracula	Dracula #1	Bram Stoker, Nina Auerbach (Editor), David J. ...	4.00	You can find an alternative cover edition for ...	English	9780393970128	['Classics', 'Horror', 'Fiction', 'Fantasy', '...	['Jonathan Harker', 'Lucy Westenra', 'Abraham ...	...	1997-05-26	[]	938325	['345260', '329217', '197206', '48642', '18000']	93.0	['Transylvania (Romania)', 'Budapest (Hungary)...	https://i.gr-assets.com/images/S/compressed.ph...	646782	6988	4.55
4	28187.The_Lightning_Thief	The Lightning Thief	Percy Jackson and the Olympians #1	Rick Riordan (Goodreads Author)	4.26	Alternate cover for this ISBN can be found her...	English	9780786838653	['Fantasy', 'Young Adult', 'Mythology', 'Ficti...	['Annabeth Chase', 'Grover Underwood', 'Luke C...	...	2005-06-28	["Young Readers' Choice Award (2008)", 'Books ...	1992300	['1006885', '604999', '289310', '64014', '27092']	95.0	['New York City, New York (United States)', 'M...	https://i.gr-assets.com/images/S/compressed.ph...	597132	6370	1.79

5 rows × 25 columns

In [4]:

df.shape

Out[4]:

(794, 25)

In [5]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 794 entries, 0 to 793
Data columns (total 25 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   bookId            794 non-null    object 
 1   title             794 non-null    object 
 2   series            794 non-null    object 
 3   author            794 non-null    object 
 4   rating            794 non-null    float64
 5   description       794 non-null    object 
 6   language          794 non-null    object 
 7   isbn              794 non-null    int64  
 8   genres            794 non-null    object 
 9   characters        794 non-null    object 
 10  bookFormat        794 non-null    object 
 11  edition           794 non-null    object 
 12  pages             794 non-null    int64  
 13  publisher         794 non-null    object 
 14  publishDate       794 non-null    object 
 15  firstPublishDate  794 non-null    object 
 16  awards            794 non-null    object 
 17  numRatings        794 non-null    int64  
 18  ratingsByStars    794 non-null    object 
 19  likedPercent      794 non-null    float64
 20  setting           794 non-null    object 
 21  coverImg          794 non-null    object 
 22  bbeScore          794 non-null    int64  
 23  bbeVotes          794 non-null    int64  
 24  price             794 non-null    float64
dtypes: float64(3), int64(5), object(17)
memory usage: 155.2+ KB

Activities

Activity 1. Calculating the Price-to-Rating Ratio

In [6]:

df['price_to_rating'] = df['price'] / df['rating']

Activity 2. Remove the "isbn" Column

In [7]:

del df['isbn']

Activity 3. Extract and Add the "Year Published" Column

In [8]:

# Take the text before the first '-' (the year in 'YYYY-MM-DD') and store it as an integer
df['YearPublished'] = df['publishDate'].str.split('-').str[0].astype('int32')
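An alternative to pulling the year out of the string is to parse the column as dates and read the year from the `.dt` accessor. This is only a sketch on a made-up `toy` frame, assuming the dates follow the `YYYY-MM-DD` layout seen in `df.head()`; it is not the approach used in the notebook.

```python
import pandas as pd

# Hypothetical toy data; the notebook itself reads 'publishDate' from the CSV.
toy = pd.DataFrame({"publishDate": ["2004-09-28", "1985-10-28", "1997-05-26"]})

# Parse the strings as dates, then read the calendar year from the .dt accessor.
parsed = pd.to_datetime(toy["publishDate"], format="%Y-%m-%d")
toy["YearPublished"] = parsed.dt.year.astype("int32")

print(toy)
```

Parsing first also surfaces malformed dates as errors instead of silently producing a wrong year.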

Activity 4. Filter Books with Ratings Above 4.5

In [9]:

# The comparison is inclusive: books rated exactly 4.5 are kept as well
best_books = df[df['rating'] >= 4.5]
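If the filtered frame is going to be modified later, taking an explicit copy keeps it independent of the original and avoids pandas' chained-assignment warnings. A minimal sketch on made-up data (the `toy` frame and the `flag` column are hypothetical):

```python
import pandas as pd

# Hypothetical toy data with one rating exactly on the 4.5 boundary.
toy = pd.DataFrame({"title": ["A", "B", "C"], "rating": [4.5, 4.49, 4.8]})

# Boolean mask plus .copy() gives an independent frame.
best = toy[toy["rating"] >= 4.5].copy()

# Adding a column to the copy leaves the original frame untouched.
best["flag"] = True

print(best)
```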

Activity 5. Count and Add the Number of Genres

In [10]:

df['GenreCount'] = df['genres'].map(lambda x: x.count("'")//2)
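Counting quote characters works as long as no genre name contains an apostrophe. Whether the dataset has such names is not checked here, so the third row below is a made-up case. The sketch compares quote counting with parsing the stringified list via `ast.literal_eval` on a hypothetical `toy` frame.

```python
import ast
import pandas as pd

# Hypothetical toy data mimicking the stringified lists in the 'genres' column.
toy = pd.DataFrame({"genres": [
    "['Fantasy', 'Fiction']",
    "[]",
    "[\"Children's\", 'Fiction']",   # made-up genre containing an apostrophe
]})

# Quote counting: two single quotes per genre is assumed.
toy["QuoteCount"] = toy["genres"].map(lambda s: s.count("'") // 2)

# Parsing the literal and taking the length of the resulting list.
toy["ParsedCount"] = toy["genres"].map(lambda s: len(ast.literal_eval(s)))

print(toy[["QuoteCount", "ParsedCount"]])
```

On the third row the quote count gives 1 while the parsed list has 2 entries, which is the case the parsing variant guards against.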

Activity 6. Split Author Names into First and Last Name Columns

In [11]:

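# Everything before the last whitespace-separated token; single-word names are kept whole
df['FirstName'] = df['author'].map(lambda x: " ".join(x.split()[:-1]) if len(x.split()) > 1 else x)
# The last token; None when the name is a single word
df['LastName'] = df['author'].map(lambda x: x.split()[-1] if len(x.split()) > 1 else None)
# Spot-check one row (label 380) without chained indexing
df.loc[380, 'LastName']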
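A vectorized variant of the same split can be sketched with `Series.str.rsplit`, which splits once from the right. The `toy` frame is made up. As with the cell above, a multi-author value such as "J.K. Rowling, Mary GrandPré (Illustrator)" would still end up with "(Illustrator)" as the last name.

```python
import pandas as pd

# Hypothetical toy data, including a single-word name.
toy = pd.DataFrame({"author": ["Harper Lee", "F. Scott Fitzgerald", "Homer"]})

# Split once from the right on whitespace: column 0 is everything before
# the last token, column 1 is the last token (missing for single-word names).
parts = toy["author"].str.rsplit(n=1, expand=True)
toy["FirstName"] = parts[0]
toy["LastName"] = parts[1]

print(toy)
```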

Activity 7. Drop Books with Fewer than 100 Pages

In [12]:

df.drop(df[df['pages']<100].index,inplace=True)

Activity 8. Extract the Primary Genre

In [13]:

import ast

# ast.literal_eval parses the stringified list without executing arbitrary code
df['PrimaryGenre'] = df['genres'].map(lambda x: (ast.literal_eval(x) or [None])[0])

Activity 9. Flag Books with Multiple Awards

In [14]:

df['MultipleAwards'] = df['awards'].map(lambda x: len(ast.literal_eval(x)) > 1)

Activity 10. Estimate Reading Time Based on Page Count

In [15]:

df['ReadingTimeHours'] = df['pages']*300/250/60
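The constants in the formula are not explained in the notebook. Reading 300 as words per page and 250 as words per minute is an assumption, and the names below are hypothetical. Under that reading, the same arithmetic can be sketched with named constants on a made-up `toy` frame:

```python
import pandas as pd

# Assumed meanings of the literals in the formula above (not confirmed by the notebook).
WORDS_PER_PAGE = 300
WORDS_PER_MINUTE = 250
MINUTES_PER_HOUR = 60

# Hypothetical toy data.
toy = pd.DataFrame({"pages": [250, 500]})

# pages -> words -> minutes -> hours
toy["ReadingTimeHours"] = (
    toy["pages"] * WORDS_PER_PAGE / WORDS_PER_MINUTE / MINUTES_PER_HOUR
)

print(toy)
```

With these constants a 250-page book comes out at 5 hours, since 250 × 300 / 250 / 60 = 5.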

Activity 11. Flag Books Published in Year 2000 Onwards

In [16]:

df['Published21stCentury'] = df['YearPublished'] >= 2000

Activity 12. Simplifying the DataFrame by Dropping Columns

In [17]:

df.drop(['coverImg','description','ratingsByStars'],axis=1,inplace=True)

Activity 13. Adding a New Book Entry

In [18]:

new_book = {
    "bookId": '10000',
    "title": "The Great Gatsby",
    "author": "F. Scott Fitzgerald",
    "rating": 3.9,
    "pages": 180,
    "publishDate": '1925-04-10',
    "publisher": "Scribner",
    "price": 7.99,
    "genres": "['Fiction', 'Classics']",
    "GenreCount": 2,
    "FirstName": "F.",
    "LastName": "Fitzgerald",
    "PrimaryGenre": "Fiction",
    "MultipleAwards": False,
    "ReadingTimeHours": 9.0,
    "Published21stCentury": True
}
# Append as a one-row frame; ignore_index avoids reusing a label left over after the row drops in Activity 7
df = pd.concat([df, pd.DataFrame([new_book])], ignore_index=True)
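One thing to watch when appending after rows have been dropped: a label computed from `len(df)` can coincide with a label that is still in the index. Whether that happens with the project data is not verified here. The sketch below reproduces the situation on a made-up three-row frame (all names and values are hypothetical) and shows `ignore_index=True` as one way around it.

```python
import pandas as pd

# Hypothetical toy data; dropping the first row leaves index labels 1 and 2.
toy = pd.DataFrame({"title": ["A", "B", "C"], "pages": [50, 200, 300]})
toy = toy[toy["pages"] >= 100]

# Labeling the new row with len(toy) == 2 collides with the existing label 2.
new_row = pd.DataFrame({"title": "D", "pages": 180}, index=[len(toy)])
clash = pd.concat([toy, new_row])

# ignore_index=True renumbers the result from 0 instead.
clean = pd.concat([toy, new_row], ignore_index=True)

print(clash.index.tolist(), clean.index.tolist())
```

A duplicated label makes `.loc` lookups return several rows, so renumbering is usually the safer default when the old labels carry no meaning.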

Activity 14. Transforming Publish Dates into Datetime Format

In [19]:

df['publishDate'] = pd.to_datetime(df['publishDate'])
df['firstPublishDate'] = pd.to_datetime(df['firstPublishDate'])
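`pd.to_datetime` raises on values it cannot parse. If a column may contain stray text, passing `errors="coerce"` turns those values into `NaT` instead. A minimal sketch on a made-up `toy` frame (the bad value is invented for illustration):

```python
import pandas as pd

# Hypothetical toy data with one unparseable value and one missing value.
toy = pd.DataFrame({"publishDate": ["1925-04-10", "not a date", None]})

# Unparseable entries become NaT rather than raising an exception.
toy["publishDate"] = pd.to_datetime(
    toy["publishDate"], format="%Y-%m-%d", errors="coerce"
)

print(toy)
```

Counting the `NaT` values afterwards with `toy["publishDate"].isna().sum()` shows how many rows failed to parse.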

Activity 15. Bulk Adding New Book Entries to the DataFrame

In [20]:

new_books = [
    {
        "bookId": '10001',
        "title": "To Kill a Mockingbird",
        "author": "Harper Lee",
        "rating": 4.3,
        "pages": 281,
        "publishDate": pd.to_datetime('1960-07-11'),
        "firstPublishDate": pd.to_datetime('1960-07-11'),
        "publisher": "J.B. Lippincott & Co.",
        "price": 9.99,
        "genres": "['Fiction', 'Classics']",
        "GenreCount": 2,
        "FirstName": "Harper",
        "LastName": "Lee",
        "PrimaryGenre": "Fiction",
        "MultipleAwards": False,
        "ReadingTimeHours": 11.24,
        "Published21stCentury": False
    },
    {
        "bookId": '10002',
        "title": "1984",
        "author": "George Orwell",
        "rating": 4.2,
        "pages": 328,
        "publishDate": pd.to_datetime('1949-06-08'),
        "firstPublishDate": pd.to_datetime('1949-06-08'),
        "publisher": "Secker & Warburg",
        "price": 12.99,
        "genres": "['Fiction', 'Classics']",
        "GenreCount": 2,
        "FirstName": "George",
        "LastName": "Orwell",
        "PrimaryGenre": "Fiction",
        "MultipleAwards": False,
        "ReadingTimeHours": 13.12,
        "Published21stCentury": False
    }
]

# Build one frame from the list of dicts and append it in a single concat
df = pd.concat([df, pd.DataFrame(new_books)], ignore_index=True)

Statement of Completion#6c131a5a