Honing DataFrame Mutation Techniques Using Spotify Data¶

In [1]:

import numpy as np
import pandas as pd

In [2]:

df = pd.read_csv('data/data.csv', low_memory=False)

In [3]:

df.head()

Out[3]:

	id	name	artists	duration_ms	release_date	year	acousticness	danceability	energy	instrumentalness	liveness	loudness	speechiness	tempo	valence	mode	key	popularity
0	6KbQ3uYMLKb5jDxLF7wYDD	Singende Bataillone 1. Teil	['Carl Woitschach']	158648	1928	1928	0.995	0.708	0.1950	0.563	0.1510	-12.428	0.0506	118.469	0.7790	1	10	0
1	6KuQTIu1KoTTkLXKrwlLPV	Fantasiestücke, Op. 111: Più tosto lento	['Robert Schumann', 'Vladimir Horowitz']	282133	1928	1928	0.994	0.379	0.0135	0.901	0.0763	-28.454	0.0462	83.972	0.0767	1	8	0
2	6L63VW0PibdM1HDSBoqnoM	Chapter 1.18 - Zamek kaniowski	['Seweryn Goszczyński']	104300	1928	1928	0.604	0.749	0.2200	0.000	0.1190	-19.924	0.9290	107.177	0.8800	0	5	0
3	6M94FkXd15sOAOQYRnWPN8	Bebamos Juntos - Instrumental (Remasterizado)	['Francisco Canaro']	180760	9/25/28	1928	0.995	0.781	0.1300	0.887	0.1110	-14.734	0.0926	108.003	0.7200	0	1	0
4	6N6tiFZ9vLTSOIxkj8qKrd	Polonaise-Fantaisie in A-Flat Major, Op. 61	['Frédéric Chopin', 'Vladimir Horowitz']	687733	1928	1928	0.990	0.210	0.2040	0.908	0.0980	-16.829	0.0424	62.149	0.0693	1	11	1

In [4]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 169909 entries, 0 to 169908
Data columns (total 19 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   id                169909 non-null  object 
 1   name              169909 non-null  object 
 2   artists           169909 non-null  object 
 3   duration_ms       169909 non-null  int64  
 4   release_date      169909 non-null  object 
 5   year              169909 non-null  int64  
 6   acousticness      169909 non-null  float64
 7   danceability      169909 non-null  float64
 8   energy            169909 non-null  float64
 9   instrumentalness  169909 non-null  float64
 10  liveness          169909 non-null  float64
 11  loudness          169909 non-null  float64
 12  speechiness       169909 non-null  float64
 13  tempo             169909 non-null  float64
 14  valence           169909 non-null  float64
 15  mode              169909 non-null  int64  
 16  key               169909 non-null  int64  
 17  popularity        169909 non-null  int64  
 18  explicit          169909 non-null  int64  
dtypes: float64(9), int64(6), object(4)
memory usage: 24.6+ MB

1) Rename the `acousticness` column to `acoustic_level`¶

In [6]:

# Write your code here
df.rename(columns = {'acousticness': 'acoustic_level'}, inplace = True)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 169909 entries, 0 to 169908
Data columns (total 19 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   id                169909 non-null  object 
 1   name              169909 non-null  object 
 2   artists           169909 non-null  object 
 3   duration_ms       169909 non-null  int64  
 4   release_date      169909 non-null  object 
 5   year              169909 non-null  int64  
 6   acoustic_level    169909 non-null  float64
 7   danceability      169909 non-null  float64
 8   energy            169909 non-null  float64
 9   instrumentalness  169909 non-null  float64
 10  liveness          169909 non-null  float64
 11  loudness          169909 non-null  float64
 12  speechiness       169909 non-null  float64
 13  tempo             169909 non-null  float64
 14  valence           169909 non-null  float64
 15  mode              169909 non-null  int64  
 16  key               169909 non-null  int64  
 17  popularity        169909 non-null  int64  
 18  explicit          169909 non-null  int64  
dtypes: float64(9), int64(6), object(4)
memory usage: 25.9+ MB

2) Rename Multiple Columns Using the `rename()` Function¶

In [8]:

# Write your code here
df.rename(columns = {'danceability': 'dance_score',
                    'duration_ms': 'duration_milliseconds',
                    'instrumentalness': 'instrumental',
                    'liveness': 'live_performance',
                    'speechiness': 'speech_presence'}, inplace = True)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 169909 entries, 0 to 169908
Data columns (total 19 columns):
 #   Column                 Non-Null Count   Dtype  
---  ------                 --------------   -----  
 0   id                     169909 non-null  object 
 1   name                   169909 non-null  object 
 2   artists                169909 non-null  object 
 3   duration_milliseconds  169909 non-null  int64  
 4   release_date           169909 non-null  object 
 5   year                   169909 non-null  int64  
 6   acoustic_level         169909 non-null  float64
 7   dance_score            169909 non-null  float64
 8   energy                 169909 non-null  float64
 9   instrumental           169909 non-null  float64
 10  live_performance       169909 non-null  float64
 11  loudness               169909 non-null  float64
 12  speech_presence        169909 non-null  float64
 13  tempo                  169909 non-null  float64
 14  valence                169909 non-null  float64
 15  mode                   169909 non-null  int64  
 16  key                    169909 non-null  int64  
 17  popularity             169909 non-null  int64  
 18  explicit               169909 non-null  int64  
dtypes: float64(9), int64(6), object(4)
memory usage: 25.9+ MB

3) Add a new column called `duration_seconds` that converts the `duration_milliseconds` column from milliseconds to seconds¶

In [18]:

# Write your code here
df['duration_seconds'] = df['duration_milliseconds'] / 1000

In [17]:

df['duration_milliseconds'] / 1000

Out[17]:

0         158.648
1         282.133
2         104.300
3         180.760
4         687.733
           ...   
169904    163.800
169905    167.468
169906    180.700
169907    167.308
169908    214.787
Name: duration_milliseconds, Length: 169909, dtype: float64

4) Add a new column called `popularity_score` that multiplies the `popularity` column by 0.01¶

In [20]:

# Write your code here
df['popularity_score'] = df['popularity'] * 0.01

In [22]:

df.head(5)

Out[22]:

	id	name	artists	duration_milliseconds	release_date	year	acoustic_level	dance_score	energy	instrumental	...	loudness	speech_presence	tempo	valence	mode	key	popularity	duration_seconds	popularity_score
0	6KbQ3uYMLKb5jDxLF7wYDD	Singende Bataillone 1. Teil	['Carl Woitschach']	158648	1928	1928	0.995	0.708	0.1950	0.563	...	-12.428	0.0506	118.469	0.7790	1	10	0	158.648	0.00
1	6KuQTIu1KoTTkLXKrwlLPV	Fantasiestücke, Op. 111: Più tosto lento	['Robert Schumann', 'Vladimir Horowitz']	282133	1928	1928	0.994	0.379	0.0135	0.901	...	-28.454	0.0462	83.972	0.0767	1	8	0	282.133	0.00
2	6L63VW0PibdM1HDSBoqnoM	Chapter 1.18 - Zamek kaniowski	['Seweryn Goszczyński']	104300	1928	1928	0.604	0.749	0.2200	0.000	...	-19.924	0.9290	107.177	0.8800	0	5	0	104.300	0.00
3	6M94FkXd15sOAOQYRnWPN8	Bebamos Juntos - Instrumental (Remasterizado)	['Francisco Canaro']	180760	9/25/28	1928	0.995	0.781	0.1300	0.887	...	-14.734	0.0926	108.003	0.7200	0	1	0	180.760	0.00
4	6N6tiFZ9vLTSOIxkj8qKrd	Polonaise-Fantaisie in A-Flat Major, Op. 61	['Frédéric Chopin', 'Vladimir Horowitz']	687733	1928	1928	0.990	0.210	0.2040	0.908	...	-16.829	0.0424	62.149	0.0693	1	11	1	687.733	0.01

5 rows × 21 columns

5) Add a new column called `is_popular` that assigns 1 to songs with `popularity` greater than 70 and 0 otherwise¶

In [25]:

(df['popularity'] > 70).astype(int)

Out[25]:

0         0
1         0
2         0
3         0
4         0
         ..
169904    1
169905    0
169906    0
169907    0
169908    0
Name: popularity, Length: 169909, dtype: int64

In [26]:

# Write your code here
df['is_popular'] = (df['popularity'] > 70).astype(int)

6) Add a new column called `artist_count` that counts the number of artists in the `artists` column¶

In [ ]:

In [50]:

df['artists'].str.count(',')+1

Out[50]:

0         1
1         2
2         1
3         1
4         2
         ..
169904    2
169905    2
169906    2
169907    2
169908    1
Name: artists, Length: 169909, dtype: int64

In [51]:

# Write your code here
df['artist_count'] = df['artists'].str.count(',') + 1

In [ ]:

7) Add a new column called `duration_minutes` that calculates the duration in minutes from the `duration_seconds` column¶

In [53]:

# Write your code here
df['duration_minutes'] = df['duration_seconds'] / 60

8) Update the `popularity` column by adding 10 to each value¶

In [55]:

# Write your code here
df['popularity'] = df['popularity'] + 10

9) Update the `speech_presence` column by multiplying each value by 0.8¶

In [57]:

# Write your code here
df['speech_presence'] = df['speech_presence'] * 0.8

In [ ]:

In [60]:

df.head().T

Out[60]:

	0	1	2	3	4
id	6KbQ3uYMLKb5jDxLF7wYDD	6KuQTIu1KoTTkLXKrwlLPV	6L63VW0PibdM1HDSBoqnoM	6M94FkXd15sOAOQYRnWPN8	6N6tiFZ9vLTSOIxkj8qKrd
name	Singende Bataillone 1. Teil	Fantasiestücke, Op. 111: Più tosto lento	Chapter 1.18 - Zamek kaniowski	Bebamos Juntos - Instrumental (Remasterizado)	Polonaise-Fantaisie in A-Flat Major, Op. 61
artists	['Carl Woitschach']	['Robert Schumann', 'Vladimir Horowitz']	['Seweryn Goszczyński']	['Francisco Canaro']	['Frédéric Chopin', 'Vladimir Horowitz']
duration_milliseconds	158648	282133	104300	180760	687733
release_date	1928	1928	1928	9/25/28	1928
year	1928	1928	1928	1928	1928
acoustic_level	0.995	0.994	0.604	0.995	0.99
dance_score	0.708	0.379	0.749	0.781	0.21
energy	0.195	0.0135	0.22	0.13	0.204
instrumental	0.563	0.901	0.0	0.887	0.908
live_performance	0.151	0.0763	0.119	0.111	0.098
loudness	-12.428	-28.454	-19.924	-14.734	-16.829
speech_presence	0.04048	0.03696	0.7432	0.07408	0.03392
tempo	118.469	83.972	107.177	108.003	62.149
valence	0.779	0.0767	0.88	0.72	0.0693
mode	1	1	0	0	1
key	10	8	5	1	11
popularity	10	10	10	10	11
explicit	0	0	0	0	0
duration_seconds	158.648	282.133	104.3	180.76	687.733
popularity_score	0.0	0.0	0.0	0.0	0.01
is_popular	0	0	0	0	0
artist_count	1	2	1	1	2
duration_minutes	2.644133	4.702217	1.738333	3.012667	11.462217

10) Update the `dance_score` column by subtracting 0.1 from each value¶

In [61]:

# Write your code here
df['dance_score'] = df['dance_score'] - 0.1

11) Update the `mode` column by replacing 0 with 'Minor' and 1 with 'Major'¶

In [64]:

df['mode'].map(lambda x: 'Major' if x == 1 else 'Minor')

Out[64]:

0         Major
1         Major
2         Minor
3         Minor
4         Major
          ...  
169904    Major
169905    Major
169906    Major
169907    Major
169908    Major
Name: mode, Length: 169909, dtype: object

In [65]:

# Write your code here
df['mode'] = df['mode'].map(lambda x: 'Major' if x == 1 else 'Minor')

12) Update the `tempo` column by setting values greater than 150 to 150¶

In [67]:

df['tempo'] = df['tempo'].clip(upper=150)

In [ ]:

# Write your code here

13) Replace the `key` column values 0, 1, and 2 with 'C', 'C#', and 'D', respectively¶

In [69]:

# Write your code here

df['key'] = df['key'].replace({0: 'C', 1: 'C#', 2: 'D', 3: 'D#', 4: 'E', 5: 'F', 6: 'F#', 7: 'G', 8:'G#', 9:'A', 10: 'A#', 11: 'B'})

In [72]:

df.head().T

Out[72]:

	0	1	2	3	4
id	6KbQ3uYMLKb5jDxLF7wYDD	6KuQTIu1KoTTkLXKrwlLPV	6L63VW0PibdM1HDSBoqnoM	6M94FkXd15sOAOQYRnWPN8	6N6tiFZ9vLTSOIxkj8qKrd
name	Singende Bataillone 1. Teil	Fantasiestücke, Op. 111: Più tosto lento	Chapter 1.18 - Zamek kaniowski	Bebamos Juntos - Instrumental (Remasterizado)	Polonaise-Fantaisie in A-Flat Major, Op. 61
artists	['Carl Woitschach']	['Robert Schumann', 'Vladimir Horowitz']	['Seweryn Goszczyński']	['Francisco Canaro']	['Frédéric Chopin', 'Vladimir Horowitz']
duration_milliseconds	158648	282133	104300	180760	687733
release_date	1928	1928	1928	9/25/28	1928
year	1928	1928	1928	1928	1928
acoustic_level	0.995	0.994	0.604	0.995	0.99
dance_score	0.608	0.279	0.649	0.681	0.11
energy	0.195	0.0135	0.22	0.13	0.204
instrumental	0.563	0.901	0.0	0.887	0.908
live_performance	0.151	0.0763	0.119	0.111	0.098
loudness	-12.428	-28.454	-19.924	-14.734	-16.829
speech_presence	0.04048	0.03696	0.7432	0.07408	0.03392
tempo	118.469	83.972	107.177	108.003	62.149
valence	0.779	0.0767	0.88	0.72	0.0693
mode	Major	Major	Minor	Minor	Major
key	A#	G#	F	C#	B
popularity	10	10	10	10	11
explicit	0	0	0	0	0
duration_seconds	158.648	282.133	104.3	180.76	687.733
popularity_score	0.0	0.0	0.0	0.0	0.01
is_popular	0	0	0	0	0
artist_count	1	2	1	1	2
duration_minutes	2.644133	4.702217	1.738333	3.012667	11.462217

14) Replace the `explicit` column values 0 and 1 with 'Not Explicit' and 'Explicit', respectively¶

In [73]:

# Write your code here
df['explicit'] = df['explicit'].replace({0: 'Not Explicit', 1: 'Explicit'})

15) Replace the `year` column values before 1950 with 1950¶

In [76]:

# Write your code here
df['year'] = df['year'].clip(lower = 1950)

16) Replace the `tempo` column values above 150 with 150 and values below 50 with 50¶

In [79]:

# Write your code here
df['tempo'] = df['tempo'].clip(lower = 50, upper = 150)

In [ ]:

Statement of Completion#96308455

Intro to Pandas for Data Analysis

Spotify Data Explorer: Honing DataFrame Mutation Techniques

Honing DataFrame Mutation Techniques Using Spotify Data¶

1) Rename the `acousticness` column to `acoustic_level`¶

2) Rename Multiple Columns Using the `rename()` Function¶

3) Add a new column called `duration_seconds` that converts the `duration_milliseconds` column from milliseconds to seconds¶

4) Add a new column called `popularity_score` that multiplies the `popularity` column by 0.01¶

5) Add a new column called `is_popular` that assigns 1 to songs with `popularity` greater than 70 and 0 otherwise¶

6) Add a new column called `artist_count` that counts the number of artists in the `artists` column¶

7) Add a new column called `duration_minutes` that calculates the duration in minutes from the `duration_seconds` column¶

8) Update the `popularity` column by adding 10 to each value¶

9) Update the `speech_presence` column by multiplying each value by 0.8¶

10) Update the `dance_score` column by subtracting 0.1 from each value¶

11) Update the `mode` column by replacing 0 with 'Minor' and 1 with 'Major'¶

12) Update the `tempo` column by setting values greater than 150 to 150¶

13) Replace the `key` column values 0, 1, and 2 with 'C', 'C#', and 'D', respectively¶

14) Replace the `explicit` column values 0 and 1 with 'Not Explicit' and 'Explicit', respectively¶

15) Replace the `year` column values before 1950 with 1950¶

16) Replace the `tempo` column values above 150 with 150 and values below 50 with 50¶

Statement of Completion#96308455

Intro to Pandas for Data Analysis

Spotify Data Explorer: Honing DataFrame Mutation Techniques

Honing DataFrame Mutation Techniques Using Spotify Data¶

1) Rename the acousticness column to acoustic_level¶

2) Rename Multiple Columns Using the rename() Function¶

3) Add a new column called duration_seconds that converts the duration_milliseconds column from milliseconds to seconds¶

4) Add a new column called popularity_score that multiplies the popularity column by 0.01¶

5) Add a new column called is_popular that assigns 1 to songs with popularity greater than 70 and 0 otherwise¶

6) Add a new column called artist_count that counts the number of artists in the artists column¶

7) Add a new column called duration_minutes that calculates the duration in minutes from the duration_seconds column¶

8) Update the popularity column by adding 10 to each value¶

9) Update the speech_presence column by multiplying each value by 0.8¶

10) Update the dance_score column by subtracting 0.1 from each value¶

11) Update the mode column by replacing 0 with 'Minor' and 1 with 'Major'¶

12) Update the tempo column by setting values greater than 150 to 150¶

13) Replace the key column values 0, 1, and 2 with 'C', 'C#', and 'D', respectively¶

14) Replace the explicit column values 0 and 1 with 'Not Explicit' and 'Explicit', respectively¶

15) Replace the year column values before 1950 with 1950¶

16) Replace the tempo column values above 150 with 150 and values below 50 with 50¶

1) Rename the `acousticness` column to `acoustic_level`¶

2) Rename Multiple Columns Using the `rename()` Function¶

3) Add a new column called `duration_seconds` that converts the `duration_milliseconds` column from milliseconds to seconds¶

4) Add a new column called `popularity_score` that multiplies the `popularity` column by 0.01¶

5) Add a new column called `is_popular` that assigns 1 to songs with `popularity` greater than 70 and 0 otherwise¶

6) Add a new column called `artist_count` that counts the number of artists in the `artists` column¶

7) Add a new column called `duration_minutes` that calculates the duration in minutes from the `duration_seconds` column¶

8) Update the `popularity` column by adding 10 to each value¶

9) Update the `speech_presence` column by multiplying each value by 0.8¶

10) Update the `dance_score` column by subtracting 0.1 from each value¶

11) Update the `mode` column by replacing 0 with 'Minor' and 1 with 'Major'¶

12) Update the `tempo` column by setting values greater than 150 to 150¶

13) Replace the `key` column values 0, 1, and 2 with 'C', 'C#', and 'D', respectively¶

14) Replace the `explicit` column values 0 and 1 with 'Not Explicit' and 'Explicit', respectively¶

15) Replace the `year` column values before 1950 with 1950¶

16) Replace the `tempo` column values above 150 with 150 and values below 50 with 50¶