LI WEISHENG has successfully completed this project.

Intro to Pandas for Data Analysis

medium

4.72

Exploring Data Science Salaries: A Pandas Series Analysis

Finished

June 28, 2025 12:12 PM

Project.ipynb

Notebook

In [1]:

import numpy as np
import pandas as pd

In [2]:

df = pd.read_csv('Data_Science_Salaries.xls')

In [3]:

df.head()

Out[3]:

	work_year	experience_level	employment_type	job_title	salary_in_usd	employee_residence	remote_ratio	company_location	company_size
0	2023	SE	FT	Principal Data Scientist	85847	ES	100	ES	L
1	2023	MI	CT	ML Engineer	30000	US	100	US	S
2	2023	MI	CT	ML Engineer	25500	US	100	US	S
3	2023	SE	FT	Data Scientist	175000	CA	100	CA	M
4	2023	SE	FT	Data Scientist	120000	CA	100	CA	M

In [1]:

df.info()

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[1], line 1
----> 1 df.info()
      2 df.head

NameError: name 'df' is not defined

1. Which method is used to display basic information about a pandas Series?

In [4]:

import pandas as pd
df = pd.read_csv('Data_Science_Salaries.xls')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3755 entries, 0 to 3754
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   work_year           3755 non-null   int64 
 1   experience_level    3755 non-null   object
 2   employment_type     3755 non-null   object
 3   job_title           3755 non-null   object
 4   salary_in_usd       3755 non-null   int64 
 5   employee_residence  3755 non-null   object
 6   remote_ratio        3755 non-null   int64 
 7   company_location    3755 non-null   object
 8   company_size        3755 non-null   object
dtypes: int64(3), object(6)
memory usage: 264.1+ KB

2. Assign the `employee_residence` column to the `employee_residence_series` variable as a Series.

In [6]:

# Enter your code here
import pandas as pd
df = pd.read_csv('Data_Science_Salaries.xls')

# Selecting a single column already returns a Series, so no pd.Series() wrapper is needed
employee_residence_series = df['employee_residence']
employee_residence_series

Out[6]:

0       ES
1       US
2       US
3       CA
4       CA
        ..
3750    US
3751    US
3752    US
3753    US
3754    IN
Name: employee_residence, Length: 3755, dtype: object

3. Create a Series from the `experience_level` column and store the first 10 elements in the `experience_level_series_10` variable.

In [7]:

# Enter your code here
import pandas as pd
experience_series = df['experience_level']
experience_level_series_10 = experience_series[:10]
experience_level_series_10

Out[7]:

0    SE
1    MI
2    MI
3    SE
4    SE
5    SE
6    SE
7    SE
8    SE
9    SE
Name: experience_level, dtype: object
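
Plain slicing, `.head(10)`, and `.iloc[:10]` all select the first ten elements by position. A minimal sketch on a hypothetical toy Series (`toy_levels` is illustrative and not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy data standing in for the experience_level column.
toy_levels = pd.Series(["SE", "MI", "EN", "EX"] * 3)

first_10_slice = toy_levels[:10]      # positional slice, as used in the cell above
first_10_head = toy_levels.head(10)   # same rows via head()
first_10_iloc = toy_levels.iloc[:10]  # explicit positional indexing

# All three selections hold the same ten elements.
print(first_10_slice.equals(first_10_head) and first_10_head.equals(first_10_iloc))
```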

4. What does the `len()` function return when applied to a Series?

In [ ]:

The total number of elements.
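
To illustrate that answer, a small sketch on hypothetical toy values (not from the salaries file): `len()` and `.size` count every element, including missing ones, while `.count()` skips them.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value.
toy = pd.Series([85847, 30000, np.nan, 175000])

n_total = len(toy)        # counts every element, including NaN
n_size = toy.size         # same number as len()
n_non_null = toy.count()  # excludes missing values

print(n_total, n_size, n_non_null)
```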

5. Find the unique values in `company_size_series` along with their counts, and store the results in the `company_size_counts_series` variable.

In [9]:

company_size_series = df['company_size']
company_size_counts_series = company_size_series.value_counts()
company_size_counts_series

Out[9]:

company_size
M    3153
L     454
S     148
Name: count, dtype: int64

6. Which method calculates the average value of a Series?

In [ ]:
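
A likely answer here is `Series.mean()`. A minimal sketch on hypothetical toy values (not from the salaries file):

```python
import pandas as pd

# Hypothetical toy salaries.
toy_salaries = pd.Series([100000, 120000, 140000])

# mean() returns the arithmetic average; missing values are skipped by default.
average = toy_salaries.mean()
print(average)
```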

7. Calculate the mean, median, and standard deviation of `salary_usd_series`, and store these values as a Series in the `salary_details` variable.

In [14]:

import pandas as pd
salary_usd_series = df['salary_in_usd']
# Enter your code here
mean_salary = salary_usd_series.mean()
median_salary = salary_usd_series.median()
std_salary = salary_usd_series.std()

salary_details = pd.Series({
    'Mean':mean_salary,
    'Median':median_salary,
    'Standard Deviation':std_salary
})

salary_details

Out[14]:

Mean                  137570.389880
Median                135000.000000
Standard Deviation     63055.625278
dtype: float64
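
As an alternative sketch, `Series.agg` with a list of function names returns the same kind of summary Series in one call. The toy values and the name `toy_details` are assumptions for illustration; the index labels come out as the function names rather than the custom labels used above.

```python
import pandas as pd

# Hypothetical toy data, not the real salary column.
toy = pd.Series([1, 2, 3, 4])

# One call computes all three statistics; the result is a Series
# indexed by 'mean', 'median', and 'std'.
toy_details = toy.agg(["mean", "median", "std"])
print(toy_details)
```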

8. What method would you use to count unique values in a Series?

In [ ]:
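
Two readings of the question are possible: `value_counts()` counts how often each distinct value occurs (as in activity 5), while `nunique()` returns how many distinct values there are. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data standing in for the company_size column.
toy_sizes = pd.Series(["M", "L", "M", "S", "M"])

counts = toy_sizes.value_counts()  # occurrences per distinct value
n_distinct = toy_sizes.nunique()   # number of distinct values

print(counts["M"], n_distinct)
```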

9. Identify the top 5 most frequent job titles and store them in the `top_5_job_titles` variable.

In [16]:

import pandas as pd
job_title_series = df['job_title']
top_5_job_titles = job_title_series.value_counts().head(5)
top_5_job_titles

Out[16]:

job_title
Data Engineer                1040
Data Scientist                840
Data Analyst                  612
Machine Learning Engineer     289
Analytics Engineer            103
Name: count, dtype: int64

10. Which method would you use to find the most frequent value in a Series?

In [18]:

import pandas as pd
job_title_series = df['job_title']
most_fq_value_job = job_title_series.mode()
most_fq_value_job

Out[18]:

0    Data Engineer
Name: job_title, dtype: object
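
`mode()` returns a Series rather than a single value because several values can tie for most frequent. A minimal sketch on hypothetical toy data showing a tie and how to pull out one value:

```python
import pandas as pd

# Hypothetical toy data in which two titles tie for most frequent.
toy_titles = pd.Series([
    "Data Engineer", "Data Scientist",
    "Data Engineer", "Data Scientist",
    "Data Analyst",
])

modes = toy_titles.mode()  # all tied values, in sorted order
single = modes.iloc[0]     # first of the tied values

print(modes.tolist(), single)
```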

11. Calculate the 25th, 50th, and 75th percentiles of `salary_usd_series` and store these values as a Series in the `salary_quartiles` variable.

In [19]:

# Enter your code here
import pandas as pd
salary_usd_series = df['salary_in_usd']
percentile_25 = salary_usd_series.quantile(0.25)
percentile_50 = salary_usd_series.quantile(0.50)
percentile_75 = salary_usd_series.quantile(0.75)

salary_quartiles = pd.Series({
    '25th Percentile':percentile_25,
    '50th Percentile':percentile_50,
    '75th Percentile':percentile_75
})

salary_quartiles

Out[19]:

25th Percentile     95000.0
50th Percentile    135000.0
75th Percentile    175000.0
dtype: float64
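
`quantile()` also accepts a list of probabilities and then returns a Series directly, indexed by those probabilities. A minimal sketch on hypothetical toy values (`toy_quartiles` is an illustrative name):

```python
import pandas as pd

# Hypothetical toy data, not the real salary column.
toy = pd.Series([1, 2, 3, 4, 5])

# Passing a list returns a Series indexed by 0.25, 0.5, and 0.75.
toy_quartiles = toy.quantile([0.25, 0.5, 0.75])
print(toy_quartiles)
```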

12. Which method is used to apply a function to every element in a Series?

In [ ]:
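
A likely answer here is `Series.apply()`, which calls the given function once per element. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data standing in for the experience_level column.
toy_levels = pd.Series(["SE", "MI", "EN"])

# apply() passes each element to the function and collects the results.
lowered = toy_levels.apply(str.lower)
print(lowered.tolist())
```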

13. Create a new Series, `increased_salary`, by applying a `10%` increase to each salary in the `salary_usd_series` Series.

In [21]:

# Enter your code here
import pandas as pd
salary_usd_series = df['salary_in_usd']
# apply() already returns a Series, so no pd.Series() wrapper is needed
increased_salary = salary_usd_series.apply(lambda x: x * 1.1)
increased_salary

Out[21]:

0        94431.7
1        33000.0
2        28050.0
3       192500.0
4       132000.0
          ...   
3750    453200.0
3751    166100.0
3752    115500.0
3753    110000.0
3754    104131.5
Name: salary_in_usd, Length: 3755, dtype: float64
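
For simple arithmetic like this, multiplying the Series directly is a common vectorized alternative to `apply()`. A minimal sketch on hypothetical toy values, checking that both approaches agree:

```python
import pandas as pd

# Hypothetical toy salaries.
toy_salary = pd.Series([30000, 120000, 175000])

via_apply = toy_salary.apply(lambda x: x * 1.1)  # element-by-element function call
vectorized = toy_salary * 1.1                    # one operation on the whole Series

# Largest absolute difference between the two results.
max_diff = (via_apply - vectorized).abs().max()
print(max_diff)
```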

14. What does the operation `series1 > series2` return?

In [ ]:
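
A likely answer: a boolean Series of the same length, `True` wherever the element of `series1` is greater than the matching element of `series2`. Both Series need identical index labels, otherwise pandas raises a `ValueError`. A minimal sketch on hypothetical toy values:

```python
import pandas as pd

# Hypothetical toy data with matching default indexes.
series1 = pd.Series([10, 20, 30])
series2 = pd.Series([15, 20, 25])

# Element-wise comparison produces a boolean Series.
result = series1 > series2
print(result.tolist())
```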

15. Compare the `increased_salary` Series with the `salary_usd_series` element-wise to check for equality.

In [23]:

# Enter your code here
import pandas as pd
salary_usd_series = df['salary_in_usd']
# apply() already returns a Series, so no pd.Series() wrapper is needed
increased_salary = salary_usd_series.apply(lambda x: x * 1.1)
salary_compare_series = increased_salary == salary_usd_series
salary_compare_series

Out[23]:

0       False
1       False
2       False
3       False
4       False
        ...  
3750    False
3751    False
3752    False
3753    False
3754    False
Name: salary_in_usd, Length: 3755, dtype: bool
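
The boolean result can be summarized with `.any()` or `.all()`. Since the increased values are floats, a tolerance-based check such as `numpy.isclose` is usually safer than `==` when comparing against expected numbers. A minimal sketch on hypothetical toy values:

```python
import numpy as np
import pandas as pd

# Hypothetical toy salaries.
toy_salary = pd.Series([30000, 120000, 175000])
toy_increased = toy_salary * 1.1

# Element-wise equality, then a single summary flag.
equal_mask = toy_increased == toy_salary
any_equal = bool(equal_mask.any())

# Tolerance-based comparison against the expected increased values.
close_to_expected = bool(np.isclose(toy_increased, [33000, 132000, 192500]).all())

print(any_equal, close_to_expected)
```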

Statement of Completion#d5442f66
