Marcos Ycaro Barros Trindade has successfully completed this project.

Intro to Pandas for Data Analysis

medium

4.72

Exploring Data Science Salaries: A Pandas Series Analysis

Finished

September 23, 2024 8:16 PM

Resolution

Activities

In [1]:

import numpy as np
import pandas as pd

In [2]:

# The file has an .xls extension but is read here as comma-separated text
df = pd.read_csv('Data_Science_Salaries.xls')

In [3]:

df.head()

Out[3]:

	work_year	experience_level	employment_type	job_title	salary_in_usd	employee_residence	remote_ratio	company_location	company_size
0	2023	SE	FT	Principal Data Scientist	85847	ES	100	ES	L
1	2023	MI	CT	ML Engineer	30000	US	100	US	S
2	2023	MI	CT	ML Engineer	25500	US	100	US	S
3	2023	SE	FT	Data Scientist	175000	CA	100	CA	M
4	2023	SE	FT	Data Scientist	120000	CA	100	CA	M

In [4]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3755 entries, 0 to 3754
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   work_year           3755 non-null   int64 
 1   experience_level    3755 non-null   object
 2   employment_type     3755 non-null   object
 3   job_title           3755 non-null   object
 4   salary_in_usd       3755 non-null   int64 
 5   employee_residence  3755 non-null   object
 6   remote_ratio        3755 non-null   int64 
 7   company_location    3755 non-null   object
 8   company_size        3755 non-null   object
dtypes: int64(3), object(6)
memory usage: 264.1+ KB

1. Which method is used to display basic information about a pandas Series?

In [ ]:
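
As a small illustration of inspecting a Series, the sketch below builds a toy Series (the values are made up for the example, not taken from the dataset) and calls `info()` and `describe()` on it. It assumes pandas 1.4 or later, where `Series.info()` is available.

```python
import io

import pandas as pd

# Hypothetical toy data; not taken from Data_Science_Salaries.xls
toy_salaries = pd.Series([30000, 85847, None, 175000], name="salary_in_usd")

# info() writes a structural summary (index, non-null count, dtype) to a buffer
buffer = io.StringIO()
toy_salaries.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)

# describe() returns summary statistics as a new Series
summary = toy_salaries.describe()
print(summary)
```

`info()` reports structure, while `describe()` reports statistics, so the two are complementary rather than interchangeable.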

2. Assign the `employee_residence` column to the `employee_residence_series` variable as a Series.

In [3]:

employee_residence_series = df['employee_residence']

3. Create a Series from the `experience_level` column and store the first 10 elements in the `experience_level_series_10` variable.

In [7]:

experience_level_series_10 = df['experience_level'].iloc[:10]

4. What does the `len()` function return when applied to a Series?

In [ ]:
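
A minimal sketch of how `len()` behaves on a Series, using made-up values with one missing entry. It contrasts `len()` with `Series.count()`, which ignores missing values.

```python
import pandas as pd

# Hypothetical toy data with one missing value
levels = pd.Series(["SE", "MI", None, "EN"])

total_elements = len(levels)        # counts every element, missing or not
non_null_elements = levels.count()  # counts only non-missing elements
print(total_elements, non_null_elements)
```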

5. Find the unique values in `company_size_series` along with their counts, and store the results in the `company_size_series_counts` variable.

In [9]:

company_size_series = df['company_size']
company_size_series_counts = company_size_series.value_counts()

6. Which method calculates the average value of a Series?

In [ ]:
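
A short sketch of `Series.mean()` on made-up values. It shows that missing values are skipped by default and propagate only when `skipna=False` is passed.

```python
import pandas as pd

# Hypothetical toy salaries, one of them missing
toy = pd.Series([100.0, 200.0, None, 300.0])

average = toy.mean()                     # NaN is skipped by default
average_strict = toy.mean(skipna=False)  # NaN propagates when skipna=False
print(average, average_strict)
```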

7. Calculate the mean, median, and standard deviation of `salary_usd_series`, and store these values as a Series in the `salary_details` variable.

In [15]:

salary_usd_series = df['salary_in_usd']
salary_details = pd.Series({
    'Mean':salary_usd_series.mean(),
    'Median':salary_usd_series.median(),
    'Standard Deviation':salary_usd_series.std()
})
salary_details

Out[15]:

Mean                  137570.389880
Median                135000.000000
Standard Deviation     63055.625278
dtype: float64
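
One detail worth knowing when reading the standard deviation above: `Series.std()` defaults to `ddof=1` (the sample standard deviation), whereas `numpy.std` defaults to `ddof=0`. The sketch below uses made-up values chosen so that the population standard deviation is exactly 2.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; squared deviations from the mean (5) sum to 32
values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

sample_std = values.std()              # pandas default: ddof=1, divides by n - 1
population_std = values.std(ddof=0)    # divides by n
numpy_std = np.std(values.to_numpy())  # NumPy default: ddof=0
print(sample_std, population_std, numpy_std)
```

On a dataset with thousands of rows the two conventions differ only slightly, but the gap is visible on small samples like this one.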

8. What method would you use to count unique values in a Series?

In [19]:

salary_usd_series.nunique()

Out[19]:

9. Identify the top 5 most frequent job titles and store them in the `top_5_job_titles` variable.

In [40]:

job_title_series = df['job_title'].value_counts()
top_5_job_titles = job_title_series.iloc[:5]
top_5_job_titles

Out[40]:

job_title
Data Engineer                1040
Data Scientist                840
Data Analyst                  612
Machine Learning Engineer     289
Analytics Engineer            103
Name: count, dtype: int64

10. Which method would you use to find the most frequent value in a Series?

In [45]:

# Call mode() on the raw job titles, not on their value counts
df['job_title'].mode()

Out[45]:

0    Data Engineer
Name: job_title, dtype: object
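
`mode()` returns a Series rather than a scalar because several values can tie for most frequent. It also needs to be called on the raw values; calling it on the output of `value_counts()` gives the most common count instead. A minimal sketch with made-up job titles that contain a tie:

```python
import pandas as pd

# Hypothetical toy job titles with a two-way tie for most frequent
titles = pd.Series([
    "Data Engineer", "Data Scientist", "Data Engineer",
    "Data Scientist", "Data Analyst",
])

modes = titles.mode()                          # every tied value, sorted
top_by_count = titles.value_counts().idxmax()  # a single label among the ties
print(modes.tolist(), top_by_count)
```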

11. Calculate the 25th, 50th, and 75th percentiles of `salary_usd_series` and store these values as a Series in the `salary_quartiles` variable.

In [56]:

salary_quartiles = salary_usd_series.quantile([0.25,0.5,0.75])
salary_quartiles.index = ['25th Percentile','50th Percentile','75th Percentile']
salary_quartiles

Out[56]:

25th Percentile     95000.0
50th Percentile    135000.0
75th Percentile    175000.0
Name: salary_in_usd, dtype: float64
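
To see how `quantile()` arrives at its values, the sketch below uses four made-up numbers. With the default `interpolation='linear'`, the quantile `q` sits at position `(n - 1) * q` in the sorted data, and values between positions are interpolated linearly.

```python
import pandas as pd

# Hypothetical toy data, already sorted: positions 0, 1, 2, 3
toy = pd.Series([10, 20, 30, 40])

# q=0.25 -> position 0.75 -> 10 + 0.75 * (20 - 10) = 17.5
# q=0.50 -> position 1.50 -> 20 + 0.50 * (30 - 20) = 25.0
# q=0.75 -> position 2.25 -> 30 + 0.25 * (40 - 30) = 32.5
quartiles = toy.quantile([0.25, 0.5, 0.75])
print(quartiles)
```

The result is indexed by the requested quantiles (0.25, 0.5, 0.75), which is why the cell above relabels the index afterwards.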

12. Which method is used to apply a function to every element in a Series?

In [ ]:
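
A minimal sketch of element-wise function application with `Series.apply()`, using made-up salaries. The `salary_band` helper and its threshold are hypothetical and exist only for this illustration; `Series.map()` accepts the same function and is shown for comparison.

```python
import pandas as pd

# Hypothetical toy salaries
toy_salaries = pd.Series([30000, 85847, 175000])


def salary_band(salary):
    """Label a salary as 'high' or 'standard' (the threshold is arbitrary)."""
    return "high" if salary >= 100000 else "standard"


bands = toy_salaries.apply(salary_band)        # called once per element
bands_via_map = toy_salaries.map(salary_band)  # same result for a plain function
print(bands.tolist())
```

For simple arithmetic, vectorized operators such as `series * 1.1` are usually preferable to `apply()`, which calls the Python function once per element.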

13. Create a new Series, `increased_salary`, by applying a `10%` increase to each salary in the `salary_usd_series` Series.

In [60]:

increased_salary = salary_usd_series + (salary_usd_series*10/100)
increased_salary

Out[60]:

0        94431.7
1        33000.0
2        28050.0
3       192500.0
4       132000.0
          ...   
3750    453200.0
3751    166100.0
3752    115500.0
3753    110000.0
3754    104131.5
Name: salary_in_usd, Length: 3755, dtype: float64

14. What does the operation `series1 > series2` return?

In [ ]:
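
A short sketch of an element-wise comparison between two Series, with made-up values. The names `series1` and `series2` follow the question; the data is hypothetical.

```python
import pandas as pd

# Hypothetical toy data sharing the same default index
series1 = pd.Series([100, 200, 300])
series2 = pd.Series([150, 200, 250])

is_greater = series1 > series2         # boolean Series, one entry per element
count_greater = int(is_greater.sum())  # True counts as 1 when summed
print(is_greater.tolist(), count_greater)
```

The comparison operators expect both Series to carry identical index labels; pandas raises a `ValueError` otherwise.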

15. Compare the `increased_salary` Series with the `salary_usd_series` element-wise to check for equality.

In [65]:

salary_compare_series = increased_salary == salary_usd_series
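
The comparison above yields a boolean Series. A minimal sketch on made-up salaries shows what to expect: after a 10% increase only a salary of zero remains equal to its original value. The toy values, including the zero, are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy salaries, including a zero to show the one case that stays equal
toy_salaries = pd.Series([0, 30000, 85847])
toy_increased = toy_salaries + (toy_salaries * 10 / 100)

is_equal = toy_increased == toy_salaries       # element-wise boolean Series
is_equal_via_eq = toy_increased.eq(toy_salaries)  # method form of ==
any_equal = bool(is_equal.any())
print(is_equal.tolist(), any_equal)
```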

Statement of Completion#50eb93da
