Statement of Completion#50eb93da
Intro to Pandas for Data Analysis
Exploring Data Science Salaries: A Pandas Series Analysis
Resolution
Activities
In [1]:
import numpy as np
import pandas as pd
In [2]:
df = pd.read_csv('Data_Science_Salaries.xls')
In [3]:
df.head()
Out[3]:
| | work_year | experience_level | employment_type | job_title | salary_in_usd | employee_residence | remote_ratio | company_location | company_size |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023 | SE | FT | Principal Data Scientist | 85847 | ES | 100 | ES | L |
| 1 | 2023 | MI | CT | ML Engineer | 30000 | US | 100 | US | S |
| 2 | 2023 | MI | CT | ML Engineer | 25500 | US | 100 | US | S |
| 3 | 2023 | SE | FT | Data Scientist | 175000 | CA | 100 | CA | M |
| 4 | 2023 | SE | FT | Data Scientist | 120000 | CA | 100 | CA | M |
In [4]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3755 entries, 0 to 3754
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   work_year           3755 non-null   int64
 1   experience_level    3755 non-null   object
 2   employment_type     3755 non-null   object
 3   job_title           3755 non-null   object
 4   salary_in_usd       3755 non-null   int64
 5   employee_residence  3755 non-null   object
 6   remote_ratio        3755 non-null   int64
 7   company_location    3755 non-null   object
 8   company_size        3755 non-null   object
dtypes: int64(3), object(6)
memory usage: 264.1+ KB
1. Which method is used to display basic information about a pandas Series?
In [ ]:
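One way to explore this question is sketched below on the five `employee_residence` values shown by `df.head()`. The name `residence` is illustrative, and the sketch assumes a pandas release recent enough to provide `Series.info()`.

```python
import io

import pandas as pd

# Illustrative sample: the employee_residence values from the df.head() output.
residence = pd.Series(["ES", "US", "US", "CA", "CA"], name="employee_residence")

# info() writes a short summary: index range, non-null count, dtype, memory usage.
# Passing buf captures the text instead of sending it straight to stdout.
buffer = io.StringIO()
residence.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```

For numeric data, `describe()` is a related call that reports summary statistics rather than structural information.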
2. Assign the employee_residence column to the employee_residence_series variable as a Series.
In [3]:
employee_residence_series = df['employee_residence']
3. Create a Series from the experience_level column and store the first 10 elements in the experience_level_series_10 variable.
In [7]:
experience_level_series_10 = df['experience_level'].iloc[:10]
4. What does the len() function return when applied to a Series?
In [ ]:
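A minimal sketch of the behaviour in question, using a small made-up Series (the name `levels` and the deliberately inserted missing value are assumptions for illustration, not part of the dataset):

```python
import numpy as np
import pandas as pd

# Illustrative sample with one missing value added on purpose.
levels = pd.Series(["SE", "MI", np.nan, "SE", "SE"], name="experience_level")

n_elements = len(levels)     # counts every element, missing or not
n_non_null = levels.count()  # counts only the non-missing elements
print(n_elements, n_non_null)
```

The two numbers differ only when the Series contains missing values, which is not the case for any column in this dataset according to `df.info()`.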
5. Find the unique values in company_size_series along with their counts, and store the results in the company_size_series_counts variable.
In [9]:
company_size_series = df['company_size']
company_size_series_counts = company_size_series.value_counts()
6. Which method calculates the average value of a Series?
In [ ]:
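As a quick check, the sketch below applies `mean()` to the five `salary_in_usd` values shown by `df.head()` and compares it with a manual sum-over-length calculation. The variable names are illustrative.

```python
import pandas as pd

# Illustrative sample: the salary_in_usd values from the df.head() output.
salaries = pd.Series([85847, 30000, 25500, 175000, 120000], name="salary_in_usd")

average = salaries.mean()                # built-in arithmetic mean
manual = salaries.sum() / len(salaries)  # same quantity computed by hand
print(average, manual)
```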
7. Calculate the mean, median, and standard deviation of salary_usd_series, and store these values as a Series in the salary_details variable.
In [15]:
salary_usd_series = df['salary_in_usd']
salary_details = pd.Series({
'Mean':salary_usd_series.mean(),
'Median':salary_usd_series.median(),
'Standard Deviation':salary_usd_series.std()
})
salary_details
Out[15]:
Mean                  137570.389880
Median                135000.000000
Standard Deviation     63055.625278
dtype: float64
8. What method would you use to count unique values in a Series?
In [19]:
salary_usd_series.nunique()
Out[19]:
1035
9. Identify the top 5 most frequent job titles and store them in the top_5_job_titles variable.
In [40]:
job_title_series = df['job_title'].value_counts()
top_5_job_titles = job_title_series.iloc[:5]
top_5_job_titles
Out[40]:
job_title
Data Engineer                1040
Data Scientist                840
Data Analyst                  612
Machine Learning Engineer     289
Analytics Engineer            103
Name: count, dtype: int64
10. Which method would you use to find the most frequent value in a Series?
In [45]:
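# Call mode() on the raw column: job_title_series holds value_counts() output,
# so its mode would be the most common count, not the most frequent job title.
df['job_title'].mode()
Out[45]:
0    Data Engineer
Name: job_title, dtype: object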
11. Calculate the 25th, 50th, and 75th percentiles of salary_usd_series and store these values as a Series in the salary_quartiles variable.
In [56]:
salary_quartiles = salary_usd_series.quantile([0.25,0.5,0.75])
salary_quartiles.index = ['25th Percentile','50th Percentile','75th Percentile']
salary_quartiles
Out[56]:
25th Percentile     95000.0
50th Percentile    135000.0
75th Percentile    175000.0
Name: salary_in_usd, dtype: float64
12. Which method is used to apply a function to every element in a Series?
In [ ]:
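A small sketch of element-wise function application with `Series.apply()`, again on the five salaries from `df.head()`. The helper `salary_band` and its 100,000 USD cutoff are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Illustrative sample: the salary_in_usd values from the df.head() output.
salaries = pd.Series([85847, 30000, 25500, 175000, 120000], name="salary_in_usd")


def salary_band(amount):
    """Hypothetical helper: label a salary relative to a 100,000 USD cutoff."""
    return "high" if amount >= 100_000 else "low"


# apply() calls salary_band once per element and collects the results
# into a new Series that keeps the original index.
bands = salaries.apply(salary_band)
print(bands)
```

For plain arithmetic such as a percentage increase, a vectorized expression on the Series itself is usually simpler than `apply()`.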
13. Create a new Series, increased_salary, by applying a 10% increase to each salary in the salary_usd_series Series.
In [60]:
increased_salary = salary_usd_series + (salary_usd_series*10/100)
increased_salary
Out[60]:
0 94431.7
1 33000.0
2 28050.0
3 192500.0
4 132000.0
...
3750 453200.0
3751 166100.0
3752 115500.0
3753 110000.0
3754 104131.5
Name: salary_in_usd, Length: 3755, dtype: float64
14. What does the operation series1 > series2 return?
In [ ]:
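The sketch below tries the comparison on two short Series with made-up values (apart from the first three salaries from `df.head()`), so the result can be inspected directly. Both Series share the default integer index.

```python
import pandas as pd

# series1 reuses three salaries from df.head(); series2 is made up for illustration.
series1 = pd.Series([85847, 30000, 25500])
series2 = pd.Series([80000, 30000, 40000])

# The comparison is evaluated position by position for matching index labels
# and yields a new Series of booleans.
is_greater = series1 > series2
print(is_greater)
```

Note that pandas raises a `ValueError` when the two Series are not identically labeled, so the indexes need to match before comparing.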
15. Compare the increased_salary Series with the salary_usd_series element-wise to check for equality.
In [65]:
salary_compare_series = increased_salary == salary_usd_series
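A boolean Series with thousands of rows is hard to read at a glance. The sketch below, using the five salaries from `df.head()` and illustrative variable names, shows one possible way to summarize such a comparison with `sum()` and `any()`.

```python
import pandas as pd

# Illustrative sample: the salary_in_usd values from the df.head() output.
salaries = pd.Series([85847, 30000, 25500, 175000, 120000], name="salary_in_usd")
increased = salaries + (salaries * 10 / 100)

# Element-wise equality check, as in the activity above.
is_equal = increased == salaries

matches = int(is_equal.sum())     # True counts as 1, so this is the number of matches
any_match = bool(is_equal.any())  # True if at least one pair is equal
print(matches, any_match)
```

Since every salary in the sample is positive, a 10% increase changes each value and no pair compares equal.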