In [1]:

import pandas as pd

Intro to Series¶

Take a look at the following list of companies:

[Image: list of the top technology companies by revenue]

We'll represent them using a Series in the following way:

In [2]:

companies = [
    'Apple', 'Samsung', 'Alphabet', 'Foxconn',
    'Microsoft', 'Huawei', 'Dell Technologies',
    'Meta', 'Sony', 'Hitachi', 'Intel',
    'IBM', 'Tencent', 'Panasonic'
]

In [3]:

s = pd.Series([
    274515, 200734, 182527, 181945, 143015,
    129184, 92224, 85965, 84893, 82345,
    77867, 73620, 69864, 63191],
    index=companies,
    name="Top Technology Companies by Revenue")

In [4]:

s

Out[4]:

Apple                274515
Samsung              200734
Alphabet             182527
Foxconn              181945
Microsoft            143015
Huawei               129184
Dell Technologies     92224
Meta                  85965
Sony                  84893
Hitachi               82345
Intel                 77867
IBM                   73620
Tencent               69864
Panasonic             63191
Name: Top Technology Companies by Revenue, dtype: int64
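As a complementary sketch, a Series can also be built from a dictionary, in which case the keys become the index and the values become the data. The three-company subset and the name "Revenue subset" below are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Hypothetical three-company subset of the data above, keyed by label.
revenue_by_company = {
    'Apple': 274515,
    'Samsung': 200734,
    'Alphabet': 182527,
}

# Dict keys become the index; dict values become the data.
from_dict = pd.Series(revenue_by_company, name="Revenue subset")

# The same Series built the way the notebook does it: values plus index.
from_lists = pd.Series(
    [274515, 200734, 182527],
    index=['Apple', 'Samsung', 'Alphabet'],
    name="Revenue subset",
)

# Both constructions should hold the same labels and values.
print(from_dict.equals(from_lists))
```

Either form works; the dictionary form keeps each label next to its value, which can make longer listings easier to check by eye.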

1. Check your knowledge: build a series¶

Create a series called my_series

In [12]:

my_series = pd.Series([9, 11, -5], index=['a', 'b', 'c'], name="My First Series")

In [13]:

my_series

Out[13]:

a     9
b    11
c    -5
Name: My First Series, dtype: int64

Basic selection and location¶

Selecting by index:¶

In [6]:

s['Apple']

Out[6]:

274515

.loc is the preferred way, because it always selects by label:

In [7]:

s.loc['Apple']

Out[7]:

274515

Selection by position:¶

In [8]:

s.iloc[0]

Out[8]:

274515

In [9]:

s.iloc[-1]

Out[9]:

63191
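The difference between label-based and position-based selection is easiest to see when the labels are themselves integers. The sketch below uses a hypothetical Series t (not part of the notebook) whose integer labels do not match the positions.

```python
import pandas as pd

# Hypothetical Series whose labels are integers that do not match positions.
t = pd.Series([100, 200, 300], index=[10, 20, 30])

by_label = t.loc[10]      # label 10 is the first element
by_position = t.iloc[0]   # position 0 is also the first element
last = t.iloc[-1]         # negative positions count from the end

print(by_label, by_position, last)

# There is no label 0, so a label lookup with 0 fails.
try:
    t.loc[0]
except KeyError:
    print("label 0 does not exist")
```

Using .loc for labels and .iloc for positions keeps the intent unambiguous regardless of the index type.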

Errors in selection:¶

In [14]:

# this code will fail
s.loc["Non existent company"]

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /usr/local/lib/python3.11/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:

File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Non existent company'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[14], line 2
      1 # this code will fail
----> 2 s.loc["Non existent company"]

File /usr/local/lib/python3.11/site-packages/pandas/core/indexing.py:1191, in _LocationIndexer.__getitem__(self, key)
   1189 maybe_callable = com.apply_if_callable(key, self.obj)
   1190 maybe_callable = self._check_deprecated_callable_usage(key, maybe_callable)
-> 1191 return self._getitem_axis(maybe_callable, axis=axis)

File /usr/local/lib/python3.11/site-packages/pandas/core/indexing.py:1431, in _LocIndexer._getitem_axis(self, key, axis)
   1429 # fall thru to straight lookup
   1430 self._validate_key(key, axis)
-> 1431 return self._get_label(key, axis=axis)

File /usr/local/lib/python3.11/site-packages/pandas/core/indexing.py:1381, in _LocIndexer._get_label(self, label, axis)
   1379 def _get_label(self, label, axis: AxisInt):
   1380     # GH#5567 this will fail if the label is not present in the axis.
-> 1381     return self.obj.xs(label, axis=axis)

File /usr/local/lib/python3.11/site-packages/pandas/core/generic.py:4301, in NDFrame.xs(self, key, axis, level, drop_level)
   4299             new_index = index[loc]
   4300 else:
-> 4301     loc = index.get_loc(key)
   4303     if isinstance(loc, np.ndarray):
   4304         if loc.dtype == np.bool_:

File /usr/local/lib/python3.11/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3807     if isinstance(casted_key, slice) or (
   3808         isinstance(casted_key, abc.Iterable)
   3809         and any(isinstance(x, slice) for x in casted_key)
   3810     ):
   3811         raise InvalidIndexError(key)
-> 3812     raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     #  InvalidIndexError. Otherwise we fall through and re-raise
   3816     #  the TypeError.
   3817     self._check_indexing_error(key)

KeyError: 'Non existent company'

In [15]:

# This code also fails: 132 is out of bounds
# (the Series does not have that many elements)
s.iloc[132]

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[15], line 3
      1 # This code also fails: 132 is out of bounds
      2 # (the Series does not have that many elements)
----> 3 s.iloc[132]

File /usr/local/lib/python3.11/site-packages/pandas/core/indexing.py:1191, in _LocationIndexer.__getitem__(self, key)
   1189 maybe_callable = com.apply_if_callable(key, self.obj)
   1190 maybe_callable = self._check_deprecated_callable_usage(key, maybe_callable)
-> 1191 return self._getitem_axis(maybe_callable, axis=axis)

File /usr/local/lib/python3.11/site-packages/pandas/core/indexing.py:1752, in _iLocIndexer._getitem_axis(self, key, axis)
   1749     raise TypeError("Cannot index by location index with a non-integer key")
   1751 # validate the location
-> 1752 self._validate_integer(key, axis)
   1754 return self.obj._ixs(key, axis=axis)

File /usr/local/lib/python3.11/site-packages/pandas/core/indexing.py:1685, in _iLocIndexer._validate_integer(self, key, axis)
   1683 len_axis = len(self.obj._get_axis(axis))
   1684 if key >= len_axis or key < -len_axis:
-> 1685     raise IndexError("single positional indexer is out-of-bounds")

IndexError: single positional indexer is out-of-bounds

We can prevent the KeyError by first checking membership with the in operator, which tests the index labels:

In [16]:

"Apple" in s

Out[16]:

True

In [17]:

"Snapchat" in s

Out[17]:

False
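The membership check can be wrapped into small guards for both kinds of lookup. This is a minimal sketch: the helper names safe_label_lookup and safe_position_lookup and the two-company Series are illustrative assumptions, while Series.get is an existing pandas method that returns a default when the label is missing.

```python
import pandas as pd

revenue = pd.Series([274515, 200734], index=['Apple', 'Samsung'])

def safe_label_lookup(series, label, default=None):
    """Return series.loc[label], or default when the label is absent."""
    return series.loc[label] if label in series.index else default

def safe_position_lookup(series, position, default=None):
    """Return series.iloc[position], or default when out of bounds."""
    if -len(series) <= position < len(series):
        return series.iloc[position]
    return default

print(safe_label_lookup(revenue, 'Snapchat'))    # missing label
print(safe_position_lookup(revenue, 132))        # position out of bounds
print(revenue.get('Snapchat', 0))                # built-in label lookup with a default
```

Note that in tests the index labels, not the values: 274515 in revenue is False here, because 274515 is a value rather than a label.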

Multiple selection:¶

By index:

In [18]:

s[['Apple', 'Intel', 'Sony']]

Out[18]:

Apple    274515
Intel     77867
Sony      84893
Name: Top Technology Companies by Revenue, dtype: int64

By position:

In [19]:

s.iloc[[0, 5, -1]]

Out[19]:

Apple        274515
Huawei       129184
Panasonic     63191
Name: Top Technology Companies by Revenue, dtype: int64
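Multiple selection by label has the same failure mode as single selection: in recent pandas versions, passing a list that contains a missing label to .loc raises a KeyError. The sketch below shows two possible ways around it, using a hypothetical three-company Series and a wanted list that are not part of the notebook.

```python
import pandas as pd

revenue = pd.Series([274515, 84893, 77867], index=['Apple', 'Sony', 'Intel'])
wanted = ['Apple', 'Snapchat', 'Sony']   # 'Snapchat' is not in the index

# Option 1: keep only the labels that actually exist.
present = [label for label in wanted if label in revenue.index]
existing_only = revenue.loc[present]

# Option 2: reindex keeps every requested label and fills the gaps with NaN.
with_gaps = revenue.reindex(wanted)

print(existing_only)
print(with_gaps)
```

Because NaN is a floating-point value, the reindexed result is converted to float64, which is worth keeping in mind if the integer dtype matters.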

Activities:¶

2. Check your knowledge: location by index¶

Select the revenue of Intel and store it in a variable named intel_revenue:

In [21]:

intel_revenue = s.loc['Intel']
intel_revenue

Out[21]:

77867

3. Check your knowledge: location by position¶

Select the revenue of the "second to last" element in our series s and store it in a variable named second_to_last:

In [24]:

second_to_last = s.iloc[-2]
second_to_last

Out[24]:

69864

4. Check your knowledge: multiple selection¶

Use multiple label selection to retrieve the revenues of the companies:

Samsung
Dell Technologies
Panasonic
Microsoft

In [27]:

sub_series = s.loc[['Samsung', 'Dell Technologies', 'Panasonic', 'Microsoft']]
sub_series

Out[27]:

Samsung              200734
Dell Technologies     92224
Panasonic             63191
Microsoft            143015
Name: Top Technology Companies by Revenue, dtype: int64

Series Attributes and Methods¶

In [29]:

s.head()

Out[29]:

Apple        274515
Samsung      200734
Alphabet     182527
Foxconn      181945
Microsoft    143015
Name: Top Technology Companies by Revenue, dtype: int64

In [30]:

s.tail()

Out[30]:

Hitachi      82345
Intel        77867
IBM          73620
Tencent      69864
Panasonic    63191
Name: Top Technology Companies by Revenue, dtype: int64

Main Attributes¶

The underlying data:

In [31]:

s.values

Out[31]:

array([274515, 200734, 182527, 181945, 143015, 129184,  92224,  85965,
        84893,  82345,  77867,  73620,  69864,  63191])

The index:

In [32]:

s.index

Out[32]:

Index(['Apple', 'Samsung', 'Alphabet', 'Foxconn', 'Microsoft', 'Huawei',
       'Dell Technologies', 'Meta', 'Sony', 'Hitachi', 'Intel', 'IBM',
       'Tencent', 'Panasonic'],
      dtype='object')

The name (if any):

In [33]:

s.name

Out[33]:

'Top Technology Companies by Revenue'

The type associated with the values:

In [34]:

s.dtype

Out[34]:

dtype('int64')

The size of the Series:

In [35]:

s.size

Out[35]:

14

len also works:

In [36]:

len(s)

Out[36]:

14

Statistical methods¶

In [37]:

s.describe()

Out[37]:

count        14.000000
mean     124420.642857
std       63686.481231
min       63191.000000
25%       78986.500000
50%       89094.500000
75%      172212.500000
max      274515.000000
Name: Top Technology Companies by Revenue, dtype: float64

In [38]:

s.mean()

Out[38]:

124420.64285714286

In [39]:

s.median()

Out[39]:

89094.5

In [40]:

s.std()

Out[40]:

63686.48123135607

In [41]:

s.min(), s.max()

Out[41]:

(63191, 274515)

In [42]:

s.quantile(.75)

Out[42]:

172212.5

In [43]:

s.quantile(.99)

Out[43]:

264923.47
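To see where the median and quantile figures come from, the sketch below recomputes two of them by hand from the sorted values. It assumes the default linear interpolation of Series.quantile; the variable names are illustrative.

```python
import pandas as pd

revenue = pd.Series([
    274515, 200734, 182527, 181945, 143015,
    129184, 92224, 85965, 84893, 82345,
    77867, 73620, 69864, 63191])

ordered = revenue.sort_values().to_list()
n = len(ordered)   # 14 values

# Median of an even-length sample: mean of the two middle values.
manual_median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# 75th percentile with linear interpolation between neighbouring values.
position = 0.75 * (n - 1)        # fractional position in the sorted data
lower = int(position)            # index of the value just below that position
fraction = position - lower      # how far to move towards the next value
manual_q75 = ordered[lower] + fraction * (ordered[lower + 1] - ordered[lower])

print(manual_median, revenue.median())
print(manual_q75, revenue.quantile(.75))
```

The manual results should agree with the 50% and 75% rows reported by describe().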

Activities¶

In [44]:

# Run this cell to complete the activity
american_companies = s[[
    'Meta', 'IBM', 'Microsoft',
    'Dell Technologies', 'Apple', 'Intel', 'Alphabet'
]]
american_companies

Out[44]:

Meta                  85965
IBM                   73620
Microsoft            143015
Dell Technologies     92224
Apple                274515
Intel                 77867
Alphabet             182527
Name: Top Technology Companies by Revenue, dtype: int64

5. What's the average revenue of American Companies?¶

In [47]:

american_companies.mean()

Out[47]:

132819.0

6. What's the median revenue of American Companies?¶

In [48]:

american_companies.median()

Out[48]:

92224.0

Sorting Series¶

Sorting by values or Index¶

Sorting by values. Notice that the default order is ascending:

In [49]:

s.sort_values()

Out[49]:

Panasonic             63191
Tencent               69864
IBM                   73620
Intel                 77867
Hitachi               82345
Sony                  84893
Meta                  85965
Dell Technologies     92224
Huawei               129184
Microsoft            143015
Foxconn              181945
Alphabet             182527
Samsung              200734
Apple                274515
Name: Top Technology Companies by Revenue, dtype: int64

Sorting by index (lexicographically by company name). Again, the default order is ascending:

In [50]:

s.sort_index()

Out[50]:

Alphabet             182527
Apple                274515
Dell Technologies     92224
Foxconn              181945
Hitachi               82345
Huawei               129184
IBM                   73620
Intel                 77867
Meta                  85965
Microsoft            143015
Panasonic             63191
Samsung              200734
Sony                  84893
Tencent               69864
Name: Top Technology Companies by Revenue, dtype: int64

To sort in descending order, pass ascending=False:

In [51]:

s.sort_values(ascending=False).head()

Out[51]:

Apple        274515
Samsung      200734
Alphabet     182527
Foxconn      181945
Microsoft    143015
Name: Top Technology Companies by Revenue, dtype: int64

In [52]:

s.sort_index(ascending=False).head()

Out[52]:

Tencent       69864
Sony          84893
Samsung      200734
Panasonic     63191
Microsoft    143015
Name: Top Technology Companies by Revenue, dtype: int64

Activities¶

7. What company has the largest revenue?¶

In [57]:

s.sort_values(ascending=False)

Out[57]:

Apple                274515
Samsung              200734
Alphabet             182527
Foxconn              181945
Microsoft            143015
Huawei               129184
Dell Technologies     92224
Meta                  85965
Sony                  84893
Hitachi               82345
Intel                 77867
IBM                   73620
Tencent               69864
Panasonic             63191
Name: Top Technology Companies by Revenue, dtype: int64

8. Sort company names lexicographically. Which one comes first?¶

In [58]:

s.sort_index(ascending=True)

Out[58]:

Alphabet             182527
Apple                274515
Dell Technologies     92224
Foxconn              181945
Hitachi               82345
Huawei               129184
IBM                   73620
Intel                 77867
Meta                  85965
Microsoft            143015
Panasonic             63191
Samsung              200734
Sony                  84893
Tencent               69864
Name: Top Technology Companies by Revenue, dtype: int64
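Sorting the whole series answers both questions, but when only the top entry is needed there are more direct calls. The sketch below is one possible alternative on a hypothetical three-company Series; idxmax and nlargest are existing Series methods.

```python
import pandas as pd

revenue = pd.Series(
    [200734, 274515, 182527],
    index=['Samsung', 'Apple', 'Alphabet'],
)

top_company = revenue.idxmax()               # label of the largest value
top_two = revenue.nlargest(2)                # two largest values, largest first
first_name = revenue.sort_index().index[0]   # first label in lexicographic order

print(top_company)
print(top_two)
print(first_name)
```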

Immutability¶

Run the sort methods above and check the series again. You'll see that s has NOT changed:

In [59]:

s.head()

Out[59]:

Apple        274515
Samsung      200734
Alphabet     182527
Foxconn      181945
Microsoft    143015
Name: Top Technology Companies by Revenue, dtype: int64

Now we'll sort the series by revenue in ascending order, this time mutating the original with inplace=True. Notice that the method doesn't return anything:

In [60]:

s.sort_values(inplace=True)

But now the series is sorted by revenue in ascending order:

In [61]:

s.head()

Out[61]:

Panasonic    63191
Tencent      69864
IBM          73620
Intel        77867
Hitachi      82345
Name: Top Technology Companies by Revenue, dtype: int64

We'll now sort the series by index, mutating it again:

In [62]:

s.sort_index(inplace=True)

In [63]:

s.head()

Out[63]:

Alphabet             182527
Apple                274515
Dell Technologies     92224
Foxconn              181945
Hitachi               82345
Name: Top Technology Companies by Revenue, dtype: int64
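The two behaviours can be compared side by side on a tiny example. This is a sketch with a hypothetical three-element Series: by default sort_values returns a new Series and leaves the original alone, while inplace=True reorders the original and returns None.

```python
import pandas as pd

revenue = pd.Series([3, 1, 2], index=['c', 'a', 'b'])

# Default: a new, sorted Series is returned; revenue keeps its order.
sorted_copy = revenue.sort_values()
order_before = revenue.index.to_list()

# inplace=True: revenue itself is reordered and the call returns None.
result = revenue.sort_values(inplace=True)
order_after = revenue.index.to_list()

print(order_before, order_after, result)
```

A common alternative to inplace=True is plain reassignment, for example revenue = revenue.sort_values(), which keeps the "methods return new objects" pattern consistent.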

Activities¶

9. Sort American Companies by Revenue¶

In [66]:

american_companies_desc = american_companies.sort_values(ascending=False)

10. Sort (and mutate) international companies¶

In [70]:

# Run this cell to complete the activity
international_companies = s[[
    "Sony", "Tencent", "Panasonic",
    "Samsung", "Hitachi", "Foxconn", "Huawei"
]]
international_companies.sort_values(ascending=False, inplace=True)

In [72]:

international_companies

Out[72]:

Samsung      200734
Foxconn      181945
Huawei       129184
Sony          84893
Hitachi       82345
Tencent       69864
Panasonic     63191
Name: Top Technology Companies by Revenue, dtype: int64

Modifying series¶

Modifying values:

In [73]:

s['IBM'] = 0

In [74]:

s.sort_values().head()

Out[74]:

IBM              0
Panasonic    63191
Tencent      69864
Intel        77867
Hitachi      82345
Name: Top Technology Companies by Revenue, dtype: int64

Adding elements:

In [75]:

s['Tesla'] = 21450

In [76]:

s.sort_values().head()

Out[76]:

IBM              0
Tesla        21450
Panasonic    63191
Tencent      69864
Intel        77867
Name: Top Technology Companies by Revenue, dtype: int64

Removing elements:

In [77]:

del s['Tesla']

In [78]:

s.sort_values().head()

Out[78]:

IBM              0
Panasonic    63191
Tencent      69864
Intel        77867
Hitachi      82345
Name: Top Technology Companies by Revenue, dtype: int64
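Assignment and del both change the series in place. When the original should stay untouched, one option is Series.drop, which returns a new Series. The sketch below uses a hypothetical two-company Series to contrast the two approaches.

```python
import pandas as pd

revenue = pd.Series([274515, 21450], index=['Apple', 'Tesla'])

# drop returns a new Series; revenue still contains Tesla afterwards.
without_tesla = revenue.drop('Tesla')

# errors='ignore' skips labels that are not present instead of raising.
unchanged = revenue.drop('Snapchat', errors='ignore')

# del mutates the Series it is applied to, so work on a copy here.
mutated = revenue.copy()
del mutated['Tesla']

print(revenue.index.to_list())
print(without_tesla.index.to_list())
print(mutated.index.to_list())
```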

Activities¶

11. Insert Amazon's Revenue¶

In [81]:

s['Amazon'] = 469_822

12. Delete the revenue of Meta¶

In [83]:

del s['Meta']

Concatenating Series¶

We can append one series to another using the pd.concat() function:

In [85]:

another_s = pd.Series([21_450, 4_120], index=['Tesla', 'Snapchat'])

In [86]:

another_s

Out[86]:

Tesla       21450
Snapchat     4120
dtype: int64

In [87]:

s_new = pd.concat([s, another_s])

The original series s is not modified:

In [88]:

s

Out[88]:

Alphabet             182527
Apple                274515
Dell Technologies     92224
Foxconn              181945
Hitachi               82345
Huawei               129184
IBM                       0
Intel                 77867
Microsoft            143015
Panasonic             63191
Samsung              200734
Sony                  84893
Tencent               69864
Amazon               469822
Name: Top Technology Companies by Revenue, dtype: int64

s_new is the concatenation of s and another_s:

In [89]:

s_new

Out[89]:

Alphabet             182527
Apple                274515
Dell Technologies     92224
Foxconn              181945
Hitachi               82345
Huawei               129184
IBM                       0
Intel                 77867
Microsoft            143015
Panasonic             63191
Samsung              200734
Sony                  84893
Tencent               69864
Amazon               469822
Tesla                 21450
Snapchat               4120
dtype: int64
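Two details of pd.concat are visible in the output above and worth a closer look: the result loses its name when the inputs have different names, and nothing stops the same label from appearing twice. The sketch below illustrates both on hypothetical Series a and b; verify_integrity is an existing pd.concat parameter that raises a ValueError on overlapping labels.

```python
import pandas as pd

a = pd.Series([1, 2], index=['x', 'y'], name="first")
b = pd.Series([3], index=['x'])   # unnamed, and reuses the label 'x'

# Duplicate labels are allowed by default.
combined = pd.concat([a, b])
print(combined.index.is_unique)   # the label 'x' now appears twice
print(combined.name)              # names differ, so the result has no name

# Ask concat to reject overlapping labels instead.
try:
    pd.concat([a, b], verify_integrity=True)
except ValueError as err:
    print("duplicate labels rejected:", type(err).__name__)
```

With duplicate labels, combined.loc['x'] returns a Series of both matches rather than a single value, so checking for uniqueness up front can prevent surprises later.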
