Intro to Pandas for Data Analysis
medium
Practicing Vectorized Operations with Argentina's Oceanographic Data
In [1]:
import pandas as pd
Understanding the data
Daily weather readings from met2016.csv:
In [2]:
pd.read_csv('met2016.csv', index_col="Fecha", parse_dates=True).head()
Out[2]:
Fecha | Temperatura_Max | Humedad_Max | Vel.Viento_Km/h_Max | Temperatura_Min | Humedad_Min | Vel.Viento_Km/h_Min | Temperatura_Prom | Humedad_Prom | Vel.Viento_Km/h_Prom | Direccion_Viento
---|---|---|---|---|---|---|---|---|---|---
2016-01-01 | 25.4 | 99.0 | 39.5 | 16.8 | 41.1 | 4.1 | 19.5 | 80.2 | 19.6 | NE
2016-01-02 | 31.3 | 99.0 | 27.6 | 16.2 | 17.0 | 0.0 | 22.7 | 68.5 | 12.1 | NE
2016-01-03 | 23.5 | 85.4 | 38.4 | 8.2 | 48.0 | 1.1 | 17.7 | 71.8 | 15.8 | SE
2016-01-04 | 20.7 | 99.0 | 32.3 | 13.7 | 33.4 | 0.1 | 16.9 | 73.8 | 12.5 | ESE
2016-01-05 | 25.2 | 86.0 | 41.8 | 11.8 | 22.0 | 0.9 | 18.2 | 56.7 | 15.5 | NE
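The call above relies on two read_csv parameters: index_col="Fecha" turns the date column into the row index, and parse_dates=True asks pandas to parse that index as dates. A minimal sketch of the effect, using an in-memory CSV built from the first two rows shown above (the io.StringIO stand-in for met2016.csv is an assumption made so the snippet runs on its own):

```python
import io

import pandas as pd

# Stand-in for met2016.csv: two rows and three columns copied from the output above
csv_text = """Fecha,Temperatura_Max,Temperatura_Min,Direccion_Viento
2016-01-01,25.4,16.8,NE
2016-01-02,31.3,16.2,NE
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="Fecha", parse_dates=True)

# The index is now a DatetimeIndex, so rows can be selected by date
print(type(df.index).__name__)  # DatetimeIndex
print(df.loc[pd.Timestamp("2016-01-02"), "Temperatura_Max"])  # 31.3
```

Without parse_dates=True the index would hold plain strings, and date-based selection or time-axis plotting would not behave as expected.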
Seawater measurements from mediciones2016.csv:
In [3]:
pd.read_csv('mediciones2016.csv', index_col="Fecha", parse_dates=True).head()
Out[3]:
Fecha | Latitud | Longitud | Clorofila(ug/L) | Fosfato(uM) | Silicato(uM) | Nitrito+Nitrato(uM) | Temperatura(C) | Salinidad
---|---|---|---|---|---|---|---|---
2016-02-11 | -43.333333 | -65.033333 | 3.50 | 0.0 | 0.0 | 0.0 | 19.5 | 33.0
2016-04-07 | -43.333333 | -65.033333 | 3.34 | 0.0 | 0.0 | 0.0 | 15.9 | 30.9
2016-04-21 | -43.333333 | -65.033333 | 2.40 | 0.0 | 0.0 | 0.0 | 12.7 | 31.2
2016-04-28 | -43.333333 | -65.033333 | 5.94 | 0.0 | 0.0 | 0.0 | 11.7 | 32.8
2016-05-05 | -43.333333 | -65.033333 | 2.16 | 0.0 | 0.0 | 0.0 | 11.7 | 30.6
Individual series extracted from each dataset for this project:
In [5]:
med_df = pd.read_csv('mediciones2016.csv', index_col="Fecha", parse_dates=True)
chlorophyll = med_df['Clorofila(ug/L)']
phosphate = med_df['Fosfato(uM)']
silicate = med_df['Silicato(uM)']
nitrite_nitrate = med_df['Nitrito+Nitrato(uM)']
salinity = med_df['Salinidad']
sea_temperature = med_df['Temperatura(C)']
In [6]:
weather_df = pd.read_csv('met2016.csv', index_col="Fecha", parse_dates=True)
temp_max = weather_df['Temperatura_Max']
humidity_max = weather_df['Humedad_Max']
wind_speed_max = weather_df['Vel.Viento_Km/h_Max']
temp_min = weather_df['Temperatura_Min']
humidity_min = weather_df['Humedad_Min']
wind_speed_min = weather_df['Vel.Viento_Km/h_Min']
temp_avg = weather_df['Temperatura_Prom']
humidity_avg = weather_df['Humedad_Prom']
wind_speed_avg = weather_df['Vel.Viento_Km/h_Prom']
wind_direction = weather_df['Direccion_Viento']
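Selecting a single column with df['...'] returns a pandas Series that keeps the DataFrame's date index and carries the column label as its .name. The plotting helpers in the next section use series.name for labels and titles, so this detail matters. A small self-contained check, with a two-row frame standing in for weather_df:

```python
import pandas as pd

# Two-row stand-in for weather_df (values copied from the output above)
weather_df = pd.DataFrame(
    {"Temperatura_Max": [25.4, 31.3], "Humedad_Max": [99.0, 99.0]},
    index=pd.to_datetime(["2016-01-01", "2016-01-02"]).rename("Fecha"),
)

temp_max = weather_df["Temperatura_Max"]

print(type(temp_max).__name__)                  # Series
print(temp_max.name)                            # Temperatura_Max
print(temp_max.index.equals(weather_df.index))  # True
```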
Data Analysis
No computation is needed at this stage, but plotting pairs of variables over time, each on its own y-axis, makes some of their relationships visible:
In [7]:
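import matplotlib.pyplot as plt


def plot_on_axis(axis, series, color, secondary_axis=False):
    # Optionally draw on a twin axis that shares the x-axis but has its own y-scale
    if secondary_axis:
        ax = axis.twinx()
    else:
        ax = axis
    # series.name (the column label) is used as the line label and y-axis label
    ax.plot(series.index, series, color=color, label=series.name)
    ax.set_ylabel(series.name, color=color)
    # Color the y tick labels to match the line, so each scale is easy to read
    ax.tick_params(axis='y', labelcolor=color)
    return ax


def compare_plots(axis, series1, series2, color1, color2):
    # series1 goes on the left y-axis, series2 on a right-hand twin y-axis
    plot_on_axis(axis, series1, color1)
    plot_on_axis(axis, series2, color2, secondary_axis=True)
    axis.set_title(f'{series1.name} vs {series2.name}')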
In [8]:
fig, axes = plt.subplots(3, 1, figsize=(14, 12))
compare_plots(axes[0], weather_df['Temperatura_Prom'], weather_df['Humedad_Prom'], 'r', 'b')
compare_plots(axes[1], weather_df['Temperatura_Prom'], weather_df['Vel.Viento_Km/h_Prom'], 'r', 'g')
compare_plots(axes[2], weather_df['Humedad_Prom'], weather_df['Vel.Viento_Km/h_Prom'], 'b', 'g')
fig.tight_layout()
In [9]:
fig, axes = plt.subplots(3, 1, figsize=(14, 12))
compare_plots(axes[0], med_df['Temperatura(C)'], med_df['Clorofila(ug/L)'], 'r', 'g')
compare_plots(axes[1], med_df['Temperatura(C)'], med_df['Salinidad'], 'r', 'b')
compare_plots(axes[2], med_df['Salinidad'], med_df['Clorofila(ug/L)'], 'b', 'g')
fig.tight_layout()
Warm-up activities
A recap of the available series:
In [ ]:
temp_max.head()
In [ ]:
temp_min.head()
In [ ]:
humidity_max.head()
In [ ]:
# ... and likewise for the other series defined above
1. What's the maximum value of the maximum temperatures measured?
In [13]:
temp_max.max()
Out[13]:
39.5
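Series.max() returns only the value. Because the series is indexed by date, Series.idxmax() returns the index label where that maximum occurs, that is, the day it was recorded. A sketch on the first five Temperatura_Max values shown earlier (on the full series the result will be a different day):

```python
import pandas as pd

# First five daily maximum temperatures from met2016.csv (copied from the output above)
temp_max = pd.Series(
    [25.4, 31.3, 23.5, 20.7, 25.2],
    index=pd.date_range("2016-01-01", periods=5, freq="D", name="Fecha"),
    name="Temperatura_Max",
)

print(temp_max.max())     # 31.3
print(temp_max.idxmax())  # 2016-01-02 00:00:00
```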
2. What's the maximum value of the minimum temperatures measured?
In [14]:
temp_min.max()
Out[14]:
20.2
3. What's the lowest humidity ever measured?
In [19]:
humidity_min.min()
Out[19]:
4.0
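Reductions such as max() and min() skip missing values by default (skipna=True), so a few empty cells in a CSV do not turn the answer into NaN. Passing skipna=False changes that. A short illustration with values chosen for the example, not read from the dataset:

```python
import numpy as np
import pandas as pd

# Illustrative values only, not read from met2016.csv
humidity_min = pd.Series([41.1, np.nan, 17.0])

print(humidity_min.min())              # 17.0 (the NaN is ignored)
print(humidity_min.min(skipna=False))  # nan
```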
Activities
4. Create the series temp_range (daily maximum temperature minus daily minimum temperature)
In [20]:
temp_range = temp_max - temp_min
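temp_max - temp_min is a vectorized operation: pandas subtracts the two series element by element, matching rows by index label rather than by position. Both series come from the same DataFrame here, so every date lines up. The sketch below uses the first three rows shown earlier, plus a deliberately shifted series, to show what happens when labels do not match: dates present in only one series become NaN.

```python
import pandas as pd

dates = pd.date_range("2016-01-01", periods=3, freq="D")
temp_max = pd.Series([25.4, 31.3, 23.5], index=dates)
temp_min = pd.Series([16.8, 16.2, 8.2], index=dates)

# Same index: element-wise difference, row by row
temp_range = temp_max - temp_min
print(temp_range.round(1).tolist())  # [8.6, 15.1, 15.3]

# Index shifted by one day: only overlapping dates get a value, the rest are NaN
shifted_min = pd.Series([16.8, 16.2, 8.2], index=dates + pd.Timedelta(days=1))
print((temp_max - shifted_min).round(1).tolist())  # [nan, 14.5, 7.3, nan]
```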
5. Create the series humidity_range (daily maximum humidity minus daily minimum humidity)
In [22]:
humidity_range = humidity_max - humidity_min
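Because each range is a maximum minus a minimum, it should never be negative. A vectorized comparison produces a boolean series, .all() reduces it to a single sanity check, and boolean indexing lists any offending rows. A sketch on the first three rows shown earlier:

```python
import pandas as pd

dates = pd.date_range("2016-01-01", periods=3, freq="D")
humidity_max = pd.Series([99.0, 99.0, 85.4], index=dates)
humidity_min = pd.Series([41.1, 17.0, 48.0], index=dates)

humidity_range = humidity_max - humidity_min

# Boolean mask: one True/False per day
is_valid = humidity_range >= 0
print(is_valid.all())  # True

# Rows that would violate the expectation (none in this sample)
print(humidity_range[~is_valid])
```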
6. Create the series wind_speed_range (daily maximum wind speed minus daily minimum wind speed)
In [25]:
wind_speed_range = wind_speed_max - wind_speed_min
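The three range series follow the same _Max minus _Min pattern, so they can also be computed in one DataFrame operation. DataFrame arithmetic aligns on column labels, so in this sketch the _Min columns are renamed to match the _Max ones before subtracting. The two-row stand-in for weather_df and the _Range column names are assumptions for illustration:

```python
import pandas as pd

# Two-row stand-in for weather_df (values copied from the output above)
weather_df = pd.DataFrame(
    {
        "Temperatura_Max": [25.4, 31.3],
        "Humedad_Max": [99.0, 99.0],
        "Vel.Viento_Km/h_Max": [39.5, 27.6],
        "Temperatura_Min": [16.8, 16.2],
        "Humedad_Min": [41.1, 17.0],
        "Vel.Viento_Km/h_Min": [4.1, 0.0],
    },
    index=pd.to_datetime(["2016-01-01", "2016-01-02"]),
)

max_cols = ["Temperatura_Max", "Humedad_Max", "Vel.Viento_Km/h_Max"]
min_cols = [c.replace("_Max", "_Min") for c in max_cols]

# Rename the _Min columns so both frames share labels, then subtract column-wise
mins = weather_df[min_cols].rename(columns=dict(zip(min_cols, max_cols)))
ranges = weather_df[max_cols] - mins

# Hypothetical naming for the result columns
ranges.columns = [c.replace("_Max", "_Range") for c in max_cols]
print(ranges.round(1))
```

Without the rename, pandas would align the two column sets by label, find no matches, and return a frame full of NaN.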
7. Create the series chlorophyll_normalized (chlorophyll as a z-score: subtract the mean, divide by the standard deviation)
In [27]:
chlorophyll_normalized = (chlorophyll - chlorophyll.mean())/chlorophyll.std()
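chlorophyll_normalized is a z-score: after subtracting the mean and dividing by the standard deviation, the series has mean 0 and standard deviation 1. Note that Series.std() uses the sample standard deviation (ddof=1) by default, whereas numpy.std defaults to ddof=0. A quick check on the five chlorophyll readings shown earlier:

```python
import numpy as np
import pandas as pd

# First five Clorofila(ug/L) readings from mediciones2016.csv
chlorophyll = pd.Series([3.50, 3.34, 2.40, 5.94, 2.16], name="Clorofila(ug/L)")

chlorophyll_normalized = (chlorophyll - chlorophyll.mean()) / chlorophyll.std()

print(np.isclose(chlorophyll_normalized.mean(), 0.0))  # True
print(np.isclose(chlorophyll_normalized.std(), 1.0))   # True

# pandas (ddof=1) gives a larger value than NumPy's default (ddof=0)
print(chlorophyll.std() > np.std(chlorophyll.to_numpy()))  # True
```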
8. Create the series density_anomaly
In [29]:
density_anomaly = 1000 - (0.2 * sea_temperature) + (0.8 * salinity)
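In density_anomaly, the scalars 1000, 0.2 and 0.8 are broadcast across every row, so the expression is evaluated for each sample without a loop. The formula appears to be a simplified linear expression set for this exercise rather than a full seawater equation of state. Checked by hand on the first two readings shown earlier:

```python
import pandas as pd

# First two Temperatura(C) and Salinidad readings from mediciones2016.csv
sea_temperature = pd.Series([19.5, 15.9])
salinity = pd.Series([33.0, 30.9])

density_anomaly = 1000 - (0.2 * sea_temperature) + (0.8 * salinity)

# Row 0 by hand: 1000 - 0.2 * 19.5 + 0.8 * 33.0 = 1000 - 3.9 + 26.4 = 1022.5
print(density_anomaly.round(2).tolist())  # [1022.5, 1021.54]
```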
9. Create the series wind_chill_index (from the daily minimum temperature and the daily maximum wind speed)
In [32]:
wind_chill_index = 13.12 + 0.6215 * temp_min - 11.37 * (wind_speed_max ** 0.16) + 0.3965 * temp_min * (wind_speed_max ** 0.16)
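The wind_chill_index expression has the form of the metric wind chill index (temperature in °C, wind speed in km/h), which is usually considered meaningful only for air temperatures at or below 10 °C and wind speeds above 4.8 km/h. Wrapping it in a function shows that the same arithmetic works on plain floats and on whole series. The function name wind_chill is hypothetical, and the Series.where mask that blanks out days outside that range is an optional addition, not part of the original exercise:

```python
import pandas as pd


def wind_chill(temp_c, wind_kmh):
    """Hypothetical helper: the wind_chill_index formula from the exercise.

    Works on floats and on pandas Series alike, because every operator
    used here (+, -, *, **) is vectorized for Series.
    """
    v = wind_kmh ** 0.16
    return 13.12 + 0.6215 * temp_c - 11.37 * v + 0.3965 * temp_c * v


# Scalar call
print(round(wind_chill(10.0, 10.0), 2))  # 8.63

# Series call, using the first three Temperatura_Min / Vel.Viento_Km/h_Max values
temp_min = pd.Series([16.8, 16.2, 8.2])
wind_speed_max = pd.Series([39.5, 27.6, 38.4])
wci = wind_chill(temp_min, wind_speed_max)

# Optional: keep only days where the formula is usually considered valid
wci_valid = wci.where((temp_min <= 10) & (wind_speed_max > 4.8))
print(wci_valid.isna().tolist())  # [True, True, False]
```

On this summer sample only the third day passes the mask, which suggests many rows of the full series may fall outside the formula's usual range.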
10. Create the series ocean_nutrient_index (mean of phosphate, silicate and nitrite+nitrate)
In [34]:
ocean_nutrient_index = (phosphate + silicate + nitrite_nitrate) / 3
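(phosphate + silicate + nitrite_nitrate) / 3 is an element-wise mean of three series. An equivalent form selects the three columns from med_df and calls mean(axis=1). The two differ when a reading is missing: + propagates NaN, whereas mean(axis=1) skips it and averages the remaining values. Which behavior is wanted depends on the analysis; the values below are illustrative, not taken from the dataset:

```python
import numpy as np
import pandas as pd

# Illustrative nutrient readings (uM); the second row has a missing silicate value
med_df = pd.DataFrame(
    {
        "Fosfato(uM)": [0.3, 0.6],
        "Silicato(uM)": [1.2, np.nan],
        "Nitrito+Nitrato(uM)": [0.9, 1.8],
    }
)
cols = ["Fosfato(uM)", "Silicato(uM)", "Nitrito+Nitrato(uM)"]

by_sum = (med_df[cols[0]] + med_df[cols[1]] + med_df[cols[2]]) / 3
by_mean = med_df[cols].mean(axis=1)

print(by_sum.round(2).tolist())   # [0.8, nan]
print(by_mean.round(2).tolist())  # [0.8, 1.2]
```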
11. Create the series humidity_comfort_index
In [36]:
humidity_comfort_index = temp_avg - (0.55 - 0.0055 * humidity_avg) * (temp_avg - 14.5)
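The humidity_comfort_index expression matches the form of Thom's discomfort index, DI = T - (0.55 - 0.0055 RH)(T - 14.5), with T in °C and RH in %. One property that is easy to check: at T = 14.5 °C the correction term vanishes, so the index equals the temperature regardless of humidity. A sketch with hypothetical inputs:

```python
import pandas as pd

# Hypothetical inputs: two days at 14.5 °C with very different humidity, one warmer day
temp_avg = pd.Series([14.5, 14.5, 25.0])
humidity_avg = pd.Series([20.0, 95.0, 80.2])

humidity_comfort_index = temp_avg - (0.55 - 0.0055 * humidity_avg) * (temp_avg - 14.5)

print(humidity_comfort_index.round(2).tolist())  # [14.5, 14.5, 23.86]
```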
12. Create the series storm_potential_index
In [38]:
storm_potential_index = (wind_speed_max * humidity_max * temp_max) / 1000
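The product of three aligned series can also be written as a row-wise prod over the corresponding columns; both forms are vectorized and give the same values. A sketch on the first two weather rows shown earlier:

```python
import pandas as pd

# First two rows of the relevant weather_df columns (values copied from the output above)
weather_df = pd.DataFrame(
    {
        "Vel.Viento_Km/h_Max": [39.5, 27.6],
        "Humedad_Max": [99.0, 99.0],
        "Temperatura_Max": [25.4, 31.3],
    }
)

storm_potential_index = (
    weather_df["Vel.Viento_Km/h_Max"] * weather_df["Humedad_Max"] * weather_df["Temperatura_Max"]
) / 1000

# Same result as a row-wise product over the three columns
alt = weather_df[["Vel.Viento_Km/h_Max", "Humedad_Max", "Temperatura_Max"]].prod(axis=1) / 1000

print(storm_potential_index.round(2).tolist())  # [99.33, 85.52]
```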
13. Create the series comfort_temp_diff_ratio (humidity_comfort_index divided by temp_range)
In [40]:
comfort_temp_diff_ratio = humidity_comfort_index / temp_range
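comfort_temp_diff_ratio divides by temp_range. With floating-point series, pandas does not raise on division by zero: a nonzero value divided by 0 gives inf (or -inf), and 0 divided by 0 gives NaN. If a day ever had identical maximum and minimum temperatures, its ratio would be infinite; replacing infinities with NaN keeps later reductions such as mean() meaningful. Illustrative values only:

```python
import numpy as np
import pandas as pd

# Illustrative values only; the last day has a zero temperature range
humidity_comfort_index = pd.Series([18.0, 20.0, 15.0])
temp_range = pd.Series([8.0, 10.0, 0.0])

comfort_temp_diff_ratio = humidity_comfort_index / temp_range
print(comfort_temp_diff_ratio.tolist())  # [2.25, 2.0, inf]

# Treat infinite ratios as missing so they don't dominate summaries
cleaned = comfort_temp_diff_ratio.replace([np.inf, -np.inf], np.nan)
print(cleaned.mean())  # 2.125
```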
14. Create the series storm_wind_correlation
In [42]:
storm_wind_correlation = storm_potential_index * wind_speed_range
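Despite its name, storm_wind_correlation is an element-wise product, one value per day, not a statistical correlation. If a single Pearson correlation coefficient between two series is what's wanted, Series.corr computes it, aligning on the index and excluding missing pairs. A sketch with illustrative values:

```python
import pandas as pd

# Illustrative values only
storm_potential_index = pd.Series([99.3, 85.5, 60.2, 70.1])
wind_speed_range = pd.Series([35.4, 27.6, 20.0, 30.2])

# Element-wise product, as in the exercise: one value per row
storm_wind_correlation = storm_potential_index * wind_speed_range
print(len(storm_wind_correlation))  # 4

# Pearson correlation coefficient: a single number between -1 and 1
r = storm_potential_index.corr(wind_speed_range)
print(-1.0 <= r <= 1.0)  # True
```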