To compare annual average temperature records from Portugal against corresponding world annual average temperature records between 1750 and 2013, we take the following steps:
1. Download regional and global temperature trends from http://berkeleyearth.lbl.gov using the command line (see Bash script in annex);
2. Load, prepare, and visualize the trends using Python (see script in annex):
2.1. Open and clean the CSV tables with pandas;
2.2. Produce a line chart showing a moving average of the temperatures with plotnine [1];
3. Display the results in R Markdown, following the tufte layout.
[1] plotnine is based on Matplotlib and implements ggplot2's grammar of graphics, which is well suited to exploratory analyses. plotnine's moving average smoothing method is based on the pandas rolling() method.
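As a rough illustration of what the moving average in step 2.2 computes, the following standard-library sketch averages each value with the ones immediately before it. The function name `trailing_moving_average` and the sample values are made up for this example, and the trailing (non-centred) window is an assumption that matches the default behaviour of pandas `rolling()`; it is not the plotnine implementation itself.

```python
from collections import deque


def trailing_moving_average(values, window):
    """Return the mean of the latest `window` values at each position.

    Positions that do not yet have `window` values yield None, similar to
    the NaN that pandas' Series.rolling(window).mean() produces there.
    """
    if window < 1:
        raise ValueError("window must be a positive integer")
    recent = deque(maxlen=window)  # keeps only the latest `window` values
    averages = []
    for value in values:
        recent.append(value)
        if len(recent) < window:
            averages.append(None)
        else:
            averages.append(sum(recent) / window)
    return averages


# Made-up annual temperatures, for illustration only (not Berkeley Earth data)
sample = [14.0, 15.0, 16.0, 15.0, 14.0]
smoothed = trailing_moving_average(sample, window=3)
print(smoothed)
```

A wider window (the annex script uses 10) gives a smoother line but leaves more positions at the start of the series without a value.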
Figure 1 supports several observations:
Between 1750 and 2013, Portugal's annual average temperature has been consistently about 7 ºC higher than the world average.
Year-over-year temperatures appear to oscillate more at a regional level. The ruggedness of the moving average and the dispersion of the data points both seem higher for Portugal. By contrast, world average temperatures appear more consistent from one year to the next, possibly because averaging over many regions smooths out local fluctuations.
Both the regional and the global series appear to show a warming trend over the 20th century.
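One possible way to put numbers on these three observations is sketched below, using only the Python standard library. The helper names (`mean_gap`, `year_over_year_spread`, `linear_trend`), the dictionary layout (year mapped to annual average), and the sample values are all assumptions made for illustration; they are not part of the annex scripts, and the numbers are not Berkeley Earth data.

```python
from statistics import mean, stdev


def mean_gap(series_a, series_b):
    """Average of (a - b) over the years present in both series."""
    common_years = sorted(set(series_a) & set(series_b))
    return mean(series_a[year] - series_b[year] for year in common_years)


def year_over_year_spread(series):
    """Standard deviation of the change between consecutive years."""
    years = sorted(series)
    changes = [
        series[later] - series[earlier]
        for earlier, later in zip(years, years[1:])
        if later - earlier == 1  # skip gaps left by missing years
    ]
    return stdev(changes)


def linear_trend(series):
    """Least-squares slope of temperature against year (degrees per year)."""
    years = sorted(series)
    year_mean = mean(years)
    temp_mean = mean(series[year] for year in years)
    numerator = sum((year - year_mean) * (series[year] - temp_mean) for year in years)
    denominator = sum((year - year_mean) ** 2 for year in years)
    return numerator / denominator


# Made-up values, for illustration only
portugal = {2000: 15.0, 2001: 16.0, 2002: 15.0, 2003: 16.0}
world = {2000: 8.0, 2001: 8.5, 2002: 9.0, 2003: 9.5}

print(mean_gap(portugal, world))
print(year_over_year_spread(portugal), year_over_year_spread(world))
print(linear_trend(portugal), linear_trend(world))
```

Applied to the annual averages produced by the annex script, the first value would correspond to the temperature gap, the second pair to the regional versus global variability, and the third pair to the warming trend when restricted to 20th-century years.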
#!/bin/bash
echo 'parsing monthly average temperatures for Portugal...';
curl -s http://berkeleyearth.lbl.gov/auto/Regional/TAVG/Text/portugal-TAVG-Trend.txt \
| grep -E -A 2 "^% Estimated Jan.*monthly" \
| tail -n 1 | tr -d "%" | tr -s '[:blank:]' \
| cut -c 2- \
| tr ' ' '\n' \
> portugal_monthly_avg.csv;
echo 'parsing monthly historic temperature anomalies for Portugal...';
curl -s http://berkeleyearth.lbl.gov/auto/Regional/TAVG/Text/portugal-TAVG-Trend.txt \
| grep -E -v "^%|^( )?$" \
| tr -s '[:blank:]' \
| cut -c 2- \
| cut -d ' ' -f 1,2,3 \
| tr ' ' ',' \
> portugal_monthly_anom.csv;
echo 'parsing monthly average temperatures worldwide...';
curl -s http://berkeleyearth.lbl.gov/auto/Regional/TAVG/Text/global-land-TAVG-Trend.txt \
| grep -E -A 2 "^% Estimated Jan.*monthly" \
| tail -n 1 | tr -d "%" | tr -s '[:blank:]' \
| cut -c 2- \
| tr ' ' '\n' \
> world_monthly_avg.csv;
echo 'parsing monthly historic temperature anomalies worldwide...';
curl -s http://berkeleyearth.lbl.gov/auto/Regional/TAVG/Text/global-land-TAVG-Trend.txt \
| grep -E -v "^%|^( )?$" \
| tr -s '[:blank:]' \
| cut -c 2- \
| cut -d ' ' -f 1,2,3 \
| tr ' ' ',' \
> world_monthly_anom.csv;
import numpy as np
import pandas as pd
from plotnine import (aes, geom_point, geom_smooth, ggplot, labs,
                      theme, theme_minimal)
# load data files
reference_pt = pd.read_csv('../data/portugal_monthly_avg.csv',
                           header=None,
                           names=['reference'])
temperature_pt = pd.read_csv('../data/portugal_monthly_anom.csv',
                             header=None,
                             names=['year', 'month', 'delta'])
reference_world = pd.read_csv('../data/world_monthly_avg.csv',
                              header=None,
                              names=['reference'])
temperature_world = pd.read_csv('../data/world_monthly_anom.csv',
                                header=None,
                                names=['year', 'month', 'delta'])
# the reference files hold one value per calendar month, January to December
reference_pt['month'] = range(1, 13)
reference_world['month'] = range(1, 13)
# --- calculate temperatures from reference and delta temperature files
def calculate_temperatures(ref_temp, delta_temp, loc):
    assert 'delta' in delta_temp.columns, 'delta_temp lacks a delta column'
    assert 'reference' in ref_temp.columns, 'ref_temp lacks a reference column'
    # attach each month's reference temperature to the monthly anomalies
    temp = ref_temp.merge(delta_temp, on='month')
    temp['absolute'] = temp.reference + temp.delta
    # remove years with missing measurements
    # (count() ignores NaN, so only years with 12 valid months are kept)
    temp['count'] = temp.groupby('year')['absolute'].transform('count')
    temp = temp.query('count == 12')
    # average annual temperature
    temp = temp.groupby('year')[['absolute']].mean()
    temp['year'] = temp.index
    assert temp.year.duplicated().sum() == 0, 'remove duplicate years'
    temp['location'] = np.repeat(loc, len(temp))
    return temp
temperature_pt = calculate_temperatures(reference_pt, temperature_pt, 'Portugal')
temperature_world = calculate_temperatures(reference_world, temperature_world, 'World')
# --- combine data frames
temperatures = pd.concat([temperature_pt, temperature_world], ignore_index = True)
temperatures = temperatures.dropna()
# ------- line charts with moving averages
p = (
    ggplot(temperatures, aes(x='year', y='absolute', color='location'))
    + geom_point(alpha=.4, size=2)
    # moving average over a 10-observation window, without a confidence band
    + geom_smooth(method='mavg', method_args={'window': 10}, se=False)
    + labs(x='Year', y='Average Temperature (ºC)')
    + theme_minimal()
    + theme(legend_position=(0.75, 0.25))
)