Interpolating Temperature Data

AoT Workshop, Spatial Analysis, Part 2. August 30th, 2018. Argonne National Laboratory

This tutorial is brought to you by Anais Ladoy, Isaac Kamber, Marynia Kolak, Julia Koschinsky, and Luc Anselin at the Center for Spatial Data Science at the University of Chicago (see spatial.uchicago.edu for more info).


Environment Setup

Spatial analysis in R requires multiple libraries. Package installation is done with the following syntax: install.packages(“sp”). Some of these take additional time for installation, depending on your system. The following list is comprehensive for this tutorial, as well as much spatial analysis.

library(lubridate) #data wrangling
library(sp) #spatial data wrangling & analysis
library(rgdal) #spatial data wrangling
library(rgeos) #spatial data wrangling
library(raster) #spatial raster data wrangling
library(gstat) #kriging and geostatistics
library(tmap) #modern data visualizations
library(leaflet) #modern data visualizations

Import Data

nodes <- read.csv("nodes.csv")
head(nodes)
##        node_id  project_id vsn                               address
## 1 001e0610ba46 AoT_Chicago 004    State St & Jackson Blvd Chicago IL
## 2 001e0610ba3b AoT_Chicago 006    18th St & Lake Shore Dr Chicago IL
## 3 001e0610ba8f AoT_Chicago 00D          Cornell & 47th St Chicago IL
## 4 001e0610ba16 AoT_Chicago 010        Ohio St & Grand Ave Chicago IL
## 5 001e0610ba8b AoT_Chicago 018 Stony Island Ave & 63rd St Chicago IL
## 6 001e0610ba18 AoT_Chicago 01D         Damen Ave & Cermak Chicago IL
##        lat       lon         description     start_timestamp
## 1 41.87838 -87.62768 AoT Chicago (S) [C] 2017/10/09 00:00:00
## 2 41.85814 -87.61606     AoT Chicago (S) 2017/08/08 00:00:00
## 3 41.81034 -87.59023     AoT Chicago (S) 2017/08/08 00:00:00
## 4 41.89196 -87.61160 AoT Chicago (S) [C] 2017/12/01 00:00:00
## 5 41.78060 -87.58646 AoT Chicago (S) [C] 2018/02/26 00:00:00
## 6 41.85218 -87.67583     AoT Chicago (S) 2017/12/15 00:00:00
##         end_timestamp
## 1                    
## 2                    
## 3                    
## 4 2018/06/04 00:00:00
## 5                    
## 6
sensor.info <- read.csv("sensors.csv")
head(sensor.info)
##                           ontology subsystem sensor   parameter hrf_unit
## 1    /sensing/meteorology/humidity  metsense htu21d    humidity       RH
## 2    /sensing/meteorology/pressure  metsense bmp180    pressure      hPa
## 3 /sensing/meteorology/temperature  metsense bmp180 temperature        C
## 4 /sensing/meteorology/temperature  metsense htu21d temperature        C
## 5 /sensing/meteorology/temperature  metsense tsys01 temperature        C
##   hrf_minval hrf_maxval
## 1          0        100
## 2        300       1100
## 3        -40         85
## 4        -40        125
## 5        -40        125
##                                                                          datasheet
## 1 https://github.com/waggle-sensor/sensors/blob/master/sensors/airsense/htu21d.pdf
## 2 https://github.com/waggle-sensor/sensors/blob/master/sensors/airsense/bmp180.pdf
## 3 https://github.com/waggle-sensor/sensors/blob/master/sensors/airsense/bmp180.pdf
## 4 https://github.com/waggle-sensor/sensors/blob/master/sensors/airsense/htu21d.pdf
## 5 https://github.com/waggle-sensor/sensors/blob/master/sensors/airsense/tsys01.pdf
sensor.data <- read.csv("data.csv.gz")
head(sensor.data)
##             timestamp      node_id subsystem sensor   parameter value_raw
## 1 2018/08/25 00:00:00 001e061135cb  metsense bmp180    pressure   9909900
## 2 2018/08/25 00:00:00 001e061135cb  metsense bmp180 temperature        NA
## 3 2018/08/25 00:00:00 001e061135cb  metsense htu21d    humidity        NA
## 4 2018/08/25 00:00:00 001e061135cb  metsense htu21d temperature        NA
## 5 2018/08/25 00:00:00 001e061135cb  metsense tsys01 temperature        NA
## 6 2018/08/25 00:00:02 001e0610bbf9  metsense bmp180    pressure  11016512
##   value_hrf
## 1    990.99
## 2     19.40
## 3     79.37
## 4     19.42
## 5     20.04
## 6   1015.59

Data Wrangling

Filter to Temperature Sensor of Interest

Isolate data for desired temperature sensor (tsys01 used for this). Get a summary of the data generated to confirm.

temp.data <- sensor.data %>%
  filter(sensor == "tsys01")

summary(temp.data)
##                timestamp              node_id         subsystem     
##  2018/08/25 00:00:41:     7   001e0610eef4: 3586   metsense:104038  
##  2018/08/25 00:09:53:     7   001e0610ba8f: 3582                    
##  2018/08/25 01:44:42:     7   001e0610ef27: 3581                    
##  2018/08/25 02:24:41:     7   001e0610fb4c: 3551                    
##  2018/08/25 04:13:54:     7   001e06113ad8: 3437                    
##  2018/08/25 16:52:21:     7   001e061135cb: 3302                    
##  (Other)            :103996   (Other)     :82999                    
##     sensor             parameter        value_raw          value_hrf     
##  bmp180:     0   humidity   :     0   Min.   : 9597902   Min.   :-13.57  
##  htu21d:     0   pressure   :     0   1st Qu.: 9659148   1st Qu.: 20.04  
##  tsys01:104038   temperature:104038   Median : 9707338   Median : 21.77  
##                                       Mean   : 9754247   Mean   : 22.74  
##                                       3rd Qu.: 9821416   3rd Qu.: 25.86  
##                                       Max.   :10124700   Max.   : 35.54  
##                                       NA's   :40194

Create Aggregate Temperature Variable

The timestamp holds date and time information. We can use the ymd_hms function from the lubridate package to convert the timestamp from a factor into a more searcheable data structure.

library(lubridate)
temp.data$timestamp2 <- ymd_hms(temp.data$timestamp)

When viewing the structure of the timestamp, we see it has thousands of measurements throughout the day. Generate an average of temperature for the afternoon, or second half of the day, by filtering for hours after 12pm. Then, group by node, and calculate the average. More sophisticaed analysis will filter for more precise temporal windows, and incorporate min and max temperatures for air quality analysis.

pm.temps <- temp.data %>% 
  filter(hour(timestamp2) >= 12) %>%
  group_by(node_id) %>%
  summarize(avg_temp = mean(value_hrf))

Next, add a Fahrenheit conversion to facilitate interpretation.

pm.temps$avg_tempF <- pm.temps$avg_temp*1.8+32

Examine data and remove a clearly faulty sensor (7.88 degrees celsius avg or 66.60 degrees Fahrenheit, accordingly.)

pm.temps$avg_temp
##  [1] 26.012653 25.879479 28.302872 25.967144 27.939340 25.517123 27.497169
##  [8] 29.305905 25.850505 27.028029  7.879993 28.152182 28.858535 25.913236
## [15] 28.164899 26.645538 27.566751 27.299922 29.175229 26.378803 26.941640
## [22] 26.222689 26.964437 25.908585 26.734579 28.866217 26.185707 26.065050
## [29] 25.545073 27.256887 25.892882 27.382421

In this step, we filter the data to take out the faulty sensor. In future work, this step will use a historical distribution of meteorological data (adjusted for seasonality) to more precisely tease out potentially faulty sensors. A sensor would be flagged for removal if it fell out of that range; the removal could be automatic or semi-automatic to facilitate additional guidance.

pm.temps <- pm.temps %>%
  filter(avg_temp > 15)

Next, we attach the hourly avg temperature to node info.

node.temps <- merge(pm.temps, nodes, by = c("node_id"))

Convert to Spatial Data Format

Then, convert the completed node data to spatial object format for plotting and more advanced spatial analytics.

coordinates(node.temps) <- node.temps[,c("lon", "lat")]
proj4string(node.temps) <- CRS("+init=epsg:4326")

Finalize Data Inspection

There are 31 sensors with temperature data in the array.

length(node.temps)
## [1] 31
head(node.temps)
##        node_id avg_temp avg_tempF  project_id  vsn
## 1 001e06109416 26.01265  78.82278 AoT_Chicago 090B
## 2 001e0610b9e9 25.87948  78.58306 AoT_Chicago  080
## 3 001e0610ba8f 28.30287  82.94517 AoT_Chicago  00D
## 4 001e0610bbf9 25.96714  78.74086 AoT_Chicago  020
## 5 001e0610bbff 27.93934  82.29081 AoT_Chicago  025
## 6 001e0610bc10 25.51712  77.93082 AoT_Chicago  01F
##                                  address      lat       lon
## 1         Elston and Cortland Chicago IL 41.91659 -87.66641
## 2 Broadway Ave & Lawrence Ave Chicago IL 41.96904 -87.65967
## 3           Cornell & 47th St Chicago IL 41.81034 -87.59023
## 4       Western Ave & 69th St Chicago IL 41.76832 -87.68340
## 5       Western Ave & 18th St Chicago IL 41.85780 -87.68581
## 6             State St & 87th Chicago IL 41.73631 -87.62418
##           description     start_timestamp end_timestamp
## 1 AoT Chicago (S) [C] 2018/01/01 00:00:00              
## 2 AoT Chicago (S) [C] 2018/02/14 00:00:00              
## 3     AoT Chicago (S) 2017/08/08 00:00:00              
## 4 AoT Chicago (S) [C] 2018/02/13 00:00:00              
## 5     AoT Chicago (S) 2017/12/15 00:00:00              
## 6 AoT Chicago (S) [C] 2018/02/22 00:00:00

Plot Temperature Data by Sensor

Confirm the success of spatial object transformation by simple plotting.

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(node.temps) + tm_dots()

Add a new column equal to the average temp variable, renaming it “aveC” corresopnding to the Celsius unit. This step is used for troubleshooting plotting issues in the dot map to follow.

node.temps$aveC<-as.numeric(node.temps$avg_temp)
node.temps$aveF<-as.numeric(node.temps$avg_tempF)

Overlay Commmunity Areas

We add the Chicago Community Area spatial dataset (used in Part 1) to provide additional context for our maps.

chiCA <- readOGR(".","ChiComArea")
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "ChiComArea"
## with 77 features
## It has 4 fields
## Integer64 fields read as strings:  ComAreaID

Plot the temperature by sensor, adding Community Areas as a background.

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(chiCA) + tm_borders() + 
  tm_shape(node.temps) + tm_dots(col="aveF",size=0.3,title="average temp (F)") 

Interpolate a Temperature Surface

We will use variograms to model the distribution of temperature data. We’ll then generate a grid on top of the Chicago area, and with the appropriate variogram model selected, use kriging to predict a temperature surface.

A variogram is a function that describes the degree of spatial dependence and contuinity across data. The final model uses the measure of variability between points at various distances. Points nearby are likely to have more similar values, and as distance between points increases, there is less likely to be similar values between points. In this application, we assume that temperature measurements that are further apart will vary more that measurements taken close together.

The variogram clearly has an outlier node, though it may not influence our final predicted surface.

tmp.vgm <- variogram(node.temps$avg_temp ~ 1, node.temps)
plot(tmp.vgm)

Spherical Model

We will generate two theoretical models to best approximate the experimental data. First, a theoretical semivariogram uses a spherical model fit.

tmp.fit.sph<- fit.variogram(tmp.vgm, model=vgm("Sph"))
plot(tmp.vgm, tmp.fit.sph)

Generate Grid

Next, we’ll create a grid from the Chicago area. The following function will generate a grid from a provided spatial data frame for n cells.

pt2grid <- function(ptframe,n) {
  bb <- bbox(ptframe)  
  ptcrs <- proj4string(ptframe)  
  xrange <- abs(bb[1,1] - bb[1,2])  
  yrange <- abs(bb[2,1] - bb[2,2])  
  cs <- c(xrange/n,yrange/n)  
  cc <- bb[,1] + (cs/2)  
  dc <- c(n,n)  
  x1 <- GridTopology(cellcentre.offset=cc,cellsize=cs,cells.dim=dc)  
  x2 <- SpatialGrid(grid=x1,proj4string=CRS(ptcrs))
  return(x2)
}

First, let’s generate a grid of 30 by 30 cells using the Chicago Community Area extent. Plot the grid for exploration.

chi.grid <- pt2grid((chiCA),30)
plot(chi.grid)

To get an even finer resolution, we generate a finer resolution of grid of 100 by 100 cells.

chi.grid <- pt2grid((chiCA),100)
plot(chi.grid)

Prepare Data for Kriging

First, we make sure that all our data is in the same projection.

projection(chi.grid) <- CRS("+init=epsg:4326")  
projection(node.temps) <-  CRS("+init=epsg:4326")
projection(chiCA) <- CRS("+init=epsg:4326")

Krige the data using the spherical model fit, and plot.

temp.kriged <- krige(node.temps$avg_temp ~ 1, node.temps, chi.grid, model = tmp.fit.sph)
## [using ordinary kriging]
plot(temp.kriged)

Clip to Chicago boundaries.

chi.temp.kriged <- temp.kriged[chiCA,]
plot(chi.temp.kriged)

Plot the kriged Chicago-area suface.

tm_shape(chi.temp.kriged) +
  tm_raster("var1.pred", style = "jenks", title = "Temperature (F)", palette = "BuPu") +
  tm_layout(main.title = "Avg Afternoon Temperature August 25", main.title.size = 1.1) +
  tm_legend(position = c("left", "bottom"))

Exponential Model and Kriging Surface

Next, we fit an Expoentntial model for the semivariogram.

tmp.fit.exp<- fit.variogram(tmp.vgm, model=vgm("Exp"))
plot(tmp.vgm, tmp.fit.exp)

We use the same process as above to generate a kriged surface. Compare the two models and predicted temperature surfaces. As more data are made available with the AoT sensors, the uncertainty across the temperature surface will be further reduced.

temp.kriged <- krige(node.temps$avg_temp ~ 1, node.temps, chi.grid, model = tmp.fit.exp)
## [using ordinary kriging]
chi.temp.kriged <- temp.kriged[chiCA,]
tm_shape(chi.temp.kriged) +
  tm_raster("var1.pred", style = "jenks", title = "Temperature C)", palette = "BuPu") +
  tm_layout(main.title = "Avg Afternoon Temperature August 25", main.title.size = 1.1) +
  tm_legend(position = c("left", "bottom"))

To finalize the map, we add the sensors for reference.

tm_shape(chi.temp.kriged) +
  tm_raster("var1.pred", style = "jenks", title = "Temperature (C)", palette = "BuPu") + tm_shape(node.temps) + tm_dots(size=0.01) +
  tm_layout(main.title = "Avg Afternoon Temperature August 25", main.title.size = 1.1) +
  tm_legend(position = c("left", "bottom"))