AoT Workshop, Spatial Analysis, Part 2. August 30th, 2018. Argonne National Laboratory
This tutorial is brought to you by Anais Ladoy, Isaac Kamber, Marynia Kolak, Julia Koschinsky, and Luc Anselin at the Center for Spatial Data Science at the University of Chicago (see spatial.uchicago.edu for more info).
Spatial analysis in R requires multiple libraries. Package installation is done with the following syntax: install.packages(“sp”). Some of these take additional time for installation, depending on your system. The following list is comprehensive for this tutorial, as well as much spatial analysis.
library(lubridate) #data wrangling
library(sp) #spatial data wrangling & analysis
library(rgdal) #spatial data wrangling
library(rgeos) #spatial data wrangling
library(raster) #spatial raster data wrangling
library(gstat) #kriging and geostatistics
library(tmap) #modern data visualizations
library(leaflet) #modern data visualizations
nodes <- read.csv("nodes.csv")
head(nodes)
## node_id project_id vsn address
## 1 001e0610ba46 AoT_Chicago 004 State St & Jackson Blvd Chicago IL
## 2 001e0610ba3b AoT_Chicago 006 18th St & Lake Shore Dr Chicago IL
## 3 001e0610ba8f AoT_Chicago 00D Cornell & 47th St Chicago IL
## 4 001e0610ba16 AoT_Chicago 010 Ohio St & Grand Ave Chicago IL
## 5 001e0610ba8b AoT_Chicago 018 Stony Island Ave & 63rd St Chicago IL
## 6 001e0610ba18 AoT_Chicago 01D Damen Ave & Cermak Chicago IL
## lat lon description start_timestamp
## 1 41.87838 -87.62768 AoT Chicago (S) [C] 2017/10/09 00:00:00
## 2 41.85814 -87.61606 AoT Chicago (S) 2017/08/08 00:00:00
## 3 41.81034 -87.59023 AoT Chicago (S) 2017/08/08 00:00:00
## 4 41.89196 -87.61160 AoT Chicago (S) [C] 2017/12/01 00:00:00
## 5 41.78060 -87.58646 AoT Chicago (S) [C] 2018/02/26 00:00:00
## 6 41.85218 -87.67583 AoT Chicago (S) 2017/12/15 00:00:00
## end_timestamp
## 1
## 2
## 3
## 4 2018/06/04 00:00:00
## 5
## 6
sensor.info <- read.csv("sensors.csv")
head(sensor.info)
## ontology subsystem sensor parameter hrf_unit
## 1 /sensing/meteorology/humidity metsense htu21d humidity RH
## 2 /sensing/meteorology/pressure metsense bmp180 pressure hPa
## 3 /sensing/meteorology/temperature metsense bmp180 temperature C
## 4 /sensing/meteorology/temperature metsense htu21d temperature C
## 5 /sensing/meteorology/temperature metsense tsys01 temperature C
## hrf_minval hrf_maxval
## 1 0 100
## 2 300 1100
## 3 -40 85
## 4 -40 125
## 5 -40 125
## datasheet
## 1 https://github.com/waggle-sensor/sensors/blob/master/sensors/airsense/htu21d.pdf
## 2 https://github.com/waggle-sensor/sensors/blob/master/sensors/airsense/bmp180.pdf
## 3 https://github.com/waggle-sensor/sensors/blob/master/sensors/airsense/bmp180.pdf
## 4 https://github.com/waggle-sensor/sensors/blob/master/sensors/airsense/htu21d.pdf
## 5 https://github.com/waggle-sensor/sensors/blob/master/sensors/airsense/tsys01.pdf
sensor.data <- read.csv("data.csv.gz")
head(sensor.data)
## timestamp node_id subsystem sensor parameter value_raw
## 1 2018/08/25 00:00:00 001e061135cb metsense bmp180 pressure 9909900
## 2 2018/08/25 00:00:00 001e061135cb metsense bmp180 temperature NA
## 3 2018/08/25 00:00:00 001e061135cb metsense htu21d humidity NA
## 4 2018/08/25 00:00:00 001e061135cb metsense htu21d temperature NA
## 5 2018/08/25 00:00:00 001e061135cb metsense tsys01 temperature NA
## 6 2018/08/25 00:00:02 001e0610bbf9 metsense bmp180 pressure 11016512
## value_hrf
## 1 990.99
## 2 19.40
## 3 79.37
## 4 19.42
## 5 20.04
## 6 1015.59
Isolate data for desired temperature sensor (tsys01 used for this). Get a summary of the data generated to confirm.
temp.data <- sensor.data %>%
filter(sensor == "tsys01")
summary(temp.data)
## timestamp node_id subsystem
## 2018/08/25 00:00:41: 7 001e0610eef4: 3586 metsense:104038
## 2018/08/25 00:09:53: 7 001e0610ba8f: 3582
## 2018/08/25 01:44:42: 7 001e0610ef27: 3581
## 2018/08/25 02:24:41: 7 001e0610fb4c: 3551
## 2018/08/25 04:13:54: 7 001e06113ad8: 3437
## 2018/08/25 16:52:21: 7 001e061135cb: 3302
## (Other) :103996 (Other) :82999
## sensor parameter value_raw value_hrf
## bmp180: 0 humidity : 0 Min. : 9597902 Min. :-13.57
## htu21d: 0 pressure : 0 1st Qu.: 9659148 1st Qu.: 20.04
## tsys01:104038 temperature:104038 Median : 9707338 Median : 21.77
## Mean : 9754247 Mean : 22.74
## 3rd Qu.: 9821416 3rd Qu.: 25.86
## Max. :10124700 Max. : 35.54
## NA's :40194
The timestamp holds date and time information. We can use the ymd_hms function from the lubridate package to convert the timestamp from a factor into a more searcheable data structure.
library(lubridate)
temp.data$timestamp2 <- ymd_hms(temp.data$timestamp)
When viewing the structure of the timestamp, we see it has thousands of measurements throughout the day. Generate an average of temperature for the afternoon, or second half of the day, by filtering for hours after 12pm. Then, group by node, and calculate the average. More sophisticaed analysis will filter for more precise temporal windows, and incorporate min and max temperatures for air quality analysis.
pm.temps <- temp.data %>%
filter(hour(timestamp2) >= 12) %>%
group_by(node_id) %>%
summarize(avg_temp = mean(value_hrf))
Next, add a Fahrenheit conversion to facilitate interpretation.
pm.temps$avg_tempF <- pm.temps$avg_temp*1.8+32
Examine data and remove a clearly faulty sensor (7.88 degrees celsius avg or 66.60 degrees Fahrenheit, accordingly.)
pm.temps$avg_temp
## [1] 26.012653 25.879479 28.302872 25.967144 27.939340 25.517123 27.497169
## [8] 29.305905 25.850505 27.028029 7.879993 28.152182 28.858535 25.913236
## [15] 28.164899 26.645538 27.566751 27.299922 29.175229 26.378803 26.941640
## [22] 26.222689 26.964437 25.908585 26.734579 28.866217 26.185707 26.065050
## [29] 25.545073 27.256887 25.892882 27.382421
In this step, we filter the data to take out the faulty sensor. In future work, this step will use a historical distribution of meteorological data (adjusted for seasonality) to more precisely tease out potentially faulty sensors. A sensor would be flagged for removal if it fell out of that range; the removal could be automatic or semi-automatic to facilitate additional guidance.
pm.temps <- pm.temps %>%
filter(avg_temp > 15)
Next, we attach the hourly avg temperature to node info.
node.temps <- merge(pm.temps, nodes, by = c("node_id"))
Then, convert the completed node data to spatial object format for plotting and more advanced spatial analytics.
coordinates(node.temps) <- node.temps[,c("lon", "lat")]
proj4string(node.temps) <- CRS("+init=epsg:4326")
There are 31 sensors with temperature data in the array.
length(node.temps)
## [1] 31
head(node.temps)
## node_id avg_temp avg_tempF project_id vsn
## 1 001e06109416 26.01265 78.82278 AoT_Chicago 090B
## 2 001e0610b9e9 25.87948 78.58306 AoT_Chicago 080
## 3 001e0610ba8f 28.30287 82.94517 AoT_Chicago 00D
## 4 001e0610bbf9 25.96714 78.74086 AoT_Chicago 020
## 5 001e0610bbff 27.93934 82.29081 AoT_Chicago 025
## 6 001e0610bc10 25.51712 77.93082 AoT_Chicago 01F
## address lat lon
## 1 Elston and Cortland Chicago IL 41.91659 -87.66641
## 2 Broadway Ave & Lawrence Ave Chicago IL 41.96904 -87.65967
## 3 Cornell & 47th St Chicago IL 41.81034 -87.59023
## 4 Western Ave & 69th St Chicago IL 41.76832 -87.68340
## 5 Western Ave & 18th St Chicago IL 41.85780 -87.68581
## 6 State St & 87th Chicago IL 41.73631 -87.62418
## description start_timestamp end_timestamp
## 1 AoT Chicago (S) [C] 2018/01/01 00:00:00
## 2 AoT Chicago (S) [C] 2018/02/14 00:00:00
## 3 AoT Chicago (S) 2017/08/08 00:00:00
## 4 AoT Chicago (S) [C] 2018/02/13 00:00:00
## 5 AoT Chicago (S) 2017/12/15 00:00:00
## 6 AoT Chicago (S) [C] 2018/02/22 00:00:00
Confirm the success of spatial object transformation by simple plotting.
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(node.temps) + tm_dots()
Add a new column equal to the average temp variable, renaming it “aveC” corresopnding to the Celsius unit. This step is used for troubleshooting plotting issues in the dot map to follow.
node.temps$aveC<-as.numeric(node.temps$avg_temp)
node.temps$aveF<-as.numeric(node.temps$avg_tempF)
We add the Chicago Community Area spatial dataset (used in Part 1) to provide additional context for our maps.
chiCA <- readOGR(".","ChiComArea")
## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "ChiComArea"
## with 77 features
## It has 4 fields
## Integer64 fields read as strings: ComAreaID
Plot the temperature by sensor, adding Community Areas as a background.
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(chiCA) + tm_borders() +
tm_shape(node.temps) + tm_dots(col="aveF",size=0.3,title="average temp (F)")
We will use variograms to model the distribution of temperature data. We’ll then generate a grid on top of the Chicago area, and with the appropriate variogram model selected, use kriging to predict a temperature surface.
A variogram is a function that describes the degree of spatial dependence and contuinity across data. The final model uses the measure of variability between points at various distances. Points nearby are likely to have more similar values, and as distance between points increases, there is less likely to be similar values between points. In this application, we assume that temperature measurements that are further apart will vary more that measurements taken close together.
The variogram clearly has an outlier node, though it may not influence our final predicted surface.
tmp.vgm <- variogram(node.temps$avg_temp ~ 1, node.temps)
plot(tmp.vgm)
We will generate two theoretical models to best approximate the experimental data. First, a theoretical semivariogram uses a spherical model fit.
tmp.fit.sph<- fit.variogram(tmp.vgm, model=vgm("Sph"))
plot(tmp.vgm, tmp.fit.sph)
Next, we’ll create a grid from the Chicago area. The following function will generate a grid from a provided spatial data frame for n cells.
pt2grid <- function(ptframe,n) {
bb <- bbox(ptframe)
ptcrs <- proj4string(ptframe)
xrange <- abs(bb[1,1] - bb[1,2])
yrange <- abs(bb[2,1] - bb[2,2])
cs <- c(xrange/n,yrange/n)
cc <- bb[,1] + (cs/2)
dc <- c(n,n)
x1 <- GridTopology(cellcentre.offset=cc,cellsize=cs,cells.dim=dc)
x2 <- SpatialGrid(grid=x1,proj4string=CRS(ptcrs))
return(x2)
}
First, let’s generate a grid of 30 by 30 cells using the Chicago Community Area extent. Plot the grid for exploration.
chi.grid <- pt2grid((chiCA),30)
plot(chi.grid)
To get an even finer resolution, we generate a finer resolution of grid of 100 by 100 cells.
chi.grid <- pt2grid((chiCA),100)
plot(chi.grid)
First, we make sure that all our data is in the same projection.
projection(chi.grid) <- CRS("+init=epsg:4326")
projection(node.temps) <- CRS("+init=epsg:4326")
projection(chiCA) <- CRS("+init=epsg:4326")
Krige the data using the spherical model fit, and plot.
temp.kriged <- krige(node.temps$avg_temp ~ 1, node.temps, chi.grid, model = tmp.fit.sph)
## [using ordinary kriging]
plot(temp.kriged)
Clip to Chicago boundaries.
chi.temp.kriged <- temp.kriged[chiCA,]
plot(chi.temp.kriged)
Plot the kriged Chicago-area suface.
tm_shape(chi.temp.kriged) +
tm_raster("var1.pred", style = "jenks", title = "Temperature (F)", palette = "BuPu") +
tm_layout(main.title = "Avg Afternoon Temperature August 25", main.title.size = 1.1) +
tm_legend(position = c("left", "bottom"))
Next, we fit an Expoentntial model for the semivariogram.
tmp.fit.exp<- fit.variogram(tmp.vgm, model=vgm("Exp"))
plot(tmp.vgm, tmp.fit.exp)
We use the same process as above to generate a kriged surface. Compare the two models and predicted temperature surfaces. As more data are made available with the AoT sensors, the uncertainty across the temperature surface will be further reduced.
temp.kriged <- krige(node.temps$avg_temp ~ 1, node.temps, chi.grid, model = tmp.fit.exp)
## [using ordinary kriging]
chi.temp.kriged <- temp.kriged[chiCA,]
tm_shape(chi.temp.kriged) +
tm_raster("var1.pred", style = "jenks", title = "Temperature C)", palette = "BuPu") +
tm_layout(main.title = "Avg Afternoon Temperature August 25", main.title.size = 1.1) +
tm_legend(position = c("left", "bottom"))
To finalize the map, we add the sensors for reference.
tm_shape(chi.temp.kriged) +
tm_raster("var1.pred", style = "jenks", title = "Temperature (C)", palette = "BuPu") + tm_shape(node.temps) + tm_dots(size=0.01) +
tm_layout(main.title = "Avg Afternoon Temperature August 25", main.title.size = 1.1) +
tm_legend(position = c("left", "bottom"))