Sensor Data Access and Mapping Basics

AoT Workshop, Spatial Analysis, Part 1. August 30th, 2018. Argonne National Laboratory

This tutorial is brought to you by Anais Ladoy, Isaac Kamber, Marynia Kolak, Julia Koschinsky, and Luc Anselin at the Center for Spatial Data Science at the University of Chicago (see spatial.uchicago.edu for more info).


Environment Setup

Spatial analysis in R requires multiple libraries. Packages are installed with the following syntax: install.packages("sp"). Some take additional time to install, depending on your system. The following list covers everything needed for this tutorial, as well as much general spatial analysis.
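If some of these packages are missing from your system, they can be installed in one step. This snippet is a convenience sketch: it only installs packages that are not already present.

```r
# One-time setup: install any required packages that are missing
pkgs <- c("sp", "rgdal", "rgeos", "tidyverse", "tmap", "leaflet")
missing <- setdiff(pkgs, rownames(installed.packages()))
if (length(missing) > 0) install.packages(missing)
```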

library(sp) #spatial data wrangling & analysis

library(rgdal) #spatial data wrangling
library(rgeos) #spatial data wrangling & analytics
library(tidyverse) # data wrangling

library(tmap) #modern data visualizations
library(leaflet) #modern data visualizations

Download Daily AoT Data

Earlier versions of uncalibrated AoT data were hosted on the ANL Waggle data site. That data was bundled daily to include all previous data, so the file size grew unmanageable and was poorly suited to daily updates. (It is still available, however, and may be useful for one-off analyses or historical investigation of sensor data.)

In this tutorial, we focus on the new AoT data product, provided as complete daily data dumps on Plenar.io via https://aot-file-browser.plenar.io/data-sets/chicago-complete. We use data from August 25, 2018.

file.url <- "https://s3.amazonaws.com/aot-tarballs/chicago-public.daily.2018-08-25.tar" # named to avoid masking base::file.path()
file.dest <- "08-25-2018_public.tar"
download.file(file.url, file.dest)

We untar the file.

untar(file.dest)
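Before (or after) extracting, you can list the archive contents to confirm the expected files are present:

```r
# List the files bundled in the daily tarball without extracting them
untar(file.dest, list = TRUE)
```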

Read and Inspect Data

Read in the file that contains node data, and inspect it:

nodes <- read.csv("nodes.csv")
head(nodes)
##        node_id  project_id vsn                               address
## 1 001e0610ba46 AoT_Chicago 004    State St & Jackson Blvd Chicago IL
## 2 001e0610ba3b AoT_Chicago 006    18th St & Lake Shore Dr Chicago IL
## 3 001e0610ba8f AoT_Chicago 00D          Cornell & 47th St Chicago IL
## 4 001e0610ba16 AoT_Chicago 010        Ohio St & Grand Ave Chicago IL
## 5 001e0610ba8b AoT_Chicago 018 Stony Island Ave & 63rd St Chicago IL
## 6 001e0610ba18 AoT_Chicago 01D         Damen Ave & Cermak Chicago IL
##        lat       lon         description     start_timestamp
## 1 41.87838 -87.62768 AoT Chicago (S) [C] 2017/10/09 00:00:00
## 2 41.85814 -87.61606     AoT Chicago (S) 2017/08/08 00:00:00
## 3 41.81034 -87.59023     AoT Chicago (S) 2017/08/08 00:00:00
## 4 41.89196 -87.61160 AoT Chicago (S) [C] 2017/12/01 00:00:00
## 5 41.78060 -87.58646 AoT Chicago (S) [C] 2018/02/26 00:00:00
## 6 41.85218 -87.67583     AoT Chicago (S) 2017/12/15 00:00:00
##         end_timestamp
## 1                    
## 2                    
## 3                    
## 4 2018/06/04 00:00:00
## 5                    
## 6
sensor.info <- read.csv("sensors.csv")
head(sensor.info)
##                           ontology subsystem sensor   parameter hrf_unit
## 1    /sensing/meteorology/humidity  metsense htu21d    humidity       RH
## 2    /sensing/meteorology/pressure  metsense bmp180    pressure      hPa
## 3 /sensing/meteorology/temperature  metsense bmp180 temperature        C
## 4 /sensing/meteorology/temperature  metsense htu21d temperature        C
## 5 /sensing/meteorology/temperature  metsense tsys01 temperature        C
##   hrf_minval hrf_maxval
## 1          0        100
## 2        300       1100
## 3        -40         85
## 4        -40        125
## 5        -40        125
##                                                                          datasheet
## 1 https://github.com/waggle-sensor/sensors/blob/master/sensors/airsense/htu21d.pdf
## 2 https://github.com/waggle-sensor/sensors/blob/master/sensors/airsense/bmp180.pdf
## 3 https://github.com/waggle-sensor/sensors/blob/master/sensors/airsense/bmp180.pdf
## 4 https://github.com/waggle-sensor/sensors/blob/master/sensors/airsense/htu21d.pdf
## 5 https://github.com/waggle-sensor/sensors/blob/master/sensors/airsense/tsys01.pdf
provenance <- read.csv("provenance.csv")
head(provenance)
##   data_format_version         project_id     data_start_date
## 1                   2 AoT_Chicago.public 2017/01/01 00:00:00
##         data_end_date       creation_date
## 1 2018/08/25 20:00:45 2018/08/25 20:00:45
##                                                                                                url
## 1 http://www.mcs.anl.gov/research/projects/waggle/downloads/datasets/AoT_Chicago.public.latest.tar

Read in the file that contains sensor data, and inspect it:

sensor.data <- read.csv("data.csv.gz")
head(sensor.data)
##             timestamp      node_id subsystem sensor   parameter value_raw
## 1 2018/08/25 00:00:00 001e061135cb  metsense bmp180    pressure   9909900
## 2 2018/08/25 00:00:00 001e061135cb  metsense bmp180 temperature        NA
## 3 2018/08/25 00:00:00 001e061135cb  metsense htu21d    humidity        NA
## 4 2018/08/25 00:00:00 001e061135cb  metsense htu21d temperature        NA
## 5 2018/08/25 00:00:00 001e061135cb  metsense tsys01 temperature        NA
## 6 2018/08/25 00:00:02 001e0610bbf9  metsense bmp180    pressure  11016512
##   value_hrf
## 1    990.99
## 2     19.40
## 3     79.37
## 4     19.42
## 5     20.04
## 6   1015.59

Let’s look at the underlying structure of the data.

glimpse(sensor.data)
## Observations: 497,912
## Variables: 7
## $ timestamp <fct> 2018/08/25 00:00:00, 2018/08/25 00:00:00, 2018/08/25...
## $ node_id   <fct> 001e061135cb, 001e061135cb, 001e061135cb, 001e061135...
## $ subsystem <fct> metsense, metsense, metsense, metsense, metsense, me...
## $ sensor    <fct> bmp180, bmp180, htu21d, htu21d, tsys01, bmp180, bmp1...
## $ parameter <fct> pressure, temperature, humidity, temperature, temper...
## $ value_raw <int> 9909900, NA, NA, NA, NA, 11016512, 22608, 47274, 243...
## $ value_hrf <dbl> 990.99, 19.40, 79.37, 19.42, 20.04, 1015.59, 13.80, ...

We will return to this data more closely in Part 2, where we will plot and interpolate temperature, but first let’s better understand the spatial distribution of the nodes and sensors.
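As a quick sanity check before moving on, we can tally readings per sensor and parameter with dplyr (loaded via tidyverse above):

```r
# Count readings for each sensor/parameter combination, largest first
sensor.data %>%
  count(sensor, parameter, sort = TRUE)
```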

Convert data to spatial formats

From our earlier data inspection, we know that latitude and longitude information can be found in the nodes dataset. Data with latitude and longitude columns is not explicitly spatial until those coordinates are registered as spatial features. Thus, we need to convert from CSV to a spatial format. In R, when using the sp package, that format is the SpatialPointsDataFrame.

A spatial projection is assigned next; this dataset uses the standard EPSG:4326, i.e., the WGS 84 coordinate reference system. Note that units for this system are decimal degrees, so distances should not be calculated before transforming to a projected coordinate system.

nodes.spt <- SpatialPointsDataFrame(nodes[,c('lon','lat')],nodes)
proj4string(nodes.spt) <- CRS("+init=epsg:4326")

Let’s plot the nodes in a basic format to ensure they’re plotting correctly:

plot(nodes.spt)

We can vaguely see the shape of Chicago, plus one node to the southwest (likely at Argonne).
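We can also confirm the spatial extent numerically. With sp, the bounding box reports the coordinate range in the layer's units (here, decimal degrees):

```r
# Bounding box of the node layer: min/max longitude and latitude
bbox(nodes.spt)
```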

How many nodes are in the dataset? Checking the length of the dataset is one way to get the total. You could alternatively count the number of unique node IDs, to make sure that no node is counted twice for any reason. There are 90 unique nodes.

length(nodes.spt)
## [1] 90
#unique(nodes.spt$node_id)
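To make the alternative explicit, counting distinct node IDs guards against any duplicate rows:

```r
# Count distinct node IDs; should match length(nodes.spt) if there are no duplicates
length(unique(nodes.spt$node_id))
```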

Mapping Nodes

Let’s plot the nodes again, but with more modern cartography. Here we’ll add a categorical component to distinguish the type of node using the “description” parameter.

tmap_mode('plot')
## tmap mode set to plotting
tm_shape(nodes.spt) + tm_dots("description",style="cat",size=0.3)

Let’s add a basemap for context, and make the map interactive.

tmap_mode('view')
## tmap mode set to interactive viewing
tm_shape(nodes.spt) + tm_dots("description",style="cat",size=0.1)

Mapping Nodes with Community Areas

For additional context, we’ll add Chicago community areas as a layer. First, you will need to download a spatial file of Chicago communities in a format like geojson or shapefile. Here, we’ve downloaded and added a shapefile (composed of 4 data files) to the working directory for easy access.

chiCA <- readOGR(".","ChiComArea")
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "ChiComArea"
## with 77 features
## It has 4 fields
## Integer64 fields read as strings:  ComAreaID

If we inspect the structure of the spatial data file using the str() function, we get a long overview detailing the complex spatial structures of the dataset. Spatial data formats are more complex than their non-spatial counterparts.

To inspect the data attributes of the shapefile, we’ll look at the non-spatial data dimension only; with the sp package, this is done using @data.

head(chiCA@data)
##   ComAreaID       community shape_area shape_len
## 0        35         DOUGLAS   46004621  31027.05
## 1        36         OAKLAND   16913961  19565.51
## 2        37     FULLER PARK   19916705  25339.09
## 3        38 GRAND BOULEVARD   48492503  28196.84
## 4        39         KENWOOD   29071742  23325.17
## 5         4  LINCOLN SQUARE   71352328  36624.60

We’ve confirmed that our community area file has all 77 Chicago areas. Let’s plot it quickly for final inspection:

plot(chiCA)

Now we’ll add the Community Areas to our interactive map.

tm_shape(chiCA) + tm_borders(alpha = 0.7) + tm_shape(nodes.spt) + tm_dots("description",style="cat",size=0.1)

To finalize the map, let’s add labels for each Community Area. With the current defaults, the labels will overlap until the map is zoomed in further.

tm_shape(chiCA) + tm_borders(alpha = 0.7) + tm_text("community", size=0.5) + tm_shape(nodes.spt) + tm_dots("description",style="cat",size=0.1)
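If the overlap is distracting at the full extent, one optional refinement (assuming tmap 2.x, where tm_text() accepts an auto.placement argument) is to let tmap attempt to reposition the labels:

```r
# Optional: ask tmap to nudge overlapping labels apart
tm_shape(chiCA) + tm_borders(alpha = 0.7) +
  tm_text("community", size = 0.5, auto.placement = TRUE) +
  tm_shape(nodes.spt) + tm_dots("description", style = "cat", size = 0.1)
```

Results vary with the density of the labels, so treat this as a refinement rather than a guaranteed fix.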

Mapping Density of AoT Sensors

Let’s further inspect the distribution of sensors across the city. Our next goal is to generate a buffer of 1 kilometer around each node. We will then visualize the buffers, and finally calculate the density of AoT sensor areas per Community Area.

First, we need to convert to a spatial projection that preserves distance. We’ll use EPSG:32616, which uses a unit of meters. See more about this projection at: https://epsg.io/32616.

nodes.spt.32616 <- spTransform(nodes.spt, CRS("+init=epsg:32616"))

Next, we’ll calculate 1 kilometer buffers (= 1,000 meters) for each node.

buffers <- gBuffer(nodes.spt.32616, width = 1000, byid = TRUE)

Next, convert back to a projection that all layers have in common. We’ll use the standard EPSG:4326. We already used this at the beginning of the tutorial by default, but we denote it in the dataset names to avoid confusion. This extra step helps troubleshooting immensely!

nodes.spt.4326 <- spTransform(nodes.spt,  CRS("+init=epsg:4326"))
chiCA.4326 <- spTransform(chiCA,  CRS("+init=epsg:4326"))
buffer.4326 <- spTransform(buffers, CRS("+init=epsg:4326"))
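Before mapping the layers together, it is worth confirming that they now agree; comparing their proj4 strings is a simple check:

```r
# All three layers should now report the same CRS string
proj4string(nodes.spt.4326) == proj4string(chiCA.4326)
proj4string(nodes.spt.4326) == proj4string(buffer.4326)
```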

Map the buffers for inspection.

tm_shape(chiCA.4326) + tm_borders() +
     tm_shape(buffer.4326) + tm_borders(col = "blue") +
     tm_shape(nodes.spt.4326) + tm_dots(col = "red") 

Let’s count the buffers that intersect each Community Area. (Note that a buffer spanning a boundary is counted in every area it touches.)

chiCA.4326$count_buffers = rowSums(gIntersects(buffer.4326, chiCA.4326, byid = TRUE))
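Before mapping, we can peek at the result directly, sorting the attribute table to see which community areas intersect the most node buffers:

```r
# Community areas with the highest buffer counts
head(chiCA.4326@data[order(-chiCA.4326$count_buffers),
                     c("community", "count_buffers")])
```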

Finally, let’s view the buffer density, overlaying the buffers for final confirmation and exploration.

tm_shape(chiCA.4326) + tm_fill(col = "count_buffers", palette = "BuGn", style = "quantile",title = "AoT Density") + tm_shape(buffer.4326) + tm_borders(col = "blue")