4 Link Community Data

4.1 Overview

Geographic location can serve as a “key” that links different datasets together. By referencing each dataset and enabling its spatial location, we can integrate different types of information in one setting. In this tutorial, we will use the approximated “service areas” generated in our buffer analysis to identify vulnerable populations during the COVID pandemic. We will connect Chicago COVID-19 Case data by ZIP Code, available as a flat file on the city’s data portal, to our environment.

We will then overlap the 1-mile buffers representing walkable access to the Methadone providers in the city. We use a conservative threshold because of the multiple challenges posed by the COVID pandemic that may impact travel.

Our final goal will be to identify zip codes most impacted by COVID that are outside our acceptable access threshold. Our tutorial objectives are to:

Clean data in preparation of merge
Integrate data using geographic identifiers
Generate maps for a basic gap analysis

4.2 Environment Setup

To replicate the codes & functions illustrated in this tutorial, you’ll need to have R and RStudio downloaded and installed on your system. This tutorial assumes some familiarity with the R programming language.

4.2.1 Input/Output

Our inputs include multiple CSV and SHP files, all of which can be found here, though the providers point file was generated in the Geocoding tutorial. Note that all four files are required (.dbf, .prj, .shp, and .shx) to constitute a shapefile.

Chicago Methadone Clinics, methadone_clinics.shp
1-mile buffer zone of Clinics, methadoneClinics_1mi.shp
Chicago Zip Codes, chicago_zips.shp
Chicago COVID case data by Zip, COVID-19_Cases__Tests__and_Deaths_by_ZIP_Code.csv

We will calculate the minimum distance between the resources and the centroids of the zip codes, then save the results as a shapefile and as a CSV. Our final result will be a shapefile/CSV with the minimum distance value for each zip.

4.2.2 Load Libraries

We will use the following packages in this tutorial:

sf: to manipulate spatial data
tmap: to visualize and create maps
units: to convert units within spatial data

Load the libraries for use.

library(sf)
library(tmap)

4.2.3 Load data

First we’ll load the shapefiles.

chicagoZips <- st_read("data/chicago_zips.shp")

## Reading layer `chicago_zips' from data source `/Users/yashbansal/Desktop/CSDS_RA/opioid-environment-toolkit/data/chicago_zips.shp' using driver `ESRI Shapefile'
## Simple feature collection with 85 features and 9 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -88.06058 ymin: 41.58152 xmax: -87.52366 ymax: 42.06504
## geographic CRS: WGS 84

methadoneSf <- st_read("data/methadone_clinics.shp")

## Reading layer `methadone_clinics' from data source `/Users/yashbansal/Desktop/CSDS_RA/opioid-environment-toolkit/data/methadone_clinics.shp' using driver `ESRI Shapefile'
## Simple feature collection with 27 features and 8 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -87.7349 ymin: 41.68698 xmax: -87.57673 ymax: 41.96475
## geographic CRS: WGS 84

buffers<- st_read("data/methadoneClinics_1mi.shp")

## Reading layer `methadoneClinics_1mi' from data source `/Users/yashbansal/Desktop/CSDS_RA/opioid-environment-toolkit/data/methadoneClinics_1mi.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 1 field
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 1141979 ymin: 1824050 xmax: 1195960 ymax: 1935751
## projected CRS:  NAD83 / Illinois East (ftUS)

Next, we’ll load some new data we’re interested in joining in: Chicago COVID-19 Cases, Tests, and Deaths by ZIP Code, found on the city data portal here. We’ll load in a CSV and inspect the data:

COVID <- read.csv("data/COVID-19_Cases__Tests__and_Deaths_by_ZIP_Code.csv")

4.3 Clean & Merge Data

First, let’s inspect COVID case data. What information do we need from this file? We may not need everything, so consider just identifying the field with the geographic identifier and main variable(s) of interest.

head(COVID)

##   ZIP.Code Week.Number Week.Start   Week.End Cases...Weekly
## 1    60603          39 09/20/2020 09/26/2020              0
## 2    60604          39 09/20/2020 09/26/2020              0
## 3    60611          16 04/12/2020 04/18/2020              8
## 4    60611          15 04/05/2020 04/11/2020              7
## 5    60615          11 03/08/2020 03/14/2020             NA
## 6    60603          10 03/01/2020 03/07/2020             NA
##   Cases...Cumulative Case.Rate...Weekly Case.Rate...Cumulative
## 1                 13                  0                 1107.3
## 2                 31                  0                 3964.2
## 3                 72                 25                  222.0
## 4                 64                 22                  197.4
## 5                 NA                 NA                     NA
## 6                 NA                 NA                     NA
##   Tests...Weekly Tests...Cumulative Test.Rate...Weekly
## 1             25                327               2130
## 2             12                339               1534
## 3            101                450                312
## 4             59                349                182
## 5              6                  9                 14
## 6              0                  0                  0
##   Test.Rate...Cumulative Percent.Tested.Positive...Weekly
## 1                27853.5                              0.0
## 2                43350.4                              0.0
## 3                 1387.8                              0.1
## 4                 1076.3                              0.1
## 5                   21.7                               NA
## 6                    0.0                               NA
##   Percent.Tested.Positive...Cumulative Deaths...Weekly
## 1                                  0.0               0
## 2                                  0.1               0
## 3                                  0.2               0
## 4                                  0.2               0
## 5                                   NA               0
## 6                                   NA               0
##   Deaths...Cumulative Death.Rate...Weekly Death.Rate...Cumulative
## 1                   0                   0                       0
## 2                   0                   0                       0
## 3                   0                   0                       0
## 4                   0                   0                       0
## 5                   0                   0                       0
## 6                   0                   0                       0
##   Population   Row.ID            ZIP.Code.Location
## 1       1174 60603-39 POINT (-87.625473 41.880112)
## 2        782 60604-39 POINT (-87.629029 41.878153)
## 3      32426 60611-16 POINT (-87.620291 41.894734)
## 4      32426 60611-15 POINT (-87.620291 41.894734)
## 5      41563 60615-11 POINT (-87.602725 41.801993)
## 6       1174 60603-10 POINT (-87.625473 41.880112)

From this we can assess the need for the following variables: ZIP Code and Percent Tested Positive - Cumulative. Let’s subset the data accordingly. Because this data file has extremely long header names (common in the epi world), let’s use the colnames function to get exactly what we need.

colnames(COVID)

##  [1] "ZIP.Code"                            
##  [2] "Week.Number"                         
##  [3] "Week.Start"                          
##  [4] "Week.End"                            
##  [5] "Cases...Weekly"                      
##  [6] "Cases...Cumulative"                  
##  [7] "Case.Rate...Weekly"                  
##  [8] "Case.Rate...Cumulative"              
##  [9] "Tests...Weekly"                      
## [10] "Tests...Cumulative"                  
## [11] "Test.Rate...Weekly"                  
## [12] "Test.Rate...Cumulative"              
## [13] "Percent.Tested.Positive...Weekly"    
## [14] "Percent.Tested.Positive...Cumulative"
## [15] "Deaths...Weekly"                     
## [16] "Deaths...Cumulative"                 
## [17] "Death.Rate...Weekly"                 
## [18] "Death.Rate...Cumulative"             
## [19] "Population"                          
## [20] "Row.ID"                              
## [21] "ZIP.Code.Location"

4.3.1 Subset Data

We can now subset to just include the fields we need. There are many different ways to subset in R – we just use one example here! Inspect your data to confirm it was pulled correctly.

COVID.sub <- COVID[, c("ZIP.Code", "Case.Rate...Cumulative")]
head(COVID.sub)

##   ZIP.Code Case.Rate...Cumulative
## 1    60603                 1107.3
## 2    60604                 3964.2
## 3    60611                  222.0
## 4    60611                  197.4
## 5    60615                     NA
## 6    60603                     NA

4.3.2 Identify Geographic Key

Before merging, we need to first identify the geographic identifier we would like to merge on. It is the field “ZIP.Code” in our subset. What about the zip code file, which we will be merging to?

head(chicagoZips)

## Simple feature collection with 6 features and 9 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -88.06058 ymin: 41.73452 xmax: -87.58209 ymax: 42.04052
## geographic CRS: WGS 84
##   ZCTA5CE10 GEOID10 CLASSFP10 MTFCC10 FUNCSTAT10  ALAND10
## 1     60501   60501        B5   G6350          S 12532295
## 2     60007   60007        B5   G6350          S 36493383
## 3     60651   60651        B5   G6350          S  9052862
## 4     60652   60652        B5   G6350          S 12987857
## 5     60653   60653        B5   G6350          S  6041418
## 6     60654   60654        B5   G6350          S  1464813
##   AWATER10  INTPTLAT10   INTPTLON10
## 1   974360 +41.7802209 -087.8232440
## 2   917560 +42.0086000 -087.9973398
## 3        0 +41.9020934 -087.7408565
## 4        0 +41.7479319 -087.7147951
## 5  1696670 +41.8199645 -087.6059654
## 6   113471 +41.8918225 -087.6383036
##                         geometry
## 1 MULTIPOLYGON (((-87.86289 4...
## 2 MULTIPOLYGON (((-88.06058 4...
## 3 MULTIPOLYGON (((-87.77559 4...
## 4 MULTIPOLYGON (((-87.74205 4...
## 5 MULTIPOLYGON (((-87.62623 4...
## 6 MULTIPOLYGON (((-87.64775 4...

Aha – in this dataset, the zip is identified as either the ZCTA5CE10 field or GEOID10 field. Not that we are actually working with 5-digit ZCTA fields, not 9-digit ZIP codes… We decide to merge on the GEOID10 field. To make our lives easier, we’ll generate a duplicate field in our subset with a new name, GEOID10, to match. We also convert from the factor structure to a character field to match the data structure of the master zip file.

COVID.sub$GEOID10<- as.character(COVID.sub$ZIP.Code)

4.3.3 Merge Data

Let’s merge the data using the zip code geographic identifier, “ZIP Code” field, to bring in the the Percent Tested Positive - Cumalative dataset. Inspect the data to confirm it merged correctly!

zipsMerged <- merge(chicagoZips, COVID.sub, by = "GEOID10")
head(zipsMerged)

## Simple feature collection with 6 features and 11 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -87.63396 ymin: 41.88083 xmax: -87.6129 ymax: 41.88893
## geographic CRS: WGS 84
##   GEOID10 ZCTA5CE10 CLASSFP10 MTFCC10 FUNCSTAT10 ALAND10 AWATER10
## 1   60601     60601        B5   G6350          S  934226    60682
## 2   60601     60601        B5   G6350          S  934226    60682
## 3   60601     60601        B5   G6350          S  934226    60682
## 4   60601     60601        B5   G6350          S  934226    60682
## 5   60601     60601        B5   G6350          S  934226    60682
## 6   60601     60601        B5   G6350          S  934226    60682
##    INTPTLAT10   INTPTLON10 ZIP.Code Case.Rate...Cumulative
## 1 +41.8856419 -087.6215226    60601                 1451.4
## 2 +41.8856419 -087.6215226    60601                  872.2
## 3 +41.8856419 -087.6215226    60601                  919.9
## 4 +41.8856419 -087.6215226    60601                  558.8
## 5 +41.8856419 -087.6215226    60601                  156.7
## 6 +41.8856419 -087.6215226    60601                  477.0
##                         geometry
## 1 MULTIPOLYGON (((-87.63396 4...
## 2 MULTIPOLYGON (((-87.63396 4...
## 3 MULTIPOLYGON (((-87.63396 4...
## 4 MULTIPOLYGON (((-87.63396 4...
## 5 MULTIPOLYGON (((-87.63396 4...
## 6 MULTIPOLYGON (((-87.63396 4...

4.4 Visualize Data

Now we are ready to visualize our data! First we’ll make a simple map. We generate a choropleth map of case rate data using quantile bins, and the Blue-Purple color palette. You can find an R color cheatsheet useful for identifying palette codes here. (More details on thematic mapping are in tutorials that follow!) We overlay the buffers and clincal providers.

tmap_mode("plot")

## tmap mode set to plotting

tm_shape(zipsMerged) +
  tm_polygons("Case.Rate...Cumulative", style="quantile", pal="BuPu",
              title = "COVID Case Rate") +
  tm_shape(buffers) + tm_borders(col = "blue") +
  tm_shape(methadoneSf) + tm_dots(col = "black", size = 0.2)

Already we can generate some insight. Areas on the far West side of the city have some of the highest case rates, but are outside a walkable distance to Methadone providers. For individuals with opioid use disorder requiring medication access in these locales, they may be especially vulnerable during the pandemic.

Next, we adjust some tmap parameters to improve our map. Now we switch to a red-yellow-green palette, and specify six bins for our quantile map. We flip the direction of the palette using a negative sign, so that red corresponds to areas with higher rates. We adjust transparency using an alpha parameter, and line thickness using the lwd parameter.

tm_shape(zipsMerged) +
  tm_fill("Case.Rate...Cumulative", style="quantile", n=6, pal="-RdYlGn",
              title = "COVID Case Rate",alpha = 0.8) + 
  tm_borders(lwd = 0) + 
  tm_shape(buffers) + tm_borders(col = "gray") + tm_fill(alpha=0.5) +
  tm_shape(methadoneSf) + tm_dots(col = "black", size = 0.1) +
  tm_layout(main.title = "Walkable Methadone Service Areas",
            main.title.position = "center",
            main.title.size = 1,
            frame = FALSE)

To improve our map even further, let’s make it interactive! By switching the tmap_mode function to “view” (from “plot” the default), our newly rendered map is not interactive. We can zoom in and out, click on different basemaps or turn layers on our off, and click on resources for more information.

tmap_mode("view")

## tmap mode set to interactive viewing

tm_shape(zipsMerged) +
  tm_fill("Case.Rate...Cumulative", style="quantile", n=6, pal="-RdYlGn",
              title = "COVID Case Rate",alpha = 0.8) + 
  tm_borders(lwd = 0) + 
  tm_shape(buffers) + tm_borders(col = "gray") + tm_fill(alpha=0.5) +  
  tm_shape(methadoneSf) + tm_dots(col = "black", size = 0.1)

Using this approach, it’s clear that the zip codes on the West Side most vulnerable are 60651, 60644, 60632, 60629, and 60608. By updating the thresholds and parameters further, these can shift as well to be more or less conservative based on our assumptions.

4.4.1 Save Data

We save our newly merged ZCTA level data for future analysis.

write_sf(zipsMerged, "data/chizips_COVID.shp")

## Warning in abbreviate_shapefile_names(obj): Field names
## abbreviated for ESRI Shapefile driver