Tagged: data presentation

Dealing with “missing”/out of bounds values in heatmaps

I was tinkering around in R to see if I could plot better looking heatmaps, when I encountered an issue regarding how specific values are represented in plots with user-specified restricted ranges. When I’m plotting heatmaps, I usually want to restrict the range of the data being plotted in a specific way. In this case, I was plotting classifier accuracy across frequencies and time of Fourier-transformed EEG data, and I want to restrict the lowest value of the heatmap to be chance level (in this case, 50%).

I used the following code to generate a simple heatmap:

decoding <- readMat("/Users/Pablo/Desktop/testDecoding.mat")$plotData
plotDecoding <- melt(decoding)

quartz(width=5,height=4)
ggplot(plotDecoding, aes(x = Var2, y = Var1)) +
geom_tile(aes(fill = value)) +
scale_fill_viridis(name = "",limits = c(0.5,0.75))

And this is what I got:

heatmap1

See the gray areas? What’s happening here is that any data points that fall outside of my specified data range (see the limits argument in scale_fill_viridis) are being converted to NAs. Apparently, NA values are plotted as gray by default. Obviously these values are not missing in the actual data; how can we actually ensure that they're plotted? The solution comes from the scales package that comes with ggplot. Whenever you create a plot with specified limits, include the argument oob = squish (oob = out of bounds) in the same line where you set the limits (make sure that the scales package is loaded). Ex:

scale_fill_viridis(name = "",limits = c(0.5,0.75),oob=squish)

What this does is that values that fall below the specified range are represented as the minimum value (color, in this case), and values that fall above the range are represented as the maximum value (which seems to be what Matlab defaults to). The resulting heatmap looks like this:

heatmap2

A simple solution to an annoying little quirk in R!

Smoothing the map






Other ways to plot spatial data…

Here, I walk through some ggplot methods, and finally introduce the spatstat package, which allows one to plot smoothed heatmaps based on your variable of interest.

Basica Data Wrangling

library(zipcode)
library(dplyr)
library(ggplot2)
data(zipcode)
pzip<-read.csv('StudyFiveSampleOneWaveOne_forRclub.csv')
str(pzip)
## 'data.frame':    1192 obs. of  2 variables:
##  $ ZipCode: int  23231 85712 85281 23917 NA 84337 89104 60629 80503 43449 ...
##  $ PScore : num  3.37 4.3 2.97 3.2 2.9 ...
pzip_clean<-pzip %>%
    mutate(zip=clean.zipcodes(ZipCode))
str(pzip_clean)
## 'data.frame':    1192 obs. of  3 variables:
##  $ ZipCode: int  23231 85712 85281 23917 NA 84337 89104 60629 80503 43449 ...
##  $ PScore : num  3.37 4.3 2.97 3.2 2.9 ...
##  $ zip    : chr  "23231" "85712" "85281" "23917" ...
pzip_latlong <- left_join(pzip_clean,zipcode) %>%
    filter(longitude > -140) %>%
    na.omit
## Joining by: "zip"
str(pzip_latlong)
## 'data.frame':    1080 obs. of  7 variables:
##  $ ZipCode  : int  23231 85712 85281 23917 84337 89104 60629 80503 43449 93030 ...
##  $ PScore   : num  3.37 4.3 2.97 3.2 3.17 ...
##  $ zip      : chr  "23231" "85712" "85281" "23917" ...
##  $ city     : chr  "Richmond" "Tucson" "Tempe" "Boydton" ...
##  $ state    : chr  "VA" "AZ" "AZ" "VA" ...
##  $ latitude : num  37.5 32.2 33.4 36.6 41.7 ...
##  $ longitude: num  -77.4 -110.9 -111.9 -78.3 -112.2 ...

Points and Density

Close your eyes and infer the shape of the nation….

pzip_latlong %>% filter(longitude > -140) %>%
    ggplot(aes(y=latitude,x=longitude)) + 
    geom_point(aes(color=PScore))

pzip_latlong %>% filter(longitude > -140) %>%
    ggplot(aes(y=latitude,x=longitude)) + 
    stat_density2d(geom="tile", aes(fill = ..density..), contour = FALSE)

Spatially smoothed data on an actual map

Thanks, https://stat.ethz.ch/pipermail/r-sig-geo/2011-March/011284.html

library(spatstat)
library(maps)
library(maptools)

#map('usa')
#points(pzip_latlong$longitude, pzip_latlong$latitude)
usmap <- map('usa', fill=TRUE, col="transparent", plot=FALSE)
uspoly <- map2SpatialPolygons(usmap, IDs=usmap$names, 
    proj4string=CRS("+proj=longlat +datum=wgs84"))
spatstat.options(checkpolygons=FALSE)
## Warning: spatstat.options('checkpolygons') will be ignored in future
## versions of spatstat
usowin <- as.owin.SpatialPolygons(uspoly)
spatstat.options(checkpolygons=TRUE)
## Warning: spatstat.options('checkpolygons') will be ignored in future
## versions of spatstat
pzip_noDupes <- pzip_latlong %>% 
    group_by(latitude, longitude) %>%
    summarise(pscore=mean(PScore))

pzipPoints <- ppp(x=pzip_noDupes$longitude,
          y=pzip_noDupes$latitude,
          marks=pzip_noDupes$pscore, 
          window=usowin)
## Warning: point-in-polygon test had difficulty with 1 point (total score not
## 0 or 1)
## Warning in ppp(x = pzip_noDupes$longitude, y = pzip_noDupes$latitude,
## marks = pzip_noDupes$pscore, : 18 points were rejected as lying outside the
## specified window
plot(density(pzipPoints))

plot(Smooth(pzipPoints))



Map maker, map maker, make me a map






[Editor’s note: The following code takes a dataframe with zipcodes and a mystery variable of interest and makes it into a map. Thanks for the code, Rose, and thanks for the data, Rita!]

if(!require(zipcode)) install.packages("zipcode"); library(zipcode)
df.raw <- read.csv("StudyFiveSampleOneWaveOne_forRclub.csv")
str(df.raw)
## 'data.frame':    1192 obs. of  2 variables:
##  $ ZipCode: int  23231 85712 85281 23917 NA 84337 89104 60629 80503 43449 ...
##  $ PScore : num  3.37 4.3 2.97 3.2 2.9 ...
if(!identical(x=clean.zipcodes(df.raw$ZipCode), y=as.character(df.raw$ZipCode)) ) message("Uh-oh") #check that zip codes in df are clean
## Uh-oh
df <- df.raw
df$zip <- clean.zipcodes(df$ZipCode)
df <- na.omit(df)

data(zipcode) # reads in a dataframe with the state for each zipcode
df <- merge(df, zipcode[,c(1,3)], by="zip", all.x=T) # add a column of state abbreviations to the datafile

library(dplyr)
df.state <- df %>%
  group_by(state) %>%
  summarize(PScore=mean(PScore))

if(!require(choroplethr)) install.packages("choroplethr"); library(choroplethr)
if(!require(choroplethrMaps)) install.packages("choroplethrMaps"); library(choroplethrMaps)

data(state.regions) # reads in a dataframe with the state name and state abbreviations

df.state <- merge(df.state, state.regions[,c(1,2)], by.x="state", by.y="abb", all.x=T)

map.data = data.frame(region=df.state$region, value=df.state$PScore)
state_choropleth(map.data)
## Warning in super$initialize(map.df, user.df): Your data.frame contains the
## following regions which are not mappable: NA
## Warning in left_join_impl(x, y, by$x, by$y): joining factor and character
## vector, coercing into character vector



Interactive Embedded Plots with Plotly and ggplot2




Largley lifted from this r-bloggers post

install.packages("devtools")  # so we can install from GitHub
devtools::install_github("ropensci/plotly")  # plotly is part of rOpenSci
library(plotly)

py <- plotly(username="jflournoy", key="mg34ox914h")  # open plotly connection
# I'll change my key after this, but you can still use: plotly(username="r_user_guide", key="mw5isa4yqp")
# Or just sign up for your own account!

 
gg <- ggplot(iris) +
    geom_point(aes(Sepal.Length, Sepal.Width,color=Species,size=Petal.Length))
gg

#This looks a little object-oriented like python  
py$ggplotly(gg)

You can embed code like this (which you get from the plotly ‘share’ dialogue):

<div>
<a href="https://plot.ly/~jflournoy/16/" target="_blank" title="Sepal.Width vs Sepal.Length" style="display: block; text-align: center;"><img src="https://plot.ly/~jflournoy/16.png" alt="Sepal.Width vs Sepal.Length" style="max-width: 100%;width: 797px;"  width="797" onerror="this.onerror=null;this.src='https://plot.ly/404.png';" /></a>
<script data-plotly="jflournoy:16" src="https://plot.ly/embed.js" async></script>
</div>
Sepal.Width vs Sepal.Length

You can also directly embed a plotly plot using a code chunk if you set plotly=TRUE for the chunk, and include session="knitr" in the call.

#Set `plotly=TRUE`
py$ggplotly(gg, session="knitr")

There’s a wide world of plotly fun just waiting out there.