In Lesson 03 we used the ‘dplyr’ package and its “grammar of data manipulation” to create summary tables from large, multi-column dataframes. In this lesson we will cover two plot types useful for displaying tabular information – bar and pie plots.

Note: The author of ‘dplyr’, Hadley Wickham, is also the author of the very popular ‘ggplot2’ package, which implements a “grammar of graphics”. We will not cover ggplot2 in this lesson and will instead use the base R functions barplot() and pie(). Before wrapping your head around the entire “grammar of graphics”, it is important to become familiar with the core R graphics routines and how to customize their plots.

We will begin our foray into data visualization by creating a summary data table similar to those created in Lesson 03 and storing it in ‘wf’.

library(dplyr)
url <- "http://smoke.airfire.org/bluesky-daily/output/hysplit-pp/NAM-4km/2014080100/data/fire_locations.csv"
fires <- read.csv(url, stringsAsFactors=FALSE)

fires %>%
  filter(type == "WF") %>%
  group_by(state) %>%
  summarize(areaMax=max(area, na.rm=TRUE),
            areaSum=sum(area, na.rm=TRUE)) %>%
  arrange(desc(areaSum)) ->
  wf

barplot()

Plotting functions in R often accept a large number of arguments that control the appearance of the plot along with the special argument ‘…’ for extra arguments that are passed on to other functions called at a lower level. Here is what the ‘barplot’ function looks like:

# barplot(height, width = 1, space = NULL,
#         names.arg = NULL, legend.text = NULL, beside = FALSE,
#         horiz = FALSE, density = NULL, angle = 45,
#         col = NULL, border = par("fg"),
#         main = NULL, sub = NULL, xlab = NULL, ylab = NULL,
#         xlim = NULL, ylim = NULL, xpd = TRUE, log = "",
#         axes = TRUE, axisnames = TRUE,
#         cex.axis = par("cex.axis"), cex.names = par("cex.axis"),
#         inside = TRUE, plot = TRUE, axis.lty = 0, offset = 0,
#         add = FALSE, args.legend = NULL, ...)
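
To make the role of ‘…’ more concrete, here is a minimal sketch of a wrapper that fixes a couple of arguments and forwards everything else to barplot(). The wrapper name plainBarplot and the toy values are our own inventions for illustration; they are not part of R or of the lesson data.

```r
# Hypothetical wrapper: fixes 'border' and 'las', forwards everything else via '...'
plainBarplot <- function(height, ...) {
  barplot(height, border = NA, las = 1, ...)
}

# 'names.arg', 'col' and 'main' are not named in the wrapper;
# they reach barplot() through '...'
mids <- plainBarplot(c(3, 1, 2), names.arg = c("a", "b", "c"),
                     col = "steelblue", main = "Forwarded arguments")

# barplot() returns the bar midpoints along the category axis
mids
```

Because the wrapper already sets ‘border’ and ‘las’, passing either of them again through ‘…’ would produce an error about an argument matched by multiple actual arguments.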

Many of these arguments are core graphical parameters that are explained in ?par. In fact, many plotting functions let you specify graphical parameters as arguments to temporarily override the default parameter values for that one call.
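
The following sketch illustrates that temporary adjustment with made-up values: graphical parameters passed as arguments affect only the plot being drawn, while the session-wide settings reported by par() stay as they were.

```r
# Record the session-wide settings before plotting
before <- par("cex.axis", "col.main")

# Graphical parameters given as arguments apply to this plot only
barplot(c(3, 1, 2), names.arg = c("a", "b", "c"),
        cex.axis = 0.7, col.main = "navy",
        main = "Temporary parameter values")

# Compare with the settings after the call
after <- par("cex.axis", "col.main")
identical(before, after)
```

To change a parameter for all subsequent plots you would call par() directly, as is done later in this lesson for ‘mar’ and ‘xpd’.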

Using mostly default values, barplot will give us a serviceable plot that shows relative sizes. We will first create a plot of total area burned and then overlay another plot showing the maximum fire size.

# Create the basic barplot
barplot(height=wf$areaSum,
        names.arg=wf$state,
        main="Total area burned in wildfires.")

# Add another barplot on top
barplot(height=wf$areaMax,
        names.arg=wf$state,
        col='firebrick2',
        add=TRUE)

# Add margin text using the default graphical parameter settings for subtitles
mtext("Maximum area of a single fire.",
      col='firebrick2', cex=par('cex.sub'), font=par('font.sub'))


The barplot function can also be passed matrices and present them as either ‘stacked’ or ‘side by side’ bars. In the following example we will use this functionality to show which states made the largest contribution to each pollutant.

# Use the mutate() function to add normalization variables before we group by state
fires %>%
  filter(type == "WF") %>%
  mutate(pm25Total=sum(pm25, na.rm=TRUE),
         pm10Total=sum(pm10, na.rm=TRUE),
         coTotal=sum(co, na.rm=TRUE),
         noxTotal=sum(nox, na.rm=TRUE),
         so2Total=sum(so2, na.rm=TRUE)) %>%
  group_by(state) %>%
  summarize(pm25=sum(pm25/pm25Total, na.rm=TRUE),
            pm10=sum(pm10/pm10Total, na.rm=TRUE),
            co=sum(co/coTotal, na.rm=TRUE),
            nox=sum(nox/noxTotal, na.rm=TRUE),
            so2=sum(so2/so2Total, na.rm=TRUE)) %>%
  arrange(desc(pm25)) ->
  wf

# Create a matrix from this dataframe, dropping the first 'group_by' column and multiplying
# by 100 to get percentages.  Add rownames for better labeling.
m <- as.matrix(wf[,-1]) * 100
rownames(m) <- wf$state
m
##        pm25     pm10       co      nox      so2
## CA 50.76944 50.76944 50.79842 50.47806 50.64559
## WA 39.20885 39.20885 39.53281 35.95115 37.82425
## OR  6.84281  6.84281  6.55965  9.69030  8.05308
## MT  2.37165  2.37165  2.37158  2.37231  2.37193
## AZ  0.77135  0.77136  0.70697  1.41883  1.04655
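## NM  0.03589  0.03589  0.03058  0.08934  0.05861

# Save default graphical parameters (only those that can be set, which avoids
# warnings about read-only parameters when they are restored)
oldPar <- par(no.readonly=TRUE)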

# Create a 2-panel layout
layout(matrix(seq(2)))

# Set graphical parameters
par(mar=c(5,4,4,4)+.1, # a little more room on the right
    xpd=NA)            # don't limit plotting (of legend) to the plot region


# Plot the contribution of each state by pollutant as stacked bars
# Use special arguments to get the legend positioned nicely.
barplot(m, las=1, beside=FALSE,
        axes=FALSE,
        main="State contributions to total wildfire pollutant load (stacked)",
        legend.text=wf$state, args.legend=list(x="topright",inset=c(-0.12,0)))

# Plot the contribution of each state by pollutant as side-by-side bars
barplot(m, las=1, beside=TRUE,
        main="State contributions (in %) to total wildfire pollutant load (side-by-side)",
        legend.text=wf$state, args.legend=list(x="topright",inset=c(-0.12,0)))


# Restore default parameters
par(oldPar)
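
If you want to see how barplot() treats a matrix without downloading the wildfire data, here is a small sketch using a made-up matrix (the name toy and its values are purely illustrative). Rows become the stacked or side-by-side segments and columns become the bars or groups of bars.

```r
# Made-up percentages: rows are groups (segments), columns are bars
toy <- matrix(c(60, 30, 10,
                50, 25, 25),
              nrow = 3,
              dimnames = list(c("A", "B", "C"), c("x", "y")))

# Stacked: one bar per column, so one midpoint per column
stackedMids <- barplot(toy, beside = FALSE, legend.text = TRUE)

# Side by side: one bar per cell, so a matrix of midpoints
besideMids <- barplot(toy, beside = TRUE, legend.text = TRUE)

length(stackedMids)  # one entry per column of 'toy'
dim(besideMids)      # same dimensions as 'toy'
colSums(toy)         # total height of each stacked bar
```

The returned midpoints are handy when you want to add text or other annotations above individual bars.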

If you like colors, you might familiarize yourself with the ‘RColorBrewer’ package, which defines a number of well-vetted palettes. Here are the same plots using the ‘Set2’ palette, which is appropriate for assigning colors to discrete factors as opposed to continuous numeric values.

library(RColorBrewer)
cols <- brewer.pal(nrow(m),'Set2')

# Save default graphical parameters (only those that can be set)
oldPar <- par(no.readonly=TRUE)

# Create a 2-panel layout
layout(matrix(seq(2)))

# Set graphical parameters
par(mar=c(5,4,4,4)+.1, # a little more room on the right
    xpd=NA)            # don't limit plotting (of legend) to the plot region

# Plot the contribution of each state by pollutant as stacked bars
barplot(m, las=1, beside=FALSE, col=cols,
        axes=FALSE,
        main="State contributions to total wildfire pollutant load (stacked)",
        legend.text=wf$state, args.legend=list(x="topright",inset=c(-0.12,0)))

# Plot the contribution of each state by pollutant as side-by-side bars
barplot(m, las=1, beside=TRUE, col=cols,
        main="State contributions (in %) to total wildfire pollutant load (side-by-side)",
        legend.text=wf$state, args.legend=list(x="topright",inset=c(-0.12,0)))


# Restore default parameters
par(oldPar)

Task 1: Improved barplot

Use ‘layout(1)’ to return to single plot layout and improve the side-by-side barplot in the following ways:


pie()

Pie plots are often maligned but are very readily recognizable as depicting relative contributions to a whole ‘pie’. When presented with a hole in the middle – a ‘donut plot’ – these can be thought of as stacked barplots projected into polar coordinates.

A basic pie plot is straightforward but ugly and uses a physicist’s sensibilities about the location of zero and the positive direction of rotation:

pie(m[,1], labels=wf$state,
    main=paste0("State contributions to total wildfire ",colnames(m)[1]," load"))
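
The “physicist’s sensibilities” refer to the defaults of pie(): the first slice starts at 3 o’clock (angle zero) and slices proceed counter-clockwise. With clockwise=TRUE the first slice instead starts at 12 o’clock. The sketch below uses made-up shares to show where the slice boundaries fall and to compare the two orientations.

```r
# Made-up shares for three slices
shares <- c(a = 50, b = 30, c = 20)

# Slice boundaries in degrees, measured from the starting direction
boundaries <- c(0, cumsum(shares) / sum(shares) * 360)
boundaries

# Default: start at 3 o'clock, rotate counter-clockwise
pie(shares, main = "Default orientation")

# clockwise=TRUE: start at 12 o'clock, rotate clockwise
pie(shares, clockwise = TRUE, main = "clockwise = TRUE")
```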


We can improve things with the following:

# Add stateName variable to wf dataframe 
wf$stateName <- c("California","Washington","Oregon","Montana","Arizona","New Mexico")

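# Improved pie plot: draw the five largest slices but label only the first four
# (a slice without a matching label is simply left unlabeled)
pie(m[1:5,1],
    labels=wf$stateName[1:4],
    radius=0.8, clockwise=TRUE,
    col=cols, border='white', lwd=2)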

# Use par(new=TRUE) to tell R to add a new plot on top of the old one
par(new=TRUE)

# Now add the white center and title
pie(1, radius=0.5, col='white', border='white', labels='')
titleText <- paste0("State\nContributions\nto Wildfire\nPM 2.5")
text(0, 0, labels=titleText,
     cex=par('cex.main'), col=par('col.main'), font=par('font.main'))


Pie plots are well suited to the concept of small multiples – presenting multiple, similar plots at the same time. Let’s use simplified versions of the pie plots to compare state contributions to different pollutants.

# Restore default graphical parameters
par(oldPar)

# Set up 3x2 matrix for plots and an overall title
layout(matrix(c(5,5,1,2,3,4),ncol=2,byrow=TRUE),heights=c(0.2,0.4,0.4))
layout.show(5) # to see the layout 
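
To read the layout call more easily, it can help to build the matrix separately and print it. In this sketch (the variable names lay and nFigs are ours) each number is a plot position, and repeating a number makes that position span several cells. The call to layout() is the same as the one above, so it leaves the layout unchanged for the code that follows.

```r
# Same layout matrix as above, written one row per line
lay <- matrix(c(5, 5,
                1, 2,
                3, 4), ncol = 2, byrow = TRUE)
lay

# layout() returns the number of figure regions it defined
nFigs <- layout(lay, heights = c(0.2, 0.4, 0.4))
nFigs

# Plots are drawn in numeric order, so position 5 (the top strip) is filled last
```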


# Set up a list (dictionary) to get better pollutant names. (Accessed with '[[ ]]' notation)
pollutant <- list(pm25="PM 2.5",
                  pm10="PM 10",
                  co="CO",
                  nox="NOX",
                  so2="SO_2")

# Reduce the margin around each plot
par(mar=c(1,1,1,1))

# Loop over the first four columns in matrix m
for (i in 1:4) {
  pie(m[,i], labels=wf$state[1:4], radius=0.9, clockwise=TRUE,
      col=cols, border='white', lwd=2)
  par(new=TRUE)
  pie(1, radius=0.4, col='white', border='white', lwd=2, labels='')
  titleText <- pollutant[[colnames(m)[i]]]
  text(0, 0, labels=titleText, cex=1.5, font=2)
}

# Finally, in position #5, a blank plot and the title
plot(0, 0, axes=FALSE, xlab='', ylab='', col='white')
text(0, 0, labels="State Contributions to Different Pollutants", cex=2.5, font=2)
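
The pollutant lookup used in the loop relies on ‘[[ ]]’ extraction from a named list. The sketch below uses a shortened, illustrative list (prettyLabels and the helper prettyName are hypothetical names, not part of the lesson) to show how ‘[[ ]]’ differs from ‘[ ]’ and one way to fall back to the raw column name when no label has been defined.

```r
# Shortened lookup list in the same style as 'pollutant'
prettyLabels <- list(pm25 = "PM 2.5", co = "CO")

prettyLabels[["co"]]           # '[[ ]]' extracts the element itself
prettyLabels["co"]             # '[ ]' returns a one-element list instead
is.null(prettyLabels[["o3"]])  # a name that is not in the list gives NULL

# Hypothetical helper: use the raw key when no label is defined
prettyName <- function(key) {
  label <- prettyLabels[[key]]
  if (is.null(label)) key else label
}

prettyName("pm25")
prettyName("o3")
```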



Task 2: More Pie Plot Multiples

Create a small-multiples plot that lets us easily compare two dates:

Extra credit