In this lesson we will bring together all of the tools we have learned in previous lessons to begin creating customized plot functions. Over the course of this exercise we will learn about R’s graphical parameters as well as a little about R as a programming language.
The most important new topic is the creation of functions. Functions in R are objects in their own right and can be passed around just like any other variable.
For our purposes, the simple syntax of functions is:
myfunction <- function(arg1, arg2, ... ) {
~statements~
return(object)
}
Arguments to functions are named and are referenced inside the function by name. The call to the return()
function is optional if your function is generating plots rather than returning a result.
Note: R uses pass-by-value rather than pass-by-reference. This means that a function cannot change the value of anything outside of its curly braces – its ‘scope’. However, a function does have read access to global variables
Our first function will capture everything we did in Lesson 02 to read in and convert output from MUSTANG web services into a nice dataframe:
# getMetricDF accepts a MUSTANG url and returns a metric dataframe
getMetricDF <- function(url, skip=1) {
# Read in the data
DF <- read.csv(url, stringsAsFactors=FALSE, skip=skip)
# Convert character strings to class 'POSIXct'.
DF$start <- as.POSIXct(DF$start, "%Y/%m/%d %H:%M:%OS", tz="GMT")
DF$end <- as.POSIXct(DF$end, "%Y/%m/%d %H:%M:%OS", tz="GMT")
# Add columns to the dataframe
char_matrix <- stringr::str_split_fixed(DF$target,'\\.',5)
DF$net <- as.factor(char_matrix[,1])
DF$sta <- as.factor(char_matrix[,2])
DF$loc <- as.factor(char_matrix[,3])
DF$cha <- as.factor(char_matrix[,4])
DF$qual <- as.factor(char_matrix[,5])
return(DF)
}
# Try it out
url <- "http://service.iris.edu/mustang/measurements/1/query?net=IU&sta=FUNA&loc=00&cha=BHZ&timewindow=2015-01-01,2015-03-31&output=text&metric=sample_mean&orderby=start"
metricDF <- getMetricDF(url)
head(metricDF)
## value target start end
## 1 -4740440 IU.FUNA.00.BHZ.M 2015-01-01 2015-01-02
## 2 -4740420 IU.FUNA.00.BHZ.M 2015-01-02 2015-01-03
## 3 -4586580 IU.FUNA.00.BHZ.M 2015-01-03 2015-01-04
## 4 -3061230 IU.FUNA.00.BHZ.M 2015-01-04 2015-01-05
## 5 -2856150 IU.FUNA.00.BHZ.M 2015-01-05 2015-01-06
## 6 -3617690 IU.FUNA.00.BHZ.M 2015-01-06 2015-01-07
## lddate net sta loc cha qual
## 1 2015/02/14 01:29:26.044289 IU FUNA 00 BHZ M
## 2 2015/02/13 23:56:08.871530 IU FUNA 00 BHZ M
## 3 2015/02/14 01:33:07.168904 IU FUNA 00 BHZ M
## 4 2015/02/14 05:01:36.007208 IU FUNA 00 BHZ M
## 5 2015/02/14 02:17:11.509053 IU FUNA 00 BHZ M
## 6 2015/02/14 01:12:17.870723 IU FUNA 00 BHZ M
Several features of R’s syntax are worth noting:
… function(url, skip=1)
…
R functions accept arguments as named parameters with optional default values as in: skip=1
. If no default value is specified for parameter, a default of NULL
will be used.
… read.csv(url, stringsAsFactors=FALSE, skip=skip)
…
R’s function parsing understands name=value
whether the parameter name is in quotes or not. The above line could also be written:
… read.csv(url, 'stringsAsFactors'=FALSE, 'skip'=skip)
…
With our dataframe creation function in hand it’s time to write our first plotting function whose sole purpose is to wrap the existing plot()
function:
# Function to create a timeseries plot
myTimeseriesPlot <- function(DF, varName1, varName2, ...) {
# Extract data
x <- DF[[varName1]]
y <- DF[[varName2]]
# Sanity check that varName1 has class 'POSIXct'
if ( !("POSIXct" %in% class(x)) ) {
stop(paste0('Variable ',varName1,' is not POSIXct'))
}
# Sanity check that varName2 is numeric
if ( !is.numeric(y) ) stop(paste0('Variable ',varName2,' is not numeric.'))
# Create plot
plot(x, y, ...)
}
# Try it out
myTimeseriesPlot(metricDF, 'start', 'value',
xlab='Date', ylab='Mean',
main='Daily Mean Amplitude')
Here are some features to note:
…function(DF, varName1, varName2, ...)
…
The ...
argument is there to accept any additional arguments you pass in. This is particularly useful when you are wrapping a function that people may already be familiar with. Our function simply passes the unidentified arguments on with plot(x, y, ...)
. (See ?plot.default
for a coomplete list of possible arguments.)
x <- DF[[varName1]]
This is the correct syntax for referencing a list element or dataframe column given the name of that element or column.
stop(paste0('Variable ',varName1,' is not POSIXct'))
The stop()
function stops evaluation, immediately returns from the function and throws an error message that can be trapped. for more detailed disucssion of error handling please see:
myTimeseriesPlot(metricDF, 'start', 'value',
xlab='Date', ylab='Mean',
main='Daily Mean Amplitude')
Function calls with many arguments can be broken up over several lines to make for more readable code.
Task 1: Improve upon the plot:
?plot.default
and ?par
to learn about the following graphical parameters: pch, cex, col, las, xlim, ylim, xlab, ylab, main, sub, btyThe Mustang Databrowser uses MATLAB styling for its timeseries plots. The rest of this lesson is a script that produces a MATLAB style timeseries plot. This script is an abridged version of the various techniques that were used to stlye these plots.
As you will see, being persnickety about how things look can cause you to write a lot of code.
Your task is the following:
Good luck!
myMatlabTimeseries <- function(DF, varName1, varName2, snclName, metricTitle, ...) {
# ----- Data -----------------------------------------------------------------
# Extract data
time <- DF[[varName1]]
metric <- DF[[varName2]]
# Sanity check that varName1 has class "POSIXct"
if ( !("POSIXct" %in% class(time)) ) {
stop(paste0('Variable ',varName1,' is not POSIXct'))
}
# Sanity check that varName2 is numeric
if ( !is.numeric(metric) ) stop(paste0('Variable ',varName2,' is not numeric.'))
# ----- Style ----------------------------------------------------------------
# Save current par() settings
oldPar <- par()
# Global graphical parameter settings
par(bg='gray95', mar=c(.5,3,.5,3), oma=c(5,2,4,0))
# Plotting parameters
pch <- 15; col <- 'red'; cex <- 1.0
las <- 1
par(cex=1.0)
# x axis range from data we know exists in the metric dataframe
xlim <- range(DF$start,DF$end)
starttime <- xlim[1]
endtime <- xlim[2]
# ----- Plot -----------------------------------------------------------------
plot(time, metric,
panel.first=rect(par("usr")[1],par("usr")[3],par("usr")[2],par("usr")[4], col="white"),
xlim=xlim, bty='n',
pch=pch, col=col, cex=cex,
axes=FALSE, xlab="", ylab="")
# Draw a box
box()
# Create x axis ticks and labels
dayCount <- as.numeric( difftime(endtime,starttime,units="days") )
if (dayCount < 14) { # < 2 weeks with ticks on days
xAxTicks <- seq(starttime,endtime,by="day")
xAxLabels <- format(xAxTicks,"%d-%b",tz="GMT")
} else if (dayCount >= 14 && dayCount < 42) { # < 1 month with ticks every week
xAxTicks <- seq(starttime,endtime,by="week")
xAxLabels <- format(xAxTicks,"%d-%b",tz="GMT")
} else if (dayCount >= 42 && dayCount < 95) { # < 3 months with ticks every 2-weeks
xAxTicks <- seq(starttime,endtime,by="2 weeks")
xAxLabels <- format(xAxTicks,"%d-%b",tz="GMT")
} else { # > 3 months with ticks on ~ month boundaries
xAxTicks <- seq(starttime,endtime,by="month")
xAxLabels <- format(xAxTicks,"%b-%Y",tz="GMT")
}
# Create y axis ticks and labels
yAxTicks <- axTicks(2)
if (max(yAxTicks) >= 1e5) {
yAxLabels <- format(yAxTicks,scientific=TRUE,digits=2)
} else {
yAxLabels <- yAxTicks
}
# X axis and grid
axis.POSIXct(1, at=xAxTicks, labels=xAxLabels, line=0)
abline(v=xAxTicks, col="lightgray", lty="dotted", lwd=par("lwd"))
# Y axis and grid
axis(side=2, las=1, at=yAxTicks, labels=yAxLabels)
abline(h=yAxTicks, col="lightgray", lty="dotted", lwd=par("lwd"))
# ----- Annnotation ----------------------------------------------------------
# Title at the top
text <- paste(snclName,'--',metricTitle)
title(text, outer=TRUE)
# Dates at the bottom
dataDateRange <- paste(format(starttime,"%b %d, %Y",tz="GMT"),"-",format(endtime,"%b %d, %Y",tz="GMT"))
dataJulianDateRange <- paste(format(starttime,"%Y.%j",tz="GMT"),"-",format(endtime,"%Y.%j",tz="GMT"))
dataRange <- paste("Data for",dataDateRange,'(',dataJulianDateRange,')')
line <- par('oma')[1] - 1.5 # 1.5 lines off the outer margin bottom
mtext(dataRange, side=1, line=line, cex=1.2)
# Restore old par settings
par(oldPar)
}
# Try it out
myMatlabTimeseries(metricDF, 'start', 'value',
snclName='IU.FUNA.00.BHZ',
metricTitle='Daily Mean Amplitude')
Strings and Dates < prev | next > ‘seismic’ Packages