In this chapter we will learn more about R’s classes for dates and times.

POSIXct class

R provides a ‘Date’ class for working with dates as well as ‘POSIXlt’ and ‘POSIXct’ for working with date-times. You should always use the POSIXct class for any work in the time dimension. Data of class POSIXct are stored as the number of seconds since 1970-01-01 00:00:00 UTC.

Use the strptime() function to parse ASCII strings into POSIXct and the strftime() function to format POSIXct into ASCII. It is important to always specify the timezone argument when using these functions as the default is to use the local timezone, not “GMT”. Some examples will demonstrate what can be done with time and date functions:

# Start of the Rim Fire in two different ASCII representations
iso8601 <- "2014-08-17 15:25:00"
wikipedia <- "August 17, 2013 at 3:25pm"

# Parsing
RimFireStart <- strptime(iso8601,format="%Y-%m-%d %H:%M:%S",tz="America/Los_Angeles")
RimFireStart
## [1] "2014-08-17 15:25:00 PDT"
RimFireStart <- strptime(wikipedia,format="%B %d, %Y at %I:%M%p",tz="America/Los_Angeles")
RimFireStart
## [1] "2013-08-17 15:25:00 PDT"
# Formatting
strftime(RimFireStart,format="Rim Fire started on %d/%m/%Y, at %I:%M %p -- it was a %A")
## [1] "Rim Fire started on 17/08/2013, at 03:25 PM -- it was a Saturday"
strftime(RimFireStart,format="Rim Fire started in %Y on day %j")
## [1] "Rim Fire started in 2013 on day 229"

If your data are always in UTC then you must always specify ‘tz=“GMT”’ as an argument when you read in or write out dates. If your data is sometimes in local time then you should become familiar with the ‘lubridate’ package. This package has a variety of utilities that make working with dates much easier.

‘lubridate’ package

Here are some examples using the lubridate package. We’ll introduce the optional ‘::’ syntax that allows us to write code that specifies which package we are using. The code will run fine if we omit ‘lubridate::’ but sometimes it helps to be clear about which functions are from base R and which are from a package.

library(stringr)    # for str_~ functions
library(lubridate)  # for time and date utilities

# How many time zones are in Indiana?
zones <- lubridate::olson_time_zones()
zones[str_detect(zones,'Indiana')]
## [1] "America/Indiana/Indianapolis" "America/Indiana/Knox"        
## [3] "America/Indiana/Marengo"      "America/Indiana/Petersburg"  
## [5] "America/Indiana/Tell_City"    "America/Indiana/Vevay"       
## [7] "America/Indiana/Vincennes"    "America/Indiana/Winamac"
# What time did clocks on the East Coast show when the Rim Fire started
newyork <- lubridate::with_tz(RimFireStart, tz="America/New_York")
newyork
## [1] "2013-08-17 18:25:00 EDT"
# Let's make an 'airport clock' for the Rim Fire
london <- lubridate::with_tz(RimFireStart, tz="Europe/London")
moscow <- lubridate::with_tz(RimFireStart, tz="Europe/Moscow")
hongkong <- lubridate::with_tz(RimFireStart, tz="Asia/Hong_Kong")

text <- paste0("  Rim Fire arriving on\n",
               strftime(newyork,"      %A %B %d %H:%M %Z\n"),
               strftime(london,"      %A %B %d %H:%M %Z\n"),
               strftime(moscow,"      %A %B %d %H:%M %Z\n"),
               strftime(hongkong,"      %A %B %d %H:%M %Z"))

cat(text)
##   Rim Fire arriving on
##       Saturday August 17 18:25 EDT
##       Saturday August 17 23:25 BST
##       Sunday August 18 02:25 MSK
##       Sunday August 18 06:25 HKT
# ISO 8601 is understood
RimFireEnd <- as.POSIXct("2013-10-24 18:00:00",tz="America/Los_Angeles")

# Time ranges
difftime(RimFireEnd,RimFireStart,unit="days")
## Time difference of 68.10764 days
difftime(RimFireEnd,RimFireStart,unit="hours")
## Time difference of 1634.583 hours
# Adding 'durations'
RimFireStart + lubridate::ddays(3) # also dseconds(), dminutes(), etc.
## [1] "2013-08-20 15:25:00 PDT"
# Time axis creation
timeline_1H <- seq(RimFireStart, RimFireEnd, by="hours")
timeline_1H[1:3]
## [1] "2013-08-17 15:25:00 PDT" "2013-08-17 16:25:00 PDT"
## [3] "2013-08-17 17:25:00 PDT"
seq(RimFireStart, RimFireEnd, by="3 hours")[1:3]
## [1] "2013-08-17 15:25:00 PDT" "2013-08-17 18:25:00 PDT"
## [3] "2013-08-17 21:25:00 PDT"
seq(RimFireStart, RimFireEnd, by="week")[1:3]
## [1] "2013-08-17 15:25:00 PDT" "2013-08-24 15:25:00 PDT"
## [3] "2013-08-31 15:25:00 PDT"

‘openair’ package

The ‘openair’ package is a compilation of “tools for the analysis of air pollution data” developed by the Environmental Research Group at King’s College London. It is a very well maintained package with its own website at http://www.openair-project.org.

We will explore some of the timeseries capabilities of the openair package in the context of the BlueSky netcdf file we have been working with.

library(openair)
library(ncdf4)

# Open the file
fileName <- 'PNW-1.33km_2014092500_v2.nc'
nc <- nc_open(fileName)

# Read in all of the data
pm25 <- ncvar_get(nc, "PM25") 
time <- ncvar_get(nc, "time")
time <- as.POSIXct(time, origin="1970-1-1", tz="GMT")
lat <- ncvar_get(nc, "lat")
lon <- ncvar_get(nc, "lon")

# Close the file now that everything is in memory
nc_close(nc)

Many openair functions operate on dataframes with columns: ‘date’ and named pollutant

rollingMean()

We start off by creating the required input dataframe and then use the rollingMean() function to calculate a variety of rolling means.

df <- data.frame(date=time, pm2.5=pm25[910,199,]) # data.frame(key1=value1, key2=value2)

# rollingMean() adds a column to the dataframe so we can build up rolling means like this
df <- rollingMean(df, pollutant="pm2.5", width=3, align="center", new.name="pm2.5_rm3C")
df <- rollingMean(df, pollutant="pm2.5", width=8, align="center", new.name="pm2.5_rm8C")
df <- rollingMean(df, pollutant="pm2.5", width=8, align="left", new.name="pm2.5_rm8L")

names(df)
## [1] "date"       "pm2.5"      "pm2.5_rm3C" "pm2.5_rm8C" "pm2.5_rm8L"
# Plot raw data and rolling means as a function of time
plot(df$pm2.5 ~ time, pch=16, type='b', las=1, ylab='PM 2.5')
points(df$pm2.5_rm3C ~ time, col='red', type='b')
points(df$pm2.5_rm8C ~ time, col='blue', type='b')
points(df$pm2.5_rm8L ~ time, col='gray50', pch=4, type='b')

legend('topleft',
       legend=c('hourly','3-hour rolling mean','8-hour rolling mean','8-hour rolling mean left aligned'),
       pch=c(16,1,1,4), col=c('black','red','blue','gray50'))

timeAverage()

Where the rollingMean() function returns values on the same time axis as the input dataset, the timeAverage() function aggreates or expands the time axis by using one of several different statistics. As explained in the documentation:

timeAverage() should be useful in many circumstances where it is necessary to work with different time average data. For example, hourly air pollution data and 15-minute meteorological data. To merge the two data sets timeAverage can be used to make the meteorological data 1-hour means first. Alternatively, timeAverage can be used to expand the hourly data to 15 minute data.

Working with our just-generated dataframe:

# Aggregate hourly data into 8 six-hour-wide bins using averaging
timeAverage(df, avg.time="6 hour", statistic="mean")
##                  date        pm2.5   pm2.5_rm3C  pm2.5_rm8C pm2.5_rm8L
## 1 2014-09-25 01:00:00 5.984559e-04 5.984559e-04  0.04929793  0.1276084
## 2 2014-09-25 07:00:00 2.282349e-01 2.264853e-01  0.26719270  0.4664583
## 3 2014-09-25 13:00:00 8.012846e-01 7.121269e-01  1.01900718  1.8629434
## 4 2014-09-25 19:00:00 2.722283e+00 2.690018e+00  2.27438190  2.1044175
## 5 2014-09-26 01:00:00 1.965522e+00 1.763010e+00  1.98160796  2.4627090
## 6 2014-09-26 07:00:00 2.556541e+00 2.743821e+00 20.16026878 49.1983661
## 7 2014-09-26 13:00:00 8.076463e+01 8.042647e+01 60.52995326 40.3783036
## 8 2014-09-26 19:00:00 2.857802e+00 2.894068e+00  7.67405557        NaN
# Aggregate hourly data into daily bins and keep each daily maximum
timeAverage(df, avg.time="day", statistic="max")
##         date      pm2.5 pm2.5_rm3C pm2.5_rm8C pm2.5_rm8L
## 1 2014-09-25   5.947585   3.791743   2.456907   2.456907
## 2 2014-09-26 346.059875 153.072254  61.206863  61.206863
# Expand the time axis to 15-minute inervals and display first twelve rows
timeAverage(df, avg.time="15 min", fill=TRUE)[1:12,]
##                   date pm2.5 pm2.5_rm3C pm2.5_rm8C  pm2.5_rm8L
## 1  2014-09-25 01:00:00     0         NA         NA 0.005912102
## 2  2014-09-25 01:15:00     0         NA         NA 0.005912102
## 3  2014-09-25 01:30:00     0         NA         NA 0.005912102
## 4  2014-09-25 01:45:00     0         NA         NA 0.005912102
## 5  2014-09-25 02:00:00     0         NA         NA 0.101354934
## 6  2014-09-25 02:15:00     0         NA         NA 0.101354934
## 7  2014-09-25 02:30:00     0         NA         NA 0.101354934
## 8  2014-09-25 02:45:00     0         NA         NA 0.101354934
## 9  2014-09-25 03:00:00     0          0         NA 0.147893801
## 10 2014-09-25 03:15:00     0          0         NA 0.147893801
## 11 2014-09-25 03:30:00     0          0         NA 0.147893801
## 12 2014-09-25 03:45:00     0          0         NA 0.147893801

The rollingMean() and timeAverage() functions also both include a ‘data.thresh’ argument that lets you specify the minimum percentage of valid measurements required. Let’s put everything together with a couple of tasks.


Task 1: Location of maximum PM2.5 levles



Task 2: Timeseries plots



Task 3: Space vs. Time averaging