In lesson 09 we will go through the steps required to create an entirely new SingleValueMetric including various sanity checks and error reporting required in a robust metric algorithm.

Packages and Task Views

R has many packages (6,000+) that are often written by experts in some field of science statistics. By utilizing these packages we get to benefit from the hard work of others. The R website hosts a set of CRAN Task Views, each maintained by a recognized expert, that describe the different packages available in different fields of inquiry. For basic numerical methods, the Task View on Numerical Mathematics describes packages that deal with Linear Algebra, Special Functions, Polynomials, Differentiation and Integration, etc. There are even packages with matlab functions that have been ported to R. The IRISSeismic package already utilizes functions from the pracma and signal packages mentioned in this Task View.

There is also a Task View for Time Series Analysis that may have useful packages for analyzing seismic signals.

Note 1: Many of the packages in Time Series Analysis are not tailored to the needs of high-resolution signal processing. Instead, they often focus on statistically robust analysis of timeseries data with many fewer data points.

Note 2: The RSEIS package by Jonathan Lees defines a suite of functionality for working with seismic data but also creates an entire interactive environment. While this package may contain source code defining useful functions, the package functions are too tightly bound to the RSEIS internal data structure to be useful in the context of the seismic~ packages.

White Noise

To create a new metric we first need to come up with something we wish to measure. One of they many ways in which a seismic channel can go bad is when the generated signal contains mostly white noise. It would be useful to spot check seismic data for white noise, perhaps multiple times per day and come up with a metric that measures the likelihood that white noise occcurred on that day.

Doing a simple search for “white noise” on the Time Series Analysis Task View we see:

Tests of white noise using wavelets are provided by hwwntest.

Quickly perusing the package manual we see that this package references a paper published in December 2014 in the journal Stat: White noise testing using wavelets. From the abstract:

Testing whether a time series is consistent with white noise is an important task within time series analysis and for model fitting and criticism via residual diagnostics. We introduce three fast and efficient white noise tests that assess spectral constancy via the wavelet coefficients of a periodogram. The Haar wavelet white noise test derives the exact distribution of the Haar wavelet coefficients of the asymptotic periodogram under mild conditions. The single-coefficient white noise test uses a single Haar wavelet coefficient obtaining a test statistic as a linear combination of odd-indexed autocorrelations. The general wavelet white noise test uses compactly supported Daubechies wavelets, shows that its coefficients are asymptotically normal and derives its theoretical power for an arbitrary spectrum. All our tests are available in the freely available hwwntest package for the R system. We present a comprehensive simulation study that shows the good performance of our new tests against alternatives commonly found in available software and show an example applied to a wind power time series. © 2014 The Authors. Stat published by John Wiley & Sons Ltd.

Sounds promising!

‘hwwntest’ package

After installing the hwwntest package with RStudio we can review available functions in the ‘Help’ tab.

We will short circuit the learning process and suggest that we use the genwwn.test() function. Here is a quick example to show what it does.

library(hwwntest)

# Create two time series, one white noise and one generated by an ARIMA model
# NOTE:  time series must have a length that is a power of 2
white <- rnorm(1024)
model <- arima.sim(n=1024, model=list(ar=0.8))
mixed <- white + 0.2*model

# Try out the test
genwwn.test(white)
## 
##  Wavelet Test of White Noise
## 
## data:  
## p-value = 0.688
genwwn.test(model)
## 
##  Wavelet Test of White Noise
## 
## data:  
## p-value < 2.2e-16
genwwn.test(mixed)
## 
##  Wavelet Test of White Noise
## 
## data:  
## p-value = 0.1009
# What does genwwn.test return?
whiteTest <- genwwn.test(white)
names(whiteTest)
## [1] "p.val.collector" "p.val.adjust"    "p.value"         "method"
# Capture the p-values
whitePValue <- genwwn.test(white)$p.value
modelPValue <- genwwn.test(model)$p.value
mixedPValue <- genwwn.test(mixed)$p.value

# Set up a 'stacked' plot
layout(matrix(seq(3)))
par(mar=c(3,4,4,2)+0.1)

# Plot data
plot(white,type='p',las=1)
title(paste0('White Noise, p-value=',sprintf("%g",whitePValue)),line=2.5)
mtext('NEAR CERTAINTY this signal contains white noise',side=3,line=0.5,col='black')
plot(model,type='p',las=1)
title(paste0('Model, p-value=',sprintf("%g",modelPValue)),line=2.5)
mtext('NEAR CERTAINTY this signal does NOT contain white noise',side=3,line=.5,col='black')
plot(mixed,type='p',las=1)
title(paste0('White + 0.2*Model, p-value=',sprintf("%g",mixedPValue)),line=2.5)
mtext('GOOD CHANCE this contains white noise',side=3,line=.5,col='black')

# Restore single plot layout
par(mar=c(5,4,4,2)+0.1)
layout(1)

creating a white noise metric

Before we start coding, we need to design our metric function. All metric functions that define a SingleValueMetric should accept a Stream object as their first argument. Typically, the function will be called with an entire day’s worth of data for a particular SNCL. After the Stream object, additional arguments may be passed in that allow us to modify the calculation of our metric.

Here is some pseudo-code that will describe our basic plan:

  1. mergeTraces in the Strem
  2. extract the first (only) trace
  3. examine the TraceHeader to determine the sampling rate
  4. determine starting indices associated with each hour
  5. create an empty vector of p-values, one for each hour
  6. loop through the hours
  7. calculate genwwn.test p-value for 1024 points at each hour
  8. convert p-values to integer codes using .bincode()
  9. use the maximum integer code for the day as our SingleValueMetric

Task 1: Create white noise metric

  • create a new R Script in RStudio calleld ‘whiteNoiseMetric.R’
  • define a function that accepts the following arguments: (st, sampleLength=1024, incrementSecs=3600, breaks=c(0,1e-12,1e-10,1e-8,1e-6,1e-5,1e-4,1e-3,1e-2,1e-1,1))
  • use seismicMetrics::DCOffsetTimeseriesMetric and seismicMetrics::basicStatsMetric as starting points for creating your function
  • write code to implement the basic plan
  • use st <- getSNCL(iris,"IU.GRFO..BHZ", as.POSIXct("2011-05-18",tz="GMT"), as.POSIXct("2011-05-19",tz="GMT")) to test your function


Task 2: Examine p-value codes

  • modify your function to return the vector of p-value codes
  • calculate p-value codes for “IU.GRFO..BHZ” for May and June, 2011
  • plot pValueCodes
  • use knowledge of hourly sampling to determine when the first and last non-zero codes were
  • check out other metrics for this period in the MUSTANG Databrowser
  • pick your own favorite problematic SNCL to examine the distributionof p-value codes


Task 3: Collaborative Improvement of the white noise metric

  • get together with a colleague to discus the pros and cons of our white noise metric code
  • suggest changes and try them out in your code
  • add error handling as needed
  • create plots showing the behavior of the original metric and your modified version

ggplot2 < prev | next > Modifying ‘seismic’ Packages