Test pixelwise correlation between two time series of raster data in R

Satellite time series data are useful for studying biophysical how variables change over time and understanding what causes those changes. Recently, I was looking into correlating two time series datasets over Africa to look at the relationship between net primary production (NPP) and rainfall.

After a futile attempt to find an “out-of-the-box” software package that does this, I created an R function to speed things up. Rather unimaginatively, I called it “gridcorts” (Gridded Correlation for Time Series Raster Data). I’ll apply it to two time series datasets of 11 layers (grids) of annual rainfall and 11 layers annual NPP over Africa, each layer represents a year between 2000 and 2010. Stacking these two datasets results in a raster stack of 22 layers. Both datasets are in 0.05 degree grid cell size, meaning that each layer has 1989897 cells (1443 rows, 1379 columns). For each pixel in a dataset (e.g. rainfall), the function skewers (see example figure above) the entire 11-year time series to get a vector of values then correlates that pixel with the corresponding one in the other dataset (e.g. NPP). Depending on which method you use, this can tell you how two variables are related across time and space.

require(raster)
rstack <- stack(npp,rainfall)
system.time(cg1 <- gridcorts(rasterstack = rstack, method = "spearman", type = "both"))
[1] "Start Gridcorts: 2015-01-01 19:00:19"
[1] "Loading parameters"
[1] "Done loading parameters"
[1] "Initiating loop operation"
======================================================================
[1] "Populating raster brick"
[1] "Ending Gridcorts on 2015-01-01 19:11:07"
user     system  elapsed 
647.018  0.618   647.414 

The code took 647 seconds, or about 11 minutes, on my two year old, quad-core, i5-2450M laptop running 64-bit Ubuntu 14.04 and R 3.1.2. I think that’s a reasonable time to process 2 million pairwise correlations and produce two grids of Spearman’s rank correlation coefficient and significance values. The engine of the function is the stats package’s “cor.test” function that tests for association between two paired samples based on the either parametric Pearson’s r, or the nonparametric Spearman’s rho and Kendall’s tau.

And here’s an example output using Spearman’s rank correlation coefficient. Please note that Spearman’s rho might give you a warning message saying that it cannot compute exact p-value with ties, so if you choose a nonparametric method, I would recommend Kendall’s tau instead. More on this topic..

The function is detailed below. Please feel free to use it and modify it as you see fit. It is also available as a Gist on Github. The data I used to test the function are here. And I’d be very interested to know if any of you find a way to further optimize it, contact me via gmail: hakim.abdi. 

gridcorts <- function(rasterstack, method, type=c("corel","pval","both")){
  # Values for (layers, ncell, ncol, nrow, method, crs, extent) come straight from the input raster stack
  # e.g. nlayers(rasterstack), ncell(rasterstack)... etc.
  print(paste("Start Gridcorts:",Sys.time()))
  print("Loading parameters")
  layers=nlayers(rasterstack);ncell=ncell(rasterstack);
  ncol=ncol(rasterstack);nrow=nrow(rasterstack);crs=crs(rasterstack);
  extent=extent(rasterstack);pb = txtProgressBar(min = 0, max = ncell, initial = 0)
  print("Done loading parameters")
  mtrx <- as.matrix(rasterstack,ncol=layers)
  empt <- matrix(nrow=ncell, ncol=2)
  print("Initiating loop operation")
  if (type == "corel"){
    for (i in 1:ncell){
      setTxtProgressBar(pb,i)
      if (all(is.na(mtrx[i,1:(layers/2)])) | all(is.na(mtrx[i,((layers/2)+1):layers]))){ 
        empt[i,1] <- NA 
      } else 
        if (sum(!is.na(mtrx[i,1:(layers/2)]/mtrx[i,((layers/2)+1):layers])) < 4 ){
          empt[i,1] <- NA 
        } else 
          empt[i,1] <- as.numeric(cor.test(mtrx[i,1:(layers/2)], mtrx[i,((layers/2)+1):layers],method=method)$estimate)
    }
    print("Creating empty raster")
    corel <- raster(nrows=nrow,ncols=ncol,crs=crs)
    extent(corel) <- extent
    print("Populating correlation raster")
    values(corel) <- empt[,1]
    print(paste("Ending Gridcorts on",Sys.time()))
    corel
  } 
  else
    if (type == "pval"){
      for (i in 1:ncell){
        setTxtProgressBar(pb,i)
        if (all(is.na(mtrx[i,1:(layers/2)])) | all(is.na(mtrx[i,((layers/2)+1):layers]))){ 
          empt[i,2] <- NA 
        } else 
          if (sum(!is.na(mtrx[i,1:(layers/2)]/mtrx[i,((layers/2)+1):layers])) < 4 ){
            empt[i,2] <- NA 
          } else 
            empt[i,2] <- as.numeric(cor.test(mtrx[i,1:(layers/2)], mtrx[i,((layers/2)+1):layers],method=method)$p.value)
      }
      pval <- raster(nrows=nrow,ncols=ncol,crs=crs)
      extent(pval) <- extent
      print("Populating significance raster")
      values(pval) <- empt[,2]
      print(paste("Ending Gridcorts on",Sys.time()))
      pval
    }
  else
    if (type == "both"){
      for (i in 1:ncell){
        setTxtProgressBar(pb,i)
        if (all(is.na(mtrx[i,1:(layers/2)])) | all(is.na(mtrx[i,((layers/2)+1):layers]))){ 
          empt[i,] <- NA 
        } else 
          if (sum(!is.na(mtrx[i,1:(layers/2)]/mtrx[i,((layers/2)+1):layers])) < 4 ){
            empt[i,] <- NA 
          } else {
            empt[i,1] <- as.numeric(cor.test(mtrx[i,1:(layers/2)], mtrx[i,((layers/2)+1):layers],method=method)$estimate) 
            empt[i,2] <- as.numeric(cor.test(mtrx[i,1:(layers/2)], mtrx[i,((layers/2)+1):layers],method=method)$p.value)
          }
      }
      c <- raster(nrows=nrow,ncols=ncol,crs=crs)
      p <- raster(nrows=nrow,ncols=ncol,crs=crs)
      print("Populating raster brick")
      values(c) <- empt[,1]
      values(p) <- empt[,2]
      brk <- brick(c,p)
      extent(brk) <- extent
      names(brk) <- c("Correlation","Pvalue")
      print(paste("Ending Gridcorts on",Sys.time()))
      brk
    }
}

UPDATE: This code was modified on April 2nd, 2018 based a suggestion from Tao Xiong, a PhD student at Northeast Normal University in Changchun, China. Specifically, Tao's message was: "The code "if (sum(!is.na(mtrx[i,1:(layers/2)])) < 4 | sum(!is.na(mtrx[i,((layers/2)+1):layers])) < 4)" cannot include all kinds of situations such as (1,NA,3,NA,5,NA,7,8,9) and (NA,2,NA,4,NA,6,7,8,9). I suggest to change it to "if (sum(!is.na(mtrx[i,1:(layers/2)]/mtrx[i,((layers/2)+1):layers])) < 4 )". Thank you Tao, this is much appreciated and I've modified the code.