Relationships that vary in space and time: A challenge for simple linear regression
Regression is a fundamental analytical tool for any researcher interested in relating cause and effect. A basic assumption of regression is stationarity, the principle that the relationship between a predictor variable and its response is constant across the sample space; that is, the relationship holds in all regions where it is applied. This is a particularly poor assumption in spatially based analyses, where known and often unknown factors may interact to change how a response variable relates to a given explanatory variable. While this is a challenge to simple linear regression, it is also what generally makes spatial problems interesting: relationships are not constant across space and time.
Spatially weighted regression relaxes the assumption of stationarity: where simple linear regression develops a single relationship to describe a phenomenon, spatially weighted regression allows the relationship to vary spatially. Unhinging the relationship between an explanatory variable and its response in this way creates a set of local coefficients, one for each location where an explanatory variable is offered. This is done through the use of a weighting function. Whereas in simple linear regression each data point carries equal weight in the final relationship, a weighting function gives greater weight to values closer to the point where a local regression is calculated.
Fig 1: A spatial weighting function gives more weight to data points closer to a regression point. Bandwidths can also vary across a feature space, such that two local regression values may be constructed from different numbers of data points.
Fig 2: Spatially weighted regression allows the relationship between a response and an explanatory variable to vary across a study region.
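The weighting idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis used in this post: it assumes a single explanatory variable, a fixed-bandwidth Gaussian kernel, and synthetic data in which the true slope flips sign halfway along a transect. The function names (gaussian_weight, local_fit) and all values are hypothetical.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: weight decays smoothly with distance from the regression point."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def local_fit(point, coords, x, y, bandwidth):
    """Weighted least-squares fit of y = a + b*x centered on `point`.

    Returns (intercept, slope) for this one location.
    """
    weights = [gaussian_weight(math.dist(point, c), bandwidth) for c in coords]
    w_sum = sum(weights)
    # Weighted means of the predictor and the response.
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_sum
    # Weighted covariance and variance give the local slope.
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    s_xx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Synthetic illustration (not the field data): the true slope is +2 in the
# west half of a transect and -2 in the east half.
coords = [(float(i), 0.0) for i in range(20)]
x = [float(i % 5) for i in range(20)]
y = [2.0 * xi if c[0] < 10 else -2.0 * xi for c, xi in zip(coords, x)]

west = local_fit((2.0, 0.0), coords, x, y, bandwidth=2.0)
east = local_fit((17.0, 0.0), coords, x, y, bandwidth=2.0)
# A very large bandwidth weights every point equally, which is the global fit.
global_fit = local_fit((0.0, 0.0), coords, x, y, bandwidth=1e9)
```

In this toy example the two local fits recover slopes of opposite sign, while the equal-weight fit reports a slope near zero because the two regimes cancel. That is the situation in which a single global regression hides a relationship that is strong locally.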
NDVI and weed density: A case study in spatially weighted regression
Normalized difference vegetation index (NDVI) has been used as a proxy for phenology in many remote sensing and cropping systems studies. NDVI is calculated as the normalized difference between near-infrared and red reflectance, (NIR - Red) / (NIR + Red), and is generally related to the amount of green, photosynthetically active tissue. In principle, weeds and crops should be distinguishable based on how their NDVI response varies in time.
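For reference, NDVI is conventionally computed per pixel as (NIR - Red) / (NIR + Red), which bounds it between -1 and 1. A minimal sketch follows; the reflectance values are hypothetical and chosen only to show the direction of the effect, not taken from this field.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return float("nan")  # undefined when both bands are zero
    return (nir - red) / total

# Hypothetical reflectance values, for illustration only.
dense_canopy = ndvi(nir=0.50, red=0.08)  # green tissue reflects NIR, absorbs red
bare_soil = ndvi(nir=0.30, red=0.25)     # little contrast between the two bands
```

Green vegetation yields a higher value than bare soil because chlorophyll absorbs red light while leaf structure reflects near-infrared.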
Question: Can NDVI be used to predict weed density? Does the relationship between NDVI and weed density vary spatially?
Fig 3: A general hypothesis for how weeds and crop may differ in their NDVI response. Weeds may mature earlier or later than a crop, but this relationship may also vary spatially.
Here, spatially weighted regression is offered as a method for distinguishing relationships between weed density and NDVI. Allowing the relationship between NDVI and weed density to vary spatially may allow one to classify areas of the field based on the nature of these relationships.
Fig 4: NDVI over the course of the growing season at a 17-acre field located NE of Pendleton, OR, Summer 2015. Note that the field does not increase or decrease in NDVI evenly, but rather peak NDVI passes as a wave across the field.
An important first step in deciding if a data set is suitable for spatially weighted regression is to look at the residuals of the linear relationship you are choosing to model. Here we examine a function for predicting weed density from NDVI.
This function uses three samples of NDVI over the course of the growing season, centered around the time of peak NDVI. The purpose of using these three times was to emphasize time periods where weeds and crop would vary in their relative NDVI response.
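The exact function is not reproduced in this text. As a hedged sketch only, a multiple linear regression on three NDVI measurements would generally take the form below, with the spatially weighted version letting each coefficient depend on location. The symbols (the dates t_1, t_2, t_3, the coefficients, and the coordinates (u, v)) are placeholder assumptions, not the author's notation.

```latex
% Global multiple regression (assumed general form)
\hat{y} = \beta_0 + \beta_1\,\mathrm{NDVI}_{t_1} + \beta_2\,\mathrm{NDVI}_{t_2} + \beta_3\,\mathrm{NDVI}_{t_3}

% Spatially weighted counterpart: coefficients vary with location (u, v)
\hat{y}(u,v) = \beta_0(u,v) + \beta_1(u,v)\,\mathrm{NDVI}_{t_1} + \beta_2(u,v)\,\mathrm{NDVI}_{t_2} + \beta_3(u,v)\,\mathrm{NDVI}_{t_3}
```

Here \hat{y} stands for predicted weed density and t_2 for the date nearest peak NDVI.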
Fig 5: Weed densities were calculated based on linear transects made prior to harvest. Weed hot spots were calculated using the Getis-Ord Gi* statistic in ArcMap. Weed hot spots from a prior analysis were used as input for predicting weed density in this exercise.
Fig 6: Predicted weed density for a multiple regression model based on 3 measurements of NDVI surrounding peak NDVI. Multiple R2: 0.044, P-value < 0.001
Fig 7: Residuals from a multiple regression model based on 3 measurements of NDVI surrounding peak NDVI. Both high and low residuals cluster spatially, indicating that the relationship between NDVI and weed density may vary spatially and may be a good candidate for geographically weighted regression.
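Spatial clustering of residuals can also be checked numerically rather than by eye. One common option, which is not necessarily what was done for this analysis, is Moran's I. The sketch below assumes binary neighbor weights within a fixed distance and uses made-up residuals along a transect; the function name and the max_dist parameter are hypothetical.

```python
import math

def morans_i(coords, values, max_dist):
    """Moran's I with binary weights: w_ij = 1 if i != j and distance <= max_dist."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    cross = 0.0    # sum of w_ij * z_i * z_j over neighboring pairs
    w_total = 0.0  # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_dist:
                cross += z[i] * z[j]
                w_total += 1.0
    if w_total == 0:
        raise ValueError("no neighbors within max_dist")
    return (n / w_total) * cross / sum(zi * zi for zi in z)

# Made-up residuals at 10 evenly spaced points, for illustration only.
coords = [(float(i), 0.0) for i in range(10)]
clustered = [1.0] * 5 + [-1.0] * 5   # highs together, lows together
alternating = [1.0, -1.0] * 5        # neighbors always disagree

i_clustered = morans_i(coords, clustered, max_dist=1.0)
i_alternating = morans_i(coords, alternating, max_dist=1.0)
```

Values well above zero indicate that similar residuals sit next to each other, the pattern described for Fig 7; values below zero indicate that neighbors tend to differ.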
Fig 8: Predicted weed density from a spatially weighted regression. Quasi-global R2: 0.92.
By using a spatially weighted regression, we’re able to account for 92 percent of the variance in the distribution of weeds in this field. Unlike a standard regression, the result of this process is a collection of local regression formulas. In this sense, the result cannot be easily extrapolated to predict weed distributions in future data sets. However, these coefficients do offer the opportunity to look for spatial patterns that may yield additional information about the nature of these local spatial relationships.
Fig 10: Map classified by coefficient slope.