Mapping Weeds in Dryland Wheat: A spectral trajectory approach

Introductions: Into the weeds

When I started this course, I had a central goal in mind: I wanted to use remotely sensed data to identify weeds. What I didn’t have, was a clue as to how I was going to get there. I had a few data sets to work with, and although I had made some limited progress with them, I had not yet been able to ask the kind of spatial questions I knew were possible with them.I had data that I knew As a part of this course, I formed this central goal into two research questions:

  • How does spectral trajectory relate to weed density? Can variation in spectral trajectory be used to distinguish weed species from crop species?
  • Can this information be used to distinguish the spatial extent of weeds in dryland cropping systems? How well do different methods of assessing weed density perform?

Originally, I had planned on using some low altitude aerial imagery I had taken in 2015 for addressing this question. After many many late nights trying to get these data into a form that was suitable for analysis, I decided for the sake of brevity and actually getting something done in this class, I would use an alternative data set.

My alternative data set consisted of 3 kinds of reference data with regards to the location and species of weeds in a 17 acre field NE of Pendleton OR, as well as Landsat 8 images taken over the course of summer 2015. The reference data sets I had consisted of transect data where species level density counts were made, spectral measurements of green foreign material made during harvest, and observations made from the front of a combine at the time of harvest. In the landsat data set I used in this class, I only included cloud free images from the duration of the growing season, 2015.

Boettcher_Geo584

 

Motivations and Relevance:

My goal with the work I had hoped to achieve in this class was to use variation in phenology between weed and crop species to distinguish weed density using time series landsat data. One of the major issues with weeds in cropping systems however, is that they typically represent only a very small fraction of the total cover in a pixel. The revised hypotheses I ended up testing in this class were:

  1. Weeds and crop can be distinguished based on their relative rates of change in greenness (NDVI).
  2. Visual estimates of weed density will be more accurate at estimating weed density than automated inline assessments.

The main reasons why I find these questions interesting, is the fact that species level discrimination in satellite based remote sensing is still such a challenge. In most remote sensing efforts to identify weeds using satellite data generally perform poorly. A major reason for this, is the typically low signal weeds will have in a cropping system. Advancement in the methodologies for identifying species in satellite imagery would be a significant contribution. As well, there have been few attempts at using a time series approach to distinguishing weed to the species level. Most work has concentrated on increasing the spatial resolution or the spectral resolution of imaging efforts for making species level classifications. As well, the data I was working represented as complete a sampling of the field as is reasonably possible. All plant material from the field had been cut and measured by the spectrophotometer. In this way, the spatial resolution of my reference data was very fine.

Methods:

In this class, I ended up attempting to answer these hypotheses using two general approaches that were made available to us in the class. I used hotspot analysis to identify areas of the field that were statistically significant positive values as “weedy”.  I then used the output of the hotspot analysis as a predictor variable in a geographically weighted regression, with NDVI as the explanatory variable. With previous attempts at classification, I typically ended up with poor results when comparing the spectrally generated data set to my reference data set. Hotspot allowed me to overcome this, in that the classification was not as sensitive to extremely high or low values. The improved performance is likely a result of the fact that hotspot analysis takes into consideration the neighborhood of values a measurement resides in. I then used the output of the hotspot analysis as the predictor variable in a geographically weighted regression. With geographically weighted regression, we remove the assumption of stationarity, and fit regression models based on a local neighborhood rather than to the entire dataset. Finally, I classified the output of the regression analysis based on the coefficients for each of the terms of the local linear models. My goal here was to attempt to get at why the relationships between weed density and NDVI appeared to vary as they do.

Results:

The specific details of each of these analyses are available in their individual posts, but overall the results were very promising. I was able to improve the classification accuracy of a spectrophotometric assessment of weed density from ~50% to 85%, as good as a human observer. This results indicates that the in-situ assessment of weed density may be a real possibility in the near future, which has implications for farmers as well as scientists and land managers. Generation of quality reference data is very difficult to do over broad spatial areas, and any way to improve that process is a good thing.

Regarding the output of the geographically weighted regression, I find I’m still grappling with the results. While I did end up with distinct areas of the field having differing responses of weed density to NDVI, i’m finding difficulty in interpreting the output. I think that if additional explanatory variables for why weeds respond the way they do, and if this is predicted by geographically weighted regression, wouldn’t one simply include those variables into their original model for predicting weed density? So while geographically weighted regression may be suitable for a data exploration exercise, I’m not sufficiently convinced of its utility in predicting weed density. More work is necessary to come to a conclusive answer as to why local relationships between NDVI and weed density appear to vary as they do.

 

Learning:

I learned an incredible amount this quarter, mostly from the incredibly high quality presentations and conversations I had with my colleagues in this class. An incredibly high bar was set by the work done by this cohort. Although I didn’t get as far as I might have liked to with these data, this was my first foray into spatial statistics. I had the advantage of being very comfortable in R, but this class forced me to branch out into ArcMap, and even by the end of the quarter, into Python and Earth Engine. One of the major issues I had, was that the data set I was working with was not as high a resolution as I had originally intended. I wanted to be using methods from machine learning on an a very high resolution multispectral data set i’ve been working on, but it wasn’t ready for analysis by the time I needed to produce results.  I ended up doing a lot of work that never saw the light of day, looking at issues in spatial variance and resolution, that with more time, I would have liked to present on.

 

Relationships that vary in space and time: A challenge for simple linear regression

Regression is a fundamental analytical tool for any researcher interested in relating cause and effect. A basic assumption of regression is that of stationarity, the principal that the relationship between a predictor variable and its response is constant across its sample space; that a relationship is true in all regions where it is being applied. This assumption is a particularly poor assumption in spatially based analyses, where we know that interactions may exist between known and often unknown factors in how response variable relates to a given explanatory variable. While this is a  challenge to simple linear regression, it is also what generally makes spatial problems interesting: the fact that relationships are not constant across space and time.

Spatially weighted regression challenges the assumption of stationary in that where simple linear regression develops a single relationship to describes a phenomena, spatially weighted regression allows the relationship to vary spatially. Unhinging the relationship between a explanatory variable and its response spatially creates a set of local coefficient for each instance where an explanatory variable is offered. This is done through the use of a  weighting function. Wherein simple linear regression, each data point assumes equal weight with regards to the final relationship, a weighting function applies greater import to values closer to where a regression point would be calculated.


Spatial weighting function (1)

Fig 1: A spatial weighting function weights data points closer to a  regression point. In this way bandwidths can vary across a feature space, such that two local regression values may be constructed of a different number of data points.


SWR_drawing

Fig 2: Spatially weighted regression allows the relationship between a response and explanatory to vary across a study region.

 

NDVI and weed density: A case study in spatially weighted regression

Normalized difference vegetation index (NDVI) has been used in remote sensing  as a proxy for phenology in many remote sensing and cropping systems studies. NDVI is calculated as the ratio of red to near-infrared light, and is generally related to the amount of green photo-synthetically active tissue. In principal, weeds and crops should be distinguishable based on how their NDVI response varies in time.


Question: Can NDVI be used to predict weed density? Does the relationship between NDVI and weed density vary spatially? 


NDVI_swr

Fig 3: A general hypothesis for how weeds and crop may in their NDVI response. Weeds may mature earlier or later than a crop, but this relationship may also vary spatially.

Here, spatially weighted regression is offered as a method for distinguishing relationships between weed density and NDVI. Allowing the relationship between NDVI and weed density to vary spatially may allow one classify areas of the field based on the nature of these relationships.


NDVI_gif

Fig 4: NDVI over the course of the growing season at a 17 acre field located NE of Pendleton, OR, Summer 2015. Note that the field does not increase or decrease in NDVI evenly, but rather peak NDVI passes as a wave across the field. [ Note: GIF does not appear animated in preview. Click on the GIF directly if animation is not working ]

An important first step in deciding if a data set is suitable for spatially weighted regression is to look at the residuals of the linear relationship you are choosing to model. Here we examine the following function for predicting weed density:

NDVI_form (1)

This function uses 3 samples of NDVI over the course of the growing season, centered around the time of peak NDVI. The purpose of using these 3 times was to try and emphasize time periods where weeds and crop would vary in their relative response in NDVI.


PlantHS_obs

Fig 5: Weed densities were calculated based on linear transects made prior to harvest. Weed hot spots were calculated using the Getis-Ord Gi* statistic in ArcMap. Weed hot spots from a prior analysis were used as input for predicting weed density in this exercise.


PlantHS_glm_pred

Fig 6: Predicted weed density for a multiple regression model based on 3 measurements of NDVI surrounding peak NDVI.  Multiple R2:  0.044, P-value < 0.001

 


PlantHS_glm_resid

Fig 7: Residuals from a multiple regression model based on 3 measurements of NDVI surrounding peak NDVI.  Both high and low residuals cluster spatially, indicating that the relationship between NDVI and weed density may vary spatially and may be a good candidate for geographically weighted regression.


PlantHS_swr_pred

Fig 8: Predicted weed density from a spatially weighted regression. Quasi-global R2: 0.92.

By using a spatially weighted regression, we’re able to account for 92 percent of the variance that occurs in the distribution of weeds in this field. Unlike in a standard regression, the result of this process is a collection of local regression formula. In this sense, the result is not a result that can be easily extrapolated to predict weeds distributions in future data sets. However, these coefficients do offer us the opportunity to look for some spatial patterns that may yield additional information as to what the nature of these local spatial relationships might be


 


Slope_coef

Fig 10:  Map classified by coefficient slope.

Introduction:

Hotspot analysis is a method of statistical analysis for spatially distributed data whereby both the magnitude of a point and its relationship to the points around it are taken into consideration to identify statistically significant extreme values in a data set. While a single spatial data point may represent an extreme value, in the context of weeds management what often matters is being able to identify aggregates of values which taken as a whole represent an extreme condition. Where one large weed may have some impact, many medium and smaller weeds are likely to have a disproportionate impact. This has implications for weed management and weed science, specifically of the variety of techniques available to users for mapping the spatial distribution of invasive weeds, most rely on expert knowledge to decide what constitutes a “weedy” versus a “non-weedy” condition. This consideration is made independently of the geographic context of a value, usually by visually evaluation of the distribution of measured values, and deciding some cutoff, real or perceived, in the distribution of values. These cutoffs are often subjective, and are typically yield poor results when applied to independently generated data sets. Hotspot  analysis offers us a road forward where statistically significant extreme regions can be identified, and various mapping techniques compared by how well they predict these extreme regions.

My goal for this project was to see if hotspot analysis provided a better technique for mapping weeds populations in wheat fields at the time of harvest. Previously, we visually evaluated the histograms, and using some knowledge of how the weeds were distributed to determine cutoffs for “weedy” and “non-weedy” values. Here I wanted to see if I could generate more accurate maps of weed density relative to a reference data set.

Sources of data:

Combine

Fig. 1: Sampling for weed density was done at the time of harvest using a spectrometer in the grain stream of a combine, and with a visual evaluator watching for green weeds during harvest.
In Summer 2015 we harvested a field winter wheat NE of Pendleton, OR using combine outfitted with a spectrometer in the grain. A ratio of red to near infrared reflectance was taken, along with visual observations made by an observer from inside the cab of the combine. Prior to harvest, we walked the field on gridded transects, recording observations of all weeds at the species level. These data were all re-gridded to a common 7m X 7m grid using bilinear interpolation, and brought into ArcGIS for analysis.

Classification:

In ArcGIS, hotspot analysis was conducted using Hotspot analysis tool. The Getis-Ord Gi* statistic identifies statistically significant hot and cold spots in whatever spatially referenced value you use as an feature class for classification. For the purposes of this exercise, positive Getis-Ord Gi* statistics were considered to be “weedy”, while not-significant and negative scores were considered to be “non-weedy”. Cutoffs based on a visual evaluation of histogram, and knowledge of what should represent a weedy condition used to distinguish between “weedy” and “non-weedy” values  and their associated maps classified accordingly.

 

Combine_hotspot_ClassificationCombine_histogram_Classification

Fig 2: Ground reference data set classified using Hotspot analysis and histogram classification. All green values were considered positive for weeds.

Spectral_histogram_ClassificationSpectral_Hotspot_Classification

Fig 3: Spectral assessment of weeds distribution classified using Hotspot analysis and histogram classification. All green values were considered positive for weeds.

 

Greenwalking_histogram_ClassificationGreenwalking_hotspot_Classification

Fig 4: Visual assessment of weeds distribution classified using Hotspot analysis and histogram classification.  All green values were considered positive for weeds.

Comparing techniques
Classified maps were then compared for accuracy using a confusion matrix approach. Accuracy here is defined as the rate of true positives and true negatives, divided by the total number of samples. Classified maps were also compared for their precision and sensitivity, where precision is the rate of true positives of the condition positives, and precision is the rate of true positives over the predicted positives. Precision addresses the question of if I predict it positive, how often am I correct? Sensitivity addresses the question, when the reference data predicts it positive, how often was my prediction correct?

ERROR_MATRIX

Fig 5: Error assessment comparing spectral assessment of weediness to ground reference data using the histogram classification.RESULTS

Fig 6: Results of the error assessments for all mapping techniques compared with their ground reference data.

Results:

These results show that hotspot analysis was a superior method for image classification when compared with a visual evaluation of the distribution.  This is probably a result of the fact that hotspot analysis takes into consideration the neighborhood of values a measurement resides in. Hotspot analysis increased the accuracy for both the spectral and the on-combine visual assessments. There were mixed results regarding its impacts on sensitivity and precision however. Hotspot classification increased the precision of the visual assessment, but did so at a cost to sensitivity. Conversely, the hotspot technique increased the sensitivity of the spectral assessment, but did so at a cost to its precision. It may be that precision comes at a cost to sensitivity using this method for comparing classifications. In general however, the most substantial gains were in accuracy, which is the most important factor for this mapping effort.

Question:

Foundational to effectiveness of management of invasive species in agricultural and native landscapes is the question of spatial extent and the ability to quantify the impacts of invasive species on agronomic efforts and native vegetation. Invasive plants represent a threat to both agricultural and native landscapes in the form of reduced ecosystem function, increased resource consumption, and reduced yields from agricultural systems. In spite of significant efforts to control and reduce impacts from invasive species, invasive weeds cause an estimated loss of $2 billion annually in the US. Currently, interspecific competition is one of the major limitations for oilseed  and grain production in dryland cropping systems of the Pacific Northwest (PNW). Opportunities for the for the precision monitoring of managed and native ecosystems have become available through the use of low altitude remote sensing systems and high resolution satellite systems. However, methods for resolving species level classification in high resolution multispectral remote sensing systems remain lacking. This is partially due to the relative novelty of these systems, but is also related the lack of suitable reference data at spatial and temporal scales for regionally based models. The broad research question I’m is how does spectral trajectory relate to weed density, and can this information be used to distinguish the spatial extent of weeds in dryland cropping systems? My prediction, is that by increasing the spatial and temporal resolution of these data, crop and non-crop species will be distinguishable based on their relative rate of change in greenness.

The objectives I have for this class are to 1) determine the spatial and temporal resolutions at which weed species are distinguishable from crop species using a spectral trajectory technique, 2) compare these methods with ground reference data in a dryland cropping systems study. The major outcome of this work would be a method for distinguishing weed species from crop species in a dryland environment, and the identification of the minimum temporal resolution for distinguishing species in multispectral imagery.

Dataset:

The data set I have for addressing these questions is a composite of 7 flights of images taken with a multispectral camera in a cropping systems study in Eastern Oregon. Flights were conducted in conjunction with visual estimates of weed density in semi-permanent monitoring frames installed into the cropping systems study. The images are currently at a low level of processing. One of my goals as a part of this project will be to orthomosaic the images such that I can perform a time series analysis across image collection dates. The temporal resolution is from 3-20 days between flights, whiles the spatial resolution is 3 cm. The images cover the entire spatial extent of the experiment.

Hypothesis:

What I plan to do for this experiment is that after generating an orthoraster, I will be able to distinguish between the quantify of weed species and crop species in a frame based on the spectral trajectory of individual pixels in that frame. The question I hope to answer will be how distant in time do sample dates have to be before weed species can be distinguished from crop species based on their spectral trajectory.

Approach:

I plan on using a trial version of Agisoft to orthomosaic my images. It may be possible to conduct this analysis without performing an ortho mosaicing of the images, however, an orthomosaic will have a number of advantages to non-mosaiced images. Without an orthomosaic every individual analysis will have to be hard coded, whereas with an orthomosaic, I can automate much of the processing of these data.

Outcome:

The goal of this analysis will be to identify the minimum time required to discern species, and to describe the statistical relationship between species abundance based on spectral trajectory and ground reference data.

Significance:

While there has been a significant surge in the interest of UAV’s and low altitude remote sensing, the actual number of useful products for land managers to make decisions based on these data is very minimal. This work would identify the minimum temporal resolution a resource manager would need to have before they can identify weed species based on spectral characteristics.

Preparation:

I would say I have minimal experience in Arc-info, model builder and Python. I have moderate to high level experience in R. I would also consider myself to have moderate to high levels of experience in image processing and image analysis.