Mapping Weeds in Dryland Wheat: A spectral trajectory approach
Introductions: Into the weeds
When I started this course, I had a central goal in mind: I wanted to use remotely sensed data to identify weeds. What I didn’t have was a clue as to how I was going to get there. I had a few data sets to work with, and although I had made some limited progress with them, I had not yet been able to ask the kind of spatial questions I knew were possible with them. As a part of this course, I formed this central goal into two research questions:
- How does spectral trajectory relate to weed density? Can variation in spectral trajectory be used to distinguish weed species from crop species?
- Can this information be used to distinguish the spatial extent of weeds in dryland cropping systems? How well do different methods of assessing weed density perform?
Originally, I had planned to use some low-altitude aerial imagery I had taken in 2015 to address these questions. After many late nights trying to get these data into a form suitable for analysis, I decided that, for the sake of brevity and of actually getting something done in this class, I would use an alternative data set.
My alternative data set consisted of three kinds of reference data on the location and species of weeds in a 17-acre field northeast of Pendleton, OR, as well as Landsat 8 images taken over the course of summer 2015. The reference data consisted of transect data where species-level density counts were made, spectral measurements of green foreign material made during harvest, and observations made from the front of a combine at the time of harvest. From the Landsat data, I included only cloud-free images from the 2015 growing season.
Motivations and Relevance:
My goal in this class was to use variation in phenology between weed and crop species to distinguish weed density using time series Landsat data. One of the major issues with weeds in cropping systems, however, is that they typically represent only a very small fraction of the total cover in a pixel. The revised hypotheses I ended up testing in this class were:
- Weeds and crop can be distinguished based on their relative rates of change in greenness (NDVI).
- Visual estimates of weed density will be more accurate than automated inline assessments.
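The first hypothesis can be made concrete with a small sketch. NDVI is (NIR - Red) / (NIR + Red), which for Landsat 8 means bands 5 and 4. The rate of change is estimated here as an ordinary least-squares slope of NDVI against days elapsed. That is one simple way to summarize a spectral trajectory and is an assumption on my part, not necessarily how the analysis in this project was run. The dates and reflectance values below are invented purely for illustration.

```python
from datetime import date


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return float("nan")  # undefined when both bands are zero
    return (nir - red) / total


def ndvi_slope(dates, ndvi_values):
    """Least-squares slope of NDVI per day across a series of image dates."""
    if len(dates) != len(ndvi_values) or len(dates) < 2:
        raise ValueError("need at least two matching dates and NDVI values")
    days = [(d - dates[0]).days for d in dates]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(ndvi_values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("all images share the same date")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, ndvi_values))
    return sxy / sxx


# Hypothetical image dates and (NIR, red) reflectances for two pixels.
dates = [date(2015, 6, 1), date(2015, 6, 17), date(2015, 7, 3)]
crop_pixel = [(0.45, 0.05), (0.35, 0.10), (0.30, 0.20)]   # dries down quickly
weedy_pixel = [(0.45, 0.05), (0.40, 0.07), (0.36, 0.10)]  # stays green longer

crop_slope = ndvi_slope(dates, [ndvi(n, r) for n, r in crop_pixel])
weedy_slope = ndvi_slope(dates, [ndvi(n, r) for n, r in weedy_pixel])
print(f"crop slope:  {crop_slope:.5f} NDVI/day")
print(f"weedy slope: {weedy_slope:.5f} NDVI/day")
```

In this made-up example the crop-only pixel loses greenness faster than the weedy one, which is the kind of difference the hypothesis relies on.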
The main reason I find these questions interesting is that species-level discrimination in satellite-based remote sensing is still such a challenge. Most efforts to identify weeds using satellite data perform poorly. A major reason for this is the typically low signal weeds have in a cropping system. Advances in the methodologies for identifying species in satellite imagery would be a significant contribution. There have also been few attempts at using a time series approach to distinguish weeds to the species level; most work has concentrated on increasing the spatial or spectral resolution of imaging efforts for making species-level classifications. Finally, the data I was working with represented as complete a sampling of the field as is reasonably possible: all plant material from the field had been cut and measured by the spectrophotometer, so the spatial resolution of my reference data was very fine.
Methods:
In this class, I tested these hypotheses using two general approaches that were made available to us. First, I used hotspot analysis to classify areas of the field with statistically significant positive values as “weedy”. In previous attempts at classification, I typically ended up with poor results when comparing the spectrally generated data set to my reference data set. Hotspot analysis allowed me to overcome this, in that the classification was not as sensitive to extremely high or low values. The improved performance is likely a result of the fact that hotspot analysis takes into consideration the neighborhood of values a measurement resides in. I then used the output of the hotspot analysis as the response variable in a geographically weighted regression, with NDVI as the explanatory variable. With geographically weighted regression, we remove the assumption of spatial stationarity and fit regression models based on a local neighborhood rather than to the entire dataset. Finally, I classified the output of the regression analysis based on the coefficients for each of the terms of the local linear models. My goal here was to get at why the relationships between weed density and NDVI appear to vary as they do.
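To show why hotspot analysis is less sensitive to isolated extreme values, here is a minimal sketch of the Getis-Ord Gi* statistic, the statistic commonly used for hotspot analysis. It is a plain-Python illustration, not the ArcMap tool I used. The binary distance-band weights, the bandwidth, the 1.96 z-score cutoff, and the toy grid are all assumptions chosen for the example.

```python
import math


def getis_ord_gi_star(coords, values, bandwidth):
    """Gi* z-score for each point, using binary weights within `bandwidth`.

    Each point counts as its own neighbor, which is what the star denotes.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for xi, yi in coords:
        # Weight is 1 for points inside the distance band, 0 otherwise.
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= bandwidth else 0.0
             for xj, yj in coords]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * v for wj, v in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        # The statistic is undefined if values are constant or every point
        # is a neighbor; report 0 (no evidence of clustering) in that case.
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores


# Toy 5 x 5 field: a 2 x 2 patch of high weed readings in one corner.
coords = [(float(x), float(y)) for x in range(5) for y in range(5)]
values = [10.0 if x < 2 and y < 2 else 0.0 for x, y in coords]

scores = getis_ord_gi_star(coords, values, bandwidth=1.5)

# Assumed cutoff: z > 1.96, roughly the 95% confidence level.
weedy = [c for c, z in zip(coords, scores) if z > 1.96]
print(weedy)
```

Because each score pools a point with its neighbors, a single extreme reading surrounded by ordinary values produces a much smaller z-score than a cluster of moderately high readings.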
Results:
The specific details of each of these analyses are available in their individual posts, but overall the results were very promising. I was able to improve the classification accuracy of a spectrophotometric assessment of weed density from ~50% to 85%, which is as good as a human observer. This result indicates that in-situ assessment of weed density may be a real possibility in the near future, which has implications for farmers as well as scientists and land managers. Generating quality reference data over broad spatial areas is very difficult, and any way to improve that process is a good thing.
Regarding the output of the geographically weighted regression, I find I’m still grappling with the results. While I did end up with distinct areas of the field having differing responses of weed density to NDVI, I’m finding it difficult to interpret the output. If geographically weighted regression points to additional explanatory variables for why weeds respond the way they do, wouldn’t one simply include those variables in the original model for predicting weed density? So while geographically weighted regression may be suitable for a data exploration exercise, I’m not sufficiently convinced of its utility in predicting weed density. More work is necessary to come to a conclusive answer as to why local relationships between NDVI and weed density appear to vary as they do.
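To make the idea of locally varying coefficients concrete, here is a minimal sketch of a geographically weighted regression with a single explanatory variable. It fits a separate weighted least-squares line at every location, with a Gaussian kernel that down-weights distant observations. The kernel, the bandwidth, and the synthetic transect are assumptions for illustration. This is not the ArcMap implementation, and the numbers are not from my field.

```python
import math


def gwr_local_fits(coords, x, y, bandwidth):
    """Return an (intercept, slope) pair for each location.

    Each fit is a weighted least-squares regression of y on x, where the
    weights decay with distance from the location under a Gaussian kernel.
    """
    fits = []
    for xi, yi in coords:
        w = [math.exp(-0.5 * (math.hypot(xi - xj, yi - yj) / bandwidth) ** 2)
             for xj, yj in coords]
        sum_w = sum(w)
        mean_x = sum(wj * xj for wj, xj in zip(w, x)) / sum_w
        mean_y = sum(wj * yj for wj, yj in zip(w, y)) / sum_w
        sxx = sum(wj * (xj - mean_x) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mean_x) * (yj - mean_y)
                  for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fits.append((mean_y - slope * mean_x, slope))
    return fits


# Synthetic transect of 20 points: the weed score rises with NDVI in the
# first half and falls with NDVI in the second half.
coords = [(float(i), 0.0) for i in range(20)]
ndvi_vals = [0.3 + 0.05 * (i % 4) for i in range(20)]
weed_score = [2.0 * v if i < 10 else -2.0 * v
              for i, v in enumerate(ndvi_vals)]

fits = gwr_local_fits(coords, ndvi_vals, weed_score, bandwidth=2.0)
for (cx, _), (intercept, slope) in zip(coords, fits):
    print(f"x={cx:4.1f}  slope={slope:6.2f}")
```

A single global regression on this transect would average the two regimes together, while the local slopes recover the sign change. Near the boundary between regimes, the local fits are dominated by the jump in the weed score rather than by NDVI, which is a small reminder of how hard these coefficients can be to interpret.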
Learning:
I learned an incredible amount this quarter, mostly from the high-quality presentations by, and conversations with, my colleagues in this class. A very high bar was set by the work done by this cohort. Although I didn’t get as far as I might have liked with these data, this was my first foray into spatial statistics. I had the advantage of being very comfortable in R, but this class forced me to branch out into ArcMap and, by the end of the quarter, into Python and Earth Engine. One of the major issues I had was that the data set I was working with was not as high a resolution as I had originally intended. I wanted to use machine learning methods on a very high resolution multispectral data set I’ve been working on, but it wasn’t ready for analysis by the time I needed to produce results. I ended up doing a lot of work that never saw the light of day, looking at issues in spatial variance and resolution, which I would have liked to present with more time.