Introduction:
Hotspot analysis is a method of statistical analysis for spatially distributed data. It takes into account both the magnitude of a point and its relationship to the points around it in order to identify statistically significant extreme values in a data set. A single spatial data point may represent an extreme value. In weed management, however, what often matters is identifying aggregates of values that, taken as a whole, represent an extreme condition. One large weed may have some impact, but many medium-sized and small weeds are likely to have a disproportionate impact. This has implications for weed management and weed science. Of the many techniques available for mapping the spatial distribution of invasive weeds, most rely on expert knowledge to decide what constitutes a “weedy” versus a “non-weedy” condition. This decision is made independently of the geographic context of a value. It is usually made by visually evaluating the distribution of measured values and choosing some cutoff, real or perceived, in that distribution. These cutoffs are often subjective and typically yield poor results when applied to independently generated data sets. Hotspot analysis offers a way forward: statistically significant extreme regions can be identified, and different mapping techniques can be compared by how well they predict these regions.
My goal for this project was to see whether hotspot analysis provides a better technique for mapping weed populations in wheat fields at the time of harvest. Previously, we set cutoffs for “weedy” and “non-weedy” values by visually evaluating histograms and drawing on some knowledge of how the weeds were distributed. Here I wanted to see whether I could generate more accurate maps of weed density relative to a reference data set.
Sources of data:
Fig. 1: Sampling for weed density was done at the time of harvest using a spectrometer in the grain stream of a combine, and with a visual evaluator watching for green weeds during harvest.
In summer 2015 we harvested a winter wheat field NE of Pendleton, OR, using a combine outfitted with a spectrometer in the grain stream. The ratio of red to near-infrared reflectance was recorded. An observer inside the combine cab also made visual observations. Prior to harvest, we walked the field on gridded transects and recorded observations of all weeds at the species level. All of these data were re-gridded to a common 7 m × 7 m grid using bilinear interpolation and brought into ArcGIS for analysis.
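Bilinear interpolation estimates a value at any location as a distance-weighted blend of the four surrounding grid nodes. Below is a minimal NumPy sketch of the idea. It is not necessarily how the re-gridding was actually done here. It assumes a layer already sits on a regular square grid with a known origin and cell size, and the function names `bilinear_sample` and `regrid` are illustrative. Irregularly spaced points, such as raw combine readings, would first need to be gridded some other way.

```python
import numpy as np


def bilinear_sample(grid, x0, y0, cell, x, y):
    """Bilinearly interpolate `grid` at map coordinates (x, y).

    Assumes grid[i, j] is the value at (x0 + j * cell, y0 + i * cell), so rows
    run along y and columns along x, and that the grid is at least 2 x 2.
    Locations outside the grid return NaN.
    """
    # Fractional column (fx) and row (fy) position of each target location
    fx = (np.asarray(x, dtype=float) - x0) / cell
    fy = (np.asarray(y, dtype=float) - y0) / cell
    nrows, ncols = grid.shape
    inside = (fx >= 0) & (fx <= ncols - 1) & (fy >= 0) & (fy <= nrows - 1)

    # Lower-left neighbouring node, clipped so the +1 neighbours stay in bounds
    j0 = np.clip(np.floor(fx).astype(int), 0, ncols - 2)
    i0 = np.clip(np.floor(fy).astype(int), 0, nrows - 2)
    tx = fx - j0  # weight toward column j0 + 1
    ty = fy - i0  # weight toward row i0 + 1

    value = (grid[i0, j0] * (1 - tx) * (1 - ty)
             + grid[i0, j0 + 1] * tx * (1 - ty)
             + grid[i0 + 1, j0] * (1 - tx) * ty
             + grid[i0 + 1, j0 + 1] * tx * ty)
    return np.where(inside, value, np.nan)


def regrid(grid, x0, y0, cell, new_cell=7.0):
    """Resample `grid` onto nodes spaced `new_cell` apart (e.g. 7 m) over the same extent."""
    nrows, ncols = grid.shape
    xs = x0 + np.arange(0.0, (ncols - 1) * cell + 1e-9, new_cell)
    ys = y0 + np.arange(0.0, (nrows - 1) * cell + 1e-9, new_cell)
    grid_x, grid_y = np.meshgrid(xs, ys)  # rows follow ys, columns follow xs
    return bilinear_sample(grid, x0, y0, cell, grid_x, grid_y)
```

Bilinear interpolation reproduces any surface that is linear in x and y exactly. A plane is therefore a convenient check: resampling it to a 7 m grid should return the plane's own values at the new nodes.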
Classification:
In ArcGIS, hotspot analysis was conducted using the Hot Spot Analysis (Getis-Ord Gi*) tool. The Getis-Ord Gi* statistic identifies statistically significant hot and cold spots in whichever spatially referenced value is supplied as the input field of the feature class being classified. For the purposes of this exercise, statistically significant positive Getis-Ord Gi* scores (hot spots) were considered “weedy”. Non-significant and negative scores were considered “non-weedy”. For the histogram classification, cutoffs were chosen by visually evaluating each histogram and applying knowledge of what should represent a weedy condition. These cutoffs were used to distinguish between “weedy” and “non-weedy” values, and the associated maps were classified accordingly.
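The Gi* statistic is a z-score. It compares the sum of values in a cell's neighbourhood (the cell itself included) with the sum expected if the values were spatially random:

Gi* = (Σ_j w_ij x_j − X̄ Σ_j w_ij) / (S √[(n Σ_j w_ij² − (Σ_j w_ij)²) / (n − 1)])

Here X̄ and S are the mean and standard deviation of all n values. The NumPy sketch below computes Gi* on a regular grid and applies the weedy/non-weedy rule described above, alongside a simple cutoff rule for comparison. It makes two assumptions: binary weights over a square window, and a threshold of z > 1.96 (p < 0.05, two-tailed, no multiple-testing correction). ArcGIS's own conceptualization of spatial relationships, distance band, and significance handling may differ, and the function names are hypothetical.

```python
import numpy as np

# Assumed significance threshold: z > 1.96 is roughly p < 0.05 (two-tailed),
# with no correction for multiple testing.
Z_CRIT = 1.96


def getis_ord_gi_star(grid, radius=1):
    """Return the Gi* z-score for every cell of a 2-D grid (NaN = no data).

    Weights are assumed binary: w_ij = 1 when cell j lies in the
    (2*radius + 1) x (2*radius + 1) window centred on cell i, including i itself.
    """
    x = np.asarray(grid, dtype=float)
    valid = ~np.isnan(x)
    vals = x[valid]
    n = vals.size
    xbar = vals.mean()
    s = np.sqrt((vals ** 2).mean() - xbar ** 2)  # population standard deviation

    z = np.full(x.shape, np.nan)
    nrows, ncols = x.shape
    for i in range(nrows):
        for j in range(ncols):
            if not valid[i, j]:
                continue
            window = x[max(i - radius, 0): i + radius + 1,
                       max(j - radius, 0): j + radius + 1]
            window = window[~np.isnan(window)]
            # Sum of w_ij; with 0/1 weights this also equals the sum of w_ij**2
            w_sum = window.size
            numerator = window.sum() - xbar * w_sum
            denominator = s * np.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
            z[i, j] = numerator / denominator
    return z


def classify_hotspot(z, z_crit=Z_CRIT):
    """1 = "weedy" (significant hot spot); 0 = "non-weedy" (not significant or cold spot)."""
    return (np.asarray(z) > z_crit).astype(int)


def classify_cutoff(values, cutoff):
    """Histogram-style rule for comparison: 1 where a value meets a hand-picked cutoff."""
    return (np.asarray(values) >= cutoff).astype(int)
```

Consider a toy 10 × 10 grid of zeros with a 3 × 3 block of 10s in one corner. The cell just below the block (row 3, column 1) has a value of 0, but its Gi* z-score exceeds 1.96, so it is classed as weedy. Any cutoff above 0 leaves that cell non-weedy. This is the neighbourhood effect that distinguishes the two approaches.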
Fig 2: Ground reference data set classified using Hotspot analysis and histogram classification. All green values were considered positive for weeds.
Fig 3: Spectral assessment of weed distribution classified using Hotspot analysis and histogram classification. All green values were considered positive for weeds.
Fig 4: Visual assessment of weed distribution classified using Hotspot analysis and histogram classification. All green values were considered positive for weeds.
Comparing techniques:
Classified maps were then compared for accuracy using a confusion matrix approach. Accuracy here is defined as the number of true positives plus true negatives, divided by the total number of samples. Classified maps were also compared for their precision and sensitivity. Sensitivity is the number of true positives divided by the condition positives (cells that are positive in the reference data). Precision is the number of true positives divided by the predicted positives. Precision addresses the question: if I predict a cell is positive, how often am I correct? Sensitivity addresses the question: when the reference data show a cell as positive, how often does my prediction also show it as positive?
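All three metrics follow directly from the confusion-matrix counts. Below is a minimal pure-Python sketch. It assumes both classified maps have been flattened into equal-length sequences of 0 (non-weedy) and 1 (weedy) covering the same grid cells. The function names and the toy labels are illustrative only, not data from this study.

```python
def confusion_counts(predicted, reference):
    """Count TP, FP, FN, TN for two equal-length sequences of 0/1 labels."""
    if len(predicted) != len(reference):
        raise ValueError("maps must cover the same cells")
    tp = fp = fn = tn = 0
    for p, r in zip(predicted, reference):
        if p and r:
            tp += 1
        elif p:        # predicted weedy, reference non-weedy
            fp += 1
        elif r:        # predicted non-weedy, reference weedy
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def _ratio(num, den):
    # Undefined metrics (e.g. no predicted positives) are reported as NaN.
    return num / den if den else float("nan")


def error_assessment(predicted, reference):
    tp, fp, fn, tn = confusion_counts(predicted, reference)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "precision": _ratio(tp, tp + fp),    # of predicted positives, how many are right
        "sensitivity": _ratio(tp, tp + fn),  # of reference positives, how many were found
    }


if __name__ == "__main__":
    # Hypothetical toy labels for eight grid cells
    predicted = [1, 1, 1, 0, 1, 0, 1, 0]
    reference = [1, 0, 0, 0, 1, 1, 1, 0]
    print(error_assessment(predicted, reference))
    # {'accuracy': 0.625, 'precision': 0.6, 'sensitivity': 0.75}
```

In the toy case, over-predicting weedy cells raises sensitivity (3 of 4 reference positives found) while lowering precision (3 of 5 predictions correct). This is the same kind of trade-off discussed in the results.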
Fig 5: Error assessment comparing spectral assessment of weediness to ground reference data using the histogram classification.
Fig 6: Results of the error assessments for all mapping techniques compared with their ground reference data.
Results:
These results show that hotspot analysis was a superior classification method compared with visual evaluation of the distribution. This is probably because hotspot analysis takes into account the neighborhood of values in which a measurement resides. Hotspot analysis increased the accuracy of both the spectral and the on-combine visual assessments. Its effects on sensitivity and precision, however, were mixed. Hotspot classification increased the precision of the visual assessment, but at a cost to sensitivity. Conversely, it increased the sensitivity of the spectral assessment, but at a cost to precision. It may be that, with this method of comparing classifications, gains in precision come at a cost to sensitivity. In general, however, the most substantial gains were in accuracy, which is the most important factor for this mapping effort.