After considerable experimentation with a variety of ArcGIS’s Spatial Statistics tools, including Hot Spot Analysis, Cluster Analysis, Spatial Autocorrelation, Geographically Weighted Regression, and Ordinary Least Squares, I think I may have found a viable method for analyzing my SSURGO soils dataset. For my final presentation in this course, I used the Grouping Analysis tool to explore spatial patterns and clusters of high clay content within the sub-AVAs of the northern Willamette Valley. Visually, the resulting groups corresponded surprisingly closely to the soil orders (i.e., the soil taxonomy).
Reading through the documentation on Esri’s website about Grouping Analysis, I learned that one should start the analysis with a single variable and add more incrementally with each subsequent run. Following that advice, I have experimented both with adding variables and with changing the number of groups for a given data set. While the soils in each sub-AVA are highly heterogeneous, they do share some similarities, particularly in clay content and soil taxonomy.
The results published here come from an analysis using five variables: percent clay content, saturated hydraulic conductivity (Ksat), and available water storage to depths of 25 cm, 50 cm, and 150 cm, with the data partitioned into 5 groups. I also enabled the tool’s “Evaluate Optimal Number of Groups” parameter, which generates additional statistics meant to identify the number of groups that best separates one’s data set into distinct groups.
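Grouping Analysis without a spatial constraint is, as I understand Esri’s documentation, based on k-means, and its Evaluate Optimal Number of Groups option scores each candidate number of groups with the Calinski–Harabasz pseudo F-statistic, F = (R² / (n_c − 1)) / ((1 − R²) / (n − n_c)), where R² = (SST − SSE) / SST, n is the number of features, and n_c is the number of groups. The Python sketch below illustrates those two ideas on a plain table of numeric soil attributes; it is not Esri’s implementation. The z-score standardization, the farthest-first seeding, and the helper names (standardize, kmeans, pseudo_f, evaluate_group_counts) are my own assumptions for illustration.

```python
import statistics

def standardize(rows):
    """Convert each column to z-scores so variables in different units
    (e.g. percent clay vs. Ksat) contribute comparably to distances."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid dividing by 0 for constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Column-wise mean of a list of attribute vectors."""
    return [statistics.fmean(c) for c in zip(*points)]

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first seeding (an assumption,
    not necessarily how Esri seeds groups); returns one group label per point."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next seed: the point farthest from every seed chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # no point changed group: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a group empties out
                centers[j] = centroid(members)
    return labels

def pseudo_f(points, labels):
    """Calinski-Harabasz pseudo F: (R2 / (nc - 1)) / ((1 - R2) / (n - nc))."""
    n, groups = len(points), sorted(set(labels))
    nc = len(groups)
    if nc < 2:
        raise ValueError("need at least two non-empty groups")
    grand = centroid(points)
    sst = sum(sq_dist(p, grand) for p in points)  # total sum of squares
    sse = 0.0  # within-group sum of squares
    for g in groups:
        members = [p for p, lab in zip(points, labels) if lab == g]
        c = centroid(members)
        sse += sum(sq_dist(p, c) for p in members)
    if sse == 0:
        return float("inf")  # every group is perfectly tight
    r2 = (sst - sse) / sst
    return (r2 / (nc - 1)) / ((1 - r2) / (n - nc))

def evaluate_group_counts(points, k_values=range(2, 16)):
    """Pseudo F for each candidate number of groups (skipping k >= number of points)."""
    return {k: pseudo_f(points, kmeans(points, k)) for k in k_values if k < len(points)}
```

To mirror the advice of starting with one variable and adding more, one could call `evaluate_group_counts(standardize([row[:m] for row in rows]))` for m = 1, 2, 3 and so on, and watch how the scores change as each column is added.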
In addition, I generated an output report file with each run so that I could explore the statistical results in more depth; I’ve attached these for anyone interested in seeing what the results look like. Interestingly, for all but one of my AVA data sets, the reports suggest that 15 is the optimal number of groups. I’m not sure whether this is because 15 is the maximum number of groups that the tool can generate, or whether it reflects the particular variables I am using as inputs.
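If I read Esri’s documentation correctly, the Evaluate Optimal Number of Groups option computes a pseudo F-statistic for each group count from 2 through 15 and reports the count with the largest value. A maximum at 15 would then sit on the edge of the range tested, which fits the first explanation: the curve may still be rising, so the best count could lie at or beyond 15. The hypothetical helper below picks the best count from a {number of groups: pseudo F} table (for example, values copied by hand from a report) and flags a maximum at either end of the range; the numbers in the example are made up.

```python
def best_group_count(pseudo_f_by_k):
    """Return (number of groups with the largest pseudo F,
    whether that count lies at an end of the range tested)."""
    if len(pseudo_f_by_k) < 2:
        raise ValueError("need pseudo F values for at least two group counts")
    ks = sorted(pseudo_f_by_k)
    best_k = max(ks, key=pseudo_f_by_k.get)
    at_edge = best_k in (ks[0], ks[-1])
    return best_k, at_edge

# Made-up, steadily rising pseudo F values for 2..15 groups (illustration only).
rising = {k: 400.0 + 10.0 * k for k in range(2, 16)}
print(best_group_count(rising))  # → (15, True)
```

An edge flag does not say what the true optimum is; it only says the evaluated range was too narrow to show a peak.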
Additional variables that I plan to add include percent sand, percent silt, bulk density, percent organic matter, and parent material. I am also considering incorporating raster data sets of slope, aspect, landform, vegetation zone, precipitation, minimum temperature, and maximum temperature. Running multiple iterations of the Grouping Analysis should help me identify a suitable combination of these variables, as well as the optimal number of groups. Once those have been identified, I plan to perform the same analysis on each AVA, and then on buffered polygons of the AVAs at distances of 500 m, 1000 m, 1500 m, 2000 m, 2500 m, and 3000 m. In doing so, I hope to determine the degree to which each sub-AVA in the northern Willamette Valley differs from its directly adjacent landscapes. This will allow me to identify which sub-AVAs correspond most closely to the underlying soil classes in those areas.
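Comparing an AVA with its surroundings needs some measure of how different two group compositions are. The plan above does not name one, so the sketch below assumes a simple choice: the total variation distance between the share of area in each group inside the AVA and inside an adjacent ring. It further assumes that the “adjacent landscape” at distance d is the buffer minus the AVA itself (a plain buffer would also contain the AVA), and that per-group areas have already been tabulated in ArcGIS, for example by intersecting the grouped soil polygons with each ring; that step is not shown. The function names and all areas are hypothetical.

```python
# Buffer distances from the plan above, in metres.
BUFFER_DISTANCES_M = [500, 1000, 1500, 2000, 2500, 3000]

def group_shares(areas_by_group):
    """Turn {group: area} into {group: fraction of total area}."""
    total = sum(areas_by_group.values())
    if total <= 0:
        raise ValueError("total area must be positive")
    return {g: a / total for g, a in areas_by_group.items()}

def composition_distance(inside, outside):
    """Total variation distance between two {group: area} tables.

    0.0 means identical group proportions; 1.0 means the two areas share no groups.
    """
    p, q = group_shares(inside), group_shares(outside)
    return 0.5 * sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in set(p) | set(q))

# Illustrative, made-up areas (hectares) per Grouping Analysis group.
ava = {1: 120.0, 2: 60.0, 3: 20.0}
rings = {500: {1: 40.0, 2: 35.0, 3: 25.0}, 1000: {1: 30.0, 2: 30.0, 3: 40.0}}
for d in BUFFER_DISTANCES_M:
    if d in rings:  # only the two made-up rings have data here
        print(f"{d} m ring: distance {composition_distance(ava, rings[d]):.2f}")
```

Under this assumed metric, a sub-AVA whose soils set it apart from its surroundings would show a large distance even for the nearest (500 m) ring, while a boundary drawn through uniform soils would show distances near zero.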