After considerable experimentation with a variety of ArcGIS’s Spatial Statistics tools, including Hot Spot Analysis, Cluster and Outlier Analysis, Spatial Autocorrelation, Geographically Weighted Regression, and Ordinary Least Squares, I think I may have found a viable method for analyzing my SSURGO soils data set. For my final presentation in this course, I used the Grouping Analysis tool to explore the spatial patterns and clusters of high clay content within the sub-AVAs of the northern Willamette Valley. Visually, the resulting groups corresponded surprisingly well to the soil orders (i.e. taxonomy).

Reading through the literature on ESRI’s webpage about Grouping Analysis, I learned that one should start the Grouping Analysis with a single variable and add more incrementally with each subsequent run. Following suit, I have experimented with both the set of input variables and the number of groups for a given data set. While the soils present in each of the sub-AVAs are highly heterogeneous, they do share some similarities, particularly with regard to clay content and soil taxonomy.

Northern_Willamette_Valley_AVA_Soil_Taxonomy_20130612

Northern_Willamette_Valley_AVA_Grouping_Analysis_20130612

The results published here reflect an analysis that used five variables (percent clay content, Ksat, and Available Water Storage at 25cm, 50cm, and 150cm) and parsed the data into 5 groups. I also took advantage of the “Evaluate Optimal Number of Groups” parameter within the tool, which generates additional statistics meant to identify the number of groups that will most effectively separate one’s data set into distinct groups.
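As far as I understand the documentation, the statistic behind that option is a pseudo F-statistic (the Calinski-Harabasz ratio of between-group to within-group variance), where larger values indicate better-separated groups. The following is only a minimal sketch of that ratio in plain Python, not the tool's own implementation; the function name `pseudo_f` and the sample values are hypothetical, and the inputs are assumed to be already standardized.

```python
from statistics import mean


def pseudo_f(rows, labels):
    """Calinski-Harabasz pseudo F-statistic for one grouping solution.

    rows   -- list of feature vectors (one list of numbers per soil polygon)
    labels -- group id assigned to each row
    """
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    k = len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more rows than groups")

    def centroid(members):
        # Column-wise mean of a list of vectors.
        return [mean(col) for col in zip(*members)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    grand = centroid(rows)
    # Between-group sum of squares: group centroids vs. the overall centroid.
    ss_between = sum(
        len(m) * sq_dist(centroid(m), grand) for m in groups.values()
    )
    # Within-group sum of squares: each row vs. its own group centroid.
    ss_within = sum(
        sq_dist(r, centroid(m)) for m in groups.values() for r in m
    )
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Hypothetical standardized values: [percent clay, Ksat] per polygon.
    sample = [[-1.1, 0.9], [-0.9, 1.1], [1.0, -1.0], [1.2, -0.8]]
    print(pseudo_f(sample, [0, 0, 1, 1]))
```

Computing this value for each candidate number of groups and comparing the results is one way to sanity-check the number suggested in the output report.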

In addition, I generated Output Report Files with each run so that I could explore the statistical results in more depth. I’ve attached these for those of you who are interested in seeing what the results look like. I find it interesting that for all but one of my AVA data sets, the resulting reports suggest that 15 is the optimal number of groups. I’m not sure if this is because 15 is the maximum number of groups that the tool can generate, or if it is a result of the particular variables I am using as inputs.

chehalem_grouping5

dundee_grouping5

eola_amity_grouping5

ribbon_ridge_grouping5

yamhill_carlton_grouping5

Additional variables that I plan on adding include percent sand, percent silt, bulk density, percent organic matter, and parent material. I am also considering incorporating raster data sets of slope, aspect, landform, vegetation zone, precipitation, minimum temperature, and maximum temperature. Performing multiple iterations of the Grouping Analysis will help me to identify a suitable combination of these variables, as well as the optimal number of groups. Once those have been identified, I plan on performing the same analysis on each AVA, and then on buffered polygons of the AVAs at distances of 500m, 1000m, 1500m, 2000m, 2500m, and 3000m. In so doing, I hope to identify the degree to which the sub-AVAs in the northern Willamette Valley differ from the directly adjacent landscapes. This will allow me to identify which sub-AVAs best correspond to the underlying soil classes in those areas.

The following screenshots are the results that I have generated using Hot Spot Analysis, Anselin Local Moran’s I, and Global Moran’s I to investigate the clustering of soils with high clay content in the six sub-AVAs (Chehalem Mountains, Ribbon Ridge, Dundee Hills, Yamhill-Carlton, McMinnville, and Eola-Amity Hills) of the northern Willamette Valley. I have created quite a few data sets, and am in the process of identifying useful methods for further interrogation of my data. Along those lines, I need some feedback regarding the interpretation of these results. Any comments would be greatly appreciated.

Percent_clay_Location_Map_of_the_entire_Willamette_Valley_AVA

Percent clay Location Map of the entire Willamette Valley AVA

Percent_clay_of_the_entire_Willamette_Valley_AVA

Percent clay of the entire Willamette Valley AVA (including the six sub-AVAs in the northern portion of the Willamette Valley)

Percent_clay_DETAIL

Percent Clay detail of the northern Willamette Valley

Hot_Spot_clay_ZScore

Hot Spot Analysis (GiZScore) of Percent Clay; detailed

Hot_Spot_clay_PValue

Hot Spot Analysis (GiPValue) of Percent Clay; detailed

Anselin_Morans_clay_cluster_outlier_type

Anselin Moran’s (Cluster/Outlier Type) of Percent Clay; detailed

Anselin_Morans_clay_ZScore

Anselin Moran’s (LMiZScore) of Percent Clay; detailed

anselin_morans_PValue

Anselin Moran’s (LMiPValue) of Percent Clay; detailed

global_morans_I_clay_1000

global_morans_I_clay_5000

global_morans_I_clay_10000

global_morans_I_clay_15000

Global Moran’s I using a fixed distance of 1,000 meters, 5,000 meters, 10,000 meters, and 15,000 meters
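For readers who want to see what a fixed distance band means in practice, here is a minimal sketch of the Global Moran’s I index in plain Python. It is a simplification and not the ArcGIS implementation: it assumes each soil polygon is represented by a single point (for example, its centroid), uses binary weights (1 if two points lie within the threshold distance, 0 otherwise) without row standardization, and does not compute the z-score or p-value. The function name `global_morans_i` and all sample coordinates and values are hypothetical.

```python
from math import hypot


def global_morans_i(points, values, threshold):
    """Global Moran's I with binary fixed-distance-band weights.

    points    -- list of (x, y) coordinates in meters (assumed centroids)
    values    -- attribute value at each point (e.g. percent clay)
    threshold -- distance band in meters; pairs within it are neighbors
    """
    n = len(values)
    x_bar = sum(values) / n
    dev = [v - x_bar for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("values are constant; Moran's I is undefined")

    s0 = 0.0     # sum of all weights
    cross = 0.0  # weighted sum of cross-products of deviations
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            (xi, yi), (xj, yj) = points[i], points[j]
            if hypot(xi - xj, yi - yj) <= threshold:
                s0 += 1.0
                cross += dev[i] * dev[j]
    if s0 == 0:
        raise ValueError("no neighbors within the threshold distance")
    return (n / s0) * (cross / denom)


if __name__ == "__main__":
    # Hypothetical centroids (meters) and percent clay values.
    centroids = [(0, 0), (800, 0), (1600, 0), (6000, 0), (6800, 0)]
    clay = [42.0, 40.0, 38.0, 18.0, 20.0]
    for band in (1000, 5000, 10000, 15000):
        print(band, global_morans_i(centroids, clay, band))
```

Values near +1 indicate that similar values cluster together, values near -1 indicate that neighboring values alternate, and values near 0 suggest no spatial pattern at that distance band.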

The following is the abstract of the paper I presented earlier this month at AAG:

The specific geography of individual wine growing regions has long been understood to be a significant factor in predicting both a region’s success in producing high quality grapes, and the resulting demand for wines produced from that region’s fruit. In the American wine industry, American Viticultural Areas (AVAs) are increasingly being used to designate a uniqueness and specificity of place. This process is often predicated on the argument that these areas represent a certain degree of physiographic uniformity or homogeneity. This is particularly the case with regard to the phenomenon of sub-AVAs, wherein smaller areas within large, spatially heterogeneous AVAs seek to differentiate themselves based on the physiographic features that are purportedly unique to those smaller subregions. In many cases, there is a strong correlation between soil classes and AVA boundaries, whereas in other cases the correlation is not as strong. This suggests that there are factors other than physiographic homogeneity contributing to the designation of these sub-AVAs. This study employs GIS and spatial analysis to examine and potentially correlate the soil classes of Oregon’s northern Willamette Valley with the sub-AVAs in that area. In doing so, this study presents maps and statistical results in order to provide a quantitative summary of the geographic context of vineyards in this region with respect to both the soil classes present and the federally designated AVA boundaries in which they are located.

 

About my data and my spatial problem:

The data set that I am working with is a legacy Natural Resources Conservation Service (NRCS) data set detailing soil classes throughout Oregon’s Willamette Valley. Using metes and bounds descriptions provided by the United States Department of the Treasury’s Alcohol and Tobacco Tax and Trade Bureau (TTB), the Federal entity tasked with approving AVA designation petitions, I have generated a series of polygons representing the Willamette Valley AVAs (Willamette Valley and its six sub-AVAs: Chehalem Mountains, Ribbon Ridge, Dundee Hills, Yamhill-Carlton, McMinnville, and Eola-Amity Hills). I also have a handful of raster data layers (slope, aspect, landform, lithology, and PRISM) that I am using to calculate zonal statistics. Many spatial statistical methods are designed around point data, which poses a problem for me because all of my data is in either vector polygon or raster format. I am interested in exploring which methods and tools within the Spatial Statistics toolbox are most appropriate for my data. I am also interested in getting feedback from others in this course so as to make my research more robust, defensible, and statistically sound.

-Doug