The analysis portion of this project was not effective and ultimately produced no usable results. Let me use this tutorial to walk you through a cautionary tale about being just a little too determined to make an idea work with the dataset that you have.
The Question:
After a lot of research on both early archaeological sites and sites that contain Pleistocene megafauna in the Willamette Valley, a few patterns seemed to emerge. Megafauna sites seemed to occur in peat bogs, and all of the earliest archaeological sites occurred on the margins of wetlands. This led me to wonder whether soil type, pH, and other factors are connected across all of the known sites. What variables might be useful for predicting the locations of other yet-to-be-discovered archaeological and megafauna sites throughout the valley?
The Tools:
To conduct this exploration, I decided to use the tools built into ArcGIS. Hotspot analysis and regression were going to be the two main tools.
For the data, I found a SSURGO dataset that was in vector format. It contained polygons of all of the mapped soil units in the valley, as well as a variety of factors related to slope, parent material, order, etc. Eventually I switched gears and found another SSURGO dataset that was in raster format and contained a whole lot more data, hoping that this change in dataset would make the analysis much easier.
The Steps:
The first step was to map different variables and compare them visually, to see if any obvious patterns emerged. Three soil types stood out because they appeared at most of the known sites, which suggested they were important when considering the late Pleistocene in the Willamette Valley.
Below is a map of all of the known Pleistocene megafauna and early Holocene archaeological sites in the valley.
The three major soil types associated with local sites were the Labish, Bashaw, and McBee soils.
The Labish soil was especially interesting, as it only seemed to occur at the major peat-bearing sites in the valley, most of which were drained lakes that are currently used for crops.
After reading about the nature of soil pH in wetland deposits, I began to hypothesize that pH would have been an important variable in the soils that I had identified, and wanted to use this knowledge to find more sites through hotspot analysis and/or regression analysis.
The Results:
The results are the toughest part to discuss, as there was not much to show.
I made many attempts at running regression analysis, using a wide variety of combinations of data, but every one returned a perfect multicollinearity error, so the regression failed across the board. I attempted the analysis with both the vector and raster forms of the data, using the built-in pH data, pH data that was acquired elsewhere and added to the dataset, and combinations of variables.
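For readers who want to see what perfect multicollinearity looks like outside of ArcGIS, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration: the six toy records, the helper names `dummies` and `is_perfectly_collinear`, and the choice of a rank check are mine, not part of the SSURGO data or the ArcGIS tool. It relies on the fact that, in USDA Soil Taxonomy, every Great Group sits inside exactly one Order, so coding both as explanatory variables makes some columns exact combinations of others. That may or may not be the precise cause of the errors described above, but it is one common way to trigger them.

```python
import numpy as np

# Hypothetical toy records (not SSURGO values): each great group
# belongs to exactly one order, as in USDA Soil Taxonomy.
orders = ["Mollisols", "Mollisols", "Vertisols",
          "Vertisols", "Inceptisols", "Mollisols"]
great_groups = ["Argixerolls", "Haploxerolls", "Endoaquerts",
                "Endoaquerts", "Haploxerepts", "Argixerolls"]


def dummies(labels):
    """One-hot encode labels, dropping the first level as the reference."""
    levels = sorted(set(labels))
    return np.array([[1.0 if lab == lev else 0.0 for lev in levels[1:]]
                     for lab in labels])


def is_perfectly_collinear(X):
    """True when the columns of X are linearly dependent (rank deficient)."""
    return np.linalg.matrix_rank(X) < X.shape[1]


intercept = np.ones((len(orders), 1))

# Design matrix with soil order only: columns are independent.
X_order = np.hstack([intercept, dummies(orders)])

# Design matrix with order AND great group: the nested coding makes
# some columns exact combinations of others.
X_both = np.hstack([intercept, dummies(orders), dummies(great_groups)])

print("order only collinear:", is_perfectly_collinear(X_order))
print("order + great group collinear:", is_perfectly_collinear(X_both))
```

A check like this on the explanatory variables, before running a regression tool, can show which combinations are redundant so that one variable from each redundant set can be dropped.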
As I began to explore the dataset further, I realized that the data, while initially appearing to be incredibly varied, was in fact quite uniform. I mapped out the soil Orders and Great Groups in the valley and realized that the maps looked strikingly similar, which suggested that (as was mentioned in class) all of the data was likely extrapolated from a few key points.
Soil Order:
Soil Great Group:
Aside from a few differences, the two maps are extremely similar, which suggests that this data was more than likely, as mentioned above, extrapolated across a large landscape.
This realization made me doubt the pH data too, so I mapped that out as well.
Soil pH Map:
The valley soils fall in a narrow, slightly acidic to near-neutral range, varying only between 5.7 and 6.6.
This would make it very difficult to use some sort of exploratory statistical analysis on this dataset, as there wasn’t much variability.
To look at how the pH was distributed throughout the valley, I ran a hotspot analysis as well as a Moran’s I analysis.
pH Hotspot Analysis Map:
pH Moran’s I Analysis Results:
As you can see, the data is extremely clustered, especially in my particular areas of interest, which are on the valley floor.
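To make the Moran’s I result a little more concrete, here is a rough sketch of the global Moran’s I statistic in Python with NumPy. It is not the ArcGIS implementation (which also reports a z-score and p-value); the function names `morans_i` and `chain_weights`, the simple "neighbors along a line" weights, and the two toy pH series are all assumptions made up for illustration. The values are only chosen to sit inside the 5.7 to 6.6 range seen on the map.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I for values given a spatial weights matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                      # deviations from the mean
    # (n / sum of weights) * (weighted cross-products / total variation)
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)


def chain_weights(n):
    """Binary weights linking each cell to its immediate neighbors on a line."""
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = 1.0
    w[idx + 1, idx] = 1.0
    return w


# Hypothetical pH transects: similar values side by side vs. alternating.
clustered = [5.7, 5.8, 5.9, 6.4, 6.5, 6.6]
alternating = [5.7, 6.6, 5.7, 6.6, 5.7, 6.6]

w = chain_weights(6)
print("clustered:  ", morans_i(clustered, w))
print("alternating:", morans_i(alternating, w))
```

Values of I well above zero indicate that similar values sit next to each other (clustering), while values well below zero indicate that neighbors tend to differ. A strongly positive result on smoothly mapped data like this is consistent with the clustering described above.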
Was this useful?
This analysis was useful, but for a different reason than expected. The SSURGO dataset is not the best tool for soil landscape analysis over a study area as small as a single valley. Throughout the class, I have seen other statewide projects that were a lot more successful due to higher variability in soils between the east and west sides of the state.
I became a tad too determined to run this kind of analysis, and the results were completely inconclusive in that respect. In the end, the most beneficial part of the analysis was figuring out that there are likely connections between my sites of interest. To investigate these connections, physical testing is likely the most reliable approach, since the SSURGO data is not reliable for this purpose.
Also, don’t rely on your data too much. It might mess with your head a bit!