Note – This is largely an edited transcript of my final presentation, which was pre-recorded and uploaded to the internet as a video-blog. You can see it here.
Background
The Valley of Oaxaca, Mexico was home to one of the first civilizations in the New World, the Zapotec State. Founded around 500 BC with the construction of the hill-top city of Monte Albán, the Zapotec State persisted for more than 1000 years before finally dissolving around AD 850. My research focuses on pottery production and exchange at the height of Zapotec Civilization during the Late Classic (AD 550 – 850), when we suspect that large secondary centers such Dainzú-Macuilxóchitl and Jalieza were beginning to break away from the capital at Monte Albán.
Figure 1: Late Classic settlement patterns in the Valley of Oaxaca, showing major sites mentioned in the text.
Our study of pottery exchange networks is intended to address how politically integrated these communities were by assessing their degree of economic interaction. To do this, we analyze elemental composition of ancient pottery recovered from excavations at these sites, and compare this to similar data for clay samples collected from across the region during the Oaxaca Clay Survey in 2007 and 2012. Figure 2 shows clay sampling locations relative to regional geology. As you can see, the geology of the area is quite complex, contributing to stark differences in the elemental composition of clay resources from different areas. The majority of our samples however were collected from the alluvial sediments from the valley floor, where sediments derived from multiple parent materials have mixed and altered through soil formation processes. Yet we know little about how interactions between these factors influence the elemental clay composition. This is important because we cannot sample every clay deposit in the region, but wish to identify possible production locales between our sampling locations.
Figure 2: Oaxaca Clay Survey sampling locations relative to regional geology.
Research Questions
- How do concentrations of individual elements in Oaxaca clays pattern spatially across the region?
- How are concentrations of individual elements related to environmental factors such as parent material, soil development, and the transport of sediment through alluvial systems?
Because this is an exploratory study, analyses will be limited to five major and minor elements measured for 137 clay samples collected from the eastern, Tlacolula arm of the valley. These elements include aluminum (Al), calcium (Ca), sodium (Na), Potassium (K), and iron (Fe).
Data
This study relies on three datasets:
- Measurements of aluminum (Al), calcium (Ca), sodium (Na), potassium (K), and iron (Fe) concentrations in 137 clay samples form the Tlacolula Valley. These data will act as response variables in regression analyses.
- Spectral reflectance and emissivity data for 13 bands of ASTER imagery, as measured at each clay sampling location during a cloud-free period in June of 2001 at a band-dependent spatial resolution of 15 to 90m (Figure 3). These data will act as proxy measures of surface lithology and serve as one set of predictor variables for clay chemistry.
- Elevation, slope, and flow accumulation estimates for each clay sampling location derived from a 90m DEM covering the Tlacolula Valley. Elevation and slope are good proxy measures for the differential processes affecting alluvial and residual soils, while flow accumulation provides a direct measure of a given location in the valley’s stream network. These data will act as a second set of predictor variables for clay chemistry.
Figure 3: Oaxaca Clay Survey sampling locations over ASTER image of the Tlacolula Valley
Hypotheses
H1 – Concentrations of individual elements will be spatially auto-correlated due to their association with different, spatially segregate parent materials.
H2 – The elemental composition of residual and alluvial clay deposits in the Tlacolula Valley, Oaxaca, Mexico, will be strongly conditioned by geologic parent material (as measured through spectral reflectance and emissivity in ASTER imagery) in both alluvial and residual soils.
H3 – The elemental composition of clay deposits will be further affected by processes of erosion and deposition (as measured through elevation, slope, and flow accumulation rasters).
H0 – The elemental composition of Tlacolula Valley Clays may be poorly predicted by both ASTER imagery and DEM-derived variables due to (1) spectral interferences from land-cover such as vegetation and pavement, and (2) elevation, slope, and flow accumulation may be inadequate proxy measures for other factors affecting sediment lithology and chemical alteration.
Approaches
Four statistical methods were used to evaluate the hypotheses listed above:
- Moran’s Index, to assess spatial auto-correlation for each element in the Tlacolula clays.
- Principal Components Analysis (PCA), to reduce the dimensionality of the ASTER data from 13 bands to 4 Principal Components (ASTPC1 – ASTPC4).
- Exploratory Multiple Regression (EMR), to identify an optimal combination of dependent variables for prediction of each element using global, least squares regression.
- Geographically Weighted Regression (GWR), to examine local deviations from global models that may result in non-stationarity in the data.
See my earlier blog-post on multiple regression for details regarding my use of Principal Components Analysis, Exploratory Multiple Regression, and Geographically Weighted Regression for analysis of these data.
Results
Moran’s I values for elemental concentrations of Al, Ca, Na, K, and Fe are highly significant for clay sampling locations in the Tlacolula Valley, supporting our first hypothesis that concentrations of each element are spatially auto-correlated, as one might expect if elemental clay chemistry is largely driven by parent material.
To clarify which factors were most related to concentrations of each element, a series of Exploratory Multiple Regressions were run using all four ASTER PCs, Elevation, Slope, and Flow Accumulation as dependent variables. Exploratory Multiple Regression is a brute-force method for identifying optimal combinations of predictor variables in complex multivariate data that works by assessing the relative fit of all possible combinations of variables.
Table 1 shows the final models selected for each of our five elements. Four things are important to note:
- All elements are predicted using a combination of ASTER PCs and DEM-derived variables, confirming that both parent material and soil formation processes influence Tlacolula Valley clay chemistry;
- Each element was best predicted using a different combination of variables;
- Elevation was not included in final models for any of the five elements; and
- While each model is statistically significant at α = 0.05, they have relatively weak measures of fit.
Table 1: Final least squares models selected for each element using Exploratory Multiple Regression.
A possible reason for the poor fit of the EMR models is that they carry the assumption of spatial stationarity. And yet, we already know from our Moran’s I tests that each of response variables is spatially auto-correlated. Geographically Weighted Regression is a good post-hoc test for evaluating whether local deviations from global fit are an issue for OLS models. Table 2 shows quasi-global measures of fit for GWR models relative those of the final OLS models selected through Exploratory Multiple Regression. Each shows slight to moderate improvement over its least squares counterpart, but what is really of interest are measures of local fit.
Table 2: Comparison of quasi-global measures of fit for GWR and OLS models predicting concentrations of five elements in Tlacolula clays. Predictor variables are those selected for final OLS models using Exploratory Multiple Regression.
Space prevents an in-depth discussion of each element, but a couple examples should be sufficient to illustrate my argument. Results of a GWR for Al show much stronger measures of local fit for the western side of the study area than the east (Figure 4), suggesting a strong lack of stationarity in the data. Results for Ca are remarkably similar (Figure 5), showing nearly identical patterns of strong local fit in the west, and poor local fit in the east. As discussed in my bog-post on multiple regression, my interpretation is that these result from strong variability in both predictor and response variables in the west relative to the east. A map of raw Ca concentrations for each clay sampling location (Figure 6) shows a number of extreme values in the west resulting from the association with limestone in these locations. There are also however a number of much lower, more normative values just downslope on the floodplain where sediments are derived from mixed lithologies. That is, in the western part of the study area there is high variability in both Ca and its predictor variables.
We can see something similar in the results of a GWR for Fe. Measures of local fit are highest in the west and abysmal in the central part of the study area on the Rio Salado floodplain (Figure 7). Again, I would argue that this is due to high levels of variability in Fe concentrations in the west related to localized differences in parent material, slope, and position in the stream network (Figure 8). In the Rio Salado floodplain, by contrast, there is little local variability in either slope as a predictor variable or Fe as a response.
High variability in both predictor and response variables will not in all cases result in a strong local or global fit. There must also be a relationship between the predictor variables and the response. However, an absence of local variability in the response variable due to spatial auto-correlation will inevitably result in lower measures of local fit.
Figure 4 (above): Local fit of GWR model for Aluminum.
Figure 5 (above): Local fit of GWR model for Calcium.
Figure 6 (above): Calcium concentrations in ppm. Size of bubbles is proportional to elemental concentrations.
Figure 7 (above): Local fit of GWR model for Iron.
Figure 6 (above): Iron concentrations in ppm. Size of bubbles is proportional to elemental concentrations.
Significance
While this is an exploratory study, and the results are preliminary, our work so far suggests that there are significant relationships between elemental clay chemistry, sediment lithology, and processes of erosion and deposition in the Tlacolula Valley. Furthermore, it shows that these relationships can be assessed using remotely sensed data derived from ASTER imagery. While the strength of these relationships varies both spatially and between individual elements, this analysis provides a view to the complex factors influencing clay chemistry in the Valley of Oaxaca.
For archaeologist working in Oaxaca, this is significant because it informs a greater understanding of the factors influencing the spatial distribution of compositionally distinct clay resources that would have been available to pottery producers. Similar methods could be employed in studies of ceramic exchange for any region with similarly complex geology.
Outside of archaeology, these methods could conceivably be used by anyone interested in how the admixture of sediments derived from multiple parent materials influences the elemental composition of alluvial soils at a regional scale. Concentrations of elements such as K, Na, and P may have an enormous effect on agricultural productivity from field to field, and this is but one example. One can imagine a range of other potential applications in Forestry, Botany, Geology, Soil Science, and Public Health. The sky is the limit.
Lessons Learned – Software
For this project, I used a combination of ENVI (for processing of the ASTER imagery), ARCGIS (for spatial data manipulation, raster calculation, Moran’s I, and Geographically Weighted Regression), and JMP (for calculation of non-spatial statistics, including Principal Components Analysis and Exploratory Multiple Regression). I was already familiar with these software, but I would be interested in any suggestions for other programs or packages that could improve these analyses.
I did get to learn how to make a Powerpoint into a Youtube video for my final presentation, which will be invaluable in my work as an ecampus instructor.
Lessons Learned – Statistics
Before this project I was unfamiliar with geographically weighted regression. After some initially promising results, I was forced to grapple with what these models were really telling me, and whether I could (in good faith) use them to generate predicted surfaces in the way that one might with a strong OLS model. I ultimately concluded that this would be inappropriate with my data and that Geographically Weighted Regression is perhaps better used as a post-hoc test for stationarity in a least squares model. That said, I recognize that this is not always how it is employed, and that other members of the class have tried to take it further. I have written about this at length in a previous blog-post, but am no more an authority on this subject than they are and would be interested in any other thoughts on the subject.