Final Post 2016 Archives - Page 2 of 2 - GEO599/GEO584-Advanced Spatial Statistics and GIS, 2013-2016

Background

Bluebunch wheatgrass is an important native species used extensively in the restoration of Great Basin habitats. Bluebunch wheatgrass has also been found to be locally adapted to climate (St. Clair et al. 2013). Seed transfer zones were developed for bluebunch wheatgrass to reduce the chances of maladaptation. Maladaptation can stem from the movement of locally adapted bluebunch wheatgrass phenotypes to differing climates where they are not likely to thrive.

Although climate clearly plays a large role in the success of this, and other plant species, it is unclear to what extent local soils may also influence genetic differentiation at the seed source population level. As more and more seed transfer zones are constructed for native species, it is important to consider all factors that may place selective pressure on plant populations. It follows that an understanding of soil-plant dynamics within the context of seed zones will help to further improve the seed transfer zone delineation process and lead to greater success in restoration efforts in general.

Because of the inherent complexity of soils, the significant effort involved in mapping them accurately, and their heterogeneous distribution in space, soils traits have been excluded from the already complex method of seed zone delineation. The focus of this study is to explore the body of soils data that exists for the Great Basin region and determine if the tools used to construct climate-based seed zones (such as existing common garden experiments) might also be used to better understand soil-plant relationships.

Research Question

Do seed-source population phenotypic traits of bluebunch wheatgrass vary with one or many soil traits that exist in the Great Basin?

Datasets

The seed zones for bluebunch wheatgrass have been delineated into shapefiles. Each polygon represents a seed zone. The boundaries of these polygons were delineated using ArcMap spatial analysis of climate (precipitation / temperature), elevation, and plant traits (leaf length, leaf width, crown width etc.). This dataset / map was developed in 2013 and is based on two years of data collection at common-garden sites in the Great Basin.
The Phenotypic trait dataset contains measurements of individual plants at sixteen different common gardens spread throughout the study area. Each seed zone is represented by two common gardens, and each common garden contains 4-5 populations from each seed zone. Measurements such as leaf length, width, and reproduction stage scores were gathered from each individual plant in all sixteen gardens. Each population of bluebunch wheatgrass growing in the common gardens is associated with a collection site (UTM / elevation / approximate area in acres). Each common garden also has an associated (UTM, elevation / and dimensions in meters).

Figure 1. Map of Study Area and Seed Zones for Bluebunch Wheatgrass

3. “The gSSURGO (soils) database is derived from the official Soil Survey Geographic (SSURGO) database. SSURGO generally has the most detailed level of soil geographic data developed by the National Cooperative Soil Survey (NCSS) in accordance with NCSS mapping standards. The tabular data represent the soil attributes and are derived from properties and characteristics stored in the National Soil Information System (NASIS). The gSSURGO data were prepared by merging the traditional vector-based SSURGO digital map data and tabular data into statewide extents, adding a statewide gridded map layer derived from the vector layer, and adding a new value-added look up table (valu) containing “ready to map” attributes. The gridded map layer is in an ArcGIS file geodatabase in raster format. The raster and vector map data have a statewide extent. The raster map data have a 10-meter cell size that approximates the vector polygons in an Albers Equal Area projection. Each cell (and polygon) is linked to a map unit identifier called the map unit key. A unique map unit key is used to link the raster cells and polygons to attribute tables” (Soil survey staff 2015).

Objective

My objective is to identify important soils characteristics at seed source locations that influence phenotypic traits in bluebunch wheatgrass. A secondary objective is to generate hypotheses about how soil traits may influence genetic differentiation in bluebunch wheatgrass. If I find correlations between bluebunch wheatgrass phenotypes and local soils, then soils and not just climate, may require more serious consideration during seed zone delineation. If I am unable to correlate bluebunch wheatgrass phenotypes with soils characteristics, then I will conclude that soils are (most likely) not a dominant factor in the genetic differentiation of this species.

Approaches

Hot Spot analysis in ArcMap of soil order and various bluebunch wheatgrass traits
Geographically weighted regression in ArcMap of available water storage of seed-source soils and bluebunch wheatgrass phenotypic traits
PCA – To visualize differences among bluebunch wheatgrass populations in phenotypic trait space
MRPP* – To test the significance of the similarity in phenotypic traits between soil classifications

Hotspot Analysis Results:

Figure 2. Map of soil orders and hot spots of crown width in a common garden in zone 7 (cool and wet)

Figure 3. Map of soil orders and hot spots of crown width in a common garden in zone 1 (hot and dry)

Discussion:

Based on these results there is not strong evidence of clustering in crown width in populations from either common gardens. Although not shown here, I also performed hot spot analysis on leaf length, width, and length-width ratio and produced similar results.

These results are not surprising given the dispersed geographic distribution of the populations selected for each common garden. Furthermore, in many cases the populations of bluebunch wheatgrass selected to represent a seed zone were spatially dispersed as well. Although many of the points are within relatively close proximity, they are usually residents of different seed zones and climate regimes with different growth habits. For this reason it is not surprising that the observed phenotypic traits do not group spatially. In areas where some indication of clustering occurs, it is likely that two populations from the same seed zone happen to also be close in proximity.

Spatially Weighted Regression Results:

Figure 4. Regression Coefficients of Leaf Width in terms of Available Water Storage (upper 25cm) for Populations Grown at the Wahluke Common Garden

Figure 5. Regression Coefficients of Leaf Width in terms of Available Water Storage (upper 25cm) for Populations Grown at the Wahluke Common Garden

Figure 6. Regression Coefficients of Leaf Length in terms of Available Water Storage (upper 25cm) for Populations Grown at the Wahluke Common Garden

Darker blue points indicate more negative coefficients. Pale blue points indicate a very slight negative correlation between AWS and leaf length.

Figure 7. Local R-squared Values of Regression Coefficients for Leaf length in terms of AWS for Populations Grown at the Wahluke Common Garden.

Interpretation

In figure 4 above, two of the populations with negative coefficients also tend to occur in moderately wet (AWS 3.7-6.8) soils. In figure 5, positive relationships exist in moderately wet to dry areas (AWS 2.5-6.8). Additionally in both figure 4 and 5, negative coefficients tend to be near other negative coefficients while positive coefficients also tend to be near other positive coefficients.

In figure 6, all of the regression coefficients were negative suggesting that to varying degrees, leaf length is negatively correlated with AWS. When compared to the local R-squared values in figure 7, it appears that the more strongly negative coefficients were also commonly associated with higher R-squared values.

PCA / MRPP Results:

Figure 8. Principle Component Analysis of Soil Order in Trait Space

Interpretation

The individual triangles represent each individual plant in each of the sixteen common gardens. Triangles that are closer in space are more similar in terms of phenotypic traits. The color-coding and enclosing polygons around the triangles represent soil orders. The radiating blue lines are indicators of the strength and direction of the relationship between each phenotypic trait and either of the principle components.

Table 1. Multiple Response Permutation Procedure (MRPP) results for the PCA above.

Interpretation

The results of the MRPP indicate that there is a significant difference in bluebunch wheatgrass phenotypic traits when grouped according to Mollisols and Aridisols. In addition there is a moderately significant result for Aridisols versus Inceptisols. The low A statistic however indicates that there is very little within-group agreement (meaning that there is high trait variance within all of the soil orders). This suggests that although the differences we see are not likely attributable to chance, grouping by soil order is at best a weak predictor of phenotypic traits in bluebunch wheatgrass.

Significance

Although the results of my analysis are modest at best, it is important to recognize that bluebunch wheatgrass phenotypic traits may not be adapted to soils, or that the soil traits I have elected to analyze for this course are not strong drivers of selection. If no strong relationships between soils and local phenotypes emerge through this exploratory study then my work will support the current methodology in seed zone delineation which excludes soils.

Learning: Software

ArcMap

A huge hurdle that I overcame was learning how to utilize the gSSURGO soils database and extract relevant data for analysis in ArcMap as well as other programs. Now that I have spent considerable time working with this immense dataset I am much more confident in my ability to continue to analyze interesting soil-phenotype relationships in the future.

I also learned how to perform a hot spot analysis and a geographically weighted regression. Although being limited to the analysis of one common garden and phenotypic trait at a time does not provide particularly helpful results, I do find it very valuable to understand.

Learning: Statistics

ArcMap

This was my first exposure to geographically weighted regression and the process of mapping the regression coefficients and their associated R-squared values enhanced my ability to interpret the results.

PC-Ord

The process of interpreting figure 8 was very powerful for me. As I learn more and more about PCA and other multivariate approaches to statistical analysis, I am encouraged that they might be powerful tools for meeting my research objectives.

Other

Although I didn’t have time to gain personal experience with Kriging, wavelet analysis, or other tools that my classmates used, I learned a lot from their presentations about how to interpret the results and problem solve with them.

Note – This is largely an edited transcript of my final presentation, which was pre-recorded and uploaded to the internet as a video-blog. You can see it here.

Background

The Valley of Oaxaca, Mexico was home to one of the first civilizations in the New World, the Zapotec State. Founded around 500 BC with the construction of the hill-top city of Monte Albán, the Zapotec State persisted for more than 1000 years before finally dissolving around AD 850. My research focuses on pottery production and exchange at the height of Zapotec Civilization during the Late Classic (AD 550 – 850), when we suspect that large secondary centers such Dainzú-Macuilxóchitl and Jalieza were beginning to break away from the capital at Monte Albán.

Microsoft PowerPoint - Pink_MacuilTanSack2016.pptx

Figure 1: Late Classic settlement patterns in the Valley of Oaxaca, showing major sites mentioned in the text.

Our study of pottery exchange networks is intended to address how politically integrated these communities were by assessing their degree of economic interaction. To do this, we analyze elemental composition of ancient pottery recovered from excavations at these sites, and compare this to similar data for clay samples collected from across the region during the Oaxaca Clay Survey in 2007 and 2012. Figure 2 shows clay sampling locations relative to regional geology. As you can see, the geology of the area is quite complex, contributing to stark differences in the elemental composition of clay resources from different areas. The majority of our samples however were collected from the alluvial sediments from the valley floor, where sediments derived from multiple parent materials have mixed and altered through soil formation processes. Yet we know little about how interactions between these factors influence the elemental clay composition. This is important because we cannot sample every clay deposit in the region, but wish to identify possible production locales between our sampling locations.

OaxacaGeo

Figure 2: Oaxaca Clay Survey sampling locations relative to regional geology.

Research Questions

How do concentrations of individual elements in Oaxaca clays pattern spatially across the region?
How are concentrations of individual elements related to environmental factors such as parent material, soil development, and the transport of sediment through alluvial systems?

Because this is an exploratory study, analyses will be limited to five major and minor elements measured for 137 clay samples collected from the eastern, Tlacolula arm of the valley. These elements include aluminum (Al), calcium (Ca), sodium (Na), Potassium (K), and iron (Fe).

Data

This study relies on three datasets:

Measurements of aluminum (Al), calcium (Ca), sodium (Na), potassium (K), and iron (Fe) concentrations in 137 clay samples form the Tlacolula Valley. These data will act as response variables in regression analyses.
Spectral reflectance and emissivity data for 13 bands of ASTER imagery, as measured at each clay sampling location during a cloud-free period in June of 2001 at a band-dependent spatial resolution of 15 to 90m (Figure 3). These data will act as proxy measures of surface lithology and serve as one set of predictor variables for clay chemistry.
Elevation, slope, and flow accumulation estimates for each clay sampling location derived from a 90m DEM covering the Tlacolula Valley. Elevation and slope are good proxy measures for the differential processes affecting alluvial and residual soils, while flow accumulation provides a direct measure of a given location in the valley’s stream network. These data will act as a second set of predictor variables for clay chemistry.

Figure 3: Oaxaca Clay Survey sampling locations over ASTER image of the Tlacolula Valley

Hypotheses

H₁ – Concentrations of individual elements will be spatially auto-correlated due to their association with different, spatially segregate parent materials.

H₂ – The elemental composition of residual and alluvial clay deposits in the Tlacolula Valley, Oaxaca, Mexico, will be strongly conditioned by geologic parent material (as measured through spectral reflectance and emissivity in ASTER imagery) in both alluvial and residual soils.

H₃ – The elemental composition of clay deposits will be further affected by processes of erosion and deposition (as measured through elevation, slope, and flow accumulation rasters).

H₀ – The elemental composition of Tlacolula Valley Clays may be poorly predicted by both ASTER imagery and DEM-derived variables due to (1) spectral interferences from land-cover such as vegetation and pavement, and (2) elevation, slope, and flow accumulation may be inadequate proxy measures for other factors affecting sediment lithology and chemical alteration.

Approaches

Four statistical methods were used to evaluate the hypotheses listed above:

Moran’s Index, to assess spatial auto-correlation for each element in the Tlacolula clays.
Principal Components Analysis (PCA), to reduce the dimensionality of the ASTER data from 13 bands to 4 Principal Components (ASTPC1 – ASTPC4).
Exploratory Multiple Regression (EMR), to identify an optimal combination of dependent variables for prediction of each element using global, least squares regression.
Geographically Weighted Regression (GWR), to examine local deviations from global models that may result in non-stationarity in the data.

See my earlier blog-post on multiple regression for details regarding my use of Principal Components Analysis, Exploratory Multiple Regression, and Geographically Weighted Regression for analysis of these data.

Results

Moran’s I values for elemental concentrations of Al, Ca, Na, K, and Fe are highly significant for clay sampling locations in the Tlacolula Valley, supporting our first hypothesis that concentrations of each element are spatially auto-correlated, as one might expect if elemental clay chemistry is largely driven by parent material.

To clarify which factors were most related to concentrations of each element, a series of Exploratory Multiple Regressions were run using all four ASTER PCs, Elevation, Slope, and Flow Accumulation as dependent variables. Exploratory Multiple Regression is a brute-force method for identifying optimal combinations of predictor variables in complex multivariate data that works by assessing the relative fit of all possible combinations of variables.

Table 1 shows the final models selected for each of our five elements. Four things are important to note:

All elements are predicted using a combination of ASTER PCs and DEM-derived variables, confirming that both parent material and soil formation processes influence Tlacolula Valley clay chemistry;
Each element was best predicted using a different combination of variables;
Elevation was not included in final models for any of the five elements; and
While each model is statistically significant at α = 0.05, they have relatively weak measures of fit.

Table 1: Final least squares models selected for each element using Exploratory Multiple Regression.

A possible reason for the poor fit of the EMR models is that they carry the assumption of spatial stationarity. And yet, we already know from our Moran’s I tests that each of response variables is spatially auto-correlated. Geographically Weighted Regression is a good post-hoc test for evaluating whether local deviations from global fit are an issue for OLS models. Table 2 shows quasi-global measures of fit for GWR models relative those of the final OLS models selected through Exploratory Multiple Regression. Each shows slight to moderate improvement over its least squares counterpart, but what is really of interest are measures of local fit.

Table 2: Comparison of quasi-global measures of fit for GWR and OLS models predicting concentrations of five elements in Tlacolula clays. Predictor variables are those selected for final OLS models using Exploratory Multiple Regression.

Space prevents an in-depth discussion of each element, but a couple examples should be sufficient to illustrate my argument. Results of a GWR for Al show much stronger measures of local fit for the western side of the study area than the east (Figure 4), suggesting a strong lack of stationarity in the data. Results for Ca are remarkably similar (Figure 5), showing nearly identical patterns of strong local fit in the west, and poor local fit in the east. As discussed in my bog-post on multiple regression, my interpretation is that these result from strong variability in both predictor and response variables in the west relative to the east. A map of raw Ca concentrations for each clay sampling location (Figure 6) shows a number of extreme values in the west resulting from the association with limestone in these locations. There are also however a number of much lower, more normative values just downslope on the floodplain where sediments are derived from mixed lithologies. That is, in the western part of the study area there is high variability in both Ca and its predictor variables.

We can see something similar in the results of a GWR for Fe. Measures of local fit are highest in the west and abysmal in the central part of the study area on the Rio Salado floodplain (Figure 7). Again, I would argue that this is due to high levels of variability in Fe concentrations in the west related to localized differences in parent material, slope, and position in the stream network (Figure 8). In the Rio Salado floodplain, by contrast, there is little local variability in either slope as a predictor variable or Fe as a response.

High variability in both predictor and response variables will not in all cases result in a strong local or global fit. There must also be a relationship between the predictor variables and the response. However, an absence of local variability in the response variable due to spatial auto-correlation will inevitably result in lower measures of local fit.

GWR_Al

Figure 4 (above): Local fit of GWR model for Aluminum.

Figure 5 (above): Local fit of GWR model for Calcium.

Figure 6 (above): Calcium concentrations in ppm. Size of bubbles is proportional to elemental concentrations.

Figure 7 (above): Local fit of GWR model for Iron.

Figure 6 (above): Iron concentrations in ppm. Size of bubbles is proportional to elemental concentrations.

Significance

While this is an exploratory study, and the results are preliminary, our work so far suggests that there are significant relationships between elemental clay chemistry, sediment lithology, and processes of erosion and deposition in the Tlacolula Valley. Furthermore, it shows that these relationships can be assessed using remotely sensed data derived from ASTER imagery. While the strength of these relationships varies both spatially and between individual elements, this analysis provides a view to the complex factors influencing clay chemistry in the Valley of Oaxaca.

For archaeologist working in Oaxaca, this is significant because it informs a greater understanding of the factors influencing the spatial distribution of compositionally distinct clay resources that would have been available to pottery producers. Similar methods could be employed in studies of ceramic exchange for any region with similarly complex geology.

Outside of archaeology, these methods could conceivably be used by anyone interested in how the admixture of sediments derived from multiple parent materials influences the elemental composition of alluvial soils at a regional scale. Concentrations of elements such as K, Na, and P may have an enormous effect on agricultural productivity from field to field, and this is but one example. One can imagine a range of other potential applications in Forestry, Botany, Geology, Soil Science, and Public Health. The sky is the limit.

Lessons Learned – Software

For this project, I used a combination of ENVI (for processing of the ASTER imagery), ARCGIS (for spatial data manipulation, raster calculation, Moran’s I, and Geographically Weighted Regression), and JMP (for calculation of non-spatial statistics, including Principal Components Analysis and Exploratory Multiple Regression). I was already familiar with these software, but I would be interested in any suggestions for other programs or packages that could improve these analyses.

I did get to learn how to make a Powerpoint into a Youtube video for my final presentation, which will be invaluable in my work as an ecampus instructor.

Lessons Learned – Statistics

Before this project I was unfamiliar with geographically weighted regression. After some initially promising results, I was forced to grapple with what these models were really telling me, and whether I could (in good faith) use them to generate predicted surfaces in the way that one might with a strong OLS model. I ultimately concluded that this would be inappropriate with my data and that Geographically Weighted Regression is perhaps better used as a post-hoc test for stationarity in a least squares model. That said, I recognize that this is not always how it is employed, and that other members of the class have tried to take it further. I have written about this at length in a previous blog-post, but am no more an authority on this subject than they are and would be interested in any other thoughts on the subject.

GEO599/GEO584-Advanced Spatial Statistics and GIS, 2013-2016

Just another blogs.oregonstate.edu site

Category Archives: Final Post 2016

An Exploratory Analysis of Genetic Differentiation in Bluebunch Wheatgrass Seed Source Populations According to Soil Traits in the Great Basin

Background

Research Question

Do seed-source population phenotypic traits of bluebunch wheatgrass vary with one or many soil traits that exist in the Great Basin?

Datasets

Figure 1. Map of Study Area and Seed Zones for Bluebunch Wheatgrass

Objective

Approaches

Hotspot Analysis Results:

Figure 2. Map of soil orders and hot spots of crown width in a common garden in zone 7 (cool and wet)

Figure 3. Map of soil orders and hot spots of crown width in a common garden in zone 1 (hot and dry)

Discussion:

Spatially Weighted Regression Results:

Figure 4. Regression Coefficients of Leaf Width in terms of Available Water Storage (upper 25cm) for Populations Grown at the Wahluke Common Garden

Figure 5. Regression Coefficients of Leaf Width in terms of Available Water Storage (upper 25cm) for Populations Grown at the Wahluke Common Garden

Figure 6. Regression Coefficients of Leaf Length in terms of Available Water Storage (upper 25cm) for Populations Grown at the Wahluke Common Garden

Figure 7. Local R-squared Values of Regression Coefficients for Leaf length in terms of AWS for Populations Grown at the Wahluke Common Garden.

Interpretation

PCA / MRPP Results:

Figure 8. Principle Component Analysis of Soil Order in Trait Space

Interpretation

Table 1. Multiple Response Permutation Procedure (MRPP) results for the PCA above.

Interpretation

Significance

Learning: Software

ArcMap

Learning: Statistics

ArcMap

PC-Ord

Other

Evaluating Environmental Predictors of Elemental Clay Chemistry in the Tlacolula Valley, Oaxaca, Mexcio

Contact Info