The Research Question and Background
Soil microbes are critical to soil function because they form the functional backbone of many soil ecosystem services, such as carbon and nitrogen cycling. The structure and composition of microbial communities can strongly influence the rates at which these services are performed; therefore, determining the spatial distribution of these communities is necessary to accurately estimate soil function across a landscape. This is difficult, however, because microbial communities are extremely diverse: one gram of soil can contain over a billion microbes belonging to anywhere from 4,000 to 50,000 different species. Two competing ideas describe the forces governing the spatial distribution of microbial communities. The first, popularized by Baas Becking (1934), states that “Everything is everywhere, but the environment selects.” This idea is in line with classical niche theory in ecology, in which a species' distribution is dictated by its niche. The opposing paradigm holds that microbes are limited more by their dispersal potential than by environmental selection. This idea reflects neutral theory, which states that species distributions are entirely stochastic and that the species composition at any given location is determined by the species composition of the neighboring areas.
Until recently, these theories had rarely been tested in microbial communities. With the advent of high-throughput sequencing, however, an in situ view of microbial communities became possible. In a recent study, O’Brien et al. (2016) found that at a fine scale (1 cm²), community composition was extremely heterogeneous and correlated only weakly with environmental factors; as the scale and scope increased, however, patterns in microbial community distribution began to emerge and were closely linked to environmental parameters. This study illustrates that the spatial scale of a microbial community study strongly influences its results.
In this study, I want to know which principles govern microbial distributions across a regional landscape. Are microbial communities across a regional landscape influenced by environmental parameters, or is their distribution dictated more by dispersal limitation? Some work at a similar scale has found that bacterial diversity is more closely linked to environmental parameters than to dispersal limitation. Specifically, studies from around the world have found soil pH to be the most influential environmental factor (Fierer and Jackson 2006, Griffiths et al. 2011).
The Dataset
During the spring of 2015, 113 soil samples were collected across the state of Oregon to capture the spatial pattern of microbial community structure across a regional landscape. Sampling locations were stratified by Common Resource Areas (CRAs), which are geographically associated land resource regions defined by geology, climate, water, soils, biological resources, and land use. This stratification was used to capture as much of the soil's spatial heterogeneity across the regional landscape as possible. A subset of these sites (90 of 113) had previously been sampled by the NRCS (Natural Resources Conservation Service) in a soil series survey, in which extensive lab work was conducted to catalog each soil's physical and chemical properties.
Hypothesis
The physical and chemical properties of the soil will have a greater influence on microbial communities than spatial distance. In essence, at the regional landscape scale, microbial community composition will be linked more closely to environmental factors in the soil than to the geographic distance between samples.
If this is true, then we need to be able to quantify the landscape-scale heterogeneity of these environmental parameters. In this blog post, I use the 90 NRCS sample points to evaluate how well several interpolation methods capture the spatial heterogeneity of edaphic properties, using soil pH as a proxy. To assess the accuracy of these methods, I treat the soil pH values in the SSURGO (Soil Survey Geographic) database as ground truth. This lets me compare the interpolation techniques and determine their strengths and weaknesses.
Approaches
The first method extrapolated each pH point value over its respective CRA polygon. In theory, since these CRA polygons were delineated by similar environmental factors, edaphic properties such as soil pH should be conserved within each polygon.
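A minimal Python sketch of the CRA approach (not the R code used for this project), assuming the CRA polygons have already been rasterized so that every pixel carries an integer CRA id. The function name `cra_extrapolate` is hypothetical, and averaging when a CRA contains more than one sample point is my assumption, since the text does not say how that case was handled.

```python
import numpy as np


def cra_extrapolate(cra_grid, sample_cra, sample_ph):
    """Assign every pixel the sample pH of the CRA polygon it falls in.

    cra_grid:   2-D int array giving the CRA id of each pixel
                (assumes the CRA polygons were already rasterized)
    sample_cra: 1-D int array, CRA id of each sample point
    sample_ph:  1-D float array, measured pH at each sample point
    Pixels in CRAs that contain no sample point stay NaN.
    """
    ph_map = np.full(cra_grid.shape, np.nan)
    for cra_id in np.unique(sample_cra):
        # If a CRA holds several samples, use their mean (an assumption).
        ph_map[cra_grid == cra_id] = sample_ph[sample_cra == cra_id].mean()
    return ph_map


# Toy example: three CRAs on a 3 x 3 grid, with no sample in CRA 3.
cra_grid = np.array([[1, 1, 2],
                     [1, 2, 2],
                     [3, 3, 2]])
ph_map = cra_extrapolate(cra_grid, np.array([1, 2, 2]), np.array([6.0, 7.0, 8.0]))
```

Every pixel inside a polygon gets exactly the same value, which is what produces the sharp boundaries between CRAs noted in the results.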
I also tried inverse distance weighted (IDW) interpolation. This method estimates the value at any unknown point as a weighted average of all known points, where each known point's weight decreases with its distance from the unknown point (typically in proportion to the inverse of the distance raised to a power).
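A minimal Python sketch of IDW (not the R code used for this project), assuming planar x/y coordinates and a distance exponent of 2, a common default; the power actually used here is not stated. The function name `idw` is illustrative.

```python
import numpy as np


def idw(xy_known, z_known, xy_query, power=2.0):
    """Inverse distance weighted estimates at each query location.

    xy_known: (n, 2) array of sample coordinates
    z_known:  (n,) array of sample values (e.g. pH)
    xy_query: (m, 2) array of coordinates to estimate
    power:    distance exponent; 2 is a common default, assumed here
    """
    # Distance from every query point to every sample, shape (m, n)
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    estimates = np.empty(len(xy_query))
    for i, row in enumerate(d):
        exact = row == 0
        if exact.any():
            # Query sits on a sample point: IDW returns that sample's value.
            estimates[i] = z_known[exact][0]
        else:
            w = 1.0 / row**power  # nearer samples get larger weights
            estimates[i] = np.sum(w * z_known) / np.sum(w)
    return estimates


xy_known = np.array([[0.0, 0.0], [2.0, 0.0]])
z_known = np.array([6.0, 8.0])
est = idw(xy_known, z_known, np.array([[1.0, 0.0], [0.5, 0.0], [0.0, 0.0]]))
```

Halfway between the two samples the estimate is 7.0, while at x = 0.5 it is pulled toward the nearer sample (6.2). Because every estimate is dominated by its nearest samples, IDW tends to form the local "pockets" around sample points described in the results.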
The last interpolation method was ordinary kriging. Like IDW, kriging weights points by their distances; unlike IDW, however, it accounts for the fact that beyond a certain distance, points provide no further useful information about the variation at a given location. That distance is determined from the variogram, so kriging uses a fitted variogram model to decide how far from any given point information should come from and how much weight each neighbor receives. The figure below shows the semivariance between pairs of points as a function of the distance separating them. The fitted line is the variogram model used in the ordinary kriging interpolation; it determines the weights for the weighted average.
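A minimal Python sketch of the pieces described above (not the R code used for this project): an empirical semivariogram, a spherical variogram model, and the ordinary kriging system that turns the model into weights. The spherical model type and its nugget, sill, and range values are assumptions for illustration; in practice they would be fitted to the empirical semivariogram (for example by least squares). All function names are hypothetical.

```python
import numpy as np


def empirical_semivariogram(xy, z, bin_edges):
    """Average semivariance 0.5 * (z_i - z_j)**2 of point pairs in each distance bin."""
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    sq = 0.5 * (z[:, None] - z[None, :]) ** 2
    iu = np.triu_indices(len(z), k=1)  # count each pair once
    d, sq = d[iu], sq[iu]
    gamma = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (d >= lo) & (d < hi)
        gamma.append(sq[in_bin].mean() if in_bin.any() else np.nan)
    return np.array(gamma)


def spherical(h, nugget, sill, rng):
    """Spherical variogram model; semivariance levels off at `sill` beyond range `rng`."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / rng, 1.0)
    gamma = nugget + (sill - nugget) * (1.5 * r - 0.5 * r**3)
    return np.where(h == 0, 0.0, gamma)  # gamma(0) = 0 by definition


def ordinary_krige(xy_known, z_known, xy0, nugget, sill, rng):
    """Ordinary kriging estimate and weights at a single location xy0."""
    n = len(z_known)
    d = np.linalg.norm(xy_known[:, None, :] - xy_known[None, :, :], axis=2)
    # Kriging system: [Gamma 1; 1^T 0] [w; mu] = [gamma0; 1]
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = spherical(d, nugget, sill, rng)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = spherical(np.linalg.norm(xy_known - xy0, axis=1), nugget, sill, rng)
    weights = np.linalg.solve(A, b)[:n]  # last entry is the Lagrange multiplier
    return float(weights @ z_known), weights


# Toy example with assumed (not fitted) variogram parameters.
xy = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
ph = np.array([6.0, 8.0, 7.0])
gamma_hat = empirical_semivariogram(xy, ph, np.array([0.0, 2.5, 5.0]))
est, w = ordinary_krige(xy, ph, np.array([1.0, 1.0]), nugget=0.0, sill=1.0, rng=10.0)
```

The extra row and column of ones force the weights to sum to 1, and beyond the range the model's semivariance is flat at the sill, which is how the variogram limits the useful neighborhood.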
After interpolation, each map was compared with the SSURGO map to test its performance. The comparison was made by computing the absolute difference in pH at each pixel and summing these differences over the map into a single score. Because the SSURGO map does not cover all of Oregon, each interpolated map was clipped to the same spatial extent as the SSURGO map.
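A minimal Python sketch of this comparison (not the R code used for this project), assuming the score is the sum of absolute per-pixel pH differences and that both maps are already on a common grid, with NaN marking pixels outside the SSURGO coverage. The function name `compare_to_ssurgo` is hypothetical.

```python
import numpy as np


def compare_to_ssurgo(interp, ssurgo):
    """Per-pixel absolute pH difference and its sum over the shared extent.

    interp, ssurgo: 2-D arrays on the same grid. NaN marks pixels with no
    value (e.g. outside SSURGO coverage); NaN pixels drop out of the sum,
    which plays the role of clipping to the SSURGO extent.
    """
    diff_map = np.abs(interp - ssurgo)   # NaN wherever either map is NaN
    score = float(np.nansum(diff_map))   # lower score = closer to SSURGO
    return score, diff_map


interp = np.array([[6.0, 7.0], [np.nan, 5.0]])
ssurgo = np.array([[6.5, 6.0], [7.0, np.nan]])
score, diff_map = compare_to_ssurgo(interp, ssurgo)
```

The per-pixel `diff_map` is also useful on its own: mapping it shows where an interpolation disagrees most with SSURGO, not just by how much overall.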
Results
The map shown below shows soil pH values extrapolated over their respective CRAs. The Willamette Valley is quite homogeneous in soil pH, while eastern Oregon is considerably more heterogeneous. The problem with this map is that it draws sharp lines between CRA polygons, whereas in theory real pH values would change continuously as one moves from one CRA to another.
The IDW map appears to show the same pattern in the Willamette Valley; however, the heterogeneity of eastern Oregon appears blended together. The IDW interpolation also developed pockets of similar pH values that look more like artifacts of the IDW algorithm than real characteristics of the landscape.
Like IDW, ordinary kriging blended the heterogeneity of eastern Oregon together; however, it did so without creating as many of the artificial pockets that the IDW method produced.
The main difference between the SSURGO map and the interpolated maps is in western Oregon: the SSURGO map shows much more heterogeneity than the interpolated maps, particularly in the Willamette Valley. It also shows a large amount of heterogeneity in southeastern Oregon that only the CRA map captured.
As expected, the IDW map performed the worst (a variance score of 2676), while the kriging and CRA maps had very similar results (variance scores of 2440 and 2414, respectively; lower scores indicate closer agreement with SSURGO). This is quite surprising, since the kriging map appeared to homogenize soil pH in eastern Oregon, where both the CRA map and the SSURGO map show considerable heterogeneity. Below is a map of the difference in pH between the CRA map and the SSURGO map. The key areas where the CRA map failed were areas of high heterogeneity in the SSURGO map.
Significance
This analysis shows that edaphic properties are highly heterogeneous across this landscape. There may also be parameters significantly correlated with microbial community composition that are not captured in the SSURGO database. If so, it may be inappropriate to interpolate these values from our sampling points alone.
Finally, if geographic distance turns out to be the best proxy for microbial community distribution at this scale, an interpolated map of the community needs to be presented in a way that helps the audience understand where the map is reliable and where we are fairly confident it is wrong.
What I Learned: Software
The majority of the project was conducted in the R statistical environment, for several reasons. First, I was already familiar with the language. Moreover, R is quite powerful in its data manipulation capabilities and its ability to run complex statistics with its base functions. Because R is also completely free and available to anyone, it allows me to conduct my analyses without relying on paid programs. The main drawback of R is its rather steep learning curve, which made ArcGIS preferable in some respects. I found ArcGIS useful for its ability to display results and data quickly and with minimal effort, presenting a large amount of information in a user-friendly way. However, when attempting statistical interpolation methods in ArcGIS, I felt rather limited in how much I could customize them and had no insight into the underlying processes during interpolation.
R, on the other hand, requires more work upfront to create and display raster and vector files; however, once the data are loaded, manipulating and analyzing them is quite straightforward. R is also transparent for spatial analysis: each function comes with its own documentation describing exactly what happens when it runs. It also requires users to have a working understanding of the underlying methodology when doing spatial analysis, which is always a good thing.
All in all, both programs have their niches and best uses, and one should not feel limited to just one. Exploiting the strengths of each program, and sidestepping their relative weaknesses by switching platforms, should be everyone's goal when conducting any sort of analysis, including spatial analysis.
What I Learned: Spatial Statistics
Before joining this class, I had a very limited understanding of spatial statistics. Through the class exercises and a lot of reading, I developed a comfortable understanding of the basic ideas behind several spatial statistics methods, including IDW, kriging, geographically weighted regression, PCA ordination, and boosted regression trees. However, I feel the most important lesson I learned was how to critically interpret the results of these analyses, such as geographically weighted regression, in ways that make intuitive sense.
Works Cited
Fierer, Noah, and Robert B. Jackson. “The Diversity and Biogeography of Soil Bacterial Communities.” Proceedings of the National Academy of Sciences of the United States of America 103, no. 3 (January 17, 2006): 626–31. doi:10.1073/pnas.0507535103.
Griffiths, Robert I., Bruce C. Thomson, Phillip James, Thomas Bell, Mark Bailey, and Andrew S. Whiteley. “The Bacterial Biogeography of British Soils.” Environmental Microbiology 13, no. 6 (June 1, 2011): 1642–54. doi:10.1111/j.1462-2920.2011.02480.x.
O’Brien, Sarah L., Sean M. Gibbons, Sarah M. Owens, Jarrad Hampton-Marcell, Eric R. Johnston, Julie D. Jastrow, Jack A. Gilbert, Folker Meyer, and Dionysios A. Antonopoulos. “Spatial Scale Drives Patterns in Soil Bacterial Diversity.” Environmental Microbiology, March 1, 2016. doi:10.1111/1462-2920.13231.