
Courtney’s Final Project Post

Research Question

  • “How is the spatial pattern of ion and isotope concentrations in wells tapping the basalt aquifer related to the spatial pattern of mapped faults via the mechanism of groundwater flow as determined by hydraulic transmissivity of the geologic setting?”

Description of dataset that I examined

  • A: In my research I have analytical data for 31 wells, whose XY locations were determined by field confirmation of the Oregon Water Resources Department (OWRD) well log database. As groundwater is a 3D system, I have to consider Z values as well. The well depths and lithology information are also from the OWRD database. My analytical data provide a snapshot of water chemistry during the summer of 2018, with only one temporal data point per well. At all 31 wells, I collected samples to be analyzed for pH, temperature, specific conductivity, oxygen isotopes 16 and 18, and hydrogen isotopes 1 and 2. At a subset of 18 of those wells I collected additional samples for tritium, carbon-14, and major ion analysis.
  • B: The shapefile of faults mapped at the surface was created by Madin and Geitgey of the USGS in their 2007 publication on the geology of the Umatilla Basin. There is some uncertainty in my analysis as far as extending this surface information into the subsurface. USGS studies have constrained proposed ranges of dip angles for the families of faults that I am studying, but not exact angles for any single mapped fault.
  • C: Results of pumping interference tests involving 29 wells, 12 of which I had chemical data for. The data were collected by the OWRD in 2018 and 2019.

Hypotheses

  • Faults can act as conduits or barriers to groundwater flow, depending on how their transmissivity compares to the transmissivity of the host rock.
  • I hypothesize that clusters of similar chemical and isotopic properties of groundwater can indicate a shared aquifer unit/compartment, and that if faults separate clusters then the fault is influencing that difference in chemical/isotopic signatures. If the fault is between two clusters, I hypothesize that it is acting as a barrier. If it crosses through a cluster, I hypothesize that it acts as a conduit.
  • Where faults act as barriers, I hypothesize that parameter values will differ in groups on either side of a fault. Specifically, a barrier fault might cause older, warmer water to rise into upper aquifer layers, and the downstream well might show a signature of more local recharge.
  • Where faults act as conduits, I hypothesize that water chemistry and isotopes of samples from wells on either side of the fault would indicate a relatively direct flowpath from the upstream well to a downstream well. Over a short distance, this means that ion and isotope concentrations would not differ significantly in wells across the fault.
  • My hypotheses depend on a “barrier” fault damming groundwater flow up-gradient of the fault, and compartmentalizing local recharge on the down-gradient side. This is only relevant if the fault is roughly perpendicular to the flow direction, and so disrupts transmissivity between a recharge zone and the wells. If a fault that separates two wells is parallel to the flow direction and there is no obstacle between the wells and upstream recharge areas, then the fault might indeed limit communication between the wells, but they will have similar chemical signatures. Wells separated by this second kind of fault barrier would be better evaluated by a physical test of communication, such as a pumping interference test.

Analysis Approaches

  • Principal component analysis: used to simplify the multivariate data set (19 variables!) into variable relationships that could represent aquifer processes
  • Analysis of PCA results compared to distance from a flow path
    • Interpolation of well water levels classified by well stratigraphy to estimate a potentiometric surface and evaluate groundwater flow directions.
    • Raster calculations to compare flow direction to fault incidence angle
    • Measuring distance from each well to the nearest fault along the flow path
    • Simple linear regression comparing the Non-ion PC1 score of a well with its distance from a fault.
  • Two-sided t-tests comparing distance between wells, presence of a fault, and pumping interference test communication between wells
  • ANOVA F-tests comparing chemical and isotopic variance within groups of wells that communicate with each other and between those groups.
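
The comparison of flow direction with fault orientation comes down to the angle between two azimuths. Here is a minimal Python sketch of that idea, assuming both the flow direction and the fault strike are given as compass azimuths in degrees. The function name and the example values are hypothetical and are not taken from the raster workflow described above.

```python
def incidence_angle(flow_azimuth_deg, fault_strike_deg):
    """Acute angle (0-90 degrees) between a flow direction and a fault strike.

    A fault strike is an undirected line, so strikes of 10 and 190 degrees
    describe the same orientation; the result is folded into 0-90.
    """
    diff = abs(flow_azimuth_deg - fault_strike_deg) % 180.0
    return min(diff, 180.0 - diff)


# Hypothetical values: flow toward the northwest, two differently oriented faults.
print(incidence_angle(315.0, 45.0))   # flow perpendicular to the strike
print(incidence_angle(315.0, 130.0))  # flow nearly parallel to the strike
```

An angle near 90 degrees would mark a fault that crosses the flow path and so could be tested as a barrier, while an angle near 0 would mark a fault running parallel to flow.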

Results

  • Principal component analysis – Patterns of variation between wells are statistically explained primarily by total ion concentration, a relationship between chemical evolution from ion exchange and decreasing stable isotope ratios, and the combination of well depth and water temperature. Moran’s I indicates that only Non-ion PC2 is spatially clustered, while the other PC models have a distribution not significantly different from random. The other PC models are useful to understand the groundwater system, but not specifically to analyze clustering correlated to faults.
  • Interpolation of water level, and comparison of fault incidence angle with flow direction, indicate which faults can and cannot be tested against my hypotheses.
  • Analysis of PCA results compared to distance from a fault along flow path – some wells that are “within” a fault zone have very old signatures and others have very young signatures. This could be related to the angle of the dip of the fault and the accuracy of mapping compared to the depth of the well. I hypothesize that the wells that are in the fault zone but have high PC1 scores are on the up-gradient side of the fault where older water is upwelling along a barrier. Wells in fault zones with low PC1 scores could indicate wells open to downgradient areas of the fault, where vertical recharge through the fault damage zone is able to reach the well.
  • Returning to the conclusions I wrote in that blog post after I found improved stratigraphic data, I’m not sure if I can make conclusions other than those about the wells that are mapped as “inside” a fault. Several wells that are closer to faults are also open to shallower aquifer units, and so the effect of lower PC1 scores closer downgradient to faults might be confounded by lower PC1 scores caused by vertical inflow from the sedimentary aquifer and upper Saddle Mountain aquifer.
  • Two-sided t-tests comparing distance between wells, presence of a fault, and communication between wells show that the presence of a fault has a greater effect on communication than the distance between the wells.
  • ANOVA F-tests comparing chemical and isotopic variance within groups of wells that communicate with each other and between those groups – stable isotopes and specific conductivity both show more variation between well groups than within well groups.
  • Not covered in these blog posts, I also ran Moran’s I on my inputs to see which ones are clustered and so might be more related to horizontal grouping factors (such as faults) than vertical grouping parameters (such as stratigraphic unit). Of the PCA and individual variables, only d18O, d2H, and Non-ion PC2 (combination of well depth and water temperature) were clustered. The other PCA models, temperature, pH, and specific conductivity were not significantly spatially clustered.
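
The ANOVA F-test mentioned above compares variation between groups of communicating wells with variation within those groups. Here is a minimal Python sketch of the F statistic for a one-way ANOVA. The function name, the grouping, and the d18O values are made up for illustration and are not the measured data.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of numbers."""
    all_values = [v for g in groups for v in g]
    n = len(all_values)          # total number of observations
    k = len(groups)              # number of groups
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of each observation around its own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical d18O values (per mil) for two groups of communicating wells.
group_a = [-15.2, -15.0, -15.4]
group_b = [-13.1, -13.3, -12.9]
print(one_way_anova_f([group_a, group_b]))
```

A large F means the group means differ by more than the scatter inside each group would suggest. Turning F into a p-value additionally needs the F distribution with (k - 1, n - k) degrees of freedom, which this sketch leaves out.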

Significance – How groundwater signatures are related to faults may agree or disagree with past understandings of differences between wells in the region, and can inform well management. If a senior permit holder puts a call on the water right and asks for junior users to be regulated off, it would not help that senior user if one of those junior permit holders’ wells is not hydraulically connected to the senior user’s well.

  • More wells would need to be sampled to be better able to disentangle the effects of faults from the effects of well stratigraphy.

My learning – I learned significantly more about how to code and troubleshoot in R. Additionally, I learned about the process of performing spatial clustering analysis in ArcGIS.

What did I learn about statistics?

  • PCA was completely new to me, and it’s a cool method for dealing with multivariate data once I dealt with the steep learning curve involved in setting it up and interpreting the results. It was useful getting more practice performing and interpreting t-tests and ANOVA F-tests. I had not used spatial clustering before, and learning how to apply it was interesting. It gave me a much more concrete tool to try to disentangle the patterns in my effectively 3D data in the X,Y plane, as opposed to the Z direction.

Spatial pattern of ventenata invasion in eastern Oregon: Final Project

  1. The research question that you asked.

I initially asked the question, “how is the spatial pattern of invasion by the recently introduced annual grass, ventenata, influenced by the spatial pattern of suitable habitat patches (scablands) via the susceptibility of these habitat patches to invasion and ventenata’s invasion potential?”

  2. A description of the dataset you examined, with spatial and temporal resolution and extent.

In Exercise 1, I examined spatial autocorrelation in ventenata abundance and ventenata hot spots using spatial data (coordinates and environmental variables) and ventenata cover data that I collected in the field (summer 2018) for 110 plots within and surrounding seven burn perimeters across the Blue Mountain Ecoregion of eastern Oregon.

Target areas were located to capture a range of ventenata cover from 0% ventenata cover to over 90% cover across a range of plant community types and environmental variables including aspect, slope, and canopy cover within and just outside recently burned areas. Once a target area was identified, plot centers were randomly located using a random azimuth and a random number of paces between 5 and 100 from the target areas. Sample plots were restricted to public lands within 1600m of the nearest road to aid plot access. Environmental data for sample plots includes: canopy cover, soil variables (depth, pH, carbon content, texture, color, and phosphorus content), rock cover, average yearly precipitation, elevation, slope, aspect, litter cover, and percent bare ground cover.
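
The plot placement rule described above (a random azimuth and a random number of paces between 5 and 100) can be sketched in a few lines of Python. This is only an illustration: the pace length of 0.75 m, the function name, and the example coordinates are assumptions, not values from the field protocol.

```python
import math
import random


def random_plot_center(x, y, pace_length_m=0.75, rng=random):
    """Offset a target point by a random azimuth and a random number of paces.

    x, y are projected coordinates in meters. pace_length_m is an assumed
    pace length; the field protocol only specifies the number of paces.
    """
    azimuth = rng.uniform(0.0, 360.0)   # compass bearing, 0 = north, clockwise
    paces = rng.randint(5, 100)         # inclusive on both ends
    distance = paces * pace_length_m
    dx = distance * math.sin(math.radians(azimuth))  # easting offset
    dy = distance * math.cos(math.radians(azimuth))  # northing offset
    return x + dx, y + dy, azimuth, paces


# Hypothetical target area at (1000, 2000) in a projected coordinate system.
print(random_plot_center(1000.0, 2000.0))
```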

For Exercise 2, I examined how the spatial pattern of vegetation type influences invasibility of plant communities by ventenata. To achieve this, I used vegetation type data from the Simpson Potential Vegetation Type raster (Simpson 2013), which was developed to identify potential vegetation types across the Blue Mountain Ecoregion, at 30 m resolution in ArcGIS.

In Exercise 3, I explored how the spatial pattern of canopy cover was related to ventenata abundance. For this, I used a live tree canopy cover layer developed for the Pacific Northwest calculated using Forest Vegetation Simulator methods including the sum of canopy cover estimates for vegetation plots in the region (Crookston and Stage 1999).

  3. Hypotheses: predictions of patterns and processes you looked for

Ventenata is an invasive annual grass that shares many functional traits with other impactful invasive annual grasses in the region such as cheatgrass and medusahead, including similar vegetative height, fall germination, and shallow root system. These similarities have led me to believe that, like cheatgrass and medusahead, ventenata will be more abundant in open areas with low canopy cover where competition from existing vegetation is lower.

The study area contains many open areas interspersed throughout the larger forested landscape. The patchy spatial distribution of open areas throughout the study area will likely result in a patchy distribution of areas with high ventenata cover. Additionally, ventenata produces many seeds, with the majority of these seeds dispersing short distances from the parent plant. This leads me to believe that areas with high ventenata cover will be clustered near other areas with high ventenata cover creating invasion “hot spots” across the study region.

Hypothesis 1: Areas with high ventenata cover will be clustered near other high cover areas and low cover areas will be clustered near other low cover areas.

Hypothesis 2: The spatial pattern of ventenata abundance will be positively correlated with a neighborhood of non-forest habitat types (shrub-lands and grasslands) and negatively correlated with a neighborhood of forest habitat types. This relationship will decrease in strength as distance increases from the high cover sample point, as vegetation types farther from an invasion point are likely not as strongly influencing invasion as vegetation types closer to that point.

Once a species has established in a suitable habitat, it may spread to areas of less suitable habitat aided by strong propagule pressure from a nearby population. Open areas may act as source populations, allowing ventenata to build propagule pressure to the point where it is able to successfully establish and maintain a population in less suitable habitat such as areas with high canopy cover.

Hypothesis 3: Plots where ventenata is present in areas with high canopy cover (e.g. forests) will be clustered near open areas. These open areas may provide strong propagule pressure to aid invasion into areas with fewer available resources (sunlight).

  4. Approaches: analysis approaches you used.

To test these predictions I performed a handful of spatial analyses including:

Exercise 1: I tested for spatial autocorrelation using Moran’s I and created a correlogram in R and performed hot spot analysis in ArcGIS

Exercise 2: I explored the spatial relationship between the spatial pattern of ventenata abundance and the spatial pattern of different vegetation types using neighborhood analyses in ArcGIS and R

Exercise 3: I examined the spatial relationship between ventenata and canopy cover using a Ripley’s cross K analysis in R

  5. Results: what did you produce? Maps? Statistical relationships? Other?

Throughout the analyses, I produced a series of statistical relationships displayed as maps and graphs. Hot spot analysis produced a map that allowed me to visualize the relationship of autocorrelation between ventenata abundance at my sample points. For Moran’s I, neighborhood analysis, and Ripley’s cross K, I produced graphical representations of statistical relationships in R.

  6. What did you learn from your results? How are these results important to science? To resource managers?

The correlogram and hotspot analysis results showed that the spatial pattern of ventenata is autocorrelated and has a patchy distribution. The hotspot analysis suggests that areas of high ventenata are clustered with other high ventenata plots and low ventenata plots are clustered with other low ventenata plots, as I predicted in Hypothesis 1. This is likely a result of the patchy distribution of open areas and forested areas across the landscape and the dispersal ability of ventenata.

Neighborhood analysis showed that areas with high ventenata cover are more positively correlated with nearby forested areas (ponderosa pine) than I originally thought. This result suggests that ventenata may preferentially invade areas surrounded by ponderosa pine vegetation type as well as shrublands which would not support Hypothesis 2. However, ponderosa pine vegetation type does not necessarily indicate high canopy cover, and could represent invasion into an alternative low canopy cover vegetation type. Additionally, the vegetation type maps are mapped at large spatial scales and may not represent the fine scale variation in vegetative cover. Uncertainty in this result inspired a follow up analysis using canopy cover instead of vegetation type as a predictor variable in a Ripley’s cross K analysis.

In my follow-up analysis using Ripley’s cross K, I found that forest plots where ventenata was present were only weakly clustered around open areas despite my original hypothesis that there would be strong clustering (H3). These results could suggest that ventenata has a higher tolerance for high canopy cover than I originally predicted. Alternatively, these results could indicate that ventenata is capable of dispersing large quantities of seed over much greater distances than originally thought, thus not requiring open areas in the immediate neighborhood. Moreover, the same issue of scale may apply to the canopy layer as the vegetation layer, and the 30 m resolution may be overpredicting canopy cover at my sample sites.

My findings could have severe implications for forest ecosystems, which are commonly thought to be relatively resistant to invasion by annual grasses and are now showing susceptibility to ventenata invasion. For example, ventenata could increase fine fuels in these systems, making them more likely to ignite and carry surface fire. Managers may want to consider incorporating annual grass management strategies into their current and future forest management plans to help reduce potential invasion impacts.

  7. Your learning: what did you learn about software (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) other?

During this class I learned a suite of new tools in ArcGIS including hotspot analysis and concentric ring buffer. I created my first model using ArcGIS Modelbuilder! I learned the basics of spatstat in R and successfully completed some spatial analysis which required transforming my data into a spatial data frame (I did not know that these existed prior to this class). Additionally, I was exposed to, and gained experience using, many other new functions in R including Moran.I, correlog and Kcross that were useful for spatial analysis.

  8. What did you learn about statistics, including (a) hotspot, (b) spatial autocorrelation (including correlogram, wavelet, Fourier transform/spectral analysis), (c) regression (OLS, GWR, regression trees, boosted regression trees), (d) multivariate methods (e.g., PCA), and (e) other techniques?

I learned that Moran’s I and correlograms are useful for testing spatial autocorrelation in data, but only if the scale applied is of interest. For example, it was not useful to compute only one Moran’s I value for my entire data set – this indicated that there was spatial autocorrelation in the data, but did not indicate a spatial pattern. However, when I computed Moran’s I at various distances and displayed these results in a correlogram, I uncovered the pattern of the spatial correlation. The hotspot analysis allowed me to visualize exactly where the high and low clustering was occurring across my sample plots while simultaneously providing a significance value for those hot and cold spots.
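
To make the difference between a single Moran’s I and a correlogram concrete, here is a minimal Python sketch that computes Moran’s I separately for each distance band. It assumes binary weights (a pair of plots counts only if its separation falls inside the band), which may differ from the weighting used by the R functions named above. The function names and the toy transect data are hypothetical.

```python
import math


def morans_i(values, coords, d_min, d_max):
    """Moran's I with binary weights: w_ij = 1 when d_min < distance <= d_max."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross_sum = 0.0   # sum of w_ij * dev_i * dev_j
    weight_sum = 0.0  # sum of w_ij
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if d_min < math.dist(coords[i], coords[j]) <= d_max:
                cross_sum += dev[i] * dev[j]
                weight_sum += 1.0
    variance_sum = sum(x * x for x in dev)
    if weight_sum == 0.0 or variance_sum == 0.0:
        return None  # no pairs in this band, or no variation in the values
    return (n / weight_sum) * (cross_sum / variance_sum)


def correlogram(values, coords, bin_edges):
    """Moran's I for each consecutive pair of distance bin edges."""
    return [(lo, hi, morans_i(values, coords, lo, hi))
            for lo, hi in zip(bin_edges, bin_edges[1:])]


# Toy data: four plots along a transect with steadily increasing cover.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
cover = [1.0, 2.0, 3.0, 4.0]
for lo, hi, value in correlogram(cover, coords, [0.0, 1.0, 3.0]):
    print(lo, hi, value)
```

In the toy data, neighboring plots are similar (positive I in the first band) while distant plots are dissimilar (negative I in the second band), which is the kind of distance-dependent pattern a single global value hides.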

Ripley’s cross K analysis was useful for testing the relationship of my ventenata points to another variable (canopy cover). I found this test appealing because it tests whether or not one variable is clustered around another variable using a Poisson distribution to compare observed and expected values assuming spatial randomness. However, I learned that this method was not appropriate for my data, as my sample plots were chosen based on field variables and were not a random sample. This violated assumptions of randomness and homogeneity across the sampling region as my plots were more heavily located in non-forested areas. If I wanted to properly investigate these spatial questions, I would have to develop a more random sampling method.
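
As a rough illustration of what a cross K comparison measures, here is a minimal Python sketch of a naive cross-K estimate set against the value expected under complete spatial randomness, pi * r^2. Unlike Kcross in spatstat, this sketch applies no edge correction, and the function name, point coordinates, and study area are all hypothetical.

```python
import math


def cross_k(points_a, points_b, r, area):
    """Naive cross-K estimate at radius r, with no edge correction.

    Counts pairs (a, b) with a in points_a and b in points_b that lie
    within distance r, scaled by the study area and both point counts.
    """
    pairs_within_r = sum(
        1 for a in points_a for b in points_b if math.dist(a, b) <= r
    )
    return area * pairs_within_r / (len(points_a) * len(points_b))


# Hypothetical layout: one forest plot with ventenata, two open-area plots,
# inside a 10 x 10 study area.
forest_plots = [(0.0, 0.0)]
open_plots = [(1.0, 0.0), (5.0, 0.0)]
r = 2.0
observed = cross_k(forest_plots, open_plots, r, area=100.0)
expected = math.pi * r ** 2  # value under complete spatial randomness
print(observed, expected)
```

An observed value above the expected one suggests clustering of one point type around the other at that radius. The comparison only holds when the points are a fair sample of a homogeneous region, which is the assumption the non-random plot selection violated.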

Citations

Simpson, M. 2013. Developer of the forest vegetation zone map. Ecologist, Central Oregon Area Ecology and Forest Health Program. USDA Forest Service, Pacific Northwest Region, Bend, Oregon, USA

Crookston, NL and AR Stage. 1999. Percent canopy cover and stand structure statistics from the Forest Vegetation Simulator. Gen. Tech. Rep. RMRS-GTR-24. Ogden, UT: U.S. Department of Agriculture, Forest Service, Rocky Mountain Research Station. 11 p.