Tag Archives: t test

Courtney’s Final Project Post

Research Question

  • “How is the spatial pattern of ion and isotope concentrations in wells tapping the basalt aquifer related to the spatial pattern of mapped faults via the mechanism of groundwater flow as determined by hydraulic transmissivity of the geologic setting?”

Description of dataset that I examined

  • A: In my research I have analytical data for 31 wells, whose XY locations were determined by field confirmation of the Oregon Water Resources Department (OWRD) well log database. As groundwater is a 3D system, I have to consider Z values as well. The well depths and lithology information are also from the OWRD database. My analytical data provides a snapshot of water chemistry during the summer of 2018. I have only one temporal data point per well. At all 31 wells, I collected samples to be analyzed for pH, temperature, specific conductivity, oxygen isotopes 16 and 18, and hydrogen isotopes 1 and 2. At a subset of 18 of those wells I collected additional samples for tritium, carbon-14, and major ion analysis.
  • B: The shapefile of faults mapped at the surface was created by Madin and Geitgey of the USGS in their 2007 publication on the geology of the Umatilla Basin. There is some uncertainty in my analysis as far as extending this surface information into the subsurface. USGS studies have constrained proposed ranges of dip angles for the families of faults that I am studying, but not exact angles for any single mapped fault.
  • C: Results of pumping interference tests involving 29 wells, 12 of which I had chemical data for. The data were collected by the OWRD in 2018 and 2019.

Hypotheses

  • Faults can act as conduits or barriers to groundwater flow, depending on how their transmissivity compares to the transmissivity of the host rock.
  • I hypothesize that clusters of similar chemical and isotopic properties of groundwater can indicate a shared aquifer unit/compartment, and that if faults separate clusters then the fault is influencing that difference in chemical/isotopic signatures. If the fault is between two clusters, I hypothesize that it is acting as a barrier. If it crosses through a cluster, I hypothesize that it acts as a conduit.
  • Where faults act as barriers, I hypothesize that parameter values will differ in groups on either side of a fault. Specifically, a barrier fault might cause older, warmer water to rise into upper aquifer layers, and the downstream well might show a signature of more local recharge.
  • Where faults act as conduits, I hypothesize that water chemistry and isotopes of samples from wells on either side of the fault would indicate a relatively direct flowpath from the upstream well to a downstream well. Over a short distance, this means that ion and isotope concentrations would not differ significantly in wells across the fault.
  • My hypotheses depend on a “barrier” fault damming groundwater flow up-gradient of the fault, and compartmentalizing local recharge on the down-gradient side. This is only relevant if the fault is roughly perpendicular to the flow direction, and so disrupts transmissivity between a recharge zone and the wells. If a fault that separates two wells is parallel to the flow direction and there is no obstacle between the wells and upstream recharge areas, then the fault might indeed limit communication between the wells, but they will still have similar chemical signatures. Wells separated by this second kind of fault barrier would be better evaluated by a physical test of communication, such as a pumping interference test.
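
The perpendicular-versus-parallel distinction in the last hypothesis comes down to the acute angle between the flow direction and the fault trace. Below is a minimal Python sketch of that calculation, not the raster workflow used in ArcGIS for this project. The function names and the 45-degree cutoff are illustrative assumptions, not values from the analysis.

```python
def incidence_angle(flow_azimuth_deg, fault_strike_deg):
    """Acute angle (0 to 90 degrees) between a flow direction and a fault trace.

    Both inputs are compass azimuths in degrees. A fault trace has no
    direction, so strikes of 10 and 190 degrees describe the same line.
    """
    diff = abs(flow_azimuth_deg - fault_strike_deg) % 180.0
    return min(diff, 180.0 - diff)


def is_testable_as_barrier(flow_azimuth_deg, fault_strike_deg, min_angle_deg=45.0):
    """True when flow meets the fault at a high enough angle to be dammed by it.

    min_angle_deg is a hypothetical cutoff chosen only for illustration.
    """
    return incidence_angle(flow_azimuth_deg, fault_strike_deg) >= min_angle_deg


if __name__ == "__main__":
    # Flow due east (90) across a north-striking fault (0): perpendicular.
    print(incidence_angle(90.0, 0.0), is_testable_as_barrier(90.0, 0.0))
    # Flow toward 350 beside a fault striking 10: close to parallel.
    print(incidence_angle(350.0, 10.0), is_testable_as_barrier(350.0, 10.0))
```

Taking the difference modulo 180 is what lets a directed quantity (flow) be compared with an undirected one (a fault trace).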

Analysis Approaches

  • Principal component analysis: used to simplify the multivariate data set (19 variables!) into variable relationships that could represent aquifer processes
  • Analysis of PCA results compared to distance from a flow path
    • Interpolation of well water levels classified by well stratigraphy to estimate a potentiometric surface and evaluate groundwater flow directions.
    • Raster calculations to compare flow direction to fault incidence angle
    • Measuring distance from each well to the nearest fault along the flow path
    • Simple linear regression, comparing the Non-ion PC1 score of a well with its distance from a fault.
  • Two-sided t-tests comparing distance between wells, presence of a fault, and pumping interference test communication between wells
  • ANOVA F-tests comparing chemical and isotopic variance within groups of wells that communicate with each other and between those groups.
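
The within-group versus between-group comparison behind the ANOVA F-test can be sketched in a few lines. The original analysis was run in R; this Python version is only an illustration of the arithmetic, and the group values in the example are made up, not measurements from the study wells. It returns the F statistic and its degrees of freedom but does not compute a p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers.

    F is the ratio of between-group variance to within-group variance.
    Assumes at least two groups and some spread inside the groups.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Spread of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Spread of each value around its own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Made-up specific conductivity values for three hypothetical
    # groups of wells that communicate with each other.
    example_groups = [
        [310.0, 325.0, 318.0],
        [402.0, 395.0, 410.0, 399.0],
        [350.0, 361.0, 347.0],
    ]
    print(one_way_anova_f(example_groups))
```

A large F means the groups differ from each other more than the wells inside each group differ among themselves.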

Results

  • Principal component analysis – Patterns of variation between wells are statistically explained primarily by total ion concentration, a relationship between chemical evolution from ion exchange and decreasing stable isotope ratios, and the combination of well depth and water temperature. Moran’s I indicates that only Non-ion PC2 is spatially clustered, while the other PC models have a distribution not significantly different than random. The other PC models are useful to understand the groundwater system, but not specifically to analyze clustering correlated to faults.
  • Interpolation of water level, and comparison of fault incidence angle with flow direction, indicate which faults can and cannot be tested against my hypotheses.
  • Analysis of PCA results compared to distance from a fault along flow path – some wells that are “within” a fault zone have very old signatures and others have very young signatures. This could be related to the angle of the dip of the fault and the accuracy of mapping compared to the depth of the well. I hypothesize that the wells that are in the fault zone but have high PC1 scores are on the up-gradient side of the fault where older water is upwelling along a barrier. Wells in fault zones with low PC1 scores could indicate wells open to downgradient areas of the fault, where vertical recharge through the fault damage zone is able to reach the well.
  • Returning to the conclusions I wrote in that blog post after I found improved stratigraphic data, I’m not sure if I can make conclusions other than those about the wells that are mapped as “inside” a fault. Several wells that are closer to faults are also open to shallower aquifer units, and so the effect of lower PC1 scores closer downgradient to faults might be confounded by lower PC1 scores caused by vertical inflow from the sedimentary aquifer and upper Saddle Mountain aquifer.
  • Two-sided t-tests comparing distance between wells, presence of a fault, and communication between wells show that the presence of a fault has a greater effect on communication than the distance between the wells.
  • ANOVA F-tests comparing chemical and isotopic variance within groups of wells that communicate with each other and between those groups – stable isotopes and specific conductivity both show more variation between well groups than within well groups.
  • Not covered in these blog posts, I also ran Moran’s I on my inputs to see which ones are clustered and so might be more related to horizontal grouping factors (such as faults) than vertical grouping parameters (such as stratigraphic unit). Of the PCA and individual variables, only d18O, d2H, and Non-ion PC2 (combination of well depth and water temperature) were clustered. The other PCA models, temperature, pH, and specific conductivity were not significantly spatially clustered.
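
For readers unfamiliar with Moran’s I, here is a minimal Python sketch of the global index. It assumes binary distance-band weights (two wells are neighbors if they lie within `max_dist` of each other), which is only one of several possible weighting schemes and not necessarily the one used in ArcGIS for this project. The coordinates and values are made up. It computes the index only, with no z-score or p-value.

```python
from math import hypot


def morans_i(points, values, max_dist):
    """Global Moran's I with binary distance-band weights.

    points:   list of (x, y) coordinates
    values:   one number per point
    max_dist: two points get weight 1 when they are within this
              distance of each other, otherwise weight 0
    Assumes at least one neighbor pair and non-constant values.
    """
    n = len(values)
    mean_val = sum(values) / n
    dev = [v - mean_val for v in values]

    cross_sum = 0.0   # sum of w_ij * dev_i * dev_j
    weight_sum = 0.0  # sum of all w_ij
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = hypot(points[i][0] - points[j][0],
                         points[i][1] - points[j][1])
            if dist <= max_dist:
                cross_sum += dev[i] * dev[j]
                weight_sum += 1.0

    squared_dev_sum = sum(x * x for x in dev)
    return (n / weight_sum) * (cross_sum / squared_dev_sum)


if __name__ == "__main__":
    # Four made-up wells on a line, 1 unit apart.
    wells = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    # Similar values sit next to each other: positive index.
    print(morans_i(wells, [1.0, 1.0, 5.0, 5.0], max_dist=1.5))
    # Values alternate between neighbors: negative index.
    print(morans_i(wells, [1.0, 5.0, 1.0, 5.0], max_dist=1.5))
```

Positive values indicate that neighboring wells tend to have similar values (clustering), negative values indicate that neighbors tend to differ, and values near zero are consistent with a random arrangement.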

Significance –  Whether groundwater signatures are related to faults can be checked against past understandings of differences between wells in the region, and can inform well management. If a senior permit holder puts a call on the water right and asks for junior users to be regulated off, it would not help that senior user if one of those junior permit holders’ wells is not hydraulically connected to the senior user’s well.

  • More wells would need to be sampled to be better able to disentangle the effects of faults from the effects of well stratigraphy.

My learning – I learned significantly more about how to code and troubleshoot in R. Additionally, I learned about the process of performing spatial clustering analysis in ArcGIS.

What did I learn about statistics?

  • PCA was completely new to me, and it’s a cool method for dealing with multivariate data once I dealt with the steep learning curve involved in setting it up and interpreting the results. It was useful getting more practice performing and interpreting t-tests and ANOVA F-tests. I had not used spatial clustering before, and learning how to apply it was interesting. It gave me a much more concrete tool to try to disentangle the patterns in my effectively 3D data in the X-Y plane, as opposed to the Z direction.

Ex 3: Relating distance between wells to communication type and fault presence

Question that I asked:

Is there a relationship between the distance between wells, their communication status based on a pumping interference test, and whether or not they are separated by a fault?

Name of the tool or approach used:

Polyline creation and classification in ArcGIS Pro, boxplot creation and two-sided t-tests in R.

Method:

29 wells in my study area were evaluated by the Oregon Water Resources Department during pumping interference tests in 2018 and 2019. This test involves pumping one well and seeing whether the water levels in nearby wells drop in response. I received a verbal account of the wells that did and did not communicate, sketched it on a map, and then transferred that information to ArcGIS Pro. I drew polylines using the well locations as snapped endpoints. Then, I created and populated fields containing the well communication type (“communicate” for wells that respond to pumping at a nearby well, and “does not communicate” for wells that do not) and whether or not the path between two wells crosses a fault. Shape_Length in feet was automatically calculated when I created the polylines, on account of the projection I used for the shapefile.

I exported that data table to a csv and imported it into R, where I subset it into three categories: all paths, paths between wells that communicate, and paths between wells that do not communicate. I then created box plots and ran t-tests to compare the means and distributions of path length by communication type and by fault presence.
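
The subset-and-compare step can be sketched as follows. The analysis itself was done in R, where `t.test` defaults to Welch’s unequal-variance form; whether that default was used here is an assumption. The field names (`communication`, `crosses_fault`, `length_ft`) and the rows are hypothetical stand-ins for the exported attribute table, not the real data. The sketch returns the t statistic and degrees of freedom only.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t_stat = (mean(a) - mean(b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_stat, df


def split_lengths(rows, field):
    """Group path lengths by the value of one categorical field."""
    groups = {}
    for row in rows:
        groups.setdefault(row[field], []).append(row["length_ft"])
    return groups


if __name__ == "__main__":
    # Made-up rows standing in for the polyline attribute table.
    rows = [
        {"communication": "communicate", "crosses_fault": "no", "length_ft": 2100.0},
        {"communication": "communicate", "crosses_fault": "no", "length_ft": 2900.0},
        {"communication": "communicate", "crosses_fault": "yes", "length_ft": 5200.0},
        {"communication": "does not communicate", "crosses_fault": "yes", "length_ft": 4800.0},
        {"communication": "does not communicate", "crosses_fault": "yes", "length_ft": 6100.0},
        {"communication": "does not communicate", "crosses_fault": "no", "length_ft": 3900.0},
    ]
    by_fault = split_lengths(rows, "crosses_fault")
    print(welch_t(by_fault["yes"], by_fault["no"]))
```

Turning the statistic into a two-sided p-value needs the t distribution, which the Python standard library does not provide; `scipy.stats.ttest_ind(a, b, equal_var=False)` is one way to get it.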

Results:

Comparing path length with communication type for the paths between all 29 wells involved in the communication tests, there is no significant evidence of a difference in mean path length between wells that do and do not communicate (two-sided t-test, p = 0.152). The mean distance between wells that do not communicate is larger than the mean distance between wells that do, but the interquartile ranges of the two categories overlap and the difference is not statistically significant. There is no clear evidence that distance plays a role in well communication.

There is some evidence for a difference in mean path length between paths that do and do not cross faults, based on a p-value of 0.047 in a two-sided t-test. The mean length of paths that cross a fault is 5,139 ft, while the mean length of paths that do not cross a fault is 3,608 ft. Wells that are closer together are less likely to be separated by a fault.

For wells that do communicate, there is evidence of a difference between the mean path lengths that cross faults and the mean path lengths that do not cross faults. The p-value for a two-sided t-test was 0.024. Wells that communicate but are not separated by a fault are more likely to be closer together than wells that are separated by a fault.

For wells that do not communicate, there is no evidence of a difference in mean path lengths between paths that do and do not cross faults, given a p-value of 0.98 in a two-sided t-test. Wells that do not communicate are likely to be separated by the same mean distance whether or not they are separated by faults, although there is a larger range of path length values for wells separated by a fault that do not communicate.

 

Summary of results:

Wells that communicate in pumping tests do not have a significantly different mean distance between them than wells that do not communicate (p = 0.152).

Wells that are closer together are less likely to be separated by a fault (p = 0.047).

Wells that communicate but are not separated by a fault are more likely to be closer together than communicating wells that are separated by a fault (p = 0.024).

Wells that do not communicate are likely to be separated by the same mean distance whether or not they are separated by faults, although there is a larger range of path length values for non-communicating wells separated by a fault (p = 0.98).

Critique: I wish I had more sample points and paths to work with, so I could use a more interesting analysis such as ANOVA.