Author Archives: marcellm

Final Project: Searching For Faults with Kriging

Introduction:

I have looked at the spatial distribution of depth to first lava and at the spatial relationship between the depth to first lava and the depth to first water. I found that distance from volcanic centers might affect the depth at which we find the contact with first lava. I did not find any strong correlation between first water and first lava, but that does not rule out the idea that first water correlates with deeper lava contacts.

Now I want to look at what else might be affecting the spatial distribution of lava in the region. Note that the techniques I used for looking at lava can also be applied to the spatial distribution of water in the field area.

Figure 1: The style of active faulting in the Pit River study area (delineated roughly in green). N-S trending faults accommodate dip slip (up-down) motion, while NW-SE trending faults accommodate oblique slip motion (up-down and side to side). Structurally controlled volcanoes often lie at the elbows of intersection between the N-S and NW-SE trending faults. Modified from Blakely et al. 1997.

The Northern California field area lies at the intersection of the Walker Lane Fault Zone, the Basin and Range, and the Cascadia subduction zone. For the purpose of this examination I will focus on the relationship between Walker Lane style faulting (fig 1) and the spatial distributions of the depth to first water and the depth to first lava in the region.

The topography in the region reflects the interplay between active Walker Lane style faulting and the higher frequency volcanism that mutes it. For example, in the northern portion of the field area, active faults disappear beneath the edifice of Medicine Lake Volcano, whose flanks are covered by volcanic rocks less than 10 kyr old (fig 2). The faults do not appear in the topography, but they exist in the subsurface. However, in regions where faults are well expressed, enough displacement has occurred for them to 1) resist burial by periodic lavas, and 2) either serve as conduits for lava flow or displace the lava itself.

The Data:

Figure 2: Map of the Field area with active faults (pink lines), the Pit River (blue line), rough locations of ground water discharge (yellow circles), young lava flows (green outlines) and all of the available well data (blue dots).

The subset of the data in figure 2 that I used is depicted in figure 3. The wells range in age from those dug in the 1950s to the present. It is important to note that depths to first water might have changed in the 60 years since the older well logs were recorded. For this reason I did not use many of the older well logs in my analyses.

Figure 3: The center of the township and range sections for the wells used and corresponding depths to first lava. Purple logs are the shallowest and Blue logs are the deepest.

Hypotheses:

Figure 3: Conceptual design for the project. A) Configuration before slip upon the fault. The lava flow is continuous. B) If a normal fault displaces a lava flow, then one side will go down with respect to the other. The lower block experiences deposition, and the development of a soil profile. If enough time has passed, then there will be a difference between the depth to first lava between the upper and lower block.

I predict that faults will run parallel to and coincide with large step changes in the depth to first lava.

If there are no large step changes in depth to first lava, I predict that whatever changes do occur will be driven by changes in elevation and will correspond to channelized lava flows rather than faults.

Methods:

I used the Threshold Kriging function in Arc to make a surface representing the likelihood that the depth to first lava is less than 30 feet. The kriged surface interpolates, for locations between the wells, the probability that first lava lies less than 30 feet deep. Where the surface is close to 1, first lava is likely shallower than 30 feet. Where it is close to zero, first lava is likely deeper than 30 feet.
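
A minimal sketch of the thresholding step, assuming it amounts to an indicator transform (1 where first lava is shallower than 30 feet, 0 otherwise) that is then interpolated. The depths are made-up values, not wells from this study, and the helper name to_indicator is hypothetical; the kriging itself was done in Arc and is not reproduced here.

```python
# Indicator transform that would precede threshold (indicator) kriging.
# Depths are illustrative values in feet, NOT data from this study.
THRESHOLD_FT = 30.0


def to_indicator(depths_ft, threshold_ft=THRESHOLD_FT):
    """Return 1 where depth to first lava is shallower than the threshold, else 0."""
    return [1 if d < threshold_ft else 0 for d in depths_ft]


example_depths_ft = [12.0, 45.0, 29.9, 30.0, 110.0]
indicators = to_indicator(example_depths_ft)

# The mean of the indicators is the share of wells shallower than 30 ft.
# Kriging the 0/1 values gives a surface that can be read as a probability.
share_shallow = sum(indicators) / len(indicators)
print(indicators, share_shallow)
```

Note that a depth of exactly 30 feet counts as 0 here, because the threshold is "less than 30 feet".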

I then built a confusion matrix to see how my results compared to what I expected. I did this part by eye.

Results:

Figure 4: Kriged Surface with faults superimposed.

	High-Low	Low-Low or High-High
parallel	257	315
perpendicular	213	5

Table 1: Corresponding confusion matrix. There were 257 High-Low values that corresponded to parallel faults, but there were 315 Low-Low or High-High values that also corresponded to parallel faults. See discussion below.
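
The counts in Table 1 can be turned into proportions with a few lines. This is only arithmetic on the four numbers in the table; the dictionary layout and the label "same" (for Low-Low or High-High) are my own shorthand.

```python
# Counts copied from Table 1. "same" stands for Low-Low or High-High.
counts = {
    ("parallel", "high-low"): 257,
    ("parallel", "same"): 315,
    ("perpendicular", "high-low"): 213,
    ("perpendicular", "same"): 5,
}

total = sum(counts.values())
parallel_total = sum(v for (orientation, _), v in counts.items() if orientation == "parallel")

# Share of all tallied faults that are parallel AND sit on a High-Low step.
share_of_all = counts[("parallel", "high-low")] / total
# Share of the parallel faults alone that sit on a High-Low step.
share_of_parallel = counts[("parallel", "high-low")] / parallel_total

print(total, parallel_total, round(share_of_all, 3), round(share_of_parallel, 3))
```

About a third of all tallied faults, and a little under half of the parallel ones, coincide with a High-Low step.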

Figure 5: Kriged surface with contours.

Discussion:

257 out of 790 of the faults were gradient parallel; that is, they corresponded to a step in the kriged depth to first lava. However, many more did not. What is happening here? If you look at figure 5, you can see that some of the high gradients correspond to rapid changes in elevation. The elevated block there is the footwall of a range bounding normal fault. That fault is not mapped because it is no longer active. But it has a large amount of uplift on it, and the block likely corresponds to a different lithology entirely, so it does not fall under our hypothesis.

If I do this analysis for every lava layer I find, and can correctly link the well logs together, then this analysis could help me constrain slip upon the faults in the region. This information is good for hazards assessment in the region as PG&E has a hydroelectric dam in the area.

Learning outcomes:

Throughout the course of this class, I learned a few of the nuances of R, though my work here has just begun. I also feel like I have a better understanding of how to think about spatial processes, which is why I enrolled in the first place. Future goals include better locating my data in space. That way, many of the techniques I used in this class will pick up on actual processes, and not on the gridded spacing of my data.

Sources:

Blakely, Richard et al. “Gravity Anomalies, Quaternary Vents, and Quaternary Faults in the Southern Cascade Range, Oregon and California: Implications for Arc and Backarc Evolution.” Journal of Geophysical Research: Solid Earth, vol. 102, no. B10, 1997, pp. 22513–22527.

Testing the Cross Variogram with Ripley’s K Plot and Cross K Plot

Marina Marcelli

Question:

What is the spatial distribution of the water table in my study area with respect to scale? How are wells in my area clustered? Is there a relationship between wells where the water table lies above the first lava and wells where it lies below the first lava; between wells where the water table corresponds to the first lava and wells where it lies above it; and between wells where the water table corresponds to the first lava and wells where it lies below it?

Approach:

I used both Ripley’s K function and Ripley’s Cross K function to look at the spatial distribution of water, first lava and the water table with respect to lava in the area. Like the variogram and the cross variogram, Ripley’s K function and the Cross K function describe spatial distributions at different scales. However, rather than using the variance, as the variogram does, Ripley’s K and Cross K compare the spatial data to a curve that represents complete spatial randomness (the Poisson curve).
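
As a rough illustration of that comparison, the sketch below computes a naive Ripley’s K estimate and the Poisson reference curve, K(r) = pi * r^2. It assumes no edge correction and uses made-up points, so it is not the estimator used for the figures in this post; the function names are hypothetical.

```python
import math


def ripley_k(points, r, area):
    """Naive Ripley's K estimate at distance r, with no edge correction."""
    n = len(points)
    pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= r:
                pairs += 1
    # Scale the ordered-pair count by the area and the number of ordered pairs.
    return area * pairs / (n * (n - 1))


def poisson_k(r):
    """Expected K(r) under complete spatial randomness in the plane."""
    return math.pi * r ** 2


# Two tight clumps of four points in a 10 x 10 window (illustrative only).
clustered = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1),
             (9.0, 9.0), (9.1, 9.0), (9.0, 9.1), (9.1, 9.1)]
area = 100.0
r = 0.5

k_hat = ripley_k(clustered, r, area)
print(k_hat, poisson_k(r))
```

An estimate above the Poisson curve at a given r suggests clustering at that scale; an estimate below it suggests regular spacing.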

Brief Methodology:

I first did a preliminary analysis using the Ripley’s K function for both the depth to first water and the depth to first lava.

Then I used a Kcross to compare the differences between the depths to first lava and water. Because the Kcross function uses only factor variables, I had to make sure my data was categorical. I thus decided to bin my data into three categories depending on the Lava – Water (L-W) value.

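L_W_cat <- ifelse(L_W > 40, "above", ifelse(L_W < -40, "below", "equal"))  # bin L-W into three classes

L_W_cat <- factor(L_W_cat, levels = c("above", "equal", "below"))  # Kcross needs a factor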

Where “above” stipulates that the water table is above the contact to first lava, “equal” stipulates that the water table roughly equates to the contact to first lava, and “below” signifies that the water table is below the contact to first lava.

After binning the data I compared each category to the others, resulting in three Cross Ripley’s K plots. I then plotted a significance envelope to see where the data was actually significant.
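
The comparison between two categories can be sketched as follows. This is a naive cross K with no edge correction, and the envelope is built by randomly shuffling the category labels, which is a swapped-in null model and may differ from the envelope behind the plots in this post. Points and names are illustrative.

```python
import math
import random


def cross_k(points, labels, a, b, r, area):
    """Naive cross K: scaled count of b-points within r of each a-point."""
    pts_a = [p for p, lab in zip(points, labels) if lab == a]
    pts_b = [p for p, lab in zip(points, labels) if lab == b]
    if not pts_a or not pts_b:
        return 0.0
    pairs = sum(1 for p in pts_a for q in pts_b if math.dist(p, q) <= r)
    return area * pairs / (len(pts_a) * len(pts_b))


def relabel_envelope(points, labels, a, b, r, area, n_sim=99, seed=0):
    """Min and max cross K over random shufflings of the labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    sims = []
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        sims.append(cross_k(points, shuffled, a, b, r, area))
    return min(sims), max(sims)


# Illustrative wells: "above" in one corner, "below" in the opposite corner.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),
          (9.0, 9.0), (9.5, 9.0), (9.0, 9.5), (9.5, 9.5)]
labels = ["above"] * 4 + ["below"] * 4

observed = cross_k(points, labels, "above", "below", r=1.0, area=100.0)
low, high = relabel_envelope(points, labels, "above", "below", r=1.0, area=100.0)
print(observed, low, high)
```

An observed value outside the envelope at a given r is what would be read as significant at that distance.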

Results:

Figure 1: Study area with the well logs used for this study. The cyan points represent the well logs that have a water table “above” the first lava. The blue are below, and the pink have a water table that roughly corresponds to the first lava.

Figure 2: K(r) vs r, the distances for which we are comparing clustering. For these data, Poisson’s curve appears to be nearly horizontal. This means that the data appear to be clustered at all scales measured by Ripley’s K function. According to these plots the depth to first lava data is clustered at all scales.

Figure 3: The shape of the depth to first water Ripley’s K plot is different from the depth to first lava plot (fig 2). What that means, I don’t know. However, based on both Poisson’s curve and the significance envelope, water also clusters at all scales measured.

Figure 4: Ripley’s Cross K function for the points where the water table is above the contact with first lava, with the points where the water table is below the contact with first lava. At distances shorter than 6 km, the spread appears to be random, while at greater distances the data appear to have significance. This means that the data do not cluster at close distances. This corresponds to what we would expect from natural fluctuations in the elevation of the water table driven by changes in elevation. In a simple system, with one lava and one water table, this works well. However, the study area is in reality much more complicated than this.

Figure 5: Cross K function for the points that correspond to above and equal. The data appear to be correlated at much closer distances than the above and below data.

Figure 6: Cross K plot for points corresponding to equal and below. They appear to be linked at all scales. Ideally they would be clustered at closer scales and random farther away. The discrepancy might be accounted for by faults, or multiple water bearing layers.

Conclusion:

Depth to first lava and depth to first water are linked at all scales measured by Ripley’s K plot. In this case, the largest scale I managed to measure was 12 km. Wells where the water table plotted above the first lava and wells where it plotted below were not clustered; rather, they appeared to be linked at distances greater than 6 km. Points that were equal and below were correlated at all scales. Well logs that were equal and above were correlated at scales larger than 2 km. This might have to do with lack of data. It might also have to do with the regional geology.

Critique of the method:

One thing I took away from the process was that my field area is 40 km across, while the largest r value I calculated was 12 km. Ripley’s K plots and the Cross K plots might demonstrate different relationships at larger scales. In the future I would like to figure out how to change the r values.

I will need to be able to plot these data at larger scales to determine whether or not they corroborate what the variograms found.

Exercise 2: Cross Variogram and Kriging

In volcanic regions, fluid flow paths are limited to the rubbly bases and flow tops of lava flows, where permeability promotes transmissivity.

Hypothesis:

The depths to first water will correspond to depths located near lava flows.

Definitions:

Contact: location on the surface, or at depth where two different rock types touch.

Depth to first water: the depth from a particular well log where water was first noted, this is not always listed.

Depth to first lava: the depth at which the first lava is noted in a particular well log. There can be multiple contacts to a lava in a well log, which is why I specified first.

Figure 1: Block diagram of what a well log depicts. The green and red planes represent contacts between two rock types. Wells (grey) and their logs record these contacts as depths from the surface (brown).

Question:

Does the depth to first water correspond to the depth to first lava in my data set?

Tool:

Cross Variogram and Kriging

Like the variogram, the cross variogram is a tool that allows you to compare spatial data at multiple scales. Unlike the variogram, the cross variogram compares one data set to another data set at multiple scales.

Kriging uses the variogram to interpolate a surface.

Brief Description:

In order to use both the variogram and the cross variogram you must normalize the data you are working with. Otherwise the semivariance values can range from 0 to infinity. Normalizing the data gives a variance of 1, which allows you to distinguish data that are correlated (semivariance<1) from data that have no correlation with each other (semivariance>1).
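
A minimal sketch of that normalization, assuming it means a z-score (subtract the mean, divide by the sample standard deviation). The depths are made-up values and the function name is hypothetical.

```python
import statistics


def standardize(values):
    """Rescale values to mean 0 and sample standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - mean) / sd for v in values]


# Illustrative depths in feet, NOT data from this study.
depths_ft = [20.0, 35.0, 50.0, 80.0, 140.0]
z = standardize(depths_ft)

print([round(v, 3) for v in z])
print(round(statistics.fmean(z), 6), round(statistics.stdev(z), 6))
```

Because the standardized values have a variance of 1, a semivariance near 1 is the natural reference level for "no spatial correlation left".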

In order to use the R function gstat, I had to turn the data into a spatial data frame.

The function gstat allows you to simultaneously create variograms for each of the individual data sets you are working with and to compare them with each other. In this case I just compared the depth to first lava with the depth to first water.

I used the ArcGIS kriging tool (ordinary) to krige a surface that represented the difference between the depth to first lava and the depth to first water. In other words I subtracted the depth to first lava from the depth to first water and kriged that “surface”. I wanted to see if there was a spatial distribution around which those differences were low. I tried to use R to krige, but did not have the time to work out the krige function.

Results:

My results were strange, though ultimately unsurprising.

Figure 2: Variograms of my two variables water1 and lava1 as well as the cross variogram that comes from comparing the two. Note that the water1lava1 cross variogram has negative values for the semivariance.

Figure 3: Plot of well log data with the difference between first lava and first water plotted on top of its kriged surface.

The strangest thing to resolve from this exercise is the negative semivariance in the cross variogram. Semivariance is a squared value and therefore should not be negative. I have no idea what is happening here. I need to ruminate upon it. Either way, the data do not appear to be well correlated, or at the very least, I am not comfortable making conclusions about them with the negative semivariance values.
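
One possible reading of the negative values, offered as a sketch rather than a diagnosis of this particular output: the usual empirical cross-semivariance is half the average product of the increments of two different variables, not a squared difference of one variable, so it can be negative when one variable rises where the other falls. The toy transect below uses made-up numbers and a hypothetical helper.

```python
def cross_semivariance(z1, z2, lag):
    """Empirical cross-semivariance for equally spaced samples on a line.

    Half the mean product of the two variables' increments at the given
    integer lag. With z1 == z2 this reduces to the ordinary semivariance.
    """
    n_pairs = len(z1) - lag
    total = sum((z1[i] - z1[i + lag]) * (z2[i] - z2[i + lag]) for i in range(n_pairs))
    return total / (2 * n_pairs)


# Illustrative transect: one variable deepens while the other shallows.
water = [1.0, 2.0, 3.0, 4.0, 5.0]
lava = [5.0, 4.0, 3.0, 2.0, 1.0]

gamma_water = cross_semivariance(water, water, 1)  # ordinary semivariance, never negative
gamma_cross = cross_semivariance(water, lava, 1)   # product of opposite-signed increments

print(gamma_water, gamma_cross)
```

Under this definition a negative cross variogram would point to the two depths varying in opposite directions over that distance, rather than to a calculation error.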

The ordinary kriged surface interpolated from the difference between lava and water shows that the highest values of that difference (Fig 3, white) lie in the middle of the study area. In geographic and geologic space this corresponds to a basin filled with sediment and inter-fingered with lava. Many of the well logs in the region are not deep; they don’t have to be, because the water here is near the surface and close to some of these buried basalt flows. The data at the far edges of the map are spatial outliers, and thus we can’t interpret any part of the map that lies far from the main cluster of data points.

From physically looking at the well logs I know that while first water does often correspond to a lava flow, it is almost never the first lava flow. I am not surprised that the semivariance indicates that the data are not correlated.

Critique of the method:

What was useful, and what was not?

It was not particularly useful, because it told me what I already know and left me with more questions than answers. However, I did walk away with some considerations. The cross variogram (and variogram) might work at a smaller scale.

In other words, if I broke my field area up into regions where I think the lava layers might be sourced from the same place (Lassen Peak or Medicine Lake Volcano), I would be able to make the assumption that lava layers at similar depths in the well logs correspond to the same lava flow in space. If we consider figure 1, in a small area we would be able to link the green layer to other locales where the green layer lies at depth, and would be able to spatially autocorrelate them with the variogram.

The next step, once I narrowed down my area, would be to correlate the depth to water with the depth to every lava flow I found in the well log. This would allow me to see which lava layer best corresponds to the depth to first water.

One of the things I discussed with my partner was trying to figure out what the negative values meant in my variogram. As I stated above, I still need to think about this, or figure out what I did wrong. I also discussed taking data out of lat-long space and into UTM space; that is something I am also still thinking about.

One final note: At the moment my data are clustered around certain spots, and I do not have many of them. Every time I add a few data points, the shape of the variogram changes. Some of the spikiness I am seeing is likely from that.

Exercise 1: The Variogram

Question:

How does the variance between the depths to the first lava flow in my field area vary with increasing distance?

The Tool: Variogram

I used a variogram to analyze the variance within my dataset. Variograms are discrete functions calculated to measure the average correlation between pairs of measurements at various distances (Cameron and Hunter 2002). These distances are referred to as the binned distances (Anselin, 2016). In this study, binned distances determine the distances over which the depth to first lava flow is autocorrelated (Anselin, 2016).

variog(geodata, coords = geodata$coords, data = geodata$data, max.dist = number) (R documentation)

The R code above needs the geodata (an array of the data you are testing) and the coords (the coordinates those data correspond to).
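
To show what a call like this computes, here is a minimal sketch of a classical binned variogram: half the mean squared difference between pairs of values, grouped by separation distance up to a maximum distance. It is an illustration under my own assumptions (equal-width bins, co-located pairs placed in the first bin), not the implementation inside variog, and the data are made up.

```python
import math


def empirical_variogram(coords, values, max_dist, n_bins):
    """Classical binned semivariance for pairs separated by at most max_dist."""
    width = max_dist / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d > max_dist:
                continue  # pairs farther apart than max_dist are ignored
            # Bins are (0, width], (width, 2*width], ...; d == 0 goes in the first bin.
            b = min(max(math.ceil(d / width) - 1, 0), n_bins - 1)
            sums[b] += 0.5 * (values[i] - values[j]) ** 2
            counts[b] += 1
    gamma = [s / c if c else None for s, c in zip(sums, counts)]
    return gamma, counts


# Illustrative transect: four points one unit apart with steadily increasing values.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [0.0, 1.0, 2.0, 3.0]

gamma, counts = empirical_variogram(coords, values, max_dist=3.0, n_bins=3)
print(gamma, counts)
```

The farthest bin rests on a single pair in this toy case, which is one reason variograms become noisy at large distances when data are sparse.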

Brief Methodology:

I selected data based on the quality of the descriptions in the well log, and assigned each well log I have interacted with a value from 0 to 3. A score of 0 represents either a dry well or a destroyed well. A score of 3 means the log is well described and denotes either the depth to first water or the standing water level. I used data with a score of 3 for this analysis. I noted the depth to the first lava flow in each of the well logs.

Figure 1: My field area. The well log locations are the blue circles, and a rough delineation of my field area is in white.

First I normalized my data, giving a mean of 0 and a standard deviation of 1. Then I determined a max distribution (the max.dist argument of variog, that is, a maximum distance), which determines the max bin size the variogram uses. The max distribution determines the lag increment, or the distances between which the increment is calculated (Anselin, 2016).

The data are projected, meaning that their horizontal distance is measured in meters. However, they are recorded in decimal degrees and located at the center of the township and range section that they are located in. Rather than convert my data from decimal degrees to meters, I recognized that there are around 111 km in 1 degree (at the equator), that there are 0.6214 miles in a kilometer and that there are 6 miles in a township. This helped me determine the max distribution for the variog function in R. I decided upon max distributions of 0.09 and 0.5 degrees.
I then used R’s variog function on the normalized data with the max distribution of 0.09 and 0.5 degrees.
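
The conversion factors above can be chained to see what these two values mean on the ground. Reading 0.09 degrees as roughly one township width is my interpretation of the numbers, not something the analysis depends on; the constant and function names are hypothetical.

```python
# Conversion factors quoted in the text above.
KM_PER_DEGREE = 111.0    # approximate length of one degree at the equator
MILES_PER_KM = 0.6214
TOWNSHIP_MILES = 6.0     # width of one township


def miles_to_degrees(miles):
    """Convert miles to decimal degrees using the flat 111 km per degree factor."""
    km = miles / MILES_PER_KM
    return km / KM_PER_DEGREE


township_deg = miles_to_degrees(TOWNSHIP_MILES)
half_degree_km = 0.5 * KM_PER_DEGREE

print(round(township_deg, 3), half_degree_km)
```

Keep in mind that 111 km per degree holds for latitude; a degree of longitude shrinks with the cosine of latitude, so east-west distances in degrees overstate the ground distance away from the equator.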

Succinct Results:

Figure 2: Variogram of the MLV-LP-BVM triangle with a max distribution of 0.09 degrees. Note the low semivariance with the lower bin size (.02 to 0.08 degrees).

Figure 3: Variogram of the MLV-LP-BVM triangle with a max distribution of 0.5 degrees. Note the saw-toothed pattern of semivariance. From 0.01 to 0.08 degrees semivariance is low; it spikes up and lowers again around 0.2 degrees, then spikes again at 0.3 degrees and lowers again at 0.4 degrees.

Critique:

I tested max bin sizes of 0.5 and 0.09 degrees to see how the variogram changes with an increasing bin size. Changing the max bin size, set with max.dist in R and called the max distribution here, changes the variogram. The smaller value, 0.09 degrees, limits the bins to a narrow range of distances. In effect, the variogram only tests the covariance of data points that are separated by a maximum of 0.09 degrees. Increasing the max bin size increases the distance over which the covariance of the data points is tested. A max distribution of 0.5 degrees results in a spiky plot. Normally, one would expect the variogram to plateau at larger bin sizes, representing large variance between data at larger distances, but figure 3 does not.

At its simplest, the variogram in figure 2 implies that data points that are close together in space are more likely to have similar depths to the first lava flow. You can see this in the low variance you find at the smaller distances, between locations that are close to each other, and the higher variance at distances that are farther apart.

With the max distribution of 0.5 degrees (figure 3), one could make an argument for a distribution of lava flow locations based on the locations of volcanic centers. Medicine Lake Volcano lies in the north of the study area and Lassen Peak lies in the south. The two spikes in variance with distance might be linked to the distances between those two centers. In other words, distances that correspond to low values of semivariance (<1) correspond either to regions that lie on the same lava flow, or to another near surface lava flow sourced from another volcanic center in the region (figure 4). Rather than finding the same lava flow at depth, we are seeing different lava flows at similar depths.

Figure 4: My field area with the rough delineations of two of the near surface lava flows in the region. 1 degree of longitude corresponds to roughly 111 km.

Lava flows are not attracted to or repulsed from each other, but they do follow the laws of physics. Volcanoes often build topography: when lava erupts from them, it flows from the high elevations of the volcano to basins, using paleo-valleys as conduits. Thus, if you know the paleo-topography you can understand where a lava might have flowed, and where it might have been emplaced. Predicting paleo-topography can be difficult in old volcanic regimes, but on the geologic timescales we are looking at, I can predict that the topography of the MLV-LP-BVM triangle has not changed much over the past 5 Ma.

Lavas flow from volcanoes, and from high to low topography. The two main volcanoes in my field area are Lassen Peak in the south and Medicine Lake Volcano in the north. Lavas are emplaced in basins; if they sit long enough, top soils form, the basin subsides, and the cycle repeats.

My data do have different spatial patterns at different scales. If you look at locations that are within 0.1 of a degree of each other, then you would expect to see a similar depth to the lava flow. If you move out to 0.5 of a degree, you see a see-saw effect, where the depth to the lava flow moves from close to the surface to deeper down. This variation stems from the proximity of the data point to the sources and the sinks of the lava flow.

I would use this technique again; it helped me think about my problem.

Sources:
Cameron, K, and P Hunter. 2002. Using Spatial Models and Kriging Techniques to Optimize Long-Term Ground-Water Monitoring Networks: A Case Study. Environmetrics 13:629-59.

R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

Using Spatial Statistics to Determine the Subsurface Spatial Distribution of Lava Flows in Northern California

Research Question

I am trying to determine potential fluid flow paths based on the spatial distribution of lava flows in the Medicine Lake, Lassen Peak, Big Valley Mountain area of Northern California. The final goal of this project is a 3-D subsurface framework of the geology from which we can model fluid and heat flow.

Thus we want to know “How the spatial pattern of the lava flow depth attribute of wells is related to the spatial pattern of lava flow depth (B1), which in turn is related to pre-lava-flow topography (B2) and the regional geology (B3), because lava flows follow topography and subside post emplacement at rates that follow basin subsidence rates in the region, and form barriers for/conduits of groundwater (C1).”

Dataset

My data are a series of more than 1500 well logs. Each contains data tied to a map coordinate, with information about changes in lithology, or rock type, with respect to depth (Fig 1). Well logs are collected the year the well is made. I have well logs that range from the 1960s to the present. Temporal data is less important for the initial part of my study.

Figure 1: What is a Well Log? A well log is a record of the changes in rock type that occur with changes in depth and a location at the surface of the earth.

I also have data from geologic maps. Geologic maps provide the spatial extent of the surface exposure of a geologic unit, which includes information about its contacts with other rock units in x,y space as well as information about the elevation (Fig 2).

Figure 2: My study area in Northern California, it lies between the Big Valley Mountain, Medicine Lake Volcano and Lassen Peak with the Pit River draining the middle of it. The green dots represent the center of the township and range section that the wells are located in, and the size of the circle indicates the number of wells in that section.

Hypothesis

Figure 3: Cross section of a Basalt Column (Lyle 2000). In Volcanic systems, fluid flow is limited to the Vesicular flow tops, the rubbly bases (P.P.C in this diagram) and sedimentary interbeds that lie between vertically stacked flows. These sections of the rock are the only locations with high enough permeability and porosity to allow the movement of water.

This means that if we know where the lava beds lie, and we know their contacts, then we can outline potential zones of flow. Determining the patterns of how lavas flow and are emplaced then allows us to determine a lava flow’s spatial extent, and therefore potential flow paths. I expect lavas to follow the laws of physics. They travel as viscous fluids and fill basins, and so I would expect them to be thickest where paleotopography was lowest, to thin out at the edges, and to lie down slope from the volcano or vent that erupted them. Given the regional geology, I would expect the thickest lava flows to lie in the Pit River basin, near the range bounding Big Valley Mountain Fault (Fig 3).

At its simplest, we expect lava flows to follow the geologic principles of original horizontality and of cross cutting relationships.

Approaches

I would like to learn both how to apply variograms and kriging and how they work, plus any new techniques that I am not yet aware of. We can also make the assumption that all our residuals will follow the rules of stationarity. This means that any irregularities in the data represent unacknowledged geologic features.

Expected outcome

My first goal is to create a 3-D subsurface map of the connectivity and contacts of lava flows in the Medicine Lake, Lassen Peak, Big Valley Mountain region. Ideally I would begin the process with Geog 566. I would like to have a few surfaces I can test in the field by the end of this term, but understanding different methods with which I can make this framework is my first goal.

Relevance

Understanding the distribution of lava flows in the region ties into the regional geology of my study area. As I stated in Question 3, lavas fill basins. Basin morphology, and the amount of space lavas can fill, depends on the slip rates and distribution of faults in the Medicine Lake, Lassen Peak, Big Valley Mountain triangle. By constraining slip rates in the basin, we both build a better picture of the regional geology and can make more rigorous checks on the validity of our statistical outputs.

Another equally important point is the next step in the project. After we make the 3-D framework, the USGS will use it to model fluid and heat flow in the region. Comprehending potential changes in groundwater flow will allow city managers in the region to better manage water in the future.

Preparation

Arc-Info	Not much, ArcMap, yes
Modelbuilder and/or GIS programming in Python	Working knowledge of python outside of GIS programming
R	None
Image Processing	Working Knowledge
Relevant Software	Matlab

Sources

Lyle, P. “The Eruption Environment of Multi-Tiered Columnar Basalt Lava Flows.” Journal Of The Geological Society, vol. 157, 2000, pp. 715–722.

GEOG 566

Advanced spatial statistics and GIScience
