Category Archives: 2019

Spring 2019 blog posts

Spatial Distribution of Trees by Height Class, Slope and Elevation in the HJ Andrews Forest

Guiding Questions: How do distances between trees differ depending on tree height? How does the spatial pattern of tall trees relate to the spatial pattern of slope and elevation?

Methods: I used a combination of ArcMap, QGIS and R to perform analyses and view results. I used the results of my previous distance analysis within the HJ Andrews Forest, which grouped individual trees into ten height classes and calculated the mean distance between trees within the same height class, to correlate tree spacing with other spatial phenomena. I wanted to know if hot spots in within class tree spacing correlated with hot spots in tree height, so I examined hot spots and cold spots of tree distances within each height class and compared them to tree heights, slope and elevation. Height class 1 is the shortest class of trees and height class 10 is the tallest class of trees.

I used the Hot Spot Analysis Tool in the Arc Toolbox > Spatial Statistics Tools > Mapping Clusters > Hot Spot Analysis (Getis-Ord Gi*) to perform a Hot Spot Analysis on each of the ten height classes by mean distance to the 30 closest trees of the same height class. In the context of this analysis, the interpretation of a hot spot is that it is a region of greater than expected distances between trees of the same height class. For example, in the shortest height class, 1, hot spots are regions of greater than expected mean distance between a short tree and the 30 closest short trees. Cold spots would then be regions of closer than expected mean distance between short trees.

The Hot Spot Analysis in ArcMap used a self-generated distance band of 113m for my original hot spot analysis of the global dataset (not broken up by height class), so I decided to use a distance band of 100m for each subsequent hot spot analysis. Each height class has a different number of total trees in it, so by holding the distance band constant, I hoped to avoid influence from any differences in total number of trees between height classes.

After viewing the hot spot results, I plotted the z-scores of heights for each height class against the z-scores of the distances between trees to visually examine their relationship. If both heights and distances between trees were perfectly normally distributed, one would expect a circular distribution on the density plots with a slope of zero.

I then compared the mean slopes, elevations, and standard deviation of slopes and elevation within height classes across the entire forest. Since HJA is a research forest with many different management areas, including harvested patches and research plots, I limited the next part of the analysis to only within control areas of the forest. I downloaded the most recent (2014) land use designations from the HJA data repository (http://andlter.forestry.oregonstate.edu/data/abstract.aspx?dbcode=GI008). For this analysis, I used Entity Title 3: Reserved areas (controls) within the Andrews Experimental Forest. I compared slopes and elevations within the control plots only by height class, to see if there were differences between the global dataset and the control regions of the forest.

Results:

The density plots of height z-score versus distance z-score revealed a different pattern between smaller height classes of trees and tree spacing than the relationship between larger height classes of trees and tree spacing. As we go from shorter height classes of trees to taller height classes, the density plot distributions change (Figures 1-10). There is strong evidence of positive correlation between hot spots of short trees and hot spots of distance between short trees, but from height class 6-10, there is little to no evidence of a relationship between hot spots of trees and distance between them. Tall trees are more or less distributed randomly throughout the forest.

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Figure 7

Figure 8

Figure 9

Figure 10

There is clearly some structure to the density plots (especially in height classes 1-5), so we can assume that the trees are not randomly distributed and that there is a relationship between height and distance between trees. I compared mean and standard deviation of slope, as well as mean and standard deviation of elevation for each height class of trees (Table 1). Mean slopes do not significantly differ between height classes, so slope is likely not a main driver of tree height. However, there is some evidence that tree heights differ at lower and higher elevations, with the shortest height class of trees at a mean elevation of 1014 m, and the tallest height class at a lower mean elevation of 831 m. It’s important to note that mean elevations have large standard deviations, so the trend may not be as strong as it first appears. I wanted to know if there was more evidence for this pattern, so I calculated the same statistics for subsets of the hotspot analyses constrained to only the control areas of the forest (Table 2) to get an idea of how management may or may not influence the relationship between tree height and tree spacing throughout the forest. Mean slopes and elevations, accounting for standard deviation from the mean, do not differ significantly between the global and control datasets, meaning that the control regions are reasonable representations of the rest of the forest. To examine this further, I examined the same data for the entire forest excluding the control areas (Table 3). The same pattern holds between the three datasets; class 10 is the only class that is consistently at lower elevations across the forest. I made two density plots to display this relationship. Figure 11 shows the distribution of elevation for the shortest height class (class 1) of trees (green) versus the global dataset of all trees (red). The short trees follow the same distribution as the rest of the dataset, meaning that they are dispersed more or less evenly across elevations. Figure 12 shows the distribution of the tallest height class of trees (class 10; light blue) by elevation versus the global dataset of trees by elevation (red). This clearly shows that tall trees are not distributed at higher elevations.

Figure 11. Density of trees by elevation in height class 1 (shortest trees; green) versus global dataset of all trees (red).

Figure 12. Density of trees in height class 10 (tallest trees; light blue) versus global dataset of all trees (red) by elevation.

Table 1: Global dataset

Height Class	Mean Slope	SD Slope	Mean Elevation (m)	SD Elevation
1	23	11.7	1014	294
2	23	11.6	1008	291
3	25	11.6	990	295
4	25	11.5	948	291
5	25	11.3	931	293
6	26	10.9	972	291
7	27	10.5	982	250
8	26	10.6	959	221
9	25	10.7	926	207
10	23	11.2	831	197

Table 2: Subset of control regions

Height Class	Mean Slope	SD Slope	Mean Elevation (m)	SD Elevation
1	27	11.9	1104	317
2	27	11.5	1136	328
3	28	11	1157	329
4	28	10.8	1143	311
5	28	10.2	1133	285
6	28	10	1071	262
7	28	10	992	228
8	27	10.6	955	193
9	26	10.8	935	187
10	24	11	869	189

Table 3: Global dataset excluding control regions

Height Class	Mean Slope	SD Slope	Mean Elevation (m)	SD Elevation
1	22	11.4	991	284
2	22	11.4	979	273
3	24	11.6	952	272
4	24	11.5	902	266
5	24	11.5	867	265
6	25	11.3	908	290
7	26	10.7	973	267
8	25	10.5	964	242
9	23	10.4	919	221
10	22	11.2	793	198

Critique of the method:

A criticism of hot spot analysis is that it’s basically a smoothing function that places a focal around an area but does not account for the distribution of values within that area. So, the tallest tree in the dataset could be in cold spot (region of shorter than expected trees) and the hot spot analysis would give you no indication of that, so one may miss out on potentially useful/interesting information.

This is only a cursory look at the data and a next step is to more closely examine how slope, elevation and aspect influence distribution and height of trees, particularly within the control areas of the forest.

Ex 2: Spatial distribution of juniper saplings and slope,aspect, and seed source

1. Question Asked: How does the density of juniper vary based on characteristics associated with soil moisture? Additionally, does this pattern vary with juniper density (as an indicator of potential seed source) surrounding each transect?

The study site for this analysis is the Camp Creek Paired Watershed Study (CCPWS) in central Oregon. The majority of juniper was removed from one watershed (“treated WS”) while one watershed is dominated by mature juniper (“untreated WS”). This analysis seeks to examine the density of juniper sapling reestablishment in the treated WS and mature juniper density in the untreated WS as it relates to indicators of soil moisture (slope, aspect, and a combination of both) and seed sources (surrounding juniper density).

2. Tools and steps used in analysis: All steps were completed in ArcGIS 10.6 or ArcGIS Pro.

Surrounding juniper density: Thirty-three belt transects were conducted in the summer of 2018 in the treated WS. The number of juniper for each transect (30m long by 3m wide) was recorded. For the purposes of this analysis, the transects were represented by a point. Supervised classification (support vector machine) was applied to National Agriculture Imagery Program (NAIP) imagery for this area and pixel-based analysis was used to estimate the density of juniper. General steps are as follows:

Buffers were created surrounding each transect point at 50m, 100m, and 250m using the buffer tool.
The zonal histogram was used to extract the pixel values within each buffer. To ensure proper counts, each buffer must be iterated separately if there is overlap between buffers.
A model was created in ModelBuilder to extract values from the buffers and combine values in a table. In-line variables were used to separate the data for each transect into separate tables prior to being merged.The submodel (first image) and full model (second image) are displayed below:

The percentage of juniper pixels within each buffer (number of juniper pixels/total number of pixels) was calculated using excel. This percentage was used as an estimate of surrounding juniper canopy/density within the surrounding areas.
Point plots were created indicating the number of juniper per transect and the corresponding juniper density.
The transects were divided in to two groups representing “low” counts (transects where one or fewer juniper saplings were found) and “high” counts (transects where five or greater juniper were found).

Slope and Aspect: Slope and aspect were considered to be indicators of soil moisture within the watershed. Separate approaches were used to assess if juniper density was found to vary with these characteristics: regression (using ordinary least squares and global weighted regression) and a rating model based on slope and aspect categories. General steps are as follows:

Using a 30m DEM, the slope and aspect values were extracted using the slope and aspect tools within ArcMap.
Using the calculate field function, aspect and slope were each ranked from 1-9, with areas were greater soil moisture is expected (flat and shallow slopes and northern slopes were weighted heaviest). The following parser was used (aspect values are in degrees and slope is in percent):

def Reclass(Aspect_category):

if (Aspect_category<0 and Aspect_category>=-1):

return 9

if (Aspect_category>=337.5 and Aspect_category<360):

return 8

if (Aspect_category>=0 and Aspect_category<22.5):

return 8

if (Aspect_category>=22.5 and Aspect_category<67.5):

return 7

if (Aspect_category>=292.5 and Aspect_category<337.5):

return 6

if (Aspect_category>=67.5 and Aspect_category<112.5):

return 5

if (Aspect_category>=247.5 and Aspect_category<292.5):

return 4

if (Aspect_category>=112.5 and Aspect_category<157.5):

return 3

if (Aspect_category>=202.5 and Aspect_category<247.5):

return 2

if (Aspect_category>=157.5 and Aspect_category<202.5):

return 1

def Reclass(Slope_category):

if(Slope_category>=0 and Slope_category<2):

return 9

if(Slope_category>=2 and Slope_category<4):

return 8

if(Slope_category>=4 and Slope_category<6):

return 7

if(Slope_category>=6 and Slope_category<8):

return 6

if(Slope_category>=8 and Slope_category<10):

return 5

if(Slope_category>=10 and Slope_category<12):

return 4

if(Slope_category>=12 and Slope_category<14):

return 3

if(Slope_category>=14 and Slope_category<16):

return 2

if(Slope_category>=16):

return 1

Based on the categorized values for aspect and slope, a rating model was calculated using the raster calculator to determine if areas of expected greatest soil moisture corresponded to juniper density. The formula for this rating model was: (slope category +aspect category)/2.
As the rating model had a lower spatial resolution and extent than the classified NAIP raster, the classified raster was resampled to be the same resolution as the rating model. The NAIP raster was masked using the extent of the rating model raster. These rasters will be used for further analysis in exercise 3. For the purposes of exercise 2, two maps were created for visual analysis. For future analysis, both rasters were projected to NAD 1983 UTM Zone 10N.
The hot spot analysis from exercise one was overlaid on the rating model to visually assess the patterns over a small scale.
An ordinary least squares (OLS) analysis was performed using the number of juniper counted for each transect as the dependent variable and the slope category, aspect category, and combination of both categories as the explanatory variables.
The standard residual for each OLS was used to calculate Global Moran’s I.
Geographically Weighted Regression (GWR) was used with the same inputs described for the OLS.

3. Overview of results.

Buffer. No clear trend was found between juniper density and the number of juniper sapling found in each transect. However, for all transects the juniper density was lower with increasing buffer distance. Results are displayed in the chart below, points are labeled by distance and “high” (corresponding to transects with 5 or more juniper) and “low” (corresponding to transects with 1 or fewer juniper:

Comparison of Rating Model and Classified Raster. The classified NAIP (prior to resampling and reprojection) and rating model raster are displayed below. Red pixels within the classified raster are pixels classified as juniper. For the rating model, lighter shaded areas correspond to those areas where higher soil moisture is expected based on the slope and aspect characteristics. The hot spot analysis provided limited use (as only one hot spot and one cold spot were indicated) but could be useful for additional analysis. However, the cold spot aligned with areas of expected lower soil moisture and the hot spot corresponded to areas with expected higher soil moisture. Further analysis of these two datasets will continue with exercise 3.

Regression analysis. The R-squared values for OLS for each explanatory variable(s)(slope category only, aspect category only, and a combination of slope and aspect) ranged from -0.025 to -0.060, suggesting that these characteristics were not a good predictor of juniper density (at least for the data set used in this case). The coefficient for each explanatory variable(s) ranged from -0.087 to 0.024, also indicating that this model was not a good fit. The Koenker (BP) statistic was insignificant for all cases. The p-values (p>0.86 for all cases) for Global Moran’s I for each combination of explanatory variables indicated random spatial patterns. As expected, results were similar for the GWR analysis with R-squared values ranging from -0.011 to -0.037. The GWR residual squares ranged from 180.8 to 182.3.

4. Discussion/critique of methods. Based on the buffer analysis, no clear patterns were indicated for repulsion/attraction based on the transect numbers and the density analysis. However, the juniper density (both in range and overall percentage) decreased for the 250m buffer regardless of the number of juniper counted per transect. These patterns are most likely related to the fact that juniper was removed from one watershed fifteen year ago and that the adjacent watershed was left untreated. This coupled with the small sample size may not be sufficient to indicate if spatial patterns between juniper density and saplings found for each transect exist. The OLS and GWR analysis indicated that slope and aspect (at least based on the data used here) were not good predictors of juniper density. For exercise 3, regression analysis of the classified raster and the rating model will take place.

The methods used here were relatively easy to apply, and can be accomplished using tools available in ArcGIS Pro. The ModelBuilder process required time to develop and some issues, particularly with the merge function, required that the tables be manually manipulated. However, ModelBuilder still was more efficient than executing the buffer and zonal histogram for each transect separately. Also, the supervised classification can easily be accomplished within ArcGIS. However, the spatial resolution of the NAIP imagery (1m) likely led to reduced accuracy of juniper sapling identification compared to images with higher spatial resolution. This was the highest resolution imagery available for the entirety of the two watersheds but this methodology could easily be applied to other data.

Overall, the methods used here may inform future analysis to be applied to other datasets. For the purpose of these exercises, very few spatial trends are evident but that may be the result of a)the sample size, b)the effects of juniper removal, and c)characteristics of the dataset (e.g., spatial resolution of the raster). In particular, I would be interested to see how these approaches would work in areas where larger-scale removal has taken place. Additionally, the use of rasters with higher spatial resolution (such as UAV-based imagery) would likely improve the classification and increase accuracy of juniper sapling detection.

Ex 2: Relationship between stream cross-sectional change and across-channel slope

For Exercise 1, I investigated autocorrelation in patterns of cross-sectional change across and along stream reaches.

For Exercise 2, I wanted to know whether these patterns of change were related to channel geometry.

Specifically, I wanted to know if erosion or deposition happened more frequently close to cut banks than further away from them.

I was originally going to investigate change in both the along-channel and across-channel directions, but after consulting with my classmates, I decided that I would not be able to draw as many conclusions in the along-channel direction because I don’t have good information about the spacing and orientation of my cross sections.

Methods:

To find cross-sectional change, I paired sequential years and calculated change along each cross section for each pair of years as I did in exercise #1.

I wanted to compare the change to the location of the steepest bank, but I couldn’t figure out how to identify what parts of a cross section counted as a “bank” using a computer. Instead, I looked at the spatial pattern of change in relationship to the point of steepest slope in the across-channel direction.

I used the original cross section profiles to identify the point of steepest across-channel slope. Since the point of steepest slope may move from year to year, I used the steepest slope from the first year in every year pair. I used the loess function in R to lightly smooth each cross-sectional profile before extracting slope in hopes of reducing the effects of some small bed features. This worked well in most cross sections, but in some cross sections, especially those with prominent midchannel features, the point of steepest slope occurred in the middle of the channel.

Once I had identified the steepest point in each of the cross section for each year pair, I calculated how far every other point in the cross section was from the steep point. Then, within each reach, I aggregated the data by distance, rounding to the nearest decimeter, and calculated the mean absolute elevation change (that is, counting both erosion and deposition as positive values). I wanted to see broad patterns overall, so the aggregate combines data for every cross section and every pair of sample years.

I plotted the resulting data from each reach. In the figure below, the colors represent how many horizontal centimeters of reach are aggregated into each point on the line graph. Bigger numbers and more blue colors represent averages from more cross sections and years while smaller numbers and more yellow colors represent distances where fewer cross sections or fewer years had data at that distance from the steep point.

Results:

The figure implies that perhaps a lot of channel change tends to happen very close to the steepest point, but then stabilizes. Far from the point, the average vertical change values become very unsteady, perhaps because fewer data points are integrated into the average.

Critique:

I thought that this was an interesting and fairly straightforward analysis to conduct, but I am not sure how physically meaningful the results are, since the steepest points are not placed in the same location in each cross section. The results figure looks a bit like a channel cross section itself, which I thought was very interesting! I wonder if this is because the averaging falls apart at a distance roughly equal to the average channel width or if there really is more change happening near channel banks on average in these streams.

Ex 2: A stain on the neighborhood… How does management relate to infection for black stain root disease?

Sorry for the bad puns in the title… Could not help myself.

QUESTION

How do spatial patterns of black stain root disease infection probabilities relate to spatial patterns of forest management classes? How do these spatial relationships differ between landscapes where similar management classes are clustered and landscapes where management classes are randomly distributed?

TOOLS AND APPROACH

I used neighborhood analysis to analyze the spatial relationship between hot and cold spots of infection probabilities and the three forest management classes simulated in my model (extensively managed plantations, intensively managed plantations, and passively managed old-growth stands).

I used a combination of ArcMap (to perform the majority of the procedure) and R (mostly for spatial data wrangling and analyzing and plotting results).

DESCRIPTION OF STEPS YOU FOLLOWED TO COMPLETE THE ANALYSIS

1. Compare the distribution of infection probabilities between landscapes and management classes

I performed this step in R to determine whether there was evidence of significant differences in the infection probabilities (a) between the clustered and the random landscapes and (b) between management classes both within and among landscapes.

2. Hotspot analysis

Converted my raster data of infection probabilities to point data (in R, using the “raster” package) and perform a hotspot analysis (hotspot tool in ArcMap) (Fig. 1). For the hotspot analysis, I used inverse squared distance weighting to conservatively include trees within hotspots.

I also created polygon shapefiles for areas identified as hotspots to the 99% confidence level in each of the landscapes and calculated the area of these hotspots for each landscape, management class, and landscape X management class.

Figure 1. Hotspot analysis overlaid on forest management classes for the “clustered” landscape. The same analysis was performed on the “random” landscape.

3. Select point for neighbor analysis

For this step, I chose one point in a hot spot and one point in a cold spot for each of the two landscapes to perform the neighborhood analysis. In the future, I would write a script to repeat this procedure with a random sample of a large number of points, but for this exercise, I just used one point for each.

For the hotspots, I only used points identified in hotspots at the 99% confidence level. For the cold spots, I used a point from the 99% confidence level for the clustered landscape, but my only options for the random landscape were points identified at the 90% confidence level. I visually selected the points for analysis (non-random), but I would not do so for my full analysis.

4. Create concentric ring buffers for the neighborhood analysis

I used the Multiple Ring Buffer tool in ArcMap to generate hollow ring buffers at a series of distances around each hotspot and cold spot point selected for analysis (Fig. 2). I then intersected these buffers with the management class shapefile and calculated the proportion of each buffer ring composed of each management class (by area). I plotted these proportions as a function of distance to complete the neighborhood analysis.

Figure 2. Concentric ring buffers used for the neighborhood analysis, with outer ring distances labeled. Example shown for the “random” landscape for both hot and cold spots.

RESULTS

Non-spatial analysis

Before performing the neighborhood analysis, I wanted to know whether there was any difference in the infection probabilities between the two landscapes and the three management classes. Visual assessment of the box plots (Fig. 3) led me to believe that there were significant differences between the means of the infection probabilities between the landscapes and management classes, which was supported by the results of student t-tests (p << 0.01). There were also significant differences in the infection probabilities between management classes both within and among the two landscapes. Since the outputs I am analyzing are “dummy data” because my model is not fully complete, this does not surprise me and I did not perform further, more rigorous statistical analysis. The higher infection probabilities simply reflect the density of trees in each of these management classes. Extensive management had the highest infection probabilities (highest initial planting density, evenly spaced), followed by old-growth (lower density but clustered trees) and finally intensive management.

Figure 3. Comparisons of infection probability distribution between (a) landscapes, (b) management classes, (c) management classes by landscape; (d) the proportion of each management class covered by an infection hot spot in each of the two landscapes (by area).

Neighborhood analysis

For the clustered landscape, the management class in which the point was located made up the largest proportion of the first several distances analyzed (cold spot: old-growth, hot spot: extensive). This makes sense given that the management classes were spatially clustered in this landscape. At a certain threshold, the proportion of this management class started to decrease, as the other two management classes increased in proportion at a similar rate. For the random landscape, the decrease in the starting management class proportion was relatively rapid with distance, and all three management classes converged at their landscape proportion by 150 meters, with some fluctuation (in both landscapes, each of the three management classes makes up about 1/3 of the total landscape).

Figure 4. Neighborhood analysis showing the proportion of each (hollow) ring buffer accounted for by each management class for hot and cold spot points in each of the two landscapes.

Critique of the method – what was useful, what was not?

Most of the drawbacks of this analysis were due to my process and the nature of my data. Because I only used one hotspot and one cold spot point for each of the two landscapes, it is difficult to say much from this analysis. If I were to automate the analysis and run it on a series of random points drawn from the hot and cold spots, I would get a lot more insight as to the patterns of neighborhood effects if any exist. In addition, the “dummy data” used for analysis come from model runs that only have local disease transmission (between neighbors at a radius of several meters). However, the full model will also incorporate long-distance dispersal by insect vectors (on the scale of kilometers), which will likely be much more interesting and less predictable when neighborhood analysis is performed.

Issues with the method itself is that it is quite time intensive – an issue that could be cleared up with a script written in Python or R to automate this process given a set of input files. If there is a way to do this already built into either of these platforms (or Arc/QGIS), I was not able to find it. Also, there is a lack of clear quantitative interpretation for these plots to separate statistically significant variability in the proportion of each management class at each distance from non-significant variability. A means of doing so would enrich the analysis.

Exercise 2: Using Neighborhood Analysis to Identify Relationships Between Seascape Classes and Rockfish Abundance Hotspots

Background and Question Asked

In Exercise 1, different interpolation methods were used to create a heat map of rockfish abundance based off of a large collection of point data. That blog discussed some of the challenges that arose while attempting to use a time series of point data with many points in close proximity to one another (if not overlapping). The exploring was in many ways successful: it was discovered that the Kriging method provided a more robust representation of the data than Inverse Distance Weighting. However, in the time since that post was published, my interpolation methods have been refined:

Instead of using the entire time series as an input for the interpolation, four individual years were selected to represent the whole dataset (2003, 2007, 2011, 2015). Kriging was then used to create heat maps for each individual year.
Additionally, the union tool was used to remove the land boundaries from the environment so that the interpolation only affected parts of the ocean
The symbology of the abundance point data was synced across all four years being used in the analyses so that they could be easily compared to one another
The symbology of the interpolated heat maps was also modified to be consistent throughout the analyses

For this exercise, I plan to compare my new, interpolated data to an already existing set of data, effectively comparing my two variables. Specifically, I hope to answer the question “Is there a spatial relationship between areas of significantly high and low rockfish abundances and specific seascape classes?”

An example of the most recent Kriging output using the 2007 data.

Name of Tool or Approach Used

I will be using a neighborhood analysis to seek an answer to this question. The neighborhood analysis requires taking areas of interest and examining the environmental conditions around that area from the perspective of another variable. By varying the distance from your original point of interest, a researcher is able to infer about the spatial relationship between the two variables.

Methods

Data Used

“Points of Interest” chosen from plot below
Buffers created around points of interest at 5km, 10km, and 25km radii
YOYRock Kriging Abundance Interpolation for 2007
Seascape NetCDF Raster File for May 5th, 2007

Rockfish abundance plotted against water column depth for trawls from 2007.

The first thing that was needed to complete this analysis was points of interest. I chose to use four points form the year 2007, as the data from this year provided the largest spatial footprint of all of the years of interest. Two of the points represented trawls that found significantly high rockfish abundance, and the other two represent trawls which found no rockfish. All four points vary spatially and physically (latitude, longitude, water column depth, etc). All points were selected from interpolated areas with different modeled outputs. Next, circular buffers were created around each point of interest with 5km, 10km, and 25km radii.

Map showing Points of Interest with circular buffers overlaid on seascape NetCDF file.

In order to use the overlay tool in ArcGIS, two polygon features are needed. In order to convert my NetCDF Raster files into a polygon, I used the Raster to Polygon tool. Once the seascape classes were converted to polygons, the Intersect tool was used to measure the shape area of each seascape class within each buffer. Those statistics were then converted to .xlsx files and summarized in Excel.

Results and Discussion

Example of data after Raster to Polygon and Intersect tools used

The neighborhood analysis found evidence that specific seascape classes may have impacted young of the year rockfish abundances in the locations selected to be a part of this analysis.

The low-abundance trawls were dominated by three seascape classes: Class 14 (Temperate Upwelling Blooms), Class 19 (Subpolar Shelves), and Class 21 (Warm, Blooms, high Nutrients). While there were more classes represented overall by the high abundance trawls, those areas were mostly dominated by two seascape classes: Class 7 (Unnamed) and Class 12 (Subpolar Nutrient). Additionally, there was very little overlap between the two areas – the only seascape class that appeared in both the high abundance radii and the low abundance radii was Class 14. Further analyses would be needed to determine if these trends are representative to the entire region or year, but this neighborhood analysis provides results that give us a place to start. Overall, I found this analysis to be extremely useful despite the number of steps needed to make it work. In addition to working in GIS normally, the data type of my seascapes had to be changed and much of my analysis had to be done in Excel, as ArcGIS cannot summarize key statistics. However, I feel as though streamlining this method could be done now that I am familiar with it.

Exercise 2: Cross Variogram and Kriging

In Volcanic Regions fluid flow paths are limited to the rubbley bases and flow tops of lava flows where permeability promotes transitivity.

Hypothesis:

The depths to first water will corresponds to depths located near lava flows.

Definitions:

Contact: location on the surface, or at depth where two different rock types touch.

Depth to first water: the depth from a particular well log where water was first noted, this is not always listed.

Depth to first lava: the depth at which the first lava is noted in a particular well log. There can be multiple contacts to a lava in a well log, which is why I specified first.

Figure 1: Block diagram of what well log depicts. The green and red planes represent contacts between the two rock types. Well (grey) and well logs record these contacts as depths from the surface (brown).

Question:

Does the depth to first water correspond to the depth to first lava in my data set?

Tool:

Cross Variogram and Kriging

Like the variogram, the cross variogram is a tool that allows you to compare spatial data at multiple scales. Unlike the variogram, the cross variogram compares one data set to another data set at multiple scales.

Kriging uses the variogram to interpolate a surface.

Brief Description:

In order to use both the variogram and the cross variogram you must normalize the data you are working with. Otherwise the semivariance values can range from 0 to infinity. Normalizing the data allows you to distinguish data that are correlated (semivariance<1) from data that have no correlation with eachother (semivariance>1).

In order to use the R function gstat, I had to turn the data into a spatial data frame.

The function gstat allows you to simultaneously create variograms for each of the induvidal data sets you are working with and compares them with each other. In this case I just compared the depth to first lava with the depth to first water.

I used Arc GIS kriging formula (ordinary) to krige a surface that represented the difference between between the depth to first lava and the depth to first water. In other words I subtracted the depth to first lava from the depth to first water and kriged that “surface”. I wanted to see if there was a spatial distribution around which those difference were low. I tried to use R to krige, but did not have the time to work out the krige function.

Results:

My results were strange, though ultimately unsurprising.

Figure 2: Variograms of my two variables water1 and lava1 as well as the cross variogram that comes from comparing the two. Note that the water1lava1 cross variogram has negative values for the semivariance.

Figure 3: Plot of well log data with the difference between first lava and first water plotted on top of its kriged surface.

The strangest thing to resolve from this exercise are the negative semivariance values for the cross variogram. Semivariance is a squared values and therefore should not have negative values. I have no idea what is happening here. I need to ruminate upon it. Either way, the data does not appear to be well correlates, or at the very least, I am not comfortable making conclusions about it with the negative semivariance values.

The ordinary Kriged surface interpolated from the difference between lava and water lets me know that the highest Kriged surface (Fig 3, white) between water and contact lies in the middle of the study area . In geographic and geologic space this corresponds to a basin filled with sediment and inter-fingered with lava. Many of the well logs in the region are not deep, they don’t have to be because the water here is near the surface, and close to some of these buried basalt flows. The data at the far edges of the map are spatial outlies, and thus we can’t look at any of the map that lies far from the main cluster of data points.

From physically looking at the well logs I know that while the well logs do often correspond to a lava flow, it is almost never the first lava flow. I am not surprised that the semivariance indicates that the data are not correlated.

Critique of the method:

what was useful, what was not?

It was not particularly useful, because it told me what I already know and left me with more questions than answers. However, I did walk away with some considerations. The cross variogram (and variogram) might work at a smaller scale.

In other words, if I broke my field area up in regions where I think the lava layers might source from the same place (Lassen Peak or Medicine Lake Volcano) I would be able to make the assumption that lava layers that are at similar depths in the well log correspond to the same lava flow in space. If we consider figure 1, in a small area we would be able to link the green layer to other locals where the green layers lies at depth, and would be able to spatially autocorrelate them with the variogram.

The next step, one I narrowed down my area, would be to correlate the depth to water with the depth to every lava flow I found in the well log. This would allow me to see which lava layer best corresponds to the depth to first water.

One of the things I discussed with my partner was trying to figure out what the negative values meant in my variogram. As I stated above, I still need to think about this, or figure out what I did wrong. I also discussed taking data out of lat-long space and into UTM space; that is something I am also still thinking about.

One final note: At the moment my data is both clustered around certain spots, and I do not have much of it. Every time I add a few data points, the shape of the variogram changes. Some of the spikiness I am seeing is likely from that.

Exercise 2: Neighborhood analysis of Texas counties with high colorectal cancer mortality rates

Question being asked

How do rurality indicator variables shift as distance increases away from Texas counties with high colorectal cancer (CRC) mortality?

In exercise 1, I used principal component analysis (PCA) to create a PCA-weighted rural index of the state of Texas using 3 scaled variables: population density, land development percentage, and median income. In this exercise I applied these same variables to determine how they change as distance increases away from the 4 Texas counties with the highest CRC mortality rates. To do this, I created multi-ring buffers around Anderson, Gonzales, Howard, and Newton county and computed averages of each rural indicator variable for each successive buffer “donut.” I hypothesize that as distance increases away from high CRC county centroids, rurality indicator measures will have more “urban” values (i.e. higher population density, higher percent developed, higher median income) and CRC mortality rates will decrease.

Tools and Data Sources Used

For this exercise, I utilized the intersection, feature-to-point, and multi-ring buffer tools in ArcGIS along with the latticeExtra/gridExtra plotting packages in R. The rural indicator data used in this analysis are from the same sources I used in Exercise 1: Texas county median household income (2010 Census Data), Texas county population density (2010 Census data), and rasterized discrete data of land use for the state of Texas (2011 National Land Cover Database). These data were then scaled using the same procedure from Exercise 1, which can be found here. Aggregated CRC mortality rates for Texas counties were obtained for the years 2011-2015 from the National Cancer Institute’s Surveillance, Epidemiology, and End Results Program U.S. Population Data – 1969-2015.

Methods

Attribute Table Wrangling: The Texas county indicator variables were linked to county polygons in my Exercise 1, but cancer mortality data was not. For polygon linkage in this exercise, I imported the mortality data excel sheet into Arc and used the join procedure to insert the data into the existing attribute table (with indicator variables) for county polygons.

Centroid & Multi-ring Buffer Creation: First, I utilized the point-to-feature tool in Arc to create a layer of county centroid points from the county polygon layer. Once the county polygons had been converted to centroids, I identified the 4 Texas counties with the highest CRC mortality rates. Then, using the select features tool and multi-ring buffer procedure, I selected each of the 4 counties separately and created multi-ring buffers at 50, 75, and 100 Km. These distances were chosen based on the size of the selected counties and the size of the full state of Texas.

Intersection & Donut Summary Statistics: Once the multi-ring buffer layers were created, I intersected each of the 4 buffer layers with the original county polygon layer containing all relevant variables. Then, mean via the summary statistics tool were computed in Arc for population density, percent developed land, and median income for each successive donut in the multi-ring buffers. The computed tables of buffer donut means for each variable and county were then exported to Excel files.

R Plotting: The Excel files were then imported into R and line plots of buffer means by distance were created using the xyplot function within the latticeExtra package. Plots were then combined into figures by county using the gridExtra package.

Results

This figure of scaled population density means for county multi-ring buffer donuts indicates varying trends for population density between the 4 high CRC counties. Two counties have increasing population density as distance increases away from centroids, and two have decreasing population density as distance increases away from centroid. Only the buffer map for population density was presented above for post conciseness and space limitations on this blog site. More specific neighborhood relationships between CRC death rates and all indicator variables can be seen in the line plots and explanations below.

Line Plots of County Indicator Variables Over Buffer Distances

The above plots display with more specificity than the buffer map that areas surrounding the 4 counties have differences in indicator variable trends as distance increases away from county centroids. Both Anderson and Newton counties largely follow the trend hypothesized: as distance from the county centroids increases, rural indicator variables have more “urban” values and CRC mortality rate decreases. For Newton county, this trend does not hold for median income, because as CRC mortality decreases away from the county centroid, median income also decreases. For the other other 2 counties, Gonzales and Howard, the hypothesized relationship does not hold, because as distance increases away from county centroids, the rural indicator variables become more “rural” as CRC mortality decreases. This indicates that the associations between CRC mortality and rural indicator variables are complex and that neighborhood analysis does not capture all relationships.

Critique

This sort of neighborhood analysis was effective at determining trends in rural indicator variables in the areas surrounding high CRC mortality counties in Texas. The buffer map produced great broad results, where more generalized trends can be determined. The line plots specifically were highly useful for visualizing more specific changes in indicator variables and CRC mortality rates over distance. These results should be considered in the light of some limitations. First, county level data was used for all variables and the buffer donuts may be too large or too small to capture the true neighborhood relationships in the analysis, as statistical procedures were not utilized to determine the distances. Further, more buffer donuts may have been useful to see more nuanced trends over distance. Secondly, as can be seen in the buffer map, there is a lack of CRC mortality data for many counties in west and southern Texas due to data suppression in order to preserve patient confidentiality. This presents significant bias in result interpretation, especially in Howard county, where many of the counties surrounding it have suppressed CRC mortality data.

My future analysis of this data will likely be a comparative confusion matrix of the PCA-weighted index I created in Exercise 1 and CRC mortality data used in this exercise.

Exercise 2: Spatial Patterns of Perceptions of Natural Resource Governance and Environmental Condition

Questions

What are the spatial patterns of natural resource governance perceptions in relation to environmental condition?

Environmental condition was represented as: (1) Environmental restoration sites, and (2) aggregate ‘environmental effects’ scores by census tract on a scale from (1) low to (10) high that include lead risk from housing, proximity to hazardous waste treatment storage and disposal facilities, proximity to nation priorities list facilities (Superfund Sites), proximity risk management plan facilities, and wastewater discharge.

Sub-questions related to the primary question are:

a1. Does being near a restoration site associate with individual perception? a2. Does being near more restoration sites associate with individual perception?

b1. Do the environmental effects where an individual lives associate with individual perception? b2. Do the environmental effects around an individual associate with individual perception?

Tools and Approaches

Nearest Neighbor analysis in ArcGIS Pro and R studio
Geographically weighted regression in R studio
Neighborhood analysis in ArcGIS Pro

Analysis Steps

Nearest neighbor analysis was used to examine questions a1 and a2. First, a file of restoration sites was loaded into R. The sites were points in the same projection as my participant location points. This analysis required four libraries (as the points were from different files. The libraries were: nlem, rpart, spatstat, and sf. To use the tool I needed, I first had to turn my points into point objects in R (.ppp objects). First I used the convenxhull.xy() function to create the ‘window’ for my point object, then I used that to run the ppp() function. After doing this for both sets of points, I was able to use the nncross() function. This function produced min, max, average, and quartile distances from individuals to the nearest restoration site. I added the ‘distance’ from the nearest neighbor as a variable in a linear regression to determine an association for a1.

To examine a2, I used these distances (1^st quartile, median, and 3^rd quartile) to produce buffers in ArGIS Pro. After creating the buffers, I ran spatial joins between them and the restoration sites. This produced an attribute table that had a join count—the number of restoration sites within the buffer. I exported the three attribute tables from the three buffer distances back to R. In R I ran a linear regression with join count as an independent variable.

To test b1, I preformed geographically weighted regression. I used both ArcGIS Pro and R studio to run this analysis. Initially I used the GWR tool in Arc to run the regression, but wanted the flexibility of changing parameters and an easily readable output that R provides. First I joined individual points to my rank data, a shapfile at census tract level. This gave individuals a rank for environmental effects at their location. In R, I used two different functions to run GWR, gwr(), and gwr.basic(). The gwr() required creating a band using gwr.sel, and the gwr.basic required creating a band using bw.gwr. The difference between these functions is that gwr.basic produces p-values for the betas. I ran gwr on both my entire data set and a subset based on perceptions. The subset was the ‘most agreeable’ and ‘least agreeable’ individuals who I defined as those one standard deviation above and below the mean perception.

I preformed neighborhood analysis to test the final question, b2. First, I created a dataset that was just the upper and lower values of my governance perceptions (one standard deviation above and below the means. I then added buffers to these points at 1 and 5 miles. I then joined the buffers in ArcGIS Pro to the rank values to get an average rank within those buffers. I exported the files for each buffer to R. I R I created a density plot of average rank for the low governance values at each buffer, and for the high governance values at each buffer.

Results

The median distance for individuals from restoration site was 0.037 degrees, 1^st quartile was 0.020 degrees, and 3^rd quartile was 0.057 degrees.

The regression on whether distance correlates with individual perception was insignificant (p = 0.198). This led to the conclusion that distance from the nearest restoration site does not influence perceptions.

For each regression on the number of sites near an individual, all coefficients were negative. This implies that the more sites near an individual, the more disagreeable their perception was. All produced significant results, but the effect size of number of sites near individuals was very minimal (Table 1).

Table 1. Regression results for ‘nearest neighbor’ of individuals to restoration sites.

Buffer Size	b	p-value	Effect Size (r_pb)

Buffer 1 (1^st quartile)	-0.003	0.0181	.003
Buffer 2 (median)	-0.002	0.045	.002
Buffer 3 (3^rd quartile)	-0.001	0.002	.003

For the geographically weighted regression, I am presenting the results from the gwr.basic() model. For this model, I included demographic variables to control for these factors. In the model, rank, life satisfaction, years lived in the Puget Sound, and Race are significant (Table 2). All other variables were insignificant, so I will not discuss their trends. For rank (the main variable of interest), the coefficient was positive. In this case higher rank values are worse environmental effects, so as agreeable perceptions increase, environmental condition decreases. Life satisfaction is a variable of how satisfied individuals are with their life overall, which correlated positively with perception (Table 1). Years lived in the Puget Sound correlated negatively, or perception decreased the longer someone lives there (Table 1). Race, a dummy variable of white or not-white, indicated higher perceptions were held by white individuals.

Overall, the effect size of this model on governance perceptions was small, explaining about 10% of the variance in the data (Table 1).

Table 1. Regression results for environmental effects at individuals’ locations.

Variable¹	b	p-value²

Rank	0.077	0.007**
Life Satisfaction	0.478	<0.001**
Years Lived in the Puget Sound	-0.010	0.002**
Sex	-0.156	0.289
Area	-0.150	0.179
Education	-0.002	0.922
Income	-0.022	0.622
Race	0.006	0.034*
Ideology	0.103	0.533

¹R² = 0.094

²** = significant at the 0.01 level, * = significant at the 0.05 level

The plot of high and low governance values at the two buffers is presented below. The black and red curves represent respondents from the survey that were at least one standard deviation lower than the mean (the mean was neutral). The black curve is average rank with a one mile buffer, and the red curve is average rank at the five mile buffer. The green and blue curves represent respondents from the survey that were at least one standard deviation higher than the mean. The green curve is the average rank at the one mile buffer, and the blue curve is the average rank at the five mile buffer.

What this figure indicates is that there are two peaks in average rank at low environmental effects (~1) and at mid-environmental effects (~4.75). Those with lower perceptions of environmental governance had higher peaks at low environmental effects for each buffer size. Those with higher perceptions of environmental governance had a bimodal distribution with peaks at low and mid-environmental effects. The bimodal nature of these density functions leads me to believe there is some variable moderating governance perception related to environmental effects.

Critique

The methods I used were all helpful in determining the spatial relation of environmental condition to perceptions of natural resource governance. However, switching between the two programs (ArcGIS and R) was a little bit of a hassle. R has greater flexibility when running analyses, as in running is easier. This meant it was ideal for the analyses that required me to tweak my models multiple times. I also ran into little problems with Arc in terms of loading the data which made everything run slower, but Arc is more intuitive in terms of finding and executing analyses/

Does the spatial arrangement of vegetation cover influence ventenata invasion?

Question Asked

To predict the invasion potential of a species, it is necessary to understand the spatial pattern of the invasion in relation to landscape scale variables. For exercise 2, I explored how the spatial pattern of invasion by the recently introduced annual grass, Ventenata dubia (ventenata) relates to the spatial pattern of vegetation cover categories throughout the Blue Mountain Ecoregion of eastern Oregon.

Tools and Approaches Used

To unpack this question, I performed a neighborhood analysis to explore how the proportion of different vegetation type cover differ at increasing distances from plots with high versus low ventenata cover.

The neighborhood analysis required several steps performed in ArcGIS and in R:

I split my sample plot layer in ArcGIS into two layers – one containing plots with only high ventenata cover (>50%) and one containing plots with only low cover (<5%).
I buffered each plot by 10m, 50m, 100m, 200m, 400m, and 800m using the “buffer” tool in ArcGIS and then erased each buffer layer by the preceding buffer layer to create “donuts” surrounding each sample points using the “erase” tool in ArcGIS (Fig. 1).
I brought in a vegetation cover categories raster file (Simpson 2013) that overlaps with my study area and used the “tabulate area” tool in ArcGIS to calculate the total cover of each vegetation type (meadow, shrub steppe, juniper, ponderosa pine, Douglas-fir, grand-fir, hardwood forest/ riparian, and subalpine parkland) that fell within each buffer for every point. I repeated this for high and low ventenata points.
Finally, I consolidated the tables in R and created a line graph with the ggplot2 package to plot how the proportion of vegetation type differed by buffer distance from point (Fig. 2). Cover represents percent cover of each vegetation type at each buffer distance. Error bars at each distance represent standard error. VEDU refers to the plant code for Ventenata dubia (ventenata).

I was also curious to how the high and low points differed from random points in the same area. To explore this I:

Created 110 random points that followed the same selection criteria of 1000m proximity to fire perimeter used to select the ventenata sampling points.
Repeated steps 2 through 4 above to graphically represent how vegetation cover differs as a function of distance from these random points in relation to low and high ventenata points (Fig. 3).

Results & Discussion

My analysis revealed that vegetation type differs between high and low ventenata sites and random sites within the study area. The high ventenata plots were located entirely in ponderosa pine and shrub steppe vegetation types, but as distance increased from the plots, the distribution of about half of the vegetation types became more evenly distributed (Fig. 2). Ponderosa covers over 75% of the high ventenata 10m buffer areas with shrub steppe making up the remaining 25%. However, as distance increased, ponderosa cover dropped sharply to under 35% at 400m. Shrub steppe gradually declined throughout the 800m distance, and was surpassed by grand fir and Douglas fir by 800m. Meadows covered about 10% of the 50m buffer but declined to about 5% by the 400m buffer. The remaining vegetation types, juniper, riparian, and subalpine fir, were consistently under 5% cover throughout the buffer analysis.

In the low ventenata sites, shrub steppe vegetation was the most dominant, but the distribution was spread more evenly across the vegetation types than in the high ventenata sites (Fig. 2). Shrub steppe vegetation droped from 45% to 30% from the 10m to the 50m buffer, and then remained relatively constant throughout the remaining buffer distances. Like the high ventenata sites, grand-fir gradually increased in cover throughout, becoming the most dominant vegetation type of the 800m buffer. Unlike the high sites, ponderosa pine made up only about 10% of each buffer. Riparian vegetation was the only cover type that remained 0 throughout all the buffers.

In the random sites, the distributions of vegetation type were steady throughout the 800m, with only small fluctuations in cover with increasing distances (Fig. 3). Shrub steppe vegetation type was the highest at about 30% throughout, followed by juniper, ponderosa pine, and grand fir at about 20% cover.

This analysis demonstrates that ventenata could be dependent on specific vegetation types not only at the sample location, but also in the vicinity surrounding the sample area. This is evident in the high ventenata analysis where ponderosa pine cover remains much higher than the low sites and the random sites throughout the 800m buffered area. This analysis also depicts my sample bias as it demonstrates which community types I was targeting for sampling (shrub steppe and dry forest communities), which may not be representative of the area as a whole (as demonstrated in the random points analysis).

Critique of Method

The neighborhood analysis was a useful way of visualizing how vegetation type changes with distance from high and low ventenata points and may have helped uncover the importance of large areas of ponderosa pine as a driver of invasion; however, the results of the analysis could be a relic of my sampling bias towards shrub steppe and dry forest communities rather than an absolute reflection of community drivers of ventenata. The vegetation layer that I used was also not as accurate or as detailed as I would have liked to capture the nuance of the different shrub steppe and forest community types that I was attempting to differentiate in my sampling. If I were to do this again, I would try to find and use a more accurate potential vegetation layer with details on specific community attributes. Additionally, the inclusion of error bars was not possible using the “multiple ring buffer” tool in ArcGIS, so, I instead had to make each buffer distance as a separate layer and erase each individually to maintain the variation in the data. I like the idea of the random points as a sort of randomization test; however, more randomizations would make this a more robust test. With more time and more knowledge of coding in ArcGIS/ python, I would attempt a more robust randomization test.

Simpson, M. 2013. Developer of the forest vegetation zone map. Ecologist, Central Oregon Area Ecology and Forest Health Program. USDA Forest Service, Pacific Northwest Region, Bend, Oregon, USA

On the relationship between spatial variation of economic losses and tsunami momentum flux

Question

For Exercise 1, I evaluated the spatial pattern of economic losses resulting from a joint earthquake and tsunami event. I then deaggregated the losses by hazard (earthquake only, tsunami only, and combined) as well the intensity of the event.

For Exercise 2, I evaluated how the spatial pattern of economic losses resulting from a tsunami relates to the spatial pattern of tsunami momentum flux (a measure of velocity and inundation depth) by performing a geographically weighted regression (GWR). For this analysis, I only considered the tsunami because there is significant spatial variation in the hazard, whereas the spatial variation for the earthquake is minimal.

Tool and approach

I performed the GWR using the python library PySAL (Python Spatial Analysis Library). The independent variable was defined as the momentum flux, and the dependent variable defined as the percent loss of economic value.

Description of steps

The average losses at each building resulting from an earthquake/tsunami loss model were first converted to percent loss (loss divided by real market value), and added as an attribute to a GIS shapefile. The percent loss was used as opposed to the economic losses because each building has a different initial value. Consequently, the percent loss serves to normalize the economic losses across all buildings within Seaside. For this analysis, the results from the “1000-year” tsunami event were analyzed.

The GWR was then performed using PySAL with the momentum flux defined as the independent variable and the percent loss defined as the dependent variable. The GWR resulted in a slope and intercept at each tax lot, as well as a global r² value. Two separate maps were generated wherein each tax lot was color coded based on values of the slope and intercept.

Description of results

The results from the GWR and a global regression are shown in Figures 1 and 2 respectively. A global r-squared value of 0.575 was obtained, indicating that the data is moderately correlated. In Figure 1, it can be seen that the intercept is larger near to the ocean, and decreases as the distance to the shore increases. This can be explained by the fact that the momentum flux is the largest near to the coast, and decreases as the tsunami propagates over the land.

Similar trends would be expected for the slope coefficients; however, it can be seen that along the coast the results are negative indicating that the economic losses decrease as the momentum flux increases. This can likely be explained by inconsistent building types within Seaside. For example, concrete buildings are able to better withstand the impact of a tsunami compared to their wood counterparts. Similarly, buildings of different heights (number of floors) have different damage properties. Consequently, because the building types are not consistent within Seaside, significant variations in the percent of loss within a small spatial region can occur (e.g. a wood building is located next to a concrete building). This would lead to a decrease in percent loss for a larger momentum flux.

Figure 1: Spatial variation of slope and intercept resulting from the GWR

Figure 2: Global regression and line of best fit

Critique

While the GWR does provide a means to evaluate correlation between two variables that are within the same geographical region, there are limitations for this particular application. The results showed negative slopes in some locations, which is likely caused by the large variation in the percent loss. To alleviate this, alternative statistical models could be developed using GWR that only consider similar building types. An example of a non-spatial regression for wood buildings with 2 and 3 floors can be seen in Figure 3. The improvement in r-squared values can be observed, and would likely translate to the GWR.

Figure 3: Example of global regression considering specific building types

Fire Refugia’s Effects on Clustering of Infected and Uninfected Western Hemlock Trees

Overview

For Exercise 1, I wanted to know about the spatial pattern of western hemlock trees infected with western hemlock dwarf mistletoe. I used a hotspot analysis to determine where clusters of infected and uninfected trees were in my 2.2 ha study area (Map 1). I discovered a hot spot and a cold spot, indicating two clusters, one of high values (infected) and one of low values (uninfected).

In my study site, 2 fires burned. Once in 1829, burning most of the stand, and then again in 1892, burning everywhere except the fire refugia (polygons filled in blue). This created a multi-storied forest with remnant trees located in the fire refugias. One component of the remnant forest are infected western hemlocks. These remnant hemlocks serve as the source of inoculum for the hemlocks regenerating after the 1892 fire.

For Exercise 2, my research question was: How does the spatial pattern of fire refugia affect the spatial pattern of western hemlock dwarf mistletoe?

I predicted that a cluster of infected western hemlocks are more likely to be next to a fire refugia than a cluster of uninfected trees. In order to assess this relationship, I used the geographically weighted regression tool in ArcMap.

Geographically Weighted Regression

Geographically weight regression (GWR) works by creating a local regression equation for each feature in a data set you want to analyze, using an explanatory variable(s) to predict values for the response variable, using the least squares method. The Ordinary Least Squares (OLS) tool differs from GWR because OLS creates a global regression model (one model for all features) whereas GWR creates local models (one model per feature) to account for the spatial relationship of the features to each other. Because the method of least squares is still used, assumptions should still be met for statistically rigorous testing. The output of the GWR tool is a feature class of the same type as the input, with a variety of attributes for each feature. These attributes summarize the ability of the local regression model to predict the actual observed value at that feature’s location. If you have an explanatory variable that explains a significant amount of the variation of the response variable, this is useful for seeing how its coefficient varies spatially.

Execution of GWR

To use this tool, I quantified the relationship between the trees and the fire refugia. I used the “Near” tool for this to calculate the nearest distance to a fire refugia polygon’s edge. This was my explanatory variable. My response variable was the z-score that was output for each tree from the Optimized Hot Spot Analysis. Then I ran the GWR tool. I then used the Moran’s I tool to check for spatial autocorrelation of the residuals. This is to check the clustering of residuals. Clustering indicates I may have left out a key explanatory variable. The figure below displays my process.

I tested the relationship between nearest distance to a fire refugia polygon’s edge and the z-score that was output for each tree from the Optimized Hot Spot Analysis using OLS, which is necessary to develop a well specified model. My R² value for this global model was 0.005, which is incredibly small. Normally I would have stopped here and sought out other variables to explain this pattern, but for this exercise I continued the process.

Results

This GWR produced a high global R²value of 0.98 (Adj R² 0.98) indicating that distance to refugia does a good job of explaining variance in the spatial pattern of infected and uninfected trees. However, examining the other metrics for the local model performance gives a different picture of model performance.

Map 2 displays results for the coefficients for the explanatory variable of distance to nearest refugia. As this variable changes, the z-score increases or decreases. These changes in z-scores indicate a clustering of high or low values. From examining the range of coefficient values, the range is quite small, -0.513 to 0.953. This means that across my study site, the coefficient only changes slightly from positive to negative. In the north western corner, we see a cluster of positive coefficient values. Here, as distance to refugia increases, the z-score of trees increases, predicting a clustering of infected trees. These values are associated with high local R² values (Map 4). In other places of the stand we see slight clustering of negative coefficients, indicating distance to refugia decreases the z-score of trees, predicting a clustering of uninfected trees.

Map 3 displays the standardized residuals for each tree. Blue values indicate where the local model over-predicted what the actual observed value was, and red values are under-predictions. When residuals from the local regression models are distributed randomly (i.e. not clustered or dispersed) over the study area, then the geographically weighted regression model is fit well, or well specified. The residuals of the local regression models were significantly clustered. (Moran’s Index of 0.265, p-value of 0.000, z-score of 24.344). Because we can observe clustering in my study area of residuals, there is another phenomenon driving the changes in z-scores; in other words, driving the clustering of infected and uninfected trees.

From the previous two map evaluations I saw that the distance of a tree to fire refugia was not the only explanatory variable necessary to explain why infected and uninfected trees clustered. Map 4 displays the local R² values for each feature. The areas in red are high local R²values. We see the northwestern corner has a large number of large values which correspond to a cluster of small residuals and positive coefficients. Here, distance to fire refugia explains the clustering of infected trees well. The reverse is observed in several other places (clusters of blue) where distance to fire refugia does not explain why infected or uninfected trees cluster. In fact the majority of observations had a local R² of 0.4 or less. From this evaluation, I believe this GWR model using distance to refugia does a good job of explaining the clustering of infected trees, but not much else.

Critique

GWR is useful for determining how the coefficient of an explanatory variable can change across an area. One feature in a specified area may have a slightly different coefficient from another feature, indicating these two features are experiencing different conditions in space. This allows the user to make decisions about where the explanatory has the most positive or negative impact. This result is not something you can derive from a simple OLS global model. This local regression process is something you could do manually but the tool in ArcMap makes this process easy. The output of GWR is also easy to interpret visually.

Some drawbacks are that you need to run the OLS model first for your data to determine which variables are significant in determining your response variable. If not, then a poorly specified model can lead to inappropriate conclusions about the explanatory variable (i.e. high R² values). Also, the evaluation of how the features interact in space is not totally clear. The features are evaluated within a fixed distance or number of neighbors, but there is no description for how weights are applied to each neighboring feature. Lastly, for incidence data, this tool is much harder to use if you want to determine what is driving the spatial pattern of your incidence data. Some other continuous metric (in my case a z-score) must be used as the response variable, making results harder to interpret.

Model Results Follow-Up

After finding that distance to a refugia was not a significant driver for the majority of trees, I examined my data for other spatial relationships. After a hotspot analysis on solely the infected trees, I found that the dispersal of infected trees slightly lined up with the fire refugia drawn on the map (Map 5).

Among other measures, forest structure was used to determine where fire refugia were located. Old forest structure is typically more diverse vertically and less clustered spatially. Also infected western hemlocks are good indicators of fire refugia boundaries because as a fire sensitive tree species, they would not survive most fire damage and the presence of dwarf mistletoe indicates they have been present on the landscape for a while. From the map we can see that the dispersal of infected trees only lines up with the refugia in a few places. This mis-drawing of fire refguia bounds may be a potential explanation for under-performance of the GWR model.

Courtney’s EX2: Comparing faults and principal components

Question asked in this exercise:

How does ion principal component 2 at a well vary with the well’s distance from faults along the groundwater flow path?

In EX1, I used principal component analysis to evaluate how parameters accounted for variance between the wells I studied. Based on my knowledge of how chemistry varies as water flows through the basalt, ion PC2 accounts for variance caused by ion exchange between the basalt and the groundwater, with increasing sodium ion concentration/pH and decreasing calcium/magnesium ion concentrations as the water spends more time underground.

In this exercise, I estimated the groundwater flow directions in my study area using interpolation, calculated fault incidence direction, calculated angular difference between flow direction and fault direction, and then manual measurement of the distance between each well and the distance to the nearest fault segment that had flow direction within 45 degrees of parallel to the fault along the estimated flow path.

Name of tool or approach used:

Interpolation, reclassification, raster math, distance measurement in ArcGIS Pro

Methods:

Input data:

2018 static water levels in wells provided by Oregon Water Resource Department (OWRD)
Well lithology from OWRD groundwater information system (GWIS)
Well seal depth from well logs accessed through OWRD GWIS
Fault polyline shapefile from Madin and Geitgey 2017
Well locations from OWRD database, with ion concentration information based on my sampling in the summer of 2018

Steps:

Classified wells in the static water level dataset by the basalt formation that they were open to, based on lithology and seal depth. Excluded wells that lacked this information.
1. Output: wells classified as open to Saddle Mountain Basalt (Smb) Wanapum Basalt (Wb), Grande Ronde Basalt (Grb), or both Wanapum and Grande Ronde Basalt (WbGrb). These classifications were joined to the static water level information.
Created an interpolation surface for static water levels of wells open to both the Wanapum and Grande Ronde. I ignored the Saddle Mountain formation, since the wells that I had sampled were not open to it. I used kriging with a cell size of 200 ft, and this created an estimated potentiometric surface for these wells. The interpolation was a bit ugly because my wells were not ideally distributed for this.
1. I tried creating interpolations based on other combinations of formations to better approximate the potentiometric surfaces posited by past studies in the region. I ended up creating two potentiometric surfaces: one using wells that were only open to the Wanapum Basalt (Wb), and a second using wells that were open only the Grande Ronde Basalt as well as those that were open to both the Grande Ronde Basalt and Wanapum Basalt(WbGrb_Grbonly).
Calculated flow direction in the two aquifer groups – Wb and WbGrb_Grbonly
1. Used the Hydrology toolbox to fill sinks and then calculate flow direction.
2. This creates out a raster with eight possible values between 1 and 256.
3. I then reclassified it so the values corresponded to the eight primary cardinal directions (N, NE, E, SE, S, SW, W, NW), which range from 0 to 360 degrees
Calculated fault incidence angle
1. Added a cell to the polyline attribute table called “angle”
2. Split the polyline at each vertex, which creates a new shapefile
3. Used the field calculator and a python script to assign an angular value between 0 and 359 to each fault polyline segment
4. Performed the polyline to raster function, using the angle as the cell value. I used a cell size of 200 ft.
5. This created a raster where only pixels that include part of a fault polyline had values, and those values ranged between 0 and 359.
Subtracted the flow direction raster from the fault incidence angle raster, in order to create fault line pixels that had values that reflected the difference between the fault incidence angle and flow direction. I did this twice, once each for the Wb and WbGrb_Grbonly rasters
1. Reclassified this raster so that pixels with fault direction within 45 degrees of parallel to the flow direction were 0, and the pixels with fault direction with 45 degrees of perpendicular to the flow direction were 1.
2. I then added the WB and WbGrb_Grbonly results together, so that pixels with fault direction within 45 degrees of parallel to the flow direction in both were 0, and the pixels with fault direction within 45 degrees of perpendicular in either raster to the flow direction were 1, and pixels with fault direction within 45 degrees of perpendicular in both rasters were 2. I named this allwbgrb_ff.
On a map layout, I added my sampled sites with PCA data, the interpolated potentiometric surface for wells open to the Wanapum and Grande Ronde aquifers, and allwbgrb_ff.
1. I added a field to the sampled sites with PCA data called “dist_from_fault”
2. Using the measure tool, I measure the distance from each well along the path most perpendicular to potentiometric contours to the nearest allwbgrb_ff pixel with a value of 1 or 2.
3. This was subjective because the potentiometric surface is imperfect because of the erratic spacing of wells. In areas where the potentiometric surface had noticeable glitches, I used my own judgement based on topography and literature about groundwater flow direction in the region.
I then graphed the “dist_from_fault” against the ionsPC2 category.

Discussion and Results:

I hypothesized that ion PC2 would increase with decreasing potentiometric surface elevation. An increased score in ionPC2 indicates an elevation sodium concentrations and pH and a decrease in calcium and magnesium caused by a progressive ion exchange reaction between the groundwater and the basalt. Because water flows from higher potentiometric elevations to lower potentiometric elevations, I would expect water samples from lower potentiometric elevations to show chemical evidence of increased interaction with the basalt. If this process of down-gradient groundwater flow were the only process influencing the ion exchange reactions, the well symbols on the map below would become progressively darker as the interpolated well level surface elevation decreased and the wells were further from the up-gradient recharge zone.

However, upon examining a map of ion PC2 values this is not the case – there are anomalously low values of PC2 in the valley, where one would expect to see increased values if groundwater flowed uninterrupted from the up-gradient recharge zones. In this study I introduced the variable of distance from faults in order to test another hypothesis: that faults compartmentalized groundwater flow, blocking lateral flow through the aquifer while promoting vertical permeability and modern recharge into the down-gradient aquifer. I also hypotehsized that if a fault was a barrier, PC2 values up-gradient of the fault would be elevated as the fault trapped water behind it. This would result in more chemically evolved groundwater backed up behind the fault, and less chemically evolved groundwater down-gradient of the fault.

The results of this study tentatively support the conceptual model of fault compartmentalization. In particular, water samples from wells in the valley down-gradient of the fault zones have evidence of less exposure to the basalt than wells further towards the recharge zones. 15 of the 18 wells sampled show a positive correlation between distance from a fault along a flow path and ion PC2 score, especially when graphed points are compared to their up-gradient and down-gradient neighbors (i.e. 57946 being up-gradient from 57235 and down-gradient from 54277).

Because of the width of the raster cells indicating faults and their flow direction in this model, four wells ended up have a distance from faults of 0. This does not seem unreasonable, because examination of exposed fault zones in the area indicated that many are up to a couple hundred meters wide. Additionally, while geological studies of the area indicate that the faults are close to vertical, their exact dip angles are unknown and this introduces a certain amount of uncertainty about their location at depth. Ion PC2 values of wells with dist_from_fault = 0 show an interesting dichotomy of either very high (4167, 4179) or very low (3929, 3962) values. I believe this indicates that wells 4167 and 4179 are up-gradient of faults that are acting as barrier, while 3929 and 3962 are slightly down-gradient or within the fault damage zone itself.

map of fault locations, potentiometric surface, and wells, plus a chart of distance vs ion PC2

Critique of Method:

This is admittedly a crude method of estimating groundwater flow direction. However, a certain amount of estimation is necessary to model a system with relatively few and unevenly spaced measured water level points in a hydrogeologic regime with many individual water-bearing layers of lava interflow zones that are unpredictably connected. I wish I had been able to find an automated way to measure well distance from faults along the groundwater flow path, which would have taken away some of the subjectivity of measuring the flow paths by hand the old-fashioned way.

Because of the uneven spacing of wells used for interpolation, the flow direction rasters that I created have glitchy areas and some of these areas of physically improbable values influenced the Fault and GW Flow Direction raster. I reclassified that raster into broad categories to avoid creating a false sense of precision.

Determining suitable habitat for Olympia oysters in Yaquina Bay, OR

Exercise #1

Question that you asked:
My goal for my thesis work is to evaluate the distribution of native Olympia oysters in Yaquina Bay, Oregon by assessing habitat suitability through spatial analysis of three habitat parameters: salinity, substrate availability, and elevation. A map of predicted suitable habitat as a result of the spatial analysis will be compared with field observations of oyster locations within Yaquina Bay. The main research question I am examining in this project is:

How is the spatial pattern of three habitat parameters (salinity, substrate, elevation) [A] related to the spatial pattern of Olympia oysters in the Yaquina estuary [B] over time [C]?

For this blog post, I will be evaluating the [A] portion of this question and the three habitat parameters simultaneously to identify where habitat is least suitable to most suitable. To better understand the spatial pattern of the habitat parameters, I am evaluating a raster layer for each parameter, then combining them to determine where overlap between the layers shows the best environmental conditions for oysters to survive.

Name of the tool or approach that you used:
For this portion of my research analysis, I wanted to be able to make an educated guess about where the best and worst habitat for Olympia oysters would be located within Yaquina Bay by ranking different subcategories within each of the salinity, substrate, and elevation datasets.

To do this, I started by looking through the available literature on the subject and consulting with shellfish biologists to get an idea of what conditions oysters prefer in order to apply a ranking value. The following table is a compilation of that information:

Habitat parameter	Subcategories	Subcategory variable range	Olympia oyster tolerance	Rank value applied
Mean wet-season salinity (psu)	Upper estuary	< 16 psu	somewhat, but not long-term	1
	Upper mid estuary	16.1 – 23 psu	X	4
	Lower mid estuary	23.1 – 27 psu	X	3
	Lower estuary	> 27 psu	somewhat	2

Substrate availability	1.2	Unconsolidated mineral substrate	possible	3
	1.2.1.3.3	Gravelly mud	unlikely	2
	1.2.2.4	Sandy mud	no	1
	2	Biogenic substrate	yes	4
	3	Anthropogenic substrate	yes	4
	3.1	Anthropogenic rock	yes	4
	3.1.2	Anthropogenic rock rubble	unlikely	2
	3.1.3	Anthropogenic rock hash	no	1
	9.9.9.9.9	Unclassified	uncertain	—

Bathymetric depth (compared to MLLW)	1.5 – 2.5m	supratidal	no	1
	1 – 1.5m	supratidal	no	1
	0.5 – 1m	intertidal	maybe	2
	0 – 0.5m	intertidal	yes	3
	-2 – 0m	intertidal	yes	4
	-3 – -2m	subtidal	yes	4
	-4 – -3m	subtidal	yes	4
	-6 – -4m	subtidal	yes	4
	-8 – -6m	subtidal	yes	3
	-12.5 – -8m	subtidal	yes	3

Once I established my own ranking values, I decided to use the ‘weighted overlay’ function, found within the Spatial Analyst toolbox in ArcGIS Pro. Weighted overlay applies a numeric rank to values within the raster inputs on a scale that the ArcGIS user is able to set. For example, on a scale from 1-9 ranking 1 as areas of least fit and 9 as areas of best fit. This can be used to determine the most appropriate site or location for a desired product or phenomenon. I used the ranking value scale 1-4 where 1 indicates the lowest suitability of subcategories for that parameter and 4 indicates the highest suitability.

Brief description of steps you followed to complete the analysis:

To apply the weighted overlay function:

Open the appropriate raster layers for analysis in ArcGIS Pro. Weighted overlay will only work with a raster input, specifically integer raster data. Here, I pulled all three of my habitat parameter raster layers from my geodatabase into the Contents pane and made each one visible in turn as I applied the weighted overlay function.
In the Geoprocessing pane, type ‘weighted overlay’ into the search box. Weighted overlay can also be found in the Spatial Analyst toolbox.
Once in the weighted overlay window within the Geoprocessing pane, determine the appropriate scale or ranking values for the analysis. I used a scale from 1-4, where 1 was low suitability and 4 was high suitability.
Add raster layers for analysis by selecting them from your geodatabase and adding them into the window at the top left. To add more than one raster, click ‘Add raster’ at the bottom of the window.
Select one of the raster inputs and see the subcategories for that raster appear on the upper right. Here, ranking values within the predetermined can be individually applied to the subcategories by clicking from a drop-down list. Do this for each subcategory within each raster input. I ranked each subcategory within each of my habitat rasters according to the ranks listed on the table above.
Determine the weights of each raster input. The weights must add up to 100, but can be manipulated according to the needs of the analysis. A raster input can be given greater or lesser influence if that information is known. For my analysis, I made all three of my habitat raster inputs nearly equal weight (two inputs were assigned a weight of 33, one was weighted 34 to equal 100 total).
Finally, run the tool and assuming no errors, an output raster will appear in the Contents pane and in the map window.

Brief description of results you obtained:

The first three images show each habitat parameter weighted by suitability, with green indicating most suitable and red indicating least suitable.

Salinity —

Bathymetry —

Substrate —

The results of the final weighted overlay show that the oysters are most likely to be in the mid estuary where salinity, bathymetry, and substrate are appropriate.

Critique of the method – what was useful, what was not?:

The weighted overlay was a simple approach to combining all of the raster layers for habitat and creating something spatially meaningful for my research analysis. The areas indicated in green in the resulting map generally reinforce what was found in the literature and predicted by local shellfish biologists. While the weighted overlay tool did generate a useful visual, it is highly dependent on the quality of the raster inputs. In my analysis, the detailed resolution of the bathymetry layer was very helpful, but the substrate layer is a more generalized assessment of sediment types within Yaquina Bay. It doesn’t show the nuances of substrate availability that might be important for finding exactly where an opportunistic species like Olympia oysters might actually have settled. For example, in Coos Bay Olympia oysters have been found attached to shopping carts that have been dumped. The substrate raster is a generalized layer that uses standardized subcategories and does not pinpoint such small features.

Additionally, the salinity layer is an average of wet-season salinity, but it can change dramatically throughout the year. Some in situ measurements from Yaquina Bay this April showed that the surface salinity with the subcategory range of 16-23 psu were actually <10 psu. While it is more reasonable to generalize salinity for the purposes of this analysis, it is important to note that the oysters are exposed to a greater range over time.

This spatial information serves as a prediction of suitable oyster habitat. The next step is to compare this predicted suitability to actual field observations. I’ve recently completed my first round of field surveys and will be analyzing how closely the observations align with the prediction in Exercise #2.

Spatial Pattern of NDVI change from the Center of an Artisanal Gold Mine

For Exercise 2, I wanted to explore how the spatial distribution of land use/land cover (LULC) varied in relation to an artisanal, small-scale gold mine (ASGM). To do this, I took the NDVI change maps which I had generated for Exercise 1 (modified slightly to produce better, more accurate results), as well as my dataset of mapped footprints of known areas of ASGM activity, found the median center of those areas, generated buffers/donuts at 250m apart, from 250m to 2000m, clipped the NDVI change layer to each buffer, and counted the amount of each type of NDVI pixel contained within each buffer. In addition, I performed this same analysis around non-mining villages, to examine how the spatial pattern of NDVI loss changes with distance from the center of non-mining villages. With this data, I could generate a chart showing how the percentage of pixels representing loss or decrease in NDVI changed as you moved further away from the center of mining activity. This examination looked at nine mining villages and nine non-mining villages

In order to perform this analysis, I continued my work using ArcGIS Pro, in addition to Excel for a bit of charting work.

To begin, I imported my NDVI change map, which detailed increases and decreases in NDVI values between imagery taken in April 2018 with Landsat 8, and imagery taken in April 2007 with Landsat 5, representing 11 years of NDVI change. I also imported my shapefile containing polygons which I had digitized over my high-resolution satellite imagery depicting areas of known ASGM activity. With this shapefile, I used ArcGIS’s Median Center tool, which found the median center of the mining areas near the village of Douta (fig. 1). From there, I generated buffers/rings at 250m intervals (e.g. 0-250m, 251-500m, 501-750m, etc.), from 250m to 2000m around this median center. I then used the clip tool to clip the overall NDVI change layer to each buffer, resulting in NDVI change for 0m to 250m from center, NDVI change for 251m to 500m from center, and so on.

Fig 1: Center of the village of Douta with a 2km NDVI buffer; the blue buffer shows the 1750m extent, whose values were subtracted from the 2000m values, to only give values between 1751 and 2000m

Once I had accomplished this, I generated histograms for each buffered NDVI change layer in order to count the amount of pixels contained within each buffer, and assign them to one of two classes: negative values, representing overall NDVI decrease from 2007 to 2018, and positive values, representing overall NDVI increase from 2007 to 2018. I did not account for magnitude of change, as I wanted a general idea of how NDVI was changing from the center. Fig. 2 shows an example of these histograms, specifically for the 500m buffer. The values from the previous buffers, e.g. the one to 250m, were subtracted to only show values from 251 to 500m.

Fig. 2: Douta histogram

I entered all the pixel values for negative change at each distance into Excel, as well as all of the pixel values for positive change, and was able to generate a chart showing how the percentage of the overall pixels in each subsequent buffer from center representing NDVI decrease change over distance (fig. 3)

Fig. 3: Pale blue lines indicate individual mining villages. The dark blue line indicates the average of those villages. Thin red lines indicate non-mining villages. The dark red line indicates the average for those villages.

On the whole, this exercise was useful for illustrating the problem I’m attempting to grapple with. I was frustrated with the lack of Landsat imagery from the late 2000s — I was unable to find any Landsat 5 imagery corrected for surface reflection aside from the year 2007. Additionally, there are problems with comparing this 2007 image to the 2018 image. I found that the rainy season before the 2018 image was taken, in 2017, was wetter than average, while I was unable to determine the rain fall that preceded the image in 2007. As such, it is possible that the 2018 Landsat 8 image shows a non-normal vegetative pattern — or it’s even possible that the 2007 image is showing a non-normal vegetative pattern! I require some more investigation into the historical meteorology of the area before I can say either way. Regardless, I feel that this is a useful first step in investing how LULC change relates to the establishment of ASGM.

What is the spatial pattern of the ground cover vegetation within 100 m, 500 m and 1 km of 20 wetland sites?

Exercise 1: What is the spatial pattern of the ground cover vegetation within 100 m, 500 m and 1 km of 20 wetland sites?

To determine these spatial patterns I first imported the latitudinal and longitudinal locations of my wetland sites as a point layer by adding a csv file with this data from the catalog to my map and choosing the Display X, Y Data option after right clicking on the imported file.
Display X, Y Data: X= longitude
Y= latitude

I then created 3 buffers around each point using the Buffer tool in spatial analyst. The buffers created had a 100m radius, a 500m radius, and a 1km radius.
Buffer: Input vector= wetland site points
Buffer= 100 (then 500, then 1000)
Unit= Meters

I used Clip in raster processing to clip my raster layer to the extent of my largest buffer layer so that the computer did not have to process so much extra raster data moving forward.
Clip: Input= raster layers
Clip to= 1kmBufferLayer

Then I created an NDVI raster layer from Landsat data obtained by the USGS Earth Explorer. I downloaded and then imported data from 1995 and 2019. The LIDAR data from 2019 was collected via the LandSat 8 satellite which collects, among other things, Near Infra-Red (NIR) and red wavelength reflectance. NIR is categorized as Band 5 and Red as Band 4. To create an NDVI layer from this data I had to rescale the Band 5 and Band 4 raster layers because the USGS self-correcting algorithm overcorrected for atmospheric disturbances. To do this I used to Raster Calculator in spatial management. This resulted in 2 new raster layers within the acceptable range (0-1000) for Band 5 and Band 4 of the 2019 LIDAR data. I did not have to rescale the 1995 data because there was no self-correcting algorithm used in that data.
Raster Calculator: Query= [(grid – Min value from grid) * (Max scale value – Min scale
value) / (Max value from grid – Min value from grid)] + Min scale value

I used the Raster Calculator again to create the final NDVI layer for 1995 and one for 2019 using the NIR and red wavelength data layers. The 1995 data was collected via LandSat 4 which categorizes NIR wavelengths as Band 4 and red wavelengths as Band 3.
Raster Calculator: Query= ((“NIR_Layer”-“red_Layer”)/(“NIR_Layer” + “red_Layer”))

2019 NDVI result below

I then used the Raster to Point tool to turn both the 1995 NDVI and 2019 NDVI data into a polygon layer.

Raster to Point: NDVI layers to point

Finally, then I used the Clip in spatial analyst tool to clip the polygon NDVI layers to the buffer layers I created earlier.
Clip: Clip “NDVI layers” to “Buffer layers”

Now I have the land cover type (in the form of a number signifying vegetation density) information for all of my sites within all of my buffer zones.

Ripley’s K analysis on two forested stands

Bryan Begay

Question asked

Can a point pattern analysis help describe the relationship between the spatial pattern of trees in my area of interest with forest aesthetics? More specifically, how does Ripley’s K function describe forest aesthetics in different parts of the forest on the McDonald-Dunn Forest.

Tool

The main tool that I used for the point pattern analysis was the Ripley’s K function.

Steps taken for analysis

My workflow involved doing preprocessing of the LiDAR data, then creating a canopy height model to obtain an individual tree segmentation. The individual tree segmentation would then allow me to extract tree points with coordinates that could be usable points for the Ripley’s K Function.

LiDAR preprocessing

I started off with finding my harvest unit area of interest (Saddleback stand) and finding a nearby riparian stand that would be used to compare the Ripley’s K function outputs. I create polygons to clip the LiDAR point clouds onto. I found the LiDAR files that were over the AOIs and used the ArcMap Create LAS Dataset (Data Management) to make workable files, then clipped the data sets to the polygons using the Extract LAS (3D analyst) tool. Fusion was used to merge the clipped LiDAR tiles to make one continuous data set for both AOIs. Then I normalized the point cloud with FUSION by using a 2008 DEM raster from DOGAMI, and the FUSION tools ASCII2DTM and Clipdata.

CHM, tree segmentation, and Ripley’s K

With the normalized point cloud, a canopy height model (CHM) was created in R-studio, and then an individual tree segmentation was made with an R package called lidR by using a watershed algorithm. The segmented trees were exported as a polygon shapefile that could be used in ArcMap. The Feature to Point tool (Data Management) was used to calculate the centroid of the polygons to identify individual trees as points. The points could then be used in RStudio spatstat package to be used in a Ripley’s K Function. The function was calculated for both Saddleback stand and a nearby riparian area.

Results

The results show that the pattern for both the Saddleback stand and the riparian area were clustered. Both stands observed lines were plotted above the expected line for a random spatial pattern. The lines were significantly different, being above the higher confidence envelope. The riparian stand has higher levels of clustering compared to Saddleback stand. The Saddleback stand showed a plotted clustering pattern as well, but not to the degree of the riparian stand.

Critiques

Some critiques for my analysis would be to use a more robust individual tree segmentation algorithm analysis. For the sake of processing speed and creating delineated polygons with reduced noise, I used a resolution of 1 meter for my CHM. The 1 meter resolution for my CHM smoothed over the tree segmentation, possibly removing potential tree polygons but creating more defined segmented trees. The CHM lower resolution was used with a relatively simple watershed algorithm. Past algorithms I’ve used showed better results than watershed but required more detailed inputs. Another criticism I have is that using the feature to point does not necessarily give me the tree tops, but finds the centers of polygons that the tree segmentation identified as individual trees. Finding a more robust method for determining tree points would be more preferable.

Recession Analysis in Two Coastal Basins

Research Question

The question that I asked for this exercise was: What is the spatial variability of the recession timescale at different flow rates in rain-dominated, coastal basins?

Approach

Stream flow regimes are generally defined by five components: magnitude, frequency, duration, timing, and rate of change (Poff et al., 1997). For the purposes of this exercise, I am investigating the rate of change (or “flashiness”) component of flow regime in two river systems in the Oregon Coast Range: the Nehalem and Alsea River watersheds. I am analyzing recession behavior in these two systems to quantify a rate of change metric. I used the recession curve method, largely developed by Brutsaert and Nieber (1977), and later built upon by Krakauer et al. (2011) among many others. Recession curves describe the rate at which streamflow recedes in various streamflow conditions. In more general terms, recession curves provide an indication of watershed storage and groundwater behavior.

Methods

To complete the recession analysis in the Nehalem and Alsea watersheds, I followed the following steps:

Downloaded streamflow data for the available period of record in 1-hour observation intervals using the dataRetrieval and EGRET (both developed by the USGS) packages in R.
The data were in cubic-feet-per-second (cfs) units, which I converted to unit discharge (mm/hour) using respective basin areas.
For recession analyses, only data points with insignificant precipitation are viable. Therefore, all time steps when precipitation was greater than 10% of total streamflow were removed. To do this, local precipitation data was necessary, however it is often hard to come by. As a result, the precipitation data sources for the Nehalem and the Alsea are different and the methods diverge slightly hereafter.
1. Nehalem precipitation data was sourced from the USGS Vernonia site, which is upstream from the mainstem river gauging site where streamflow data for this analysis was sampled. For the purposed of this exercise, I assumed that rainfall (measured in inched at 30-minute intervals) was spatially consistent across the basin. For a more precise estimate of precipitation, multiple, spatially distributed rain gages would be needed.
2. Alsea precipitation data in hourly timesteps was not readily available. There are a couple NOAA rain gages in the vicinity, but the data appeared to be spotty. As a result, I used daily precipitation data from PRISM to identify days in which total daily precipitation was greater than 10% of total daily streamflow. Next, I used the subset out the identified dates from the hourly streamflow data.
Calculating rate of change: For this component of the analysis, I only wanted data from the receding hydrograph. I used the equation to estimate hourly -. Hourly streamflow (Q) for the corresponding hours was estimated as .
(or ) is the rate at which flow jumps values from one time-step to the next at a given flowrate. Because this is a recession analysis, I only wanted data that was on the receding slope and therefore subset the data to time-steps when was negative.
Lastly, I developed a simple linear regression model for vs Q for each watershed using the following equation:

Results

Alsea

When log-transformed and plotted, the discharge and rate of change of discharge follow a linear relationship, however the relationship is different for data analyzed at different temporal resolutions (Figures 1 a-b). The recession results from the daily timestep are likely muted because recession processed occur on shorter timescales, on the order of minutes to hours.

Figures 1 a-b. On the left, recession curve results for the Alsea River on a daily time step, after removing days with high precipitation. On the right, recession curve results for the river on an hourly time step, after days with high precipitation were removed. Both a and b include only recession data.

Table 1. Table showing different coefficient values for different temporal resolutions of Alsea recession data.

Data	a coefficient	b coefficient
Alsea (daily)	-3.029	1.639
Alsea (hourly)	-5.2346	0.9138

Nehalem

Recession analysis on year-round, hourly Nehalem River data resulted in an a coefficient value of -4.696 and a b coefficient value of 1.049, which are similar values for the Alsea hourly results. However, the Nehalem recession data is showing two linear trends in the plotted data (Figure 2). The two trendlines remained after controlling for both diurnal flux and season (Figure 3).

Figure 2. Recession results from the Nehalem River data. The red line is the calculated linear regression model, however two lines may be more representative of the recession behavior, such as those estimated in blue.

Figure 3. Recession data separated into seasons and only including receding slopes, night time hours, insignificant precipitation. The two trendlines are apparently less distinguished in certain seasons of the year.

Critique of Method

Recession analysis is a method that I will use in my research, and this was a helpful exercise in that it helped me understand the nuances of recession data analysis, including data acquisition and availability, and opportunities for improvement. Moving forward, this analysis would benefit from: more spatially accurate precipitation data, more investigation of the explanations for the observed patterns and trends, and more sophisticated statistical comparison between sites.

References:

PRISM Climate Group, Oregon State University, http://prism.oregonstate.edu, created 4 Feb 2004.

Krakauer, N. Y., & Temimi, M. (2011). Stream recession curves and storage variability in small watersheds. Hydrology and Earth System Sciences, 15(7), 2377–2389. https://doi.org/10.5194/hess-15-2377-2011

Sawaske, S. R., & Freyberg, D. L. (2014). An analysis of trends in baseflow recession and low-flows in rain-dominated coastal streams of the pacific coast. Journal of Hydrology, 519, 599–610. https://doi.org/10.1016/j.jhydrol.2014.07.046

Brutsaert, W., & Nieber, J. L. (1977). Regionalized drought flow hydrographs from a mature glaciated plateau. Water Resources Research, 13(3), 637–643. https://doi.org/10.1029/WR013i003p00637

Poff, N. L., Allan, J. D., Bain, M. B., Karr, J. R., Prestegaard, K. L., Richter, B. D., … Stromberg, J. C. (1997). The Natural Flow Regime. BioScience, 47(11), 769–784. https://doi.org/10.2307/1313099

Leaf spot patch analysis caused by Blackleg

Review
My research involves the classification of a leaf spots on turnip derived from the pathogen blackleg. I had hypothesized that spatial patterns of pixels based on image classification are related to manually classified spatial patterns of observed disease on turnip leaves because disease has a characteristic spectral signature of infection on the leaves. This post focuses on the analysis of the clusters based on image classification through segmentation as well as the manually classified clusters. These clusters of pixels are expected to be representative of the diseased patches on the leaves. Here we seek to obtain some patch statistics which will be compared for relationships and accuracy at a later time. A large portion of this process went into image processing before the analysis could be conducted. All of the image processing took place in ArcGIS Pro. The next step involved the patch analysis which was conducted in Fragstats.

Questions
Some questions I asked myself about element A & B of my hypothesis included:

Can diseased patches be successfully identified and isolated by computer image classification and through manual classification?

If yes; what is the area and perimeter of the diseased patches based on image classification and manually classified diseased patches? I was mostly looking to obtain and gain some experience in the patch analysis provided by Fragstats. Much more thorough analysis can be and will be completed in Fragstats when variable A & B are compared for accuracy assessment in exercise two.

Tools used and methodology
The image processing and classification of pixels was conducted in ArcGIS Pro. This analysis required extracting both manually classified diseased patches and computer classified patches. I began by uploading band 3, which captures light energy in the red wavelength at 670 – 675 nm.

1. For computer classification of the image I used Segmentation.

a) I went to the Imagery tab and the Image Classification group. Click the drop-down arrow and an option for Segmentation should appear. Be sure the layer you desire to perform the segmentation on was selected prior to selecting the Segmentation Classification Tools. In the pane you have options to adjust Spectral detail, Spatial detail and Minimum segment size in pixels. There are variable and depend on image resolution, level of accuracy you wish to achieve, etc. Because my resolution to high I chose for higher spectral detail and spatial detail. I adjusted the spectral detail from 15.50 to 17 and the spatial detail from 15 to 17 as well. For the minimum segment size in pixels I chose 10. As mentioned these values are variable and although the standard values worked well for segmentation, I previewed other values and achieved greater results by adjusting. After previewing other options and deciding what values work best, click the Run button in the bottom right corner of the pane.

b) Masking all the cells that weren’t diseased was the next step and crucial to this diseased region selection process. To do this, go to the Analysis tab and the Raster group to find the Raster Functions. Open the Raster Functions and it should appear in the pane to the right. Search Mask and select it. Open the segmented image in the raster box and three options will be available for Included Ranges. These three options are representative of the blue, green and red bands. Because the data was adjusted from 12-bit to 8-bit in the segmentation process, we should have a range of values between 1 and 255. Despite having three band options, all the bands hold the same value because we are only working with the red band. Click on the segments you wish to include in the working window to check their values. Include a minimum and maximum which should be the same for all 1, 2 and 3 that includes the segments you want. The range I had included was 100 to 160, which is the RGB.Pixel Values that appear when clicking a pixel. Once you have these entered, click create new layer in the bottom of the right pane.

c) Next, I used the Clip function by navigating to the Imagery tab, the Analysis group and selecting Raster Functions. In the window pane to the left a search bar can be found along the top where you can enter Clip. Although there are other methods and locations the clip function can be found, I experienced different results depending on how I navigated to clipping, as well different clipping options in the window pane. In the Parameters section for Clip Properties, select the drop-down arrow for the raster you previously segmented. Leave Clipping Type and Clipping Geometry/Raster as the default. Adjust the active map view by zooming in or out to the region you want preserved after the clip. I use as small of area as I possibly can when clipping, while keeping all required segmented regions. Click the Capture Current Map Extent button found just to the right of the extent options in the pane (Green square map). To clip, click Create new layer at the bottom of the pane and a new map layer with the clipped region should appear in the map contents.

d) From here, go to the Analysis tab and Geoprocessing group where you can find the Tools. Click the Tools to open the options in the left pane. Here we want to use the Raster to Polygon (Conversion Tools) which can be found by searching at the top of the tools window pane. Use the most recent segmented-clipped raster for the Input raster, leave the Field blank and select where you would like the Output Polygon features to save. I left the remaining option as the default and clicked the Run arrow in the bottom right corner. Although we need the final product to be in raster format, this step is important for creating separate polygons which are grouped together as one unit.

e) As mentioned we must get the polygons back into a raster. Before we use the Polygon to Raster tool, polygons which overlap must be merged. This is one of the two reasons why we converted the raster to polygons. When the segmentation was conducted in (step d), diseased regions were not all categorized in the same bin. This resulted in regions which were maintained after masking but are different shades grey. Even one diseased patch may have two to three tones to it, resulting in segments for a patch. Now that everything is a polygon, we can merge these polygons that should be one polygon. Click the Edit tab and select Modify in the Features group. In the Modify Features, scroll down to the Construct group and click Merge. In Existing Features click Change the Selection and while holding shift, select the polygons which belong in one group. They will be highlighted in blue and appear in the pane if properly selected and if so, click Merge. Navigate to the Map tab and Selection group and click Clear. Use the same merge steps to merge any other polygons which belong to the same diseased lesion but were classified separately in raster segmentation.

f) Finally, the polygons can be converted back to a raster for the last step of image processing. Click the Analysis tab and find the Geoprocessing group to Tools once again. Search Polygon to Raster (Conversion Tools) in the Tools pane and select it. Use the polygons layer for Input Features and the window pane options should adjust to accommodate your raster. The only adjustment I made was to the Cellsize which I changed to 1 from 0.16 to maintain the cell size in my original raster. Select Run in the bottom right corner of the pane for the final layer. Each of the polygons should now be converted to a rasterized polygon with a unique ObjectID, Value and Count which can be found by going to the Attribute table for that layer or clicking on a group of pixels. This allows for each diseased patch to be aggregated and analyzed as such in Fragstats.

g) Go to the final raster layer, right click and go to data and export data. What I wanted is a TIFF file for analysis in Fragstats.

2. To compare the accuracy of the segmentation as a classification method for diseased pixels, manual classification was used as a ground truth technique.

a) For manual classification of the original red band, the Training Sample Manager was used. This allows for manual classification which can be used for supervised machine learning techniques of classification. Here, I simply used it to select diseased regions manually, but have an end goal of using the support vector machine learning model.

b) To get to the Training Sample Manager, go to the Imagery tab and Image Classification group and select the Classification Tools where you can click Training Sample Manager. In the Image Classification window pane go to create new scheme. Right click on New Scheme and Add New Class and name it diseased pixels and supply a value which is arbitrary and a description if desired. Click ok and select the diseased pixels scheme and then the Freehand tool in the pane. Draw an outline around the diseased regions in the image. Save these polygons and the classification scheme. Go to Add data in the Layer group on the Map tab to upload the training sample polygons.

c) Once the polygons are uploaded, use the Polygon to Raster protocol which can be found in step 1, part (f) followed by part (g).

Fragstats was used for the analysis of the polygons. For exercise 1 the intent in Fragstats was to simply obtain information about the diseased patches that were extracted from the red band image. This included the pixel number, area, perimeter, perimeter-area ratio and shape index. Many more options were available for patch analyses and will be considered for exercise 2.

1. Open up Fragstats for the patch analysis.

a) To import your first image click Add layer and go down to GeoTIFF grid and select it. In the Dataset name, search for your tiff file you created in ArcGIS Pro and select it and click ok. You have to do this for each of the datasets. Then go to the Analysis parameters and click Patch metrics in the Sampling strategy. For General options hit Browse to find a location for the output to save and click the Automatically save results checkbox.

b) Click on the red box called Patch metrics and in the Area – Edge tab select the Patch Area and Patch Perimeter. Click on the Shape tab and click Perimeter – Area ratio and Shape Index.

c) In the upper left corner, you can hit Run and it will check everything you have uploaded and the analysis you have selected. You must then hit Proceed after it checked the model consistency and it runs the analyses.

d) After this hit the Results options for viewing the output.

Results
The patches aligned pretty well through visualization in ArcGIS Pro when comparing the two layers. The comparison between patches will be done in exercise 2. I did notice that the segmentation process missed many of the smaller diseased patches. Looking at Image 2, you can see that they were segmented from the surrounding regions but were lumped into a bin with that caught areas that appeared lighter in the image because of a reflection. Different thresholds could be used in the segmentation process in order to include those smaller patches. This could maybe be done if they followed a parameter involving shape. This may be considered for future segmentation. A concern is that you set the threshold too low and it includes many regions that aren’t diseased but include all the diseased regions also. This would result in many false positives for disease. The alternative is to set a high threshold value for diseased regions which would result in false negatives. The idea is to find a good balance between the two. Currently the segmentation is strictly based on the segments value which is attributed to the reflectance in the red band. As mentioned another parameter worth considering is shape. Since the diseased regions tend to be leaf spots resulting in a circular area typically, adjustments could be made to include more circular patches that don’t extend past a certain number of pixels. In table 1 we see that number of pixels for patches ranges from 8 to 50. The patch analysis provides some insights into possible spatial factors that explain these segmented regions and how the classification process could be done more accurately.

Table 1. Showing the 11 different patches that were manually selected and analyzed in Fragstats. The color indicates the corresponding patch that was detected through segmentation shown in Table 2.

Table 2. The 4 different patches that were identified through segmentation and analyzed in Fragstats. The color indicates the corresponding patch that was detected through segmentation shown in Table 1.

Critique of the method
The MicaSense red-edge camera has 5 bands which can be very helpful for applying different vegetative indices and compositing bands in order to help bring out the variation in spectral signatures between diseased and un-diseased tissue. Although the pixel size is just under 1 mm, which appears to be adequate for identifying lesions on the leaf, the bands do not have perfect overlap. This is due to the design of the camera which has five different lenses for the five different bands that are all separated by one to two inches. This could be corrected for if the extent for each of the five bands was manually adjusted for a near perfect overlap. Until this is resolved, the red band seems to show the most variation in pixel value for diseased patches in comparison to the other four bands and was used for this analysis.

Additionally, the amount of processing that is done in part 1 step (a) to get the segmented raster patches is has many steps. Methods for speeding up this process need to be considered for future analysis especially when conducting this analysis on the 500 leaves.

As mentioned before, Fragstats has many more statistically capabilities that were not applied in this analysis. Getting more statistics on the manually and segmented patches would be helpful for determining the level of accuracy as well as other parameters worth considering.

Appendix

Image 1. The red band tif file uploaded into ArcGIS Pro for classification. The white patches are what we would expect the diseased regions to look like.

Image 2. The result of the segmentation performed on Image 1 described in step 1 part (a)

Image 3. The patches determined based on segmentation in part 1 from Image 2. Separate patches were created by using the Raster to Polygon tool, followed by the Polygon to Raster tool. Two layers were turned on here for the intent of showing the overlap between the raster patches and the original image.

Image 4. The patches determined based on the manual classification described in part 2 from Image 1. Training sample manager followed by polygon to raster was used to get these patches. Two layers were turned on here for the intent of showing the overlap between the raster patches and the original image.

Spatial Patterns of Salmonella Rates in Oregon

Question Asked

At this stage I asked several questions regarding the spatial distribution of population characteristics in all counties in Oregon in 2014: What are the county level spatial patterns of reported age-adjusted Salmonella rates within Oregon in 2014? County level spatial patterns of proportions of females? Median Age? Proportion of infants/young children aged 0-4 years?

To answer these questions I used several different datasets. The first dataset used is a collection of all reported Salmonella cases in Oregon from 2008-2017 which includes information like sex, age group, county in which the case was reported, and onset of illness. The information in this dataset was deidentified by Oregon Health Authority. The second dataset used was a collection of Oregon population estimates over the same time period. This dataset includes sex and age group specific county level population information. I also obtained county level median ages from AmericanFactFinder. The last dataset used is a shapefile from the Oregon Spatial Data Library containing polygon information of all Oregon counties.

Names of analytical tools/approaches used

I used a direct age adjustment (using the 2014 statewide population as the standard population) to obtain county level age-adjusted Salmonella rates. After calculating county level summary data e.g. proportion of females, proportion of children aged 0-4, median age, and age-adjusted Salmonella rates, I merged this information with a spatial dataframe containing polygonal data of every county in Oregon. After doing this I did both local (between 0-150 km) and global (statewide) spatial autocorrelation to get a Moran’s I statistic for each of the population variables listed above. I produced choropleth maps of each of the variables for Oregon as well. Finally, I produced a heatmap for county-level age-adjusted Salmonella rates using a Getis-Ord Gi* local statistic to evaluate statistically significant clustering of high/low rates of reported Salmonella cases.

Description of the analytical process

After extensive reformatting, I was able to organize cases of Salmonella by age group and by county for the year 2014. After this I formatted 2014 county level population estimates in the same way. I then divided the Salmonella case dataframe by the population estimate dataframe to get rates by the different age groups. To get county age-adjusted rates I created a “standard population”, in this case I used Oregon’s statewide population broken down into the same age groups as above. I then multiplied the each of the county’s age-specific rates by the standard population’s matching age groups to create a dataframe of hypothetical cases. This dataframe represents the number of cases we would expect in each of the counties if they had the same population and age distribution as Oregon as a whole. I summed the expected Salmonella cases by county and divided this number by the 2014 statewide population. This yielded age-adjusted reported Salmonella rates by county.

Given that the population data contained county level populations broken down by age group and by sex I was able to calculate proportions of county populations which were female, and which were young children aged 0-4 years by dividing those respective group populations by the total county population.

After this I performed local and global spatial autocorrelation with Moran’s I using the county level median age, proportion of children, proportion of females, and age adjusted Salmonella rates which were associated with centroid points for each county. The global Moran’s I was calculated using the entire extent of the state and the local Moran’s I was calculated by limiting analysis to locations within 150 km of the centroid. Both global and local Moran’s I statistics were calculated using the Monte-Carlo method with 599 simulations.

Finally, I completed a Hot Spot Analysis using Getis-Ord Gi* to assess for any statistically significant hot or cold spots in Oregon. This was only done for the age-adjusted Salmonella rates. This was completed using the same county centroid points as above. I completed this analysis with a local weights matrix using Queen Adjacency for neighbor connectivity. The weighting scheme was set to where all neighbor weights when added together equaled 1.

Brief description of results you obtained

Choropleth Maps of Oregon:

From the median age map, we can see that there are some clusters of older counties in the northeastern portion of the state and along west coast. Overall, the western portion of Oregon is younger than the eastern portion of the state.

From the proportion of children map there are a few clusters of counties in the northern portion of the state with high proportions of children compared to the rest of the state. Overall, the counties surrounding the Portland metro area have higher proportions of children compared to the rest of the state.

From the proportion of females map, we can see that the counties with the highest proportion of females are clustered in the western portion of the state.

Finally, from the age-adjusted county Salmonella rates map we can see that the highest rates of Salmonella occur mostly in the western portion of the state with a few counties in the northeast having high rates as well. Overall, the counties surrounding Multnomah county have the highest rates of Salmonella.

The global Moran’s I statistics:

County proportions of females: 0.053 with a p-value of 0.15. This suggests insignificant amounts of slight clustering.
County median age: 0.175 with a p-value of 0.02. This provides evidence of some significant mild clustering.
County proportions of children: 0.117 with a p-value of 0.05. This provides evidence of significant mild clustering
County age-adjusted Salmonella rates: -0.007 with a p-value of 0.32. This suggests insignificant amounts of higher dispersal than would be expected.

Local Moran’s I Statistics:

County proportions of females: 0.152 with a p-value of 0.02. This suggests significant amounts of mild clustering.
County median age: 0.110 with a p-value of 0.07. This provides evidence of some insignificant mild clustering.
County proportions of children: 0.052 with a p-value of 0.1617. This provides evidence of insignificant slight clustering
County age-adjusted Salmonella rates: -0.032 with a p-value of 0.5083. This suggests insignificant amounts of higher dispersal than would be expected.

Getis-Ord Gi*:

The heatmap shows a significant hotspot (with 95% confidence) in Clackamas county with another hotspot (with 90% confidence) in Hood River County. Three cold spots (with 90% confidence) are seen in Malheur, Crook, and Morrow counties.

Critique of Methods

The choropleth maps were very useful at showing areas with high/values however this method was not able to detect counties with significantly different values compared their neighbors. Overall, it was useful as an exploratory tool. The global and local Moran’s I calculations were able to detect if high/low values were closely clustered or more dispersed than what is expected. However, I am unsure if this method was completely appropriate given the coarseness of this county level data. At a local scale, only the proportion of women showed a significant amount of clustering, and globally median age and proportion of children showed some amount of significant clustering. Given that most of the Moran’s I statistics were not associated with significant values, I don’t believe this analytical method highlighted a particularly meaningful spatial pattern in my data. The heatmap provided evidence of some significant hot and cold spots in Oregon, however this was based on immediate neighbor weights and perhaps global weights would be more appropriate. Overall, this tool was very useful in detecting significantly high/low Salmonella rates.

Exercise 1: Preparing for Point Pattern Analysis

Exercise 1

The Question in Context

In order to answer my question: are the dolphin sighting data points clustered along the transect surveys or do they have an equal distribution pattern? I need to use point pattern analysis. I am trying visualize where in space dolphins were sighted along the coast of California, specifically from my San Diego sighting area. In this exercise, the variable of interest is dolphin sightings. These are x,y coordinates (point data) indicating the presence of common bottlenose dolphins along a transect. However, these transect data were not recorded and I needed to recreate these lines to my best abilities. This process is more challenging than anticipated, but will prove useful in the short-term view of this class and project and long-term in management ramifications.

The Tools

As part of this exercise, I used ArcMap 10.6, GoogleEarth, qGIS, and Excel. Although I was only intending on importing my Excel data, saved as a .csv file into ArcMap, that was not working, so other tools were necessary. The final goal of this exercise was to complete point-pattern analyses comparing distance along recreated transects to sightings. From there, the sightings would be broken down by year, season, or environmental factor (El Niño versus La Niña years) to look for distributing patterns, specifically if the points were ever clustered or equally distributed at different points in time.

Steps/Outputs/Review of Methods and Analysis

My first step was to clean up my sightings data enough that it could be exported as a .csv and imported as x-y data into ArcMap. However, ArcMap, no matter the transformation equation, seemed to understand the projected or geographic coordinate systems. After many attempts, where my data ended up along the east coast of Africa or in the Gulf of Mexico, I tried a work around; I imported the .csv file into qGIS with the help of a classmate, and then exported that file as a shape file. Then, I was able to import that shape file into ArcMap and select the correct geographic and projected coordinate systems. The points finally appeared off the coast of California.

I then found a shape file of North America with a more accurate coastline, to add to the base map. This step will be important later when I add in track lines, and how the distributions of points along these track lines are related to bathymetry. The bathymetric lines will need to be rasterized and later interpolated.

The next step was the track line recreation. I chose to focus on the San Diego study site. This site has the most data and the most consistently and standardly collected data. The surveys always left the same port of Mission Bay, San Diego, CA traveled north at 5-10km/hr to a specific beach (landmark), then turned around. It is noted on sighting data whether the track line was surveyed on both directions (South to North and North to South), or unidirectional (South to North). Because some data were collected prior to the invention of a GPS and the commercial availability, I have to recreate these track lines. I started trying to use ArcMap to draw the lines but had difficulty. Luckily, after many attempts, it was suggested that I use Google Earth. Here I found a tool to create a survey line where I can mark the edges along the coastline at an approximate distance from shore, and then export that file. It took a while to realize that the file needed to be exported as a .kml and not a .kmz.

Once exported as a .kml, I was able to convert the .kml file to a layer file and then to a shape file in ArcMap. The next step in this is somehow getting all points within one kilometer of the track line (my spatial scale for this part of the project) to associate with that track line. One idea was snapping the points to the line. However, this did not work. I am still stuck here: the major step before I can have my point data with an association to the line and then begin a point pattern analysis in ArcMap and/or R Studio.

Results

Although I do not currently have results of this exercise, fully. I can say for certain, that it has not been without trying, nor am I stopping. I have been brainstorming and milking resources from classmates and teaching assistants about how to associate the sighting data points with the track line to then do this cluster analysis. Hopefully, based on this can be exported to R studio where I can see distributions along the transect. I may be able to do a density-based analysis which would show if different sections along the transect, which I would need to designate and potentially rasterize first, have different densities of points. I would expect the sections to differ seasonally.

Critiques

Although I add in my opinions on usefulness and ease above, I do believe this will be very helpful in analyzing distribution patterns. Right now, it is largely unknown if there are differences in distribution patterns for this population because they move rapidly and at great distances. But, by investigating data from only the San Diego site, I can determine if there are differences in distributions along the transects temporally and spatially. In addition, the total counts of sightings in each location per unit effort will be useful to see the influx to that entire survey area over time.

Contact information: this post was written by Alexa Kownacki, Wildlife Science Ph.D. Student at Oregon State University. Twitter: @lexaKownacki

Point Pattern Analysis of Tree Distribution by Height in the HJ Andrews Forest

1. Given that HJ Andrew Experimental Forest is a 16,000-acre forest with manipulations spanning back to its establishment as a Long-Term Ecological Research (LTER) site in 1948, it is a highly spatially heterogeneous ecosystem. Forest harvest began in the 1950s and resulted in a mosaic of young plantation forest (~30 percent of total forest area) and old growth (~40 percent of forest) (http://andrewsforest.oregonstate.edu/). My objective is to quantify the spatial pattern of trees across the forest and eventually relate that to quantifiable landscape features.

Motivating Questions: How does the spatial pattern of trees vary across the HJ Andrews Forest? Specifically, I’m exploring the relationship between tree height and tree spacing. One specific question of interest is: How does the mean distance between trees in the same height class differ from the mean distance between a single height class of tree and all other trees? This question attempts to address the clustering vs. dispersion of trees by height.

This is an analysis of the spatial distribution of one variable, tree height, so a consideration of the internal processes that may influence the spatial distribution of this variable is necessary.

Microclimate caused by the clustering or dispersion of trees could be either an attraction or repulsion process. Microclimate influences relative humidity and exposure to wind, along with many other factors, so clusters of trees would tend to have different microclimate features than more dispersed trees.
Population and community dynamics will influence the spatial distribution of trees. The speed at which a colonizer can take over a space and competition between different colonizers influence the distribution. Aboveground and belowground tree growth adn spacing of trees will be influenced by these factors.
Source and sink processes in a forest may result from topographical features, like valleys and hillslopes. I expect valleys to be a source (capable of producing a surplus of organisms) sink they will tend to be in areas near streams, so not water limited and in areas that serve as catchments for nutrients, so not nutrient limited.
Spatial distribution of tree height certainly is different according to the scale. The spatial pattern looks different at a single tree scale compared with a 50-m scale, compared with a 5-km scale. Some of these differences are revealed by Figures 3, 4 and 5, below.

2. My approach is to use k-nearest neighbor to examine distance between a given tree and proximal trees. I created ten height classes by using kmeans to find ten cluster centers in the tree height data, then used K nearest neighbor to examine the distance between each cluster center and the 30 closest trees.

3. For this analysis, I used a LiDAR dataset of vegetation heights downloaded from the HJ Andrews online data portal (http://andlter.forestry.oregonstate.edu/data/abstract.aspx?dbcode=GI011). The selected data is from the third entry, titled, “Vegetation Height digital elevation model (DEM) from 2011 LiDAR, Upper Blue River Watershed (Oct 27^th, 2011 – Nov 1^st, 2011). A description of this dataset can be found here: http://andlter.forestry.oregonstate.edu/data/spatialdata/gi01103.htm.

I performed the majority of the analyses in R and used QGIS for data visualization. I used the ‘st_read’ function from the ‘sf’ package in R to read in the stem map shapefile (stem_map.shp). The stem map shapefile includes crown radii (m) and tree heights (m) as well as the point locations of trees.

Because the stem map shapefile essentially provides a census of trees within the HJA Forest, and that is (1) way too much data to deal with at one time, and (2) not useful to perform statistical analyses on since we can just look at the data mapped out and visually discern where trees are located, I decided to use kmeans to group the trees into 10 height classes, resulting in the data space being partitioned into Veronoi cells.

I used the ‘nngeo’ package to examine k-nearest neighbors relations between and within height clusters. I examined the relationship between a single tree (point) and its nearest neighbor of the same height class. I also examined the relationship between a single tree of one height class and its nearest neighbor of any height class to start to elucidate to what extent trees of similar heights cluster or are dispersed spatially.

I performed T-tests on each tree height class to test if there was a significant difference between (1) the mean distance between a tree and the next closest tree within the same height class and (2) the mean distance between a tree and the next closest tree of any height class. I calculated the mean, standard deviation and kurtosis of each height class distance (both within and between height classes).

4. Results

The table below shows the center of each of the ten height class groupings, mean distances between and within tree height classes, standard deviation and kurtosis of those means, and p-values. Results of all t-tests were significant (p<0.01), meaning that there is strong evidence that the within group mean distances are not equal to zero, so there are differences in mean distances between trees of the same height class (within group) and mean distance to the closest trees of any height class. In other words, the distance between a large tree and its nearest neighbor (of any height class) is significantly different than the distance between a large tree and its nearest neighbor within the large tree height class. The same is true of trees within and outside of the small height class, as well as each of the other ten height classes.

Table 1. Results from k-nearest neighbor analysis and t-tests

Tree Class	Class Center (Height (m))	Mean Distance to 30 closest trees (m)	St Dev Mean Distance	Kurtosis Mean Distance	Mean Distance to Closest Tree (m)	St Dev Mean	Mean Distance to Closest Tree Within Group (m)	St Dev Group Mean	P-value
1	11.8	12.5	4.4	6.4	2.1	2	4	6.3	<0.01
2	15.1	11.7	3.9	3	2.4	1.7	4.9	6.6	<0.01
3	19.9	12.6	3.4	4	3.6	1.3	6.8	6.7	<0.01
4	24.4	13.4	3.1	5.6	4	1.4	6.9	6.5	<0.01
5	29.7	14.6	2.9	3.9	4.6	1.4	8	6.9	<0.01
6	35.5	16.7	3	2.8	5.3	1.6	9.3	7.2	<0.01
7	42	18.6	2.6	4.5	6.1	1.7	10.1	7	<0.01
8	50	19.7	2.5	6.6	6.9	1.8	11.4	7.3	<0.01
9	59	20.5	2.6	3.1	7.5	1.8	13.5	8.7	<0.01
10	70	21.2	2.7	0.7	8.1	1.9	15	11	<0.01

Fig 1. Distribution of all tree heights (m) in HJ Andrews Forest.

Fig 2. Mean distance (m) and standard deviation (m) between trees of each of ten height classes and the next closest 30 trees within that height class. Generally, as trees get larger, mean distance between them is larger.

Fig 3. Distribution of the tallest tree class (70 m tree class) across the HJ Andrews Forest.

Fig 4. A representative distribution of all tree height classes on either side of a road in the HJ Andrews Forest, showing the extent of clustering and the extent of dispersion of height classes. Height classes are in ascending order from smallest class (~12m tall; Class 1) to tallest class (70m tall; Class 10).

Fig 5. A closer look in the same area at Fig. 4, where small trees are clustered near the road and clustered tightly together, while larger trees are more dispersed.

Fig 6. Mean distance between trees of the same height class and other trees of the same height class (blue) and mean distance between trees of one height class and any other tree (red). The overlapping red confidence interval with the blue points suggests that the average distance between small trees is not significantly different than distance between small trees and any trees. The general upward trend suggests that as trees get taller, the distance between them increases and variance slightly decreases.

5. Critique of the method:

The results make sense, but do not provide much more information about the actual distribution of trees (clustering vs. dispersion) than simple maps of point data, so the next step might be to examine tree heights within different management regimes. The current analysis tells me that trees are somewhat clustered by height, and that the mean distance between a tree of one height class to a tree of the same height class is, in most cases, different from the mean distance of a tree of one height class to a tree of any other height class. I’ve examined a map of different management regimes within the HJ Andrews Forest and there are clear areas of old growth, harvested areas, clearly defined plots, etc., so I would expect some of these areas to show tree clustering by height class. The patterns I found using this analysis were not as clear as I was expecting. Using kmeans and nearest neighbor analysis is a great way to start to examine the spatial relationships between and among data, but with such a large and highly varied dataset there can be shortcomings, especially when it comes to drawing any concrete conclusions.

References:

HJ Andrews Online Data Repository: http://andlter.forestry.oregonstate.edu/data/catalog/datacatalog.aspx

Johnson, S.; Lienkaemper, G. 2016. Stream network from 1997 survey and 2008 LiDAR flight, Andrews Experimental Forest. Long-Term Ecological Research. Forest Science Data Bank, Corvallis, OR. [Database]. Available: http://andlter.forestry.oregonstate.edu/data/abstract.aspx?dbcode=HF013 (10 April 2019) DOI:http://dx.doi.org/10.6073/pasta/66d98881d4eb6bb5dedcbdb60dbebafa.

Spies, T. 2016. LiDAR data (August 2008) for the Andrews Experimental Forest and Willamette National Forest study areas. Long-Term Ecological Research. Forest Science Data Bank, Corvallis, OR. [Database]. Available: http://andlter.forestry.oregonstate.edu/data/abstract.aspx?dbcode=GI010 (10 April 2019) DOI:http://dx.doi.org/10.6073/pasta/c47128d6c63dff39ee48604ecc6fabfc.

Spies, T. 2016. LiDAR data (October 2011) for the Upper Blue River Watershed, Willamette National Forest. Long-Term Ecological Research. Forest Science Data Bank, Corvallis, OR. [Database]. Available: http://andlter.forestry.oregonstate.edu/data/abstract.aspx?dbcode=GI011 (10 April 2019) DOI:http://dx.doi.org/10.6073/pasta/8e4f57bafaaad5677977dee51bb3077c.

Spies, T. 2014. Forest metrics derived from the 2008 Lidar point clouds, includes canopy closure, percentile height, and stem mapping for the Andrews Experimental Forest.. Long-Term Ecological Research. Forest Science Data Bank, Corvallis, OR. [Database]. Available: http://andlter.forestry.oregonstate.edu/data/abstract.aspx?dbcode=TV081 (10 April 2019) DOI:http://dx.doi.org/10. 6073/pasta/875e10383e8c8aee3c9a49e0155eef1d.

Exercise 1: Principal component analysis for rural index weighting & rural indicator measure contributions for Texas counties

Questions Being Asked:

How much do different indicator measures of rurality in Texas counties each contribute to a basic index of rurality both statistically and spatially? How much does the weighted basic index differ from a simple calculated mean for rurality?

Tools and Approaches Used

The major statistical method I used to answer these questions was principal component analysis (PCA) via the prcomp function in R and subsequent rurality indicator variable weighting for counties in Texas. The raw indicator data used in this analysis are as follows: Texas county median household income (2010 Census Data), Texas county population density (2010 Census data), and rasterized discrete data of land use for the state of Texas (2011 National Land Cover Database).

Description of Analysis Steps

1) Data Wrangling: Both income and population density data were natively in county-level polygon form, so no conversion was required in ArcGIS. Land use required extensive conversion in order to produce the indicator variable that would both be at the appropriate spatial definition and measure the correct aspect of land use. The raw land use discrete data was first reclassified from many categories of land, such as “deciduous forest,” “barren land,” and “scrub/shrub,” into binary “developed” and “undeveloped” classification groups for the entire state of Texas. Then, the resulting binary raster was converted to unsimplified polygons and the “tabulate intersection” ArcGIS tool was utilized in order to intersect Texas county boundaries with the produced unsimplified polygons. This tool allowed for calculation of percent of area of each Texas county that is considered developed land by the USGS.

2) Data Scaling: After the 3 variables were in analyzable form, each were scaled on a 0 to 100-point scale for visual comparison, PCA analysis, and computation of index weights.

3) Creation of Maps of Scaled Variables for Comparison: ArcGIS was utilized to create maps of each of the scaled variables to visually compare the 3 factors to one another.

4) PCA and Weight Computation: The PCA was completed in R using the prcomp function and variable weights were extracted from the procedure by determining the proportion of variance each variable contributed to. This specific step helps answer how much each indicator variable contributes to rurality in Texas.

5) Simple Means Map, Weighted Index Map, and Statistical Comparison: The scaled variables were used to calculate both an unweighted mean rurality score and weighted mean rurality score for all Texas counties. Maps of each of these measures were produced for visual comparison. A student’s t-test was then completed to compare unweighted and weighted rurality scores for counties.

Results

Maps of Scaled Variables of Texas Counties for Comparison

Rough but similar spatial patterns can be seen in all three maps. Counties with large cities and generally more urbanized counties have higher median income, population density, and percent of land area developed (Green=high, yellow=medium, red=low). There are more extreme values in the population density and percent of land area developed maps than median income.

PCA and Weighting Computation

The biplot of the PCA procedure helps visualize how the samples of the 3 variables relate to one another and how much each variable contributes to the first principal component. Important factors to notice in this plot are the direction of the arrows and clustering of the points. Typically, the more the variable arrow points toward the principal component 1 (PC1) axis, the more it contributes to the proportion of variance. The population density and percent land area developed variables happen to overlap one another in this plot, as they have highly similar contributions to the variance in PC1.

This table shows that population density contributes to 44.4% of the variance in PC1, income contributes to 11.11%, and percent land area developed contributes 44.5%. This PCA procedure indicates that when weighting rurality scores by county in Texas using these 3 variables, percent land area developed should have the highest weight, population density should have the second highest weight, and income should have the lowest. The weighting of variables in the weighted county rurality mean score follows these proportions.

Simple Means Map, Weighted Index Map, and Statistical Comparison

Visually comparing the two maps of rurality scores, there is is some overlap, but quite stark differences. County scores overall have become more rural (lower scores) based on simple visual comparison. Statistical analysis will allow a more nuanced view of the actual differences between the two.

The computed t-test indicates that the weighted index, on average, produces county rurality scores that are significantly more rural (lower scores on the 0-100 scale) than the unweighted index (p<0.001). The overall unweighted rurality score mean for all Texas counties is ~17, while the overall weighted rurality score mean is ~7.5. This indicates that the basic weighted index may better capture the multidimensionality of rurality for Texas counties than unweighted mean scores.

Critique

Based on my understanding, PCA is quite useful for creation of weighted indexes. Unfortunately, the variables I utilized for this analysis have somewhat varying patterns of clustering and possible outliers, which can be seen in the above biplot. These possible outliers could have an effect on the weighting of variables in the index. Due to the small sample size of counties and in order to produce a complete picture of all counties in Texas, I did not want to remove any counties from the analysis. PCA as it relates to index weighting also requires that there is either an a priori understanding or statistical proof of a correlation between variables. Based on the directions of the arrows in the biplot, there is likely correlation between the variables, but more considerations may be needed to confirm this. Also, the scaling of the median income variable is slightly off because the maximum value is over 100. This will need to be corrected in a later analysis.

My future analysis will include more health-related variables in the PCA procedure to determine how health variables contribute to rurality in comparison to the basic rural indicator variables analyzed in this exercise.

Ex 1: Mapping the stain: Using spatial autocorrelation to look at clustering of infection probabilities for black stain root disease

My questions:

I am using a simulation model to analyze spatial patterns of black stain root disease of Douglas-fir at the individual tree, stand, and landscape scales. For exercise 1, I focused on the spatial pattern of probability of infection, asking:

What is the spatial pattern of probability of infection for black stain root disease in the forest landscape?
How does this spatial pattern differ between landscapes where stands are clustered by management class and landscapes where management classes are randomly distributed?
Fig 1. Left: Raster of the clustered landscape, where stands are spatially grouped by each of the three forest management classes. Each management class has a different tree density, making the different classes clearly visible as three wedges in the landscape. Right: Raster of the landscape where management classes are randomly assigned to stands with no predetermined spatial clustering. The color of each cell represents the value for infection probability of that cell. White cells in both landscapes are non-tree areas with NA values.

Tool or approach that you used: Spatial autocorrelation analysis, Moran’s I, correlogram (R)

My model calculates probability of infection for each tree based on a variety of tree characteristics, including proximity to infected trees, so I expected to see spatial autocorrelation (when a variable is related to itself in space) with the clustering of high and low values of probability of infection. Because some management practices (i.e., high planting density, clear-cut harvest, thinning, shorter rotation length) have been shown to promote the spread of infection, there is reason to hypothesize that more intensive management strategies – and their spatial patterns in the landscape – may affect the spread of black stain at multiple scales.

I am interested in hotspot analysis to later analyze how the spatial pattern of infection hotspots map against different forest management approaches and forest ownerships. However, as a first step, I needed to show that there is some clustering in infection probabilities (spatial autocorrelation) in my data. I used the “Moran” function in the “raster” package (Hijmans 2019) in R to calculate the global Moran’s I statistic. The Moran’s I statistic ranges from -1 (perfect dispersion, e.g., a checkerboard) to +1 (perfect clustering), with a value of 0 indicating perfect randomness.

Moran’s I = -1

Moran’s I = 0

Moran’s I = 1

I calculated this statistic at multiple lag distances, h, to generate a graph of the values of the Moran’s I statistic across various values of h. You can think of the lag distance of the size of the window of neighbors being considered for each cell in a raster grid. The graph produced by plotting the calculated value of Moran’s I across various lag values is called a “correlogram.”

What did I actually do? A brief description of steps I followed to complete the analysis

1. Imported my raster files, corrected the spatial scale, and re-projected the rasters to fall somewhere over western Oregon.

I am playing with hypothetical landscapes (with the characteristics of real-world landscapes), so the spatial scale (extent, resolution) is relevant but the geographic placement is somewhat arbitrary. I looked at two landscapes: one where management classes are clustered (“clustered” landscape), and one where management classes are randomly distributed (“random”). For each landscape, I used two rasters: probability of infection (continuous values from 0 to 1) and non-tree/tree (binary, 0s and 1s).

2. Masked non-tree cells

Since not all cells in my raster grid contain trees, I set all non-tree cells to NA for my analysis in order to avoid comparing the probability of infection between trees and non-trees. I used the tree rasters to create a mask.
c.tree[ c.tree < 1 ] <- NA # Set all non-tree cells in the tree raster to NA c.pi.tree <- mask(c.pi, c.tree) # Combine the prob inf with tree, leaving all others NA # Repeat with randomly distributed management landscape r.tree[ r.tree < 1 ] <- NA # Set all non-tree cells in the tree raster to NA r.pi.tree <- mask(r.pi, r.tree) # Combine the prob inf with tree, leaving all others NA

Fig 2. Filled and hollow weights matrices.

3. Calculated Global Moran’s I for multiple values of lag distance.

For each lag distance, I created a weights matrix so the Moran function in the raster package would know how to weight each neighbor pixel at a given distance. Then, I let it run, calculating Moran’s I for each lag to create the data points for a correlogram.

I produced two correlograms: one where all cells within a given distance (lag) were given a weight of 1 and another “hollow” weights matrix when only cells at a given distance were given a weight of 1 (see example below).

4. Plotted the global Moran’s I for each landscape and compare.

What did I find? Brief description of results I obtained.

The correlograms show that similar values become less clustered at greater distances, approaching a random distribution by about 50 cell distances. In other words, cells are more similar to the cells around them than they are to more-distant cells. The many peaks and troughs in the correlogram are present because there are gaps between trees because of their regular spacing in plantation management.

In general, the highest values of Moran’s I were similar between the landscape with clustered management landscape and the landscape with randomly distributed management classes. However, the rate of decrease in the value of Moran’s I with increasing lag distance was higher for the random landscape than the clustered landscape. In other words, similar infection probabilities had larger clusters when forest management classes were clustered. For the clustered landscape, there was actually spatial autocorrelation at lag distances of 100 to 150, likely because of the clusters of higher infection probability in the “old growth” management cluster.

Correlogram for the clustered and random landscape showing Moran’s I as a function of lag distance. “Filled” weights matrix.

Correlogram for the clustered and random landscape showing Moran’s I as a function of lag distance. “Hollow” weights matrix.

Critique of the method – what was useful, what was not?

My biggest issue initially was finding a package to perform a hotspot analysis on raster data in R. I found some packages with detailed tutorials (e.g., hotspotr), but some had not been updated recently enough to work in the latest version of R. I could have done this analysis in ArcMap, but I am trying to use open-source software and free applications and improve my programming abilities in R.

The Moran function I eventually used in the raster package worked quickly and effectively, but it does not provide statistics (e.g., p-values) to interpret the significance of the Moran’s I values produced. I also had to make the correlogram by hand with the raster package. Other packages do include additional statistics but are either more complex to use or designed for point data. There are also built-in correlogram functions in packages like spdep or ncf, but they were very slow, potentially taking hours on a 300 x 300 cell raster. That said, it may just be my inexperience that made a clear path difficult to find.

References

Glen, S. 2016. Moran’s I: Definition, Examples. https://www.statisticshowto.datasciencecentral.com/morans-i/.

Robert J. Hijmans (2019). raster: Geographic Data Analysis and Modeling. R package version 2.8-19. https://CRAN.R-project.org/package=raster

Exercise 1: Comparison of Interpolation Methods Using Fisheries Abundance Data

Question Asked

I would like to understand how the spatial variability of forage fish in the California Current Ecosystem is related to the spatial pattern of seascape classes (based on remotely sensed sea-surface variables)? In exercise 1, I will asking “what is the spatial pattern of forage fish in the California Current ecosystem?” In order to address this question, I will be testing the use of different types of interpolation on my point-data.

Approaches Used and Methods

To address these questions, the Kriging Interpolation and Inverse Distance Weighting Interpolation tools were employed in ArcMap. All processes were completed in ESRI ArcMap 10.6.1. Interpolation is described as a method of constructing new data using existing data as references. In spatial and temporal analyses, there are a range of different types of interpolation that can be used.

The original data, which includes a series of about 1300 trawls, the catch per unit effort (CPUE) per species per trawl, the latitude, longitude, and water column depth of each trawl, and the functional group of each species caught, were loaded into ArcMap using the “Display X,Y Point Data” function. The four functional groups used in this analysis are Forage, Young-of-the-Year Rockfish (YOYRock), YOYGround, and crustaceans. In the case of this analysis, all fishes that are part of the YOYRock functional group were included.

Representation of the raw YOYRock Trawl data in ArcMap

After the data were uploaded, they had to be converted to feature classes. For the purposes of this exercise, all trawl data from 2002 to 2015 was included as one feature class, though the way in which the data are organized make it easy to break the trawls down at a finer temporal resolution. The result was a Point Shapefile that was then binned into four abundance groups to make interpretation easier. The IDW and Kriging tools (from the “Spatial Analyst” Toolbox) were then employed. The base equation for both interpolations is virtually the same, and both are commonly used methods for continuous data in point form, but there are some major differences in the calculation of some of the stated variables:

Representation of the equation used to assign values to missing cells using the IDW and Kriging methods. Z(si) = the measured value at the ith location, λi = an unknown weight for the measured value at the ith location, s0 = the prediction location, and N = the number of measured values

Z(si) = the measured value at the ith location, λi = an unknown weight for the measured value at the ith location, s0 = the prediction location, and N = the number of measured values

1) IDW: IDW, or Inverse-distance weighting, is what’s known as a deterministic interpolation method, as it relies on surrounding values to populate the resulting surface. One of the major assumptions with using IDW is that it assumes that the variables being mapped decrease in influence linearly as you move away from a given value. In IDW, the weight given to the “measured value at the ith location” is solely calculated linearly, decreasing as you move farther away from a given value. In this case, the “power” value, which corresponds to the weight, was the default 2.

2) Kriging: Kriging is an interpolation method based on geostatistics including autocorrelation. The equation used to calculate the missing values is the same as for IDW, except that the weight variable is calculated using a mathematical function within a certain specified radius of the missing value. For this reason, one of the main assumptions when using Kriging is that the “distance or direction between sample points reflects a spatial correlation that can be used to explain variation in the surface.” (ESRI, ND).

Results

The resulting interpolations can be found below. I’ve included output that focuses on the significant region of Monterey Bay to provide context regarding the proximity of the trawls to one another and to show detail on the boundaries of the interpolations.

Full Interpolation using IDW

Full Interpolation using Kriging

Detail of IDW in Monterey Bay, CA

Detail of Kriging in Monterey Bay, CA

Critiques of Methods

Any analysis of the results will conclude that the Kriging Tool provided a much more robust interpretation of the same patterns that can be observed in the original data and in the IDW interpolation. Both interpolations displayed significant patterns near Monterey Bay, and the Kriging Interpolation also represented additional areas of higher abundance north of San Francisco. While I do believe that both interpolation methods provide a good place to begin analysis, I believe that several adjustments will have to be made in order to create a useable result.

My first mistake was using the entire time series – since the interpolations are distance-based and extremely sensitive to points within close proximity to one another, the clear clusters of point most likely influenced the interpolations. A next step for me will be breaking the data down to an annual resolution, as I feel that shapefiles with one point at each trawl location will provide better data for interpolation. Additionally, this will provide multiple maps, which will allow for a chance to observe how the modeled patterns of YOYRock abundance have changed over time.

Another next step will be exploring the fine adjustments available within each interpolation method. I now have a greater understanding of the mechanics which drive IDW, so I’m eager o rerun the analyses at different powers to see how each impacts the interpolation. Similarly, the radius used to decide which points are considered while calculating a Kriging Interpolation can be adjusted, so that will be done in the future as an experiment.

Finally, I ran out of time to explore the symbology of the results – I hypothesize that classification by a smaller number of classes would result in more robust interpolation maps, as the visualizations now show a vast majority of the space to fall into the lowest class. The interpolation data is relatively bimodal in nature, so an adjustment in the symbology tab would likely result in a a more accurate and precise representation of abundance.

Overall, I see interpolation as a valuable way to identify spatial patterns from point data. In the case of species abundance data, I believe that Kriging is the superior method, as it does not have the linear influence assumption that’s baked into IDW. Additionally, the geostatical methods used in Kriging generally allow for a more robust and precise interpolation regardless of the type of continuous data being used.

The next steps mentioned above will be taken before the presentation of Tutorial 1.

References

http://desktop.arcgis.com/en/arcmap/10.3/tools/3d-analyst-toolbox/how-idw-works.htm

http://desktop.arcgis.com/en/arcmap/10.3/tools/3d-analyst-toolbox/how-kriging-works.htm

Santora, Jarrod & Hazen, Elliott & Schroeder, Isaac & Bograd, Steven & Sakuma, KM & Field, JC. (2017). Impacts of ocean-climate variability on biodiversity of pelagic forage species in an upwelling ecosystem. Marine Ecology Progress Series. 580. 10.3354/meps12278.

Determining spatial autocorrelation of refugee settlements in Uganda

Question that you asked

In analyzing the distribution of settlement boundaries that I obtained from OpenStreetMap, I wanted to know the general clustering and regionality of settlements in order to understand how other spatial statistics that I perform in my next step will behave based on the results from this explanatory step. The question I’m introducing for Exercise A centers around how similar or different nearby settlements are to each other: is there a regionality in settlement sizes? Some future questions that I’m considering are whether or not clustered settlements have higher detection in the World Settlement Footprint classification (which I’m using instead of GHSL because it’s a more localized classification). This would be determined using the percent of OSM settlement area that was detected by WSF.

Name of the tool or approach that you used.

I decided to use spatial autocorrelation to answer this question, since the essence of my question centers around what the pattern of my data looks like, how clustered or not clustered are these settlements, and how that might affect my future analysis and considerations.

For this, I employed the Spatial Autocorrelation tool in ArcGIS Pro, which uses Global Moran I’s algorithm.

Brief description of steps you followed to complete the analysis.

In order to identify the refugee settlements, I went through multiple rounds of querying with OpenStreetMap to extract the boundaries that are directly related to refugee camp boundaries. Hannah Friedrich and I have worked together on defining some of these boundaries, and she took some time to delineate boundaries for a separate study of hers. While I will incorporate these delineated ‘regions of interest,’ I will most likely not move forward with them in further analyses because it would present an issue with scaling up and manual interpretation. Below I list the three polygon data I’m analyzing here.

OSM Boundaries: This is a result of multiple query efforts within Overpass Turbo, a online server for downloading OpenStreetMap data. Various queries on attribute tags were performed in order to select relevant refugee boundary data.

Regions of Interest: This is a result from Hannah’s efforts of limiting areas within the OSM Boundaries that appear to be built up using high resolution imagery available on Google Satellite.

Selected OSM Boundaries: This is a selection from OSM Boundaries that merges polygons of the same settlement into multipolygons; this also a selection of boundary polygons that are less than 2000 hectares to account for some boundaries that are designated versus actually settled.

I then ingested these boundaries into Google Earth Engine, extracted the pixel areas for the WSF pixels within the three different boundary layers, and re-exported these.

I then imported these layers into ArcGIS Pro and performed the Spatial Autocorrelation tool and experimented with multiple parameters.

I ultimately chose the “row standardization” option given the possible bias in ‘sampling’ design, since this creation of this data is most often from the HOT Uganda team and resources might limit where they can travel and collect data.

Brief description of results you obtained.

Ultimately, my results showed clustering within the OSM-identified settlements, which is to be expected. There are a variety of questions that arise from this that might confound my analysis that are still moderately troubling: the method of data recording, the spotlight effect, the presence of organizations able to record data, and the inherent clustering based on human behavior, or drivers such as conflict or environmental conditions. This initial analysis is really to understand the degree of clustering that I might expect in my future analyses – if there is high clustering, then I need to understand that clustering when I’m testing additional questions with my next exercise, like nearest road or nearest urban area. The images below demonstrates the result of an analysis settlement area in the “Selected OSM Boundaries,” “OSM Boundaries,” and “Regions of Interest.”

Moran I, Selection of OSM Boundaries

Moran I, Delineated Regions of Interest

Moran I, All OSM Boundaries

While OSM Boundaries and Selection from OSM Boundaries show high confidence of clustering, the “Regions of Interest” pattern shows a less confident result of clustering. This makes sense, given that both the OSM Selection and OSM Boundaries both have overlapping polygons and varying sizes.

Below is a map illustrating the distribution of refugee areas in northwest Uganda, where most of the settlements are concentrated.

Critique of the method – what was useful, what was not?

Given my shortcomings with understanding statistical analyses, the interpretation of the results was most difficult for me. While my patterns appeared mostly clustered, the z-score and Moran’s Index showed changes when different parameters of comparison were chosen. Ultimately, I’m not sure if there would be a more effective spatial analysis to analyze aspects of these settlements that would prove more helpful in figuring out relevant information for moving forward with Exercise 2 and beyond.

Ex. 1: Spatial Autocorrelation in Stream Cross-Sectional Change In the Andrews Forest

Question that you asked

How much spatial autocorrelation exists in historic streambed changes in the Andrews Forest, both in the across-stream direction and the along-stream direction?

Name of the tool or approach that you used

I addressed each direction of the problems separately using. Since this resulted in one-dimensional, evenly spaced or faux-evenly spaced data, I just used the acf function in R to analyze the data.

Critique of the method – what was useful, what was not?

The methods provided an interesting look at patterns within the cross section data set. However, I think the one-dimensional approach fails to capture interactions between across- and along- channel changes as well as any temporal autocorrelation.

Description of steps you followed to complete the analysis and results you obtained.

For my project, I am looking at changes in the streambed along 4 stream reaches (black boxes) along two streams (blue lines) in the Andrews forest. The watershed of the larger stream, Lookout Creek is shown in black.

Researchers in the Andrews forest repeatedly surveyed fixed cross sections along the reaches from 1978 to 2011. In this exercise, I worked with a subset of the data from 1978 to 1998. The cartoon below does not contain real data but shows the arrangement of numbered reaches (blue lines) and their fixed endpoint stakes (blue dots) along a stream.

Part A – across-channel changes

Each individual cross section spans from bank to bank and show the topography and bathymetry across the width of the stream. Below are two consecutive years of cross-sectional profiles of the stream bed at Lower Lookout Creek, cross section # 3.

Profile from 1995.

Profile from 1996.

It looks like there was a lot of change between these two profiles. The active stream channel appears to have avulse from one side of the cross section to the other.

I compared the two profiles to see how much elevation was gained or lost.

Here grey line represents the surface of the stream bed in 1995, and the black line represents the surface of the stream bed in 1996. The area show in red represents material that was eroded away, and the green area represents material that was newly deposited.

We can simplify the plot above to show only the positive or negative elevation change across the channel. The figure below shows the same information as the figure above, but directly in terms of elevation change.

You can clearly see from this figure that this site at this year had one broad area of elevation gain or deposition, and one broad area of elevation loss or erosion. This isn’t precisely an attraction/repulsion phenomenon, but there can be positive feedbacks associated with erosion and deposition. For example, sediment deposited in a stream may locally obstruct or slow down the water passing by it and encourage deposition in adjacent areas.

Now, not every pair of years had this *much* elevation change, but also, not every pair of years shows such broad regions of positive or negative elevation change. Here is a figure of the elevation change at the same cross section between 1996 and 1997. You can see that elevation change is both smaller in magnitude and shows up in smaller patches.

I used spatial autocorrelation metrics to investigate the patch size. Note that autocorrelation at some scales is always expected when looking at topographic change. Even if you had extremely precise and accurate measurements, you would expect the stream bed change at one point to be strongly correlated with the change a few cm away both because the two points probably experience very similar forces and also because gravity tends to smooth out topographic irregularities in nature. But in most scenarios, you would probably not expect to find a correlation between the changes in a streambed with a change at a point 100 m away from the bank.

I picked two distances that seemed potentially interesting: 1 meter and 5 meters. Then I investigated autocorrelation at 1m and 5m lag distances using the acf function in R.

These lag distances are superimposed on the first change graph in the figure below.

I expected that all cross sections would have higher autocorrelation at the shorter lag distance than the longer one, but I also expected that a set of cross sections with very large patch sizes might show relatively higher autocorrelation at the larger lag distance compared to the small-patched cross sections. I need to spend more time looking into how the magnitude of the changes influences the autocorrelation function so that I can understand it better.

I repeated the process for every cross section at every year. Results are shown below.

The cross sections showed mostly positive autocorrelation at 1 m. Some years showed more autocorrelation across the board than others, and some individual cross sections seemed like they frequently had high autocorrelation values at multiple years.

Different patterns emerge at this spatial scale. At a 5 m lag, neutral or negative autocorrelation was much more common.

Part B – along-channel changes

I also wanted to look at autocorrelation between cross sections. Was the change at one cross section related to the change at cross sections immediately up- or down-stream?

Because I don’t have true geographic locations of the cross sections, I couldn’t do any fancy two dimensional work like interpolating the data into a river bathymetry model. Instead, I consolidated the cross-sectional change data into a single metric for each cross section.

Here is another look at the change plot I showed earlier. This is the same plot as before, but I’ve colored the positive values green to show deposition and the negative values red to show erosion.

I simplified this plot by removing the horizontal across-stream component and summing the red and green areas separately.

Then I had a choice: I could add the red and green areas together to show the total area of the cross section that changed in any way, or I could subtract the red area from the green area to show the net deposition in the cross section. Each of these metrics could be interesting for different reasons. For example, the total area of change might be relevant for ecological processes, but the net overal change could be relevant for tracking net sediment transport in these streams.

I started by adding the boxes together to calculate the total area of change.

In this way, I made a score for every cross section for every year pair. Hypothetical scores for the cartoon river are shown below.

Then I used the acf function again in R to determine how correlated adjacent cross section scores were with each other, compared to more distant cross section scores. This method is not fully spatial explicit, since I did not include any information about how far apart each cross section was from its neighbors. I just assumed that they were roughly equidistant within each reach. Indeed, the distance between cross sections can become hard to define when they are placed along curved parts of the stream.

I calculated these values with every year pair on each of the four reaches. I then repeated it with the subtraction method described above.

For the “addition” method, autocorrelation values were generally close to neutral with some reaches in some years tending towards positive values. Positive values could occur if changes in one reach directly influenced an adjacent reach (i.e. if upstream channel change influenced the hydraulics directly downstream) or if adjacent cross sections were influenced by similar other factors (i.e. two adjacent cross sections placed on a steep part of the reach might experience more similar changes to each other than two cross sections placed in a shallower part of the reach.)

The pattern is different for net change, where negative autocorrelation is more common. I think this is very interesting, because it implies that parts of the stream that had net deposition were more likely to be adjacent to parts of the stream that had net erosion. This could be an expression of a sink/source phenomenon where material scoured from one area is transported and deposited immediately downstream at the next cross section.

Exercise 1: Spatial Distribution of Juniper Saplings

1.Question asked and data used:

What are the spatial patterns of juniper sapling establishment within a treated watershed 14 years after most juniper were removed?
Two data sets were used for this analysis:
- Thirty-three belt transects (30 m in length, 3 m in width) were used to assess the density of juniper saplings within the watershed (data in form of excel spreadsheet listing latitude and longitude with number of juniper per transect)
- An orthomosaic created from UAV imagery of a small area within the watershed (data in form of multispectral raster, with brightness values for red, green, blue, near-infrared, and re-edge wavelengths)
Watershed map (National Agricultural Imagery Program image from 2016 shown) with belt transects and UAV study area:

2. Approach and tools used: A number of techniques were initially applied to explore the data and to examine which approaches were more (or less) effective.

Belt transects:
- Kriging
- Inverse Distance Weighted (IDW) interpolation
- Spatial autocorrelation (Global Moran’s I)
- Hot-spot analysis
Classified raster (UAV orthomosaic):
- Hot-spot analysis

3. Steps used in analysis:

Slope and aspect calculation:
- The slope and aspect tools (Spatial Analyst Tools) were applied to 30 m DEM (from earthexplorer.usgs.gov). This information was noted for hot spots and cold spots but not used for further analysis during this exercise.
- The extract multi values to points was used to extract the slope, aspect, and elevation value
Belt transects:
- Projected data into NAD 1983 UTM Zone 10N
- The location of each survey was symbolized by color based on the number of juniper found.
- Kriging (Spatial Analyst Tools and Geostatistical Wizard) was used for initial exploratory analysis to assess general spatial distribution across the watershed
  - The number of juniper per transect used as the z-value field
  - Histogram and QQPlot used assess distribution of the number of juniper per transect for the geostatistical layer
  - For the raster layer, the predicted values were compared to the observed values by extracting the values at each survey location
- IDW interpolation
  - Juniper used as input feature
  - Maximum neighbors:15; Minimum neighbors:10
- Spatial autocorrelation (Global Moran’s I) (Spatial Statistics Tools)
  - Calculated using the number of juniper per transect
  - HTML file created lists Moran’s index, z-score, and p-value
- Hot Spot Analysis (Getis-Ord Gi) (Spatial Statistics Tools)
  - Conceptualization of spatial relationships: Fixed distance band
  - Distance method: Euclidean distance
  - Default was used for the distance band

Classified Raster (orthomosaic):
- Supervised classification (support vector machine) was used to identify juniper within the UAV orthomosaic
- General procedure for classification: create training samples-> assess using interactive training sample manager and scatterplots->create .ecd file->apply classifier->apply Majority Filter tool
- A point file was created from pixel clusters identified as juniper within the image
- Hot Spot analysis (Getis-Ord Gi)
  - Conducted to assess areas of concentration of juniper saplings within the sample area
  - Integrate tool used to aggregate juniper data (5 m used for analysis, but other distances were initially tested)
  - Hot Spot analysis tool using aggregated data created in previous step as input layer (other inputs were the same as those used for the belt transect hot spot analysis)

4. Overview of Results:

Belt Transects
- Kriging. I used two different approaches to kriging. First, I used the kriging tool under spatial analyst tools and then used the geostatistical wizard to calculate ordinary prediction kriging. The resulting maps using these two approaches were different. In particular, the use of the geostatistical wizard created maps more similar to the IDW while the geostatistical wizard created a map with different contours.
  - Related statistics:
    - minimum:0.05
    - maximum: 6.97
    - mean: 2.83
    - standard deviation: 0.86
  - Spatial Analyst Kriging Map:
  - Using the export values function, the predicted values of this kriging method at each belt transect site were very similar (within 0.05) to the observed values.

- - Ordinary Prediction Kriging:
  - The QQ plot indicates that assumptions of normality may not be met:
  - The ordinary prediction kriging also tended to overestimate the juniper density based upong the distribution chart:
  - - Related cross-validation/error statistics for the ordinary prediction kriging:
      - Mean -0.0583700520511708
        Root-Mean-Square 2.09329560839956
        Mean Standardized -0.0193126589526469
        Root-Mean-Square Standardized 0.986330022079785
        Average Standard Error 2.09938534113134
- IDW
  - While general trends were similar to kriging, the size and shape of contours between the methods were different.
  - Mean: 3.08, range: 0.00 to 7.00, standard deviation: 1.16
  - IDW
- Spatial autocorrelation (Global Moran’s I)
  - Moran’s I based on juniper per transect: -0.019, with a p-value of 0.92
  - Indicates that the pattern of juniper density is considered random and not significantly different than a random distribution
- Hot Spot analysis (Getis-Ord Gi)
  - One hot spot found (90% confidence): northwestern aspect (300 degrees) with 8.8% slope
  - One cold spot found (95% confidence): north-northwestern aspect (347 degrees) with 11.5% slope
  - Remaining points were considered insignificant
  - Map of Hot Spot analysis:

Classified Raster
- Hot Spot analysis (Getis-Ord Gi)
  - Within the area covered by the orthomosaic, four hot spots were found:
    - One area with 99% confidence: 11% slope, 325 degree aspect
    - Three areas with 95% confidence: 3% slope, 254 degree aspect; 11% slope, 167 degree aspect; and 8% slope,137 degree aspect
  - UAV Orthomosaic Hot Spot map:

5. Discussion/Critique of Methods.

Based on the maps produced using the IDW and kriging methods, some general trends in juniper spatial distribution exist within this watershed. For example, we see lower densities in the north-northeastern areas and greater densities of juniper in the southeastern and western areas. However, both the IDW and the kriging raster using the spatial analyst tool produced the characteristic “bulls-eye” pattern often associated with IDW. When compared to the ordinary prediction kriging maps, different interpretations of juniper density in the watershed might be made. Compared to the belt transects data, the kriging created using the spatial analyst tool was similar to the observed values while the ordinary prediction kriging tended to overestimate the distribution. However, more ground data is necessary to determine how accurate these prediction would be in other areas. One of the biggest takeaways from this is to carefully consider the approach used and the nature of the data (for example, the use of ordinary versus simply kriging, etc.). From a user standpoint, I found the geostatistical wizard the most useful approach — particularly as it made inspecting the statistics and semivariogram very easy. However, in the future I would explore different methods of kriging within this tool.

The spatial autocorrelation and hot spot tools were useful, although the results did not provide much significant information regarding the spatial distribution of juniper in the cases examined here. For future analysis, particularly when examining the relationship between juniper density and other watershed characteristics (e.g., slope, aspect, etc.) these tools may become more important. In the case of the UAV orthomosaic, this provided a “test run” for what will be a larger dataset. The steps taken prior to analysis, particularly creating a point layer from the classified pixel clusters, will be time intensive and may require alternative approaches. At the limited scale of the UAV orthomosaics these hotspots did not provide much useful information, but if extrapolated over a larger area more patterns may be observed.

The distribution of juniper saplings in this case may be associated with a number of factors. Juniper seeds may be spread through birds or wind, resulting in the spread of juniper saplings throughout the watershed. At a much larger scale, if seed sources are less available, this distribution may be more localized. This pattern may also vary with the presence of mature juniper. As previously discussed some patterns emerged in this data (lower densities in the northern areas of the study area, for example) which may be associated with the spatial distribution of other characteristics, such as soil moisture. For example, sources and areas of higher juniper may include areas were we anticipate higher soil moisture such as flatter slopes with northerly aspects. These factors will be assessed further in exercise 2.

Spatial variation of economic losses resulting from a joint earthquake/tsunami event: An application to Seaside, OR

Question

What is the spatial variability of economic value of buildings as well as losses resulting from a joint earthquake/tsunami event? How does this spatial variability relate to independent earthquake and tsunami events, as well as the intensity of the hazard?

The purpose of this analysis was to consider the spatial variability of initial economic value, as well as economic losses resulting from a joint earthquake/tsunami event. The losses were deaggregated by hazard (earthquake only, tsunami only, joint earthquake/tsunami), as well as intensity of the event (100-year, 250-year, etc.).

Tool and approach

Two methods were implemented to evaluate the spatial variability economic value and losses: (1) interpolation via kernel density, and (2) a hotspot analysis using the Getis-Ord Gi* statistic. The economic losses were determined using a probabilistic earthquake/tsunami damage and loss model. This model implements Monte-Carlo methods to estimate the expected economic losses following an earthquake/tsunami event. For this application, 10,000 simulations were ran, from which the average loss at each building was computed for earthquake only, tsunami only, and a joint earthquake/tsunami event.

Description of steps

The average losses at each building resulting from the earthquake/tsunami loss model were added as an attribute to a GIS shapefile. Two methods to evaluate the spatial distribution were considered:

Interpolation via kernel density: Spatial interpolation was performed using a kernel density estimate. A kernel with a size proportional to the value of the attribute under consideration (in this case economic value/loss) is placed at each data point. A map is then created by taking the sum of all kernels. The kernel radius and shape can vary to produce different results. In this analysis, a quartic kernel was utilized with a radius of 200 meters. The interpolation was performed using the built in interpolation feature in QGIS 3.
Hotspot analysis using the Getis-Ord Gi* statistic: Hotspot analysis was performed using the Getis-Ord Gi* statistic. This statistic results in a p-value and z-score at each attribute, providing insight into whether the null-hypothesis can be rejected (in this case, spatial randomness). As such, features with a small p-value and a very large (or very small) z-score indicate that the null can be rejected (or that the data is not spatially random). Consequently, applied across an entire spatial dataset, the hotspot analysis identifies statistically significant clusters of high (or low) values. The hotspot analysis was performed using the available Hotspot Analysis plugin for QGIS 3.

Description of results

The results of the analysis are shown in Figure 1. The columns correspond to interpolation and hotspot analysis respectively. The first row shows the building values, whereas the second shows the economic losses resulting from a joint earthquake/tsunami event (2,500-year return period).

Areas of high economic value and losses can be easily observed from the interpolation analysis. Here, areas of red correspond to larger damages. Similarly, statistically significant clusters of large (and small) damages can be observed from the hotspot analysis. Again, red corresponds to a statistically significant hot spot (e.g. a cluster of large values and losses), whereas blue corresponds to a statistically significant cold spot (e.g. a cluster of small economic values and losses).

A large concentration of economic value is centrally located along the coast, and is due to the presence of resorts and condominium complexes. This area is observed from both the interpolation and hotspot analysis. Interestingly, more clusters are observed from the hotspot analysis as opposed to the interpolation. This could be explained by the scaling of the interpolation. In this case, the red regions correspond to a maximum value of $20M. If this value was reduced by half, more areas of high concentration would be observed.

The hotspot analysis provides insight into statistically significant clusters of high and low values, as opposed to single points of high values; however, when comparing interpolation and hotspot analysis, it should not be neglected that the results of the latter are visually more difficult to observe. This is due to the discrete nature of the Getis-Ord Gi* statistic (e.g. each point corresponds to a p-value and z-score, as opposed to the continuous surfaces of interpolation). This results in polygons that are shaded according to confidence levels.

Figure 1: Comparison of interpolation and hotspot analysis for both initial building value and economic losses

In addition to the initial value and economic losses resulting from the 2,500-year earthquake/tsunami event, interpolated maps were deaggregated based on hazard (earthquake only, tsunami only, combined) as well as intensity of the event (return years 100, 250, 500, 1,000, and 2,500). The results are shown in Figure 2, where each row corresponds to the hazard, and each column to the intensity of the event. Furthermore, the total economic losses to all buildings in Seaside were determined based on hazard and intensity (Figure 3).

Figure 2 shows that the economic losses are spatially consistent across Seaside for the 100-year event, and begin to exhibit spatial variability as the intensity increases. Losses begin to accumulate for the 250-year event near the center of Seaside, and it can be seen that the earthquake is the primary driving force. Similar trends are observed for the 500-year event. The 1000- and 2500-year events begin to see significant tsunami losses that are not as spatially concentrated as the earthquake losses, but are more evenly distributed along the coast. Figure 3 shows that the tsunami losses begin to dominate for the 1000-year event.

Figure 2: Earthquake and tsunami hazard deaggregated by hazard and intensity

Figure 3: Total earthquake and tsunami damages across Seaside, OR

Critique

Both the interpolation and hotspot analyses have limitations. As previously mentioned the hotspot analysis can be aesthetically challenging. Additionally, difficulties may arise in communicating the confidence levels to community planners and resource managers who may not have a statistical background.

Similarly, spatial interpolation via kernel density has its own limitations. As there are subjective options when performing the interpolation and viewing the results (e.g. radius, color scheme, and maximum values), the resulting maps could easily be deceiving. Figure 4 shows the same data but use of a different radius to define the kernel. It can be seen that the map on the right appears more severe than the map on the left. The practicality of a spatial interpolation map ultimately depends on the GIS analyst.

Figure 4: Comparison of interpolation resulting from different kernel radii.

Courtney’s Ex 1: Spatial patterns of ion concentrations in groundwater

1: Question being asked

How do ion and isotope concentrations vary spatially? And what trends in concentrations account for variability between my samples?

For detailed background information, I point you towards my problem statement for this class. In short, I’m hoping to use geochemical and isotope values for wells 32 well I sampled last summer in order to better understand how groundwater flows across faults in the Walla Walla Subbasin in northeastern Oregon. In the analysis for this post, I only used a subset of 18 wells for which I had the largest suite of measured parameters.

2: Approach used

For this exercise, I’m investigating spatial patterns of my ion and isotope concentrations. I could do this by individually examining distributions of all 19 of the parameters that I sampled for… but that would be graphically overwhelming and not terribly useful. It’s too difficult to draw conclusions by comparing 19 different maps of distributions. Instead, I’m using a method called principal component analysis (PCA for short) to statistically evaluate my samples by the groups of parameters that account for large amounts of the variation that differentiates one sample from another.

The nuts and bolts of PCA involve sophisticated analysis that calculates eigenvectors, linear algebra, and n-dimensional properties of a data set, and I will let people more knowledgeable than me explain this if you’re curious about how exactly PCA works.

https://blog.bioturing.com/2018/06/14/principal-component-analysis-explained-simply/

R spits out two kinds of useful graphic output for PCA based on the concept of a “biplot”. The first is called a loading plot, where two principal components are graphed against each other, for example PC1 and PC2. The value for each parameter in PC1 and PC2 is used to give the point a cartesian coordinate which defines a vector in comparison to the origin. The second kind of output is called a scree plot, which applies the same concept to the sample points (called “individuals” in the code) instead of to the parameters (called “variables” in the code). I used an R package called ggbiplot to combined the scree and loading plots in one figure.

https://blog.bioturing.com/2018/06/18/how-to-read-pca-biplots-and-scree-plots/

https://learnche.org/pid/latent-variable-modelling/principal-component-analysis/interpreting-score-plots-and-loading-plots

3: Methods

I used R to calculate the principal component vectors for my data and to create charts for them, and then symbolized my data in ArcGIS Pro to see if there was a spatial pattern to the information in the PCA output. The code in R is fairly manageable. I’ve included it below with my comments.

# Source: https://www.datacamp.com/community/tutorials/pca-analysis-r

# Preliminary set up stuff – add libraries, set working directory

library(devtools)

library(ggbiplot)

setwd(“~/_Oregon State University/Thesis/Summer Fieldwork/PCA”)

# Before importing your data, make sure that every entry in its table has a numeric entry in it.

# This process will spit out garbage if you have any blank cells or text.

# The number of rows in the data set needs to exceed the number of columns. I used an argument like [,c(1:11)] to enforce this in my data

#Process 1 of 2: read data set 1 into R, view it, compute PCA, display summaries and biplots

# it’s important to set the row names so we can use them as labels in ggbiplot

PCAdata.ions2 <- read.csv(“ions_allchem20190401_usgsdwnld.csv”, row.names = 1)

View(PCAdata.ions2)

Ions2.pca <- prcomp(PCAdata.ions2[,c(1:11)], center = TRUE, scale. = TRUE)

summary(Ions2.pca)

ggbiplot(Ions2.pca, choices=c(1,2), labels=rownames(PCAdata.ions2))

ggbiplot(Ions2.pca, choices=c(1,3), labels=rownames(PCAdata.ions2))

#read data set 2 into R, view it, compute PCA, display summaries and biplots

PCAdata.nonions <-read.csv(“NonIons_allchem20190401_usgsdwnld.csv”, row.names = 1)

View(PCAdata.nonions)

NonIons.pca <- prcomp(PCAdata.nonions[,c(1:8)],center = TRUE, scale. = TRUE)

summary(NonIons.pca)

ggbiplot(NonIons.pca, labels=rownames(PCAdata.nonions))

ggbiplot(NonIons.pca, choices=c(1,3), labels=rownames(PCAdata.nonions))

# if you want plots other than PC1 vs PC2 or PC1 vs PC3 just change the argument “choices = (1,2)”

# Process 2 of 2 add tools to get individual coordinates

# Source http://www.sthda.com/english/wiki/get-pca-extract-the-results-for-individuals-variables-in-principal-component-analysis-r-software-and-data-mining

# only install the package if you haven’t before. You will need to have RTools 3.5 already installed

# install.packages(“devtools”)

# library(“devtools”)

# install_github(“kassambara/factoextra”)

# now run things, it will spit out tab-delimited data. The “ind” command outputs PC information for the individuals, and the “var” command outputs the PC information for the variables.

library(factoextra)

ind <- get_pca_ind(Ions2.pca)

ind$coord

ind2 <- get_pca_ind(NonIons.pca)

ind2$coord

# copy that info into text files, import them into csv format, join in Arc!

var1 <- get_pca_var(Ions2.pca)

var1$coord

var2< get_pca_var(NonIons.pca)

var2$coord

Next, I symbolized my data in ArcGIS Pro. Because I only had 18 samples, it wasn’t too onerous to use selection in the attribute table to export data to another layer to symbolize the groups I saw in the principle component analysis biplot.

To examine how the principle components affected individual points, I used the second process in the above code to generate a tab-delimited chart of the principal component loadings for each sample. I copied this into a .txt file, imported it into Excel to create a CSV, and then joined that CSV to my shapefile of well locations. For the variables’ coordinates, I likewise created .txt files and imported the into Excel, but I symbolized them in Excel.

4: Results

In this section, I’ll show the graphs and maps that I created for the ion section of my data, and give an explanation of what I learned from this analysis.

For my ion data, the first two out of the 11 principle components explain about 80% of the variation in my data.

The first one, PCI, explains 55% of the variation in my data set and is shown on the X axis of the graph below. It is strongly influenced by concentrations of calcium, magnesium, chloride, alkalinity, sulfate, bromide, and sodium, which all have eigenvalues on the X axis greater than |1|. Comparing the scree plot to the loading plot, wells 56287, 3074, 4179, and 4167 are the primary individual drivers of this PC.

PC2 explains another 25% of the variance in my data, and is shown on the Y axis. Fluoride, silicon, and manganese are the primary variable drivers of variation between the individuals in PC2.

CvS PC1 PC2 ions biplot with groups

Based on the coordinate information for the variables, the variation explained by PC1 is due to increased levels of calcium, magnesium, potassium, sodium, alkalinity, bromide, chloride, and sulfate at four wells. This combination of ions all being similar could be explained by recent recharge influenced by agricultural chemicals, such as fertilizers. In an undisturbed basalt setting though time, calcium and magnesium tend to decrease as sodium and potassium increase. The fact the these four parameters vary together in PCI indicates that a process other than ion replacement is driving ion concentration. The variation in PC2 is predominantly explained by fluoride, dissolved silica, and manganese. Elevated concentrations of these three variables in comparison to the other variables indicates that the groundwater has spent more time in the basalt.In PC2, calcium and magnesium are inversely related to sodium, indicating that the cation concentrations are driven by the ion replacement process.

While the figure above locates wells based on their grouping on a biplot comparing two principle components, the following figures show the wells color-coded by their loadings in PC1 and in PC2. I’m hypothesizing that wells with smaller PC1 values are influenced by agricultural recharge, and wells with more positive values on PC2 are more influenced by the processes of ion dissolution as water travels through the basalt.

color-coded map of PC1 values for the 18 wells

PC1 is heavily driven by four particular wells, which show up here as being in colors of blue and blue-grey.

The distribution of PC2 values shows a pattern that I find really interesting. More negative – bluer – values indicate samples which were less influenced by the ion replacement and dissolution processes of groundwater traveling through the basalt. The values at high elevation are negative, which makes sense because they’re closer to the upland recharge zone, but so do some samples in the lowlands to the north which are downgradient of certain faults. This could indicate that those PC2 negative downgradient wells are cut off from the longer flowpaths of groundwater by faults that act as barriers to groundwater flow. The pinker – more positive – points indicate samples that were more influenced by the along-flowpath ion reactions.

Next, I present the PCA results for the non-ion parameters. Unlike the “ions” set whose title was somewhat self-explanatory, this PCA brought together different types of chemical and physical properties of water. These include pH, temperature, and specific conductivity which were measured in the field, the analyzed values of hydrogen and oxygen isotopes as well as tritium, the elevation of the bottom of the well, and the cation ratio. The cation ratio is a standard calculated parameter for groundwater chemistry which is equal to ([Na] + [K]) / ([Ca] + [Mg]). As water spends more time in contact with the rock, its ion exchange with the rock leads to a smaller cation ratio. The first biplot graphs PC1 (~40% of variation) on the X axis against PC2 (22% of variation) on the Y axis, and the second biplot graphs PC1 on the X axis versus PC3 (~18% of variation) on the Y axis.

The PC1 grouping explains variance among the samples by inversely correlating tritium, deuterium (d2H) and oxygen-18 (d18O) with pH and the cation ratio. More positive PC1 values show that the individual has high d18O and d2H values, and low cation ratio and pH values. All d18O and d2H values are negative; values that are closer to 0 (“larger: for the purposes of this plot) are more enriched in the heavier isotope. PC2 is predominantly driven by an inverse correlation between well bottom elevation and temperature. PC3 inversely correlates tritium, specific conductance, and well bottom elevation with temperature, deuterium, and 18-oxygen.

Yellow/orange points on this map have more positive PC1 loadings. They have higher d18O and d2H values, and lower cation ratio and pH values. The lower cation ratio and pH values indicate that the sample is less chemically evolved based on the expected ion exchange reactions in the basalt. I found it interested that samples on that end of the spectrum were also more enriched in the heavier isotopes of hydrogen and oxygen.

Unfortunately I can’t seem to upload the maps for PC2 and PC3 for non-ions, I’m getting a message that all of my allowance for image uploads has been used up…

5: Critique

I’m pretty happy with what I put together using principal component analysis. In a perfect world I would have learned to do the spatial visualization elegantly in R instead of messing around with exporting data to Arc, and made my variable charts in R instead of exporting them to excel. However for the purposes of this project the less elegant method was quicker for me.

Exercise 1: Ventenata spatial clustering

Question Asked

I am interested in understanding the invasion potential of the recently introduced annual grass ventenata (Ventenata dubia) across eastern Oregon. Here I ask, what is the spatial pattern of the ventenata invasion across the Blue Mountains Ecoregion of eastern Oregon?

Tools and Approaches Used

To address this question, I (1) tested for spatial correlation at various distances using Moran’s I spatial autocorrelation coefficients plotted with a correlogram, and (2) performed hot-spot analysis (Getis-Ord Gi) to identify statistically significant clusters of areas with high and low ventenata cover.

Description of Analysis Steps

1a) Moran’s I: To compute Moran’s I spatial autocorrelation coefficient for all of my sample units, I used the “ape” package in R version 3.5.1. The first step to this analysis was to convert the ventenata data and associated coordinates into a distance matrix. Once the distance matrix was created, the Moran.I function computed the observed and expected spatial autocorrelation coefficients for the variable of interest (ventenata abundance). The function produces a test statistic that tests the null hypothesis of no correlation. See Gittleman and Kot (1990) for details on how the Moran.I function calculates Moran’s I statistics.

1b) Correlogram: I plotted a correlogram using Moran’s I coefficients with increasing distances (lags) to examine patterns of spatial autocorrelation in my data. I used the correlog function in the spdep package in R to plot a correlogram with lag intervals of 10,000m. The function has the option of randomly resampling the data at each increment to incorporate statistical significance. This randomization tests the null hypothesis of no autocorrelation. I ran the function with resamp = 100. Black points on the correlogram are indicative of Moran’s I values significantly larger or smaller than expected under the null hypothesis.

2) Hot Spot Analysis: I used the hot spot analysis (Getis-Ord Gi^*) tool in Arc GIS to identify statistically significant clusters of areas with high and low ventenata cover across my study area. The tool produces z-scores and p-values that test the null hypothesis of a random distribution of high and low values rather than clusters of high or low values. High z-scores indicate clusters of high values and low z-scores indicate clusters of low values. Low p-values indicate that these clusters are more pronounced than would be expected by chance.

Results

1a) Moran’s I: The Moran’s I spatial autocorrelation coefficient estimate for all of the points across the entire sample area was 0.3 ± 0.05 (p < 0.3). This value is not particularly informative, as it only indicates that the data is positive spatially autocorrelated, but does not provide information to describe the spatial pattern. I chose to follow the Moran’s I up with a correlogram to uncover the spatial pattern driving the autocorrelation.

1b) Correlogram: The Moran’s I spatial correlogram shows a general trend of decreasing autocorrelation from 0 to about 70,000m where sudden jumps in Moran’s I values occur to up to ~0.3. Following this jump, the correlation decreases to -0.5 to -0.2 between 120,000 and 152,000m, then increases to ~0.3 at 170,000m, decreases to almost -1.0 just after 200,000m, and finally increases to almost 1 at 220,000m. The general trend appears to be decreasing from 0.2 to -0.9 at 220,000m with some high peaks interspersed. These high and low peaks indicate distinct ventenata patches distributed throughout the study area, suggesting a clustered spatial pattern of the ventenata invasion. The extreme high and low values at distances over 200,000 are likely a result of the few sample units being compared at these distances, thus these are not so informative of the overall spatial pattern.

2) Hot Spot Analysis: Hot spot analysis in ArcGIS depicted clusters ranging from high ventenata cover (large red circles) to low ventenata cover (small blue circles) across my study area (Fig. 2) using the calculated z-scores and p-values for each sample unit. The resulting map shows distinct clusters of high, low, and moderate ventenata cover distributed across seven sampled burn perimeters (displayed in light orange). The highest cover clusters are all located within the Ochoco and Aldrich Mountains in the center of the study region. The fires on the perimeters of the region exhibited clusters of low to no ventenata cover.

Critique of Methods Used

When run on all of the data across the entire region, Moran’s I did not produce a useful statistic, indicating only if the data was spatial autocorrelation without indicating a spatial pattern. However, when visualized with a correlogram at varying distances, the correlation coefficients suddenly told a story of spatial clustering. The results from the hot spot analysis reinforce the findings from the correlogram by clearing depicting clusters on a map of the study area. The hot spot analysis further explores these results by mapping the clusters of high and low ventenata cover on top of each of my sample units, providing a useful visualization of exactly where the clusters of high and low cover fall across the region.

References

Gittleman, J. L. and Kot, M. (1990) Adaptation: statistics and a null model for estimating phylogenetic effects. Systematic Zoology, 39, 227–241.

Exercise 1: What is the spatial pattern of western hemlock dwarf mistletoe at the Wolf Rock reference stand?

For Exercise 1, I wanted to analyze the spatial pattern of western hemlock dwarf mistletoe infections in live western hemlocks on my 2.2 ha reference stand (Wolf Rock). This was without considering any attributes of the western hemlock trees themselves. Simply, what was the spatial pattern of infection?

To answer this I used the “Average Nearest Neighbor” tool in the Spatial Statistics toolbox in ArcMap. This tool calculates a z-score and a p-value from that z distribution. This is a commonly used method in dwarf mistletoe literature for assessing the clustering of infection centers. Also, the equations for this tool assume that points are free to locate wherever in space and that there are no barriers to spread.

ArcMap makes running these analyses very simple so I created a selection of infected trees (red dots), created a new feature, and then ran the tool. The p-value from my test was 0.097 and my Nearest Neighbor Index was 0.970, indicating that the spatial pattern of the infections are somewhat clustered with an alpha of 0.10.

Average Nearest Neighbor is a good test for analyzing whether or not a set of coordinates are clustered. The degree of clustering of may be harder to interpret as a lower p-value may not necessarily mean points are more clustered. Also I was unable to see where my clusters are, and if my intuitions match the analysis (see map). One other important consideration is the study area. Changes in analysis area can drastically change the result of your clustering analysis (i.e. larger study areas may make data look more clustered). Lastly, there was no option for edge correction. This may have skewed some of the clustering results along the edge of my study site and 2.2 ha is pretty small to be subsampled without losing a lot of my data.

Prologue

After confirming that my infections were clustered, I wanted to see if the pattern I saw in my map, was actually on the ground. I wanted to know, where are infected trees clustered with infected trees and where are uninfected trees clustered with uninfected trees? Again, this was without considering any attributes of the western hemlock trees themselves.

I used the “Optimized Hot Spot Analysis” tool in the Mapping Clusters toolbox to analyze the incidence of infection data (0 = absence, and 1 = presence). The Optimized Hot Spot Analysis tool can automatically aggregate incidence data that are normally not appropriate for hot spot analysis. It also calculates several other metrics for me that made analysis easy. I could take these automatically calculated metrics and alter them in a regular hot spot analysis if needed.

This map displays clustering that matched up closely with my intuitions from Map 1. On the left, the blue values show a cluster of uninfected trees that are closely clustered with other uninfected trees. The larger swath on the right show a cluster of trees that are closely clustered with other infected trees. In the middle a mix of uninfected trees and infected trees are mixed without displaying any significant clustering. Lastly, small clusters in the top left and bottom left of infected trees were identified. These clusters may be edge of larger clusters outside my stand, or lightly infected trees that are starting a new infection center. These results will be extremely valuable in informing my steps for Exercise 2 because I can assess the conditions of both patches and determine differences between the two. I can also determine if distance to the refugia impact the clustering of infection because it appears the infected cluster is closer to the fire refugia.

The hot spot analysis was extremely useful for analyzing and displaying the information I needed about the clustering and was very useful for building off of the Average Nearest Neighbor analysis.

My data set also included a severity rating for dwarf mistletoe infected western hemlocks in my study site. I ran a similar hot spot analysis to above to determine if there were any similarities with how severity played out in the stand compared to solely incidence data. My data ranged from 0 – 5, 0 indicating uninfected trees and 5 indicating most heavily infected. These are classified data, not continuous but still appropriate for the optimized hot spot analysis. Western hemlock dwarf mistletoe forms infection centers, starting from a residual infected western hemlock that survived some disturbance. From there the infection spreads outwards. Another facet of infection centers is that the most heavily infected trees are almost always aggregated in the center of the infection center and infection severity decreases as you move towards the outside of the infection center. This is intuitive when you think about infected trees in terms of the time they’ve been exposed to a dwarf mistletoe seed rain: the trees in the center of the infection center likely have been exposed to infectious seed the longest. These trees can be rated using a severity rating system that essentially determines the proportion of tree crown infected. This is calculated in a way that gives a rating that is easily interpretable, in this case, 0-5.

This third map tells me about how severity is aggregated in the stand. I can see that the wide swath in the middle of the stand, associated with the fire refugia, has the largest aggregation of severely infected trees. This is what I expected in the stand because the trees in the fire refugia survived the fire and provide an infectious seed source for the post-fire regeneration. Also, on the edges of this high severity cluster, are lower severity values indicating the expected pattern of infection centers are playing out. The west side of the stand shows a large clustering of low severity ratings. We can see that the high density of uninfected trees, falls into our cold spot of low or no severity. Interestingly, the hot spot of trees found previously in the southwest corner, is actually a cluster of low severity trees. This may be a new infection center forming or an exterior edge of another infection center outside the plot. Lastly, the two pockets of low severity on the east side of the stand are more distinct when considering their severity.

This second application of hot spot analysis tells another story about my data and how dwarf mistletoe is patterned spatially. The non-significant swath in the center of my stand using the incidence data turns out to be a significant clustering of highly infected trees among other new observations.

Exercise 1: The Variogram

Question:

How does the variance between the depths to the first lava flow in my filed area vary with increasing distance?

The Tool: Variogram

I used a variogram to analyze the variance within my dataset. Variograms are discrete functions calculated using to measure the average correlation between pairs of measurements at various distance (Cameron and Hunter 2002). These distances are referred to as the binned distances (Anselin, 2016). In this study, binned distances determine the distances by which the depth to first lava flow is autocorrelated (Aneslin, 2016).

variog(geodata,coords = geodata$coords,data = geodata$data,max.dist=”number”) (R documentation)

The R code above need the geodata, an array of the data you are testing, the cords, or the coordinates those data correspond too.

Brief Methodology:

I selected data based on the quality of the descriptions, in the well log and assigned each well log I have interacted with a value from 0 to 3. Data with a score 0 represents either a dry well or a destroyed well. Data that has a score of 3 is well described and denotes either the depth to first water or the standing water level. I used data with a score of 3 for this analysis. I denoted the depth to the first lava flow in each of the well logs.

Figure 1: My field area, the well log locations are the blue circles, a rough delineation of my field area is in while.

First I normalized my data, giving a mean of 0 and a standard deviation of 1. Then I determined a max distribution, which determines the max bin size the variogram uses. The max distribution determines the lag increment, or the distances between which the increment is calculated (Anselin, 2016).

The data are projected, meaning that their horizontal distance is measured in meters. However, they are recorded in decimal degrees and located at the center of the township and range section that they are located in. Rather than convert my data from decimal degrees to meters, I recognized that there are around 111 km in 1 degree (at the equator), that there are 0.6214 miles in a kilometer and that there are 6 miles in a township. This helped me determine the max distribution for the variog function in R. I decided upon a max distributions of 0.09 and 0.5 degrees.
I then used R’s variog function on the normalized data with the max distribution of 0.09 and 0.5 degrees.

Succinct Results:

Figure 2: Variogram of the MLV-LP-BVM triangle with a max distribution of 0.09 degrees. Note the low semivariance with the lower bin size (.02 to 0.08 degrees).

Figure 3: Variogram of the MLV-LP-BVM triangle with a max distribution of 0.5 degrees. Note the low saw-toothed pattern of semivariance. From 0.01 to 0.08, semivariance is low, it spikes up, and lowers again around 0.2, spikes again at 0.3 degrees and lowers again at 0.4.

Critique:

I tested max bin sizes of 0.5 and 0.09 degrees to see how the variogram changes with an increasing bin size. Changing the max bin size, called the max distribution in R, changes the variogram. The smaller bin size, 0.09 degrees, limits the max bin size to a narrow range of values. In effect, the variogram only tests the covariance of data points that are separated by a maximum .09 or degrees. Increasing the bin size increases the distance around which the covariance of the data points are tested. Thus, mad distributions of 0.5 result in a spikey plot. Normally, one would expect the variogram to plateau at larger bin sizes, representing large variance with the data with larger distances, but figure 2 does not.

At its simplest, the variogram in figures 1 implies that data points that are correlated in space are more likely to have similar depths to the first lava flow. You can see this in the low variance in you find in locations that are close to each other, the smaller distances, and the higher variance in distances that are farther from each other.

Figure 2, with the max distribution of 0.5 degrees one could made an argument for a distribution of the locations of lava flows based on the locations of volcanic centers. Medicine Lake Volcano lies in the North of the study area and Lassen Peak lies in the south. The two spikes in variance with location might be linked to the distances between those two centers. In other words, distances that correspond to low values of semivariance (>1) correspond to either regions that lie on the same lava flow, or another near surface lava flow sourced from another volcanic center in the region (figure 3). Rather than finding the same lava flow at depth, we are seeing different lava flows at similar depths.

Figure 4: My field area with the rough delineations of two of the near surface lava flows in the region. 1 degree of longitude corresponds to roughly 111 km.

Lava flows are not attracted or repulsed to or from each other, but they do follow the laws of physics. Often, Volcanoes build topography, when lava erupts from them, the lava will flow from the high elevations of the volcano, to basins, using paleo-valleys as conduits for flow. Thus, if you know the paleo topography you can understand where a lava might have flowed, and where it might emplace. Predicting paleo-topography can be difficult in old volcanic regimes, but on the geologic timescales we are looking at, I can predict that the topography of the MLV-LP-BVM triangle has not changed much over the past 5 ma.

Lavas flows from high to low topography and from Volcanos. The two main volcanos in my field area are Lassen Peak in the South and Medicine Lake Volcano, in the North. Moreover, lava flows from high to low elevation. Lavas emplace in basin, if they sit long enough, the top soils form, the basin subsides, and another the cycle repeats.

My data does have different spatial patterns at different scales. If you look at regions that are within 0.1 of a degree of distance then you would expect to see a similar depth to the lava flow. If you move to 0.5 of a degree, you see a sees-saw effect, where the depth to the lava flow moves from having lava close to the surface of deeper down. This variation stems from the proximity of the data point to the sources and the sinks of the lava flow.

I would use this technique again, it helped me think about my problem.

Sources:
Cameron, K, and P Hunter. 2002. Using Spatial Models and Kriging Techniques to Optimize Long-Term Ground-Water Monitoring Networks: A Case Study. Environmetrics 13:629-59.

R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

Exercise 1: The Spatial Patterns of Natural Resource Governance Perceptions

Question

What are the spatial patterns of natural resource governance perceptions in the Puget Sound?

Tools and Approaches

Moran’s I (with correlograms) and Semivariograms in R studio
Kriging and IDW in ArcGIS Pro
Hotspot Analysis in ArcGIS Pro

Analysis Steps

To compute Moran’s I, I used the “ape” library in R which has a function called Moran.I(). This function takes the variable in question (governance perceptions), and a distance matrix to compute the observed and expected values of Moran’s I, as well as the standard deviation and a p-value. For this analysis, I also subset my data to examine spatial autocorrelation by demographics including area (urban, suburban, rural), political ideology, life satisfaction, income, and cluster (created by running a cluster analysis on the seven variables which comprise the governance index). I created correlograms for the variables that were significant (urban, conservative, and liberal) using the “ncf” library and the correlog() function. These figures give a better picture of spatial autocorrelation at various distances. To create semivariograms, I used the “gstat” and “lattice” libraries which contain a function called variogram. This function takes the variable of interest along with latitude and longitude locations. The object created can then be plotted. For this analysis I used the same subsets as in the Moran’s I analysis.
To preform interpolation on my data, I loaded my point data into ArcGIS Pro. I then used the Spatial Analysis toolbox to preform Kriging and IDW to compare the outputs of the two techniques. I used my indexed variable of governance perceptions. The values of the variable vary from 1 to 7. I then also uploaded a shapefile bounding the sample area, as well as a shapefile of shoreline, to delineate my study area better.
To run a hotspot analysis I used my previously loaded point data inArcGIS Pro. I then used the Spatial Analysis toolbox to preform ‘hotspot analysis.’ I used my indexed variable of governance perceptions with values from 1 to 7. I used the shapefile of shoreline to delineate my study area better.

Results

The Moran’s I calculation was insignificant for rural, suburban, cluster groups, life satisfaction, and income, suggesting no spatial autocorrelation of governance perceptions by these subsets.

The Moran’s I calculation was significant for urban:

Observed value: -0.014

P-value: 0.0002

The Moran’s I calculation was also significant for ideology:

Conservative

Observed value: -0.006

P-value: 0.002

Liberal

Observed value: -0.002

P-value: 0.05

This suggests that in these subsets there is spatial autocorrelation between individual governance perceptions.

The semivariograms for the subsets that are significantly spatially autocorrelated are presented below.

None of these plots suggest high degrees of spatial autocorrelation. The urban plot does so more than the ideology plots, but the y axis scale is still very small.

The plot (top Urban, bottom left Liberal, bottom right Conservative) help to confirm the findings from above. The Moran’s I fluctuates around zero without much variation. The large spike in variation that the graphs do show are only for non significant points. Significant points are filled in, where non-significant points are open circles.

2. Interpolation

The kriging (bottom left) with individual points and IDW (bottom right), do not look incredibly different in terms of general trends. The kirging with shoreline (top) gives possibly the most interesting visual of spatial patterns. In general, perceptions are better (more green) in the center, where there is greater shoreline. There are also two sections that appear much more negative. To examine these locations further, I preformed a hotspot analysis.

3. Hotspot Analysis

This image confirms the two bright red spots from the interpolation to be “cold spots” or spots that the value of perception is significantly lower than the average perception (neutral) at a 99% confidence. The orange dots are a 95% confidence. The green corridor appears to hold in the southern part of the Sound and is confirmed at a “hotspot” or a spot that the value of perception is significantly higher than the areas surrounding it at a 99% confidence level.

The three main areas of red or orange correspond to the cities of Shelton (bottom), Port Angeles (west), and Everett with a little of south Whidbey Island (east).

Critique

I believe all methods are useful, but some are redundant. I think it would probably be sufficient to do only one of each type of method—spatial autocorrelation and interpolation—but it is interesting and more convincing to see the same type of analysis done in different ways. The p-values from the Moran’s I appear to agree with the shape of the curve’s in the semivariograms, where the smaller p-values have more defined shapes. The same goes for the interpolation methods, while they are interesting to see side-by-side, they essentially tell the same story. I think in this case, the hotspot analysis shows the most interesting interpretation of the data because it indicates areas of potential concern.

Monitoring Stage 0 Restoration Effects on the South Fork McKenzie River

Research Question

How is the spatial presence of aquatic organisms in the lower South Fork McKenzie River related to the spatial presence of hydromorphological changes induced by Stage 0 stream restoration?

Global hydrologic systems historically contained anastomosing (braided and connected) channels and active floodplains. Anthropogenic disturbances of the late 19^th and early 20^th centuries and prevailing natural disturbances are responsible for the channelization and incision of stream systems worldwide. Stage 0 riparian restoration restores degraded streams to historic conditions through large woody debris placement and filling in channels to increase aquatic habitat quality and availability. In this study, I will explore Stage 0 restoration effects on a 2 mile reach of the lower South Fork McKenzie River 45 miles east of Eugene, OR. I will test these effects by spatially relating early post-restoration species colonization to restoration-induced hydromorphological changes. These morphological changes include habitat features created by the reduction and redirection of water flow energy, such as riffles, pools, side channels, slack water, and sediment and organic material deposition. Aquatic organisms depend on these features for critical habitat. Therefore, sampling for aquatic species presence before and after restoration will indicate how successful Stage 0 restoration was in creating habitat features for these organisms.

Datasets

I will analyze two biological datasets to detect species presence and multiple remote sensing datasets to detect hydromorphological features. The first biological dataset includes pre- and post-restoration lentic (still freshwater) and lotic (rapid freshwater) aquatic macroinvertebrate samples taken at established transects and randomly throughout the study area. The second biological dataset includes eDNA samples taken at these same transects, intended to capture up to 48 species in a single sample (macroinvertebrates, fish, amphibians, crayfish). The remote sensing datasets include pre- and post-restoration aerial lidar, bathymetric lidar, Structure from Motion, RGB, multispectral, and thermal infrared products acquired with an Unmanned Aerial System for the 2 mile reach. The temporal resolution for all datasets are approximately 1 year (summer 2018 – summer 2019). The datasets are collected pre- and post-implementation (summer 2018) in the early summer during high flow conditions and early fall for low flow conditions. The spatial resolution of the UAS datasets is fine scale (sub-meter to cm).

Hypotheses

In the biological datasets, I expect to see macroinvertebrate presence clustered around features characterized by low velocity, shallow depth, and high organic material. These features allow macroinvertebrates foraging ease and shelter from predators. I expect species type to be dominated by post-disturbance colonizers throughout the system. I expect the eDNA data to return descriptions of upstream-downstream species presence. I expect non-macroinvertebrate species to be present in areas with more pools, side channels, and high wetted area, which provide habitat for resting, feeding, and nurseries. I expect the macroinvertebrate eDNA data to reflect the physical sampling data. In the remote sensing datasets, I expect habitat features to be present downstream of and adjacent to large woody debris, sediment deposits, and organic material, since these factors reduce and redirect flow energy for establishment of lentic and lotic features (e.g. pools, riffles, slack water).

Approaches

All analyses will consider pre- and post-restoration variables to determine rate and degree of change. I would like to learn about spatial autocorrelation analyses such as Moran’s I. It is likely that in these datasets I will find patterns of dependent observations based on localized conditions. For example, aquatic species presence may be grouped according to centralized nursery or hatch locations, so they are not truly independent samples. Fish species presence may be grouped based on macroinvertebrate presence as a food source, and certain species may be grouped by areas of similar substrate sizes. Regression analyses showing relationship strengths between different combinations of species and morphology variables (such as PCA) could indicate the likelihood of certain features affecting species colonization.

Expected Outcome

I will produce site maps showing point and polygon locations of certain hydrologic features alongside point locations of specific aquatic species presences and types. For the eDNA datasets, I will produce graphs indicating the presence and type of species detected. Ideally, I will be able to determine correlation coefficients for the relationships between specific hydrologic features created and the presence and location of aquatic species types, likely presenting them in a correlation matrix.

Significance

This research presents a novel opportunity to study Stage 0 restoration. Powers et al. (2018) present one of the only formal studies investigating Stage 0 restoration outcomes. This study will add to the limited knowledge surrounding this relatively unexamined strategy. To my knowledge, this proposal will be the first study testing Stage 0 UAS monitoring and one of the few existing studies linking aquatic organism sampling to UAS hydromorphology datasets. On a regional scale, U.S. Forest Service fisheries and hydrology divisions in the Pacific Northwest and across the United States will design restoration effectiveness monitoring objectives using results from this study. The McKenzie Watershed Council will determine their implementation techniques and success rates on future projects with results from this study. The study results will provide a viable Stage 0 restoration monitoring methodology for agencies and landowners on a global scale. The South Fork McKenzie River also sustains fish species listed as Endangered and Threatened under the Endangered Species Act, specifically the Chinook salmon (Onchoryncus tshawytscha) and bull trout (Salvelinus confluentus), respectively (USFWS 2019). These species use the South Fork McKenzie River for annual spawning and rearing habitat and feed on resident macroinvertebrates (Meyer et al. 2016). The Chinook salmon and bull trout are among the western U.S.’s most controversial game fish due to the environmental policies surrounding their protection. Conflicts among agencies, companies, and the public frequently arise concerning these species’ conservation. Researching restoration effects on Chinook salmon and bull trout habitat and food sources will provide scientific basis for management decisions regarding these fish.

Level of Preparation

I took an Arc-Info (ARC Macro Language) class during my undergraduate program. I am currently taking a Python class for geospatial programming, which is my only experience with this language. I have experience in image processing through an undergraduate degree in land use and GIS, multiple years of professional experience in remote sensing and GIS, and my current MS program in Forest Geomatics. I have academic and professional experience with R, C++, ArcGIS, and multiple types of remote sensing software for 2D and 3D analysis.

References

Meyer, K., Hammons, B., Hogervorst, J., Powers, P., Weybright, J., Bair, B., Robertson, G., Mazullo, C. (2016). Lower South Fork McKenzie River Floodplain Enhancement Project 80% Design Report. Prepared by the Willammette National Forest and McKenzie Watershed Council, March 4, 2016.

Meyer K. (2018). Deer Creek: Stage 0 alluvial valley restoration in the western Cascades of Oregon. In StreamNotes: The Technical Newsletter of the National Stream and Aquatic Ecology Center, David Levinson (editor). US FOrest Service, Fort Collins, CO, May 2018. https://www.fs.fed.us/biology/nsaec/assets/streamnotes2018‐05.pdf

Newson, M.D., Newson, C.L. (2000). Geomorphology, ecology and river channel habitat: mesoscale approaches to basin-scale challenges. Progress in Physical Geography 24(2), 195 – 217.

Powers, P. D., Helstab, M., & Niezgoda, S. L. (2019). A process-based approach to restoring depositional river valleys to Stage 0, an anastomosing channel network. River Research and Applications, 35(1), 3–13. https://doi.org/10.1002/rra.3378

U.S. Fish and Wildlife Service. (2019). Environmental Conservation Online System. https://www.fws.gov/endangered/

The Biogeography of Coastal Bottlenose Dolphins off of California, USA between 1981-2016

Background/Description:

Common bottlenose dolphins (Tursiops truncatus), hereafter referred to as bottlenose dolphins, are long-lived, marine mammals that inhabit the coastal and offshore waters of the California Current Ecosystem. Because of their geographical diversity, bottlenose dolphins are divided into many different species and subspecies (Hoelzel, Potter, and Best 1998). Bottlenose dolphins exist in two distinct ecotypes off the west coast of the United States: a coastal (inshore) ecotype and an offshore (island) ecotype. The coastal ecotype inhabits nearshore waters, generally less than 1 km from shore, between Ensenada, Baja California, Mexico and San Francisco, California, USA (Bearzi 2005; Defran and Weller 1999). Less is known about the range of the offshore ecotype , which is broadly defined as more than 2 km offshore off the entire west coast of the USA (Carretta et al. 2016). Current population abundance estimates are 453 coastal individuals and 1,924 offshore individuals (Carretta et al. 2017). The offshore and coastal bottlenose dolphins off of California are genetically distinct (Wells and Scott 1990).

Both ecotypes breed in summer and calve the following summer, which may be thermoregulatory adaptation (Hanson and Defran 1993). These dolphins are crepuscular feeders that predominantly hunt prey in the early morning and late afternoon (Hanson and Defran 1993), which correlates to the movement patterns of their fish prey. Out of 25 prey fish species, surf perches and croakers make up nearly 25% of coastal T. truncatus diet (Hanson and Defran 1993). These fish, unlike T. truncatus, are not federally protected, and neither are their habitats. Therefore, major threats to dolphins and their prey species include habitat degradation, overfishing, and harmful algal blooms (McCabe et al. 2010).

This project aims to better understand that distribution of coastal bottlenose dolphins in the waters off of California, specifically in relation to distance from shore, and how that distance has changed over time.

Data:

This part of the overarching project focuses on understanding the biogeography of coastal bottlenose dolphins. Later stages in the project will require the addition of offshore bottlenose sightings to compare population habitats.

Beginning in 1981, georeferenced sighting data of coastal bottlenose dolphin off the California, USA coast were collected by R.H. Defran and team. The data were provided in the datum, NAD 1983. Small boats less than 10 meters in length were used to collect the majority of the field data, including GPS points, photographs, and biopsy samples. These surveys followed similar tracklines with a specific start and end location, which will be used to calculate the sighting per unit effort. Over the next four decades, varying amounts of data were collected in six different regions (Fig. 1). Coastal T. truncatus sightings from 1981-2015 parallel much of the California land mass, concentrating in specific areas (Fig. 2). Many of the sightings are clustered nearby larger cities due to logistics of port locations. The greater number of coastal dolphin sightings is due to the bias in effort toward proximity to shore and longer study period. All samples were collected under a NOAA-NMFS permit.Additional data required will likely be sourced from publicly-available, long-term data collections, such as ERDDAP or MODIS.

Distance from shore will be calculated in a program such as ArcGIS or R package. These data will be used later in the project to compare to additional static, dynamic, and long-term environmental drivers. These factors will be tested as possible layers to add in mapping and finally estimating population distribution patterns of the dolphins.

Figure 1. Breakdown of coastal bottlenose dolphin sightings by decade. Image source: Alexa Kownacki.

Hypotheses:

I predict that the coastal bottlenose dolphins will be associated with different bathymetry patterns and appear clustered based on a depth profile via mechanisms such as prey distribution and abundance, nutrient plumes, and predator avoidance.

Approaches:

My objective is to first find a bathymetric layer that covers the coast of the entirety of California, USA to import into ArcMap 10.6. Then I need to interpolate the data to create a smooth surface. Then, I can add my dolphin sighting points and create a way to associate each point with a depth. These depth and point data would be exported to R for further analysis. Once I have extracted these data, I can run a KS-test to compare the shape of distribution based on two different factors, such as points from El Niño years versus La Niña years to see if there is a difference in average sighting depth or more common sighting depths based on the climatic patterns. I am also interested in using the spatial statistic analysis tool, Moran’s I, to see if the sightings are clustered. If so, I would run a cluster analysis to see if the sightings are clustered by depth. If not, then maybe there are other drivers that I can test, such as distance from shore, upwelling index values, or sea surface temperature. Additionally, these patterns would be analyzed over different time scales, such as monthly, seasonally, or decadally.

Expected Outcome:

Ideally, I would produce multiple maps from ArcGIS representing different spatial scales at defined increments, such as by month (all Januaries, all Februaries, etc.), by year or binned time increment (i.e. 1981-1989, 1990-1999), and also potentially grouping based on El Niño or La Niña year. Different symbologies would represent coastal dolphin sightings distances from shore. The maps would visually display seafloor depths in a color spectrum by 10 meter difference. Because the coastlines of California vary in terms of depth profiles, I would expect there to be clusters of sightings at different distances from shore, but similar depth profiles if my hypothesis is true. Also, data with the quantified values of seafloor depth would be associated with each data point (dolphin sighting) for further analysis in R.

Significance:

This project draws upon decades of rich spatiotemporal and biological information of two neighboring long-lived cetacean populations that inhabit contrasting coastal and offshore waters of the California Bight. The coastal ecotype has a strong, positive relationship with distance to shore, in that it is usually sighted within five kilometers, and therefore is in frequent contact with human-related activities. However, patterns of distances to shore over decades, related to habitat type and possibly linked to prey species distribution, or long-term environmental drivers, is largely unknown. By better understanding the distribution and biogeography of these marine mammals, managers can better mitigate the potential effects of humans on the dolphins and see where and when animals may be at higher risk of disturbance.

Preparation:

I have a moderate amount of experience in ArcMap from past coursework (GEOG 560 and 561), as well as practical applications and map-making. I have very little experience in Modelbuilder and Python-based GIS programming. I am becoming more familiar with the R program after two statistics courses and analyzing some of my own preliminary data. I am experienced in image processing in ACDSee, PhotoShop, ImageJ, and other analyses mainly from marine vertebrate data through NOAA Fisheries.

Literature Cited:

Bearzi, Maddalena. 2005. “Aspects of the Ecology and Behaviour of Bottlenose Dolphins (Tursiops Truncatus) in Santa Monica Bay, California.” Journal of Cetacean Research Managemente 7 (1): 75–83. https://doi.org/10.1118/1.4820976.

Carretta, James V., Kerri Danil, Susan J. Chivers, David W. Weller, David S. Janiger, Michelle Berman-Kowalewski, Keith M. Hernandez, et al. 2016. “Recovery Rates of Bottlenose Dolphin (Tursiops Truncatus) Carcasses Estimated from Stranding and Survival Rate Data.” Marine Mammal Science 32 (1): 349–62. https://doi.org/10.1111/mms.12264.

Carretta, James V, Karin A Forney, Erin M Oleson, David W Weller, Aimee R Lang, Jason Baker, Marcia M Muto, et al. 2017. “U.S. Pacific Marine Mammal Stock Assessments: 2016.” NOAA Technical Memorandum NMFS, no. June. https://doi.org/10.7289/V5/TM-SWFSC-5.

Defran, R. H., and David W Weller. 1999. “Occurrence , Distribution , Site Fidelity , and School Size of Bottlenose Dolphins ( Tursiops T R U N C a T U S ) Off San Diego , California.” Marine Mammal Science 15 (April): 366–80.

Hanson, Mark T, and R.H. Defran. 1993. “The Behavior and Feeding Ecology of the Pacific Coast Bottlenose Dolphin, Tursiops Truncatus.” Aquatic Mammals 19 (3): 127–42.

Hoelzel, A. R., C. W. Potter, and P. B. Best. 1998. “Genetic Differentiation between Parapatric ‘nearshore’ and ‘Offshore’ Populations of the Bottlenose Dolphin.” Proceedings of the Royal Society B: Biological Sciences 265 (1402): 1177–83. https://doi.org/10.1098/rspb.1998.0416.

McCabe, Elizabeth J.Berens, Damon P. Gannon, Nélio B. Barros, and Randall S. Wells. 2010. “Prey Selection by Resident Common Bottlenose Dolphins (Tursiops Truncatus) in Sarasota Bay, Florida.” Marine Biology 157 (5): 931–42. https://doi.org/10.1007/s00227-009-1371-2.

Wells, Randall S., and Michael D. Scott. 1990. “Estimating Bottlenose Dolphin Population Parameters From Individual Identification and Capture-Release Techniques.” Report International Whaling Commission, no. 12.

——-

Contact information: this post was written by Alexa Kownacki, Wildlife Science Ph.D. Student at Oregon State University. Twitter: @lexaKownacki

My Spatial Problem: Streamflow variability and associated processes in rain-dominated, coastal basins

My research explores the variability in streamflow patterns in rain-dominated systems of the Pacific Northwest. Rain-dominated climate regimes occur primarily in the coastal portion of the PNW. Because precipitation only occurs as rain and does not occur in significant quantities during the summer season, underground storage is a crucial component of both the water cycle and streamflow stability in these systems. The objectives of my research are: first) to describe variations in stream hydrograph stability across multiple catchments and multiple catchment scales, and second) to use estimations of catchment storage processes to help explain potential variations in streamflow patterns.

I will be analyzing multiple datasets in order to meet my objectives. Those include: geophysical data, land-use/management data, and streamflow data. All landscape data will be analyzed at the finest spatial resolution available for the dataset. Hourly streamflow data will be analyzed for the available period of record, which varies by site. Second-fourth order streams in the Siletz and Smith river basins have hourly discharge data for the recent 5-8 years. USGS hydrological stations have similar data for 10-30 years.

I expect to see that streams in different hydrogeologic setting demonstrate different streamflow patterns over time. Streams located in more permeable, thicker lithosphere, may demonstrate more stable stream flows. In the winter, that may appear as muted storm peaks, while in the summer, that may appear as more sustained baseflows through the non-rainy season. Land cover may play an important role in streamflow regimes as well. The amount of water taken up and stored by vegetation may depend on the density, age, structure, and species composition of the forest. Watersheds with a large areas managed under industrial timber production may confound streamflow behavior.

I envision approaching my objectives by first developing some descriptive statistics for the hydrographs at each site. Such descriptors may include: recession analysis, dynamic storage, 7q10, arc peak, and various other streamflow analysis metrics with which I am not yet familiar. All sites are of varying contributing areas, so that will have to be taken into consideration in the analysis. Once descriptive statistics are developed across each site, I would like to integrate an analysis of the landscape attributes as potential explanatory variables for any variations observed in hydrograph statistics across space to see how much explanatory power they have, and how much variation there is.

The product of this project is expected to include both maps and statistical relationships. For maps, I would like to produce: 1) a depiction of the hydrogeologic settings across the study area, and 2) a depiction of expected streamflow patterns given the hydrogeologic settings. I would like to understand the statistical relationship between various streamflow metrics and the hydrogeologic setting of the given stream.

The results of this analysis may be important to resource managers and the scientific community as it will contribute information about hydrological processes, with a focus on the critical zone, in the PNW coastal landscapes. The persistence of streams and rivers in this region is crucial to several species of native salmonids, as well as to the economic well-being of the local communities. With potential variations in the climate, understanding underlying hydrological processes and the drivers of such processes may contribute to more proactive management approaches. Furthermore, an analysis that is relevant to the fine-scale processes that exist in the study area may contribute more accurate classifications of hydrologic regimes and analyses of streamflow vulnerability to long-term changes in the hydrological cycle.

As far as preparation for this project: I am well-versed in ArcGIS software. I have used Model Builder and Python some, though could use more practice. I am most comfortable with spatial and statistical analyses through R software. I have only used imagery analyses in a class, and I’d like to use it for my analysis so I am looking forward to learning more through this class.

A stain on the record? Have forest management practices set up PNW landscapes for a black-stain-filled future?

Describe the research question that you are exploring.

I am looking at how forest management practices influence the spread of black stain root disease (BSRD), a fungal root disease that affects Douglas-fir in the Pacific Northwest. While older trees become infected, BSRD primarily causes mortality in younger trees (< 30-35 years old). Management practices (e.g., thinning, harvest) attract insects that carry the disease and are associated with increased BSRD incidence. As forest management practices in the Pacific Northwest change to favor shorter-rotations of Douglas-fir monocultures, the distribution of Douglas-fir age classes is shifting towards younger stands and the frequency of harvest disturbance is increasing across the landscape. Though limited, our present understanding of this disease system indicates that these management trends, as well as the resulting disturbance regime and forest landscape age structure, may be creating favorable conditions for BSRD spread.

In this course, I would like to use spatial analyses to answer the question of whether forest management and the conditions that it creates act as a driver of the spread of black stain root disease. Specifically:

How do spatial patterns of forest management practices and the forest stand and landscape conditions that they create relate to spatial patterns of BSRD infection probabilities at the stand and landscape scale?
How do spatial patterns of forest management practices relate to landscape connectivity with respect to BSRD by affecting the area of susceptible forest and creating dispersal corridors and/or barriers throughout the landscape?

Example landscape with stands of three different forest management regimes (shades of green) and trees infected by black stain root disease (red). Forgive the 90s-esque graphics… NetLogo, the program I am using to develop and run my model, is powerful but old-school.

Describe the dataset you will be analyzing, including the spatial and temporal resolution and extent.

I will be analyzing the raster outputs of a spatial model that I built in the agent-based modeling program NetLogo (Wilensky 1999). The rasters contain the states of forested landscapes (managed as individual stands) at a given time during the model run. Variables include tree age, presence/absence of trees, management regime, probability of infection, infection status (infected/not infected), and cause of infection (root transmission, vector transmission).

The forested landscapes I am looking at are about 3,000 to 4,000 ha, with each pixel representing a ~1.5 m x 1.5 m area that can occupied by one tree. I run each model for a 300-year time series with 1-year intervals, though raster outputs may be produced at 10-year intervals.

Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

I hypothesize that landscapes with higher proportions of intensively managed, short-rotation stands will have higher probabilities of BSRD infection at the stand and landscape scales. In landscapes with high proportion of short-rotation stands, there will be large areas of suitable habitat for the pathogen and its vectors, frequent harvest that attracts disease vectors, and greater levels of connectivity for the spread of disease. In landscapes with a large proportion of older forests managed for conservation, I hypothesize that these forests will act as barriers to the spread of BSRD. High connectivity could be evidenced by greater landscape-scale dispersion of infections, whereas low connectivity would lead to a high degree of clustering of infections in the landscape.

I also hypothesize that intensively managed, short-rotation stands will have the highest probabilities of infection, followed by intensively managed, medium-rotation stands, and finally old-growth stands. However, I hypothesize that each stand’s probability of infection will depend not only on its own management but also on the management of neighboring stands and the broader landscape. At some threshold proportion of intensive management in the landscape, I hypothesize that there will be a shift in the scale of the drivers of infection, such that landscape-scale management patterns overtake stand-scale management as a predictor of infection probability.

Approaches: Describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.

I would like to learn about landscape connectivity analyses and spatial statistics such as clustering/dispersion as well as spatiotemporal analyses to analyze the relationships between discrete disturbance events and disease spread. I would like to learn how to separate the effects of connectivity from the effect of the area of suitable pathogen habitat. I am most interested in using R or Python to analyze my data, and I would like to move away from ESRI programs because of my interest in open-source and free tools for science and the prohibitive cost of ESRI software licenses for independent researchers and organizations with limited financial means.

Expected outcome – What do you want to produce – Maps? Statistical relationships?

My primary interest is to evaluate statistical relationships between spatial patterns of management and disease measures, but I would also like to produce maps to demonstrate model inputs and outputs (i.e., figures for my thesis).

Significance – How is your spatial problem important to science? To resource managers?

From a scientific perspective, this research aims to contribute to the body of research examining relationships between spatial patterns and ecological processes and complex behaviors in ecological systems. This research will examine how the diversity of the landscape age structure and disturbance regimes affect the susceptibility of the landscape to disease, contributing to literature relating diversity and stability in ecological systems. In addition, “neighborhood” and “spillover” effects will be tested by analyzing stand-scale infection probability with respect to the infection probability of neighboring stands and more broadly in the landscape. Analysis of threshold responses to changes in stand- and landscape-scale management patterns and shifts in the scale of disease drivers will contribute to understanding of cross-scale system interactions and emergent properties in the field of complex systems science.

From an applied perspective, the goal of this research is to inform management practices and understand the potential threat of black stain root disease in the Pacific Northwest. This will be achieved by improving understanding of the drivers of BSRD spread at multuiple scales and highlighting priority areas for future research. This project is a first step towards identifying evidence-based, landscape-scale management strategies that could be taken to mitigate BSRD disease issues. In addition, the structure of this model provides a platform for looking at multi-scale interactions between forest management and spatial spread processes. Its use is not restricted to a specific region and could be adapted for other current and emerging disease issues.

Your level of preparation – How much experience do you have with: (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) image processing, (e) other relevant software

Over the past 5 years, I have worked on and off with all the programs/platforms listed. For some, I have been formally trained, but for others, I have been largely self-taught. However, lack of continuous use has eroded my skills to some degree.

a. I have frequently used ArcInfo for making maps, visualizing data, and processing and analyzing spatial data. However, I do not have a lot of experience with spatial statistics in ArcInfo.

b. Modelbuilder/Python: Last spring, I took GEOG 562 and learned to program in Python, developing a script that used arcpy to prepare and manipulate spatial data for my final project. I felt comfortable programming in Python at that time, but I have not used Python much since the course.

c. I have frequently used R to clean and prepare data, perform simple statistical analyses (ANOVA, linear regression), and create plots. I have taken several workshops on using R for spatial analysis, but I have used rarely used the R packages I learned about outside of those workshops.

d. I have used ENVI to correct, patch, and combine satellite images, and I have performed supervised classifications to create land cover maps. I have worked primarily with LANDSAT images. I have also used CLASlite (an image processing software designed for classifying tropical forest cover).

e. Covered in part d.

References

Wilensky, U. (1999). NetLogo. http://ccl.northwestern.edu/netlogo/. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.

Adam Bouché

The Geography of Exclusion

Description of the research question:

My research focuses on vulnerable populations, specifically refugees and internally displaced peoples. This is a small part of a larger project funded by NASA, “Mapping the Missing Millions” and is largely defined as the “geography of exclusion.” I am hoping to understand why settlements have been excluded from global population datasets; we know that this happens often, but not specifically the mechanisms of why these settlements are missing from these datasets. Hence, my question recognizes that the classification methods used for population datasets are imperfect and I’m seeking to understand why they are imperfect. This means I will need to understand the spatial distribution of the settlements identified in both sets and analyze the intersections and exclusions between them and understand why these exist. This might also mean figuring out how close an OpenStreetMap settlement is to an urban center or a road and figuring out if these metrics affect the classification.

My research question is as follows:

How do the settlements identified by OpenStreetMap (OSM) compare to settlements identified in global population datasets via classification and what about these classification metrics fails to detect settlements known to OSM?

Description of the dataset:

The crux of my data is a comparison of UNHCR and OpenStreetMap (OSM) to a global population dataset, Global Human Settlement Layer (GHSL). OpenStreetMap is a global open source dataset and contains both point and polygon information. Through the UNHCR point data that identifies settlement locations, I have identified boundaries that are attributed as delineating refugee settlements. A potential disclaimer with OSM data is that it’s an open source dataset contributed to by volunteers. This means that attribution can be unclear or inconsistent, despite validation. I can also use other OSM data like roads and urban areas to expand my spatial analyses for a proximity assessment.

I will also make use of the rich Landsat and Sentinel data available for my spectral analysis. This will either be at 30 meter resolution (Landsat) or 10 meter resolution (Sentinel). The temporal extent depends on the satellite: Landsat 7 is from 2000 and forward; Landsat 8 is 2014 to present, and Sentinel-2 was launched in 2015.

For this class, I will focus my analysis on Uganda, given its high prevalence of refugee settlements and extensive OSM dataset with a strong Humanitarian OpenStreetMap Team presence.

Figure 1. Layoun Refugee Camp boundary (blue) in an urban false color composite of Landsat 8 imagery.

Figure 2. Global Human Settlement Layer overlay with Layoun Refugee Camp boundary. White indicates measured human settlement.

The images above are an example of a refugee settlement in Algeria. The area in blue in the NW corner is the settlement; the area in the SW is a nearby town. However, this settlement is not identified in the Global Human Settlement Footprint, although this specific settlement has existed since at least 2001.

Hypotheses:

I expect that settlements not detected by GHSL will have a different and less distinct spectral signature than settlements detected by GHSL. By “distinct,” I am referring to how different the spectral signature in the settlement is to the spectral signature immediately around the settlement. By “different” spectral signature, I am referring to the concept that the classification in GHSL is looking for a specific type of spectral signature, and that this does not match the spectral signature found in the settlements indicated by OSM. I also expect that settlements not detected by GHSL will be further from known roads and high density urban areas than settlements detected by GHSL.

Approaches & Analyses:

With my OSM data, I can use these vector boundaries to analyze the spatial and spectral patterns of these settlements. I will analyze the size of these settlements, the spectral signature in these settlements, proximity to resources (roads, water, cities).

With the global population dataset, I can identify pixel clusters that indicate settlements, and perform similar analysis to identify size, spectral signature, and proximity to resources.

While these analysis can help me identify the differences between these settlements, I also still need to analyze the classification methods of GHSL to understand why these differences might be significant and have resulted in different settlement detections.

Expected Outcome:

I will need to present the statistical relationships between the refugee settlements that are and are not detected in my target population dataset. Because I’m also seeking to understand why these settlements are excluded in the classification, I will need to connect the spatial relationships that I find with the classification methods that GHSL uses. This will be a more verbal description, but I plan to make maps to illustrate these spatial relationships and characteristics. These relationships and characteristics include settlement size, border complexity, proximity to roads, and spectral signature.

Significance:

This project addresses the exclusion of settlements and populations within various global datasets. This has a greater relevance given that so much derived data relies on this, whether for distributing aid and resources, analyzing displacement, or understanding human migration. By understanding what factors contribute to the inclusion or exclusion includes settlements in these datasets, more users can understand the limitations of what is possible to detect and where the gaps in population detection is more likely to occur.

Level of preparation:

I have substantial experience with ArcInfo products. I’ve been using ArcGIS Pro for over a year now, and prior to that I spent 2 years working with ArcMap daily in a professional capacity, took three classes that exclusively taught in the ArcDesktop interface, and employed ArcInfo for projects in multiple other classes. My image processing skills are also extensive, ranging from two classes using ENVI Classic, a GIS internship that included georeferencing satellite imagery, and most recently a class and outside research using Google Earth Engine. My experience with R is limited to a summer research project in 2016. I have some basic programming in GIS skills (very limited ArcPy use but recent and frequent ModelBuilder use) and will be learning more throughout this term as a participant in Robert Kennedy’s GIS Programming class.

A UAS and LiDAR based approach to maximizing forest aesthetics in a timber harvest

Bryan Begay

Research Question:

Can LiDAR derived from an Unmanned Aerial System (UAS) create a point cloud driven visualization model for maximizing forest aesthetics in a highly visible timber harvest?

Context

A variable retention thinning is planned to be implemented in a harvest unit on the McDonald-Dunn Forest in a visible area near Corvallis. UAS systems offer an efficient way to collect data over large areas to create high quality data sets from LiDAR that can capture the structure of a forest stand. There is a need for a model/methodology that utilizes UAS LiDAR point clouds to generate a visualization model to create a timber harvest in an areas with high visibility that maximize forest aesthetics. Inputs for the model include DTMs, Google Earth Pro view shed tool, and point clouds. The point clouds can be manipulated to visualize an optimal silvicultural prescription that maximizes forest landscape aesthetics. Ancillary data of view shed and terrain from DTMs are inputs expected to help create a visualization model.

A description of the data set you will be analyzing, including the spatial and temporal resolution and extent:

The data set I will be using will include high resolution LiDAR point clouds of a stand, Digital Terrain Models (DTM) from LiDAR point clouds flown by the USFS previously, and additional ancillary data from Google Earth Pro. The Google Earth Pro data will use the view shed tool for assessing the visual impact of regions in the harvesting unit. The spatial resolution will be using high resolution LiDAR point clouds on an area that is a few square kilometers. The temporal resolution will span data acquisition before the harvest, and then an assessment of the computer based prescription after harvest. The temporal resolution of the point cloud collected from the UAV will be collected in a discrete time frame of one day. The DTM data set and google earth pro data sets will be variable, but I anticipate them to be newer high resolution Google Earth imagery and high resolution LiDAR data sets.

Hypotheses:

I hypothesizes that LiDAR point clouds can be used in a visualization model to create a silvicultural prescription in a timber harvest that maximizes forest aesthetics in a logged area . Google Earth Pro view shed tool, high quality LiDAR point clouds, and a large body of literature on forest aesthetics provide a data set that is very rich in inputs to create a visualization model for timber harvests that maximizes forest aesthetics.

Approaches:

I would like to do some sort of analysis looking at the spatial relationship between forest aesthetics and timber harvests. A part of this analysis would look at the relationship of the spatial pattern of residual structure left from the thinning and the landscape aesthetics.

Expected outcome:

I would like an expected outcome to be a visualization model of the harvest unit that utilizes view shed and point clouds that maximizes forest aesthetics in a high viewership area.

Significance:

This spatial problem is important to the profession of forestry as well as other land managers, since it helps maintain the social license for foresters to practice forestry in areas that are highly visible. Public acceptance of harvesting practices is increased when forest aesthetics is taken into account, so creating a methodology and model to assist in creating silvicultural prescriptions that increase forest aesthetics is critical for public acceptance of forestry.

Level of preparation:

A. I have experience in ArcGIS.

B. No experience in modelbuilder and Python programming in GIS.

C. Some experience in R.

D. Experience in Digital Image Processing.

E. I’ve used Google Earth Engine and very little experience with MATLAB.

How winter wetland habitat change over time affects songbird communities

A description of the research question that you are exploring.

For my research, I am exploring the relationship between the spatial pattern of the differences between present (2019) and past (1995) wintering songbird community composition metrics (abundance, richness, evenness, and weighted rarity) and the spatial pattern of landscape-level land cover variable changes (listed here: Landscape Variables) in the same timeframe via the mechanisms of change over time and landscape-level variable importance to habitat suitability. I will be looking at data for 20 wetland wintering habitat sites in the Corvallis area.

I am interested in comparing the community composition metrics (abundance, richness, evenness, weighted rarity) of songbirds from 1995 to those found in 2019. I will then look at how the above-mentioned landscape level variables at these sites (within a 100m buffer, 500m buffer, and 1km buffer from the wetland) have changed from 1995 to 2019 at those same sites to determine if and what variable changes influence songbird community composition.

Other spatial factors likely influence songbird community composition metrics but I am only concerned with those that were included in Adamus’s 2002 dissertation study.

A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.

I have species richness data in the form of a spreadsheet from 1995 including songbird abundance data for my 20 sites recorded via point count surveys from January 4th to March 20th. I have the same information collected by the same methods at those same sites from January 4th 2019 to March 20th 2019. I will use this data to calculate the species metrics (listed above).

I am still in the process of gathering my spatial datasets for this project. As of now, I have open source areal imagery from Google Earth Engine that my advisor used to analyze the sites in 1995 and that same areal imagery from 2019. I hope to locate LIDAR data, NVI data, and more sophisticated ground cover data for my sites in this class. One of the reasons I enrolled in this class was to get ideas and aid with obtaining this data.

Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

I predict that the greater the change in the landscape level variables at a site from 1995 to 2019 the greater the difference in community composition measurements between those years via the landscape level variables influence on habitat suitability for wintering songbirds. Additionally, I think the changes in the variables that my advisor determined to be most influential to wintering songbird community composition metrics at these sites in 1995 will have the greatest effect on the change in species measurements from 1995 to 2019. For example, he found that wetlands with a higher percentage of open canopy forest cover in the surrounding area had a positive correlation with high abundance (Adamus, 2002) at a site so I hypothesis that those sites that have lost the most open canopy forest will also have the greatest decrease in abundance.

Approaches: describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.

I would like to learn how to use LIDAR and/or NVI data to determine ground cover in regards to my advisors’ categories (attached). I would also like to explore methods for comparing the amount of change from 1995 to 2019 at my sites that is suitable to my data and my purposes.
Expected outcome: what do you want to produce — maps? statistical relationships? other?
I expect to have a spreadsheet with quantified variable change as well as species richness measurements in order to reevaluate variable importance as well as the statistical relationship between landscape level variable differences and species richness differences. As an intermediate step, I will also produce a map(s) portraying the change at my sites from 1995 to 2019.

Significance. How is your spatial problem important to science? to resource managers?

The quality of wintering habitat is correlated with overall survivability and reproductive success for songbirds the following year (Norris et al. 2004). It is important to know how these habitats have changed as well as the consequences of those changes in regards to songbird community metrics. Therefore, it is extremely important for both science and resource managers. If we want to assure that our environment remains healthy and balanced with stable songbird communities it is this work and work like it is necessary. It is also important to those who wish to manage songbird populations so they know where to allocate resources when it comes to habitat variables to preserve.

Your level of preparation: how much experience do you have with (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) image processing, (e) other relevant software

a. I am in between proficient and having a working knowledge of the basics of Arc-Info from taking various courses and teaching the basics of Arc-Pro.b. I have a working knowledge with modelbuilder and GIS programming in Python because I took courses on both subjects and now help teach a programming for ArcGIS class.
c. I have a working knowledge of R because I took an R course related to species distributions and I have used R in two statistics courses.
d. I am a novice in image processing but did take a digital terrain modeling course last term.
e. I am a novice in but would like to learn about software that helps me analyze NVI and LIDAR data

References

Adamus P,. 2002. Multiscale Relationships of Wintering Birds with Riparian and Wetland Habitat in the Willamette Valley, Oregon. Oregon State University.

Norris R., Marra P., Kyser K., Sherrt W., & Ratcliffe L. 2004. Tropical Winter Habitat Limits Reproductive Success on the Temperate Breeding Grounds in a Migratory Bird. Biological Sciences (271).

Assessing western juniper sapling re-establishment in a semiarid watershed

1. My spatial problem

A description of the research question that you are exploring.

Research question: How is the spatial pattern of juniper density (A) related to the spatial pattern of slope, aspect, and a combination of the two (B) via soil moisture and solar radiation?

The expansion of western juniper has become a concern in many rangeland areas, and is associated with a number of ecological and hydrological impacts [1,2]. The study site is located in a semiarid watershed in central OR, and was established to assess ecohydrological characteristics associated with juniper expansion and removal. The majority of western juniper was removed from this watershed in 2005 -2006 and juniper saplings have become re-established in this area. The objectives of this project are 1) to determine the relationship between the density of western juniper sapling re-establishment and slope and aspect in this watershed and 2) assess density changes in vegetation cover at this watershed since juniper removal.

The intent of this project is also to establish a methodology that will be expanded to include patterns of soil moisture, soil surface temperature, and soil type (but additional data needs to be collected). While juniper density and vegetation cover for the purposes of this project are the dependent variables, in the future I am exploring how certain ecohydrological characteristics vary with juniper density.

A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.

A combination of National Agriculture Imagery Program (NAIP), unmanned aerial vehicle (UAV) imagery, and ground-based data will be used. Forty-one belt transects (3m by 30m) representing areas of varying aspect and slope were conducted in summer of 2018 to assess juniper density across the entire watershed. UAV-based multispectral imagery (red, green, blue, red-edge, and near-infrared bands) was collected at a study plot in the watershed in October 2018, but needs to be expanded to represent topography of the watershed. Resolution of the UAV imagery is approximately 2.5cm/pixel. NAIP imagery (1 m resolution) will be used to assess density over time. A 10 m digital elevation model (DEM) will be used to model the topography of the watershed. Alternatively, a higher-resolution DEM created from UAV imagery may be used if it is available. Prior to analysis, support vector machine (SVM) supervised classification will be used to identify juniper in the UAV and NAIP imagery and to assess overall vegetation cover in NAIP imagery. It should be noted that while I have used SVM classification successfully with multispectral UAV imagery to assess juniper density (shown in the figure below), the accuracy of this process will need to be assessed with the NAIP data as there is a large difference in spatial resolution (2.5 cm compared to 1 m). If I am unable to identify juniper successfully in NAIP imagery, then only UAV and ground-based data will be used to assess juniper density.

Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

I expect that juniper density will be positively related to north aspects and negatively related to slope angle as these characteristics promote higher levels of soil moisture. However, these patterns may vary with both soil type and amount of non-juniper vegetation cover. Similar to past studies [2], we may anticipate that overall vegetation cover may change at this watershed in response to the removal of juniper. Changes in vegetation cover associated with juniper density may be related to juniper transpiration, soil moisture, and canopy interception.

Approaches: describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.

Initially, I will assess the spatial pattern of slope, aspect, and a combination of the two in the watershed. Further, the relationship between observed juniper density and the slope angle and aspect characteristics associated with highest soil moisture will be assessed. I would like to integrate the NAIP imagery, DEM, transect data, and UAV imagery for analysis of juniper density. During this term, this model will focus on slope and aspect but I would like to establish a model that I can expand to use to assess other characteristics as well (such as soil moisture, soil temperature, and vegetation cover). Using the NAIP imagery, I would like to see assess temporal changes in vegetation cover. In particular, I am interested in understanding how to best use data with different resolutions and how to expand this methodology for more complicated analysis in future research.

Expected outcome: what do you want to produce — maps? statistical relationships?

I would like to create maps characterizing juniper density within the watershed and determine if there is a statistical relationship between juniper density and slope and aspect (that can be expanded as more variables are addressed). Additionally, I would like to assess changes in total vegetation cover over time in this watershed.

How is your spatial problem important to science? to resource managers?

The expansion of western juniper is associated with a number of ecohydrological changes, to include reduced undercanopy soil moisture [3], reduced productivity[4] and increased erosion[5]. The removal of western juniper is labor intensive and expensive, and can be difficult to implement. An improved understanding of the spatial patterns associated with juniper re-establishment after removal can be used to target treatment when appropriate but also to improve our understanding of the ecohydrologic impacts of juniper expansion within these communities. This analysis can also provide information about which areas are at greatest risk based on topography. Additionally, understanding changing in vegetation cover helps to determine the timing and impacts of removal. While beyond the scope of this project, this information may be used to determine how the rate of juniper expansion relates to other ecohydrological characteristics over time.

2. Your level of preparation: how much experience do you have with (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) image processing, (e) other relevant software

I have used ArcMap/Pro a moderate amount in a classroom setting (GIS 1&2, GIS in water resources), for independent research, and in the workplace. Outside of a classroom setting, I have largely used ArcInfo for map creation and geospatial analysis (primarily unsupervised and supervised classification).
I have used Modelbuilder a limited amount, and it has largely been limited to class assignments in Geog 560/561. I do not have experience with GIS programming in Python.
I have some experience using R, primarily for basic statistical analysis. I have no experience using R for spatial statistics.
I have limited experience using ENVI, but I would need to refresh the basic procedures. I do not have experience with MATLAB.
I have some experience with Google Earth Engine, although I would need to review the basics to be effective.

References

Coultrap, D.E.; Fulgham, K.O.; Lancaster, D.L.; Gustafson, J.; Lile, D.F.; George, M.R. Relationships between western juniper (Juniperus occidentalis) and understory vegetation. Invasive Plant Sci. Manag. 2008, 1, 3–11.
Dittel, J.W.; Sanchez, D.; Ellsworth, L.M.; Morozumi, C.N.; Mata-Gonzalez, R. Vegetation response to juniper reduction and grazing exclusion in sagebrush-steppe habitat in eastern Oregon. Rangel. Ecol. Manag. 2018, 71, 213–219.
Lebron, I.; Madsen, M.D.; Chandler, D.G.; Robinson, D.A.; Wendroth, O.; Belnap, J. Ecohydrological controls on soil moisture and hydraulic conductivity within a pinyon-juniper woodland. Water Resour. Res. 2007, 43, W08422.
Miller, R.F.; Svejcar, T.J.; Rose, J.A. Impacts of western juniper on plant community composition and structure. J. Range Manag. 2000, 53, 574–585.
Reid, K.; Wilcox, B.; Breshears, D.; MacDonald, L. Runoff and erosion in a pinon-juniper woodland: Influence of vegetation patches. Soil Sci. Soc. Am. J. 1999, 63, 1869–1879.

Cross-sectional Change in the Andrews Forest

Background and Research Questions

This project explores stream bed mobility in the HJ Andrews experimental forest in relationship to peak discharge events and channel geometry. The HJ Andrews Experimental Forest is a Long Term Ecological Research (LTER) Site located east of Eugene on the western slope of the Cascade Range. The site has been managed since the 1950s for ecological and forestry research, and the largest stream in the forest has been gauged since 1950. From 1978 to 2011, researchers conducted repeated cross sectional surveys on five reaches in three different creeks in the forest. An analysis of these cross-sectional profiles will help researchers and managers gain a better understanding of how, when, and where stream beds responds to extreme hydrologic events.

Water flowing through streams exerts stresses on the bed material. Whether or not these stresses have the capacity to mobilize sediment and change the shape of a stream bed depends on the amount of water moving through the stream along with other factors. As with many geomorphological processes, bed sediment transport is dominated by movement during extreme events. Although cross sectional changes overall appear to be strongly related to the magnitude of greatest flow between measurements, I am interested in investigating confounding factors including the extent of temporal and spatial autocorrelation in the data set.

I would like to explore (a) if and (b) in what manner aggradation or erosion in one cross section might be related to changes in adjacent cross sections. I would also like to know if aggradation or erosion in one cross section in one year are related to changes in the same cross section in an adjacent year.

Data

I am analyzing cross sectional measurements of five reaches within the Andrews Forest, which range from 12 m to 55 m in horizontal extent and 1.2 to 7.3 meters in vertical extent. The cross sections are variously spaced along an along-stream dimension, and they are were surveyed every one to five years over a period of 30 years between 1978 and 2011. The vertical precision of the data set is roughly 1 cm (though the data are unlikely to be accurate to 1 cm) and the horizontal precision varies from 1 cm to several decimeters.

The cross sectional change between two adjacent pairs of years at one cross section is shown below. Areas of erosion are shown in red and areas of aggradation are shown in green.

Hypothesis and Approach

I predict that there will be a relationship between changes at one cross section during one year and the same cross section at another year. Portions of the stream bed that have been recently scoured or contain newly emplaced sediment may be less armored than undisturbed portions of the stream bed. These less armored portions of the stream bed may be more susceptible to future disturbance. Alternately, changes at a cross section may represent longer-term processes related to channel geometry: e.g. a series of cross sections could show continued incision of a cut bank over multiple years.

I do not expect to see a detectable relationship between changes at adjacent cross sections because I expect that the most important geographic controls on channel change are either smaller (e.g. pools) or much larger (e.g. along-stream variation in discharge) than the distance between cross sections.

I want to use this class as an opportunity to explore statistical relationships, but I don’t know yet what kinds of analyses are best suited to this problem. I’d like to learn more in general about how to handle spatial autocorrelation, and when it is and isn’t a statistical issue.

Justification
This project could be scientifically useful for improving our understanding of sediment transport in Pacific Northwest mountain streams. Resource managers may also have an interest in sediment transport because it relates to stream channel mobility (“How much can we depend on this creek staying in the same place?”) and ecology (“How vulnerable is stream life to disruption via bed transport?”). From a resource management perspective, it is becoming increasingly useful to study peak flow events because downscaled climate models for our region predict increased frequencies of large storms.

Preparation

I feel good about my experience with spatial technology, and I’m most interested in learning about how to use that technology to answer statistical questions. I am highly proficient with Arc software. I have TAed one undergraduate class and independently taught another short undergraduate class in ArcGIS. I have a working knowledge of ArcPy, but I still need to use references regularly to write code. I conducted some undergraduate remote sensing research using ArcPy and used ArcPy for research at a government science agency. I work extensively in R. I’ve done image processing using Arc software, Python, ENVI, ImageJ, and raster tools in R, but it’s a very broad field, and I definitely think I could learn more.

Aerial remote sensing detection of Leptosphaeria spp. on Turnip

Introduction

Of the many pathogens disrupting healthy growth of brassica species in the valley is the pathogen commonly referred to as Blackleg. This fungal pathogen has been reported to nearly wipe out canola production in Europe, Australia, Canada and in more recent years has devastated the United States (West et al., 2001). In 2013 the pathogen was reported for the first time in the pacific northwest since the 1970’s (Agostini et al., 2013) and has since been reported in the Willamette Valley (Claassen, 2016). There are two known species of Blackleg, Leptosphaeria maculuns and Leptosphaeria biglobosa. These are not to be mistaken with the potato bacterial pathogen Pectobacterium atrosepticum, which is also referred to as Blackleg. While much of the crop failure in canola has been associated with Leptosphaeria maculuns, both species are found in the valley and seem to be of similar consequence to turnip.

Classification of cercospera leaf spot for instance has been accomplished applying a support vector machine model but utilized a hyperspectral camera with high spectral and spatial resolution (Rumpf et al., 2010). Because plant diseases can oftentimes be difficult to see even with the naked eye, researchers have struggled to successfully detect specific plant diseases as spatial resolution decreased. While this analysis focuses solely on detection at 1.5 meters, it is possible for detection of blackleg despite lowered spatial resolution as result of increased flying elevations.

Here we consider how spatial patterns from diseased leaves is related to ground truth disease ratings of turnip leaves based on spectral signatures. With the application of a support vector machine model, classification of diseased versus non-diseased tissue is expected to generate a predictive model. This model will be used to determine if single turnip leaves which are diseased and non-diseased are accurately categorized based on the ground truth.

Data

This project will be using a data set derived from roughly 500 turnip leaves, harvested here in the valley during the months of February to April of 2019. Roughly 200 of these leaves will be used in training the data model. The remaining leaves will be used for the data analysis in determining accuracy of the model versus ground truth. The spatial resolution is less than 1 mm and images are in raster format of single turnip leaves with five bands (blue, green, red, far-red, NIR). I do not anticipate using every band for the analysis and will likely apply some VI as a variable. The camera being used is a mica sense Red-edge which is 12-bit radiometric resolution.

Hypothesis

I hypothesize that spatial patterns of pixels based on image classification are related to manually classified spatial patterns of observed disease on turnip leaves because disease has a characteristic spectral signature of infection on the leaves.

Approaches

In order to accomplish this analysis, I will be using ArcGIS Pro where I have quite a bit of experience, but not particularly on this subject or type of analysis. The workflow for analysis will begin with image processing where I have little experience but don’t require expertise in this area. I hope to conduct the image processing in Pix4D where I will begin with image calibration based on the reflectance panel in each image. Followed by cropping down to simply the leaf under assessment. From here there may be some smoothing and enhancing the contrast of the image but is still undetermined.

Images will then be brought into ArcGIS Pro for conducting a spatial analysis. I intend to use spatial pattern analysis of manually classified disease versus unsupervised segmentation of the leaves for exercise 1. Next I plan on then using this information in spatial regression to improve image-based classification for exercise 2. For exercise 3 I intend to use the support vector model wizard, which will be used for training a model. This involves highlighting regions of diseased tissue and regions of non-diseased tissue to obtain a trained model when a sufficient number of pixels to create support vectors is reached. The x and y-axis for the model are yet to be determined but will likely be NIR and red-edge digital number values. Some alternatives are using different VI’s such as NDVI as explanatory variable. Turnip images which were never used for training the model will be used for analysis of the support vector machine’s ability to classify diseased or non-diseased regions and then leaves entirely. I anticipate every leaf to have at least a few pixels which will classify as diseased and will therefore set a threshold for a maximum number of diseased pixels in the image, while yet classifying it as non-diseased. I also might require a certain number of pixels to be bordering one another qualify as diseased region. The methodology may require some troubleshooting, but the expectations are clear and the methods to reach that outcome are mostly drawn out.

Expected Outcome

I expect the model to have very high accuracy after the model is fine tuned for accuracy based on contrast in spectral signatures I expect to see between diseased versus non-diseased leaves. Below I have outlined the three outcomes I would like to ultimately achieve. Due to time restrictions, the scope of my research is limited to outcomes 1 & 2 below.

Train a support vector machine model for classification of pixels in turnip leaves as either diseased or non-diseased.
Accurately apply the SVM model on turnip leaves from many geographical locations in the valley with different levels of diseases severity and different times in the year.
Scale up from 1.5 meters and test the ability of the model to maintain accurate classification of blackleg on turnip.

Significance

I intend to publish and further the collective knowledge in aerial remote sensing. This more specifically applies to those in the area of agronomy or plant pathology. This is very applied science and is a resource for those in the industry of agriculture.

Traditionally detection for this pathogen has depended on a reliable field scout who may need to cover fifty acres or more looking for signs or symptoms of this disease. Nowadays, precision agriculture has introduced the use of drones to perform unbiased field scouting for the grower. This saves time and can be very reliable if done properly. An important aspect of disease control relies on early detection. If early detection can be accomplished, growers have time to respond accordingly. This may allow for early sprays with lighter applications rates or less controlled substances, cultural control of nearby fields, etc. in order to stop the spread of disease.

Works cited

Agostini, A., Johnson, D. A., Hulbert, S., Demoz, B., Fernando, W. G. D., & Paulitz, T. (2013). First report of blackleg caused by Leptosphaeria maculans on canola in Idaho. Plant disease, 97(6), 842-842.

Claassen, B. J. (2016). Investigations of Black Leg and Light Leaf Spot on Brassicaceae Hosts in Oregon.

Rumpf, T., Mahlein, A. K., Steiner, U., Oerke, E. C., Dehne, H. W., & Plümer, L. (2010). Early detection and classification of plant diseases with Support Vector Machines based on hyperspectral reflectance. Computers and Electronics in Agriculture, 74(1), 91-99.

West, J. S., Kharbanda, P. D., Barbetti, M. J., & Fitt, B. D. (2001). Epidemiology and management of Leptosphaeria maculans (phoma stem canker) on oilseed rape in Australia, Canada and Europe. Plant pathology, 50(1), 10-27.

Courtney’s Spatial Problem

In the formula “How is the spatial pattern of A related to the spatial pattern of B via mechanism C?”, my research question for this class is made of the following parts:

A: ion and isotope concentrations in wells tapping the basalt aquifer
B: mapped faults
C: groundwater flow as determined by hydraulic conductivity of the geologic setting

This all adds up into:

“How is the spatial pattern of ion and isotope concentrations in wells tapping the basalt aquifer related to the spatial pattern of mapped faults via the mechanism of groundwater flow as determined by hydraulic conductivity of the geologic setting?”

This question might have thrown you, the reader, into the figurative deep end! For context, I’m studying the basalt aquifer in the Oregon portion of the Walla Walla Basin. This is located in north eastern Oregon, about thirty minutes northeast of Pendleton.

map showing the state of Oregon, with an inset map showing the study area

Figure 1: Walla Walla Sub-Basin Location

The wells that I am studying are drilled into the three most extensive formations within the Columbia River Basalts that are present in my study area: the shallow Saddle Mountains layer, the intermediate Wanapum Basalt , and the deepest Grande Ronde Basalt. The wells and faults that I am studying are predominantly on the transition area between the “lowland” areas where the basalts are covered by layers of sediment deposited in the ~15 million years since they were deposited and the upland areas where the basalt is exposed at the surface.

geologic map showing formations, folds, and faults in the entire Walla Walla Basin, highlighting my study area

Figure 2: Geologic map of the study area

Wells in this area are between 300 and 1,200 feet in depth, and primarily serve irrigation and municipal uses. Over the past 50 years there has been a noticeable downward trend in groundwater elevations in many of the wells. My research is part of a larger Oregon Water Resource Department project that seeks to better understand this groundwater system, faults and all. Faults add an element of the unknown to models of groundwater flow unless they are specifically studied, as they can formed either barrier or pathways for groundwater flow depending on their structures and characteristics. Faults with low hydraulic conductivity can block or significantly slow groundwater flow, while faults with higher hydraulic conductivity allow water to flow through them more easily. My research uses ion chemistry and isotope concentrations to characterize the path that groundwater has taken through the subsurface into the well.

Datasets:

A: In my research I have analytical data for 32 wells, whose XY locations were determined by field confirmation of my collaborators’ well log database. As groundwater is a 3D system, I have to consider Z values as well. The well depths and lithology information is also from my collaborators’ database, and was based on information of varying quality recorded at the time that the well was drilled. My analytical data provides a snapshot of water chemistry during the summer of 2018. I have only one temporal data point per well. At all 32 wells, I collected samples to be analyzed for pH, temperature, specific conductivity, oxygen isotopes 16 and 18, and hydrogen isotopes 1 and 2. At a subset of 18 of those wells I collected additional samples for tritium, carbon 14, and major ion analysis.

B: The shapefile of faults mapped at the surface was created by Madin and Geitgey of the USGS in their 2007 publication on the geology of the Umatilla Basin. There is some uncertainty in my analysis as far as extending this surface information into the subsurface. USGS studies have constrained proposed ranges of dip angles for the families of faults that I am studying, but not exact angles for any single mapped fault.

Figure 3: locations of wells that were sampled mapped with mapped fault locations.

Hypotheses:

Where faults act as barriers, I hypothesize that parameter values will differ in groups on either side of a fault. Specifically, a barrier fault might cause older, warmer water to rise into upper aquifer layers, and the downstream well might show a signature of more local recharge.

Where faults act as conduits, I hypothesize that water chemistry and isotopes of samples from wells on either side of the fault would indicate a relatively direct flowpath from the upstream well to a downstream well. Over a short distance, this means that ion and isotope concentrations would not differ significantly in wells across the fault.

Approaches:

I would like to use principal component analysis to identify grouping trends of the samples, and then map the results. Additionally, a bivariate comparison of wells on either side of the fault could be interesting? I would like to find some way to bring in distance from a fault into the model too.

Expected outcome: My output would be a mixture of statistical relationships and maps of those relationships.

Significance. How is your spatial problem important to science? to resource managers?

The Walla Walla Subbasin’s basalt aquifers have recently been deemed over-allocated by the Oregon Water Resource Department (OWRD), and water managers are looking for methods to better regulate the aquifer when wells run dry. However, the faults are a big unknown when considering how to enforce the prior appropriation doctrine where junior permit holders are regulated off in times of water shortage. If a junior and senior water permit holder have wells that are separated by a fault, is it likely that stopping the junior permittee’s water use would actually result in more water available to the senior permit holder?

My approach is not novel in my scientific field. Several studies have evaluated similar parameters elsewhere in the Columbia River Basalts and also used statistical methods, but have not focused specifically on faults.

Your level of preparation: how much experience do you have with (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) image processing, (e) other relevant software

Arc-GIS: Highly proficient in Desktop and Pro

R – novice, can copy/paste/edit code to suit my basic needs

Python – novice, took a class two years ago but have forgotten much of it

Image processing – working knowledge of ENVI from GEOG 580, and of Gravit from GEOG 572

Other spatiotemporal analysis – I haven’t really worked with software besides ArcGIS or QGIS?

Individual foraging specialisations of gray whales along the Oregon coast through prey preferences

Research Question

The Pacific Coast Feeding Group (PCFG) is a subgroup of the Eastern North Pacific (ENP) population of gray whales (Scordino et al.2018). The ENP, a population of 24,000-26,000 individuals, migrates along the U.S. west coast from breeding grounds in the lagoons of Baja California, Mexico to feeding grounds in the Bering Sea (Omura 1988). The PCFG, currently estimated at 264 individuals (Calambokidis et al.2017), stray from this norm and do not complete the full migration, instead choosing to spend their summer feeding months along the Pacific Northwest coast (Scordino et al.2011). Since gray whales as a species already exhibit specialization (by being the only baleen whale that benthically forages) and since the PCFG display a second tier of specialization by not using the Bering Sea feeding grounds, it seems plausible that individuals within the PCFG might have individual foraging specializations and preferences. Therefore, my research aims to investigate whether individual gray whales in Port Orford exhibit individual foraging specializations. Individual foraging specializations can occur in a number of different ways including habitat type (rocky reef vs sand/soft sediment), distance to kelp, time and distance of foraging bouts, and prey type and density. For this class, my research question is whether prey species drives the amount of time a foraging whale will spend at a specific foraging location.

Data

Prey data

The prey data has been obtained through zooplankton tow net samples from a research kayak in the summers of 2016, 2017 and 2018. Kayak sampling effort varies widely between the three years due to weather creating unsafe conditions for the team to collect samples. These samples have been sorted and enumerated to the zooplankton species level so that for each day when a prey sample was collected, we have known absolute abundances of prey species at each sampling station.

Whale data

The whale data is in the form of theodolite tracklines of gray whales that used the Port Orford study area during the summers of 2016, 2017 and 2018. Since whale tracking occurs at the same sites as prey sampling, we are able to map the prey community present at a particular location that whales forage at. The tracklines occur on a very fine spatial resolution as the study area is approximately 2.5 km in diameter, though some of the tracklines extend out to approximately 8 km offshore. Furthermore, as whales forage in the area, photographs are taken of each individual in order to match the trackline with a particular individual. This way, potential individual specializations may be detected if there are repeat tracklines of an individual.

Hypotheses

There will be differences in time spent by individual gray whales foraging in an areas with different prey communities. However, these differences will likely not be constant/stable over time. Most likely, foraging will be largely driven by availability of prey and therefore individual whales will be rather flexible in their foraging strategies.

Approaches

Individual patterns in time and space use within the Port Orford study area will be assessed through the identification of foraging bouts. Theodolite tracks from 2016-2018 longer than one hour will be included in this analysis. Using ArcGIS, tracklines will be clipped so that only foraging points within the study sites are included in this analysis. Radii will be created around each of the 12 zooplankton sampling locations to identify overlap between whale foraging and known prey communities at sampling stations. Time spent within these radii will be calculated. Statistical analyses (likely GAMs) will be run in order to identify whether individuals spend more and/or less time at a foraging location based on the prey species that are present at that location.

Expected Outcomes

This analysis will likely result in plots of time spent at a foraging locations against abundance of different prey species, which will be based on the results of the statistical analyses. This will allow the comparison of whether individual whales have preferences for different kinds of prey species. Maps might also be very nice to visualize the movements of whales however I am unsure how the time element of foraging bouts could be incorporated into this (perhaps with shading of color?).

Significance

This spatial problem is important to science since genetic evidence suggests that there are significant differences in mtDNA between the ENP and PCFG (Frasier et al.2011; Lang et al.2014), and therefore it has been recommended that the PCFG should be recognized as being demographically independent. In the face of a proposed resumption of the Makah gray whale hunt as well as increased anthropogenic coastal use, there is a strong need to better understand the distribution and foraging ecology of the PCFG. This subgroup has an important economic value to many coastal PNW towns as many tourists are interested in seeing the gray whales. Therefore, understanding what drives their distribution and foraging habits will allow us to properly manage the areas where they prefer to forage.

Proficiencies

I have novice/working knowledge of Arc-Info and Modelbuilder. I have never used Python before. I have working knowledge of R and am proficient in image processing.

Literature Cited

Calambokidis, J. C., Laake, J. L. and A. Pérez. 2017. Updated analysis of abundance and population structure of seasonal gray whales in the Pacific Northwest, 1996-2015. Draft Document for EIS.

Frasier, T. R., Koroscil, S. M., White, B. N. and J. D. Darling. 2011. Assessment of population substructure in relation to summer feeding ground use in the eastern North Pacific gray whale. Endangered Species Research 14:39-48.

Lang, A. R., Calambokidis, J. C., Scordino, J., Pease, V. L., Klimek, A., Burkanov, V. N., Gearin, P., Litovka, D. I., Robertson, K. M., Mate, B. R., Jacobsen, J. K. and B. L. Taylor. 2014. Assessment of genetic structure among eastern North Pacific gray whales on their feeding grounds. Marine Mammal Science 30(4):1473-1493.

Omura, H. 1988. Distribution and migration of the Western Pacific stock of the gray whale. The Scientific Reports of the Whales Research Institute 39:1-9.

Scordino, J., Bickham, J., Brandon, J. and A. Akmajian. 2011. What is the PCFG? A review of available information. Paper SC/63/AWMP1 submitted to the International Whaling Commission Scientific Committee.

Scordino, J., Weller, D., Reeves, R., Burnham, R., Allyn, L., Goddard-Codding, C., Brandon, J., Willoughby, A., Lui, A., Lang, A., Mate, B., Akmajian, A., Szaniszlo, W. and L. Irvine. 2018. Report of gray whale implementation review coordination call on 5 December 2018.

Grant Z’s Spatial Problem

My spatial problem is about land use/land cover (LULC) change associated with the establishment of artisanal, small-scale gold mines (ASGM) in rural Senegal. Put in the vernacular we’ve learned in the course, I’d phrase my question this way:

How is the spatial pattern of LULC change related to the spatial pattern of ASGM establishment via the mechanism of deforestation?

As part of their establishment, ASGM requires clearing the land where the mines will be, as well as additional timber harvesting to build the homes where the miners will live while working and additionally to bolster the mines shafts themselves. As such, I’m curious as to what the exact change is that accompanies ASGM establishment, as this is a sub-set of my graduate thesis which seeks to understand more broadly how ASGM impacts the environment and the lives of the miners themselves, to understand better if a household diversifying into ASGM is better suited towards adapting to future climate change than if they hadn’t.

The dataset I have is very high resolution (VHR) satellite imagery (panchromatic and multispectral) courtesy of the Digital Globe Foundation. The panchromatic imagery has a resolution of .3m while the MS imagery is around 1.24m — as part of the preprocessing I’ve pansharpened the images so the overall imagery is stills sub-meter, which is necessary to investigate ASGM as its footprint is too small for detection with Landsat or other sensors. The imagery covers about 16 gold mines I’ve identified, and has imagery from 2018 and 2009/2010.

My present hypothesis is that, while obviously there will be a decrease in LULC at the mine in general, the area around the mine will also be indicative of some change — in my literature review, I’ve found some information that the environment up to 20km around a gold mine can be impacted. I think in this case the impact won’t be as drastic, but it’s what I’m expecting.

Currently I’m not sure how to approach the problem. The first step will be to just map out the mining locations first in ArcPro and from there try to see how the forest cover has changed between present and past. Beyond that I’m not sure how to answer the question.

Ultimately I’d like to produce maps, to demonstrate the environmental impact that ASGM has (or not, potentially!). I’ve also considered producing similar maps showing the relative impact subsistence agriculture has had in non-mining villages as a comparison.

I feel somewhat prepared for this task. I’m comfortable working with ArcGIS and have some knowledge of ArcPy (though I’m a bit rusty). I can use ModelBuilder pretty competently and feel that I have a good grasp of what to do in that arena. My major hurdle right now is not knowing what I don’t know — I need some more exposure to the tools available to me for addressing this problem. However, I feel confident that with enough time and work, this problem is not intractable!

One problem is that, given the rural area and part of the world I’m looking at, Digital Globe does not frequently sample the area and so the imagery I have is not necessarily on the anniversary date. Given the drastic climate differences (rainy season and dry season), the vegetation may look drastically different.

As an example, bellow is Kharakhena in November 2008, with the original village highlighted in red.

Here is the village of Kharakhena in March 2017. The village is highlighted in yellow and the mine in blue.

The difference is very dramatic, but I’m not sure how to approach this analytically.

This is my spatial problem! I believe that this is significant as a part of my thesis work exploring livelihood diversification in rural Kedougou. As 70% of the population in the region lives below the poverty line, and most people engage in subsistence farming as their primary livelihood, I am interested in how ASGM operates as a livelihood diversification option. Specifically, I’m interested in assessing whether it is an “erosive” coping strategy: one which households undertake out of lack of other options and one which may diminish the household’s overall capacity to react to future stresses and shocks. I’m assessing this through an evaluation of the “Three Capitals” (sometimes five) which constitute a household’s assets. The three capitals are Economic, Environmental, and Social. I’m assessing the impact of ASGM on miners’ Social and Economic capital through interviews which I’ve already conducted, and I’m assessing the impact of ASGM on miners’ Environmental capital through remote sensing. Together, I hope to paint a holistic picture of ASGM’s impact on miners’ livelihood capitals in the region in order to better understand if it is indeed an “erosive” coping strategy, which, if it is, needs to be known in order to help miners and farmers find different ways to adapt to future climate change.

Spatial pattern of invasion

A description of the research question that you are exploring.

I am interested in how the spatial pattern of invasion by the recently introduced annual grass, ventenata, is influenced by the spatial pattern of suitable habitat patches (scablands) via the susceptibility of these habitat patches to invasion and ventenata’s invasion potential. Habitat invisibility is determined by the environmental characteristics, community composition, and spatial arrangement of suitable habitat patches and the invasibility of ventenata is influenced by its dispersal ability and fecundity.

I am also interested in understanding how spatial autocorrelation influences relationships between ventenata, plant community composition, and environmental variables within and across my sample units.

A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.

I collected spatial data (coordinates and environmental variables) and plant species abundance data (including ventenata) data for 110 plots within and surrounding seven burn perimeters across the Blue Mountain Ecoregion of eastern Oregon (Fig. 1). Target areas were located to capture a range of ventenata cover from 0% ventenata cover to over 90% cover across a range of plant community types and environmental variables including aspect, slope, and canopy cover within and just outside recently burned areas. Once a target area was identified, plot centers were randomly located using a random azimuth and random number of paces between 5 and 100 from the target areas. Sample plots were restricted to public lands within 1600m of the nearest road to aid plot access. Environmental data for sample plots includes canopy cover, soil variables (depth, pH, carbon content, texture, color, and phosphorus content), rock cover, average yearly precipitation, elevation, slope, aspect, litter cover, and percent bare ground cover. I am planning to identify and calculate spatial pattern of habitat patches using Simpson Potential Vegetation Type raster data (Simpson 2013) to 30m resolution in Arc GIS which was developed to identify potential vegetation types across the Blue Mountain Ecoregion.

Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

Ventenata appears to readily invade unforested, rocky scabland patches, but is less prominent in surrounding forested areas. These areas may act as “hotspots” of invasion from which ventenata can spread into surrounding, less ideal habitat types. The “biodviersty-invasion hypothesis” (Elton 1958) posits that more biodiverse areas will be less susceptible to invasion, but propagule pressure hypotheses suggests that areas close to areas that are heavily invaded will be more likely to be invaded (Colautti et al. 2005). If environmental factors such as biodiversity influence invasion success, I would expect diverse habitats to have a higher resistance to the ventenata invasion and be less invaded, but if propagule pressure is a stronger driver of the ventenata invasion, diversity may be trumped by proximity to invaded patches which may increase these patches risk of invasion despite species composition.

Approaches: describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.

I would like to develop an overall understanding of the spatial pattern of invasion by performing Moran’s I to test for spatial autocorrelation. Additionally, I would like to identify hotspots of invasion using hotspot analysis. From these hotspots, I am interested in predicting spread using kernel density functions that estimate ventenata distribution. Finally I would like to relate the spatial pattern of vententata to environmental characteristics of habitat patches (including species composition) through the use of cross-correlation and geographically weighted regression.

Expected outcome: what do you want to produce — maps? statistical relationships? other?

I would like to produce figures that represent multivariate statistical relationships between ventenata, patch size, location, and environmental/ community variables. I am also interested in creating a map depicting areas at highest risk of invasion based on spatial and environmental data if appropriate considering the statistical relationships that result from the analysis.

How is your spatial problem important to science? to resource managers?

Ventenata is rapidly invading natural and agricultural areas throughout the inland Northwest where associated ecological and economic losses are readily becoming evident. However, possibly the most concerning aspect of the invasion is ventenata’s potential to increase fire intensity and frequency in invaded scabland patches, that prior to invasion supported light fuel loads and acted as natural fire breaks for the surrounding forest. Such shifts to the fire regime could dramatically alter landscape-scale biodiversity and cause additional socioeconomic losses. Despite these concerns, little is known of the drivers influencing ventenata’s invasion potential and few management options exist. Understanding how the spatial arrangement and size of scabland patch influences susceptibility to invasion by ventenata could help managers target areas at the highest risk for invasion and mitigate losses.

Your level of preparation: how much experience do you have with (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) image processing, (e) other relevant software

I have a working knowledge of programming in R and manipulating spatial data/ map making in ArcGIS. I have taken introductory level classes in R and ArcGIS, and used these tools for work and for my research. I have no experience in modelbuilder, GIS programming in python, and image processing. I am eager to learn how to use these tools and apply them to help answer ecological research questions.

References:

Colautti, R. I., Grigorovich, I. A., & MacIsaac, H. J. (2006). Propagule pressure: a null model for biological invasions. Biological Invasions, 8(5), 1023-1037.

Elton, C.S. (1958). The ecology of invasions by animals andplants. T. Methuen and Co., London.

Simpson, M. 2013. Developer of the forest vegetation zone map. Ecologist, Central Oregon Area Ecology and Forest Health Program. USDA Forest Service, Pacific Northwest Region, Bend, Oregon, USA

Using Spatial Statistics to Determine the Subsurface Spatial Distribution of Lava Flows in Northern California

Research Question

I am trying to determine potential fluid flow paths based on the spatial distribution of lava flows in the Medicine Lake, Lassen Peak Big Valley Mountain area of Northern California. The final goal of this project is a 3-D subsurface framework of the geology from which we can model fluid and heat flow.

Thus we want to know “How the spatial pattern of the depth of the lava flows attributes of wells is related to the spatial pattern of lava flow depth (B1), which in turn is related to pre-lava-flow topography (B2) and the regional geology (B3), because lava flows follow topography and subside post emplacement at rates that follow basin subsidence rates in the region, and form barriers for/conduits of groundwater (C1).”

Dataset

My data are a series of more than 1500 well logs. Each contain data that pertains to a map coordinate and with information about changes in lithology, or rock type, with respect to depth. Well logs are collected the year the well is made. I have well logs that range from the 1960s to the present. Temporal data is less important for the initial part of my study (Fig 1).

Figure 1: What is a Well Log? A well log is a record of the changes in rock type that occur with changes in depth and a location at the surface of the earth.

I also have data from geologic maps. Geologic maps provide the spatial extent of the surface exposure of a geologic unit, which includes information about its contacts with other rock units in x,y space as well as information about the elevation (Fig 2).

Figure 2: My study area in Northern California, it lies between the Big Valley Mountain, Medicine Lake Volcano and Lassen Peak with the Pit River draining the middle of it. The green dots represent the center of the township and range section that the wells are located in, and the size of the circle indicates the number of wells in that section.

Hypothesis

Figure 3: Cross section of a Basalt Column (Lyle 2000). In Volcanic systems, fluid flow is limited to the Vesicular flow tops, the rubbly bases (P.P.C in this diagram) and sedimentary interbeds that lie between vertically stacked flows. These sections of the rock are the only locations with high enough permeability and porosity to allow the movement of water.

This means that if we know where the lava beds lie, and we know their contacts, then we can outline potential zone of flow. Determining the patterns of how lavas flow and emplacement then allows us to determine the lava flow’s spatial extent, and therefore potential flow paths. I expect lavas to follow the laws of physics. They travel as viscous fluids and fill basins, and so I would expect them to be thickest where paleotopography was lowest, they will likely thin out at the edges, and they will be down slope from the volcano or vent that erupted them. Given the regional geology, I would expect the thickest lava flows to lie in the Pit River basin, near the range bounding Big Valley Mountain Fault (Fig 3).

At its simplest, we expect lava flows to follow the geologic principals of original horizontality and of cross cutting relationships.

Approaches

I would like to learn both how to apply variograms and kriging and how they work; plus any new techniques that I am not yet aware of. We can also make the assumption that all our residuals with follow the rules of stationarity. This means that any irregularities in the data represent unacknowledged geologic features.

Expected outcome

My first goal is to create a 3-D subsurface map of the connectivity and contacts of lava flows in the Medicine Lake, Lassen Peak Big Valley Mountain region. Ideally I would begin the process with Geog 566. I would like to have a few surface I can test in the field by the end of this term, but understanding different methods with which I can make this framework is my first goal.

Relevance

Understanding the distribution of lava flows in the region ties into the regional geology of my study area. As I stated in Question 3, lavas fill basins. Basin morphology, and the amount of space lavas can fill depends on the slip rates and distribution on faults in the Medicine Lake, Lassen Peak Big Valley mountain triangle. By constraining slip rates in the basin, we both build a better picture of the regional geology, and we can make more rigorous checks on the validity of our statistical outputs.

Another equally important point is the next step in the project. After we make the 3-D framework, the USGS will use it to model fluid and heat flow in the region. Comprehending potential changes in groundwater flow in the region will allow city manages in the region to better manage water in the future.

Preparation

Arc-Info	Not much, ArcMap, yes
Modelbuilder and/or GIS programming in Python	Working knowledge of python outside of GIS programming
R	None
Image Processing	Working Knowledge
Relevant Software	Matlab

Sources

Lyle, P. “The Eruption Environment of Multi-Tiered Columnar Basalt Lava Flows.” Journal Of The Geological Society, vol. 157, 2000, pp. 715–722.

Epidemiology of Western Hemlock Dwarf Mistletoe Post – Mixed Severity Fire

Description of the Research Question

My study site is located in the HJ Andrews Experimental Forest, where the two most recent mixed severity fires burned leaving a sizable fire refugia in the middle of the stand. Western hemlock dwarf mistletoe (WHDM) survived the fire in this refugia. WHDM spreads via explosively discharged seed and rarely by animals. This means that for the pathogen to spread, the seed must reach susceptible hosts. WHDM is a obligatory parasite of western hemlock. The regeneration post fire may pose a barrier to the pathogens spread because of the structure and composition. Lastly, the structure of the fire refugia may determine the rate of spread. This is because the structure largely determines where infections exist in the vertical profile of the canopy.

My research question is: How is the spatial pattern of WHDM spread extent and intensification from fire refugia related to the spatial pattern of the structure and composition of the fire refugia and the surrounding regeneration via barriers to viable seed reaching susceptible hosts?

I have three objectives related to the spatial organization or the regenerating forest surrounding the fire refugia and one related to the fire refugia itself. My objectives are to determine how fire refugia affects dwarf mistletoe’s spread and intensification through:

The stand density of regenerating trees and of surviving trees, post fire.
The stand age and structure of regenerating trees and surviving trees in fire refugia.
The tree species composition of the regenerating trees adjacent to fire refugia.

AND…

Whether intensification dynamics of WHDM inside a fire refugia resemble those of WHDM in other infection centers.

Dataset Description

Spatial locations of the extent of spread and intensity (also referred to as severity) of WHDM infection include presence/absence of infection in susceptible hosts and will have a severity rating for each infected tree. The presence/absence data will be two measurements, one from 1992 and one from 2019. The intensity rating will be from 2019 only.

Spatial locations/descriptions of the structure and composition of the fire refugia and regeneration surrounding refugia include X,Y of each tree, a variety of forest inventory attributes such as diameters, heights, and species for each tree, and delineation of the fire refugia boundaries. This data has been measured several times: 1992, 1997, 2013, and 2019. The GPS coordinates were recorded with handheld GPs units most likely under canopy cover so resolution may vary.

These data sets are bounded by a 2.2 ha rectangle that confines the study area.

Hypotheses

The spatial pattern of western hemlock dwarf mistletoe in the study site will be several discrete clusters. This arrangement forms because WHDM spreads from an initial infection point outward. The initial infection serves as an approximate center point and the cluster, or infection center, grows outward from there. Separations between clusters are maintained by forest structure and composition or disturbances. New clusters form from remnant trees surviving disturbances, or random dispersal of seed long distances by animals. In this case, the fire refugia protect the remnant trees from disturbance and these trees are the new focal points for infection centers

The spatial pattern of the forest structure and composition will drive the direction and rate of spread of WHDM because variances in these two attributes will cause varying amounts of barriers to WHDM seed dispersal. Because seeds are shot from an infected tree, that seed needs to reach a new uninfected branch. This means physical barriers to spread can affect seed dispersal and non susceptible species will stop spread.

Approaches

I would like to utilize spatial analyses that allow me to understand how the clustering of WHDM is related to the boundaries of the fire refugia. Also, how the infection centers have changed over time utilizing forest structure and composition metrics of the regeneration surrounding the refugia. Lastly, something that can incorporate a severity rating on a scale instead of simply presence/absence data and describe the distribution of severely infected trees vs lightly infected trees and how that relates to the fire refugia boundary and forest metrics of the regeneration surrounding the refugia.

Expected Outcome

I would like to produce statistical relationships that can determine the significance of forest density, species composition, age, and structure on the ability of WHDM to spread. Also, I would like to produce statistical relationships that can describe whether or not a fire refugia alters the way WHDM spreads and intensifies when compared to commonly observed models. Maps as visuals for describing the change over time would be a useful end product as well.

Significance

Understanding the spread patterns of WHDM is important for resource managers seeking to increase biodiversity and produce forest products. Focus has shifted to creating silvivultural prescriptions that emulate natural disturbances that are still economically viable and that maintain ecosystem functions. Disturbance events can control WHDM but also create opportunities to increase its spread and intensification so managers need to have an understanding of how a particular forest structure will affect WHDM. Also, if we want to maintain biodiversity, understanding how WHDM infection centers created by fire develop is important. Fire frequency and severity may be increasing in the future and the loss of mixed severity fire would mean a significant loss of WHDM. Land managers seeking to emulate burns can use this information to plan burns that preserve patches of WHDM if desired and understand how the pathogen will progress 25 years later. This is not usually the case for forest pathogens.

Level of Preparation

I have worked in ArcMap quite a bit, but I haven’t much experience with the wide range of functionality of ArcInfo. I used ModelBuilder somewhat to keep my queries and basic analyses organized. No experience programming in Python. I have taken two stats classes before this using R and feel I have a working knowledge and have no problem learning new tools in it. However, I have very little work with image processing such as working with rasters and some small exposure to LiDAR processing.

Predicting spatial distribution of Olympia oysters in the Yaquina estuary, OR

My spatial problem

A description of the research question that you are exploring.

My research aims to determine the current abundance and spatial distribution of native Olympia oysters in Yaquina Bay, Oregon. This oyster species has experienced massive decline in population due to overharvest during European settlement of the western United States. Yet its value to the ecosystem, its cultural importance, and its tastiness have made the Olympia oyster a current priority for population enhancement. For my research, I will be focusing on a local population of Olympia oysters in the Yaquina estuary. The goal of my project is to gather baseline information about their current abundance and spatial distribution, then develop a repeatable biological monitoring protocol for assessing this population in the future. Using spatial technology, I will first assess whether the distribution of Olympia oysters can be predicted using three habitat parameters: salinity, substrate availability, and elevation. In collaboration with the Oregon Department of Fish and Wildlife (ODFW), I will use the results of this spatial analysis and field surveys to determine ‘index sites’, which are specific locations within the estuary that are indicative of the larger population. These index sites will be revisited in the future by ODFW’s Shellfish and Estuarine Assessment of Coastal Oregon (SEACOR) team to assess changes in population size and spread over time. If predictions of Olympia oyster distribution are accurate based on the habitat parameters I’ve identified, then I’d also like to analyze potential species distribution under future environmental conditions and under different management scenarios, including habitat restoration and population enhancement.

For this course, I will be exploring this main research question:

How is the spatial pattern of Olympia oysters in the Yaquina estuary [A] related to (caused by) the spatial pattern of three habitat parameters (salinity, substrate, elevation) [B]?

A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.

I will be using three spatial datasets, representing each of the habitat parameters, overlaid on one another to rank most to least likely locations for Olympia oyster presence. The salinity dataset is based on historical measurements (1960-2006) and represents a gradient from highest salinity (~32psu) at the mouth of the estuary to fresher water up stream (<16psu). Elevation is represented through a bathymetric dataset from 2001-2002, sourced from the Environmental Protection Agency office in Newport, OR. The substrate data comes from the Oregon ShoreZone mapping effort in 2014, which is managed and updated by the Oregon Coastal Management Program. There’s a couple different ways this data can be used, either as a substrate layer that characterizes substrate type broadly (low resolution) or through vector data with associated data tables that describe the substrate within a tidal zone along the shoreline (higher resolution, but spatial extent is limited).

The images here show the three habitat parameter spatial datasets:

Yaquina Bay bathymetry derived from subtidal soundings in 1953, 1999, 1998, and 2000 by U.S. Army Corps of Engineers.
Data from EPA.

Salinity figure digitized from Lewis et al. (2019) based on Oregon’s wet-season salinity measurements (average salinity November-April).
Lewis, N. S., E. W. Fox, and T. H. DeWitt. 2019. Estimating the distribution of harvested
estuarine bivalves with natural history-based habitat suitability models. Estuarine, Coastal and Shelf Science, 219: 453-472.

Substrate component classes of Yaquina Bay based on data classifications from Coastal and Marine Ecological Classification Standard (CMECS) ‘Estuarine Substrate Component’ layer.
Data from Oregon Coastal Management Program.

Predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

I am hypothesizing that the distribution/spread of Olympia oysters in Yaquina Bay is influenced by availability of appropriate habitat parameters; where these parameters align within the appropriate range will determine where the oysters can be found. However, I think that I will find that not all of the parameters equally influence oyster distribution. For example, Olympia oysters have been observed to tolerate a broad salinity range, but are absolutely not present without suitable substrate. I am expecting to see that the influence of a particular habitat parameter changes depending on where the oysters are located within the estuary. I’m curious to see, if possible, which parameter will be most important at what life stage and what may drive changes in population per specific site in the estuary.

I do expect that I will be able to make a prediction about where the oysters will be located based on the habitat parameters, though I am uncertain that the resolution of the spatial data is sophisticated enough to capture nuances in distribution. For example, Olympia oysters are known to be opportunistic in finding suitable substrate and will settle on a wide variety of hard surfaces, including derelict boating equipment, discarded shopping carts, and pilings.

Describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.

I want to be able to produce a model that can predict where the oysters are located based not just on the three habitat parameters of interest, but under various environmental conditions and different management scenarios. For example, where might the oysters settle in a given year if rainfall is substantially higher, or if adult oysters spawn earlier, or if oyster growers create Olympia oyster beds for harvest, or if a new habitat restoration site is established, etc.

I’m not especially handy at statistical analysis, so I would like to gain a better understanding of statistics through spatial data. I know that I will need to use statistics to determine how successfully the prediction of Olympia oysters aligns with actual observations in the field, but currently unsure how to do that. A recent study in Yaquina estuary was just released using a similar approach for predicting the distribution of five other bivalve species. This study used R to generate a logistic regression model to determine the probability of each species presence within a given area. I would like to do something similar for my analysis, but need some help.

What do you want to produce — maps? statistical relationships?

The desired products of this research are habitat suitability maps of the current and future (pending the success of the initial effort) distribution of native Olympia oysters for use by ODFW. As a part of this effort, I will create a map of index site locations to be used in future species monitoring. I would also like to generate a predictive model that can determine distribution of oysters based on annual changes in the local environment (El Nino conditions, heavy rainfall, restoration efforts, introduction of invaders, etc.). While salinity, substrate, and elevation seem to be the main factors influencing oyster distribution, there are a number of other factors that can have effects, including temperature, proximity to the mouth of the estuary, and tidal retention.

How is your spatial problem important to science? To resource managers?

ODFW currently does not have reliable baseline information on the distribution of Olympia oysters in Oregon. As an ecological engineer, the species provides a number of important benefits to the ecosystem, including water filtration and habitat for other marine creatures. It is culturally significant to local tribes, including the Confederated Tribes of Siletz. This species is not currently listed as threatened or endangered, but if it becomes listed one day, then that designation will trigger a number of mitigation and conservation measures that will be difficult and expensive for agencies and private landowners. Additionally, there’s been some exploration that if the population can become robust again, there is potential to grow and harvest this species as a specialty food product. Given the current slow food movement and interest in local products, Olympia oysters could fit well in this niche.

How much experience do you have with:

(a) Arc-Info – Little experience, used a bit with older versions of ArcMap.

(b) Modelbuilder and/or GIS programming in Python – I am comfortable with ModelBuilder, but have no experience with Python.

(c) R – Some experience; I took Stats 511 where we used R heavily in a series of lab exercises. I have not applied my own data in R.

(d) Image processing – I have used a variety of Adobe products for graphic design, including Photoshop and InDesign.