Tag Archives: HJ Andrews

Relation of Potential Soil Carbon to Tree Height and Density across HJA

Major uncertainties exist in the geospatial distribution of terrestrial carbon (C) sources and sinks and the factors that influence soil C distribution and change. Temperate and boreal forests serve as a net sink of atmospheric CO2, so my aim is to understand how landscape features and vegetation vary across the HJ Andrews Forest (HJA) and determine how those factors influence soil C storage and CO2 release.

Research Question: How might landscape features and vegetation characteristics relate to accumulation of soil carbon, using tree height and density as a proxy for soil C?

Dataset description: I used LiDAR data from 2008 and 2011 at 0.3-0.5 m vertical resolution and at 1 m2 horizontal resolution covering the Lookout Creek Watershed and the Upper Blue River Watershed to the northern extent of HJA. These LiDAR data include a high hit model (tree tops) and a bare earth model from 2008. I also downloaded a shapefile of polygons of the reserved control areas within the HJA forest boundaries and used these areas to subset some of my later analyses.

Hypotheses:

  1. Large storm events primarily topple larger (older) trees on ridges and upper slopes. Soils in these positions are typically thinner, making them quickly saturate from precipitation and making trees in those positions more vulnerable (Lal 2005; Overby 2003). In addition, the tallest trees on ridges may be exposed to greater risk from lightning strikes, greater wind gusts and more snow accumulation, resulting in shorter trees dominating ridgelines. Shorter trees have smaller canopies and occupy less physical space than taller trees, so shorter trees occupy denser stands and as they grow taller, each individual tree occupies more space. Because of these factors, I hypothesize that shorter trees that are more closely spaced will cluster more densely along ridges than other landforms.
  2. I hypothesize that the tallest trees will be more densely clustered at low elevations along valleys because waterways carve out valleys and bring nutrients that then accumulate and build thick, carbon-rich soil. These areas tend to not suffer from the moisture limitations of ridges and steep slopes and are more protected from strong winds, so trees in these areas can maximize growth based on available solar radiation.
  3. High density young forest stands exhibit characteristics similar to mature forests, like closed canopies and high LAI. However, recently disturbed forests have higher nutrient availability than undisturbed forests which has been shown to cause a shift in C-allocation from below- to aboveground, so I expect younger stands to negatively correlate with soil C accumulation.
  4. I hypothesize that because vegetative growth is limited by solar radiation, vegetative growth on slopes that face S and W have greater exposure to solar radiation and will result in greater overall biomass (more dense stands of smaller trees, but not necessarily more tall trees). However, if the slope is too steep, I expect this pattern to diminish since trees will experience greater wind and water erosion and more tree mortality.

Note: I have yet to test hypotheses 3 and 4, but they are avenues for continuing research.

Approaches: I used a k-nearest neighbor analysis and k-means to cluster trees into 10 height classes and related each tree in the center of each height class to its distance from its 30 nearest neighboring trees of the same height class. I ran several hot spot analyses on tree heights and tree distances (tree density). Since my original hot spot analyses were subset by height class, which led to the algorithm naturally finding the arbitrary bins (upper and lower bounds) the height classes were based on, I performed new hot spot analyses on all the trees and all the tree distances within just the reserved control areas of the forest. I performed a linear regression to compare the regions of taller than expected trees to the regions of greater than expected distances between tree and Chi-squared tests independence for the hot spot analyses to compare hot spots and cold spots with variables like tree spacing, slope and elevation. Since hot spots between tree height and spacing did not overlap in all cases, I wanted to know what landscape features might explain this difference. Covariates I explored included slope, aspect, elevation, and landform. I calculated a confusion matrix between the Z-scores of height and distance for all the hot spot bins (-3,-2,-1,0,1,2,3), then further constrained the analysis to only the most extreme hot and cold spots (-3 and 3). I then compared mean height, distance, slope and elevation between the four combinations of the extreme hot and cold spots.

My objective for the final synthesis was to find regions that are higher or greater than expected for given parameters (height and tree density) and group these clusters into polygons where they overlap. I hypothesized that that soil C accumulation positively correlates with clusters of trees that are both taller and more densely spaced than expected. Conversely, I hypothesized that shorter trees that are more widely spaced negatively correlate with soil C accumulation. Therefore, overlapping hot spots of tree height and tree density and overlapping cold spots of tree height and tree density are prime areas for my planned future soil C sampling sites.

Results: I produced maps and statistical relationships between tree height and tree spacing. These included hot spot maps overlaid on elevation maps, density plots, graphical comparisons of Z-scores and many others (see previous blog posts). I found that hot spots of taller than expected trees overlapped with hot spots of greater than expected distances of trees, but not in all cases (Fig 1). When I examined a table of each of the distance hot spot bins compared with each of the height hot spot bins, I found that the most compelling correlations were between bins (0,0), (-3,-3) and (3,3) with 5%, 33% and 43% of total data, respectively. The proportion of data not covered by overlapping hot spots and by overlapping cold spots was minimal, but when mapped was visually compelling enough to warrant further exploration. I was particularly interested in areas with short trees and greater than expected distance between trees (bin -3,3) and areas with tall trees and shorter than expected distance between trees (bin 3,-3). I expect that bin (-3,3) would correlate with less soil C, so identifying those areas could be useful to sampling. I expect that bin (3,-3) would correlate with greater soil C. I can identify from Fig 2 where statistically significant hot spots and cold spots are spatially close and plan to sample heavily in those areas.

Fig 1. Tree height vs. distance between trees from hot spot analyses shows a highly linear pattern.

Fig 2. Tree density (mean distance in m between 30 closest trees) hot spot bins compared with tree height (m) hot spot bins in Lookout Mountain area. Bin comparisons (-3,-3) are areas of shorter than expected trees in more dense stands. Bin (3,-3) is taller than expected trees that are in more dense stands. Bin (3,3) is taller than expected trees in more sparsely populated stands.

 

Significance: Many of my results thus far confirm ecological phenomena that we already know to be true across forested landscapes. For example, I examined tree distance to stream and found that taller trees were more common closer to streams and less common at greater distances from streams. This makes sense with other landform features like valleys and ridges, so smaller trees tend to be along ridges (cold spots of tall trees according to my analysis) and taller trees tend to be in valleys. However, I have identified regions of taller than expected trees and regions of shorter than expected trees, and if those correlate with respectively more and less soil C, that provides evidence for LiDAR as an effective way to quantify soil C. If other landscape features that can be determined from geospatial data co-vary with tree height and/or tree density, it may be possible to quantify soil C at fine resolution. These data can be used to identify areas that have the potential to sequester more soil C and forest management could be tailored to support those regions.

Geospatial Learning: I learned a ton about ArcMap, QGIS and packages that were new to me in R like spatstat, caret, nngeo, and others. I had very little experience with and GIS before this class, so it was a steep learning curve but I’ll continue to learn as I perform this final synthesis. I learned how to perform hot spot analyses in ArcMap and export them to work with in Q. I learned how to manipulate spatial data in R and load it in Q and ArcMap for viewing. These are only a few examples of the many things I’ve learned!

Statistics Learning: I learned the limitations of hot spot analyses, with one of my criticisms being that it’s basically a smoothing function. Since the LiDAR dataset I’m using is basically a census of tree heights, running hot spot analyses is reducing the information in that dataset unnecessarily. I already knew that my data were spatially autocorrelated, so I had to take that into account for all of my analyses. I learned that the confusion matrix is great for visually discerning where a model predicts best and where it doesn’t predict well, but that the scientist must figure out the reasons for the good or poor predictions.

Because interactions among geomorphic processes, landforms, and biota occur on various temporal and spatial scales, it is necessary to understand the scale of an ecosystem process when performing spatial analyses. It is also necessary to consider the potential reasons why a particular spatial pattern did not emerge from an analysis. Reasons can range from data not being explained by expected process or multiple spatial scales of interactions, failing to use the right tool, not being at the right spatial scale, data being too coarse, failing to include the right variables or a flaw in one’s fundamental idea about how the system works. This is why it is important to (1) formulate clear, specific hypotheses before performing an analysis and (2) explore spatial patterns using multiple approaches if possible.

Sources:

Harmon M.E. (2009) Woody Detritus its Contribution to Carbon Dynamics of Old-Growth Forests: the Temporal Context. In: Wirth C., Gleixner G., Heimann M. (eds) Old-Growth Forests. Ecological Studies (Analysis and Synthesis), vol 207. Springer, Berlin, Heidelberg

Lal, R. (2005). Forest soils and carbon sequestration. Forest ecology and management220(1-3), 242-258.Overby et al. 2003: Impacts of natural disturbance on soil C dynamic in forest ecosystems: Soil C sources include litter fall and tree mortality, root turnover and root exudates.

Dixon, R. K., Solomon, A. M., Brown, S., Houghton, R. A., Trexier, M. C., & Wisniewski, J. (1994). Carbon pools and flux of global forest ecosystems. Science263(5144), 185-190.

Final Project: Western Hemlock Dwarf Mistletoe Spatial Patterns and Drivers

Western Hemlock Dwarf Mistletoe

Western hemlock dwarf mistletoe is a hemi-parasite of primarily western hemlock trees. It absorbs water, nutrients, and carbohydrates from its host. Infected branches produce small structures called aerial shoots that have minor photosynthetic capabilities, but are primarily for pollination and seed production and require a certain amount of light to emerge from an infection site. Seeds are explosively discharged from aerial shoots when fully mature. Once landing on a susceptible host branch they being germination and mechanical penetration of the host branch.

Research Question

I wanted to explore the spatial patterns of dwarf mistletoe infections after a mixed severity fire and the roles post-fire, forest structure and fire refugia play.

How does the spatial pattern of post-fire stand structure and species composition affect spatial patterns of dwarf mistletoe distribution throughout the stand through physical barriers to susceptible hosts and seed dispersal?

Description of Dataset

I have a 2.2 ha rectangular study area (Wolf Rock), just northwest of the HJ Andrews Exp. Forest that was stem mapped in 1992. Each tree has an X and Y coordinates, as well as the tree species, height, diameter, and a variety of other tree inventory related data. I only have ages for a few western hemlocks, but many more for Douglas-fir. Only one western hemlock core was identified as being over 100 years old, at 170 years. I have a polygon layer for the fire refugia that are documented on the study area. For the western hemlocks I have a presence/absence of western hemlock dwarf mistletoe as well as a severity measure. However, I am unsure of the scale and ethod of rating, so I will not be using it. The infection presence and absence are from a single measurement season.

Hypotheses and Predictions

Western hemlock dwarf mistletoe spreads easily through moderate canopy densities, where distances between trees are close to the average dispersal distance of 2-4 meters. Disturbances that create patchy gaps increase likelihood of spread because of increased light reaching infected branches and increase the rate of infection abundance because of the lack of physical barriers such as high densities of branches and foliage.

Disturbances can also remove the disease from a forested stand by killing or knocking down infected western hemlocks. Post-disturbance regeneration can enable or inhibit the parasite’s reintroduction to the stand. Non-susceptible hosts such as Douglas-fir or western redcedar regenerate readily alongside western hemlock and will intercept seeds. Some gaps are only conducive to western hemlock regeneration which will be readily infected by any surviving infected western hemlocks post-disturbance.

The Wolf Rock stand experienced two fires that created a mosaic of ~110 year old regeneration and >110 year old remnant trees in fire refugia. Remnant, infected western hemlocks survived the most recent fire in those fire refugia. From this stand structure, I have several hypotheses and predictions:

  1. Remnant, infected western hemlocks form the center of new infection centers post disturbance so infections in the regenerating susceptible hosts, will be clustered around these remnant trees.
  2. Non-susceptible hosts regulate the rate of infection spread through physical barriers to dispersal, so infection cluster size will have an inverse relationship with non-host density.
  3. Western hemlock infection spreads from a central remnant tree, so uninfected western hemlocks will have a dispersed spatial pattern from the remnant western hemlock, regardless of non-host density.
  4. Post-fire regeneration with higher western hemlock composition will have more susceptible hosts and less physical barriers to spread so infection cluster size will have a positive relationship with western hemlock composition.

Analysis Approaches

For Exercise 1,  I wanted to know about the spatial pattern of western hemlock trees infected with western hemlock dwarf mistletoe. I used a hotspot analysis to determine where clusters of infected and uninfected trees were in my 2.2 ha study area. I discovered a hot spot and a cold spot, indicating two clusters, one of high values (infected) and one of low values (uninfected).

For Exercise 2, I wanted to know how the spatial pattern of these clustered infected and uninfected trees were related to the spatial pattern of fire refugia outlined in my study site. I used Geographically Weighted Regression to determine the significance of this relationship, however I did not find a significant relationship between a western hemlock, its intensity of clustering and infection status, and it’s distance to its nearest fire refugia polygon edge.

For Exercise 3, I wanted to know how the spatial patterns of remnant western hemlocks related to the spatial patterns of regeneration western hemlocks, uninfected western hemlocks, and non-western hemlock tree species. I used the Kcross and Jcross functions in spatstat in R and prepared the data in ArcMap to analyze spatial relationships between trees. I found clustering between regenerating western hemlocks and non-hosts to remnant western hemlocks but the uninfected western hemlock’s spatial pattern was independent of the remnant western hemlocks.

Results

I produced several maps that showed the spatial patterns in my study site which were helpful for understanding and investigating further relationships. I produced several charts from my exercise 3 analysis that were useful for visual representations of the relationships between trees. In Exercise 3 I also produced a map from raster addition that gave me the best visualization of where western hemlock and non-host trees were in the stand. Exercise 2 produced a map and statistical relationship but was not significant in explaining a hemlock’s infection and density status.

Significance

The biggest finding was that the fire refugia polygons are not significant for my analysis, the remnant infected hemlocks are more important explanatory variables in spatial patterns of infected trees. This supported hypothesis 1. Because refugia can be effectively defined using the “for what, from what” framework, western hemlock dwarf mistletoe refugia from fire could be delineated differently in the field focusing only on the remnant western hemlocks.

Data was not available to determine the rates of infection spread over time because I only had one season of measurements. I also could not evaluate the size of clusters because I did not have GPS points of infection center extent so I could not assess hypothesis 2 and 4 directly. However using Ripley’s K and the cross-variant I could see how the clusters changed over distance. I learned that infected, regenerating trees are going to be found closer to the remnant infected trees and that non-host trees may be blocking the spread of mistletoe into an uninfected patch because they were found clustered around remnant trees as well. This provides support for Hypothesis 1, 2, and 3.

Silvicultural prescriptions with the goal to preserve old growth forest structure, but that want to limit the amount of dwarf mistletoe in a forest can appropriately remove old infected hemlocks to preserve infection spread and extent. These prescriptions will also be able to predict future dwarf mistletoe spread. Forest operations that simulate disturbances that leave remnant hemlocks such as harvests, can incorporate spread predictions to limit regeneration being infected.

Learning From The Process

I learned a lot about the spatial analyst tools in ArcMap and how to produce easily interpretable maps and graphs. I also learned how to use several function in spatstat. I learned a lot about interpreting R outputs and spatial. Spatial autocorrelation can tell you a lot about what your data are doing but I thought it was most useful to be able to see on a map or chart what is specifically happening.

Landscape Patterns as Predictors of Tree Height

Question: Which landscape features correspond to clusters of greater than expected trees?

Methods: I performed two Hot Spot Analyses in ArcMap; one on Hot Spots of tree height and another on hot spots of distance between trees. Both were constrained to the reserved control areas of the HJ Andrews Forest. Hot spots in tree height are regions of greater than expected tree height, while hot spots in distances between trees are regions of greater than expected distance between individual trees (more dispersed trees). Hot spots and cold spots of each analysis generally overlapped. However, hot spots between tree height and spacing did not overlap in all cases, so I wanted to know what landscape features might explain this difference. Covariates I explored included slope, aspect, elevation, and landform. Since the end goal is to find landscape features that may correlate with amount of soil carbon, I conducted this analysis with the assumption that taller trees may correlate with regions of greater soil carbon. I used the package ‘caret’ in R to calculate a  confusion matrix between the Z-scores of height and distance for all the hot spot bins (-3,-2,-1,0,1,2,3), then further constrained the analysis to only the most extreme hot and cold spots (-3 and 3). I then compared mean height, distance, slope and elevation between the four combinations of the extreme hot and cold spots (Table 1).

Results: Regions of taller than expected trees often correspond to regions of greater than expected distances between trees, which agrees with current forest growth models (Fig. 1). Hot spots of tall trees are typically in valleys and cold spots are commonly on ridges (Fig 3 & 4). When we zoom in to the Lookout Mountain area of HJ Andrews, we see that hot spots of tall trees are more concentrated in valleys and on footslopes, and cold spots are closer to mountain ridges (Fig 3). When compared with the distance hot spot map of the same area, we see that cold spots go much further down the footslopes and even into the valleys in some cases (Fig 4). So although we have evidence for a strongly linear relationship between height and distance between trees, we also have evidence that they do not fully explain each other and other landscape features are likely at play.

Fig 1. Distance Z-scores vs. Height Z-scores from hot spot analyses show a linear relationship.

Fig 2. HJ Andrews elevation with reserved control areas in orange and

inset area of Lookout Mountain hot spot maps (below)

 

Fig 3. Hot Spot Analysis showing hot spots of tree heights (tallest trees)

in the Lookout Mountain area

 

Fig 4. Hot Spot Analysis showing the greatest distance between trees

in the Lookout Mountain area

 

An elevation band that correlates with occurrences of tall trees exists up to around 1100 m, after which point number of tall trees drops off substantially (Fig. 5). Certain aspects seem to correlate with taller trees, but those relationships are harder to tease apart and I have yet to fully explore them. Greater slopes tend to correlate with shorter trees, but this relationship is not linear. There is an interesting upwards trend at slopes between 30 and 50 degrees that seems to correlate with slightly taller trees, then a big drop in mean height Z-score at slopes of 60 degrees.

Fig 5. Aspect, elevation and slope compared with Z-scores of mean height.

A comparison of Z-scores from hot spot analyses of height and distances shows that although hot spots of height and distance tightly correlate, covariates that explain them are different (mean slope and elevation). When we compare the most extreme Z-scores to one another, slope, height and distance between trees are not particularly different. Mean elevation in three categories of Z-score is similar, but mean elevation in the fourth group (>3,>3) is significantly lower. A next step is to map out these

Table 1. Comparison between the most extreme Z-scores of tree height and tree spacing.

Height Z-Score Distance Z-Score Mean Height (m) Height_SD Mean Distance (m) Distance_SD Mean Slope (m) Slope_SD Mean Elevation (m) Elevation_SD
<-3 <-3 22.8 10.5 5.1 2.4 27.9 10.5 1285 294
<-3 >3 24.5 11.2 5.6 2 26.9 4.5 1377 153
>3 <-3 34.8 7.1 5 2 31.8 5.9 1310 44
>3 >3 39.5 16.7 4.7 2.6 26.2 11 934 188

Critique: These analyses are still based on Hot Spot Analyses, so they still comes with the same criticisms as previous Hot Spot Analyses. One of these criticisms was that it’s basically a smoothing function. Since the LiDAR dataset I’m using is basically a census of tree heights, running hot spot analyses is reducing the information in that dataset unnecessarily. I have yet to map out regions that were well-predicted and poorly predicted spatially, so I cannot fully discuss the merits of the confusion matrix method.

 

Comparing the Spatial Patterns of Remnant Infected Hemlocks, Regeneration Infected Hemlocks, and Non-Hosts of Western Hemlock Dwarf Mistletoe

Overview

For Exercise 1, I wanted to know about the spatial pattern of western hemlock trees infected with western hemlock dwarf mistletoe. I used a hotspot analysis to determine where clusters of infected and uninfected trees were in my 2.2 ha study area (Map 1). I discovered a hot spot and a cold spot, indicating two clusters, one of high values (infected) and one of low values (uninfected).

For Exercise 2, I wanted to know how the spatial pattern of these clustered infected and uninfected trees were related to the spatial pattern of fire refugia outlined in my study site. I used Geographically weighted Regression to determine the significance of this relationship, however I did not find a significant relationship between a western hemlock, its intensity of clustering and infection status, and it’s distance to its nearest fire refugia polygon.

This result led to the realization that the polygons as they were drawn on the map were not as relevant as the actual “functional refugia”. I hypothesized that, after the 1892 fire, the only way for western hemlock dwarf mistletoe to spread back into the stand would be from the trees that survived that fire, or “remnant” trees. These would then infect the “regeneration” tree that came after the 1892 fire. The functional refugia I am interested in are defined by the location of the remnant western hemlocks. I also hypothesized that the spatial pattern of non-susceptible host tree (trees that were not western hemlocks) would play a role in the distribution of the mistletoe.

Question Asked

How are the spatial patterns of remnant western hemlocks related to the spatial patterns of regeneration western hemlocks, uninfected western hemlocks, and non-western hemlock tree species, and how are these relationships related to the spread of western hemlock dwarf mistletoe in the stand?

The Kcross and Jcross Functions

The cross-type functions (also referred to as multi-type functions) are tools capable of comparing the spatial patterns of two different type events (type i and j) in a similar spatial window, of some point process, X. It does this by assigning labels to the events differentiating the type and summarizing the number or distance, of and between events, at differing spatial scales, or radius circles (r).

The statistic Kij (r) , summarizes the number of type j events, around a type i event at a distance of r, or a point process X. Deviations of the observed Kij(r) curve from the the Poisson curve, or if type j events are truly randomly distributed, indicates dependence of type j events on type i events. Similar results can be obtained from the regular Ripley’s K: deviations above the curve indicate clustering and deviations below indicate dispersal.

(Incredibly helpful and interactive explanation link: https://blog.jlevente.com/understanding-the-cross-k-function/)

The statistic Jij(r) = (1 – Gij(r))/(1-Fj(r)) summarizes the shortest distance between a type i and j event and compares it to the empty space function of type j event. This is another test for inferring independence or dependence of type j events to type i. Deviations of the Jij(r) curve the value of 1, indicate levels of dependence of the events to each-other. Specific deviations from 1 can be hard to interpret without an understanding of the Fj(r) function so imagining it stationary in the ratio makes it easier. As Gij(r) increases, the numerator shrinks, creating a smaller Jij(r) statistic. Deviations below 1 indicate that type i and j events are dependent and that as r increases, the shortest distance between points of type i and j increases. As Gij(r) decreases, the numerator grows, creating a larger Jij(r) statistic. Deviations above 1 indicate that type i and j events are dependent and that as r increases, the shortest distance between points of type i and j decreases.

Methods Overview

In R, the package “spatstat” provides a suite of spatial statistic functions including the cross-type functions. In order to use these you need to create a “point pattern process” object. These objects incorporate X and Y coordinates, and a frame of reference, or a “window,” and give spatial context top a list of values. Then marks are applied to these points that create the necessary multi-type point pattern process object. These marks serve to distinguish the type i and type j events described earlier in your analysis. Then running the “Kcross()” or “Jcross()” functions with the specified type events produces a graph that you can interpret, very similar to producing the normal Ripley’s K plot.

  • I took my X – Y coordinates of all trees on the stand and added a column called “Status” to serve as my mark for the point pattern analysis.
    1. The four statuses were “Remnant,” “Regen,” “Uninfected,” and “NonHost” to identify my populations of interest.
      1. I had access to tree cores, so I identified trees that were older than 170 yrs old and these trees’ diameters served as my cutoff for the “Remnant” diameter class.
      2. All trees DBH > 39.8 cm.
    2. Doing this in ArcMap removed steps I would have to have taken when I migrated the dataset to R.
    3. I removed all the dead trees because I wasn’t concerned with them for my analysis.
  • I exported this attribute table to a csv and loaded it into R Studio.
  • I created the boundary window of my study site using the “owin()” function, and the corner points from my study site polygon.
  • The function “ppp()” creates the point pattern object and I assigned the marks to the data set using the “Status” column I created in ArcMap
    1. It’s important your marks are factors otherwise it is not converted into a multi-type point pattern object.
  • The last step is running the “Kcross()” and “Jcross” to compare the “Remnant” population to the “Regen,” “Uninfected,” and “NonHost” populations.
    1. This produced 6 plots, 3 of each type of cross-type analysis.
    2. Compare these easily using the “par()” function, for example:

par(mfrow = c(1,3))

plot(Ex3.Kcross1)

plot(Ex3.Kcross2)

plot(Ex3.Kcross3)

This produces the three plots in a single row and three columns.

Results

Because I am assuming the remnant, infected western hemlock trees are one of the main factors for the spread of western hemlock dwarf mistletoe and that they are the center of  new infection centers on the study site, I did all my analysis centered on the remnant trees (points with status = “Remnant” treated as event type i).

1)   i = Remnant, j = Regen

The first analysis between remnant and regeneration trees demonstrate that there is dependence on the two events to each other. At fairly small distances, or values of r, infected western hemlocks that have regenerated after the 1892 fire cluster around infected remnant western hemlocks that survived the 1892 fire. This stands to reason because we assume that infected trees will be near other infected trees, and that infection centers start usually with a “mother tree.” In this case the remnant trees serve as the start of the new infection centers. The Jcross output also shows me that the two types of trees are clustered using the frequency of the shortest distances. After ~8 meters the two tree types exhibit definite clustering. In terms of the function, the Gij(r) in the numerator of the Jij(r) function is approaching 1, or the highest frequency of very short distances.

2)   i = Remnant, j = Uninfected

The Kcross plot from the second set of analyses between remnant and uninfected trees demonstrates that there is independence between the two events up to ~15 meters. After that, the trees exhibit slight clustering effects. The lack of dispersal tendencies is strange for these two types of trees because we expect uninfected trees to be furthest away from the center of infection centers. The presence of clustering may be indicative of the small spatial scale of my study site. It may also be that the size of the infection centers are only about 15 meters (if we assume that remnant trees are the center). The Jcross plot shows something similar: at small distances the types of trees seem independent and then around 8 meters they exhibit clustering.

3)   i = Remnant, j = NonHost

The Kcross from the last set of analyses between the remnant trees and the non-hosts demonstrates a similar pattern exhibited by the regeneration trees. After about 4 meters, the trees tend to be clustered. This is an interesting find because if the non-hosts cluster to remnant trees but uninfected trees are independent, then the non-hosts may be playing a role in this. The Jcross plot shows the same: the two types of trees exhibit clustering.

4)   Comparing Kcross Functions with eval.fv()

A useful way to compare patterns of Kcross functions is using the eval.fv function. The titles of each plot tell which Kcross was subtracted from which; note the difference in scales. The first plot shows that the regenerating trees’ spatial pattern as related to remnant trees is very different from the uninfected trees’ pattern. The regenerating trees’ spatial pattern is much more similar to the non-hosts’ spatial pattern at short distances, until about 15 meters. Then the patterns differ with the regenerating trees exhibiting more of a clustering tendency. However the scale is much smaller than the other two graphs. Lastly, the third plot shows the difference between the non-host trees’ spatial pattern and the uninfected trees’ spatial pattern. There appears to be a stepwise relationship where, at very near and very far distances the non-host trees are much more clustered, but at moderate distances the differences may be less dramatic.

Critique of Cross-Type Functions

The amount of easily interpretable literature on the spatstat package as a whole is sparse, although a wealth of very technical information exists. The function was easy to use and execute though and so was the process of creating the point pattern object. These two functions can clearly show how the spatial patterns of the two types of events change with scale. It would be helpful if there was a way to compare three or more types of events. The last drawback is that there is a lack of specific information for each point on your map or study site. This pattern that is generalizable to a whole set of points may not be as useful when trying to put together a story, such as the story of a stand’s development through time.

Additional Raster Analysis

The last critique of the cross-type functions led me to attempt a visualization of these patterns on my stand. Very briefly, I determined densities of the infected, uninfected, and the non-host trees using the Kernel Density function in ArcMap. Then I classified these densities using natural breaks and coded these for raster addition. After adding all three density rasters together, I coded each unique density classification combination to tell me how the densities of the populations appeared in the study site.

It appears that there are distinct patches of high density separated by areas of low density. On the eastern side of my study site, it appears that the high density areas of infected trees cluster with the remnant infected trees. An interesting interaction is occurring between the high density patches of uninfected trees and infected trees in the western portion of my study site. The mechanism for the seemingly clear divide may be the non-TSHE trees.

Spatial Distribution of Trees by Height Class, Slope and Elevation in the HJ Andrews Forest

Guiding Questions: How do distances between trees differ depending on tree height? How does the spatial pattern of tall trees relate to the spatial pattern of slope and elevation?

Methods: I used a combination of ArcMap, QGIS and R to perform analyses and view results. I used the results of my previous distance analysis within the HJ Andrews Forest, which grouped individual trees into ten height classes and calculated the mean distance between trees within the same height class, to correlate tree spacing with other spatial phenomena. I wanted to know if hot spots in within class tree spacing correlated with hot spots in tree height, so I examined hot spots and cold spots of tree distances within each height class and compared them to tree heights, slope and elevation. Height class 1 is the shortest class of trees and height class 10 is the tallest class of trees.

I used the Hot Spot Analysis Tool in the Arc Toolbox > Spatial Statistics Tools > Mapping Clusters > Hot Spot Analysis (Getis-Ord Gi*) to perform a Hot Spot Analysis on each of the ten height classes by mean distance to the 30 closest trees of the same height class. In the context of this analysis, the interpretation of a hot spot is that it is a region of greater than expected distances between trees of the same height class. For example, in the shortest height class, 1, hot spots are regions of greater than expected mean distance between a short tree and the 30 closest short trees. Cold spots would then be regions of closer than expected mean distance between short trees.

The Hot Spot Analysis in ArcMap used a self-generated distance band of 113m for my original hot spot analysis of the global dataset (not broken up by height class), so I decided to use a distance band of 100m for each subsequent hot spot analysis. Each height class has a different number of total trees in it, so by holding the distance band constant, I hoped to avoid influence from any differences in total number of trees between height classes.

After viewing the hot spot results, I plotted the z-scores of heights for each height class against the z-scores of the distances between trees to visually examine their relationship. If both heights and distances between trees were perfectly normally distributed, one would expect a circular distribution on the density plots with a slope of zero.

I then compared the mean slopes, elevations, and standard deviation of slopes and elevation within height classes across the entire forest. Since HJA is a research forest with many different management areas, including harvested patches and research plots, I limited the next part of the analysis to only within control areas of the forest. I downloaded the most recent (2014) land use designations from the HJA data repository (http://andlter.forestry.oregonstate.edu/data/abstract.aspx?dbcode=GI008). For this analysis, I used Entity Title 3: Reserved areas (controls) within the Andrews Experimental Forest. I compared slopes and elevations within the control plots only by height class, to see if there were differences between the global dataset and the control regions of the forest.

Results:

The density plots of height z-score versus distance z-score revealed a different pattern between smaller height classes of trees and tree spacing than the relationship between larger height classes of trees and tree spacing. As we go from shorter height classes of trees to taller height classes, the density plot distributions change (Figures 1-10). There is strong evidence of positive correlation between hot spots of short trees and hot spots of distance between short trees, but from height class 6-10, there is little to no evidence of a relationship between hot spots of trees and distance between them. Tall trees are more or less distributed randomly throughout the forest.

Figure 1

 

Figure 2

 

Figure 3

 

Figure 4

 

Figure 5

 

Figure 6

 

Figure 7

 

Figure 8

 

Figure 9

 

Figure 10

There is clearly some structure to the density plots (especially in height classes 1-5), so we can assume that the trees are not randomly distributed and that there is a relationship between height and distance between trees. I compared mean and standard deviation of slope, as well as mean and standard deviation of elevation for each height class of trees (Table 1). Mean slopes do not significantly differ between height classes, so slope is likely not a main driver of tree height. However, there is some evidence that tree heights differ at lower and higher elevations, with the shortest height class of trees at a mean elevation of 1014 m, and the tallest height class at a lower mean elevation of 831 m. It’s important to note that mean elevations have large standard deviations, so the trend may not be as strong as it first appears. I wanted to know if there was more evidence for this pattern, so I calculated the same statistics for subsets of the hotspot analyses constrained to only the control areas of the forest (Table 2) to get an idea of how management may or may not influence the relationship between tree height and tree spacing throughout the forest. Mean slopes and elevations, accounting for standard deviation from the mean, do not differ significantly between the global and control datasets, meaning that the control regions are reasonable representations of the rest of the forest. To examine this further, I examined the same data for the entire forest excluding the control areas (Table 3). The same pattern holds between the three datasets; class 10 is the only class that is consistently at lower elevations across the forest. I made two density plots to display this relationship. Figure 11 shows the distribution of elevation for the shortest height class (class 1) of trees (green) versus the global dataset of all trees (red). The short trees follow the same distribution as the rest of the dataset, meaning that they are dispersed more or less evenly across elevations. Figure 12 shows the distribution of the tallest height class of trees (class 10; light blue) by elevation versus the global dataset of trees by elevation (red). This clearly shows that tall trees are not distributed at higher elevations.

Figure 11. Density of trees by elevation in height class 1 (shortest trees; green) versus global dataset of all trees (red).

 

Figure 12. Density of trees in height class 10 (tallest trees; light blue) versus global dataset of all trees (red) by elevation.

Table 1: Global dataset

Height Class Mean Slope SD Slope Mean Elevation (m) SD Elevation
1 23 11.7 1014 294
2 23 11.6 1008 291
3 25 11.6 990 295
4 25 11.5 948 291
5 25 11.3 931 293
6 26 10.9 972 291
7 27 10.5 982 250
8 26 10.6 959 221
9 25 10.7 926 207
10 23 11.2 831 197

 

Table 2: Subset of control regions

Height Class Mean Slope SD Slope Mean Elevation (m) SD Elevation
1 27 11.9 1104 317
2 27 11.5 1136 328
3 28 11 1157 329
4 28 10.8 1143 311
5 28 10.2 1133 285
6 28 10 1071 262
7 28 10 992 228
8 27 10.6 955 193
9 26 10.8 935 187
10 24 11 869 189

 

Table 3: Global dataset excluding control regions

Height Class Mean Slope SD Slope Mean Elevation (m) SD Elevation
1 22 11.4 991 284
2 22 11.4 979 273
3 24 11.6 952 272
4 24 11.5 902 266
5 24 11.5 867 265
6 25 11.3 908 290
7 26 10.7 973 267
8 25 10.5 964 242
9 23 10.4 919 221
10 22 11.2 793 198

 

Critique of the method:

A criticism of hot spot analysis is that it’s basically a smoothing function that places a focal around an area but does not account for the distribution of values within that area. So, the tallest tree in the dataset could be in cold spot (region of shorter than expected trees) and the hot spot analysis would give you no indication of that, so one may miss out on potentially useful/interesting information.

This is only a cursory look at the data and a next step is to more closely examine how slope, elevation and aspect influence distribution and height of trees, particularly within the control areas of the forest.

Fire Refugia’s Effects on Clustering of Infected and Uninfected Western Hemlock Trees

Overview

For Exercise 1, I wanted to know about the spatial pattern of western hemlock trees infected with western hemlock dwarf mistletoe. I used a hotspot analysis to determine where clusters of infected and uninfected trees were in my 2.2 ha study area (Map 1). I discovered a hot spot and a cold spot, indicating two clusters, one of high values (infected) and one of low values (uninfected).

In my study site, 2 fires burned. Once in 1829, burning most of the stand, and then again in 1892, burning everywhere except the fire refugia (polygons filled in blue). This created a multi-storied forest with remnant trees located in the fire refugias. One component of the remnant forest are infected western hemlocks. These remnant hemlocks serve as the source of inoculum for the hemlocks regenerating after the 1892 fire.

For Exercise 2, my research question was: How does the spatial pattern of fire refugia affect the spatial pattern of western hemlock dwarf mistletoe?

I predicted that a cluster of infected western hemlocks are more likely to be next to a fire refugia than a cluster of uninfected trees. In order to assess this relationship, I used the geographically weighted regression tool in ArcMap.

Geographically Weighted Regression

Geographically weight regression (GWR) works by creating a local regression equation for each feature in a data set you want to analyze, using an explanatory variable(s) to predict values for the response variable, using the least squares method. The Ordinary Least Squares (OLS) tool differs from GWR because OLS creates a global regression model (one model for all features) whereas GWR creates local models (one model per feature) to account for the spatial relationship of the features to each other. Because the method of least squares is still used, assumptions should still be met for statistically rigorous testing. The output of the GWR tool is a feature class of the same type as the input, with a variety of attributes for each feature. These attributes summarize the ability of the local regression model to predict the actual observed value at that feature’s location. If you have an explanatory variable that explains a significant amount of the variation of the response variable, this is useful for seeing how its coefficient varies spatially.

Execution of GWR

To use this tool, I quantified the relationship between the trees and the fire refugia. I used the “Near” tool for this to calculate the nearest distance to a fire refugia polygon’s edge. This was my explanatory variable. My response variable was the z-score that was output for each tree from the Optimized Hot Spot Analysis. Then I ran the GWR tool. I then used the Moran’s I tool to check for spatial autocorrelation of the residuals. This is to check the clustering of residuals. Clustering indicates I may have left out a key explanatory variable. The figure below displays my process.

I tested the relationship between nearest distance to a fire refugia polygon’s edge and the z-score that was output for each tree from the Optimized Hot Spot Analysis using OLS, which is necessary to develop a well specified model. My R2 value for this global model was 0.005, which is incredibly small. Normally I would have stopped here and sought out other variables to explain this pattern, but for this exercise I continued the process. 

Results

This GWR produced a high global R2 value of 0.98 (Adj R2 0.98) indicating that distance to refugia does a good job of explaining variance in the spatial pattern of infected and uninfected trees. However, examining the other metrics for the local model performance gives a different picture of model performance.

Map 2 displays results for the coefficients for the explanatory variable of distance to nearest refugia. As this variable changes, the z-score increases or decreases. These changes in z-scores indicate a clustering of high or low values. From examining the range of coefficient values, the range is quite small, -0.513 to 0.953. This means that across my study site, the coefficient only changes slightly from positive to negative. In the north western corner, we see a cluster of positive coefficient values. Here, as distance to refugia increases, the z-score of trees increases, predicting a clustering of infected trees. These values are associated with high local R2 values (Map 4). In other places of the stand we see slight clustering of negative coefficients, indicating distance to refugia decreases the z-score of trees, predicting a clustering of uninfected trees.

Map 3 displays the standardized residuals for each tree. Blue values indicate where the local model over-predicted what the actual observed value was, and red values are under-predictions. When residuals from the local regression models are distributed randomly (i.e. not clustered or dispersed) over the study area, then the geographically weighted regression model is fit well, or well specified. The residuals of the local regression models were significantly clustered. (Moran’s Index of 0.265, p-value of 0.000, z-score of 24.344). Because we can observe clustering in my study area of residuals, there is another phenomenon driving the changes in z-scores; in other words, driving the clustering of infected and uninfected trees.

From the previous two map evaluations I saw that the distance of a tree to fire refugia was not the only explanatory variable necessary to explain why infected and uninfected trees clustered. Map 4 displays the local R2 values for each feature. The areas in red are high local R2 values. We see the northwestern corner has a large number of large values which correspond to a cluster of small residuals and positive coefficients. Here, distance to fire refugia explains the clustering of infected trees well. The reverse is observed in several other places (clusters of blue) where distance to fire refugia does not explain why infected or uninfected trees cluster. In fact the majority of observations had a local R2 of 0.4 or less. From this evaluation, I believe this GWR model using distance to refugia does a good job of explaining the clustering of infected trees, but not much else.

Critique

GWR is useful for determining how the coefficient of an explanatory variable can change across an area. One feature in a specified area may have a slightly different coefficient from another feature, indicating these two features are experiencing different conditions in space. This allows the user to make decisions about where the explanatory has the most positive or negative impact. This result is not something you can derive from a simple OLS global model. This local regression process is something you could do manually but the tool in ArcMap makes this process easy. The output of GWR is also easy to interpret visually.

Some drawbacks are that you need to run the OLS model first for your data to determine which variables are significant in determining your response variable. If not, then a poorly specified model can lead to inappropriate conclusions about the explanatory variable (i.e. high R2 values). Also, the evaluation of how the features interact in space is not totally clear. The features are evaluated within a fixed distance or number of neighbors, but there is no description for how weights are applied to each neighboring feature. Lastly, for incidence data, this tool is much harder to use if you want to determine what is driving the spatial pattern of your incidence data. Some other continuous metric (in my case a z-score) must be used as the response variable, making results harder to interpret.

Model Results Follow-Up

After finding that distance to a refugia was not a significant driver for the majority of trees, I examined my data for other spatial relationships. After a hotspot analysis on solely the infected trees, I found that the dispersal of infected trees slightly lined up with the fire refugia drawn on the map (Map 5).

Among other measures, forest structure was used to determine where fire refugia were located. Old forest structure is typically more diverse vertically and less clustered spatially. Also infected western hemlocks are good indicators of fire refugia boundaries because as a fire sensitive tree species, they would not survive most fire damage and the presence of dwarf mistletoe indicates they have been present on the landscape for a while. From the map we can see that the dispersal of infected trees only lines up with the refugia in a few places. This mis-drawing of fire refguia bounds may be a potential explanation for under-performance of the GWR model.

Point Pattern Analysis of Tree Distribution by Height in the HJ Andrews Forest

1. Given that HJ Andrew Experimental Forest is a 16,000-acre forest with manipulations spanning back to its establishment as a Long-Term Ecological Research (LTER) site in 1948, it is a highly spatially heterogeneous ecosystem. Forest harvest began in the 1950s and resulted in a mosaic of young plantation forest (~30 percent of total forest area) and old growth (~40 percent of forest) (http://andrewsforest.oregonstate.edu/). My objective is to quantify the spatial pattern of trees across the forest and eventually relate that to quantifiable landscape features.

Motivating Questions: How does the spatial pattern of trees vary across the HJ Andrews Forest? Specifically, I’m exploring the relationship between tree height and tree spacing. One specific question of interest is: How does the mean distance between trees in the same height class differ from the mean distance between a single height class of tree and all other trees? This question attempts to address the clustering vs. dispersion of trees by height.

This is an analysis of the spatial distribution of one variable, tree height, so a consideration of the internal processes that may influence the spatial distribution of this variable is necessary.

  • Microclimate caused by the clustering or dispersion of trees could be either an attraction or repulsion process. Microclimate influences relative humidity and exposure to wind, along with many other factors, so clusters of trees would tend to have different microclimate features than more dispersed trees.
  • Population and community dynamics will influence the spatial distribution of trees. The speed at which a colonizer can take over a space and competition between different colonizers influence the distribution. Aboveground and belowground tree growth adn spacing of trees will be influenced by these factors.
  • Source and sink processes in a forest may result from topographical features, like valleys and hillslopes. I expect valleys to be a source (capable of producing a surplus of organisms) sink they will tend to be in areas near streams, so not water limited and in areas that serve as catchments for nutrients, so not nutrient limited.
  • Spatial distribution of tree height certainly is different according to the scale. The spatial pattern looks different at a single tree scale compared with a 50-m scale, compared with a 5-km scale. Some of these differences are revealed by Figures 3, 4 and 5, below.

2. My approach is to use k-nearest neighbor to examine distance between a given tree and proximal trees. I created ten height classes by using kmeans to find ten cluster centers in the tree height data, then used K nearest neighbor to examine the distance between each cluster center and the 30 closest trees.

3. For this analysis, I used a LiDAR dataset of vegetation heights downloaded from the HJ Andrews online data portal (http://andlter.forestry.oregonstate.edu/data/abstract.aspx?dbcode=GI011). The selected data is from the third entry, titled, “Vegetation Height digital elevation model (DEM) from 2011 LiDAR, Upper Blue River Watershed (Oct 27th, 2011 – Nov 1st, 2011). A description of this dataset can be found here: http://andlter.forestry.oregonstate.edu/data/spatialdata/gi01103.htm.

I performed the majority of the analyses in R and used QGIS for data visualization. I used the ‘st_read’ function from the ‘sf’ package in R to read in the stem map shapefile (stem_map.shp). The stem map shapefile includes crown radii (m) and tree heights (m) as well as the point locations of trees.

Because the stem map shapefile essentially provides a census of trees within the HJA Forest, and that is (1) way too much data to deal with at one time, and (2) not useful to perform statistical analyses on since we can just look at the data mapped out and visually discern where trees are located, I decided to use kmeans to group the trees into 10 height classes, resulting in the data space being partitioned into Veronoi cells.

I used the ‘nngeo’ package to examine k-nearest neighbors relations between and within height clusters. I examined the relationship between a single tree (point) and its nearest neighbor of the same height class. I also examined the relationship between a single tree of one height class and its nearest neighbor of any height class to start to elucidate to what extent trees of similar heights cluster or are dispersed spatially.

I performed T-tests on each tree height class to test if there was a significant difference between (1) the mean distance between a tree and the next closest tree within the same height class and (2) the mean distance between a tree and the next closest tree of any height class. I calculated the mean, standard deviation and kurtosis of each height class distance (both within and between height classes).

4. Results

The table below shows the center of each of the ten height class groupings, mean distances between and within tree height classes, standard deviation and kurtosis of those means, and p-values. Results of all t-tests were significant (p<0.01), meaning that there is strong evidence that the within group mean distances are not equal to zero, so there are differences in mean distances between trees of the same height class (within group) and mean distance to the closest trees of any height class. In other words, the distance between a large tree and its nearest neighbor (of any height class) is significantly different than the distance between a large tree and its nearest neighbor within the large tree height class. The same is true of trees within and outside of the small height class, as well as each of the other ten height classes.

Table 1. Results from k-nearest neighbor analysis and t-tests

Tree Class Class Center (Height (m)) Mean Distance to 30 closest trees (m) St Dev Mean Distance Kurtosis Mean Distance Mean Distance to Closest Tree (m) St Dev Mean Mean Distance to Closest Tree Within Group (m) St Dev Group Mean P-value
1 11.8 12.5 4.4 6.4 2.1 2 4 6.3 <0.01
2 15.1 11.7 3.9 3 2.4 1.7 4.9 6.6 <0.01
3 19.9 12.6 3.4 4 3.6 1.3 6.8 6.7 <0.01
4 24.4 13.4 3.1 5.6 4 1.4 6.9 6.5 <0.01
5 29.7 14.6 2.9 3.9 4.6 1.4 8 6.9 <0.01
6 35.5 16.7 3 2.8 5.3 1.6 9.3 7.2 <0.01
7 42 18.6 2.6 4.5 6.1 1.7 10.1 7 <0.01
8 50 19.7 2.5 6.6 6.9 1.8 11.4 7.3 <0.01
9 59 20.5 2.6 3.1 7.5 1.8 13.5 8.7 <0.01
10 70 21.2 2.7 0.7 8.1 1.9 15 11 <0.01


Fig 1. Distribution of all tree heights (m) in HJ Andrews Forest.

 

 

 

Fig 2. Mean distance (m) and standard deviation (m) between trees of each of ten height classes and the next closest 30 trees within that height class. Generally, as trees get larger, mean distance between them is larger.

Fig 3. Distribution of the tallest tree class (70 m tree class) across the HJ Andrews Forest.

 

Fig 4. A representative distribution of all tree height classes on either side of a road in the HJ Andrews Forest, showing the extent of clustering and the extent of dispersion of height classes. Height classes are in ascending order from smallest class (~12m tall; Class 1) to tallest class (70m tall; Class 10).

 


Fig 5. A closer look in the same area at Fig. 4, where small trees are clustered near the road and clustered tightly together, while larger trees are more dispersed.

 

Fig 6. Mean distance between trees of the same height class and other trees of the same height class (blue) and mean distance between trees of one height class and any other tree (red). The overlapping red confidence interval with the blue points suggests that the average distance between small trees is not significantly different than distance between small trees and any trees. The general upward trend suggests that as trees get taller, the distance between them increases and variance slightly decreases.

5. Critique of the method:

The results make sense, but do not provide much more information about the actual distribution of trees (clustering vs. dispersion) than simple maps of point data, so the next step might be to examine tree heights within different management regimes. The current analysis tells me that trees are somewhat clustered by height, and that the mean distance between a tree of one height class to a tree of the same height class is, in most cases, different from the mean distance of a tree of one height class to a tree of any other height class. I’ve examined a map of different management regimes within the HJ Andrews Forest and there are clear areas of old growth, harvested areas, clearly defined plots, etc., so I would expect some of these areas to show tree clustering by height class. The patterns I found using this analysis were not as clear as I was expecting. Using kmeans and nearest neighbor analysis is a great way to start to examine the spatial relationships between and among data, but with such a large and highly varied dataset there can be shortcomings, especially when it comes to drawing any concrete conclusions.

References:

HJ Andrews Online Data Repository: http://andlter.forestry.oregonstate.edu/data/catalog/datacatalog.aspx

Johnson, S.; Lienkaemper, G. 2016. Stream network from 1997 survey and 2008 LiDAR flight, Andrews Experimental Forest. Long-Term Ecological Research. Forest Science Data Bank, Corvallis, OR. [Database]. Available: http://andlter.forestry.oregonstate.edu/data/abstract.aspx?dbcode=HF013 (10 April 2019) DOI:http://dx.doi.org/10.6073/pasta/66d98881d4eb6bb5dedcbdb60dbebafa.

Spies, T. 2016. LiDAR data (August 2008) for the Andrews Experimental Forest and Willamette National Forest study areas. Long-Term Ecological Research. Forest Science Data Bank, Corvallis, OR. [Database]. Available: http://andlter.forestry.oregonstate.edu/data/abstract.aspx?dbcode=GI010 (10 April 2019) DOI:http://dx.doi.org/10.6073/pasta/c47128d6c63dff39ee48604ecc6fabfc.

Spies, T. 2016. LiDAR data (October 2011) for the Upper Blue River Watershed, Willamette National Forest. Long-Term Ecological Research. Forest Science Data Bank, Corvallis, OR. [Database]. Available: http://andlter.forestry.oregonstate.edu/data/abstract.aspx?dbcode=GI011 (10 April 2019) DOI:http://dx.doi.org/10.6073/pasta/8e4f57bafaaad5677977dee51bb3077c.

Spies, T. 2014. Forest metrics derived from the 2008 Lidar point clouds, includes canopy closure, percentile height, and stem mapping for the Andrews Experimental Forest.. Long-Term Ecological Research. Forest Science Data Bank, Corvallis, OR. [Database]. Available: http://andlter.forestry.oregonstate.edu/data/abstract.aspx?dbcode=TV081 (10 April 2019) DOI:http://dx.doi.org/10. 6073/pasta/875e10383e8c8aee3c9a49e0155eef1d.

Ex. 1: Spatial Autocorrelation in Stream Cross-Sectional Change In the Andrews Forest

Question that you asked

How much spatial autocorrelation exists in historic streambed changes in the Andrews Forest, both in the across-stream direction and the along-stream direction?

Name of the tool or approach that you used

I addressed each direction of the problems separately using. Since this resulted in one-dimensional, evenly spaced or faux-evenly spaced data, I just used the acf function in R to analyze the data.

Critique of the method – what was useful, what was not?

The methods provided an interesting look at patterns within the cross section data set. However, I think the one-dimensional approach fails to capture interactions between across- and along- channel changes as well as any temporal autocorrelation.

Description of steps you followed to complete the analysis and results you obtained.

For my project, I am looking at changes in the streambed along 4 stream reaches (black boxes) along two streams (blue lines) in the Andrews forest. The watershed of the larger stream, Lookout Creek is shown in black.

Researchers in the Andrews forest repeatedly surveyed fixed cross sections along the reaches from 1978 to 2011. In this exercise, I worked with a subset of the data from 1978 to 1998. The cartoon below does not contain real data but shows the arrangement of numbered reaches (blue lines) and their fixed endpoint stakes (blue dots) along a stream.

Part A – across-channel changes

Each individual cross section spans from bank to bank and show the topography and bathymetry across the width of the stream. Below are two consecutive years of cross-sectional profiles of the stream bed at Lower Lookout Creek, cross section # 3.

Profile from 1995.

Profile from 1996.

It looks like there was a lot of change between these two profiles. The active stream channel appears to have avulse from one side of the cross section to the other.

I compared the two profiles to see how much elevation was gained or lost.

Here grey line represents the surface of the stream bed in 1995, and the black line represents the surface of the stream bed in 1996. The area show in red represents material that was eroded away, and the green area represents material that was newly deposited.

We can simplify the plot above to show only the positive or negative elevation change across the channel. The figure below shows the same information as the figure above, but directly in terms of elevation change.

You can clearly see from this figure that this site at this year had one broad area of elevation gain or deposition, and one broad area of elevation loss or erosion. This isn’t precisely an attraction/repulsion phenomenon, but there can be positive feedbacks associated with erosion and deposition. For example, sediment deposited in a stream may locally obstruct or slow down the water passing by it and encourage deposition in adjacent areas.

Now, not every pair of years had this *much* elevation change, but also, not every pair of years shows such broad regions of positive or negative elevation change. Here is a figure of the elevation change at the same cross section between 1996 and 1997. You can see that elevation change is both smaller in magnitude and shows up in smaller patches.

I used spatial autocorrelation metrics to investigate the patch size. Note that autocorrelation at some scales is always expected when looking at topographic change. Even if you had extremely precise and accurate measurements, you would expect the stream bed change at one point to be strongly correlated with the change a few cm away both because the two points probably experience very similar forces and also because gravity tends to smooth out topographic irregularities in nature. But in most scenarios, you would probably not expect to find a correlation between the changes in a streambed with a change at a point 100 m away from the bank.

I picked two distances that seemed potentially interesting: 1 meter and 5 meters. Then I investigated autocorrelation at 1m and 5m lag distances using the acf function in R.

These lag distances are superimposed on the first change graph in the figure below.

I expected that all cross sections would have higher autocorrelation at the shorter lag distance than the longer one, but I also expected that a set of cross sections with very large patch sizes might show relatively higher autocorrelation at the larger lag distance compared to the small-patched cross sections. I need to spend more time looking into how the magnitude of the changes influences the autocorrelation function so that I can understand it better.

I repeated the process for every cross section at every year. Results are shown below.

The cross sections showed mostly positive autocorrelation at 1 m. Some years showed more autocorrelation across the board than others, and some individual cross sections seemed like they frequently had high autocorrelation values at multiple years.

Different patterns emerge at this spatial scale. At a 5 m lag, neutral or negative autocorrelation was much more common.

 

Part B – along-channel changes

I also wanted to look at autocorrelation between cross sections. Was the change at one cross section related to the change at cross sections immediately up- or down-stream?

Because I don’t have true geographic locations of the cross sections, I couldn’t do any fancy two dimensional work like interpolating the data into a river bathymetry model. Instead, I consolidated the cross-sectional change data into a single metric for each cross section.

Here is another look at the change plot I showed earlier. This is the same plot as before, but I’ve colored the positive values green to show deposition and the negative values red to show erosion.

I simplified this plot by removing the horizontal across-stream component and summing the red and green areas separately.

Then I had a choice: I could add the red and green areas together to show the total area of the cross section that changed in any way, or I could subtract the red area from the green area to show the net deposition in the cross section. Each of these metrics could be interesting for different reasons. For example, the total area of change might be relevant for ecological processes, but the net overal change could be relevant for tracking net sediment transport in these streams.

I started by adding the boxes together to calculate the total area of change.

In this way, I made a score for every cross section for every year pair. Hypothetical scores for the cartoon river are shown below.

Then I used the acf function again in R to determine how correlated adjacent cross section scores were with each other, compared to more distant cross section scores. This method is not fully spatial explicit, since I did not include any information about how far apart each cross section was from its neighbors. I just assumed that they were roughly equidistant within each reach. Indeed, the distance between cross sections can become hard to define when they are placed along curved parts of the stream.

I calculated these values with every year pair on each of the four reaches. I then repeated it with the subtraction method described above.

For the “addition” method, autocorrelation values were generally close to neutral with some reaches in some years tending towards positive values. Positive values could occur if changes in one reach directly influenced an adjacent reach (i.e. if upstream channel change influenced the hydraulics directly downstream) or if adjacent cross sections were influenced by similar other factors (i.e. two adjacent cross sections placed on a steep part of the reach might experience more similar changes to each other than two cross sections placed in a shallower part of the reach.)

The pattern is different for net change, where negative autocorrelation is more common. I think this is very interesting, because it implies that parts of the stream that had net deposition were more likely to be adjacent to parts of the stream that had net erosion. This could be an expression of a sink/source phenomenon where material scoured from one area is transported and deposited immediately downstream at the next cross section.

Exercise 1: What is the spatial pattern of western hemlock dwarf mistletoe at the Wolf Rock reference stand?

For Exercise 1, I wanted to analyze the spatial pattern of western hemlock dwarf mistletoe infections in live western hemlocks on my 2.2 ha reference stand (Wolf Rock). This was without considering any attributes of the western hemlock trees themselves. Simply, what was the spatial pattern of infection?

To answer this I used the “Average Nearest Neighbor” tool in the Spatial Statistics toolbox in ArcMap. This tool calculates a z-score and a p-value from that z distribution. This is a commonly used method in dwarf mistletoe literature for assessing the clustering of infection centers. Also, the equations for this tool assume that points are free to locate wherever in space and that there are no barriers to spread.

ArcMap makes running these analyses very simple so I created a selection of infected trees (red dots), created a new feature, and then ran the tool. The p-value from my test was 0.097 and my Nearest Neighbor Index was 0.970, indicating that the spatial pattern of the infections are somewhat clustered with an alpha of 0.10.

Average Nearest Neighbor is a good test for analyzing whether or not a set of coordinates are clustered. The degree of clustering of may be harder to interpret as a lower p-value may not necessarily mean points are more clustered. Also I was unable to see where my clusters are, and if my intuitions match the analysis (see map). One other important consideration is the study area. Changes in analysis area can drastically change the result of your clustering analysis (i.e. larger study areas may make data look more clustered). Lastly, there was no option for edge correction. This may have skewed some of the clustering results along the edge of my study site and 2.2 ha is pretty small to be subsampled without losing a lot of my data.

Prologue

After confirming that my infections were clustered, I wanted to see if the pattern I saw in my map, was actually on the ground. I wanted to know, where are infected trees clustered with infected trees and where are uninfected trees clustered with uninfected trees? Again, this was without considering any attributes of the western hemlock trees themselves.

I used the “Optimized Hot Spot Analysis” tool in the Mapping Clusters toolbox to analyze the incidence of infection data (0 = absence, and 1 = presence). The Optimized Hot Spot Analysis tool can automatically aggregate incidence data that are normally not appropriate for hot spot analysis. It also calculates several other metrics for me that made analysis easy. I could take these automatically calculated metrics and alter them in a regular hot spot analysis if needed.

This map displays clustering that matched up closely with my intuitions from Map 1. On the left, the blue values show a cluster of uninfected trees that are closely clustered with other uninfected trees. The larger swath on the right show a cluster of trees that are closely clustered with other infected trees. In the middle a mix of uninfected trees and infected trees are mixed without displaying any significant clustering. Lastly, small clusters in the top left and bottom left of infected trees were identified. These clusters may be edge of larger clusters outside my stand, or lightly infected trees that are starting a new infection center. These results will be extremely valuable in informing my steps for Exercise 2 because I can assess the conditions of both patches and determine differences between the two. I can also determine if distance to the refugia impact the clustering of infection because it appears the infected cluster is closer to the fire refugia.

The hot spot analysis was extremely useful for analyzing and displaying the information I needed about the clustering and was very useful for building off of the Average Nearest Neighbor analysis.

My data set also included a severity rating for dwarf mistletoe infected western hemlocks in my study site. I ran a similar hot spot analysis to above to determine if there were any similarities with how severity played out in the stand compared to solely incidence data. My data ranged from 0 – 5, 0 indicating uninfected trees and 5 indicating most heavily infected. These are classified data, not continuous but still appropriate for the optimized hot spot analysis. Western hemlock dwarf mistletoe forms infection centers, starting from a residual infected western hemlock that survived some disturbance. From there the infection spreads outwards. Another facet of infection centers is that the most heavily infected trees are almost always aggregated in the center of the infection center and infection severity decreases as you move towards the outside of the infection center. This is intuitive when you think about infected trees in terms of the time they’ve been exposed to a dwarf mistletoe seed rain: the trees in the center of the infection center likely have been exposed to infectious seed the longest. These trees can be rated using a severity rating system that essentially determines the proportion of tree crown infected. This is calculated in a way that gives a rating that is easily interpretable, in this case, 0-5.

This third map tells me about how severity is aggregated in the stand. I can see that the wide swath in the middle of the stand, associated with the fire refugia, has the largest aggregation of severely infected trees. This is what I expected in the stand because the trees in the fire refugia survived the fire and provide an infectious seed source for the post-fire regeneration. Also, on the edges of this high severity cluster, are lower severity values indicating the expected pattern of infection centers are playing out. The west side of the stand shows a large clustering of low severity ratings. We can see that the high density of uninfected trees, falls into our cold spot of low or no severity. Interestingly, the hot spot of trees found previously  in the southwest corner, is actually a cluster of low severity trees. This may be a new infection center forming or an exterior edge of another infection center outside the plot.  Lastly, the two pockets of low severity on the east side of the stand are more distinct when considering their severity.

This second application of hot spot analysis tells another story about my data and how dwarf mistletoe is patterned spatially. The non-significant swath in the center of my stand using the incidence data turns out to be a significant clustering of highly infected trees among other new observations.

 

Cross-sectional Change in the Andrews Forest

Background and Research Questions

This project explores stream bed mobility in the HJ Andrews experimental forest in relationship to peak discharge events and channel geometry. The HJ Andrews Experimental Forest is a Long Term Ecological Research (LTER) Site located east of Eugene on the western slope of the Cascade Range. The site has been managed since the 1950s for ecological and forestry research, and the largest stream in the forest has been gauged since 1950. From 1978 to 2011, researchers conducted repeated cross sectional surveys on five reaches in three different creeks in the forest. An analysis of these cross-sectional profiles will help researchers and managers gain a better understanding of how, when, and where stream beds responds to extreme hydrologic events.

Water flowing through streams exerts stresses on the bed material. Whether or not these stresses have the capacity to mobilize sediment and change the shape of a stream bed depends on the amount of water moving through the stream along with other factors. As with many geomorphological processes, bed sediment transport is dominated by movement during extreme events. Although cross sectional changes overall appear to be strongly related to the magnitude of greatest flow between measurements, I am interested in investigating confounding factors including the extent of temporal and spatial autocorrelation in the data set.

I would like to explore (a) if and (b) in what manner aggradation or erosion in one cross section might be related to changes in adjacent cross sections. I would also like to know if aggradation or erosion in one cross section in one year are related to changes in the same cross section in an adjacent year.

Data

I am analyzing cross sectional measurements of five reaches within the Andrews Forest, which range from 12 m to 55 m in horizontal extent and 1.2 to 7.3 meters in vertical extent. The cross sections are variously spaced along an along-stream dimension, and they are were surveyed every one to five years over a period of 30 years between 1978 and 2011. The vertical precision of the data set is roughly 1 cm (though the data are unlikely to be accurate to 1 cm) and the horizontal precision varies from 1 cm to several decimeters.

The cross sectional change between two adjacent pairs of years at one cross section is shown below. Areas of erosion are shown in red and areas of aggradation are shown in green.

Hypothesis and Approach

I predict that there will be a relationship between changes at one cross section during one year and the same cross section at another year. Portions of the stream bed that have been recently scoured or contain newly emplaced sediment may be less armored than undisturbed portions of the stream bed. These less armored portions of the stream bed may be more susceptible to future disturbance. Alternately, changes at a cross section may represent longer-term processes related to channel geometry: e.g. a series of cross sections could show continued incision of a cut bank over multiple years.

I do not expect to see a detectable relationship between changes at adjacent cross sections because I expect that the most important geographic controls on channel change are either smaller (e.g. pools) or much larger (e.g. along-stream variation in discharge) than the distance between cross sections.

I want to use this class as an opportunity to explore statistical relationships, but I don’t know yet what kinds of analyses are best suited to this problem. I’d like to learn more in general about how to handle spatial autocorrelation, and when it is and isn’t a statistical issue.

Justification
This project could be scientifically useful for improving our understanding of sediment transport in Pacific Northwest mountain streams. Resource managers may also have an interest in sediment transport because it relates to stream channel mobility (“How much can we depend on this creek staying in the same place?”) and ecology (“How vulnerable is stream life to disruption via bed transport?”). From a resource management perspective, it is becoming increasingly useful to study peak flow events because downscaled climate models for our region predict increased frequencies of large storms.

Preparation

I feel good about my experience with spatial technology, and I’m most interested in learning about how to use that technology to answer statistical questions. I am highly proficient with Arc software. I have TAed one undergraduate class and independently taught another short undergraduate class in ArcGIS. I have a working knowledge of ArcPy, but I still need to use references regularly to write code. I conducted some undergraduate remote sensing research using ArcPy and used ArcPy for research at a government science agency. I work extensively in R. I’ve done image processing using Arc software, Python, ENVI, ImageJ, and raster tools in R, but it’s a very broad field, and I definitely think I could learn more.

 

Epidemiology of Western Hemlock Dwarf Mistletoe Post – Mixed Severity Fire

Description of the Research Question

My study site is located in the HJ Andrews Experimental Forest, where the two most recent mixed severity fires burned leaving a sizable fire refugia in the middle of the stand. Western hemlock dwarf mistletoe (WHDM) survived the fire in this refugia. WHDM spreads via explosively discharged seed and rarely by animals. This means that for the pathogen to spread, the seed must reach susceptible hosts. WHDM is a obligatory parasite of western hemlock. The regeneration post fire may pose a barrier to the pathogens spread because of the structure and composition. Lastly, the structure of the fire refugia may determine the rate of spread. This is because the structure largely determines where infections exist in the vertical profile of the canopy.

My research question is: How is the spatial pattern of WHDM spread extent and intensification from fire refugia related to the spatial pattern of the structure and composition of the fire refugia and the surrounding regeneration via barriers to viable seed reaching susceptible hosts?

I have three objectives related to the spatial organization or the regenerating forest surrounding the fire refugia and one related to the fire refugia itself. My objectives are to determine how fire refugia affects dwarf mistletoe’s spread and intensification through:

  1. The stand density of regenerating trees and of surviving trees, post fire.
  2. The stand age and structure of regenerating trees and surviving trees in fire refugia.
  3. The tree species composition of the regenerating trees adjacent to fire refugia.

AND…

  1. Whether intensification dynamics of WHDM inside a fire refugia resemble those of WHDM in other infection centers.

Dataset Description

Spatial locations of the extent of spread and intensity (also referred to as severity) of WHDM infection include presence/absence of infection in susceptible hosts and will have a severity rating for each infected tree. The presence/absence data will be two measurements, one from 1992 and one from 2019. The intensity rating will be from 2019 only.

Spatial locations/descriptions of the structure and composition of the fire refugia and regeneration surrounding refugia include X,Y of each tree, a variety of forest inventory attributes such as diameters, heights, and species for each tree, and delineation of the fire refugia boundaries. This data has been measured several times: 1992, 1997, 2013, and 2019. The GPS coordinates were recorded with handheld GPs units most likely under canopy cover so resolution may vary.

These data sets are bounded by a 2.2 ha rectangle that confines the study area.

Hypotheses

The spatial pattern of western hemlock dwarf mistletoe in the study site will be several discrete clusters. This arrangement forms because WHDM spreads from an initial infection point outward. The initial infection serves as an approximate center point and the cluster, or infection center, grows outward from there. Separations between clusters are maintained by forest structure and composition or disturbances. New clusters form from remnant trees surviving disturbances, or random dispersal of seed long distances by animals. In this case, the fire refugia protect the remnant trees from disturbance and these trees are the new focal points for infection centers

The spatial pattern of the forest structure and composition will drive the direction and rate of spread of WHDM because variances in these two attributes will cause varying amounts of barriers to WHDM seed dispersal. Because seeds are shot from an infected tree, that seed needs to reach a new uninfected branch. This means physical barriers to spread can affect seed dispersal and non susceptible species will stop spread.

Approaches

I would like to utilize spatial analyses that allow me to understand how the clustering of WHDM is related to the boundaries of the fire refugia. Also, how the infection centers have changed over time utilizing forest structure and composition metrics of the regeneration surrounding the refugia. Lastly, something that can incorporate a severity rating on a scale instead of simply presence/absence data and describe the distribution of severely infected trees vs lightly infected trees and how that relates to the fire refugia boundary and forest metrics of the regeneration surrounding the refugia.

Expected Outcome

I would like to produce statistical relationships that can determine the significance of forest density, species composition, age, and structure on the ability of WHDM to spread. Also, I would like to produce statistical relationships that can describe whether or not a fire refugia alters the way WHDM spreads and intensifies when compared to commonly observed models. Maps as visuals for describing the change over time would be a useful end product as well.

Significance

Understanding the spread patterns of WHDM is important for resource managers seeking to increase biodiversity and produce forest products. Focus has shifted to creating silvivultural prescriptions that emulate natural disturbances that are still economically viable and that maintain ecosystem functions. Disturbance events can control WHDM but also create opportunities to increase its spread and intensification so managers need to have an understanding of how a particular forest structure will affect WHDM. Also, if we want to maintain biodiversity, understanding how WHDM infection centers created by fire develop is important. Fire frequency and severity may be increasing in the future and the loss of mixed severity fire would mean a significant loss of WHDM. Land managers seeking to emulate burns can use this information to plan burns that preserve patches of WHDM if desired and understand how the pathogen will progress 25 years later. This is not usually the case for forest pathogens.

Level of Preparation

I have worked in ArcMap quite a bit, but I haven’t much experience with the wide range of functionality of ArcInfo. I used ModelBuilder somewhat to keep my queries and basic analyses organized. No experience programming in Python. I have taken two stats classes before this using R and feel I have a working knowledge and have no problem learning new tools in it. However, I have very little work with image processing such as working with rasters and some small exposure to LiDAR processing.

Exploring spatial variation in drivers of soil CO2 efflux in HJ Andrews Forest

Description of Research Question

My objective is to capture modes of variance that exist between and within a subset of variables that I expect to correlate most strongly with soil CO2 efflux in the HJ Andrews (HJA) forest. By stratifying the forest, I plan to determine future sampling sites that will be used to explore the relationship between soil C inputs, soil C stocks and CO2 outputs. Aboveground and belowground biomass are major sources of soil carbon and drivers of soil respiration, so biomass will be used as a proxy for soil CO2 efflux for the purposes of this analysis.

Research Question: How is the spatial distribution of biomass in the HJA forest related to stand age, slope, aspect, elevation and geomorphon as a result of varying degrees of exposure to solar radiation, wind gusts, precipitation, humidity, etc.? What is the overall variance of the HJA forest along these vectors and is that variance spatially autocorrelated?

Description of Dataset

I will use LiDAR data from 2011 and 2014 at 0.3-0.5 m vertical resolution and at 1 m2 horizontal resolution covering the Lookout Creek Watershed and the Upper Blue River Watershed to the northern extent of HJA. These LiDAR data include a high hit model and a bare earth model. I will also use NAIP imagery to approximate forest stand age, which is 1 m resolution and covers years between 2002 and 2018.

Hypotheses

I expect areas containing more biomass to positively correlate with south-facing slopes due to more exposure to solar radiation resulting in faster rates of vegetation growth. I expect older stands to positively correlate with greater biomass. I expect steeper slopes to correspond to less biomass due to more weathering and thinner soil horizons, supporting less growth. I expect higher elevations to correspond to lower biomass due to greater sustained winds, higher windspeeds and more snow accumulation. As geomopohon describes the geometric structure of the terrain, it is a collection of multiple factors that could positively or negatively correlate with biomass. For example, I expect a ridge to negatively correlate with biomass because of the combination of greater slopes and more exposure to winds, while I expect a valley to positively correlate with biomass due to more wind protection and thicker soil horizons with more organic matter and more water retention.

Approaches

With an end goal of identifying sampling sites, I’ll need to cluster or stratify the HJA forest. I’ll begin with a clustering analysis, then perform supervised and unsupervised classification, followed by sensitivity analysis comparing the results of the clustering analysis and both classifications. I will need to address spatial autocorrelation either as part of the clustering analysis or separately. I’ll need to plan sampling by accessibility, so I’ll examine an HJA roads layer as well.

Expected Outcome

I plan to produce maps (or use/improve on already available ones) at a spatial scale relevant to my study so I can identify potential sampling sites. I plan to map biomass across the HJA forest and produce a stratification of factors most closely related to soil CO2 efflux. Depending on resolution of the data and the results of the stratification, I may need to constrain my analysis. I plan to produce a statistical summary of the strata relating and describing the covariates and how much variance is explained by the stratification.

Significance

Carbon sequestration is a highly relevant research area where many unknowns still exist. Given that soil is an enormous C reservoir, small changes in soil C stocks can have huge impacts on the rest of the C cycle. As CO2 is a potent greenhouse gas, more release of C from soil can cause greater warming of our planet and can lead to a positive feedback loop where the warming cycle is amplified. It is in our best interest to have a good understanding of current soil C stocks and fluxes in forested systems so we can hypothesize how they might change under different or future conditions. By using biomass as a proxy for soil CO2 efflux, I will identify locations that are likely to have greater CO2 efflux and I will be able to make informed predictions about which drivers are most significantly correlated to CO2 efflux. I will be able to test these analyses in the future using field sampling techniques.

My level of preparation/proficiency

I have limited experience with Arc-Info (GEOG 560) and I’ve used Modelbuilder a few times. I have no experience with GIS programming in Python or R, but am proficient with coding in R (3 statistics courses and my own data analysis) and am comfortable seeking answers to questions in the R environment. I have no experience image processing.