Ex. 3: Does black stain spread through landscape networks?


For those who have not seen my previous posts, my research involves building a model to simulate the spread of black stain root disease (a disease affecting Douglas-fir trees) in different landscape management scenarios. Each of my landscapes is made up of stands assigned one of three forest management classes. These classes determine the age structure, density, thinning, and harvest of the stands, all factors that influence probability of infection.


Are spatial patterns of infection probabilities for black stain root disease related to spatial patterns of forest management practices via the connectivity structure of the network of stands in my landscape?


I decided to look at how landscape connectivity influenced the spatial relationship between forest management practices and infection probabilities. This approach builds on a network approach based in graph theory (where each component of the landscape is a “node” with “edges” connecting them) and incorporates concepts from landscape ecology regarding distance-dependent ecological processes and the importance of patch characteristics (e.g., area, habitat quality) in the contribution of patches to the connectivity of the landscape. I used ArcMap, R, and a free software package called Conefor (Saura and Torné 2009) to perform my analysis.


1. Create a mosaic of the landscape

The landscape in my disease spread model is a torus (left and right sides connected, top and bottom connected). The raster outputs from my model with stand ID numbers and management classes do not account for this and are represented as a square. Thus, in order to fully consider the connectivity of each stand in the landscape, I needed to tile the landscape in a 3 x 3 grid so that each stand at the edge of the stand map would have the correct spatial position relative to its neighbors beyond the original raster boundary. I did this in R by making copies of the stand ID raster and adjusting their extent. In ArcMap, I then assigned the management classes to each of those stands by converting the mosaic to polygons, using the “Identity” tool with the management class polygon, and then using the “Join Field” tool so that every stand with the same unique ID number would have the relevant management class assigned. If I had not done this step, the network position of stands at the edge of the raster would have been misrepresented.
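The tiling idea can be sketched in base R with a small matrix standing in for the stand ID raster. This is only an illustration under stated assumptions: the matrix values and the helper name `tile_torus` are hypothetical, and the actual workflow described above copied rasters and adjusted their extents rather than binding matrices.

```r
# Toy stand-ID map: a 3 x 4 matrix standing in for the stand ID raster
# (hypothetical values, not model output).
stand_ids <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)

# Tile a matrix into a 3 x 3 mosaic so that every edge cell is surrounded by
# the neighbors it would have on a torus.
tile_torus <- function(m) {
  band <- cbind(m, m, m)   # three copies side by side
  rbind(band, band, band)  # three bands stacked vertically
}

mosaic <- tile_torus(stand_ids)

# The central tile is the original landscape.
core_rows <- nrow(stand_ids) + seq_len(nrow(stand_ids))
core_cols <- ncol(stand_ids) + seq_len(ncol(stand_ids))
core <- mosaic[core_rows, core_cols]

# The cells just left of the core's first column come from the original's
# last column, which is the wrap-around a torus requires.
left_neighbors <- mosaic[core_rows, ncol(stand_ids)]
```

In the mosaic, only the central tile has all of its toroidal neighbors in place, which is why the later steps keep the core copy of each stand and discard the periphery.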

2. Calculate infection probability statistics for each stand

I then needed to relate each stand to the probability of infection of trees in that stand (generated by my model simulation and converted to point data in a previous exercise). In ArcMap, I used the “Spatial Join” tool to calculate statistics for infection probabilities in that stand, because each stand contains many trees. Statistics included the point count, median, mean, standard deviation, minimum, maximum, range, and sum.
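The same per-stand summary can be sketched in base R once each tree point carries a stand ID. This is a sketch rather than the ArcMap workflow itself: the data frame, its column names (`stand_id`, `prob_inf`), the values, and the helper `stand_summary` are all hypothetical.

```r
# Hypothetical tree-level table: one row per tree, with the stand it falls in
# and its simulated infection probability (toy values).
trees <- data.frame(
  stand_id = c(1, 1, 1, 2, 2, 3),
  prob_inf = c(0.10, 0.20, 0.60, 0.00, 0.40, 0.50)
)

# Summarize one stand's infection probabilities with the statistics listed above.
stand_summary <- function(p) {
  c(count = length(p), median = median(p), mean = mean(p), sd = sd(p),
    min = min(p), max = max(p), range = max(p) - min(p), sum = sum(p))
}

# Apply the summary to each stand and bind the results into one table.
by_stand <- split(trees$prob_inf, trees$stand_id)
stand_stats <- as.data.frame(do.call(rbind, lapply(by_stand, stand_summary)))
stand_stats$stand_id <- as.numeric(names(by_stand))
```

Note that a stand containing a single tree gets `NA` for the standard deviation, so such stands need care in any later plotting.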

3. Calculate the connectivity of each stand in the network of similarly managed stands in the landscape

3a. For this step, I used the free software Conefor, which calculates a variety of connectivity indices at the individual patch and overall landscape level. First, I used the Conefor extension for ArcMap to generate the input files for the Conefor analysis. The extension generates a “nodes” file listing each feature and a “connection” file, which contains the distances between features or a binary description of whether or not a link (“edge”) exists between two features. One can set the maximum distance for two features to be linked or generate a probability of connection based on an exponential decay function (a built-in feature of Conefor, which is an incredible application). For my analysis, I performed connectivity analyses that only considered features to be linked if (i) they had the same management class and (ii) there were no more than 10 meters of distance between the stand boundaries. Ten meters is about the upper limit for the likely root contact distance between two Douglas-fir trees.
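The linking rule can be written out in a few lines of base R. This is a sketch of the logic only, not Conefor's implementation: the distance matrix and class labels are hypothetical toy values, and the decay parameterization (probability 0.5 at 10 m) is an assumption chosen for illustration.

```r
# Hypothetical inputs: edge-to-edge distances (m) between four stands and
# their management classes (toy values, not from the model).
dist_m <- matrix(c( 0,  5, 30,  8,
                    5,  0,  4, 50,
                   30,  4,  0,  9,
                    8, 50,  9,  0), nrow = 4, byrow = TRUE)
mgmt <- c("extensive", "extensive", "intensive", "extensive")

max_link_dist <- 10  # meters; approximate upper limit of root contact distance

# Binary links: same class AND boundaries no more than max_link_dist apart.
same_class <- outer(mgmt, mgmt, "==")
links <- same_class & dist_m <= max_link_dist
diag(links) <- FALSE  # a stand is not linked to itself

# Alternative: probability of connection decaying exponentially with distance,
# scaled so that p = 0.5 at 10 m (an assumed reference distance).
k <- log(2) / 10
p_connect <- exp(-k * dist_m)
```

Here stands 2 and 3 are only 4 m apart but remain unlinked because their management classes differ, which is the behavior condition (i) is meant to enforce.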

3b. For each management class, I ran the Conefor analysis to calculate multiple metrics. I focused primarily on:

  • Number of links in the network
  • Network components – Each component is a set of connected patches (stands) that is internally connected but has no connection to any other set of patches.
  • Integral Index of Connectivity (IIC) – Essentially, this index gives each patch (stand) a value in terms of its importance for connectivity in the network based on its habitat attributes (e.g., area, habitat quality) and its topological position within the network. For this index, higher values indicate higher importance for connectivity. This is broken into three non-redundant components that sum to the total IIC:
    • IIC intra – connectivity within a patch
    • IIC flux – area-weighted dispersal flux
    • IIC connector – importance of a patch for connecting other patches in the network (Saura and Rubio 2010)
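To make these metrics concrete, here is a minimal base R sketch on a toy network. It assumes the usual definition of IIC, the sum over all pairs of patches of a_i * a_j / (1 + nl_ij) divided by the squared landscape area, where a is patch area and nl_ij is the number of links on the shortest path between patches i and j, and it takes a patch's importance to be the percentage drop in IIC when that patch is removed. The network, areas, landscape area, and function names are hypothetical, and the split into intra, flux, and connector fractions is not reproduced.

```r
# Shortest-path length, in number of links, between all pairs of nodes
# (Floyd-Warshall on an unweighted adjacency matrix). Inf = not connected.
shortest_links <- function(adj) {
  d <- ifelse(adj, 1, Inf)
  diag(d) <- 0
  for (k in seq_len(nrow(adj))) {
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  }
  d
}

# Numerator of IIC: pairs in different components contribute 0 because
# their path length is Inf.
iic_numerator <- function(adj, area) {
  sum(outer(area, area) / (1 + shortest_links(adj)))
}

# Number of components: nodes in the same component share a reachability row.
n_components <- function(adj) {
  nrow(unique(is.finite(shortest_links(adj))))
}

# Toy network (hypothetical): stands 1-2 and 2-3 are linked, stand 4 is isolated.
adj <- matrix(FALSE, 4, 4)
adj[1, 2] <- adj[2, 1] <- TRUE
adj[2, 3] <- adj[3, 2] <- TRUE
area <- c(10, 20, 10, 5)  # stand areas, arbitrary units
landscape_area <- 100     # assumed total landscape area

full <- iic_numerator(adj, area)
iic <- full / landscape_area^2

# Importance of each stand: percent decrease in IIC when it is removed.
d_iic <- sapply(seq_along(area), function(k) {
  reduced <- iic_numerator(adj[-k, -k, drop = FALSE], area[-k])
  100 * (full - reduced) / full
})
```

In this toy case stand 2 has the highest importance because it is both the largest stand and the only connection between stands 1 and 3, while the isolated stand 4 contributes only through its own area.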

4. Analyze the relationship between connectivity metrics and infection probabilities

I reduced the mosaic to include only one feature for each stand, eliminating those at the periphery and keeping those in the core. I confirmed that the values were similar for all of the copies of each stand near the center of the mosaic. I then mapped and plotted different combinations of connectivity and infection probability metrics to analyze the relationship for each management class (Fig. 1, Fig. 2).

Fig. 1. Map of IIC connectivity index and mean infection probability for the extensively managed stands.


I generally found no relationship between infection probability and the various metrics of connectivity. As connectivity increased, infection probabilities did not change for any of the metrics I examined (Fig. 2). I would like to analyze this for a series of landscape simulations in the future to see whether patterns emerge. I could also refine the distance used to generate links between patches to reflect the dispersal distance for the insects that vector the disease.

Fig. 2. Plots of infection probability statistics and connectivity metrics for each of the stands in the landscape. Each point represents one stand in the randomly distributed landscape, with extensively managed stands in red, intensively managed stands in blue, and old-growth stands in green.

CRITIQUE OF THE METHOD – What was useful, what was not?

I had originally planned to use the popular landscape ecology application Fragstats (or the R equivalent “landscapemetrics” package), but I ran into issues. As far as I could tell (though I may be incorrect), these options only use raster data and consider one value at a time. What I needed was for the analysis to consider groups of pixels by both their stand ID and their management class, because stands with the same management class are still managed independently. However, landscapemetrics would consider adjacent stands with the same management class to be all one patch. This meant that I could only calculate metrics for the entire landscape or the entire management class, which did not allow me to look at how each patch’s position relative to similarly or differently managed patches related to its probability of infection. In contrast, Conefor is a great application that allows for calculation of a large number of connectivity metrics at both the patch and landscape level.


Ex 1: Mapping the stain: Using spatial autocorrelation to look at clustering of infection probabilities for black stain root disease

My questions:

I am using a simulation model to analyze spatial patterns of black stain root disease of Douglas-fir at the individual tree, stand, and landscape scales. For exercise 1, I focused on the spatial pattern of probability of infection, asking:

  • What is the spatial pattern of probability of infection for black stain root disease in the forest landscape?
  • How does this spatial pattern differ between landscapes where stands are clustered by management class and landscapes where management classes are randomly distributed?

    Fig 1. Left: Raster of the clustered landscape, where stands are spatially grouped by each of the three forest management classes. Each management class has a different tree density, making the different classes clearly visible as three wedges in the landscape. Right: Raster of the landscape where management classes are randomly assigned to stands with no predetermined spatial clustering. The color of each cell represents the value for infection probability of that cell. White cells in both landscapes are non-tree areas with NA values.

Tool or approach that you used: Spatial autocorrelation analysis, Moran’s I, correlogram (R)

My model calculates probability of infection for each tree based on a variety of tree characteristics, including proximity to infected trees, so I expected to see spatial autocorrelation (when a variable is related to itself in space) with the clustering of high and low values of probability of infection. Because some management practices (i.e., high planting density, clear-cut harvest, thinning, shorter rotation length) have been shown to promote the spread of infection, there is reason to hypothesize that more intensive management strategies – and their spatial patterns in the landscape – may affect the spread of black stain at multiple scales.

I am interested in hotspot analysis to later analyze how the spatial pattern of infection hotspots maps against different forest management approaches and forest ownerships. However, as a first step, I needed to show that there is some clustering in infection probabilities (spatial autocorrelation) in my data. I used the “Moran” function in the “raster” package (Hijmans 2019) in R to calculate the global Moran’s I statistic. The Moran’s I statistic ranges from -1 (perfect dispersion, e.g., a checkerboard) to +1 (perfect clustering), with a value of 0 indicating perfect randomness.

Example patterns: Moran’s I = -1 (perfect dispersion), Moran’s I = 0 (random), Moran’s I = 1 (perfect clustering).

I calculated this statistic at multiple lag distances, h, to generate a graph of the values of the Moran’s I statistic across various values of h. You can think of the lag distance as the size of the window of neighbors being considered for each cell in a raster grid. The graph produced by plotting the calculated value of Moran’s I across various lag values is called a “correlogram.”
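To see where the -1 for a checkerboard comes from, here is a small base R sketch of the global statistic, I = (N / W) * sum over pairs of w_ij * z_i * z_j / sum of z_i^2, where z is the deviation of each cell from the mean and W is the sum of the weights. It assumes the smallest possible window, with only the four edge-sharing (rook) neighbors weighted 1. That is a simplification for illustration and not the default neighborhood of the raster package’s Moran function; the function name `morans_i_rook` and the toy grids are hypothetical.

```r
# Global Moran's I on a matrix, with weight 1 for cells that share an edge.
morans_i_rook <- function(m) {
  z <- m - mean(m)
  # Products of deviations for horizontally and vertically adjacent cells
  horiz <- z[, -ncol(z)] * z[, -1]
  vert  <- z[-nrow(z), ] * z[-1, ]
  n_pairs <- length(horiz) + length(vert)
  # Each pair appears twice in both the weighted sum and the sum of weights,
  # so the factor of 2 cancels.
  (length(z) / n_pairs) * (sum(horiz) + sum(vert)) / sum(z^2)
}

# Perfect dispersion: every cell differs from all four of its neighbors.
checker <- outer(1:4, 1:4, function(i, j) (i + j) %% 2)

# Clustering: low values on the left half, high values on the right half.
halves <- cbind(matrix(0, 4, 2), matrix(1, 4, 2))

i_checker <- morans_i_rook(checker)  # -1
i_halves  <- morans_i_rook(halves)   # positive
```

The clustered grid gives a positive value below +1 because cells along the boundary between the two halves still have dissimilar neighbors.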

What did I actually do? A brief description of steps I followed to complete the analysis

1. Imported my raster files, corrected the spatial scale, and re-projected the rasters to fall somewhere over western Oregon.

I am playing with hypothetical landscapes (with the characteristics of real-world landscapes), so the spatial scale (extent, resolution) is relevant but the geographic placement is somewhat arbitrary. I looked at two landscapes: one where management classes are clustered (“clustered” landscape), and one where management classes are randomly distributed (“random”). For each landscape, I used two rasters: probability of infection (continuous values from 0 to 1) and non-tree/tree (binary, 0s and 1s).

2. Masked non-tree cells

Since not all cells in my raster grid contain trees, I set all non-tree cells to NA for my analysis in order to avoid comparing the probability of infection between trees and non-trees. I used the tree rasters to create a mask.

library(raster) # Provides mask(); c.pi, c.tree, r.pi, and r.tree are the rasters imported in step 1
# Clustered management landscape
c.tree[c.tree < 1] <- NA # Set all non-tree cells in the tree raster to NA
c.pi.tree <- mask(c.pi, c.tree) # Keep prob inf only where there are trees, leaving all others NA
# Repeat with randomly distributed management landscape
r.tree[r.tree < 1] <- NA # Set all non-tree cells in the tree raster to NA
r.pi.tree <- mask(r.pi, r.tree) # Keep prob inf only where there are trees, leaving all others NA

Fig 2. Filled and hollow weights matrices.

3. Calculated Global Moran’s I for multiple values of lag distance.

For each lag distance, I created a weights matrix so the Moran function in the raster package would know how to weight each neighbor pixel at a given distance. Then, I let it run, calculating Moran’s I for each lag to create the data points for a correlogram.

I produced two correlograms: one using a “filled” weights matrix, in which all cells within a given distance (lag) were given a weight of 1, and another using a “hollow” weights matrix, in which only cells at exactly the given distance were given a weight of 1 (Fig 2).
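The two kinds of weights matrix can be built with a short helper. This sketch assumes a square window in which distance is measured in whole cells (Chebyshev distance), which may differ from the window shape actually used; the function name `make_weights` is hypothetical.

```r
# Build a (2 * lag + 1) x (2 * lag + 1) weights matrix centered on the focal cell.
# filled: weight 1 for every cell within `lag` cells of the center
# hollow: weight 1 only for the ring of cells exactly `lag` cells away
make_weights <- function(lag, hollow = FALSE) {
  offsets <- -lag:lag
  # Chebyshev distance of every window cell from the center cell
  cheb <- outer(abs(offsets), abs(offsets), pmax)
  w <- if (hollow) cheb == lag else cheb >= 1 & cheb <= lag
  w * 1  # convert TRUE/FALSE to 1/0; the center cell always gets 0
}

filled_2 <- make_weights(2)                 # 5 x 5 window, 24 neighbors weighted
hollow_2 <- make_weights(2, hollow = TRUE)  # 5 x 5 window, outer ring of 16 cells
```

A correlogram would then come from looping over lag values and passing each matrix to the Moran function through its `w` argument, for example `Moran(c.pi.tree, w = make_weights(h))`, though the exact loop used for the figures is not shown in this post.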

4. Plotted the global Moran’s I for each landscape and compared.

What did I find? Brief description of results I obtained.

The correlograms show that similar values become less clustered at greater distances, approaching a random distribution by about 50 cell distances. In other words, cells are more similar to the cells around them than they are to more-distant cells. The many peaks and troughs in the correlogram arise from the gaps between trees, which are regularly spaced under plantation management.

In general, the highest values of Moran’s I were similar between the landscape with clustered management classes and the landscape with randomly distributed management classes. However, the rate of decrease in the value of Moran’s I with increasing lag distance was higher for the random landscape than the clustered landscape. In other words, similar infection probabilities had larger clusters when forest management classes were clustered. For the clustered landscape, there was actually spatial autocorrelation at lag distances of 100 to 150, likely because of the clusters of higher infection probability in the “old growth” management cluster.

Correlogram for the clustered and random landscape showing Moran’s I as a function of lag distance. “Filled” weights matrix.

Correlogram for the clustered and random landscape showing Moran’s I as a function of lag distance. “Hollow” weights matrix.

Critique of the method – what was useful, what was not?

My biggest issue initially was finding a package to perform a hotspot analysis on raster data in R. I found some packages with detailed tutorials (e.g., hotspotr), but some had not been updated recently enough to work in the latest version of R. I could have done this analysis in ArcMap, but I am trying to use open-source software and free applications and improve my programming abilities in R.

The Moran function I eventually used in the raster package worked quickly and effectively, but it does not provide statistics (e.g., p-values) to interpret the significance of the Moran’s I values produced. I also had to make the correlogram by hand with the raster package. Other packages do include additional statistics but are either more complex to use or designed for point data. There are also built-in correlogram functions in packages like spdep or ncf, but they were very slow, potentially taking hours on a 300 x 300 cell raster. That said, it may just be my inexperience that made a clear path difficult to find.


