average nearest neighbor

Question Asked

To predict the invasion potential of a species, it is necessary to understand the spatial pattern of the invasion in relation to landscape-scale variables. For Exercise 2, I explored how the spatial pattern of invasion by the recently introduced annual grass Ventenata dubia (ventenata) relates to the spatial pattern of vegetation cover categories throughout the Blue Mountain Ecoregion of eastern Oregon.

Tools and Approaches Used

To unpack this question, I performed a neighborhood analysis to explore how the proportions of different vegetation cover types differ at increasing distances from plots with high versus low ventenata cover.

The neighborhood analysis required several steps performed in ArcGIS and in R:

1. I split my sample plot layer in ArcGIS into two layers: one containing only plots with high ventenata cover (>50%) and one containing only plots with low cover (<5%).
2. I buffered each plot by 10m, 50m, 100m, 200m, 400m, and 800m using the “buffer” tool in ArcGIS, and then used the “erase” tool to erase each buffer layer by the preceding buffer layer, creating “donuts” surrounding each sample point (Fig. 1).
3. I brought in a vegetation cover categories raster file (Simpson 2013) that overlaps my study area and used the “tabulate area” tool in ArcGIS to calculate the total cover of each vegetation type (meadow, shrub steppe, juniper, ponderosa pine, Douglas-fir, grand-fir, hardwood forest/riparian, and subalpine parkland) within each buffer for every point. I repeated this for the high and low ventenata points.
4. Finally, I consolidated the tables in R and used the ggplot2 package to create a line graph of how the proportion of each vegetation type differed with buffer distance from the point (Fig. 2). Cover represents the percent cover of each vegetation type at each buffer distance, and error bars at each distance represent standard error. VEDU is the plant code for Ventenata dubia (ventenata).
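For anyone who wants to reproduce the donut logic outside ArcGIS, here is a minimal Python sketch of steps 2 through 4. It is not the ArcGIS workflow I used: it assumes a hypothetical vegetation raster stored as a 2-D list of class codes (row 0 at y = 0) with points in the same metre-based coordinates, and it approximates “tabulate area” by assigning each cell to a ring according to its cell-centre distance. The helper names `ring_proportions` and `mean_and_se` are illustrative only.

```python
import math
from statistics import mean, stdev

# Ring edges (m) from the post; [0, 10) is the full inner disc, the rest are donuts.
RING_EDGES = [0, 10, 50, 100, 200, 400, 800]

def ring_proportions(raster, cell_size, point, edges=RING_EDGES):
    """Proportion of each vegetation class inside each donut ring around one point.

    raster: 2-D list of class codes (hypothetical layout: row 0 at y = 0).
    A cell counts toward ring [inner, outer) if its centre lies at that distance,
    which approximates, but is not identical to, ArcGIS Tabulate Area.
    """
    px, py = point
    rings = list(zip(edges[:-1], edges[1:]))
    tallies = {ring: {} for ring in rings}
    for r, row in enumerate(raster):
        for c, veg_class in enumerate(row):
            d = math.hypot((c + 0.5) * cell_size - px, (r + 0.5) * cell_size - py)
            for inner, outer in rings:
                if inner <= d < outer:
                    tally = tallies[(inner, outer)]
                    tally[veg_class] = tally.get(veg_class, 0) + 1
                    break  # each cell belongs to at most one ring
    proportions = {}
    for ring, tally in tallies.items():
        total = sum(tally.values())
        proportions[ring] = {k: v / total for k, v in tally.items()} if total else {}
    return proportions

def mean_and_se(per_point, ring, veg_class):
    """Mean proportion and standard error (SD / sqrt(n)) across points for one ring."""
    values = [props[ring].get(veg_class, 0.0) for props in per_point]
    se = stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
    return mean(values), se
```

Running `ring_proportions` once per high (or low) ventenata point and then `mean_and_se` per ring and class gives the values and error bars that a line graph like Fig. 2 plots.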

I was also curious about how the high and low points differed from random points in the same area. To explore this, I:

Created 110 random points that followed the same selection criterion used to select the ventenata sampling points (within 1000m of a fire perimeter).
Repeated steps 2 through 4 above to graph how vegetation cover changes with distance from these random points relative to the low and high ventenata points (Fig. 3).
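A minimal Python sketch of the random point step, assuming the fire perimeter is a single closed polygon ring in projected metre coordinates and reading “within 1000m of a fire perimeter” as within 1000 m of the perimeter line on either side (the post does not say which side). It uses plain rejection sampling; the function names are hypothetical.

```python
import math
import random

def _segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_perimeter(p, perimeter):
    """Shortest distance from p to a closed polygon ring given as a vertex list."""
    n = len(perimeter)
    return min(_segment_distance(p, perimeter[i], perimeter[(i + 1) % n]) for i in range(n))

def random_points_near_perimeter(perimeter, n_points, max_dist=1000.0, seed=None):
    """Rejection-sample points uniformly from the band within max_dist of the perimeter."""
    rng = random.Random(seed)
    xs = [x for x, _ in perimeter]
    ys = [y for _, y in perimeter]
    # Every qualifying point lies inside the bounding box grown by max_dist.
    xmin, xmax = min(xs) - max_dist, max(xs) + max_dist
    ymin, ymax = min(ys) - max_dist, max(ys) + max_dist
    points = []
    while len(points) < n_points:
        p = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if distance_to_perimeter(p, perimeter) <= max_dist:
            points.append(p)
    return points
```

Calling `random_points_near_perimeter(perimeter, 110)` yields a set the same size as the random sample described above; fixing `seed` makes a run repeatable.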

Results & Discussion

My analysis revealed that vegetation type differs among high ventenata sites, low ventenata sites, and random sites within the study area. The high ventenata plots were located entirely in ponderosa pine and shrub steppe vegetation types, but as distance from the plots increased, cover became more evenly distributed across about half of the vegetation types (Fig. 2). Ponderosa pine covered over 75% of the high ventenata 10m buffer areas, with shrub steppe making up the remainder. However, as distance increased, ponderosa pine cover dropped sharply, to under 35% at 400m. Shrub steppe declined gradually out to 800m and was surpassed by grand fir and Douglas fir by the 800m buffer. Meadows covered about 10% of the 50m buffer but declined to about 5% by the 400m buffer. The remaining vegetation types (juniper, riparian, and subalpine fir) stayed consistently under 5% cover throughout the buffer analysis.

In the low ventenata sites, shrub steppe vegetation was the most dominant, but cover was spread more evenly across the vegetation types than in the high ventenata sites (Fig. 2). Shrub steppe cover dropped from 45% to 30% between the 10m and 50m buffers and then remained relatively constant through the remaining buffer distances. As in the high ventenata sites, grand fir cover increased gradually throughout, becoming the most dominant vegetation type in the 800m buffer. Unlike the high sites, ponderosa pine made up only about 10% of each buffer. Riparian vegetation was the only cover type that remained at 0% in all buffers.

In the random sites, the distribution of vegetation types remained steady out to 800m, with only small fluctuations in cover as distance increased (Fig. 3). Shrub steppe had the highest cover, at about 30% throughout, followed by juniper, ponderosa pine, and grand fir at about 20% each.

This analysis suggests that ventenata may depend on specific vegetation types not only at the sample location but also in the area surrounding it. This is evident in the high ventenata analysis, where ponderosa pine cover remains much higher than at the low and random sites throughout the 800m buffered area. The analysis also exposes my sampling bias: it shows which community types I targeted for sampling (shrub steppe and dry forest communities), which may not be representative of the area as a whole (as the random points analysis shows).

Critique of Method

The neighborhood analysis was a useful way to visualize how vegetation type changes with distance from high and low ventenata points, and it may have helped uncover the importance of large areas of ponderosa pine as a driver of invasion. However, the results could be a relic of my sampling bias toward shrub steppe and dry forest communities rather than a true reflection of the community drivers of ventenata. The vegetation layer I used was also not as accurate or as detailed as I would have liked for capturing the nuances among the shrub steppe and forest community types I was trying to differentiate in my sampling. If I were to do this again, I would look for a more accurate potential vegetation layer with details on specific community attributes. Additionally, error bars were not possible with the “multiple ring buffer” tool in ArcGIS, so I instead had to create each buffer distance as a separate layer and erase each one individually to maintain the variation in the data. I like the idea of the random points as a sort of randomization test; however, repeating the randomization many more times would make it a more robust test. With more time and more knowledge of coding in ArcGIS/Python, I would attempt a more robust randomization test.
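The “more randomizations” idea can be framed as a Monte Carlo test: draw many independent random point sets, compute the same summary statistic for each, and ask how often a random set matches or beats the observed value. A minimal sketch; the statistic function is hypothetical (for example, mean ponderosa pine proportion in the 800m ring for one fresh set of 110 random points) and would have to be built from the buffering and tabulation steps.

```python
import random

def monte_carlo_p_value(observed, simulate_statistic, n_sims=999, seed=None):
    """One-sided Monte Carlo p-value: how often a randomized statistic is >= observed.

    simulate_statistic(rng) must return the statistic for one randomized dataset
    (hypothetical here). The +1 terms count the observed data as one member of
    the reference set, so the p-value can never be exactly 0.
    """
    rng = random.Random(seed)
    null = [simulate_statistic(rng) for _ in range(n_sims)]
    n_extreme = sum(1 for s in null if s >= observed)
    return (n_extreme + 1) / (n_sims + 1), null

if __name__ == "__main__":
    # Toy null model (hypothetical): each randomization yields a proportion in 0.05-0.25.
    p, _ = monte_carlo_p_value(0.30, lambda rng: rng.uniform(0.05, 0.25), n_sims=999, seed=1)
    print(p)  # 0.001, because no null draw reaches 0.30
```

With 999 randomizations the smallest attainable p-value is 0.001; a single random set, as in Fig. 3, gives a visual comparison but no p-value.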

Simpson, M. 2013. Developer of the forest vegetation zone map. Ecologist, Central Oregon Area Ecology and Forest Health Program. USDA Forest Service, Pacific Northwest Region, Bend, Oregon, USA

For Exercise 1, I wanted to analyze the spatial pattern of western hemlock dwarf mistletoe infections in live western hemlocks on my 2.2 ha reference stand (Wolf Rock), without considering any attributes of the western hemlock trees themselves. Simply put: what was the spatial pattern of infection?

To answer this, I used the “Average Nearest Neighbor” tool in the Spatial Statistics toolbox in ArcMap. This tool compares the observed mean distance from each point to its nearest neighbor with the mean distance expected if the points were randomly distributed, and reports a z-score and the associated p-value. It is a commonly used method in the dwarf mistletoe literature for assessing the clustering of infection centers. The tool's equations also assume that points are free to locate anywhere in space and that there are no barriers to spread.

ArcMap makes running these analyses simple: I selected the infected trees (red dots), created a new feature from that selection, and then ran the tool. The p-value from my test was 0.097 and my Nearest Neighbor Index was 0.970. Because the index is below 1 and the p-value is below 0.10, the infections are somewhat clustered at an alpha of 0.10.
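The statistics behind the tool can be sketched in a few lines of Python. This follows the Clark-Evans style formulas as I understand them from the ArcGIS help for Average Nearest Neighbor: expected mean distance 0.5 / sqrt(n/A), standard error 0.26136 / sqrt(n²/A), and a two-sided normal p-value. It is a brute-force sketch (fine for a few hundred trees), not ArcGIS's implementation, and it assumes the study area A is supplied in square metres.

```python
import math

def average_nearest_neighbor(points, area):
    """Average Nearest Neighbor statistics for (x, y) points in metres.

    area: study area in square metres.
    Returns (observed mean distance, expected mean distance, index, z, p).
    An index below 1 indicates clustering; above 1 indicates dispersion.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Observed mean distance from each point to its nearest neighbour (brute force).
    d_obs = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    d_exp = 0.5 / math.sqrt(n / area)        # expected under complete spatial randomness
    se = 0.26136 / math.sqrt(n * n / area)   # standard error of the expected mean distance
    z = (d_obs - d_exp) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal p-value
    return d_obs, d_exp, d_obs / d_exp, z, p_value
```

The Nearest Neighbor Index reported above is the `d_obs / d_exp` ratio, which is why 0.970 reads as slight clustering.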

Average Nearest Neighbor is a good test of whether a set of coordinates is clustered. The degree of clustering may be harder to interpret, because a lower p-value does not necessarily mean points are more clustered. Also, the tool does not show where the clusters are, so I could not check whether my intuitions matched the analysis (see map). Another important consideration is the study area: changing the analysis area can drastically change the result of a clustering analysis (e.g., larger study areas may make data look more clustered). Lastly, there was no option for edge correction. This may have skewed some of the clustering results along the edge of my study site, and at 2.2 ha the stand is too small to be subsampled without losing a lot of my data.
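The study-area sensitivity follows directly from the expected-distance term: 0.5 / sqrt(n/A) grows with the square root of A, so for the same trees the index shrinks as the analysis area grows. A quick illustration with hypothetical numbers (100 trees, observed mean nearest-neighbor distance of 7 m), comparing the 2.2 ha stand with an area four times larger:

```python
import math

def nn_index(observed_mean_distance, n, area):
    """Nearest neighbor index: observed mean NN distance / expected under randomness."""
    expected = 0.5 / math.sqrt(n / area)
    return observed_mean_distance / expected

# Hypothetical numbers: 100 infected trees with an observed mean NN distance of 7 m.
stand_area = 22_000.0  # 2.2 ha expressed in square metres
for area in (stand_area, 4 * stand_area):
    print(f"area = {area:,.0f} m^2 -> index = {nn_index(7.0, 100, area):.3f}")
```

Quadrupling the area doubles the expected distance and halves the index, so identical trees look much more clustered only because the boundary moved.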

Prologue

After confirming that my infections were clustered, I wanted to see whether the pattern I saw in my map was actually present on the ground. Where are infected trees clustered with infected trees, and where are uninfected trees clustered with uninfected trees? Again, this was without considering any attributes of the western hemlock trees themselves.

I used the “Optimized Hot Spot Analysis” tool in the Mapping Clusters toolset to analyze the infection incidence data (0 = absence, 1 = presence). The Optimized Hot Spot Analysis tool can automatically aggregate incidence data, which are normally not appropriate for hot spot analysis. It also calculates several other metrics for me, which made the analysis easy; I could take these automatically calculated metrics and adjust them in a regular hot spot analysis if needed.
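Optimized Hot Spot Analysis is built on the Getis-Ord Gi* statistic. The sketch below computes Gi* z-scores with simple binary fixed-distance weights. It skips what the ArcGIS tool adds on top (aggregating incidence points, choosing the distance band automatically, and correcting for multiple testing), so the distance band here is a hypothetical value you supply. Values can be 0/1 incidence or the 0-5 severity ratings.

```python
import math

def getis_ord_gi_star(points, values, distance_band):
    """Gi* z-scores with binary weights: w_ij = 1 if dist(i, j) <= distance_band.

    The "star" version includes each feature in its own neighbourhood (w_ii = 1).
    Positive z marks a hot spot (high values near high values), negative a cold spot.
    """
    n = len(values)
    x_bar = sum(values) / n
    # Population standard deviation of the attribute, clamped at 0 for float noise.
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - x_bar ** 2))
    z_scores = []
    for p in points:
        weights = [1.0 if math.dist(p, q) <= distance_band else 0.0 for q in points]
        w_sum = sum(weights)
        w_sq_sum = sum(w * w for w in weights)
        numerator = sum(w * v for w, v in zip(weights, values)) - x_bar * w_sum
        denominator = s * math.sqrt(max(0.0, (n * w_sq_sum - w_sum ** 2) / (n - 1)))
        # No variation (all values equal, or every feature is a neighbour): report 0.
        z_scores.append(numerator / denominator if denominator > 0 else 0.0)
    return z_scores
```

A z-score beyond roughly ±1.96 corresponds to a hot or cold spot at about 95% confidence before any multiple-testing correction.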

This map displays clustering that closely matched my intuitions from Map 1. On the left, the blue values show a cluster of uninfected trees closely clustered with other uninfected trees. The larger swath on the right shows a cluster of infected trees closely clustered with other infected trees. In the middle, uninfected and infected trees are mixed without any significant clustering. Lastly, small clusters of infected trees were identified in the top left and bottom left. These clusters may be the edges of larger clusters outside my stand, or lightly infected trees that are starting a new infection center. These results will be extremely valuable in informing my steps for Exercise 2, because I can assess the conditions of both patches and determine the differences between them. I can also determine whether distance to the fire refugia affects the clustering of infection, because the infected cluster appears to be closer to the fire refugia.

The hot spot analysis was extremely useful for analyzing and displaying the clustering information I needed, and it built naturally on the Average Nearest Neighbor analysis.

My data set also included a severity rating for each dwarf mistletoe infected western hemlock in my study site. I ran a hot spot analysis similar to the one above to determine whether severity played out in the stand in a similar way to the incidence data. My ratings ranged from 0 to 5, with 0 indicating uninfected trees and 5 indicating the most heavily infected. These are classified (ordinal) data rather than continuous, but they are still appropriate for optimized hot spot analysis. Western hemlock dwarf mistletoe forms infection centers, starting from a residual infected western hemlock that survived a disturbance, from which the infection spreads outward. Another feature of infection centers is that the most heavily infected trees are almost always aggregated in the center, and infection severity decreases toward the outside. This is intuitive when you think about infected trees in terms of how long they have been exposed to dwarf mistletoe seed rain: trees in the center of the infection center have likely been exposed to infectious seed the longest. These trees can be rated with a severity rating system that essentially determines the proportion of the tree crown that is infected and expresses it as an easily interpretable rating, in this case 0-5.

This third map shows how severity is aggregated in the stand. The wide swath in the middle of the stand, associated with the fire refugia, has the largest aggregation of severely infected trees. This is what I expected, because the trees in the fire refugia survived the fire and provide an infectious seed source for the post-fire regeneration. Also, the edges of this high severity cluster have lower severity values, indicating that the expected infection center pattern is playing out. The west side of the stand shows a large cluster of low severity ratings; the high density of uninfected trees falls into this cold spot of low or no severity. Interestingly, the hot spot previously found in the southwest corner is actually a cluster of low severity trees. This may be a new infection center forming or the exterior edge of another infection center outside the plot. Lastly, the two pockets of low severity on the east side of the stand are more distinct when severity is considered.

This second application of hot spot analysis tells another story about my data and how dwarf mistletoe is patterned spatially. Among other new observations, the swath in the center of my stand that was non-significant in the incidence analysis turns out to be a significant cluster of highly infected trees.

GEOG 566

Advanced spatial statistics and GIScience

Tag Archives: average nearest neighbor

Does the spatial arrangement of vegetation cover influence ventenata invasion?

Exercise 1: What is the spatial pattern of western hemlock dwarf mistletoe at the Wolf Rock reference stand?