
Point Pattern Analysis of Tree Distribution by Height in the HJ Andrews Forest

1. The HJ Andrews Experimental Forest is a 16,000-acre forest, established in 1948 and now a Long-Term Ecological Research (LTER) site. Decades of manipulations have made it a highly spatially heterogeneous ecosystem. Forest harvest began in the 1950s and resulted in a mosaic of young plantation forest (~30 percent of total forest area) and old growth (~40 percent of forest) (http://andrewsforest.oregonstate.edu/). My objective is to quantify the spatial pattern of trees across the forest and eventually relate that to quantifiable landscape features.

Motivating Questions: How does the spatial pattern of trees vary across the HJ Andrews Forest? Specifically, I’m exploring the relationship between tree height and tree spacing. One specific question of interest is: How does the mean distance between trees in the same height class differ from the mean distance between a single height class of tree and all other trees? This question attempts to address the clustering vs. dispersion of trees by height.

This is an analysis of the spatial distribution of one variable, tree height, so a consideration of the internal processes that may influence the spatial distribution of this variable is necessary.

  • Microclimate caused by the clustering or dispersion of trees could be either an attraction or repulsion process. Microclimate influences relative humidity and exposure to wind, along with many other factors, so clusters of trees would tend to have different microclimate features than more dispersed trees.
  • Population and community dynamics will influence the spatial distribution of trees. The speed at which a colonizer can take over a space and competition between different colonizers influence the distribution. Aboveground and belowground tree growth and spacing of trees will be influenced by these factors.
  • Source and sink processes in a forest may result from topographical features, like valleys and hillslopes. I expect valleys to be a source (capable of producing a surplus of organisms) since they tend to be near streams, so they are not water limited, and they serve as catchments for nutrients, so they are not nutrient limited.
  • Spatial distribution of tree height certainly is different according to the scale. The spatial pattern looks different at a single tree scale compared with a 50-m scale, compared with a 5-km scale. Some of these differences are revealed by Figures 3, 4 and 5, below.

2. My approach is to use k-nearest neighbor analysis to examine the distance between a given tree and proximal trees. I created ten height classes by using kmeans to find ten cluster centers in the tree height data, then used k-nearest neighbor to examine the distance between each cluster center and the 30 closest trees.

3. For this analysis, I used a LiDAR dataset of vegetation heights downloaded from the HJ Andrews online data portal (http://andlter.forestry.oregonstate.edu/data/abstract.aspx?dbcode=GI011). The selected data is from the third entry, titled “Vegetation Height digital elevation model (DEM) from 2011 LiDAR, Upper Blue River Watershed (Oct 27th, 2011 – Nov 1st, 2011)”. A description of this dataset can be found here: http://andlter.forestry.oregonstate.edu/data/spatialdata/gi01103.htm.

I performed the majority of the analyses in R and used QGIS for data visualization. I used the ‘st_read’ function from the ‘sf’ package in R to read in the stem map shapefile (stem_map.shp). The stem map shapefile includes crown radii (m) and tree heights (m) as well as the point locations of trees.

Because the stem map shapefile essentially provides a census of trees within the HJA Forest, it is (1) too much data to deal with at one time and (2) not well suited to statistical analysis on its own, since we can simply map the data and visually discern where trees are located. I therefore used kmeans to group the trees into 10 height classes, which partitions the data space into Voronoi cells.
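To make the height-class step concrete, here is a minimal sketch of k-means on one-dimensional height data. The analysis itself was run in R; this Python version is only an illustration. The function names, the quantile-based starting centers, and the sample heights are all assumptions made for the example, not details from the original workflow.

```python
def assign(values, centers):
    """Label each value with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            for v in values]


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on 1-D data; returns (sorted centers, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: deterministic start at evenly spaced quantiles of the data.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        labels = assign(values, centers)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty class keeps its previous center.
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:  # converged
            break
        centers = updated
    centers = sorted(centers)
    return centers, assign(values, centers)


# Hypothetical tree heights (m), not taken from the stem map.
heights = [70.0, 10.0, 31.0, 12.0, 68.0, 30.0, 11.0, 72.0, 32.0]
centers, labels = kmeans_1d(heights, k=3)
# In 1-D, the Voronoi cell edges are the midpoints between adjacent centers.
boundaries = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
print(centers, labels, boundaries)
```

With a single variable, the Voronoi cells reduce to height intervals, so each class is simply a height range whose edges sit halfway between neighboring class centers. K-means results can depend on the starting centers, so a different initialization may give somewhat different classes.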

I used the ‘nngeo’ package to examine k-nearest neighbor relationships between and within height clusters. I examined the relationship between a single tree (point) and its nearest neighbor of the same height class. I also examined the relationship between a single tree of one height class and its nearest neighbor of any height class to start to elucidate to what extent trees of similar heights cluster or are dispersed spatially.
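The two nearest-neighbor comparisons described above can be sketched with a brute-force search. This is an illustrative Python sketch, not the ‘nngeo’ workflow: the helper names and the four sample trees are hypothetical, and a real census-sized dataset would need a spatial index rather than the all-pairs loop used here.

```python
import math


def knn_distances(i, points, k, allowed=None):
    """Sorted distances from tree i to its k nearest other trees.

    `allowed` optionally restricts candidates to a set of tree indices.
    """
    dists = sorted(
        math.dist(points[i], points[j])
        for j in range(len(points))
        if j != i and (allowed is None or j in allowed)
    )
    return dists[:k]


def within_and_any(points, labels):
    """Per tree: (nearest same-class distance, nearest any-class distance)."""
    out = []
    for i, lab in enumerate(labels):
        same = {j for j, other in enumerate(labels) if other == lab}
        d_within = knn_distances(i, points, 1, allowed=same)
        d_any = knn_distances(i, points, 1)
        # A tree alone in its class has no within-class neighbor.
        out.append((d_within[0] if d_within else math.inf, d_any[0]))
    return out


# Hypothetical tree coordinates (m) and height classes.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (10.0, 0.0)]
labels = [0, 1, 0, 1]
pairs = within_and_any(points, labels)
print(pairs)
```

One property worth keeping in mind when reading the results: the same-class trees are a subset of all trees, so the within-class distance can never be smaller than the any-class distance for a given tree. The informative quantity is therefore the size of the gap between the two, not its sign.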

I performed t-tests on each tree height class to test whether there was a significant difference between (1) the mean distance between a tree and the next closest tree within the same height class and (2) the mean distance between a tree and the next closest tree of any height class. I calculated the mean, standard deviation and kurtosis of the distances for each height class (both within and between height classes).
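For readers who want to see the arithmetic behind the comparison, here is a small Python sketch of the summary statistics and a two-sample t statistic. It assumes an unpaired Welch test, which is the default behavior of R's `t.test`; the text does not say which variant was actually run. The kurtosis definition, the normal approximation for the p-value, and the sample distances are likewise assumptions for illustration only.

```python
from statistics import NormalDist, mean, variance


def kurtosis(xs):
    """Pearson kurtosis m4 / m2**2 (equals 3 for a normal distribution)."""
    m = mean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m4 = sum((x - m) ** 4 for x in xs) / len(xs)
    return m4 / m2 ** 2


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical nearest-neighbor distances (m) for one height class.
within = [2.0, 3.0, 4.0, 5.0, 6.0]
to_any = [1.0, 2.0, 3.0, 4.0, 5.0]
t, df = welch_t(within, to_any)
# Assumption: normal approximation to the t distribution (large samples only).
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
print(t, df, p_approx, kurtosis(within))
```

Because each tree contributes both a within-class and an any-class distance, a paired test on the per-tree differences would be another reasonable choice. With census-sized samples, even small differences in mean distance will produce very small p-values, so the size of the difference matters more than the significance level.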

4. Results

The table below shows the center of each of the ten height class groupings, mean distances between and within tree height classes, standard deviation and kurtosis of those distances, and p-values. Results of all t-tests were significant (p<0.01), meaning there is strong evidence that, for every height class, the mean distance to the closest tree of the same height class (within group) differs from the mean distance to the closest tree of any height class. In other words, the distance between a large tree and its nearest neighbor (of any height class) is significantly different from the distance between a large tree and its nearest neighbor within the large tree height class. The same is true of trees within and outside of the small height class, as well as each of the other height classes.

Table 1. Results from k-nearest neighbor analysis and t-tests

| Tree Class | Class Center, Height (m) | Mean Distance to 30 Closest Trees (m) | St Dev of Mean Distance (m) | Kurtosis of Mean Distance | Mean Distance to Closest Tree (m) | St Dev (m) | Mean Distance to Closest Tree Within Group (m) | St Dev Within Group (m) | P-value |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 11.8 | 12.5 | 4.4 | 6.4 | 2.1 | 2 | 4 | 6.3 | <0.01 |
| 2 | 15.1 | 11.7 | 3.9 | 3 | 2.4 | 1.7 | 4.9 | 6.6 | <0.01 |
| 3 | 19.9 | 12.6 | 3.4 | 4 | 3.6 | 1.3 | 6.8 | 6.7 | <0.01 |
| 4 | 24.4 | 13.4 | 3.1 | 5.6 | 4 | 1.4 | 6.9 | 6.5 | <0.01 |
| 5 | 29.7 | 14.6 | 2.9 | 3.9 | 4.6 | 1.4 | 8 | 6.9 | <0.01 |
| 6 | 35.5 | 16.7 | 3 | 2.8 | 5.3 | 1.6 | 9.3 | 7.2 | <0.01 |
| 7 | 42 | 18.6 | 2.6 | 4.5 | 6.1 | 1.7 | 10.1 | 7 | <0.01 |
| 8 | 50 | 19.7 | 2.5 | 6.6 | 6.9 | 1.8 | 11.4 | 7.3 | <0.01 |
| 9 | 59 | 20.5 | 2.6 | 3.1 | 7.5 | 1.8 | 13.5 | 8.7 | <0.01 |
| 10 | 70 | 21.2 | 2.7 | 0.7 | 8.1 | 1.9 | 15 | 11 | <0.01 |


Fig 1. Distribution of all tree heights (m) in HJ Andrews Forest.

 

 

 

Fig 2. Mean distance (m) and standard deviation (m) between trees of each of ten height classes and the next closest 30 trees within that height class. Generally, as trees get larger, mean distance between them is larger.

Fig 3. Distribution of the tallest tree class (70 m tree class) across the HJ Andrews Forest.

 

Fig 4. A representative distribution of all tree height classes on either side of a road in the HJ Andrews Forest, showing the extent of clustering and the extent of dispersion of height classes. Height classes are in ascending order from smallest class (~12m tall; Class 1) to tallest class (70m tall; Class 10).

 


Fig 5. A closer look at the same area as Fig. 4, where small trees are clustered tightly together near the road, while larger trees are more dispersed.

 

Fig 6. Mean distance between trees of the same height class and other trees of the same height class (blue) and mean distance between trees of one height class and any other tree (red). The overlap of the red confidence intervals with the blue points suggests that the average distance between small trees is not significantly different from the distance between small trees and any trees. The general upward trend suggests that as trees get taller, the distance between them increases and variance slightly decreases.

5. Critique of the method:

The results make sense, but do not provide much more information about the actual distribution of trees (clustering vs. dispersion) than simple maps of point data, so the next step might be to examine tree heights within different management regimes. The current analysis tells me that trees are somewhat clustered by height, and that the mean distance from a tree of one height class to the nearest tree of the same height class is different from the mean distance from that tree to the nearest tree of any height class. I’ve examined a map of different management regimes within the HJ Andrews Forest and there are clear areas of old growth, harvested areas, clearly defined plots, etc., so I would expect some of these areas to show tree clustering by height class. The patterns I found using this analysis were not as clear as I was expecting. Using kmeans and nearest neighbor analysis is a great way to start to examine spatial relationships within and between groups, but with such a large and highly varied dataset there can be shortcomings, especially when it comes to drawing any concrete conclusions.

References:

HJ Andrews Online Data Repository: http://andlter.forestry.oregonstate.edu/data/catalog/datacatalog.aspx

Johnson, S.; Lienkaemper, G. 2016. Stream network from 1997 survey and 2008 LiDAR flight, Andrews Experimental Forest. Long-Term Ecological Research. Forest Science Data Bank, Corvallis, OR. [Database]. Available: http://andlter.forestry.oregonstate.edu/data/abstract.aspx?dbcode=HF013 (10 April 2019) DOI:http://dx.doi.org/10.6073/pasta/66d98881d4eb6bb5dedcbdb60dbebafa.

Spies, T. 2016. LiDAR data (August 2008) for the Andrews Experimental Forest and Willamette National Forest study areas. Long-Term Ecological Research. Forest Science Data Bank, Corvallis, OR. [Database]. Available: http://andlter.forestry.oregonstate.edu/data/abstract.aspx?dbcode=GI010 (10 April 2019) DOI:http://dx.doi.org/10.6073/pasta/c47128d6c63dff39ee48604ecc6fabfc.

Spies, T. 2016. LiDAR data (October 2011) for the Upper Blue River Watershed, Willamette National Forest. Long-Term Ecological Research. Forest Science Data Bank, Corvallis, OR. [Database]. Available: http://andlter.forestry.oregonstate.edu/data/abstract.aspx?dbcode=GI011 (10 April 2019) DOI:http://dx.doi.org/10.6073/pasta/8e4f57bafaaad5677977dee51bb3077c.

Spies, T. 2014. Forest metrics derived from the 2008 Lidar point clouds, includes canopy closure, percentile height, and stem mapping for the Andrews Experimental Forest. Long-Term Ecological Research. Forest Science Data Bank, Corvallis, OR. [Database]. Available: http://andlter.forestry.oregonstate.edu/data/abstract.aspx?dbcode=TV081 (10 April 2019) DOI:http://dx.doi.org/10.6073/pasta/875e10383e8c8aee3c9a49e0155eef1d.