Supervised Classification Archives

Bryan Begay

GEOG566

05/31/2019

Question asked?

How does forest aesthetics vary depending on forest structure as a result of active management that retains vegetation and landforms?

In order to answer my question I would look at two stands in the McDonald-Dunn forest to do some analysis on how their forest structure is related to forest aesthetics. The first stand is Saddleback which had been logged in 1999 and shows signs of management. The second stand was identified near Baker Creek, and is 1 mile west of Saddleback. Baker Creek was chose for its riparian characteristics, as well as having no signs of management activates.

A description of the data set.

The LiDAR data that I used with my initial analysis was 2008 DOGAMI LiDAR flown over the McDonald-Dunn forest. The RMSE for this data set was 1.5cm. The DEM data set I used was from 2009 has a RMSE of 0.3 m. A Canopy Height Model (CHM) was made in RStudio lidR package that used a digital surface model with a 1 meter resolution. The CHM was used to create an individual tree segmentation, where segmented trees were then converted to point data.

https://gimbalmonkey.github.io/SaddlebackPC/index.html

A link to a visualization of the raw point cloud that is georeferenced to its terrain.

Hypotheses: predictions of patterns and processes you looked for.

I suspected that initially the Baker Creek stand would have higher forest aesthetic that would reflect in the stand’s unmanaged vegetation structure.

Rational

Since the Saddleback had been managed and cut I figured the more natural structure of the riparian stand would have generally higher forest aesthetics than a stand that has been altered by anthropogenic factors. Some processes that I hypothesized that relates to forest aesthetics to these stands was the spatial point pattern of trees could be related to forest aesthetics. Insert forest aesthetic link:

Approaches: analysis approaches you used.

Exercise 1: Ripley’s K Analysis point pattern analysis

The steps taken to create a point pattern analysis was to identify individual trees and convert the trees into point data. The RStudio lidR package was used to create a Canopy Height Model and then an Individual tree segmentation. Rasters and shapefiles were create to export the data so I could then use the tree polygons to identify tree points. The spatstat package was used in RStudio as well to perform a Ripley’s K analysis on the point data.

Figure 1. Individual Tree Segmentation using watershed algorithm on Saddleback stand.

Exercise 2: Geographically weighted Regression

The steps taken to do the geographically weighted regression included using the polyongs created from the individual tree segmentation to delineate tree centers. When tree points were created from the centroids of the polygons, which would be inputs for the GWR in ArcMap. A density raster and CHM raster had their data extracted to the point data so that density and tree height could be the variables used in the regression. Tree height was the explanatory variable and density was the independent variable.

Figure 2. Polygon output from Individual tree segmentation using the lidR package. The Watershed algorithm was the means of segmentation, and points were created from polygon centroids in ArcMap.

Exercise 3: Supervised Classification

This analysis involved creating a supervised classification by using training data from NAIP imagery and a maximum likelihood classification algorithm. It involved using the NIR band and creating a false color image that would show the difference spectral reflectance values from conifers and deciduous trees. I used a histogram stretch to visualize the imagery better and spent time gathering quality training data. I then created a confusion matrix by using accuracy points on the training data. I then clipped the thematic map outputs with my individual tree segmentation polygons to show how each tree had their pixels assigned.

Results

The Ripley’s K analysis in ArcMap showed me that Saddleback stand’s trees are dispersed, and the Baker Creek stand’s trees were spatially clustered. GWR outputs told me that the model in the Saddle back stand showed me a map output where tree heights and density were positively related. The adjusted R2 was 0.69 and gave me a good output that showed me the tallest and densest trees were on the edges of Saddleback stand. The Baker Creek stand’s model performed poorly on the point data with an adjusted R2 of 0.5. The outputs only showed relationships could only be modeled on the upper left of the stand. The classified image worked well on Saddleback stand due to less distortion in the NAIP imagery on that stand, and the Baker Creek stand’s classification was not useful since it had significant distortion in the NAIP imagery.

Exercise 1:

Figure 3. ArcMap Ripley’s K function output for Saddleback stand assessing tree points.

Exercise 2:

Figure 4. Geographically weighted regression of Baker Creek and Saddleback stand. The Hotter colors indicate positive relationships between tree density and tree height.

Exercise 3.

Figure 5. Supervised image classification using a maximum likelihood algorithm on Saddleback stand.

What did you learn from your results? How are these results important to science? to resource managers?

I learned that Ripley’s K outputs can differ depending on what packages used. R-studio Ripley’s K outputs told me that both my stands had clustered tree patterning. ArcMap outputs that made more sense told me that my Saddleback stand was actually dispersed. Outputs can be variable if inputs are not explicitly understood or modeled with enough care. I also learned that trying to model a very heterogeneous riparian stand is more difficult because of the variability. This is important for researchers who are interest in riparian areas like Baker Creek since they might need to have more variables to adequately model those stands.

Your learning: what did you learn about software?

I became very familiar with processing and modelling with LiDAR point clouds. I also became familiar with Modelbuilder and learned how to use packages in R like Spatstat. I also found a new method for making a confusion matrix in ArcMap.

What did you learn about statistics or other techniques?

I learned how to do point pattern analysis with Ripley’s K on tree points. This was done in R and in Arc. In Arc using the spatial statistics tool was also something I used and still plan to use. When using GWR I understood what it does, understood the outputs, and learned to properly interpret the results. I also became more concerned with issues of scale and networks that might affect my areas of interest.

Question that I asked?

Could I identify functional tree species with supervised image-classification in my stands?

The reason I asked this question was so that If I had to do a geographically weighted regression again it would be valuable to have deciduous or coniferous tree species in my point data for an added variable.

Name of the tool or approach that you used.

The main tool that I used for image classification was the maximum-likelihood classification in ArcMap. I also used the create accuracy assessment points to help create a confusion matrix in excel.

Brief description of steps you followed to complete the analysis.

I downloaded 2016 NAIP imagery in my area of interest and used the high resolution imagery to create a false colored image with the bands being arranged as NIR, Red, and Green. To help delineate broad-leaf vegetation from coniferous vegetation, I applied a histogram equalize stretch that enhanced my ability to identity conifers in the landscape. From there I created a maximum likelihood classification by drawing training data polygons on the false color imagery, which involved me using the Esri digital imagery base map as a reference image.

Once the image classification was complete, I used the create accuracy points on my stand and then extracted the raster values from the thematic map output to those points to create a confusion matrix in Excel. I clipped the thematic map raster to the watershed polygons I made when I did an individual tree segmentation to show what pixel classifications were assigned in my tree tops.

Brief description of results you obtained.

The thematic map output was 83% accurate with conifer and developed land covers performing the worst in the model. The developed land cover is generally difficult to model in a landscape, and the variability in urban spectral reflectance leads to errors in modeling. The conifer land cover performed more poorly due to my trouble achieving accurate training data with the imagery resolution, and also with the model having trouble delineating conifers from grass and deciduous vegetation. Errors of commission on my part (65% accuracy), and errors of omission (75% accuracy) lead to the lower accuracy of the conifer land cover (Table 1). Despite these errors, the thematic map output performed well, and the land cover pixels in my stands showed that conifer trees were accurately assigned in the Saddleback stand (Figure 2). For the baker creek stand the large amount of shadows, sun glare on canopies, and classification cut off, lead to a poor classification of that stand.

Figure 1. The land cover thematic map for the entire NAIP image. The cyan blue color indicates the locations of Saddleback and Baker Creek stands.

Figure 2.The land cover classification output for the Saddleback stand.

Figure 3. The land cover classified output for Baker Creek Stand. Note that the NAIP imagery that was classified did not extend to cover the entire stand. The tree crown polygons were laid below the output to show where the land cover cuts off.

Table 1. Confusion matrix for the thematic map output.

Critique of the method – what was useful, what was not?

Some critiques about this process was that it was time consuming to create training data detailed enough to capture the variation in the scene for my desired accuracy. Sources of errors in the thematic map include shadows, resolution and variable spectral response signatures in the remotely sensed vegetation. Shadows occluded trees that would otherwise stand out, and distorted the classification enough for me to have to add in a land cover classification for shadows to mask them out of the scene. The issue of resolution just means that NAIP imagery was not detailed enough for the applications I asked. Imagery taken from unmanned aerial drones may be a potential avenue for acquiring a more higher resolution data set. The confusion matrix highlights this issue, with an omission error of 65% for conifers and 75% commission error. It was difficult to determine conifer trees accurately in the training data from the variability of the spectral reflectance and the blurred crowns from the 1 meter resolution.

Since I only did a classification, I didn’t attempt to classify tree functional species to my tree polygons. The process that comes to mind on how to do that is to visibly determine which classification color is more pronounced in a tree top, and then placing that species in the point data as an attribute. This process would be highly time consuming and developing a methodology to streamline the classification of functional tree species to my tree points would be potential future work.Overall, the thematic map outputs are useful for areas like the Saddleback stand that have less shadows and distortion. The map is less useful for areas with high distortion like my Baker Creek stand.

GEOG 566

Advanced spatial statistics and GIScience

Tag Archives: Supervised Classification

Using LiDAR data to assess forest aesthetics in McDonald-Dunn Forest

Supervised Image classification on forested stands