Background
Once again, I am using the confusion matrix to determine the accuracy of segmented classification based on the ground truth of manual classification. This work piggybacks off of exercise two where I also used a confusion matrix for determining accuracy. In my previous work I critiqued the ability to draw any relevant conclusions based on a sample size of one. Here I have included a sample size of five images which have gone through the entire image processing, segmentation, manual classification and confusion matrix. This allows us to make more relevant comparisons and looks for trends in the data. Additionally, I mentioned the confusion matrix was inaccurate because it was performed on a rectangular region that was simply the extent of the image and not just the leaf. This has been corrected for and now conducts a confusion matrix on the entire leaf and no outside area. Lastly, I have adjusted the segmentation settings to increase accuracy based on previous findings in exercise one and two.
Questions
- How accurate is the segmentation classification?
- Are there commonalities between the false positives and false negatives?
- Are more raster cells being classified as false negatives or false positives when misclassified?
- What is going undetected in segmentation?
- What is most easily detected in segmentation?
Tools and Methods
Many of the steps are repeated from exercise two. I will repeat them here in a completer and more coherent format with the new steps found in the process.
ArcGIS Pro
Begin by navigating to the Training Sample Manager. To get to the Training Sample Manager, go to the Imagery tab and Image Classification group and select the Classification Tools where you can click Training Sample Manager. In the Image Classification window pane go to create new scheme. Right click on New Scheme and Add New Class and name it “img_X_leaf” and supply a value (the image #) and a description if desired. Click ok and select the diseased pixels scheme and then the Freehand tool in the pane. Draw an outline around the entire leaf (Images 1-5). Save this training sample and the classification scheme. Save ArcGIS Pro and exit. Reopen ArcGIS Pro and go to Add data in the Layer group on the Map tab to upload the training sample polygon of the leaf. This process must be repeated for each leaf.
The polygon of the leaf will need to be converted to a raster. Click the Analysis tab and find the Geoprocessing group to Tools once again. Search Polygon to Raster (Conversion Tools) in the Tools pane and select it. Use the polygons layer for Input Features and the window pane options should adjust to accommodate your raster. The only adjustment I made was to the Cellsize, which I changed to 1 to maintain the cell size in my original raster. Select Run in the bottom right corner of the pane. Each of the polygons should now be converted to a rasterized polygon with a unique ObjectID, Value and Count which can be found by going to the Attribute Table for that layer or clicking on a group of pixels. This also must be completed for each of the polygons created for each leaf outline.
Now we should have three layers for each leaf in which we want to export as Tiff files. We have our “leaf” that we just created, our “manually classified” and “segmented” layers. To export these three, you need to go to each layer and right click. Start with the “leaf” image and go to Data and Export Raster and a pane will appear on the right. Zoom into the image so there isn’t much area above or below the leaf. Select Clipping Geometry and click on Current Display Extent. This will export the image as a Tiff file which we will use later. Continue this process with the other two images without changing the extent. Each image should be exported after this step with the same extent resulting in the same number of pixels and overlap between pixels if placed overtop one another. This can be confirmed in R in the follow steps.
R
Open R studio and use the following annotated code below:
required packages for rasters
install.packages(“raster”)
install.packages(“rgdal”)
install.packages(“sp”)
library(raster)
library(rgdal)
library(sp)
rm(list=ls()) ##clears variables
##raster upload
img_10_truth <- raster(“E:/Exercise1_Geog566/MyProject3/img_10_truth.tif”)
img_10_predict <- raster(“E:/Exercise1_Geog566/MyProject3/img_10_predict.tif”)
img_10_leaf <- raster(“E:/Exercise1_Geog566/MyProject3/img_10_leaf_shape.tif”)
##view the raster
img_10_truth ##confirm dimensions, extent, etc. They need to be the same or very close for a confusion matrix
img_10_predict
img_10_leaf
plot(img_10_leaf) ##view the images
plot(img_10_predict)
plot(img_10_truth)
##export data to excel
img_10_leaf_table <- as.data.frame(img_10_leaf, xy=T) ##creates tables
img_10_predict_table <- as.data.frame(img_10_predict, xy=T)
img_10_truth_table <- as.data.frame(img_10_truth, xy=T)
install.packages(“xlsx”)
library(xlsx)
setwd(“E:/Exercise_3/data_tables”)
write.table(img_10_leaf_table, file = “img_10_leaf.csv”, sep = “,”)
write.table(img_10_predict_table, file = “img_10_predict.csv”, sep = “,”)
write.table(img_10_truth_table, file = “img_10_truth.csv”, sep = “,”)
##########################################################################################
Excel
At this point the three Tiff files should have been uploaded to R as raster’s and undergone a quality check by viewing the images and the data using the supplied code above. Next, it should have been converted to a table for each file and then all exported out as csv files. From here, open all three files into excel. Begin with the “Truth” and “Predict” files and select the entire column that is the farthest to the left, the values associated with each cell. Go to the Home tab and the Editing group and click on the drop down for Sort and Filter and select the A to Z. Change everything from whatever value they are, to 1. This can easily be accomplished by using the Find and Replace function in the Editing group. Replace all the NA’s with 0’s and then go to left column which simply lists each pixel number and sort the column from A to Z so its back to the order it was when originally opened into excel. Once this is completed for both the “Predict” and “Truth” files, copy the right most column of values and paste them into the Leaf” excel file. Use the Sort function to sort the right column from A to Z on the leaf values column and allow for it to expand the selection. It should be a list of 0’s and if you scroll down you will eventually begin seeing NA’s in the leaf value column. Delete everything below this point in the “Truth” and “Predict” value columns. Scroll to the top and delete the first four columns that are the numbers, x and y coordinates and the leaf values column. You should be left with two columns that are the “Truth” and “Predict” values with 1’s and 0’s only. Give each column a label (Truth & Predict), save this document as an xlsx file and exit out.
This excel step helped us overcome the issue of having values that weren’t in a 0’s and 1’s format needed for the confusion matrix. Here, 1’s stand for diseased patch and 0’s stand for non-diseased. What we also did was removed all the values that were outside of the leaf area. So, the only values we are working with are those which are in the leaf and not part of the background. The last step is importing the excel file that was just created into R to perform the confusion matrix.
##########################################################################################
##confusion matrix
install.packages(“caret”)
library(caret)
img_10_confusion$Predict <- as.factor(img_10_confusion$Predict) ##convert to factors
img_10_confusion$Truth <- as.factor(img_10_confusion$Truth)
confusionMatrix(img_10_confusion$Predict, img_10_confusion$Truth)
You should now have R output of the confusion matrix and associated statistics.
ArcGIS Pro
For the purpose of visualization, I also changed the colors of the three Tiff files. To do this, I simply right clicked on each of the three layers and selected Symbology. The pane on the right side of the screen appears where you have options to change the Color Scheme. I made the leaf layer green, the segmented regions white and manually classified regions white. The snipping tool was used to extract these images.
Results
Between the five images there were 15,519 pixels classified in the confusion matrix. 14,884 were properly classified as non-diseased. 282 were classified as true disease. 307 were identified as false negatives and 46 were classified as false positives. Overall the segmentation classification had an accuracy of 97.7% with a p-value less than 0.05 indicating statistical significance. The model had a sensitivity of 99.69% and a specificity of 47.88%.
Accuracy between the five images ranged from 94% to 99%, all with significant p-values. The sensitivity for all five individual matrices and the combined matrix was at 99% with no exceptions. Specificity had a fairly large range.
I was happy with the level of accuracy produced throughout all five images and the overall cumulative matrix. The five images ranged a bit in complexity with the number of lesions present, leaf size and shape, size of lesions, etc. Looking at the confusion matrix we can see there were many more false negatives than false positives. This indicates the model is possibly not quite sensitive enough to detect the all regions or certain parts of the disease. The specificity was not consistent between the five images for some reason which is still unclear to me. As mentioned, although the model may have some issues with detection certain patches, the sensitivity indicated throughout all matrices is very high.
Viewing the images below we can see where the false positives appear and where the false negatives occur (Images 1-5). It seems that both the false positives and negatives tend to appear on the margins of the lesion when they at least detected. Additionally, larger patches seem to have a higher rate of detection than those which are small in size. This is especially true in Image 1&2.
Critique
My only critique is the process to get to this step. I was very satisfied with my results but see logistical problems in getting through large amounts of samples. I will attempt to find shortcuts throughout the process my next go around. I will also seek creating a model in ArcGIS Pro for some of the rudimentary tasks done in the processing of the images.
Appendix
Taylor, glad to see that you were able to replicate the analysis and confirm the findings from Ex. 2. In your final project please state some hypotheses about what aspects of leaf relfectance or other factors are indicative of the disease, what spatial patterns you predicted, and then evaluate what you learned from each of the exercises.