
Final project synthesis of parts 1, 2, and 3 on classification of Blackleg on turnip

Background

The objective of this research was to find a way to properly classify diseased versus non-diseased regions on a turnip leaf. To do this, I compared the classification accuracy of an unsupervised machine learning algorithm, segmentation, against manual classification. Additionally, I hoped to reveal some spatial characteristics of the lesions on the leaf for future work.

 

Dataset

The cumulative data set is roughly 500 turnip leaves, harvested here in the Willamette Valley from February to April of 2019. For this project I selected five images to work with as the basis of this analysis. The spatial resolution is just under 1 mm, and the images are rasters of single turnip leaves with five bands (blue, green, red, red-edge, NIR). For this analysis, only the red band was used. The camera is a MicaSense RedEdge, which has 16-bit radiometric resolution.

 

Questions

These were not all questions I had anticipated answering at the beginning of the term; they developed over the course of the term as I ran into issues or found intriguing results.

• Can diseased patches be successfully identified and isolated by computer image classification and by manual classification?
If yes:
o Can the diseased patches from image classification and the manually classified diseased patches be characterized?
o How accurate is the segmentation classification?
o When pixels are misclassified, are more raster cells classified as false negatives or as false positives?
o What is going undetected in segmentation?

 

Hypothesis

My original hypothesis was that spatial patterns of pixels based on image classification are related to manually classified spatial patterns of observed disease on turnip leaves, because the disease has a characteristic spectral signature of infection on the leaves.

 

Approaches and results

I began in ArcGIS Pro with the following workflow: segmentation -> mask -> clip -> raster to polygon -> merge -> polygon to raster. This yielded a layer containing only the computer-generated segmented diseased patches, which was exported as a TIFF file. I followed a similar process for my manually classified diseased patches but used the Training Sample Manager instead of segmentation. Both TIFF files were loaded into Fragstats, where I ran a patch analysis on the two to characterize the diseased patches. I did not find this entirely helpful, but digging deeper into patch analysis could provide worthwhile information about lesion characteristics.

After characterizing the patches, I performed an error analysis using a confusion matrix in R. The two TIFF files were imported into R and converted to rasters to check quality, and then to tables. The tables were imported into Excel, where values were recoded to 1 for diseased pixels and 0 for non-diseased; anything outside the area of the leaf was deleted. The tables were then imported back into R, and a confusion matrix comparing the computer-classified segmentation against the manually classified patches was computed. The confusion matrices for all five images were combined for the overall accuracy (Table 1). This step was time consuming, as was the original image processing, but it was a very useful analysis.
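For the combined matrix, the per-image tables can simply be stacked before running caret's confusionMatrix(); a minimal sketch, assuming each cleaned table was saved as a CSV with Truth and Predict columns (the file names here are hypothetical):

## pool the five per-image pixel tables into one overall confusion matrix
## (file names are hypothetical; each CSV holds the cleaned 0/1 Truth and Predict columns)
library(caret)
files <- c("img_10_confusion.csv", "img_12_confusion.csv", "img_15_confusion.csv",
           "img_18_confusion.csv", "img_20_confusion.csv")
all_pixels <- do.call(rbind, lapply(files, read.csv))
all_pixels$Truth <- factor(all_pixels$Truth, levels = c(0, 1))
all_pixels$Predict <- factor(all_pixels$Predict, levels = c(0, 1))
confusionMatrix(all_pixels$Predict, all_pixels$Truth) ## overall accuracy across all pixels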

One other approach I used to aid visualization was displaying the leaf with false positives and false negatives included (Image 1). This was necessary to explain the results more clearly to others and to show them outside of the statistics.

 

Significance

I have created a method that goes from image acquisition all the way to statistical results. This work is significant first to my thesis research: it is something I will refine and streamline, but it is a very good starting point. Second, I obtained a high accuracy with a statistically significant p-value. This will be helpful to those in agronomy or crop consulting who use remote sensing for disease detection. At this point I don't think the method is practical for agronomists to apply, but with further work I hope to make this methodology an available resource for other researchers to build on and for farmers to potentially utilize.

 

Hypothesis revised

After receiving these results and considering my next steps, my hypothesis has been refined. I now hypothesize that segmentation classification can classify diseased patches as well as manual classification can, because of the spectral values associated with diseased pixels.

 

New research questions

This research is far from complete and has led to a new set of questions to be answered.

• Is percent classification of diseased pixels the best metric for how well the classification method works?
• What is the accuracy of a support vector machine versus manually classified pixels?
• Is this classification translatable to other brassica species such as canola, radish, kale, etc.?

 

Future research

To answer some of my new questions, I believe it is critical that I learn how to create band composites from the five available bands. Currently, the issue is that the bands do not overlap. My approach to solving this is to use georeferenced points/pixels on each image to ensure overlap; I think using the four corners of each image should give me the overlap I want. With band composites, I can begin using vegetation indices like NDVI to distinguish pixels more accurately than can be done with one band alone.
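As a rough sketch of where this leads (the band file names are hypothetical, and this assumes the red and NIR rasters have already been co-registered to the same extent and resolution), the composite and NDVI steps in R would look something like:

## band composite and NDVI once the bands overlap
## (hypothetical file names; assumes co-registered rasters with matching extents)
library(raster)
red <- raster("img_10_red.tif")
nir <- raster("img_10_nir.tif")
composite <- stack(red, nir) ## band composite for multi-band analysis
ndvi <- (nir - red) / (nir + red) ## NDVI computed per pixel
plot(ndvi)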

Another concern is that, despite classifying pixels accurately, I am wary of treating this as an all-determining method of classification. I have instances in my current classification where only 2 or 3 of the roughly 6 lesions actually have pixels inside their perimeters classified as diseased. I think a more accurate, or at least more helpful, assessment would quantify the percent of lesions captured by the segmentation process out of the total number of lesions on the leaf.
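One possible way to compute such a lesion-level detection rate in R (a sketch, assuming the truth and predicted rasters loaded earlier, with diseased cells holding a value and all other cells NA; raster::clump() also requires the igraph package to be installed):

## lesion-level detection rate: fraction of manual lesions containing at least
## one segmentation-classified pixel (object names follow the rasters loaded earlier)
library(raster)
lesions <- clump(img_10_truth) ## label each manual lesion with a unique patch ID
ids <- na.omit(unique(values(lesions)))
detected <- sapply(ids, function(i) {
  idx <- which(values(lesions) == i) ## cells belonging to lesion i
  any(!is.na(values(img_10_predict)[idx])) ## any predicted-diseased pixel inside?
})
sum(detected) / length(ids) ## proportion of lesions captured by segmentation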

Using an unsupervised classification scheme was good for laying the groundwork. In future work I would like to move to supervised machine learning classification schemes. My reasoning: when translating the algorithm from turnip to other crops such as canola, radish, etc., I believe a supervised approach will achieve higher accuracy and consider more spatial characteristics. Currently, my classification scheme uses only the spectral values of the red band, whereas I would like to incorporate variables such as lesion size, proximity to the leaf margin, proximity to other lesions, shape, NDVI values, etc. This ties into my last question concerning whether the classification can be applied to a broader range of plants.

 

Technical and software limitations

Some of my limitations are apparent in the sections above because they led to my future research questions. One of my greatest limitations was my ability to use ArcGIS Pro effectively and efficiently, as well as to code in R. This meant more time spent on image processing, which might have been streamlined had I been better with ArcGIS Pro. The major roadblocks I hit in ArcGIS Pro were the band-overlap issue mentioned earlier and the use of the support vector machine. I have plans to solve the band composite issue, but I am still unsure about the support vector machine function in ArcGIS Pro; the way it is set up seems designed for collecting training data from one image rather than from many images. My alternative for the SVM is to try conducting the analysis in R.
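A minimal sketch of what that R alternative might look like, using the e1071 package rather than the ArcGIS Pro wizard (the package choice, file names, and column names are all assumptions):

## pixel-level SVM in R as an alternative to the ArcGIS Pro wizard
## (file and column names are hypothetical)
install.packages("e1071")
library(e1071)
train <- read.csv("training_pixels.csv") ## one row per pixel, spectral values as columns
train$diseased <- as.factor(train$diseased) ## 1 = diseased, 0 = non-diseased
fit <- svm(diseased ~ red + nir, data = train) ## band values as predictors
newleaf <- read.csv("new_leaf_pixels.csv") ## pixels from a leaf not used in training
pred <- predict(fit, newleaf)
mean(pred == 1) ## fraction of the new leaf's pixels classified as diseased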

In R, I struggled with my ability to code. This was a major strain on my time and required the assistance of others to troubleshoot on many occasions. This can be overcome with more practice and is not so much a limitation of the software as of the user.

 

My learning

I would say I learned the most in ArcGIS Pro. The image processing took a lot of time and many steps, which exposed me to a whole set of tools and analyses I was previously unaware of. I don't think my skills in R increased much, but I now have the code and steps for the analysis I would like to complete on my future thesis work: the confusion matrix. Another program I gained experience with was Fragstats. It was useful for characterizing the diseased patches and may be an option for future work looking at the size, shape, distance, etc. of these diseased patches.

 

Appendix

Table 1. Confusion matrix for pixel classification of five leaves.

 

Image 1. Example of one leaf analysis where the green cells are manually classified non-diseased pixels, white pixels inside the leaf are segmentation-classified diseased pixels, and black pixels are the manually classified diseased pixels.

Blackleg classification by segmentation error analysis and visualization

Background

Once again, I am using the confusion matrix to determine the accuracy of segmented classification against the ground truth of manual classification. This work builds on exercise two, where I also used a confusion matrix to determine accuracy. In my previous work I critiqued the inability to draw relevant conclusions from a sample size of one; here I have included five images, each taken through the entire image processing, segmentation, manual classification, and confusion matrix workflow. This allows more relevant comparisons and a look for trends in the data. Additionally, I previously noted that the confusion matrix was inaccurate because it was performed on a rectangular region that was simply the extent of the image rather than just the leaf. This has been corrected: the confusion matrix is now computed over the entire leaf and no outside area. Lastly, I adjusted the segmentation settings to increase accuracy based on the findings of exercises one and two.

Questions

  • How accurate is the segmentation classification?
  • Are there commonalities between the false positives and false negatives?
  • Are more raster cells being classified as false negatives or false positives when misclassified?
  • What is going undetected in segmentation?
  • What is most easily detected in segmentation?

Tools and Methods

Many of the steps are repeated from exercise two. I will repeat them here in a more complete and coherent format, along with the new steps found in the process.

ArcGIS Pro

Begin by navigating to the Training Sample Manager: go to the Imagery tab and the Image Classification group, select the Classification Tools, and click Training Sample Manager. In the Image Classification pane, go to Create New Scheme. Right click New Scheme, choose Add New Class, name it "img_X_leaf", and supply a value (the image number) and a description if desired. Click OK, select the new class, and then the Freehand tool in the pane. Draw an outline around the entire leaf (Images 1-5). Save this training sample and the classification scheme, then save the ArcGIS Pro project and exit. Reopen ArcGIS Pro and go to Add Data in the Layer group on the Map tab to upload the training sample polygon of the leaf. This process must be repeated for each leaf.

The polygon of the leaf will need to be converted to a raster. Click the Analysis tab and find the Geoprocessing group, then Tools once again. Search for Polygon to Raster (Conversion Tools) in the Tools pane and select it. Use the polygon layer for Input Features, and the pane options should adjust to accommodate your raster. The only adjustment I made was to the Cellsize, which I changed to 1 to match the cell size of my original raster. Select Run in the bottom right corner of the pane. Each of the polygons should now be converted to a rasterized polygon with a unique ObjectID, Value, and Count, which can be found in the Attribute Table for that layer or by clicking a group of pixels. This must also be completed for each of the polygons created for each leaf outline.

Now we should have three layers for each leaf that we want to export as TIFF files: the "leaf" layer we just created, plus the "manually classified" and "segmented" layers. To export these three, right click each layer. Start with the "leaf" image: go to Data and Export Raster, and a pane will appear on the right. Zoom into the image so there isn't much area above or below the leaf. Select Clipping Geometry and click Current Display Extent. This exports the image as a TIFF file, which we will use later. Continue this process with the other two layers without changing the extent. Each image exported this way has the same extent, so the rasters will have the same number of pixels and the pixels will overlap if placed on top of one another. This can be confirmed in R in the following steps.

R

Open RStudio and use the annotated code below:

## required packages for rasters
install.packages("raster")
install.packages("rgdal")
install.packages("sp")
library(raster)
library(rgdal)
library(sp)
rm(list=ls()) ## clears variables
## raster upload
img_10_truth <- raster("E:/Exercise1_Geog566/MyProject3/img_10_truth.tif")
img_10_predict <- raster("E:/Exercise1_Geog566/MyProject3/img_10_predict.tif")
img_10_leaf <- raster("E:/Exercise1_Geog566/MyProject3/img_10_leaf_shape.tif")
## view the rasters
img_10_truth ## confirm dimensions, extent, etc.; they need to be the same or very close for a confusion matrix
img_10_predict
img_10_leaf
plot(img_10_leaf) ## view the images
plot(img_10_predict)
plot(img_10_truth)
## export data to Excel
img_10_leaf_table <- as.data.frame(img_10_leaf, xy=T) ## creates a table: x, y, cell value
img_10_predict_table <- as.data.frame(img_10_predict, xy=T)
img_10_truth_table <- as.data.frame(img_10_truth, xy=T)
install.packages("xlsx")
library(xlsx)
setwd("E:/Exercise_3/data_tables")
write.table(img_10_leaf_table, file = "img_10_leaf.csv", sep = ",")
write.table(img_10_predict_table, file = "img_10_predict.csv", sep = ",")
write.table(img_10_truth_table, file = "img_10_truth.csv", sep = ",")

##########################################################################################

Excel

At this point the three TIFF files should have been imported into R as rasters, quality checked by viewing the images and the data with the code above, converted to tables, and exported as CSV files. From here, open all three files in Excel.

Begin with the "Truth" and "Predict" files and select the entire rightmost column, which holds the value associated with each cell. Go to the Home tab and the Editing group, click the drop-down for Sort and Filter, and select A to Z. Change every value, whatever it is, to 1; this is easily accomplished with the Find and Replace function in the Editing group. Replace all the NAs with 0s, then go to the left column, which simply lists each pixel number, and sort it from A to Z so the rows return to the order they were in when the file was first opened.

Once this is completed for both the "Predict" and "Truth" files, copy the rightmost column of values from each and paste them into the "Leaf" Excel file. Use the Sort function on the leaf values column, A to Z, allowing the selection to expand. It should be a list of 0s, and scrolling down you will eventually begin seeing NAs in the leaf value column; delete everything below this point in the "Truth" and "Predict" value columns. Scroll to the top and delete the first four columns (the pixel numbers, the x and y coordinates, and the leaf values column). You should be left with two columns of 1s and 0s only: the "Truth" and "Predict" values. Give each column a label (Truth and Predict), save the document as an xlsx file, and exit.

This Excel step overcame the issue of having values that were not in the 0/1 format needed for the confusion matrix; here, 1 stands for a diseased pixel and 0 for non-diseased. It also removed all the values outside the leaf area, so the only values we are working with are those in the leaf and not part of the background. The last step is importing the Excel file that was just created back into R to perform the confusion matrix.
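In principle, the same recode could be done directly in R instead of Excel; a sketch using the img_10_* tables created above (assuming the cell value sits in the third column, after x and y):

## R alternative to the Excel step: recode to 1/0 and drop pixels outside the leaf
## (uses the img_10_* tables created above; column positions assumed from as.data.frame output)
pred_vals <- img_10_predict_table[, 3] ## cell values; NA outside classified patches
truth_vals <- img_10_truth_table[, 3]
leaf_vals <- img_10_leaf_table[, 3]
img_10_confusion <- data.frame(
  Predict = ifelse(is.na(pred_vals), 0, 1), ## any value = diseased, NA = non-diseased
  Truth = ifelse(is.na(truth_vals), 0, 1)
)
img_10_confusion <- img_10_confusion[!is.na(leaf_vals), ] ## keep only pixels inside the leaf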

##########################################################################################

##confusion matrix
install.packages("caret")
library(caret)
library(xlsx) ## for read.xlsx()
## import the cleaned Truth/Predict sheet created in Excel (file name assumed)
img_10_confusion <- read.xlsx("E:/Exercise_3/data_tables/img_10_confusion.xlsx", 1)
img_10_confusion$Predict <- as.factor(img_10_confusion$Predict) ## convert to factors
img_10_confusion$Truth <- as.factor(img_10_confusion$Truth)
confusionMatrix(img_10_confusion$Predict, img_10_confusion$Truth)

 

You should now have R output of the confusion matrix and associated statistics.

ArcGIS Pro

For the purpose of visualization, I also changed the colors of the three TIFF layers. To do this, I right clicked each of the three layers and selected Symbology. A pane appears on the right side of the screen with options to change the Color Scheme. I made the leaf layer green, the segmented regions white, and the manually classified regions black. The Snipping Tool was used to capture these images.

Results

Across the five images, 15,519 pixels were classified in the confusion matrix: 14,884 were properly classified as non-diseased, 282 were classified as true disease, 307 were false negatives, and 46 were false positives. Overall, the segmentation classification had an accuracy of 97.7% with a p-value less than 0.05, indicating statistical significance. The model had a sensitivity of 99.69% and a specificity of 47.88% (note that caret's confusionMatrix() treats the first factor level, here 0 for non-diseased, as the positive class by default, so sensitivity here describes detection of non-diseased pixels and specificity describes detection of diseased pixels).

Accuracy for the individual images ranged from 94% to 99%, all with significant p-values. The sensitivity for all five individual matrices and for the combined matrix was 99%, with no exceptions. Specificity had a fairly large range.

I was happy with the level of accuracy produced across all five images and the cumulative matrix. The five images varied a bit in complexity: the number of lesions present, leaf size and shape, size of lesions, etc. Looking at the confusion matrix, we can see there were many more false negatives than false positives. This indicates the model is possibly not quite sensitive enough to detect all regions or certain parts of the disease. The specificity was not consistent between the five images for reasons that are still unclear to me. As mentioned, although the model may have some issues detecting certain patches, the sensitivity across all matrices is very high.

Viewing the images below, we can see where the false positives and false negatives occur (Images 1-5). Both tend to appear on the margins of lesions that are at least partially detected. Additionally, larger patches seem to have a higher rate of detection than smaller ones; this is especially true in Images 1 and 2.

Critique

My only critique is the process required to get to this step. I was very satisfied with my results but see logistical problems in getting through large numbers of samples. I will attempt to find shortcuts in the process on my next go-around, and I will also look into building a model in ArcGIS Pro for some of the rudimentary image processing tasks.

Appendix

Table 1. Confusion matrix of all five samples combined. Values are number of pixels classified out of 15,519 total.

 

Figure 1. Classification error for the combined matrix. The values are the number of pixels classified out of 15,519 total.

 

Image 1. Classified leaf is green with black manually classified diseased patches and white overlapping segmented pixels. R output is also included for the confusion matrix.

 

Image 2. Classified leaf is green with black manually classified diseased patches and white overlapping segmented pixels. R output is also included for the confusion matrix.

 

Image 3. Classified leaf is green with black manually classified diseased patches and white overlapping segmented pixels. R output is also included for the confusion matrix.

 

Image 4. Classified leaf is green with black manually classified diseased patches and white overlapping segmented pixels. R output is also included for the confusion matrix.

 

Image 5. Classified leaf is green with black manually classified diseased patches and white overlapping segmented pixels. R output is also included for the confusion matrix.

Determining accuracy of segmentation versus ground truth disease patches of Blackleg on Turnip

Background
My research involves the classification of leaf spots on turnip caused by the pathogen blackleg. I had hypothesized that spatial patterns of pixels based on image classification are related to manually classified spatial patterns of observed disease on turnip leaves, because the disease has a characteristic spectral signature of infection on the leaves. Here I focus on the accuracy of the previously determined disease patches from segmentation against ground truth classification. To do this, a confusion matrix is used, which allows detection of true positives, true negatives, false positives, and false negatives. All of the image processing took place in ArcGIS Pro; the accuracy assessment was conducted in R.

Questions
• How accurate is the computer image classification?
• Can the accuracy be quantified?
• How can the image classification error be visually represented?

Tools and Methodology
I began in ArcGIS Pro by visually representing the false negative and false positive regions of the disease patches from the segmentation classification. To start, I turned on both layers; the overlap between the two classification methods can be seen in Image 1.

I then went to the symbology of the segmented image and changed the color to white. I placed this layer on top so it covered all the raster cells of the manually classified patches, leaving only the false negatives, seen in Image 2. Next, I went back into the symbology, this time for the manually classified image, and changed its color scheme to white. I moved this layer to the top and changed the segmented image's color scheme back to unique values, leaving Image 3, which shows the false positives. This was an easy way to visualize the disease patches and how well the classification method was working.

Next, I exported a TIFF file of both the manually classified patches and the segmented patches. To ensure each cell between the two layers lined up in R, I had to make sure the extent was the same for both layers when exporting. To do this, I right clicked the layer, went down to Data, and selected Export Raster. A pane appears on the right side of ArcGIS Pro, where I hit the drop-down arrow for Clipping Geometry and selected Current Display Extent. I did this for both layers, giving one TIFF file for the computer classification through segmentation and one for the manually classified disease patches.

Using the raster, rgdal, and sp packages, I imported my two TIFF files into R in raster format. I gave the two files names and used the plot function to view the two images. I noticed they both had values associated with each patch on a gradient scale. To correct for this, I converted my two raster layers in R to tables, which provided a coordinate for each cell and its associated value. The image segmentation raster table held values from 0 to 2, and the manually classified table from 1 to 6. All the white space was given NA, which was another issue. I used the xlsx package to export my data tables to Excel files. I opened the two files in Excel and sorted smallest to largest. From there I used the replace function to change all the NA values to 0 and all the values associated with pixels to 1; the original values were arbitrarily associated with the pixels, and I needed the rasters in 1/0 format. After doing this with both sheets, I pasted the two columns side by side and deleted the associated coordinates, which were unnecessary because each row held the same pixel coordinate in both layers. I saved this Excel file and imported it into R, installed the caret package, and performed a confusion matrix, which can be seen below in Table 1.

Results
The visual representation of my false positive and false negative results can be seen below in Images 2 and 3, with Image 1 for comparison. You can see the false negatives cover a much larger area than the false positives. This may imply that the segmentation is limited in its assessment of diseased area: it tends to miss the margins of the disease but does a fair job of predicting the center of the lesion, where the disease likely originated and is most severe. To correct this, setting a wider threshold may allow less severe regions of the disease to be classified. Because the segmentation is based on pixel reflectance values in the red band, this would mean slightly lowering the threshold value.

Additionally, an entire patch of disease was missed, which can be seen in the right corner of Image 1. Currently, the classification is set to create segmented patches of only 10 or more pixels; this patch is 9 pixels and therefore just missed the cutoff. Even though it was just shy of this requirement, we cannot be sure whether the segmentation would have detected a difference in this diseased patch or whether it also fell outside the classification threshold. If it is common for disease patches to be this small, that may be an indicator to lower the minimum segment size to 5 or 6 pixels.

There are multiple steps in the image processing where different routes could have been taken and potentially increased classification accuracy. The objective of this classification method is to have as little percent error as possible, or at least to decide whether you would rather have more false positives than false negatives, vice versa, or equal amounts. Here we have a greater percentage of false negative cells and a modest assessment of diseased pixels when simply visualizing the images.

To quantify these images and determine how accurate the model was, a confusion matrix was computed in R, seen in Table 1. The segmented classification identified non-diseased regions very well and did a reasonable overall job of predicting disease that was confirmed with ground truth. There were 2075 true negatives, 41 true positives, 55 false negatives, and 12 false positives. The model correctly classified 96.9% of the 2183 total cells; the accuracy is essentially a percent error calculation for the model. From these counts, the sensitivity, which measures the proportion of actual diseased pixels correctly identified, was 42.7% (41/96); the precision, which gives a sense of how many of the pixels predicted as diseased truly were, was 77.4% (41/53); and the specificity, the proportion of non-diseased pixels correctly identified, was 99.4% (2075/2087). The model did well overall but provided insights into possible adjustments that could increase its predictive power.

Critique of method
One critique I have is the small sample size. While I intended only to lay down the framework for a stepwise disease classification process, the results from a single leaf can hardly be statistically backed. I would like to increase my sample size to five images for part three and look for similarities and differences among them. I also intend to make adjustments to the process to try to increase overall accuracy, precision, specificity, and sensitivity. So essentially this critique is the limited conclusion that can be drawn from a sample size of one and the need to increase it, to five for now.

The second critique is the number of steps used to get to this point. I would like to find a more manageable way to do the segmentation and image processing. I have found small changes I can make along the way; ideally, I can use the ModelBuilder in ArcGIS Pro, where most of the processing is done, to streamline the process.

An error present in my results is in the confusion matrix itself. The matrix considers 2183 raster cells, determined by a defined rectangle when exporting the TIFF file from ArcGIS Pro. Many of these cells are not part of the leaf at all, so the matrix is partly classifying regions outside the leaf. To correct this, I would need to export a TIFF file that matches the leaf shape. The confusion matrix results provided are therefore somewhat erroneous, or at least misleading.

Partner ideas
My partner talked about doing a neighborhood analysis, which could be practical for me as well. She mentioned doing it in Earth Engine, which I haven't used, but I could get some help from her. There is a multiple ring buffer, and I could look at the false positives in that light. She also mentioned using geographically weighted regression; we didn't discuss it much, but it seemed like a good regression to perform on my error analysis. Our projects related on some level in data and issues, but at the time we didn't have any clear resolutions. I will be curious to follow up on our chat, see what type of analysis was performed, and share my results as well.

Appendix

Image 1. Overlap between segmentation on top and manual classification below

Image 2. False negatives after subtracting segmented regions from manually classified.

Image 3. False positives after manually classified cells are subtracted from segmented.

Table 1. Confusion matrix

R Code
##raster upload

install.packages("raster")
install.packages("rgdal")
install.packages("sp")

library(raster)
library(rgdal)
library(sp)

img20_seg <- raster("E:/Exercise1_Geog566/MyProject3/RasterT_afr7_Polygon_1.tif")

img20_ground <- raster("E:/Exercise1_Geog566/MyProject3/diseased_20_PolygonT_1.tif")

img20_seg ## print to confirm dimensions and extents match
img20_ground

plot(img20_ground)
plot(img20_seg)

##export data

raster.table <- as.data.frame(img20_seg, xy=T) ## one row per cell: x, y, value
truth <- as.data.frame(img20_ground, xy=T)

install.packages("xlsx")
library(xlsx)
setwd("E:/")

tableimg <- raster.table

write.table(tableimg, file = "dataexport.csv", sep = ",")
write.table(truth, file = "truth1.csv", sep = ",")

##confusion matrix

install.packages("caret")
library(caret)

## import the cleaned 0/1 table created in Excel (file name assumed)
confusionM <- read.xlsx("E:/confusion_input.xlsx", 1)

## confusionMatrix() expects factors, in (prediction, reference) order
confusionM$predicted <- as.factor(confusionM$predicted)
confusionM$reference <- as.factor(confusionM$reference)

confusionMatrix(confusionM$predicted, confusionM$reference)

##For the confusion matrix if the above doesn't work

myconfusionM <- table(confusionM$predicted, confusionM$reference)
print(myconfusionM)

##accuracy of matrix

2075+41+12+55 #total cells (2183)
(2075+41)/2183 #correctly classified/total = accuracy
(12+55)/2183 #misclassified/total

##Sensitivity (recall): TP/(TP+FN)

41/(55+41)

##Precision: TP/(TP+FP)

41/(12+41)

##Specificity: TN/(TN+FP)

2075/(2075+12)

Patch analysis of leaf spots caused by Blackleg

Review
My research involves the classification of leaf spots on turnip caused by the pathogen blackleg. I had hypothesized that spatial patterns of pixels based on image classification are related to manually classified spatial patterns of observed disease on turnip leaves, because the disease has a characteristic spectral signature of infection on the leaves. This post focuses on the analysis of clusters based on image classification through segmentation, as well as the manually classified clusters. These clusters of pixels are expected to represent the diseased patches on the leaves. Here we seek to obtain some patch statistics, which will be compared for relationships and accuracy at a later time. A large portion of this process went into image processing before the analysis could be conducted; all of the image processing took place in ArcGIS Pro, and the subsequent patch analysis was conducted in Fragstats.

Questions
Some questions I asked myself about elements A and B of my hypothesis included:

Can diseased patches be successfully identified and isolated by computer image classification and through manual classification?

If yes: what are the area and perimeter of the diseased patches based on image classification and of the manually classified diseased patches? I was mostly looking to gain some experience with the patch analysis provided by Fragstats. A much more thorough analysis can and will be completed in Fragstats when variables A and B are compared for accuracy assessment in exercise two.

Tools used and methodology
The image processing and classification of pixels were conducted in ArcGIS Pro. This analysis required extracting both the manually classified diseased patches and the computer classified patches. I began by uploading band 3, which captures light energy in the red wavelength at 670–675 nm.

1. For computer classification of the image I used Segmentation.

a) I went to the Imagery tab and the Image Classification group. Click the drop-down arrow and an option for Segmentation should appear. Be sure the layer you want to segment is selected before opening the Segmentation Classification Tools. In the pane there are options to adjust Spectral detail, Spatial detail, and Minimum segment size in pixels. These are variable and depend on image resolution, the level of accuracy you wish to achieve, etc. Because my resolution is high, I chose higher spectral and spatial detail, adjusting the spectral detail from 15.50 to 17 and the spatial detail from 15 to 17. For the minimum segment size in pixels I chose 10. As mentioned, these values are variable; although the default values worked reasonably well for segmentation, I previewed other values and achieved better results by adjusting them. After previewing the options and deciding which values work best, click the Run button in the bottom right corner of the pane.

b) Masking all the cells that weren't diseased was the next step, and it is crucial to this diseased region selection process. To do this, go to the Analysis tab and the Raster group to find the Raster Functions. Open the Raster Functions and they should appear in the pane to the right. Search for Mask and select it. Open the segmented image in the raster box, and three options will be available for Included Ranges. These three options represent the blue, green, and red bands. Because the data were rescaled from 12-bit to 8-bit in the segmentation process, values should range between 1 and 255. Despite having three band options, all the bands hold the same value because we are only working with the red band. Click the segments you wish to include in the working window to check their values, then enter a minimum and maximum (the same for bands 1, 2, and 3) that includes the segments you want. The range I included was 100 to 160, based on the RGB pixel values that appear when clicking a pixel. Once these are entered, click Create new layer at the bottom of the right pane.

c) Next, I used the Clip function by navigating to the Imagery tab, the Analysis group, and selecting Raster Functions. In the window pane to the left, a search bar can be found along the top where you can enter Clip. Although the Clip function can be reached from other locations, I experienced different results, and different clipping options in the pane, depending on how I navigated to it. In the Parameters section for Clip Properties, select the drop-down arrow for the raster you previously segmented. Leave Clipping Type and Clipping Geometry/Raster as the defaults. Adjust the active map view by zooming to the region you want preserved after the clip; I use as small an area as I possibly can while keeping all required segmented regions. Click the Capture Current Map Extent button found just to the right of the extent options in the pane (green square map icon). To clip, click Create new layer at the bottom of the pane, and a new layer with the clipped region should appear in the map contents.

d) From here, go to the Analysis tab and the Geoprocessing group, where you can find the Tools. Click Tools to open the options in the left pane. Here we want the Raster to Polygon (Conversion Tools) tool, which can be found by searching at the top of the Tools pane. Use the most recent segmented, clipped raster for the Input raster, leave the Field blank, and select where you would like the Output Polygon features to save. I left the remaining options at their defaults and clicked Run in the bottom right corner. Although the final product must be in raster format, this step is important for creating separate polygons that are grouped together as one unit.

e) As mentioned, we must get the polygons back into a raster. Before we use the Polygon to Raster tool, overlapping polygons must be merged; this is one of the two reasons we converted the raster to polygons. When the segmentation was conducted in step (a), diseased regions were not all categorized in the same bin. This resulted in regions that were retained after masking but are different shades of grey; even one diseased patch may have two or three tones, resulting in multiple segments per patch. Now that everything is a polygon, we can merge the polygons that should be one. Click the Edit tab and select Modify in the Features group. In Modify Features, scroll down to the Construct group and click Merge. In Existing Features, click Change the Selection and, while holding Shift, select the polygons that belong in one group. They will be highlighted in blue and appear in the pane if properly selected; if so, click Merge. Navigate to the Map tab and Selection group and click Clear. Repeat the same merge steps for any other polygons that belong to the same diseased lesion but were classified separately in the raster segmentation.

f) Finally, the polygons can be converted back to a raster for the last step of image processing. Click the Analysis tab and find the Geoprocessing group, then Tools once again. Search for Polygon to Raster (Conversion Tools) in the Tools pane and select it. Use the polygons layer for Input Features, and the pane options should adjust to accommodate your raster. The only adjustment I made was to the Cellsize, which I changed from 0.16 to 1 to match the cell size of my original raster. Select Run in the bottom right corner of the pane for the final layer. Each of the polygons should now be converted to a rasterized polygon with a unique ObjectID, Value, and Count, which can be found in the Attribute Table for that layer or by clicking a group of pixels. This allows each diseased patch to be aggregated and analyzed as such in Fragstats.

g) Go to the final raster layer, right click, and go to Data and Export Data. What I wanted was a TIFF file for analysis in Fragstats.

2. To compare the accuracy of the segmentation as a classification method for diseased pixels, manual classification was used as a ground truth technique.

a) For manual classification of the original red band, the Training Sample Manager was used. This allows for manual classification which can be used for supervised machine learning techniques of classification. Here, I simply used it to select diseased regions manually, but have an end goal of using the support vector machine learning model.

b) To get to the Training Sample Manager, go to the Imagery tab and the Image Classification group, select the Classification Tools, and click Training Sample Manager. In the Image Classification pane, go to Create New Scheme. Right click New Scheme, choose Add New Class, name it "diseased pixels", and supply a value (which is arbitrary) and a description if desired. Click OK, select the diseased pixels scheme, and then the Freehand tool in the pane. Draw an outline around the diseased regions in the image. Save these polygons and the classification scheme, then go to Add Data in the Layer group on the Map tab to upload the training sample polygons.

c) Once the polygons are uploaded, use the Polygon to Raster protocol which can be found in step 1, part (f) followed by part (g).

Fragstats was used for the analysis of the polygons. For exercise 1, the intent in Fragstats was simply to obtain information about the diseased patches extracted from the red band image: the pixel count, area, perimeter, perimeter-area ratio, and shape index. Many more patch analysis options are available and will be considered for exercise 2.

1. Open up Fragstats for the patch analysis.

a) To import your first image, click Add layer, go down to GeoTIFF grid, and select it. In the Dataset name box, search for the TIFF file you created in ArcGIS Pro, select it, and click OK. This must be done for each dataset. Then go to the Analysis parameters and check Patch metrics under the Sampling strategy. Under General options, hit Browse to choose a save location for the output and check the Automatically save results checkbox.

b) Click the red box called Patch metrics; in the Area – Edge tab, select Patch Area and Patch Perimeter. Click the Shape tab and select Perimeter – Area Ratio and Shape Index.

c) In the upper left corner, hit Run and Fragstats will check everything you have uploaded and the analyses you have selected. Hit Proceed after it checks the model consistency, and it will run the analyses.

d) After this, hit the Results option to view the output.
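As an aside, the same four patch metrics can also be computed in R with the landscapemetrics package, which implements the FRAGSTATS metric definitions; a minimal sketch, assuming the rasterized patch layer exported in step 1, part (g) (file name hypothetical):

## R alternative to the Fragstats GUI for the four patch metrics
## (landscapemetrics implements the FRAGSTATS definitions; file name is hypothetical)
library(raster)
library(landscapemetrics)
patches <- raster("segmented_patches.tif") ## rasterized patch layer from step 1(g)
rbind(
  lsm_p_area(patches), ## patch area
  lsm_p_perim(patches), ## patch perimeter
  lsm_p_para(patches), ## perimeter-area ratio
  lsm_p_shape(patches) ## shape index
)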

Results
The patches from the two layers aligned fairly well when visualized in ArcGIS Pro; the formal comparison between patches will be done in exercise 2. I did notice that the segmentation process missed many of the smaller diseased patches. Looking at Image 2, you can see that they were segmented from the surrounding regions but were lumped into a bin that also caught areas appearing lighter in the image because of a reflection. Different thresholds could be used in the segmentation process to include those smaller patches, perhaps with a parameter involving shape; this may be considered for future segmentation.

A concern is that setting the threshold too low would include many regions that aren't diseased along with all the diseased regions, resulting in many false positives for disease. The alternative, a high threshold value for diseased regions, would result in false negatives; the idea is to find a good balance between the two. Currently the segmentation is based strictly on the segment value, which is attributed to reflectance in the red band. As mentioned, another parameter worth considering is shape: since the diseased regions tend to be leaf spots, and therefore typically circular, adjustments could be made to include more circular patches that don't extend past a certain number of pixels. In Table 1 we see that the number of pixels per patch ranges from 8 to 50. The patch analysis provides some insight into possible spatial factors that explain these segmented regions and how the classification could be done more accurately.

Table 1. The 11 different patches that were manually selected and analyzed in Fragstats. The color indicates the corresponding patch detected through segmentation, shown in Table 2.

Table 2. The 4 different patches that were identified through segmentation and analyzed in Fragstats. The color indicates the corresponding manually selected patch shown in Table 1.

 

Critique of the method
The MicaSense RedEdge camera has five bands, which can be very helpful for applying different vegetation indices and compositing bands to help bring out the variation in spectral signatures between diseased and non-diseased tissue. Although the pixel size is just under 1 mm, which appears adequate for identifying lesions on the leaf, the bands do not overlap perfectly. This is due to the design of the camera, which has five different lenses for the five bands, all separated by one to two inches. This could be corrected by manually adjusting the extent of each of the five bands for a near-perfect overlap. Until this is resolved, the red band, which shows the most variation in pixel value for diseased patches compared with the other four bands, was used for this analysis.

Additionally, the processing described in part 1, step (a) onward to get the segmented raster patches involves many steps. Methods for speeding up this process need to be considered for future analysis, especially when conducting it on all 500 leaves.

As mentioned before, Fragstats has many more statistical capabilities that were not applied in this analysis. Getting more statistics on the manually classified and segmented patches would help determine the level of accuracy, as well as other parameters worth considering.

Appendix

Image 1. The red band tif file uploaded into ArcGIS Pro for classification. The white patches are what we would expect the diseased regions to look like.

Image 2. The result of the segmentation performed on Image 1 described in step 1 part (a)

Image 3. The patches determined from the segmentation of Image 2 in part 1. Separate patches were created using the Raster to Polygon tool, followed by the Polygon to Raster tool. Two layers are shown to illustrate the overlap between the raster patches and the original image.

Image 4. The patches determined from the manual classification of Image 1 described in part 2. The Training Sample Manager, followed by the Polygon to Raster tool, was used to get these patches. Two layers are shown to illustrate the overlap between the raster patches and the original image.

Aerial remote sensing detection of Leptosphaeria spp. on Turnip

Introduction

Among the many pathogens disrupting healthy growth of brassica species in the valley is the one commonly referred to as blackleg. This fungal pathogen has been reported to nearly wipe out canola production in Europe, Australia, and Canada, and in more recent years it has devastated crops in the United States (West et al., 2001). In 2013 the pathogen was reported in the Pacific Northwest for the first time since the 1970s (Agostini et al., 2013) and has since been reported in the Willamette Valley (Claassen, 2016). There are two known species of blackleg, Leptosphaeria maculans and Leptosphaeria biglobosa. These are not to be mistaken for the potato bacterial pathogen Pectobacterium atrosepticum, which is also referred to as blackleg. While much of the crop failure in canola has been associated with Leptosphaeria maculans, both species are found in the valley and seem to be of similar consequence to turnip.

Classification of Cercospora leaf spot, for instance, has been accomplished with a support vector machine model, but it relied on a hyperspectral camera with high spectral and spatial resolution (Rumpf et al., 2010). Because plant diseases can often be difficult to see even with the naked eye, researchers have struggled to detect specific plant diseases as spatial resolution decreases. While this analysis focuses solely on detection at 1.5 meters, detection of blackleg may remain possible despite the lower spatial resolution that results from increased flying elevations.

Here we consider how spatial patterns from diseased leaves are related to ground truth disease ratings of turnip leaves based on spectral signatures. With the application of a support vector machine model, classification of diseased versus non-diseased tissue is expected to generate a predictive model. This model will be used to determine whether single turnip leaves, diseased and non-diseased, are accurately categorized relative to the ground truth.

 

Data

This project will use a data set derived from roughly 500 turnip leaves, harvested here in the valley from February to April of 2019. Roughly 200 of these leaves will be used to train the model; the remaining leaves will be used in the analysis to determine the model's accuracy against ground truth. The spatial resolution is less than 1 mm, and the images are rasters of single turnip leaves with five bands (blue, green, red, red-edge, NIR). I do not anticipate using every band for the analysis and will likely apply some vegetation index as a variable. The camera is a MicaSense RedEdge, which has 12-bit radiometric resolution.

 

Hypothesis

I hypothesize that spatial patterns of pixels based on image classification are related to manually classified spatial patterns of observed disease on turnip leaves because disease has a characteristic spectral signature of infection on the leaves.

 

Approaches

To accomplish this analysis, I will be using ArcGIS Pro, where I have quite a bit of experience, though not on this subject or type of analysis. The workflow will begin with image processing, where I have little experience but don't require expertise. I hope to conduct the image processing in Pix4D, beginning with image calibration based on the reflectance panel in each image, followed by cropping down to just the leaf under assessment. From there, there may be some smoothing and contrast enhancement, but this is still undetermined.

Images will then be brought into ArcGIS Pro for spatial analysis. For exercise 1, I intend to compare spatial patterns from manually classified disease against unsupervised segmentation of the leaves. For exercise 2, I plan to use this information in a spatial regression to improve image-based classification. For exercise 3, I intend to use the support vector machine wizard to train a model. This involves highlighting regions of diseased and non-diseased tissue until enough pixels are collected to create the support vectors. The axes for the model are yet to be determined but will likely be NIR and red-edge digital number values; an alternative is to use a vegetation index such as NDVI as an explanatory variable. Turnip images never used in training will then be used to assess the support vector machine's ability to classify diseased and non-diseased regions, and then whole leaves. I anticipate that every leaf will have at least a few pixels classified as diseased, so I will set a threshold for the maximum number of diseased pixels an image may contain while still being classified as non-diseased. I may also require a certain number of pixels to border one another to qualify as a diseased region. The methodology may require some troubleshooting, but the expectations are clear and the methods to reach the outcome are mostly drawn out.

 

Expected Outcome

I expect the model to have very high accuracy once it is fine-tuned, based on the contrast I expect to see between the spectral signatures of diseased and non-diseased leaves. Below I have outlined the three outcomes I would ultimately like to achieve. Due to time restrictions, the scope of my research is limited to outcomes 1 and 2 below.

  1. Train a support vector machine model for classification of pixels in turnip leaves as either diseased or non-diseased.
  2. Accurately apply the SVM model to turnip leaves from many geographical locations in the valley, with different levels of disease severity and at different times of the year.
  3. Scale up from 1.5 meters and test the ability of the model to maintain accurate classification of blackleg on turnip.

 

Significance

I intend to publish and further the collective knowledge in aerial remote sensing. This applies more specifically to those in agronomy or plant pathology. This is very applied science and is a resource for those in the agriculture industry.

Traditionally, detection of this pathogen has depended on a reliable field scout who may need to cover fifty acres or more looking for signs or symptoms of the disease. Nowadays, precision agriculture has introduced drones to perform unbiased field scouting for the grower. This saves time and can be very reliable if done properly. An important aspect of disease control relies on early detection: if disease is detected early, growers have time to respond accordingly. This may allow earlier sprays with lighter application rates or less controlled substances, cultural control of nearby fields, etc., to stop the spread of disease.

 

Works cited

Agostini, A., Johnson, D. A., Hulbert, S., Demoz, B., Fernando, W. G. D., & Paulitz, T. (2013). First report of blackleg caused by Leptosphaeria maculans on canola in Idaho. Plant Disease, 97(6), 842.

Claassen, B. J. (2016). Investigations of Black Leg and Light Leaf Spot on Brassicaceae Hosts in Oregon.

Rumpf, T., Mahlein, A. K., Steiner, U., Oerke, E. C., Dehne, H. W., & Plümer, L. (2010). Early detection and classification of plant diseases with Support Vector Machines based on hyperspectral reflectance. Computers and Electronics in Agriculture, 74(1), 91-99.

West, J. S., Kharbanda, P. D., Barbetti, M. J., & Fitt, B. D. (2001). Epidemiology and management of Leptosphaeria maculans (phoma stem canker) on oilseed rape in Australia, Canada and Europe. Plant Pathology, 50(1), 10-27.