Final project synthesis of parts 1, 2, and 3 on classification of Blackleg on turnip

Background

The objective of this research was to find a way to properly classify diseased versus non-diseased regions on a turnip leaf. To do this, I assessed the accuracy of an unsupervised machine learning classification algorithm, segmentation, against manual classification. Additionally, I hoped to reveal some spatial characteristics of the lesions on the leaf for future work.

Dataset

The cumulative data set is roughly 500 turnip leaves, harvested here in the Willamette Valley from February to April of 2019. For this project I selected 5 images to form the basis of the analysis. The spatial resolution is just under 1 mm, and the images are rasters of single turnip leaves with five bands (blue, green, red, far-red, NIR). For this analysis, only the red band was used. The camera used is a MicaSense RedEdge, which has a 16-bit radiometric resolution.

Questions

Not all of these were questions I had anticipated answering at the beginning of the term; some developed over the course of the term as I ran into issues or found intriguing results.

• Can diseased patches be successfully identified and isolated both by computer image classification and by manual classification?
If yes:
o Can the diseased patches from image classification and from manual classification be characterized?
o How accurate is the segmentation classification?
o When cells are misclassified, are more raster cells classified as false negatives or as false positives?
o What is going undetected in segmentation?

Hypothesis

My original hypothesis was that spatial patterns of pixels based on image classification are related to manually classified spatial patterns of observed disease on turnip leaves, because the disease produces a characteristic spectral signature of infection on the leaves.

Approaches and results

I began in ArcGIS Pro with the following workflow: segmentation -> mask -> clip -> raster to polygon -> merge -> polygon to raster. This yielded a layer that included only the computer-generated segmented diseased patches, which were exported as TIFF files. I followed a similar process with my manually classified diseased patches but used the ‘training sample manager’ instead of segmentation. Both sets of TIFF files were loaded into Fragstats, where I ran a patch analysis on each to characterize and compare the diseased patches. I did not find this entirely helpful, but digging deeper into patch analysis could provide some worthwhile information about the characteristics of the diseased lesions.

After characterizing the patches, I performed an error analysis using a confusion matrix in R. The two TIFF files were loaded into R and converted to rasters to check their quality, and then to tables. The tables were imported into Excel, where values were set to 1 for diseased patches and 0 for non-diseased patches. Anything outside the area of the leaf was deleted. The tables were imported back into R, and a confusion matrix was computed comparing the computer-classified segmentation against the manually classified patches. The confusion matrices for all leaves were combined to get the overall accuracy (Table 1). This step was a bit time-consuming, as was the original image processing, but it was a very useful analysis.
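As a rough illustration of the comparison described above, here is a minimal Python sketch of a pixel-level confusion matrix. It is not the R code used in the project. The function names and the toy grids are hypothetical, and it assumes each leaf is a grid of 1 (diseased), 0 (non-diseased), and None (outside the leaf, skipped).

```python
def confusion_counts(manual, segmented):
    """Count TP, FP, FN, TN over leaf pixels.

    manual is treated as the reference; segmented is the computer result.
    Cells that are None in either grid (outside the leaf) are skipped.
    """
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for row_m, row_s in zip(manual, segmented):
        for m, s in zip(row_m, row_s):
            if m is None or s is None:
                continue
            if m == 1 and s == 1:
                counts["TP"] += 1
            elif m == 0 and s == 1:
                counts["FP"] += 1
            elif m == 1 and s == 0:
                counts["FN"] += 1
            else:
                counts["TN"] += 1
    return counts


def combine(list_of_counts):
    """Sum per-leaf counts into one matrix for all leaves."""
    total = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for counts in list_of_counts:
        for key in total:
            total[key] += counts[key]
    return total


def overall_accuracy(counts):
    """Share of leaf pixels where the two classifications agree."""
    n = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / n if n else float("nan")


# Hypothetical toy leaf, not project data.
manual = [
    [None, 0, 0, None],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
segmented = [
    [None, 0, 1, None],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]

leaf_counts = confusion_counts(manual, segmented)
print(leaf_counts, overall_accuracy(leaf_counts))
```

Keeping the per-leaf counts separate before combining them makes it possible to see whether one leaf drives the false negatives or false positives.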

One other approach I used to help with visualization was mapping the false positives and false negatives onto the leaf (Image 1). This helped explain the results more clearly to others and showed the results beyond the statistics.
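A map of this kind can be sketched in Python as follows. This is only an illustration under assumptions: the character codes, the function name, and the toy grids are hypothetical, and the actual figure was produced differently.

```python
# Hypothetical display codes for each outcome.
CODES = {
    (1, 1): "#",  # true positive: both call the pixel diseased
    (0, 1): "+",  # false positive: only segmentation calls it diseased
    (1, 0): "o",  # false negative: only manual classification calls it diseased
    (0, 0): "-",  # true negative: both call it non-diseased
}


def error_map(manual, segmented):
    """Return one string per raster row marking each pixel's outcome.

    Cells that are None in either grid (outside the leaf) are shown as '.'.
    """
    rows = []
    for row_m, row_s in zip(manual, segmented):
        line = ""
        for m, s in zip(row_m, row_s):
            if m is None or s is None:
                line += "."
            else:
                line += CODES[(m, s)]
        rows.append(line)
    return rows


# Hypothetical toy leaf, not project data.
manual = [
    [None, 0, 0, None],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
segmented = [
    [None, 0, 1, None],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]

leaf_map = error_map(manual, segmented)
print("\n".join(leaf_map))
```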

Significance

I have created a method for getting from image acquisition all the way to statistical results. First, this work is significant to my thesis research: it is a very good starting point that I will refine and streamline. Second, I obtained statistically significant results based on the p-value, along with high accuracy. This will be helpful to those in agronomy or crop consulting who are using remote sensing for disease detection. At this point I don’t think the method is practical for use by agronomists, but with further work I hope to make this methodology an available resource for other researchers to build on and for farmers to potentially try.

Hypothesis revised

After receiving these results and considering my next steps, my hypothesis has been refined. I now hypothesize that segmentation classification has an equal ability to classify diseased patches as manual classification because of the spectral values associated with diseased pixels.

New research questions

This research is far from complete and has led to a new set of questions to be answered.

• Is percent classification of diseased pixels the best metric for how well the classification method works?
• What is the accuracy of a support vector machine versus manually classified pixels?
• Is this classification translatable to other brassica species such as canola, radish, kale, etc.?

Future research

To answer some of my new questions, I believe it is critical that I learn how to create band composites from the five available bands. Currently, my issue is that the bands do not overlap. My approach for solving this is to use georeferenced points/pixels on each image to ensure they overlap. I think that if I use the 4 corners of the image I should get the overlap I want. By creating band composites, I can begin using vegetation indices like NDVI to distinguish pixels more accurately than can be done with one band alone.
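Once the bands are aligned, NDVI is computed per pixel as (NIR - Red) / (NIR + Red). The Python sketch below illustrates this under the assumption that the red and NIR bands are already aligned grids of the same size; the function name and the toy values are hypothetical.

```python
def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red).

    nir and red are same-sized 2D lists of band values.
    Pixels where NIR + Red is zero are returned as None.
    """
    result = []
    for row_nir, row_red in zip(nir, red):
        out_row = []
        for n, r in zip(row_nir, row_red):
            denom = n + r
            out_row.append((n - r) / denom if denom != 0 else None)
        result.append(out_row)
    return result


# Hypothetical toy band values, not project data.
nir_band = [[0.8, 0.5], [0.0, 0.2]]
red_band = [[0.2, 0.5], [0.0, 0.6]]

index = ndvi(nir_band, red_band)
print(index)
```

Values near 1 indicate strong NIR reflectance relative to red, which is typical of healthy green tissue; lower values may help flag diseased pixels, though that would need to be checked against the manual classification.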

Another concern I have is that, despite classifying pixels accurately, I am wary of treating pixel accuracy as the sole measure of how well the classification works. I have instances in my current classification where only 2 or 3 of the 6 or so lesions actually have pixels inside their perimeters classified as diseased. I think a more accurate or helpful assessment would quantify the percent of lesions captured by the segmentation process out of the total number of lesions on the leaf.
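One way such a lesion-level metric could be computed is sketched below in Python. The assumptions here are mine: a lesion is taken to be a group of manually classified diseased pixels that touch edge to edge (4-connectivity), and a lesion counts as captured if at least one of its pixels is also classified as diseased by segmentation. The function names and toy grids are hypothetical.

```python
from collections import deque


def label_lesions(mask):
    """Label 4-connected groups of 1s; return (label grid, lesion count)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 1 and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one lesion
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] == 1
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def lesion_capture(manual, segmented):
    """Return (captured lesions, total lesions) for one leaf."""
    labels, total = label_lesions(manual)
    captured = set()
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab and segmented[r][c] == 1:
                captured.add(lab)
    return len(captured), total


# Hypothetical toy leaf with three manual lesions, not project data.
manual = [
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
]
segmented = [
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]

captured, total = lesion_capture(manual, segmented)
print(f"{captured} of {total} lesions captured")
```

A stricter rule, such as requiring a minimum share of each lesion's pixels to be detected, could replace the one-pixel threshold.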

Using an unsupervised classification scheme for this work was a good way to lay the groundwork. In future work I would like to move to supervised machine learning classification schemes. My reasoning is that when I attempt to translate the algorithm used for turnip classification to other crops such as canola, radish, etc., I believe a supervised scheme will have a higher level of accuracy and will consider more spatial characteristics. Currently, my classification scheme only uses the spectral values of the red band, whereas I would like to incorporate variables such as lesion size, proximity to the margin, proximity to other lesions, shape, NDVI values, etc. This ties into my last question concerning whether the classification can be applied to a broader range of plants.

Technical and software limitations

Some of my limitations are apparent in the above sections because they led to my future research questions. One of my greatest limitations was my ability to use ArcGIS Pro and to code in R effectively and efficiently. This led to more time spent on image processing, which might have been streamlined if I were more proficient in ArcGIS Pro. The major roadblocks I hit in ArcGIS Pro were the band overlap issue mentioned earlier and the use of the support vector machine. I have plans to solve the band composite issue, but am still unsure about the support vector machine function in ArcGIS Pro. I believe the way it is set up is designed for collecting training data from one image rather than from many images. My alternative for the SVM is to try conducting the analysis in R.

In R, I struggled with coding. This was a major strain on my time, and on many occasions I needed help from others to troubleshoot. This can be overcome with more practice and is a limitation of the user rather than of the software.

My learning

I would say I learned the most in ArcGIS Pro. The image processing took a lot of time and many steps, which exposed me to a whole set of tools and analyses I was previously unaware of. I don’t think my skills in R increased much, but I now have the code and steps for the analysis I would like to complete in my future thesis work, the confusion matrix. Another program I gained some experience in was Fragstats. It was useful for characterizing the diseased patches and may be an option for some of my future work when looking at the size, shape, distance, etc. of these diseased patches.

Appendix

Table 1. Confusion matrix for pixel classification of five leaves.

Image 1. Example of one leaf analysis where the green cells are manually classified non-diseased pixels, white pixels inside the leaf are segmentation-classified diseased pixels, and black pixels are the manually classified diseased pixels.

GEOG 566

Advanced spatial statistics and GIScience
