Question that I asked:
In Exercise 3, I asked how the different classification methods I am using align or do not align with each other. Essentially, I wanted to create a confusion matrix identifying the pixels that are True Positives, False Positives, and False Negatives when comparing two of the classifications. That is: what is the spatial pattern of confusion between pixels classified by global settlement datasets and an unsupervised binary classification of pixels within an OSM boundary?
Approach that I used:
I used Earth Engine to map confusion among pixels. I did not literally build a matrix; rather, I used the concept of a confusion matrix to calculate a single confusion value for each pixel.
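The underlying idea, stripped of the Earth Engine specifics, is that a weighted sum of binary "truth" and "prediction" layers assigns each pixel to one cell of the confusion matrix. A toy illustration with made-up numpy arrays (not my actual data or code):

```python
import numpy as np

# Two binary rasters as toy arrays: 1 = settlement, 0 = not.
# "truth" stands in for the K-means layer, "pred" for a global dataset.
truth = np.array([[1, 1, 0],
                  [0, 1, 0],
                  [1, 0, 0]])
pred  = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [0, 0, 1]])

# Encode each pixel's confusion-matrix cell as a single number:
# 0 = True Negative, 1 = False Positive, 2 = False Negative, 3 = True Positive.
confusion = truth * 2 + pred
print(confusion)
```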
Steps I followed to complete the analysis:
In order to do this, I needed to compare "Facebook" against "K-means" and "WSF" against "K-means." The "K-means" classification served as my "truth" classification, because I needed something to measure the other classifications against that was not just a summary of pixels inside of a vector but a more traditional binary unsupervised classification. I therefore used the settlements to guide a K-Means clustering algorithm, grouping similar pixels into clusters of "settlement-like" pixels. I then needed raster math to combine these datasets so that the resulting values would distinguish False Positives, False Negatives, and True Positives. Each layer labeled positive pixels as 1; to give every combination a unique sum, I weighted the K-means layer by 10, added WSF at its value of 1, and multiplied Facebook by 4 before adding it as well. Under this encoding, pixels that were False Negative in either or both comparisons had values of 10 (missed by both global datasets), 11 (missed by Facebook only), or 14 (missed by WSF only). Pixels that were False Positive in either or both comparisons had values of 1 (WSF only), 4 (Facebook only), or 5 (both). A value of 15 represented a pixel where all three datasets registered positive, a True Positive in both comparisons.
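A sketch of how this workflow might look in the Earth Engine Python API follows. The Sentinel-2 composite, band selection, sampling parameters, region geometry, and the Facebook asset path are all placeholders I am assuming for illustration (the WSF2015 ID is a real catalog asset), so treat this as an outline of the technique rather than my exact script.

```python
import ee
ee.Initialize()

# Region of interest: a placeholder rectangle standing in for the OSM boundary.
roi = ee.Geometry.Rectangle([32.0, 3.0, 32.5, 3.5])  # hypothetical extent

# --- 1. Unsupervised K-means "truth" layer ---------------------------------
# Cluster pixels of a Sentinel-2 composite; the band choice is illustrative.
s2 = (ee.ImageCollection('COPERNICUS/S2_SR')
        .filterBounds(roi)
        .filterDate('2020-01-01', '2020-12-31')
        .median()
        .select(['B2', 'B3', 'B4', 'B8']))

training = s2.sample(region=roi, scale=10, numPixels=5000)
clusterer = ee.Clusterer.wekaKMeans(2).train(training)
clusters = s2.cluster(clusterer)

# Assume cluster 1 is the "settlement-like" cluster (confirm visually first).
kmeans = clusters.eq(1)  # binary: 1 = settlement-like

# --- 2. Binary global settlement layers ------------------------------------
wsf = ee.Image('DLR/WSF/WSF2015/v1').gt(0).unmask(0)  # 1 = built-up
fb = ee.Image('users/someuser/facebook_settlements').gt(0).unmask(0)  # placeholder asset

# --- 3. Weighted sum so every combination yields a unique value ------------
# K-means * 10 + WSF * 1 + Facebook * 4:
#   10, 11, 14 -> False Negatives (K-means positive, one or both globals negative)
#    1,  4,  5 -> False Positives (a global dataset positive, K-means negative)
#   15         -> True Positive in both comparisons
confusion = kmeans.multiply(10).add(wsf).add(fb.multiply(4)).clip(roi)
```

The weights 10, 1, and 4 were chosen so that every on/off combination of the three binary layers sums to a distinct value, which is what lets a single-band image stand in for the whole confusion matrix.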
Brief description of results I obtained:
I was interested in both False Positives and False Negatives, but for different reasons. With False Positives, I wanted to visualize where the global settlement datasets did detect settlements while the K-Means classification did not. False Negatives would show me where K-Means picked up settlement pixels but neither of the global datasets did. Ultimately, False Negatives were much more prevalent throughout the dataset, further illustrating the exclusion of refugee settlements from global datasets. I chose to display a map of a more interesting pattern that appeared over a larger settlement near a large body of water: because the binary clustering grouped water with "settlement"-type landscapes, this area shows up as a large block of False Negatives. If I were to calculate statistics, it would make sense to exclude the area of water so that it does not bias the results.
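If I do calculate those statistics, the masking step might look like the sketch below, which continues the hypothetical script above; the choice of the JRC Global Surface Water layer and the 90% occurrence threshold are assumptions I would tune.

```python
# Mask pixels that are water most of the time, using JRC Global Surface Water.
gsw = ee.Image('JRC/GSW1_4/GlobalSurfaceWater').select('occurrence')
water = gsw.gte(90)  # 1 = water present in >= 90% of observations
confusion_masked = confusion.updateMask(water.unmask(0).Not())

# Count pixels in each confusion class over the study area.
counts = confusion_masked.reduceRegion(
    reducer=ee.Reducer.frequencyHistogram(),
    geometry=roi,
    scale=10,
    maxPixels=1e9,
)
print(counts.getInfo())
```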
Critique of the method – what was useful, what was not?
This exercise probably caused me the most trouble: methods that I thought made sense and would work ended up presenting more challenges and fewer patterns than I was hoping for. There was not really the spatial pattern I was expecting, perhaps because this data is so noisy and spread over discontinuous areas of the landscape. It was useful to dig into the overlapping data, but ultimately the actual spatial patterns were not very remarkable. Perhaps the numbers presented in a matrix would be a more useful representation of the data or patterns for this specific method. Aside from the spatial statistics, I also ran into quite a few function issues in Earth Engine, ArcGIS Pro, and ArcGIS Desktop, all three of which I used to try to manipulate this data into the forms I was imagining. For my final project, I may need to revisit these methods or seek advice from others, because my techniques were not as effective as I had hoped.