Description of the research question:
My research focuses on vulnerable populations, specifically refugees and internally displaced peoples. This work is a small part of a larger NASA-funded project, “Mapping the Missing Millions,” and falls broadly under the “geography of exclusion.” I am hoping to understand why settlements have been excluded from global population datasets; we know that this happens often, but we do not know the specific mechanisms by which these settlements go missing. My question therefore starts from the recognition that the classification methods used for population datasets are imperfect, and I am seeking to understand why they are imperfect. This means I will need to understand the spatial distribution of the settlements identified in both datasets, analyze the intersections and exclusions between them, and understand why these exist. This might also mean measuring how close an OpenStreetMap settlement is to an urban center or a road and testing whether these metrics affect the classification.
My research question is as follows:
How do the settlements identified by OpenStreetMap (OSM) compare to settlements identified in global population datasets via classification, and which aspects of these classification methods cause them to miss settlements known to OSM?
Description of the dataset:
The crux of my analysis is a comparison of UNHCR and OpenStreetMap (OSM) data to a global population dataset, the Global Human Settlement Layer (GHSL). OpenStreetMap is a global open-source dataset and contains both point and polygon information. Using the UNHCR point data that identify settlement locations, I have identified boundaries that are attributed as delineating refugee settlements. A caveat with OSM data is that it is contributed by volunteers, so attribution can be unclear or inconsistent, despite validation. I can also use other OSM data, such as roads and urban areas, to extend my spatial analyses with a proximity assessment.
I will also make use of the rich Landsat and Sentinel data available for my spectral analysis. This will be at either 30-meter resolution (Landsat) or 10-meter resolution (Sentinel). The temporal extent depends on the satellite: Landsat 7 has collected data since 1999, Landsat 8 since 2013, and Sentinel-2 since its launch in 2015.
For this class, I will focus my analysis on Uganda, given its high prevalence of refugee settlements and extensive OSM dataset with a strong Humanitarian OpenStreetMap Team presence.
The images above show an example of a refugee settlement in Algeria. The area in blue in the NW corner is the settlement; the area in the SW is a nearby town. The settlement is not identified in the Global Human Settlement Footprint, although it has existed since at least 2001.
Hypotheses:
I expect that settlements not detected by GHSL will have a different and less distinct spectral signature than settlements detected by GHSL. By “distinct,” I am referring to how different the spectral signature in the settlement is from the spectral signature immediately around the settlement. By “different” spectral signature, I am referring to the idea that the GHSL classification is looking for a specific type of spectral signature, and that this does not match the spectral signature found in the settlements indicated by OSM. I also expect that settlements not detected by GHSL will be farther from known roads and high-density urban areas than settlements detected by GHSL.
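One way I might quantify “distinct” is the spectral angle between the mean spectrum inside a settlement boundary and the mean spectrum of a ring of pixels immediately around it. The sketch below is only an illustration under that assumption: the spectral angle is my choice of metric rather than anything GHSL uses, the function names are hypothetical, and the reflectance values are made up rather than drawn from real imagery.

```python
import math

def mean_spectrum(pixels):
    """Average each band over a list of pixels (each pixel is a list of band values)."""
    n = len(pixels)
    return [sum(p[b] for p in pixels) / n for b in range(len(pixels[0]))]

def spectral_angle(a, b):
    """Angle in radians between two spectra; values near 0 mean similar spectral shape."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def distinctness(inside, surround):
    """Spectral angle between the settlement's mean spectrum and its surroundings."""
    return spectral_angle(mean_spectrum(inside), mean_spectrum(surround))

# Hypothetical reflectance values (red, NIR, SWIR); not real data.
settlement = [[0.20, 0.25, 0.30], [0.22, 0.27, 0.32]]
bare_soil = [[0.21, 0.26, 0.31], [0.21, 0.26, 0.31]]
vegetation = [[0.05, 0.45, 0.20], [0.05, 0.45, 0.20]]

low = distinctness(settlement, bare_soil)    # settlement blends into bare soil
high = distinctness(settlement, vegetation)  # settlement stands out from vegetation
```

Under my hypothesis, undetected settlements would tend to look like the first case (a small angle relative to their surroundings) and detected settlements like the second.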
Approaches & Analyses:
With my OSM data, I can use the vector boundaries to analyze the spatial and spectral patterns of these settlements. I will analyze the size of these settlements, the spectral signature within them, and their proximity to resources (roads, water, cities).
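As a rough sketch of the proximity measure, the distance from a settlement centroid to the nearest road vertex can be approximated with the haversine formula. This assumes coordinates in longitude/latitude and a spherical Earth, and it measures distance to vertices rather than to road segments, so it is only an approximation of what a GIS near-distance tool would return. The function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_distance_km(point, candidates):
    """Distance from a (lon, lat) point to the closest (lon, lat) candidate."""
    lon, lat = point
    return min(haversine_km(lon, lat, c_lon, c_lat) for c_lon, c_lat in candidates)

# Hypothetical settlement centroid and road vertices (lon, lat); not real data.
centroid = (32.0, 1.0)
road_vertices = [(32.0, 1.1), (33.0, 1.0)]
nearest = nearest_distance_km(centroid, road_vertices)
```

The same function could be reused for distance to water or to urban centers by swapping the candidate list.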
With the global population dataset, I can identify pixel clusters that indicate settlements and perform similar analyses of size, spectral signature, and proximity to resources.
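Identifying pixel clusters could be done by grouping adjacent built-up cells into connected components. The sketch below assumes the population dataset has already been reduced to a binary grid (1 = built-up, 0 = not) and that diagonal neighbors count as connected; both are my assumptions for illustration, and the grid is made up.

```python
from collections import deque

def label_clusters(grid):
    """Return a list of clusters; each cluster is a list of (row, col) built-up cells."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search from this unvisited built-up cell.
            queue = deque([(r, c)])
            seen[r][c] = True
            cells = []
            while queue:
                cr, cc = queue.popleft()
                cells.append((cr, cc))
                # Visit all 8 neighbors (including diagonals).
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and grid[nr][nc] == 1 and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            clusters.append(cells)
    return clusters

# Hypothetical binary built-up grid; not real data.
built_up = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
clusters = label_clusters(built_up)
sizes = sorted(len(cells) for cells in clusters)  # cluster sizes in pixels
```

Multiplying each cluster's pixel count by the cell area would give a settlement size comparable to the OSM polygon areas.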
While these analyses can help me identify the differences between these settlements, I still need to examine the classification methods of GHSL to understand why these differences might be significant and why they have resulted in different settlement detections.
Expected Outcome:
I will need to present the statistical relationships between the refugee settlements that are and are not detected in my target population dataset. Because I’m also seeking to understand why these settlements are excluded in the classification, I will need to connect the spatial relationships that I find with the classification methods that GHSL uses. This part will be largely descriptive, but I plan to make maps to illustrate these spatial relationships and characteristics, which include settlement size, border complexity, proximity to roads, and spectral signature.
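Border complexity could be summarized in several ways; one common option is the Polsby-Popper compactness score, 4πA/P², which is 1 for a circle and approaches 0 for elongated or convoluted shapes. The sketch below assumes that metric (I have not yet settled on one) and assumes boundaries are given as rings of projected coordinates in meters without a repeated closing vertex. The shapes are hypothetical.

```python
import math

def area_and_perimeter(ring):
    """Shoelace area and perimeter of a ring of (x, y) vertices in projected meters."""
    area2 = 0.0
    perimeter = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        area2 += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(area2) / 2.0, perimeter

def polsby_popper(ring):
    """Compactness score: 1 for a circle, lower for more complex borders."""
    area, perimeter = area_and_perimeter(ring)
    return 4.0 * math.pi * area / perimeter ** 2

# Two hypothetical boundaries with the same area (10,000 square meters).
square = [(0, 0), (100, 0), (100, 100), (0, 100)]
strip = [(0, 0), (400, 0), (400, 25), (0, 25)]

square_score = polsby_popper(square)
strip_score = polsby_popper(strip)
```

Because the two shapes have equal area, the difference in score reflects shape alone, which is what I would want to separate from settlement size.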
Significance:
This project addresses the exclusion of settlements and populations from various global datasets. This matters because so much derived data relies on these datasets, whether for distributing aid and resources, analyzing displacement, or understanding human migration. By understanding what factors contribute to the inclusion or exclusion of settlements in these datasets, more users can understand the limitations of what is possible to detect and where gaps in population detection are more likely to occur.
Level of preparation:
I have substantial experience with ArcInfo products. I’ve been using ArcGIS Pro for over a year now, and prior to that I spent 2 years working with ArcMap daily in a professional capacity, took three classes that were taught exclusively in the ArcDesktop interface, and employed ArcInfo for projects in multiple other classes. My image processing experience is also extensive: two classes using ENVI Classic, a GIS internship that included georeferencing satellite imagery, and most recently a class and outside research using Google Earth Engine. My experience with R is limited to a summer research project in 2016. I have some basic GIS programming skills (very limited ArcPy use but recent and frequent ModelBuilder use) and will be learning more throughout this term as a participant in Robert Kennedy’s GIS Programming class.