Category Archives: 2017

Spring 2017 blog posts

Spatial and Temporal Patterns of Reported Salmonella Rates in Oregon

Question: How are values of reported yearly Salmonella rates related to predictors found in previous OLS and GWR analysis at different temporal and spatial lags? Also, to what extent are there regional groupings in Oregon found through Principal Components Analysis (PCA)?

The data used here is the same used in all prior blog posts.

Names of Analytical Tools/Approaches Used

Both temporal and spatial cross-correlation tools were used in my analysis to visualize how values of the predictors I identified in previous analyses varied at higher/lower rates of reported Salmonella. The time series analysis was limited to 2008-2017, which was the extent of my data. Temporal cross-correlation allowed me to visualize how the values of predictors varied at different time lags of Salmonella rates, and spatial cross-correlation allowed me to visualize the variation in the predictors at different spatial lags of Salmonella rates. Results for the temporal and spatial cross-correlation were visualized with ACF plots and cluster plots, respectively. Finally, PCA was used to identify noticeable regional groupings as a function of the different variables in my dataset. These results were visualized on a biplot.

Description of the Analytical Process

All cross-correlation analysis was limited to the significant predictors identified in previous regression analysis. Specifically, county Salmonella rate was examined as a function of county % female, county % child poverty, and county median age.

  1. Temporal Cross-Correlation: Data were summarized by year, creating a data frame of median values for Salmonella rates and the predictors for each of the 10 years in my dataset. ACF plots were created for each of the predictors as they varied over different time lags of Salmonella rates.
  2. Spatial Cross-Correlation: Data were summarized by county, creating a data frame of median values for Salmonella rates and the predictors for all 36 counties in Oregon. Spatial cross-correlation was carried out using a local spatial weights matrix to create clusters of local indicators of spatial association. The clusters for each of the predictors were visualized on maps of Oregon.
  3. PCA: Oregon was divided into three broad regions based on population distribution and geographical barriers: 1) the Portland metro area consisting of Multnomah, Clackamas, Washington, and Columbia counties, 2) West Oregon consisting of all other counties west of the Cascade Mountain range, and 3) East Oregon consisting of all counties east of the mountain range. Data were summarized by county and every county was associated with a specific region within Oregon. The first two principal components were retained because together they explained approximately 70% of the variation in the data. The results of the PCA were visualized in a biplot.
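The temporal step above (yearly medians, then correlation at different lags) can be sketched roughly as follows. This is a minimal illustration in plain Python rather than the tooling used for the analysis; the function names and the ten yearly values are hypothetical placeholders, not the Oregon data. The lag convention pairs x[t + lag] with y[t], and the normalization uses full-series means and variances, as is conventional for sample cross-correlation.

```python
from collections import defaultdict
from statistics import mean, median


def yearly_medians(records):
    """Collapse (year, value) records to one median per year, ordered by year."""
    by_year = defaultdict(list)
    for year, value in records:
        by_year[year].append(value)
    return [median(by_year[y]) for y in sorted(by_year)]


def cross_correlation(x, y, lag):
    """Sample cross-correlation between x[t + lag] and y[t]."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sx = sum((v - mx) ** 2 for v in x) ** 0.5
    sy = sum((v - my) ** 2 for v in y) ** 0.5
    # Only keep time steps where the shifted index stays inside the series.
    cov = sum((x[t + lag] - mx) * (y[t] - my) for t in range(n) if 0 <= t + lag < n)
    return cov / (sx * sy)


# Hypothetical yearly medians for 10 years (placeholder values only).
salmonella = [10.2, 11.0, 9.8, 10.5, 12.1, 11.7, 10.9, 11.4, 12.3, 11.8]
child_poverty = [18.0, 19.5, 21.0, 22.4, 23.1, 22.0, 21.5, 20.2, 19.0, 18.4]

for lag in range(-4, 5):
    print(lag, round(cross_correlation(child_poverty, salmonella, lag), 3))
```

With only 10 yearly values, correlations at the larger lags rest on very few pairs, so they should be read cautiously.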

Brief Description of Results

Temporal Cross Correlation: results show that county percent female was positively associated with Salmonella rates at time lags 1-4 and slightly negatively associated with disease rates at all other lags. Child poverty was also positively associated with Salmonella rates at lags 1-4 and was otherwise negatively associated with disease rates, except at distantly negative time lags. Temporal cross correlation analysis of median age and Salmonella incidence rates yielded a similar pattern. The temporal cross-correlation patterns across all three of these variables appear to follow a somewhat sinusoidal curve.

Female:

Child Poverty:

Median Age:

Spatial Cross Correlation: results from a cross correlation analysis of county percent female and Salmonella rates show a large cluster of high county percent female and high rates of Salmonella in the Northwest region of the state. Other clusters are fairly well dispersed around the state. From an analysis of child poverty and Salmonella rates, there is a large cluster of high Salmonella rates and high child poverty in the Northeastern area of the state and a large cluster of high Salmonella rates but low child poverty in the Northwest. A similar pattern can be seen in the median age cross-correlation plot.

Female:

Child Poverty:

Median Age:

PCA: Principal Component Analysis showed that the counties comprising the Portland metro area were all characterized by relatively higher income compared to the rest of the state. The Eastern portion of the state can somewhat be characterized by higher median ages and higher proportions of elderly residents. The rest of the western portion of Oregon is not characterized by particularly high values of any of the other variables of interest.

Critique of Methods Used

These analyses support the findings of Exercise 2, where there was evidence to support the existence of a time trend in the rates of reported Salmonella in Oregon counties. Also, the results of the cross-correlation and principal component analysis support the findings in the GWR analysis, where different predictors were positively/negatively associated with Salmonella rates depending on the county in which the data were measured. One main critique of the spatial cross-correlation analysis is that, through the use of a local spatial weights matrix, only local indicators of spatial association were determined. This analysis did not include a global spatial weights matrix, which could change the spatial associations seen in my results. Also, while PCA was useful in showing that different regions in Oregon were more strongly associated with certain predictors than others, there is considerable overlap between the regions. Thus, it is unknown if these results are significant.

Testing the Cross Variogram with Ripley’s K Plot and Cross K Plot

Marina Marcelli

Question:

What is the spatial distribution of the water table in my study area with respect to scale? How are wells in my area clustered? Is there a relationship between wells where the water table lies above the first lava and wells where it lies below the first lava, between wells where the water table corresponds to the first lava and wells where it lies above, and between wells where the water table corresponds to the first lava and wells where it lies below?

Approach:

I used both the Ripley’s K function and Ripley’s Cross K function to look at the spatial distribution of water, first lava and the water table with respect to lava in the area. Like the variogram and the cross variogram, Ripley’s K function and the Cross Ripley’s K function describe spatial distributions at different scales. However, rather than using the variance, as the variogram does, Ripley’s K and Cross K compare the spatial data to a curve that represents complete spatial randomness (the Poisson curve).
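To make the comparison with the Poisson curve concrete, here is a minimal sketch of a naive Ripley's K estimate in plain Python. It assumes no edge correction (estimators in spatial statistics packages normally apply one), and the point coordinates and study-area size are hypothetical, not the well data. Under complete spatial randomness the expected value is K(r) = πr², so an estimate well above πr² suggests clustering at distance r.

```python
import math


def ripley_k(points, r, area):
    """Naive Ripley's K at distance r (no edge correction).

    K(r) = area / (n * (n - 1)) * number of ordered pairs within distance r.
    """
    n = len(points)
    count = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= r:
                count += 1
    return area * count / (n * (n - 1))


# Hypothetical example: two tight groups of points in a 10 x 10 window.
points = [(1, 1), (1.2, 1), (1, 1.2), (8, 8), (8.2, 8), (8, 8.2)]
r = 0.5
k_observed = ripley_k(points, r, area=100.0)
k_poisson = math.pi * r ** 2  # expectation under complete spatial randomness

print(k_observed, k_poisson)  # observed far above the Poisson value: clustered
```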

Brief Methodology:

I first did a preliminary analysis using the Ripley’s K function for both the depth to first water and the depth to first lava.

Then I used a Kcross to compare the differences between the depths to first lava and water. Because the Kcross function uses only factor variables, I had to make sure my data was categorical. I thus decided to bin my data into three categories depending on the Lava – Water (L-W) value.

L_W[i] > 40: “above”

-40 <= L_W[i] <= 40: “equal”

L_W[i] < -40: “below”

Where “above” stipulates that the water table is above the contact to first lava, “equal” stipulates that the water table roughly equates to the contact to first lava, and “below” signifies that the water table is below the contact to first lava.
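The three-way binning can be written as a small function. This is an illustrative sketch in plain Python; the function name and the sample values are hypothetical, and the threshold of 40 is taken from the rules above (in whatever depth units the well logs use).

```python
def classify_lava_minus_water(l_w, threshold=40.0):
    """Bin a Lava - Water (L-W) depth difference into a category label."""
    if l_w > threshold:
        return "above"  # water table above the contact to first lava
    if l_w < -threshold:
        return "below"  # water table below the contact to first lava
    return "equal"      # water table roughly at the contact


# Hypothetical L-W values for a handful of wells.
l_w_values = [120.0, 40.0, 0.0, -40.0, -75.5]
labels = [classify_lava_minus_water(v) for v in l_w_values]
print(labels)
```

Both boundary values (40 and -40) fall in the “equal” bin, matching the inclusive middle rule.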

After binning the data I compared each category to the others, resulting in three Cross Ripley’s K plots. I then plotted a significance envelope to see where the data was actually significant.
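As a rough sketch of what a Cross K comparison with an envelope involves, the plain-Python example below counts pairs between two categories and builds a Monte Carlo envelope by random relabelling. Several things here are assumptions for illustration only: there is no edge correction, the null model is random labelling (which may differ from the simulation scheme behind the envelopes in my plots), and the coordinates, function names, and window area are hypothetical.

```python
import math
import random


def cross_k(points_a, points_b, r, area):
    """Naive cross K: area / (n_a * n_b) * number of a-b pairs within distance r."""
    count = sum(
        1
        for xa, ya in points_a
        for xb, yb in points_b
        if math.hypot(xa - xb, ya - yb) <= r
    )
    return area * count / (len(points_a) * len(points_b))


def random_labelling_envelope(points_a, points_b, r, area, n_sim=99, seed=0):
    """Min and max of cross K over random reassignments of the two labels."""
    rng = random.Random(seed)
    pooled = list(points_a) + list(points_b)
    n_a = len(points_a)
    sims = []
    for _ in range(n_sim):
        rng.shuffle(pooled)
        sims.append(cross_k(pooled[:n_a], pooled[n_a:], r, area))
    return min(sims), max(sims)


# Hypothetical wells: "above" wells in one corner, "below" wells in the other.
above = [(1, 1), (2, 1), (1, 2)]
below = [(8, 8), (9, 8), (8, 9)]

for r in (2.0, 12.0):
    observed = cross_k(above, below, r, area=100.0)
    lo, hi = random_labelling_envelope(above, below, r, area=100.0)
    print(r, observed, (lo, hi))
```

An observed value outside the (lo, hi) range at a given r is what would be read as significant at that distance.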

Results:

Figure 1: Study area with the well logs used for this study. The cyan points represent the well logs that have a water table “above” the first lava. The blue points are “below”, and the pink points have a water table that roughly corresponds to the first lava.

Figure 2: K(r) vs r, the distances for which we are comparing clustering. For these data, the Poisson curve appears to be nearly horizontal. This means that the data appear to be clustered at all scales measured by Ripley’s K function. According to these plots, the depth to first lava data is clustered at all scales.

Figure 3: The shape of the depth to first water Ripley’s K plot is different from the depth to first lava plot (fig 2). What that means, I don’t know. However, based on both the Poisson curve and the significance envelope, water also clusters at all scales measured.

Figure 4: Ripley’s Cross K function for the points where the water table is above the contact with first lava, with the points where the water table is below the contact with first lava. At distances shorter than 6 km, the spread appears to be random, while at greater distances the data appear to be significant. This means that the data do not cluster at close distances. This corresponds to what we would expect from natural fluctuations in the water table driven by changes in elevation. In a simple system, with one lava and one water table, this works well. However, the study area is in reality much more complicated than this.

Figure 5: Cross K function for the points that correspond to above and equal. The data appear to be correlated at much closer distances than the above and below data.

Figure 6: Cross K plot for points corresponding to equal and below. They appear to be linked at all scales. Ideally they would be clustered at closer scales and random farther away. The discrepancy might be accounted for by faults, or multiple water bearing layers.

 

Conclusion:

Depth to first lava and depth to first water are linked at all scales measured by Ripley’s K plot. In this case, the largest scale I managed to measure was 12 km. Wells in the above category and wells in the below category were not clustered at short distances; rather, they appeared to be linked at distances greater than 6 km. Points that were equal and below were correlated at all scales. Well logs that were equal and above were correlated at scales larger than 2 km. This might have to do with lack of data. It might also have to do with the regional geology.

Critique of the method:

One limitation I took away from the process is that my field area is 40 km across, while the largest r value I calculated was 12 km. Ripley’s K plots and the Cross K plots might demonstrate different relationships at larger scales. In the future I would like to figure out how to change the r values.

I will need to be able to plot these data at larger scales to determine whether or not they corroborate what the variograms found.

Grant Z’s Exercise 1: Determining Moran’s I of LULC/NDVI change in rural Senegal

For this first exercise, I wanted to determine how land use/land cover (LULC) change was spatially auto-correlated in my region of interest. In order to do this, I acquired two Landsat images, one from the past and one from the present, conducted an NDVI (normalized difference vegetation index) analysis on each, determined the difference between the two, and then ran a Moran’s I function over that image to determine how changes in NDVI are related to each other. By understanding this, I can better understand how LULC changes manifest in the landscape and what spatial pattern they follow.

The software I used to approach this problem was ArcGIS Pro, to import imagery, clip imagery, and perform raster calculations for computing NDVI, and RStudio, to import the NDVI raster and run a Moran’s I function on it.

Firstly, I downloaded Landsat imagery from GloVis, the USGS Global Visualization Viewer, which is a repository for all Landsat data, as well as some imagery from other satellites. I selected my area of interest and searched for Landsat 5 imagery from ~2008 and Landsat 8 imagery from 2018 — I avoided Landsat 7 as a malfunction on that sensor has led to gaps in its data. Ultimately, I downloaded one Landsat 5 image from January 2010 (the only one available which had no cloud cover) and one Landsat 8 image from January 2018, to determine 8 years of change.

I then added the red and near-infrared (NIR) bands for each image into ArcGIS Pro. I first performed an intersect over all the layers to generate a common footprint. From there, I performed an NDVI analysis using the raster calculator tool on each image set (Landsat 5 and Landsat 8), using the classic NDVI formula (NIR – red)/(NIR + red). I then subtracted the 2010 NDVI raster from the 2018 NDVI raster to determine areas of change. The figure below shows the ultimate 8 year difference NDVI image I output. Areas of red represent declines in vegetation between the two images; yellow areas represent no change; green areas represent growth in vegetation.
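The raster arithmetic described here is simple enough to sketch cell by cell. The example below is an illustration in plain Python, not the ArcGIS Pro raster calculator workflow; the 2 x 2 band values are hypothetical reflectances and the function names are made up for the sketch. Cells where NIR + red is zero are returned as None so they do not cause a division by zero.

```python
def ndvi(nir, red):
    """Classic NDVI for one cell: (NIR - red) / (NIR + red)."""
    total = nir + red
    if total == 0:
        return None  # undefined where both bands are zero
    return (nir - red) / total


def ndvi_grid(nir_band, red_band):
    """Apply NDVI to two equally sized grids (lists of rows)."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


def difference(later, earlier):
    """Cell-wise later - earlier; None wherever either input is undefined."""
    return [
        [None if a is None or b is None else a - b for a, b in zip(l_row, e_row)]
        for l_row, e_row in zip(later, earlier)
    ]


# Hypothetical 2 x 2 reflectance grids for the two dates.
nir_2010 = [[0.4, 0.3], [0.5, 0.2]]
red_2010 = [[0.1, 0.1], [0.1, 0.2]]
nir_2018 = [[0.3, 0.3], [0.5, 0.4]]
red_2018 = [[0.1, 0.1], [0.1, 0.2]]

change = difference(ndvi_grid(nir_2018, red_2018), ndvi_grid(nir_2010, red_2010))
print(change)  # negative = vegetation decline, zero = no change, positive = growth
```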


Overall NDVI

With this raster depicting NDVI change in my AOI, I then wanted to know how the pattern of change related to itself. To do this, I performed a spatial auto-correlation function on both the large image, and a subset image, to find out its Moran’s I. I examined two images in order to superficially examine how scale affected the spatial auto-correlation of LULC change.

My first Moran’s I, of the overall image, was 0.6716816. As a positive number, this indicates that there is some amount of spatial auto-correlation taking place; that is, areas of vegetation change tend to occur near one another. The code I used is below.


Moran’s I of overall image

Next, I performed the exact same analysis with a subset image of the overall image, to explore how Moran’s I changed with scale. I explored a large area surrounding a village I’m familiar with. The Moran’s I for this analysis was 0.8079745, which is higher than that of the overall image. This indicates that, potentially, there is stronger spatial auto-correlation at smaller scales.
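For readers who want to see what the statistic computes, here is a minimal plain-Python sketch of global Moran’s I on a small grid. It assumes rook adjacency (cells sharing an edge) with equal weights, which may differ from the neighbourhood definition used by the R function I ran, and the two 4 x 4 grids are hypothetical. The formula is I = (N / W) · Σᵢ Σⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where W is the sum of all weights.

```python
def morans_i(grid):
    """Global Moran's I for a rectangular grid using rook (edge) neighbours."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    m = sum(values) / n
    denom = sum((v - m) ** 2 for v in values)
    num = 0.0
    w_sum = 0
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    num += (grid[i][j] - m) * (grid[ni][nj] - m)
                    w_sum += 1  # each neighbour pair has weight 1
    return (n / w_sum) * (num / denom)


# Hypothetical grids: one with change grouped on the left, one checkerboard.
split = [[1, 1, 0, 0] for _ in range(4)]
checker = [[(i + j) % 2 for j in range(4)] for i in range(4)]

print(morans_i(split))    # positive: similar values sit next to each other
print(morans_i(checker))  # negative: neighbours always differ
```

Values near +1 indicate that similar changes occur next to each other, values near -1 indicate alternation, and values near 0 indicate no spatial pattern.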

Overall, I feel that this approach is a good jumping off point into further exploring how LULC changes in my area of interest are related to other processes. Ultimately, I’m curious as to whether these LULC changes can be attributed in some way to the establishment of artisanal gold mining in the area. One good control for this would be to examine LULC change between years without establishment of gold mines, to see if it follows a similar pattern to the years of change, and if it is spatially auto-correlated as in this exercise.