Question: How are reported yearly Salmonella rates related, at different temporal and spatial lags, to the predictors identified in previous OLS and GWR analyses? Also, to what extent does Principal Components Analysis (PCA) reveal regional groupings in Oregon?
The data used here are the same as those used in all prior blog posts.
Names of Analytical Tools/Approaches Used
Both temporal and spatial cross-correlation tools were used to visualize how values of the predictors identified in previous analyses varied with higher and lower rates of reported Salmonella. The time series analysis was limited to 2008-2017, the full extent of my data. Temporal cross-correlation showed how the predictor values varied at different time lags of Salmonella rates, and spatial cross-correlation showed how the predictors varied at different spatial lags of Salmonella rates. Results for the temporal and spatial cross-correlation were visualized with ACF plots and cluster plots, respectively. Finally, PCA was used to identify noticeable regional groupings as a function of the different variables in my dataset. These results were visualized on a biplot.
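To make the idea of a temporal lag concrete, here is a minimal sketch of a lagged cross-correlation between two yearly series. It is not the code used for this post: the function name `cross_correlation` and the ten yearly values are made up for illustration, and the lag convention (lag k pairs x at time t + k with y at time t) is an assumption that should be checked against whatever software produced the original plots.

```python
import numpy as np


def cross_correlation(x, y, max_lag):
    """Return {lag: r} for lags -max_lag..max_lag.

    Lag k pairs x[t + k] with y[t], so a peak at a positive lag means
    x tends to follow y by k time steps.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xd = x - x.mean()
    yd = y - y.mean()
    # Normalizing by the full-series spread keeps lag 0 equal to Pearson's r.
    denom = n * x.std() * y.std()
    result = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            num = np.sum(xd[k:] * yd[:n - k])
        else:
            num = np.sum(xd[:n + k] * yd[-k:])
        result[k] = num / denom
    return result


# Hypothetical yearly medians for 2008-2017 (NOT the real Oregon data).
child_poverty = np.array([21.0, 22.5, 24.1, 24.8, 23.9, 22.7, 21.8, 20.4, 19.1, 18.3])
salmonella_rate = np.array([11.2, 10.4, 12.9, 11.8, 10.1, 9.7, 10.9, 12.3, 11.5, 10.8])

ccf = cross_correlation(child_poverty, salmonella_rate, max_lag=4)
for lag, r in ccf.items():
    print(f"lag {lag:+d}: r = {r:+.2f}")
```

With only 10 yearly values, each lag is estimated from very few pairs of observations, so correlations at the larger lags should be read cautiously.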
Description of the Analytical Process
All cross-correlation analyses were limited to the significant predictors identified in the previous regression analysis: county Salmonella rate was examined as a function of county % female, county % child poverty, and county median age.
- Temporal Cross-Correlation: Data were summarized by year, creating a data frame of median values for Salmonella rates and the predictors for each of the 10 years in my dataset. ACF plots were created for each of the predictors as they varied over different time lags of Salmonella rates.
- Spatial Cross-Correlation: Data were summarized by county, creating a data frame of median values for Salmonella rates and the predictors for all 36 counties in Oregon. Spatial cross-correlation was carried out using a local spatial weights matrix to create clusters of local indicators of spatial association. The clusters for each of the predictors were visualized on maps of Oregon.
- PCA: Oregon was divided into three broad regions based on population distribution and geographical barriers: 1) the Portland metro area, consisting of Multnomah, Clackamas, Washington, and Columbia counties, 2) West Oregon, consisting of all other counties west of the Cascade Mountain range, and 3) East Oregon, consisting of all counties east of the mountain range. Data were summarized by county, and every county was assigned to one of these regions. The first two principal components were retained because together they explained approximately 70% of the variation in the data. The results of the PCA were visualized in a biplot.
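The PCA step can be sketched as follows. This is an illustration under stated assumptions, not the original analysis: the six "counties", their values, and the variable names are invented, and standardizing each variable before the decomposition is my assumption (it is a common choice when variables are on different scales, such as percentages and dollars).

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via SVD of the standardized data matrix.

    Returns (scores, loadings, explained), where explained holds the
    proportion of variance for every component, largest first.
    """
    X = np.asarray(X, dtype=float)
    # Standardize columns so no variable dominates because of its units.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = U[:, :n_components] * s[:n_components]  # county coordinates
    loadings = Vt[:n_components].T                   # variable directions
    return scores, loadings, explained


# Hypothetical county summaries (NOT the real Oregon data).
variables = ["pct_female", "child_poverty", "median_age", "median_income_k"]
regions = ["Metro", "Metro", "West", "West", "East", "East"]
X = np.array([
    [50.8, 14.2, 37.1, 68.0],
    [50.5, 12.9, 38.4, 72.5],
    [50.2, 20.1, 41.0, 48.3],
    [51.0, 22.4, 43.5, 45.1],
    [49.1, 24.8, 47.2, 41.7],
    [48.7, 26.3, 49.0, 39.9],
])

scores, loadings, explained = pca(X, n_components=2)
print("variance explained by PC1 + PC2:", explained[:2].sum())
for region, (pc1, pc2) in zip(regions, scores):
    print(f"{region:5s}  PC1 = {pc1:+.2f}  PC2 = {pc2:+.2f}")
```

A biplot then plots the rows of `scores` as points (colored by region) and the rows of `loadings` as arrows, one per variable.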
Brief Description of Results
Temporal Cross-Correlation: Results show that county percent female was positively associated with Salmonella rates at time lags 1-4 and slightly negatively associated with disease rates at all other lags. Child poverty was also positively associated with Salmonella rates at lags 1-4 and was otherwise negatively associated with disease rates, except at distantly negative time lags. Temporal cross-correlation analysis of median age and Salmonella incidence rates yielded a similar pattern. The temporal cross-correlation patterns for all three of these variables appear to follow a somewhat sinusoidal curve.
Female:
Child Poverty:
Median Age:
Spatial Cross-Correlation: Results from a cross-correlation analysis of county percent female and Salmonella rates show a large cluster of high county percent female and high rates of Salmonella in the Northwest region of the state. Other clusters are fairly well dispersed around the state. The analysis of child poverty and Salmonella rates shows a large cluster of high Salmonella rates and high child poverty in the Northeastern area of the state and a large cluster of high Salmonella rates but low child poverty in the Northwest. A similar pattern can be seen in the median age cross-correlation plot.
Female:
Child Poverty:
Median Age:
PCA: Principal Component Analysis showed that the counties comprising the Portland metro area were all characterized by higher income than the rest of the state. The Eastern portion of the state can be characterized, to some extent, by higher median ages and higher proportions of elderly residents. The rest of the western portion of Oregon is not characterized by particularly high values of any of the other variables of interest.
Critique of Methods Used
These analyses support the findings of Exercise 2, where there was evidence of a time trend in the rates of reported Salmonella in Oregon counties. The results of the cross-correlation and principal component analyses also support the findings of the GWR analysis, where different predictors were positively or negatively associated with Salmonella rates depending on the county in which the data were measured. One main critique of the spatial cross-correlation analysis is that, because a local spatial weights matrix was used, only local indicators of spatial association were determined. This analysis did not include a global spatial weights matrix, which could change the spatial associations seen in my results. Also, while PCA was useful in showing that different regions in Oregon were more strongly associated with certain predictors than others, there is considerable overlap between the regions. Thus, it is unknown whether these results are significant.
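The local-versus-global point in the critique can be illustrated with a small sketch of a bivariate Moran's I. Everything here is an assumption for illustration: the five areas in a chain, their values, the function names, and the choice to pair each area's predictor value with its neighbors' Salmonella rates. With row-standardized weights and standardized variables, the global statistic is the mean of the local ones, so a single global value can hide opposing local clusters. Significance testing (typically done by permutation) is omitted.

```python
import numpy as np


def row_standardize(adjacency):
    """Turn a 0/1 neighbor matrix into weights whose rows sum to 1."""
    A = np.asarray(adjacency, dtype=float)
    return A / A.sum(axis=1, keepdims=True)


def bivariate_moran(x, y, W):
    """Local and global bivariate Moran's I for x against the lag of y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    lag_y = W @ zy          # average standardized y among each area's neighbors
    local_i = zx * lag_y    # one value per area
    return local_i.mean(), local_i, zx, lag_y


def quadrant(zx_i, lag_i):
    """Label an area by its own x and its neighbors' y (no significance test)."""
    return ("High" if zx_i > 0 else "Low") + "-" + ("High" if lag_i > 0 else "Low")


# Five hypothetical areas arranged in a line (NOT the real Oregon counties).
adjacency = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
child_poverty = np.array([14.0, 16.5, 21.0, 25.5, 27.0])
salmonella_rate = np.array([9.0, 10.2, 11.1, 13.4, 14.0])

W = row_standardize(adjacency)
global_i, local_i, zx, lag_y = bivariate_moran(child_poverty, salmonella_rate, W)
labels = [quadrant(a, b) for a, b in zip(zx, lag_y)]

print("global bivariate Moran's I:", round(float(global_i), 3))
for i, (li, lab) in enumerate(zip(local_i, labels)):
    print(f"area {i}: local I = {li:+.2f}  ({lab})")
```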