- The research question that you asked.
My original question focused solely on the spatial patterns of Salmonella cases in Oregon. However, after realizing that my dataset contained time series information, I pivoted my analysis somewhat, and my research question became: How are the spatial and temporal patterns of reported Salmonella infections associated with sociodemographic factors in Oregon counties between 2008 and 2017? Eventually discovering the causal drivers of disease will be important for prevention efforts. The focus here, though, is on which factors are associated with disease at the county level. Through this analysis I hoped to identify sub-populations at higher risk of Salmonella infection in Oregon.
- A description of the dataset you examined, with spatial and temporal resolution and extent.
The data used in my analysis came from a variety of sources. Counts of Salmonella cases per county came from the Oregon Public Health Epidemiology User System (ORPHEUS). These data contained individual-level records, which were de-identified due to privacy concerns. Age, sex, disease onset, and county of residence were available for each case. Population estimates came from Portland State University’s Population Research Center, which produces yearly county-level population estimates. Other sociodemographic information came from the American Community Survey, an annual survey that provides estimates of county-level characteristics such as poverty, high school graduation rates, and the percentage of foreign-born residents. All data used in this analysis are from 2008 to 2017.
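Combining the ORPHEUS case counts with the population estimates is what turns raw counts into incidence rates. The sketch below is a minimal Python illustration of that join. It assumes both sources have already been aggregated to `(county, year)` keys, and it expresses rates per 100,000 residents, which is a common convention but an assumption here. The function names, dictionary layout, and county values are hypothetical and do not reproduce the original R workflow.

```python
# Minimal sketch (hypothetical names and data): join county-year case counts
# with county-year population estimates and compute incidence per 100,000.


def incidence_per_100k(cases, population):
    """Cases per 100,000 residents for a single county-year."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population


def rates_by_county_year(case_counts, populations):
    """Compute incidence for every (county, year) key present in both dicts.

    County-years without a population estimate are skipped rather than
    producing a division by zero or a misleading rate.
    """
    return {
        key: incidence_per_100k(count, populations[key])
        for key, count in case_counts.items()
        if key in populations
    }


if __name__ == "__main__":
    # Hypothetical placeholder values, not figures from the actual dataset.
    cases = {("County A", 2008): 12, ("County B", 2008): 3}
    pops = {("County A", 2008): 400_000, ("County B", 2008): 20_000}
    for (county, year), rate in sorted(rates_by_county_year(cases, pops).items()):
        print(f"{county} {year}: {rate:.1f} per 100,000")
```

Because small counties can produce large rates from a handful of cases, these rates tend to be volatile in sparsely populated counties.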
- Hypotheses: predictions of patterns and processes you looked for.
Hypothesis 1: I expect counties with higher proportions of females to have higher levels of Salmonella infection, because prior literature has concluded that females have a higher incidence rate of Salmonella infection than males. The underlying causal mechanism for this pattern is unknown, but given the results of other studies of foodborne illness, I expect Oregon’s population to follow the same pattern. As a result, I expect the percentage of a county that is female to be significantly associated with Salmonella incidence.
Hypothesis 2: I expect counties with higher proportions of infants and young children to have higher levels of Salmonella infection than counties with higher proportions of older age groups. Findings from the Oregon Health Authority indicate that young children are at high risk of developing foodborne illness. The reasoning is that young children's immune systems are not fully developed and are less able to fight off infection, which results in disproportionately high disease incidence in this group. I expect the percentage of a county aged 0-4 years to be significantly associated with Salmonella incidence.
Hypothesis 3: I expect to find a significant trend in Salmonella incidence over time in Oregon counties. Disease incidence varies from year to year. It can be volatile in sparsely populated counties or in counties experiencing a major outbreak, so I expect trends to differ among counties. Some counties will show an increasing trend, others decreasing rates, and others will be relatively level. Overall, however, I expect time to be significantly associated with disease incidence.
- Approaches: analysis approaches you used.
I used auto- and cross-correlation, longitudinal trend analysis, hotspot analysis, geographically weighted regression (GWR), and principal component analysis (PCA). All of these analyses were performed in the R statistical software with the help of various packages for spatial analysis.
- Results: what did you produce — maps? statistical relationships? other?
From these analyses I created maps, plots of auto- and cross-correlation over time, and a bivariate plot of principal components associated with different regions of Oregon. My time trend regression and GWR also produced considerable numerical output, which provided evidence of statistically significant associations between the variables in my model and the outcome. These outputs can be seen in my second and third blog posts.
Hypothesis 1: Findings from my GWR support my first hypothesis that the proportion of a county that is female is significantly associated with Salmonella incidence. Coefficient estimates vary by county, but the association between the proportion of a county that is female and disease incidence is most often positive. Cross-correlation analysis found large areas of Oregon, particularly in the western part of the state, where the association between the two variables was positive. It also found some clusters in the eastern part of the state where this pattern was reversed.
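GWR fits a separate regression for each county, weighting the other counties by their distance from it, which is why its coefficient estimates can vary by county. The following is a deliberately stripped-down Python sketch, not the model used in this analysis. It assumes a single predictor (for example, percent female), county centroid coordinates, a Gaussian distance kernel, and a fixed, hand-picked bandwidth. All function names and example data are hypothetical. Full implementations such as the R packages spgwr and GWmodel handle multiple predictors and select the bandwidth from the data.

```python
import math


def gaussian_kernel(distance, bandwidth):
    """Gaussian distance-decay weight: 1 at distance 0, falling off smoothly."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_regression(target, coords, x, y, bandwidth):
    """Weighted least squares fit of y = b0 + b1 * x centred on one location."""
    tx, ty = target
    w = [gaussian_kernel(math.hypot(cx - tx, cy - ty), bandwidth) for cx, cy in coords]
    w_total = sum(w)  # > 0 because the target location has weight 1
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / w_total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / w_total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    if sxx == 0:
        raise ValueError("predictor has no variation near this location")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def gwr(coords, x, y, bandwidth):
    """Return one (intercept, slope) pair per location in coords."""
    return [local_regression(c, coords, x, y, bandwidth) for c in coords]


if __name__ == "__main__":
    # Hypothetical data: two distant groups of counties whose outcome
    # responds to the predictor in opposite directions.
    coords = [(0, 0), (0, 1), (1, 0), (100, 0), (100, 1), (101, 0)]
    x = [0.4, 0.5, 0.6, 0.4, 0.5, 0.6]
    y = [2 * v + 1 for v in x[:3]] + [-3 * v + 5 for v in x[3:]]
    for c, (b0, b1) in zip(coords, gwr(coords, x, y, bandwidth=1.0)):
        print(c, round(b1, 3))
```

With a small bandwidth, each local slope reflects only nearby counties. That is how a positive association in one region and a negative one in another can both appear in the same model.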
Hypothesis 2: GWR findings and cross-correlation analyses did not support my second hypothesis. The proportion of a county aged 0-4 was not significantly associated with county Salmonella incidence. However, the county percentage of child poverty was significantly associated with disease incidence. One possible explanation is that age alone is not associated with disease at the county level, but poverty is. Any elevated risk among young children may therefore be concentrated among children living in poverty.
Hypothesis 3: Longitudinal analysis and regression supported my third hypothesis that a significant time trend existed in Oregon’s Salmonella incidence. As expected, county disease rates varied over time due to a host of factors. Larger populations are likely to report more cases simply because more people can become infected. Overall, Oregon’s Salmonella incidence shows an increasing trend. This may reflect a true increase in disease occurring in Oregon. Some of the increase could also be due to improvements in Oregon’s disease surveillance capacity and infrastructure over time.
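A county's time trend can be summarized by regressing its yearly incidence on year. This minimal Python sketch fits an ordinary least squares line and labels the trend as increasing, decreasing, or level using the slope's t statistic. It assumes ten yearly observations (2008-2017), hence a hard-coded two-sided 5% critical value for 8 degrees of freedom. The function names and example rates are hypothetical. This is not the longitudinal model used in the analysis, which may have pooled counties or accounted for repeated measures.

```python
import math

# Two-sided 5% critical value of Student's t with 8 df.
# This assumes 10 yearly observations; change it for other series lengths.
T_CRIT_DF8 = 2.306


def linear_trend(years, rates):
    """OLS fit of rate = intercept + slope * year.

    Returns (slope, intercept, t_stat), where t_stat = slope / SE(slope)
    with n - 2 degrees of freedom.
    """
    n = len(years)
    if n != len(rates) or n < 3:
        raise ValueError("need at least 3 paired observations")
    t_bar = sum(years) / n
    r_bar = sum(rates) / n
    sxx = sum((t - t_bar) ** 2 for t in years)
    sxy = sum((t - t_bar) * (r - r_bar) for t, r in zip(years, rates))
    slope = sxy / sxx
    intercept = r_bar - slope * t_bar
    sse = sum((r - (intercept + slope * t)) ** 2 for t, r in zip(years, rates))
    se = math.sqrt(sse / (n - 2) / sxx)
    if se == 0:
        # Perfect fit: report only the direction of the slope.
        t_stat = math.copysign(math.inf, slope) if slope != 0 else 0.0
    else:
        t_stat = slope / se
    return slope, intercept, t_stat


def classify_trend(years, rates, t_crit=T_CRIT_DF8):
    """Label a county's series as 'increasing', 'decreasing', or 'level'."""
    slope, _, t_stat = linear_trend(years, rates)
    if abs(t_stat) < t_crit:
        return "level"
    return "increasing" if slope > 0 else "decreasing"


if __name__ == "__main__":
    years = list(range(2008, 2018))
    # Hypothetical incidence series for one county (per 100,000).
    rates = [3.1, 2.8, 3.5, 3.9, 3.6, 4.2, 4.0, 4.8, 4.6, 5.1]
    print(linear_trend(years, rates), classify_trend(years, rates))
```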
- What did you learn from your results? How are these results important to science? to resource managers?
This analysis found statistically significant associations between the reported rate of Salmonella and some population characteristics (such as median age and childhood poverty levels), as well as time. These results matter to researchers because the search for causal exposures can be narrowed to the groups or areas most associated with the disease, and they can inform future disease prevention research. Resource managers may also be interested in this analysis, because the counties identified here as more strongly associated with disease can be targeted for disease surveillance.
- Your learning: what did you learn about software (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) other?
My analysis was completed entirely in the R statistical software. Over this course I learned a lot about different spatial analysis packages that extended what I could do in R. Some spatial packages have a steep learning curve and require a fair degree of technical knowledge to implement appropriately in an analysis. I also learned a lot about troubleshooting from discussion threads and GitHub posts.
- What did you learn about statistics, including (a) hotspot, (b) spatial autocorrelation (including correlogram, wavelet, Fourier transform/spectral analysis), (c) regression (OLS, GWR, regression trees, boosted regression trees), (d) multivariate methods (e.g., PCA), and (e) or other techniques?
I learned how hotspot analysis is performed, what a Getis-Ord Gi* score is, and how these scores are used to identify hot and cold spots. For autocorrelation, I learned how to perform cross-correlation analysis through both space and time. I also learned how to group values into clusters on a map that is easier to interpret. During my geographically weighted regression analysis, I learned how to transform my data to make it compatible with the regression technique. I also learned how to use an ordinary least squares regression to decide which variables to include in my GWR. Before this project I had never heard of PCA, so here I learned introductory skills for applying this technique to my dataset.
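The Gi* statistic compares the sum of values in a county's neighborhood (including the county itself) with what that sum would be if values were spread evenly across all counties, and expresses the difference as a z-score. The minimal Python sketch below computes Gi* with binary contiguity weights, using the standard formula

`Gi* = (Σ_j w_ij x_j − x̄ Σ_j w_ij) / (S · sqrt((n Σ_j w_ij² − (Σ_j w_ij)²) / (n − 1)))`.

It then labels counties as hot or cold spots at |z| ≥ 1.96, roughly a two-sided 5% level with no correction for multiple testing. The adjacency structure, values, and function names are hypothetical. In R, the spdep package's `localG` function provides this statistic.

```python
import math


def getis_ord_gi_star(values, neighbors):
    """Gi* z-scores with binary weights.

    values    -- one attribute value per county (e.g., incidence rates)
    neighbors -- dict mapping a county index to indices of adjacent counties
    Each county is included in its own neighborhood (w_ii = 1), which is
    what distinguishes Gi* from Gi.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two counties")
    mean = sum(values) / n
    variance = max(sum(v * v for v in values) / n - mean ** 2, 0.0)
    if variance == 0:
        raise ValueError("values must not all be equal")
    s = math.sqrt(variance)
    z_scores = []
    for i in range(n):
        neighborhood = set(neighbors.get(i, ())) | {i}
        w_sum = len(neighborhood)  # binary weights: sum(w_ij) == sum(w_ij ** 2)
        local_sum = sum(values[j] for j in neighborhood)
        spread = (n * w_sum - w_sum ** 2) / (n - 1)
        if spread == 0:
            # Neighborhood covers every county, so there is nothing to compare.
            z_scores.append(0.0)
            continue
        z_scores.append((local_sum - mean * w_sum) / (s * math.sqrt(spread)))
    return z_scores


def label_spots(z_scores, z_crit=1.96):
    """Label each county as a hot spot, cold spot, or not significant."""
    return [
        "hot" if z >= z_crit else "cold" if z <= -z_crit else "not significant"
        for z in z_scores
    ]


if __name__ == "__main__":
    # Hypothetical chain of five counties: 0-1-2-3-4.
    adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    rates = [10.0, 9.0, 1.0, 1.0, 1.0]
    z = getis_ord_gi_star(rates, adjacency)
    print([round(v, 2) for v in z])
    print(label_spots(z))
```

High values surrounded by high values push Gi* up, and low values surrounded by low values push it down. This is the sense in which hot and cold spots describe clusters rather than individual high or low counties.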