A description of the research question that you are exploring
Of the 31 pathogens known to cause foodborne illness, Salmonella is estimated to cause the second highest number of illnesses, the most hospitalizations, and the most deaths among domestically acquired foodborne illnesses in the US [1]. Salmonellosis is the bacterial illness caused by Salmonella infection. An estimated 1.2 million cases of salmonellosis and around 450 deaths occur in the US every year [1]. Over time there has been marked variability in the number of reported cases per year. Salmonellosis is a mandatory reportable illness in Oregon, and available information indicates that incidence rates of this disease have been stable since the start of the new millennium [2]. The objective of this study is to perform a spatial analysis of lab-confirmed Salmonella cases in Oregon counties for 2008-2017, the years for which county-level data are available, and to determine whether some counties have a higher risk of Salmonella infection than others. I also wish to explore the socioeconomic factors associated with counties that have high incidence rates. My research question is: How are spatial patterns of Salmonella related to spatial patterns of socioeconomic factors? Socioeconomic conditions such as lower levels of education and income may increase rates of Salmonella infection in these populations through improper food preparation and cooking, less strict sanitation practices, and/or more frequent consumption of high-risk foods.
A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent
The Oregon Health Authority created a database called the Oregon Public Health Epidemiology User System (ORPHEUS) as a repository for relevant exposure and geospatial data related to disease cases reported to public health departments across the state. The state has maintained this database since 1989, and it includes information on various diseases. The dataset I will be using contains every reported non-typhoidal Salmonella case in Oregon from 2008-2017. Typhoidal Salmonella causes typhoid fever, while non-typhoidal Salmonella causes salmonellosis (a common gastrointestinal disease usually referred to as a type of “food poisoning”). The spatial resolution of this data has been aggregated to the county level to protect personal privacy and confidentiality. I will also be using data from the American Community Survey and the CDC’s Social Vulnerability Index. These datasets contain social vulnerability variables for Oregon at the county level. American Community Survey data is available for the years 2009-2017, and Social Vulnerability Index data is available for 2014 and 2016. I will also use yearly county population estimates from Portland State University’s Population Research Center. Because so much data is available, I will begin my exploratory analysis with 2014, a year for which all of these datasets report data.
Hypotheses: predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.
I expect counties with younger populations (higher proportions of infants and young children) as well as counties with higher proportions of females to have higher adjusted incidences of Salmonella. Prior surveillance suggests that children under the age of 5 are at the highest risk for Salmonella infection, likely because of their developing immune systems and how they interact with their environment. Specifically, many young children do not or cannot wash their hands before touching their mouths. Females are also known to have a higher risk of Salmonella infection. The mechanism behind this is not well understood, although one suggested explanation is that females are more likely to interact with young children. I also expect counties with higher Social Vulnerability Index scores (that is, greater social vulnerability) to have higher rates of Salmonella infection. Higher rates of poverty and lower levels of education are often associated with more negative health outcomes.
Approaches: describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.
I would like to calculate age- and sex-adjusted rates of disease for each county in Oregon. I am also interested in undertaking cluster analysis and calculating spatial autocorrelation among Oregon counties over time. Finally, I would like to regress county disease incidence rates on the socioeconomic factors found in the American Community Survey and Social Vulnerability Index. I would be interested in learning about spatial Poisson regression to assess which variables are significantly associated with the presence of disease. I would also be interested in learning about hotspot analysis to evaluate whether there are areas of Oregon with significantly higher disease rates. Ideally, all of my analyses will be performed in R and ArcGIS.
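As a rough sketch of the age- and sex-adjustment step, assuming direct standardization is the method used: each stratum's rate in a county is weighted by that stratum's share of a reference population. The sketch is in Python purely for illustration (the analyses themselves are planned for R and ArcGIS), and the function name, strata, and all counts below are hypothetical, not values from ORPHEUS or the population estimates.

```python
def direct_adjusted_rate(cases, population, standard_pop, per=100_000):
    """Directly standardized rate per `per` people.

    cases, population, standard_pop: dicts keyed by stratum
    (for example age group, or age group by sex).
    """
    total_standard = sum(standard_pop.values())
    adjusted = 0.0
    for stratum, standard_count in standard_pop.items():
        # Stratum-specific rate observed in the county
        stratum_rate = cases[stratum] / population[stratum]
        # Weight by the stratum's share of the standard population
        adjusted += stratum_rate * (standard_count / total_standard)
    return adjusted * per


# Hypothetical counts for one county and one reference population
county_cases = {"0-4": 6, "5-64": 20, "65+": 4}
county_pop = {"0-4": 3_000, "5-64": 40_000, "65+": 7_000}
standard_pop = {"0-4": 6_000, "5-64": 78_000, "65+": 16_000}

crude_rate = sum(county_cases.values()) / sum(county_pop.values()) * 100_000
adjusted_rate = direct_adjusted_rate(county_cases, county_pop, standard_pop)
print(f"crude: {crude_rate:.1f}, adjusted: {adjusted_rate:.1f} per 100,000")
```

A stratum with zero population in a county would need special handling (this sketch would raise a division error), which may matter for small counties.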
Expected outcome: what do you want to produce — maps? statistical relationships? other?
I would like to produce choropleth maps of adjusted Salmonella infection rates as well as maps of the hotspot analysis results. I want to produce regression models that describe how incidence rates of Salmonella vary across different socioeconomic indicators. I also want to create graphs that describe spatial autocorrelation patterns and show disease rates over time.
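Spatial autocorrelation of county rates is commonly summarized with global Moran's I, so a minimal sketch of that statistic is given here, assuming binary contiguity weights (1 for neighboring counties, 0 otherwise). It is written in Python for illustration only; the four "counties", their adjacency, and their rates are hypothetical, and a real analysis would also need a significance test (for example by permutation).

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary contiguity weights.

    values: list of rates, one per county.
    neighbors: dict mapping county index -> list of neighboring indices
               (assumed symmetric).
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]

    # Sum of cross-products over all neighboring pairs (w_ij = 1)
    cross_products = 0.0
    weight_total = 0
    for i, neighbor_list in neighbors.items():
        for j in neighbor_list:
            cross_products += deviations[i] * deviations[j]
            weight_total += 1

    sum_squares = sum(d * d for d in deviations)
    return (n / weight_total) * (cross_products / sum_squares)


# Four hypothetical counties arranged in a row: 0 - 1 - 2 - 3
rates = [10.0, 12.0, 30.0, 32.0]
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

moran = morans_i(rates, adjacency)
print(f"Moran's I: {moran:.3f}")
```

Values above the expected value under no autocorrelation, -1/(n-1), indicate that similar rates tend to sit next to each other; here the low-rate and high-rate counties are adjacent, so the statistic is positive.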
Significance. How is your spatial problem important to science? to resource managers?
This analysis will help identify county populations that are at higher risk for Salmonella infection. The inclusion of social vulnerability variables will be useful for state and local policy makers. Reforms can be proposed or further studied to assess how addressing the needs of particularly vulnerable populations affects the incidence of Salmonella. This research will also benefit further public health research, as trends found here may hold true for other foodborne illnesses. The aim of this research is to benefit the health of communities in Oregon by highlighting the association between social vulnerability and the risk of foodborne illness.
Your level of preparation: how much experience do you have with (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) image processing, (e) other relevant software
I have no experience with Arc-Info, programming in Python, or image processing. I have some limited experience with ModelBuilder. I am very comfortable performing statistical analyses in R and have some experience using its mapping packages to create maps.
References
1. Centers for Disease Control and Prevention. Estimates of Foodborne Illness in the United States. Available at: https://www.cdc.gov/foodborneburden/2011-foodborne-estimates.html#modalIdString_CDCTable_0. Published July 15, 2016. Accessed July 31, 2018.
2. Oregon Health Authority. Salmonellosis 2016 Report. Oregon Public Health Division. Available at: https://www.oregon.gov/OHA/PH/DISEASESCONDITIONS/COMMUNICABLEDISEASE/DISEASESURVEILLANCEDATA/ANNUALREPORTS/Documents/2016/2016-Salmon.pdf. Accessed July 31, 2018.
Seth, very good start. Here are some things to work on: 1) Research question. You have not articulated a research question. Try rephrasing as “How are spatial patterns of salmonella (A) related to spatial patterns of socioeconomic factors (B) as a result of mechanism C (why would these be related? how do socioeconomic factors affect food consumption patterns and susceptibility)?” 2) Data. You have lots of data. Please select one year to start with. 4) Analyses. For Ex 1, I suggest you try to create county-level (or finer) maps of age distributions and proportions of females, and quantify the spatial pattern of these measures: are they spatially autocorrelated? Clustered? For Ex 1, you could also do a hotspot analysis, and you could calculate temporal autocorrelation to ask whether disease rates are consistently high or low in certain locations. For Ex 2, I suggest you use cross-correlation and/or GWR to look at how salmonella is related to social vulnerability.
Hi Dr. Jones, I went back through my post and made the changes you recommended here. I stated an explicit research question that includes a mechanism. I also made sure to say that I am initially looking at one year (2014). Thank you very much for the pointers!