Final Project: Examining the relationship between rurality and colorectal cancer mortality in Texas counties

1. The research question that you asked.

My initial research question concerned creating and comparing rural indices for Texas. I did build a rural index for Texas, but instead of creating multiple indices and comparing the importance of different rural indicators, I incorporated an outcome variable, colorectal cancer (CRC) mortality, in Exercises 2 and 3. My final research question was: how does the spatial pattern of CRC mortality in Texas relate to proxy measures of rurality and to a constructed rural index for Texas counties?

 

2. A description of the dataset you examined, with spatial and temporal resolution and extent.

For my project, I used Texas county median household income (2010 Census data), Texas county population density (2010 Census data), and discrete raster land-use data for the state of Texas (2011 National Land Cover Database). Aggregated CRC mortality rates for Texas counties for 2011-2015 were obtained from the National Cancer Institute’s Surveillance, Epidemiology, and End Results Program U.S. Population Data – 1969-2015.

 

3. Hypotheses: predictions of patterns and processes you looked for.

Based on the research question I revised before Exercise 2, I expected two major spatial patterns: 1) areas immediately surrounding high CRC mortality counties would have more “rural” values for the 3 rural indicator variables (household income, population density, and percent land developed), with values becoming more “urban” as distance from those counties increased, and 2) CRC mortality counts would increase significantly as county rural index values (derived from the 3 indicator variables) became more rural. I chose the indicator variables based both on direct measures of where population centers in Texas are located (population density and land development) and on a well-established marker of rural status (income).

For spatial pattern 1, from my neighborhood analysis, I expected uniform shifts in buffer means toward more “urban” values in each of the 3 indicator variables as distance from the high CRC counties increased. I expected this because counties with high CRC mortality are commonly rural, so areas in surrounding counties with lower CRC mortality may show increasingly urban indicator values.

For spatial pattern 2, I expected CRC mortality to increase as county rural index values became more rural. I expected this because previous research has linked CRC mortality to rural status and to various indicator variables, though a weighted rural index has never been used for Texas cancer data.

 

4. Approaches: analysis approaches you used.

For my analyses I used the following approaches:

Exercise 1: I utilized principal component analysis (PCA) to construct a rural index for Texas counties using 3 indicator variables.

Exercise 2: I performed a neighborhood analysis of high CRC mortality counties. I created multi-ring buffers around the centroids of 4 Texas counties, intersected the buffers with county polygons containing rural indicator variable values, and calculated the mean of each indicator variable within each ring (“donut”) around each county.

Exercise 3: I fit a Poisson regression of CRC mortality counts (Y) on my constructed rural index (X) to examine the effect of rural status on CRC mortality for Texas counties.
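To make the Exercise 3 model concrete, here is a minimal sketch of a one-predictor Poisson regression with a log link, fit by Newton-Raphson. It is not the R code used for the project. The function name `fit_poisson` and the eight data points are made up for illustration, and no population offset is included because none is described above.

```python
import math


def fit_poisson(x, y, iterations=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson; return (b0, b1)."""
    b0 = math.log(sum(y) / len(y))  # start at the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information (2 x 2, symmetric)
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: inverse information times score
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical data: higher index values mean "more rural".
rural_index = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
crc_deaths = [2, 3, 3, 5, 6, 9, 12, 18]

intercept, slope = fit_poisson(rural_index, crc_deaths)
rate_ratio = math.exp(slope)  # multiplicative change in expected count per index unit
print(f"slope = {slope:.3f}, rate ratio = {rate_ratio:.3f}")
```

Exponentiating the slope gives a rate ratio: the factor by which the expected count changes for a one-unit increase in the index.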

 

5. Results: what did you produce — maps? statistical relationships? other?

For all 3 exercises, my results included maps (Texas county maps, buffer maps, standardized residual maps), statistical relationships (p-values, coefficients, etc.), and plots of statistical relationships (biplots, influence plots, line plots). I produced the maps in ArcGIS Pro, while all other visualizations were produced in R.

 

6. What did you learn from your results? How are these results important to science? to resource managers? + software/technical limitations

I believe exercise 1 displays the effectiveness of PCA for construction of rural indices. More deductive methods for rural classification are very much needed in rural health, and I believe this method could improve detection and prevention of rural health disparities.
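As an illustration of the idea behind Exercise 1, the sketch below standardizes three indicator variables and takes the first principal component as a single index. It is a simplified stand-in, not the R workflow used for the project. The six county values are invented, the function names are hypothetical, and the component is found by power iteration on the correlation matrix only to keep the example dependency-free.

```python
import math

# Hypothetical values for six made-up counties (not the project's data).
density = [5.0, 12.0, 40.0, 150.0, 900.0, 2500.0]                 # people per sq. mile
income = [38000.0, 41000.0, 45000.0, 52000.0, 60000.0, 68000.0]   # median household income
developed = [1.0, 2.5, 4.0, 10.0, 35.0, 60.0]                     # percent land developed


def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def first_component(columns, iterations=1000):
    """Return (loadings, scores, share of variance) for the first component."""
    z = [standardize(col) for col in columns]
    p, n = len(z), len(z[0])
    # Correlation matrix of the standardized indicators
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]
    # Power iteration converges to the leading eigenvector
    vec = [1.0 / (k + 1) for k in range(p)]
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    eigenvalue = sum(vec[i] * corr[i][j] * vec[j] for i in range(p) for j in range(p))
    # The sign of a component is arbitrary; orient it so that lower
    # population density (the first column) gives a higher, "more rural" score.
    if vec[0] > 0:
        vec = [-v for v in vec]
    scores = [sum(vec[k] * z[k][r] for k in range(p)) for r in range(n)]
    return vec, scores, eigenvalue / p  # trace of a correlation matrix is p


loadings, rural_index, explained = first_component([density, income, developed])
print([round(v, 3) for v in loadings], round(explained, 3))
```

Because the sign of a principal component is arbitrary, the orientation step matters: without it, "more rural" could correspond to either end of the index.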

In exercise 2, the 4 high CRC mortality counties did not all follow the spatial “gradient” in rural indicators that I expected. Two of the counties exhibited increasingly urban scores as distance from the counties increased, while the other two showed the opposite pattern. I think this result could be due to the arbitrary distances I chose for the buffers around the counties and to the modifiable areal unit problem that comes from using county-level indicator data instead of more spatially resolved values. Also, the indicator variables I chose could vary substantially by region within Texas, and the 4 counties were not located in the same regions. This could have affected the relationships I found in each of the counties. For example, certain rural regions in Texas may have lower or higher household income than other rural regions due to factors such as available jobs or regional/local governmental policies. There are also likely other indicators of rurality that I could have included in the analysis that may have more consistent spatial patterns.
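The ring-mean calculation from exercise 2 can be sketched in simplified form as follows. This is not the ArcGIS Pro workflow described above: it assigns each neighboring county to a ring by centroid distance instead of intersecting buffers with county polygons, and the coordinates, indicator values, and ring distances are all hypothetical.

```python
import math


def ring_means(focal_xy, neighbors, ring_edges_km):
    """Mean indicator value per ring around a focal centroid.

    neighbors: list of (x_km, y_km, value) for surrounding county centroids.
    ring_edges_km: ascending outer radii; ring i runs from the previous
    edge (or 0) up to and including ring_edges_km[i].
    Returns one mean per ring, or None for a ring with no counties.
    """
    sums = [0.0] * len(ring_edges_km)
    counts = [0] * len(ring_edges_km)
    for x, y, value in neighbors:
        distance = math.hypot(x - focal_xy[0], y - focal_xy[1])
        for i, outer in enumerate(ring_edges_km):
            if distance <= outer:
                sums[i] += value
                counts[i] += 1
                break  # a county belongs to the innermost ring that contains it
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical neighbors of one focal county: (x_km, y_km, percent land developed)
neighbors = [
    (10.0, 0.0, 20.0),
    (0.0, 30.0, 30.0),
    (40.0, 40.0, 50.0),
    (0.0, 90.0, 80.0),
    (200.0, 0.0, 999.0),  # beyond the outermost ring, so it is ignored
]
means = ring_means((0.0, 0.0), neighbors, [50.0, 100.0, 150.0])
print(means)
```

Rerunning the function with different values for `ring_edges_km` is one way to check how sensitive the gradient is to the choice of buffer distances.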

In exercise 3, the results from the Poisson regression followed the statistical pattern I expected: county CRC mortality increased as rural index scores became more rural. I believe my results show important preliminary associations between CRC mortality and rurality in Texas and indicate that further, deeper study of these associations should be considered. To show the exercise 3 results, I used a map of standardized residuals in ArcGIS Pro and statistical plots in R.
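For readers unfamiliar with residuals from a count model, the sketch below computes Pearson residuals and a dispersion statistic from observed and fitted counts. The text above does not say which type of standardized residual was mapped, so Pearson residuals are an assumption here, and the numbers are invented.

```python
import math


def pearson_residuals(observed, fitted):
    """(observed - fitted) / sqrt(fitted): a Poisson model implies variance = mean."""
    return [(y - mu) / math.sqrt(mu) for y, mu in zip(observed, fitted)]


def dispersion(observed, fitted, n_params=2):
    """Sum of squared Pearson residuals over residual degrees of freedom.

    Values well above 1 suggest overdispersion relative to the Poisson assumption.
    """
    residuals = pearson_residuals(observed, fitted)
    return sum(r * r for r in residuals) / (len(observed) - n_params)


# Hypothetical observed counts and model-fitted means for three counties
observed = [4, 9, 16]
fitted = [4.0, 4.0, 16.0]

residuals = pearson_residuals(observed, fitted)
print(residuals, dispersion(observed, fitted))
```

Mapping residuals like these by county shows where the model under- or over-predicts, which is what a standardized residual map is meant to reveal.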

The technical limitations of my analyses were mainly due to extensive missing cancer data for the state of Texas. This missingness was due to data suppression by the CDC’s National Vital Statistics System in order to maintain confidentiality of cancer cases. I did not have any obvious software issues besides difficulty with the Calculate Field tool in ArcGIS Pro, which would consistently fail when I used more complicated formulas for data transformations. I also had some problems when joining Excel tables to attribute tables: ArcGIS Pro would not show symbology for the newly joined data until I exported to a geodatabase and restarted the program.

 

7. Your learning: what did you learn about software (a) Arc-Info, (b) Modelbuilder and/or GIS programming in Python, (c) R, (d) other?

I believe I greatly improved my skills in ArcGIS this term, especially in wrangling and cleaning data to convert it to formats that work well for visualization. I also feel I became much better at using key plotting packages in R, such as ggplot2.

 

8. What did you learn about statistics, including (a) hotspot, (b) spatial autocorrelation (including correlogram, wavelet, Fourier transform/spectral analysis), (c) regression (OLS, GWR, regression trees, boosted regression trees), (d) multivariate methods (e.g., PCA), and (e) or other techniques?

I believe the major statistical and spatial skills I improved in this course include PCA, neighborhood analysis, Poisson regression, and (especially) regression diagnostics. I had not used PCA in R before this term, and I now feel very comfortable using it going forward in my research on rural health. The many assumptions I had to check for the Poisson regression in exercise 3 improved my ability to run diagnostics in R and create intuitive assumption plots.