My spatial problem deals mainly with determining what scale best fits both birth registry point data in Texas (4.7 million births from 1996-2002) and available spatial resolution of MODIS aerosol optical thickness data. A smaller cell size will more accurately determine ambient particulate matter exposure levels, but may leave too many cells with 0 births at different spatial scales. Increases in the cell size will allow a better coverage of the state, but may limit the spatial statistical relationships of low birth weight rates and accuracy of ambient air pollution exposures. A model will need to be created to combine ground-based air monitor exposure levels and satellite data to accurately determine rural particulate matter exposures.
A temporal problem deals with creating a model that will determine ambient air pollution exposure levels in each cell during the known susceptibility windows of all 4.7 million pregnancies. An analysis will need to be done to determine the variability of particulate matter levels on multiple time scales and incorporate the best fit with pregnancy susceptibility windows.
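As a rough illustration of the window idea, the sketch below averages daily particulate matter values over three windows of a single pregnancy. It assumes trimesters can be approximated as 13-week blocks counted from an estimated conception date, and that `daily_pm` is a hypothetical dictionary of daily values for the cell the mother lives in. The actual susceptibility windows and data layout may well differ.

```python
from datetime import date, timedelta

def window_mean(daily_pm, start, end):
    """Mean of daily PM values for dates in [start, end); None if no data."""
    values = [v for d, v in daily_pm.items() if start <= d < end]
    return sum(values) / len(values) if values else None

def trimester_exposures(conception, birth, daily_pm):
    """Average exposure per trimester for one pregnancy.

    Assumption: trimesters are 13-week blocks from the conception date,
    and the third trimester runs until the birth date.
    """
    t1_end = conception + timedelta(weeks=13)
    t2_end = conception + timedelta(weeks=26)
    return {
        "trimester_1": window_mean(daily_pm, conception, min(t1_end, birth)),
        "trimester_2": window_mean(daily_pm, t1_end, min(t2_end, birth)),
        "trimester_3": window_mean(daily_pm, t2_end, birth),
    }
```

A window with no observations returns `None` rather than zero, so missing satellite or monitor days are not mistaken for clean air.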
A combination of the spatial analysis and temporal analysis will incorporate both time lags and spatial clustering. This aspect of the project should be relatively straightforward. The goal of this section is to determine 1) whether LBW births are clustered in space and time, or 2) whether individual emitters (from the EPA TRI data set) are spatially and temporally correlated with LBW.
Below are some examples of different cell size and temporal scales.
1. 2008-2009 LBW hot spot analysis based on Texas census tracts
2. A hot spot analysis of 2008-2009 LBW rates using 0.1×0.1 degree (roughly 10km x 10km) grids
3. A hot spot analysis of 1996-2009 LBW rates using 1×1 degree (roughly 100km x 100km) grids
Given the nature of the problem, as you stated above, it will be very challenging to match up the spatial scale of the birth data with the spatial scale of the MODIS particulate matter data. Have you thought about using a spatial multilevel modeling approach, wherein adjacent (or even distance-based) cells can inform the cells with zero births? Spatial smoothing, or even non-spatial smoothing, should be useful in this regard. Such an approach could avoid the whole question of “what is the proper scale?” altogether.
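To make the smoothing idea concrete, here is a minimal sketch of a simple spatial rate smoother. It is not a full multilevel model: it just pools births and LBW counts over all cells within a chosen radius, so a cell with zero births borrows information from its neighbours. The field names (`x_km`, `y_km`, `births`, `lbw`) and the use of projected kilometre coordinates are assumptions made for illustration.

```python
import math

def smoothed_rates(cells, radius_km):
    """Pooled LBW rate for each cell, using all cells within radius_km.

    cells: list of dicts with hypothetical keys x_km, y_km, births, lbw.
    Returns one rate per cell, or None where the whole neighbourhood
    contains no births.
    """
    rates = []
    for cell in cells:
        births = 0
        lbw = 0
        for other in cells:
            dist = math.hypot(cell["x_km"] - other["x_km"],
                              cell["y_km"] - other["y_km"])
            if dist <= radius_km:  # includes the cell itself (distance 0)
                births += other["births"]
                lbw += other["lbw"]
        rates.append(lbw / births if births else None)
    return rates
```

A larger radius gives more stable rates but blurs local differences, which is the same trade-off as choosing a larger cell size.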
Another potential approach is to resolve the 0.1×0.1 degree data to reflect census-tract level estimates. It may be possible to average the grid-specific pollutant concentrations over a census tract. For instance, if a census tract falls entirely within one grid cell, then the tract is assigned that cell's value. For cases in which multiple grid cells overlap a single census tract, it could be possible to weight each cell by the proportion of the tract it covers and take the weighted average of those cells as the tract-level concentration. This approach, also, could avoid the “what is the proper scale?” question.
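A minimal sketch of that area-weighted averaging follows. To keep it self-contained it treats both the tract and the grid cells as axis-aligned boxes `(xmin, ymin, xmax, ymax)`, which is an assumption: real census tracts are irregular polygons, and a GIS overlay would supply the actual overlap areas.

```python
def overlap_area(a, b):
    """Area shared by two axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)

def tract_concentration(tract_box, grid_cells):
    """Area-weighted mean concentration for one tract.

    grid_cells: list of (box, concentration) pairs.
    Returns None if no grid cell overlaps the tract.
    """
    weighted_sum = 0.0
    total_area = 0.0
    for box, concentration in grid_cells:
        area = overlap_area(tract_box, box)
        weighted_sum += area * concentration
        total_area += area
    return weighted_sum / total_area if total_area > 0 else None
```

A tract lying wholly inside one cell simply receives that cell's value, and a tract split evenly between two cells receives their plain average.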
First, a question: You mentioned in your presentation that urban areas have lower frequencies of LBW. I’m not sure if I misheard, but it seems to go against the notion that urban areas have the highest levels of air pollution.
Eric has some great ideas above, especially in terms of using the available data to interpolate a surface and make the data set more smooth. You’d have several options to choose from including an inverse distance weight as well as a more intensive kriging method.
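For reference, inverse distance weighting is simple enough to sketch in a few lines. The version below estimates a value at one location from a list of sample points, assuming planar coordinates and a power of 2. Both are illustrative choices, and the function name is hypothetical. Kriging would need a fitted variogram and is not shown.

```python
import math

def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    samples: list of (sx, sy, value) tuples, e.g. monitor locations
    and their measured concentrations.
    """
    if not samples:
        raise ValueError("at least one sample point is required")
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        dist = math.hypot(x - sx, y - sy)
        if dist == 0.0:
            return value  # exactly on a sample point: use it directly
        weight = 1.0 / dist ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator
```

Raising `power` makes nearby monitors dominate the estimate, while lowering it produces a smoother surface.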
I am not very familiar with the spatial analysis tools available, but from what I’ve seen in your presentation and others, you could possibly use hot spot analysis again for your temporal question if you decided to use specific cell sizes. I was thinking you could create 9-10 month moving-average hot spot windows starting with the 1996 data, which you could then compare using another tool within ArcGIS (which I’m sure exists) that could give you the rates of change between locations as well as trends over the 6 years. Let me know if this makes sense. Good luck!
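Here is roughly what I mean by the moving windows, as a small sketch for a single cell. It assumes you already have monthly counts of births and LBW births for that cell (the list inputs are hypothetical) and pools them over a 9-month window that slides forward one month at a time. Each window's rates could then feed a separate hot spot run.

```python
def rolling_lbw_rate(monthly_births, monthly_lbw, window=9):
    """LBW rate pooled over each consecutive `window`-month span.

    Both inputs are equal-length lists of monthly counts for one cell.
    Returns one rate per window position (None if a window has no births).
    """
    rates = []
    for end in range(window, len(monthly_births) + 1):
        births = sum(monthly_births[end - window:end])
        lbw = sum(monthly_lbw[end - window:end])
        rates.append(lbw / births if births else None)
    return rates
```

Pooling counts before dividing, rather than averaging monthly rates, keeps months with very few births from dominating a window.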
I have a few questions first.
Do you have air pollution data at the same temporal scale as the birth data, i.e., is it for 1996-2002? If so, I also think it would be neat to do a time series, just to see how (or if) the hot spots are moving around the state.
I am also confused by the fact that you seem to see more LBW outside of urban areas. Could this be related to, or correlated with, any type of industry? Say cattle ranches? Are you able to look at the source of the air pollution?
Have you tried any sort of map algebra, mosaic, or overlay process to see if you can combine layers and get a “compromise” spatial scale? I think this is similar to, or another way of doing, what Eric suggested in his second paragraph.
You could look into doing a Principal Components Analysis on the time series data to determine the significant temporal clusters of LBW and then use those time periods in your analysis.
I think it will be important to match up the spatial scales for the LBW and birth data. I second Eric’s suggestion of resolving the 0.1 degree x 0.1 degree data to reflect the scale of the census tract. You could look at the average value per census tract as well as minimum/maximum values.
If you wanted to determine the appropriate spatial scale of analysis for the LBW data, you could try using the Incremental Spatial Autocorrelation tool (Spatial Statistics Tools/Analyzing Patterns) to calculate the significant distances at which clustering is most pronounced.
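The idea behind that tool can be sketched outside ArcGIS as well. The code below computes global Moran's I with simple binary distance-band weights at several distance thresholds. It is only a sketch under those assumptions: the ArcGIS tool additionally reports z-scores and looks for peaks in them, which this example does not attempt, and the function names are hypothetical.

```python
import math

def morans_i(points, values, threshold):
    """Global Moran's I with binary weights (1 if 0 < distance <= threshold).

    points: list of (x, y) cell centroids; values: LBW rate per cell.
    Returns None if no pair is within the threshold or values are constant.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.hypot(points[i][0] - points[j][0],
                              points[i][1] - points[j][1])
            if dist <= threshold:
                cross += dev[i] * dev[j]
                weight_sum += 1.0
    variance_sum = sum(d * d for d in dev)
    if weight_sum == 0 or variance_sum == 0:
        return None
    return (n / weight_sum) * (cross / variance_sum)

def incremental_morans_i(points, values, thresholds):
    """Moran's I evaluated at each distance threshold."""
    return {t: morans_i(points, values, t) for t in thresholds}
```

Positive values indicate that similar rates sit near each other at that distance, and negative values indicate alternation. The double loop is quadratic in the number of cells, so a real run over a statewide grid would want a spatial index.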