The map below shows my study area and the concentrations of naphthalene at public schools in the city of Eugene. These concentrations are census tract centroid estimates derived from the US EPA’s National Air Toxics Assessment (NATA) for the year 2005. The map also shows sources of naphthalene emissions, such as major and minor arterial roads and federally regulated industrial emission sites. I used these estimates to develop a land use regression (LUR) model, fit with Ordinary Least Squares (OLS) regression, to predict naphthalene air concentrations at two air monitoring locations operated by the Lane Regional Air Protection Agency (LRAPA; depicted on the map as purple triangles). Originally, my LUR model accounted for neither temporal variation in concentrations nor spatial autocorrelation of air pollutants. My goal in this class was therefore to find a method that could account for temporal and spatial dependency in my LUR model. The following text summarizes my progress toward developing a spatiotemporal LUR model.
The graph below shows the wide month-to-month variation in naphthalene air concentrations measured by LRAPA at the Bethel and Amazon Park air monitoring sites. The flat lines represent the annual average naphthalene concentration at each of the two sites. As one approach to accounting for temporality, I calculated the average concentration between the two sites, and then calculated the ratio of a given month’s concentration to the annualized average concentration for each site. I applied this ratio to the NATA estimates for the schools to derive month-specific concentrations, and then annualized these by averaging over all months. The new annualized concentrations are displayed in the next graph below.
As the graph shows, the temporal adjustment shifted the annual concentrations at all school sites upward. These new “seasonalized” estimates were then used as data inputs to develop what I denote as a “Temporal” LUR.
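The monthly-ratio step described above can be sketched in a few lines. This is a minimal illustration in Python, not the code used for this project: the monthly values, the school estimate, and the function names `monthly_ratios` and `month_specific` are all hypothetical, and concentrations are in arbitrary units.

```python
from statistics import mean

# Hypothetical monthly monitor concentrations, January..December.
# Illustrative numbers only; these are NOT LRAPA measurements.
monitor_monthly = [0.12, 0.10, 0.08, 0.06, 0.05, 0.04,
                   0.04, 0.05, 0.07, 0.09, 0.11, 0.13]


def monthly_ratios(monthly):
    """Ratio of each month's concentration to the annual average."""
    annual = mean(monthly)
    return [c / annual for c in monthly]


def month_specific(annual_estimate, ratios):
    """Scale one annual estimate (e.g. a NATA value) into monthly values."""
    return [annual_estimate * r for r in ratios]


ratios = monthly_ratios(monitor_monthly)
school_nata = 0.05  # hypothetical annual NATA estimate for one school
school_monthly = month_specific(school_nata, ratios)
school_annualized = mean(school_monthly)
```

In this simplified form the twelve ratios average to one, so re-averaging the monthly values returns the original annual estimate. The sketch is meant only to show the scaling step; it does not reproduce the adjusted values reported here, which depend on how the two monitoring sites were combined.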
I next ran the LUR model separately in R using the temporal data and the non-temporal data. I then developed a spatial weights matrix in R, by Eugene zip codes, to use as the spatial constraint in a spatial conditional autoregressive (CAR) model. I denote the CAR model fit to the non-temporal data as the “Spatial Only” LUR model, and the CAR model fit to the temporal data as the “Spatiotemporal” LUR model. Each model was run separately to predict naphthalene concentrations at the LRAPA air monitoring locations. The color-coded table below compares the models by the percent difference between their predicted concentrations and the LRAPA observed concentrations. On an individual site basis, both the Temporal Only model and the Spatial Only model improved on the Non-Spatial/Non-Temporal model. The Spatiotemporal model improved on the Spatial Only model at both sites, and improved on the Temporal Only model at the Bethel site but not at the Amazon site.
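To make the idea of a zip-code-based spatial weights matrix concrete, here is a minimal Python sketch under one possible reading: two sites are treated as neighbors when they share a zip code. That neighbor rule, the function name `zip_adjacency`, and the assignment of sites to zip codes are assumptions for illustration; the matrix built in R for this project may have been defined differently.

```python
def zip_adjacency(zip_codes):
    """Binary, symmetric adjacency matrix: 1 if two distinct sites share a zip code."""
    n = len(zip_codes)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and zip_codes[i] == zip_codes[j]:
                w[i][j] = 1
    return w


# Hypothetical assignment of six sites to zip codes (illustrative only).
site_zips = ["97401", "97401", "97402", "97405", "97402", "97401"]
W = zip_adjacency(site_zips)

# Number of neighbors per site; a site alone in its zip code has none.
neighbor_counts = [sum(row) for row in W]
```

A symmetric binary matrix of this kind is a common input to a CAR model, where each site's value is modeled conditionally on the values of its neighbors. Sites with no neighbors, such as the fourth site here, would need special handling before fitting.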
I next averaged the percent differences over the two sites for each model to derive an aggregate percent difference measure. The graph below shows that the Spatiotemporal and Temporal Only models outperformed the other two models, with the Spatiotemporal model performing best on average. Although all models tended to under-predict concentrations on average, these results suggest that a LUR model for predicting air concentrations benefits from accounting for temporal and spatial dependence, rather than relying only on GIS variables in an Ordinary Least Squares regression.
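The comparison metric can be written out as a short Python sketch. It assumes a signed percent difference, (predicted - observed) / observed × 100, so that under-prediction shows up as a negative number. The concentrations below are made-up placeholders rather than the project's results, and the function names are hypothetical.

```python
def percent_difference(predicted, observed):
    """Signed percent difference; negative values mean under-prediction."""
    return 100.0 * (predicted - observed) / observed


def mean_percent_difference(pred_by_site, obs_by_site):
    """Average the per-site percent differences into one aggregate measure."""
    diffs = [percent_difference(pred_by_site[s], obs_by_site[s])
             for s in obs_by_site]
    return sum(diffs) / len(diffs)


# Placeholder values in arbitrary units; NOT the observed or predicted data.
observed = {"Bethel": 1.0, "Amazon": 2.0}
predictions = {
    "Non-Spatial/Non-Temporal": {"Bethel": 0.5, "Amazon": 1.0},
    "Spatiotemporal": {"Bethel": 0.8, "Amazon": 1.8},
}

summary = {model: mean_percent_difference(pred, observed)
           for model, pred in predictions.items()}
```

With only two monitoring sites, the aggregate is an average of two numbers, so a large error at one site can dominate the comparison between models.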
These results are intriguing and provide guidance for next steps. I decided to move forward and fit the temporal data with a generalized additive model (GAM) in R. The graph below displays the model fit, with standard errors around the GAM regression line. Not surprisingly, the seasonal variation in naphthalene concentrations is non-linear. This GAM could be used to develop a more sophisticated spatiotemporal LUR model capable of predicting monthly naphthalene air concentrations. Monthly predictions would be useful in environmental health epidemiology studies that examine temporal relationships between high air pollution and health effects.
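For readers unfamiliar with GAMs, the smooth term at their core is a penalized regression spline. The Python sketch below fits one such smooth of concentration against month using a simple piecewise-linear basis and a fixed ridge penalty. It is a simplified stand-in, not the R model fit above: the knots, the penalty `lam`, the monthly values, and all function names are assumptions, and it produces no standard errors. GAM software typically uses richer bases and can estimate the penalty from the data.

```python
def hinge_basis(x, knots):
    """One design-matrix row: intercept, linear term, one hinge per knot."""
    return [1.0, float(x)] + [max(0.0, x - k) for k in knots]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_smooth(xs, ys, knots, lam):
    """Penalized least squares; only the hinge coefficients are penalized."""
    rows = [hinge_basis(x, knots) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    for i in range(2, p):  # skip intercept and linear term
        xtx[i][i] += lam
    return solve(xtx, xty)


def predict(x, knots, beta):
    """Evaluate the fitted smooth at x."""
    return sum(b * v for b, v in zip(beta, hinge_basis(x, knots)))


months = list(range(1, 13))
# Hypothetical monthly concentrations (arbitrary units), NOT LRAPA data.
conc = [0.12, 0.10, 0.08, 0.06, 0.05, 0.04,
        0.04, 0.05, 0.07, 0.09, 0.11, 0.13]
knots = [3, 5, 7, 9, 11]  # assumed knot locations
beta = fit_smooth(months, conc, knots, lam=1.0)
fitted = [predict(m, knots, beta) for m in months]
```

Larger values of `lam` pull the fit toward a straight line, while smaller values let it follow the monthly pattern more closely. A seasonal curve like this one could also be constrained so that December joins smoothly back to January, which this sketch does not do.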