The map below displays my study area and the concentrations of naphthalene at public schools in the city of Eugene. These concentrations are census tract centroid estimates derived from the US EPA’s National Air Toxics Assessment (NATA) for the year 2005. The map also depicts sources of naphthalene emissions, such as major and minor arterial roads and federally regulated industrial emission sites. I used these estimates to develop a land use regression (LUR) model, fit with ordinary least squares (OLS) regression, to predict naphthalene air concentrations at two air monitoring locations operated by the Lane Regional Air Protection Agency (LRAPA; depicted on the map as purple triangles). My original LUR model accounted for neither temporal variation in concentrations nor spatial autocorrelation of air pollutants. My goal in this class was therefore to find a method that could account for temporal and spatial dependency in my LUR model. The following text summarizes my progress toward developing a spatiotemporal LUR model.

[Figure: GEO580_StudyArea]

The graph below shows the wide month-to-month variation in naphthalene air concentrations measured by LRAPA at the Bethel and Amazon Park air monitoring sites. The flat lines represent the naphthalene air concentrations averaged over the year at each of the two sites. As one approach to accounting for temporality in this project, I first calculated the average concentration across the two sites, then calculated the ratio of each month's concentration to the annualized average concentration for each site. These ratios were then applied to the NATA estimates for the schools to derive month-specific concentrations. The month-specific concentrations were in turn annualized by averaging over all months. These new annualized concentrations are displayed in the next graph below.

[Figure: seasons]

As you can see, the temporal adjustment shifted the annual concentrations at all of the school sites upward. These new “seasonalized” estimates were then used as data inputs to develop what I denote as a “Temporal” LUR.

[Figure: seas.adj]

I next ran the LUR model in R separately on the temporal data and the non-temporal data. I then developed a spatial weights matrix in R, based on Eugene zip codes, to use as the spatial constraint in a spatial conditional autoregressive (CAR) model. I denote the CAR model fit to the non-temporal data as the “Spatial Only” LUR model, and the CAR model fit to the temporal data as the “Spatiotemporal” LUR model. Each model was run separately to predict naphthalene concentrations at the LRAPA air monitoring locations. The color-coded table below compares the models by the percent difference between their predicted concentrations and the LRAPA observed concentrations. On an individual site basis, both the Temporal Only model and the Spatial Only model improved on the Non-Spatial/Non-Temporal model. The Spatiotemporal model improved on the Spatial Only model at both sites, and improved on the Temporal Only model at the Bethel site but not at the Amazon site.

[Figure: mod.compare]

I next averaged the percent differences over the two sites for each model to derive an aggregate percent difference measure. The graph below shows that the Spatiotemporal and Temporal Only models outperform the other two models, and that the Spatiotemporal model performed best on average. While all models tended to under-predict concentrations, on average, these results suggest that it is worthwhile to account for temporal and spatial structure, rather than relying on GIS variables alone in an ordinary least squares regression, when developing an LUR model to predict air concentrations.

[Figure: avg.compare]

The above results are intriguing and provide guidance for next steps. I decided to move forward and fit the temporal data with a generalized additive model (GAM) in R. The graph below displays the model fit, with standard errors around the GAM regression line. Not surprisingly, the seasonal variation in naphthalene concentrations is non-linear. This GAM could be used to develop a more sophisticated spatiotemporal LUR model that is capable of predicting monthly naphthalene air concentrations. Monthly predictions would be useful in environmental health epidemiology studies that aim to examine temporal relationships between high air pollution and health effects.

[Figure: gam]

If you read my posts from last week, you will have noticed that I had a mini-breakthrough in coming up with a method to seasonally adjust the input air pollution concentrations for my LUR model. This week I used these seasonally adjusted annual concentrations in my LUR model. I predicted naphthalene concentrations at two sites in Eugene and compared the seasonally adjusted model with the unadjusted model. The results are below (and exciting!). The table shows that, for each monitoring location, the model’s predictions are markedly closer to the observed concentrations with the seasonal adjustment (an under-prediction of 14.6% vs. 22.6% at Bethel, and 1.1% vs. 10.3% at Amazon). This is encouraging and provides preliminary evidence that modeling temporal variation will improve annual estimates of air pollution exposures.

| Monitoring Site | LRAPA Observed | OLS + Seasonal Adjustment: Estimated Concentration (% Difference) | OLS Without Seasonal Adjustment: Estimated Concentration (% Difference) |
| --- | --- | --- | --- |
| Bethel Site | 0.0789 | 0.0674 (-14.6%) | 0.06112 (-22.6%) |
| Amazon Site | 0.0533 | 0.0527 (-1.1%) | 0.0478 (-10.3%) |
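As a quick check on the table above, the percent differences can be recomputed from the reported concentrations. This is a minimal Python sketch (my analysis itself was run in R). The percent difference formula, (predicted - observed) / observed, is inferred from the signs and magnitudes in the table, and averaging the absolute values across sites is an assumption about how an aggregate measure could be formed. Recomputing from the rounded table values gives roughly -22.5% for the unadjusted Bethel estimate instead of -22.6%, presumably because the table inputs are rounded.

```python
# Concentrations copied from the table above (units as reported there).
observed = {"Bethel": 0.0789, "Amazon": 0.0533}
predicted = {
    "seasonally_adjusted": {"Bethel": 0.0674, "Amazon": 0.0527},
    "unadjusted": {"Bethel": 0.06112, "Amazon": 0.0478},
}


def percent_difference(pred, obs):
    """Signed percent difference; negative values mean under-prediction."""
    return 100.0 * (pred - obs) / obs


# Percent difference for every model and site.
pct_diff = {
    model: {site: percent_difference(value, observed[site])
            for site, value in by_site.items()}
    for model, by_site in predicted.items()
}

# Assumed aggregate: mean of the absolute percent differences across sites.
mean_abs_pct_diff = {
    model: sum(abs(d) for d in diffs.values()) / len(diffs)
    for model, diffs in pct_diff.items()
}

for model, diffs in pct_diff.items():
    for site, d in diffs.items():
        print(f"{model:>20} {site:>7}: {d:6.1f}%")
    print(f"{model:>20}    mean |% diff|: {mean_abs_pct_diff[model]:.1f}%")
```

The aggregate for the seasonally adjusted model comes out smaller than for the unadjusted model, which matches the site-by-site comparison.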

I went ahead with the seasonal adjustments I proposed at the end of my last blog post (see the “Temporal Data” post). Briefly, I calculated the ratios of the mean monthly naphthalene concentrations to the annual naphthalene concentration, using all of the air monitoring data (i.e. both sites combined). These ratios were then used as adjustment factors to simulate monthly observations for the NATA data that I have. The simulated monthly data were then averaged over a year-long period to obtain the “seasonally adjusted” annual estimates.
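The adjustment described above can be sketched in a few lines of Python (my actual work was done in R). All numbers below are made up for illustration: `obs_by_month` is not the LRAPA data, and `nata_annual` and the school names are hypothetical. One detail is worth noting. If the annual concentration is the plain mean of the twelve monthly means, the ratios average exactly 1 and the re-annualized value equals the original NATA estimate. The annual estimate only shifts when the annual mean is computed differently, for example pooled over individual samples with unequal sample counts per month, which is the case this sketch assumes.

```python
from statistics import mean

# Hypothetical pooled observations (both sites combined) by month.
# Months have unequal sample counts on purpose; this is NOT the LRAPA data.
obs_by_month = {
    1: [0.12, 0.10], 2: [0.09], 3: [0.07, 0.08], 4: [0.05, 0.06, 0.05],
    5: [0.04, 0.05, 0.04], 6: [0.03, 0.04, 0.03], 7: [0.03, 0.03, 0.04],
    8: [0.04, 0.04, 0.03], 9: [0.05, 0.06], 10: [0.07, 0.08],
    11: [0.09], 12: [0.11, 0.12],
}

# Mean concentration for each month.
monthly_mean = {m: mean(values) for m, values in obs_by_month.items()}

# Assumption: the annual mean is pooled over every individual sample.
annual_mean = mean([x for values in obs_by_month.values() for x in values])

# Adjustment factor for each month: monthly mean over annual mean.
ratios = {m: monthly_mean[m] / annual_mean for m in monthly_mean}

# Hypothetical NATA annual estimates at two school sites.
nata_annual = {"school_A": 0.050, "school_B": 0.065}

# Simulate monthly concentrations, then average them back to one year.
simulated_monthly = {
    school: {m: conc * r for m, r in ratios.items()}
    for school, conc in nata_annual.items()
}
seasonally_adjusted = {
    school: mean(list(by_month.values()))
    for school, by_month in simulated_monthly.items()
}

for school in nata_annual:
    print(school, nata_annual[school], round(seasonally_adjusted[school], 4))
```

With these made-up data the high-concentration months have fewer samples, so the pooled annual mean sits below the mean of the monthly means and the adjusted annual estimates come out higher than the inputs.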

The graph below displays the seasonally adjusted versus the unadjusted annual naphthalene concentrations. The adjustment factors caused the annual estimates to increase slightly. These seasonally adjusted estimates will now be used as the simulated input data for estimating annual concentrations with my LUR model.

[Figure: Seasonally Adjusted]

I have been busy trying to figure out how exactly to incorporate seasonal variation in naphthalene ambient air concentrations into my Land Use Regression (LUR) model. To start off, I obtained time series data of naphthalene concentrations from the Lane Regional Air Protection Agency (LRAPA), just to see whether the concentrations do indeed vary substantially over time. The graph below shows the variation in mean monthly naphthalene concentrations at two air monitoring sites in Eugene (Amazon Park and Bethel) from April 2010 through April 2011.

[Figure: Presentation1. Data: LRAPA]

While the graph above is somewhat informative, it does not describe the relationship in full. The graph below presents the individual observations by month, without averaging within months, and shows a great deal of variation both within and between months. This suggests a complex temporal pattern that would likely require a non-linear model. I therefore decided to use all of the data, rather than aggregated means, to better characterize the temporal relationship (i.e. retaining the within-month variation).

[Figure: each.obs]

Next I modeled the change in monthly log-transformed naphthalene concentrations, using month as a predictor and air monitoring location as an additional predictor. I used the log transformation because the raw concentrations were clearly not normally distributed and were approximately log-normal. The seasonal predictor was fit with a smoothing function in a generalized additive model (GAM) in order to allow for the obvious non-linear trend between month and naphthalene concentration.

The following graph displays the relationship between month and log naphthalene concentration, fit with the GAM function in R to allow for a non-linear trend. The smooth function proved to be the better fit: it matches the obvious visual trend, and month was a significant predictor only when treated as a non-linear term (in the linear model the seasonal term was not significant, again not surprisingly!). This work confirmed that the relationship between pollutant concentration and month is complex, even after adjusting for air monitoring location.
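To illustrate why a linear month term can miss a seasonal pattern that a non-linear term picks up, here is a small Python sketch. It is not the GAM I fit in R: it swaps in a harmonic (sine/cosine) regression as a simple stand-in for a smooth of month, and it uses synthetic log concentrations that I generate from a made-up seasonal curve plus a site effect. The helper names (`solve`, `fit_ols`, `r_squared`) are mine and hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(design, y, beta):
    """Share of the variance in y explained by the fitted values."""
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic data (NOT the LRAPA data): two sites, twelve months each, with a
# winter peak, a constant site offset, and a small alternating perturbation.
months, sites, log_conc = [], [], []
for site in (0, 1):
    for month in range(1, 13):
        theta = 2 * math.pi * month / 12
        months.append(month)
        sites.append(site)
        log_conc.append(-3.0 + 0.5 * math.cos(theta) + 0.3 * site
                        + 0.05 * (-1) ** month)

# Model 1: month entered as a straight-line term, plus site.
linear_design = [[1.0, float(m), float(s)] for m, s in zip(months, sites)]
# Model 2: month entered through one annual sine/cosine pair, plus site.
harmonic_design = [
    [1.0, math.sin(2 * math.pi * m / 12), math.cos(2 * math.pi * m / 12), float(s)]
    for m, s in zip(months, sites)
]

beta_linear = fit_ols(linear_design, log_conc)
beta_harmonic = fit_ols(harmonic_design, log_conc)
r2_linear = r_squared(linear_design, log_conc, beta_linear)
r2_harmonic = r_squared(harmonic_design, log_conc, beta_harmonic)

print(f"R^2 with linear month term:   {r2_linear:.3f}")
print(f"R^2 with harmonic month term: {r2_harmonic:.3f}")
```

Because a seasonal cycle rises and falls within the year, a straight line through month explains little of it, while the cyclic term recovers most of the variation. A GAM smooth plays the same role without forcing the curve to be a single sine wave.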

As far as next steps are concerned, I am thinking of using this temporal GAM modeling approach for my LUR model. I would use the historical dataset from LRAPA to simulate temporal (i.e. monthly) observations for my NATA data set. If I am able to simulate this dataset, it would allow me to include time in my LUR model (using a GAM for month), and therefore to see whether adding temporal relationships improves my predictive LUR model. I view this as a novel way of incorporating temporal variation as an explanatory variable. One approach could be to take the ratios of monthly to annual observed LRAPA naphthalene concentrations and multiply these monthly ratios by the NATA annual estimates. The simulated data would then be the inputs for fitting the LUR GAM.

[Figure: GAM]

For my spatial problem I will examine the role of spatial autocorrelation and seasonality in developing a land use regression (LUR) model. In particular I am interested in optimizing the incorporation of spatial autocorrelation and seasonality for prediction of air pollution in the City of Eugene.

For those unfamiliar with LUR, it combines GIS variables that are predictive of air pollution concentrations with actual air pollution measurements, using ordinary least squares (OLS) regression, in order to predict air pollution at unmonitored locations. The problem with a typical LUR model is that it does not account for spatial autocorrelation. Accounting for it matters because spatial data, such as air pollution measurements, are typically spatially correlated, which violates the OLS assumption of independent errors.
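The basic LUR idea can be sketched as follows, in Python for illustration. Everything here is hypothetical: the single predictor (`road_km`, kilometers of major road near each site), the concentrations, and the value used for the unmonitored location. A real LUR would normally use several GIS predictors, and I work in R.

```python
# Hypothetical monitored sites: one GIS predictor and a measured concentration.
road_km = [0.2, 0.8, 1.5, 2.1, 3.0]
conc = [0.041, 0.048, 0.055, 0.063, 0.070]


def fit_simple_ols(x, y):
    """Closed-form OLS for one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_simple_ols(road_km, conc)

# Predict at a hypothetical unmonitored location with 1.2 km of major road.
predicted_unmonitored = intercept + slope * 1.2

# Residuals at the monitored sites; OLS treats these as independent, which
# is the assumption that spatial autocorrelation calls into question.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(road_km, conc)]

print(f"intercept={intercept:.4f}, slope={slope:.4f}")
print(f"predicted concentration: {predicted_unmonitored:.4f}")
```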

This past quarter in my GEO580 course I developed an LUR that did account for spatial autocorrelation, by modeling the covariance of air pollutant concentrations between adjacent zip code areas with a spatial CAR model. For this class I wish to develop this idea further by using multiple techniques, namely geographically weighted regression (GWR), a spatial CAR model, and OLS, and comparing each model's results to actual air pollution measurements. This work will require me to use both the ArcGIS Spatial Analyst toolbox and the R statistical software.
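To make "adjacent zip code areas" concrete, the sketch below builds a row-standardized spatial weights matrix from a neighbor list and uses it to compute Moran's I, a standard summary of spatial autocorrelation. It is written in Python for illustration and is not the CAR model itself. The five areas, their adjacency, and the concentrations are all hypothetical and do not describe the real Eugene zip codes.

```python
# Hypothetical adjacency between five areas (a simple chain), NOT real zip codes.
neighbors = {
    "area_1": ["area_2"],
    "area_2": ["area_1", "area_3"],
    "area_3": ["area_2", "area_4"],
    "area_4": ["area_3", "area_5"],
    "area_5": ["area_4"],
}
# Hypothetical concentrations with a smooth gradient across the chain.
concentration = {
    "area_1": 0.04, "area_2": 0.05, "area_3": 0.06,
    "area_4": 0.07, "area_5": 0.08,
}

areas = list(neighbors)

# Row-standardized weights: each area's neighbors share a total weight of 1.
weights = {
    i: {j: (1.0 / len(neighbors[i]) if j in neighbors[i] else 0.0) for j in areas}
    for i in areas
}


def morans_i(values, w, ids):
    """Moran's I: positive when neighboring areas have similar values."""
    n = len(ids)
    mean_v = sum(values[i] for i in ids) / n
    dev = {i: values[i] - mean_v for i in ids}
    s0 = sum(w[i][j] for i in ids for j in ids)  # sum of all weights
    cross = sum(w[i][j] * dev[i] * dev[j] for i in ids for j in ids)
    total = sum(d ** 2 for d in dev.values())
    return (n / s0) * cross / total


moran = morans_i(concentration, weights, areas)
print(f"Moran's I = {moran:.3f}")
```

A clearly positive value, as with this made-up gradient, is the kind of evidence that would motivate a spatial model such as CAR over plain OLS.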

As mentioned above, I am interested in including seasonal trends in air pollutant variation to see whether this improves model estimates. To do this I propose to incorporate the ratios of seasonal to annual air pollutant concentrations.

To keep this work focused I will use data on just one air pollutant, as opposed to last quarter wherein I developed a LUR for seven different pollutants. By focusing on just one pollutant I hope to keep the work efficient and effective toward achieving my goals in this class. Ideally, this work will help to inform my dissertation proposal work.