I have been working out how to incorporate seasonal variation in ambient naphthalene concentrations into my Land Use Regression (LUR) model. To start, I obtained time series data of naphthalene concentrations from the Lane Regional Air Protection Agency (LRAPA) to see whether there is in fact a meaningful amount of temporal variation. The graph below shows mean monthly naphthalene concentrations at two air monitoring sites in Eugene (Amazon Park and Bethel) from April 2010 through April 2011.
While the graph above is somewhat informative, it does not describe the relationship in full. The graph below presents the individual observations grouped by month, without averaging, and shows a great deal of variation both within and between months. This suggests a complex temporal pattern that would likely require a non-linear model. I therefore decided to use all of the data rather than monthly means, so that the within-month variation is retained when characterizing the temporal relationship.
Next I modeled log-transformed naphthalene concentrations using month and air monitoring location as predictors. The log transformation was needed because the raw concentrations were clearly not normally distributed, whereas the log-transformed values were approximately normal (i.e., the raw data were roughly log-normal). The month term was fit with a smoothing function in a generalized additive model (GAM) to allow for the obvious non-linear trend between month and naphthalene concentration.
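For readers who prefer to see the model written out, one plausible form is sketched below. The symbols are my own shorthand and are assumptions rather than anything taken from the fitted output: $C_i$ is the naphthalene concentration for observation $i$, $\text{Site}_i$ is an indicator for the monitoring location, $s(\cdot)$ is the smooth function of month estimated by the GAM, and the error term is assumed to be Gaussian.

```latex
\log(C_i) = \beta_0 + \beta_1\,\text{Site}_i + s(\text{Month}_i) + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)
```

The linear comparison model would replace $s(\text{Month}_i)$ with a single slope term $\beta_2\,\text{Month}_i$.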
The following graph displays the relationship between month and log naphthalene concentration, fit with the GAM function in R to allow for a non-linear trend. The smooth function was the better choice, both because of the obvious visual trend and because month was a significant predictor only when modeled as a non-linear term (in the linear model, the seasonal term was not significant, which is again not surprising). This confirmed that the relationship between pollutant concentration and month is complex, even after adjusting for air monitoring location.
As for next steps, I am thinking of using this temporal GAM approach in my LUR model. I would use the historical LRAPA dataset to simulate temporal (i.e., monthly) observations for my NATA data set. If I’m able to simulate this dataset, I could include time in my LUR model (using a GAM smooth for month) and see whether adding the temporal relationship improves the model's predictions. I view this as a novel way of incorporating temporal variation as an explanatory variable. One approach would be to take the ratios of monthly to annual observed LRAPA naphthalene concentrations and multiply the NATA annual estimates by these monthly ratios. The simulated data would then serve as inputs to the LUR GAM.
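The ratio idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a final method: the function names are hypothetical, the annual mean is taken as the unweighted mean of the monthly means (so months with more samples do not dominate), and the numbers in the example are placeholders rather than real LRAPA or NATA values.

```python
from statistics import mean


def monthly_ratios(monthly_obs):
    """Return month -> (monthly mean / annual mean).

    monthly_obs maps a month number to a list of observed concentrations.
    Assumption: the annual mean is the unweighted mean of the monthly means.
    """
    monthly_means = {m: mean(values) for m, values in monthly_obs.items()}
    annual_mean = mean(list(monthly_means.values()))
    return {m: mm / annual_mean for m, mm in monthly_means.items()}


def simulate_monthly(annual_estimate, ratios):
    """Scale one annual estimate into simulated monthly values."""
    return {m: annual_estimate * r for m, r in ratios.items()}


# Placeholder observations for two months only (not real monitoring data).
example_obs = {1: [2.0, 4.0], 7: [1.0]}
example_ratios = monthly_ratios(example_obs)

# Placeholder annual estimate standing in for a single NATA value.
example_simulated = simulate_monthly(0.1, example_ratios)
```

Because the ratios are built from the unweighted mean of monthly means, they average to 1, so the simulated monthly values average back to the annual estimate they were scaled from.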