Recap:
I was interested in two questions:
1. Is there a spatial pattern to the observed error?
2. Does the scale of prediction increase or decrease error?
I started noticing that some of the points with error came from samples containing gravel. My study focuses on soft-bottom sediment, and the regional maps I am using for prediction have areas of rock and gravel masked out. Only eight of the 218 samples contained gravel. Because gravel also skews grain size measurements to the right, I treated these eight samples as outliers and removed them from the analysis.
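The outlier removal described above can be sketched as a simple filter. This is a minimal illustration, not the code used in the study: the record layout, the field name `gravel_pct`, and all values are hypothetical.

```python
# Hypothetical sample records; field names and values are illustrative only.
samples = [
    {"id": 1, "site": "A", "gravel_pct": 0.0, "grain_size": 3.8},
    {"id": 2, "site": "A", "gravel_pct": 12.5, "grain_size": 1.2},
    {"id": 3, "site": "B", "gravel_pct": 0.0, "grain_size": 4.1},
]


def drop_gravel(records, field="gravel_pct"):
    """Split records into those without gravel and those with any gravel."""
    kept = [r for r in records if r[field] == 0]
    removed = [r for r in records if r[field] > 0]
    return kept, removed


kept, removed = drop_gravel(samples)
print(len(kept), "kept;", len(removed), "removed")
```

Keeping the removed records in a separate list, rather than discarding them, makes it easy to report how many samples were dropped and to check later whether they really behave as outliers.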
The species I decided to focus on is a marine worm that had a strong response to higher values of grain size and was found at depths greater than 55 meters. Because of its strong preference for particular grain sizes and depths, I started creating plots to determine whether error was occurring within a certain range of these two variables.
Most of the error was occurring at depth, around a grain size of 4. Coloring the plots by sampling site provided some useful feedback.
This graph depicts records of species absence on the left and species presence on the right. While the species preferred deep water and high grain size, the records within this range where the species was absent were predominantly from one site, NSAF. This site also happened to be the southernmost site in the region. In addition, the shallowest location where the species was present was in the next site just north of NSAF, off the coast of Eureka, CA. So I wanted to look at the data again, isolating these two southernmost sites from the rest.
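One way to check numerically whether error concentrates at a single site is to tabulate the misclassification rate per site. The sketch below assumes each record carries a site label, an observed presence flag, and a predicted presence flag. "NSAF" is the site named above; "SITE_B" and every value are invented for illustration.

```python
from collections import Counter

# Hypothetical records: (site, observed_present, predicted_present).
records = [
    ("NSAF", False, True),
    ("NSAF", False, True),
    ("NSAF", True, True),
    ("SITE_B", True, True),
    ("SITE_B", False, False),
    ("SITE_B", True, False),
]


def errors_by_site(rows):
    """Return the fraction of misclassified records for each site."""
    totals, wrong = Counter(), Counter()
    for site, observed, predicted in rows:
        totals[site] += 1
        if observed != predicted:
            wrong[site] += 1
    return {site: wrong[site] / totals[site] for site in totals}


site_error = errors_by_site(records)
for site, rate in sorted(site_error.items()):
    print(f"{site}: {rate:.0%} of records misclassified")
```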
In this graph, the two southernmost sites are in the bottom panel and the remaining sites are in the top panel. In the southern sites, the species response to grain size appears shifted to higher values, and there appear to be more records of absence at depth. A different response to physical features in the southern range could be due to a number of factors, such as a narrower shelf (less opportunity for larval dispersal) and a different character to the silt and sand particles, since the rivers along the southern region drain the Southern Cascade Mountains rather than the Coast Range further north.
Pooling all this information together, I made several changes to the model. I simplified the model by removing the explanatory variables Organic Carbon, Nitrogen, and Percent Silt/Sand. I added a latitude variable and categorized it into north and south. Because the model was now simpler, I was able to add more categories to both the depth and grain size variables to capture more of the relationships between them and the species response.
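The categorization step might look something like the following sketch. All break points, labels, and the latitude split are placeholders chosen for illustration, not the bins used in the actual model; only the 55 m figure echoes the depth mentioned earlier.

```python
from bisect import bisect_right

# Assumed break points and labels; the real model's categories are not given.
DEPTH_BREAKS = [55, 80, 110]  # meters
DEPTH_LABELS = ["<55", "55-80", "80-110", ">=110"]
LAT_SPLIT = 41.5  # hypothetical north/south boundary, decimal degrees


def categorize(value, breaks, labels):
    """Map a continuous value to the label of the bin it falls into."""
    return labels[bisect_right(breaks, value)]


def region(latitude, split=LAT_SPLIT):
    """Collapse latitude into a two-level north/south variable."""
    return "north" if latitude >= split else "south"


print(categorize(62.0, DEPTH_BREAKS, DEPTH_LABELS), region(40.8))
```

Adding a category only means adding one break point and one label, which is why a model with fewer explanatory variables leaves room for finer bins on the variables that remain.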
The final model.
Model performance improved.
Original Model: 11.0% error
Revised Model: 6.9% error
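To put the two error figures side by side, the improvement can be expressed both as an absolute drop in percentage points and as a relative reduction. The snippet below only does arithmetic on the two reported numbers; it makes no assumption about how the error itself was calculated.

```python
original_error = 11.0  # percent, as reported above
revised_error = 6.9    # percent, as reported above

# Absolute change in percentage points.
absolute_drop = original_error - revised_error

# Relative reduction compared with the original model's error.
relative_drop = absolute_drop / original_error

print(f"Absolute drop: {absolute_drop:.1f} percentage points")
print(f"Relative reduction: {relative_drop:.0%}")
```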
Original Model on Left, Revised Model on Right:
I also calculated the difference between the two maps. On the left is a map depicting the change in habitat suitability. On the right is a map of regions where there was a shift in prediction (e.g., absent to present).
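The two comparison maps can be thought of as cell-by-cell operations on a pair of suitability grids. The sketch below uses small nested lists in place of real rasters, `None` for masked cells, and an assumed cutoff of 0.5 for calling a cell "present". None of these choices come from the actual analysis.

```python
THRESHOLD = 0.5  # assumed suitability cutoff for predicting presence

# Tiny hypothetical grids of habitat suitability; None marks masked cells.
original = [[0.2, 0.6], [0.8, None]]
revised = [[0.7, 0.5], [0.3, None]]


def suitability_change(before, after):
    """Cell-wise difference (after minus before), preserving masked cells."""
    return [
        [None if a is None or b is None else b - a for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(before, after)
    ]


def prediction_shift(before, after, threshold=THRESHOLD):
    """Label each cell by whether its predicted class changed."""
    def label(a, b):
        if a is None or b is None:
            return "masked"
        was_present, now_present = a >= threshold, b >= threshold
        if was_present == now_present:
            return "no change"
        return "absent to present" if now_present else "present to absent"

    return [
        [label(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(before, after)
    ]


change = suitability_change(original, revised)
shift = prediction_shift(original, revised)
print(shift)
```

The first function corresponds to the change-in-suitability map and the second to the shift-in-prediction map. A cell can change noticeably in suitability without crossing the cutoff, which is why the two maps can look different.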
Next step: Does the scale of prediction increase or decrease error?