My goal over the last few weeks has been to determine the relationship between red-tailed hawk residency and environmental variables. Before I could do so, though, I needed to address some data quality issues. Specifically, I realized that the 0 values in my residency raster (the ratio of the number of days red-tailed hawks were observed to the number of days any bird species was observed) represented locations where no hawks were seen. Since only the locations where hawks were observed are relevant to my analyses, I removed all records with a ratio of 0.
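
Below is a minimal numpy sketch of the ratio and the zero filter. It assumes the per-cell counts of days with a hawk observation and days with any bird observation are already available as arrays. It also assumes the denominator counts days on which any bird was observed, so a ratio of 1 means a hawk was seen on every such day. The function names are hypothetical and are not part of ArcGIS.

```python
import numpy as np


def residency_ratio(hawk_days, any_bird_days):
    """Per-cell ratio of days with a red-tailed hawk observation to days with
    any bird observation (assumed denominator). Cells with no bird
    observations at all get NaN instead of a division by zero."""
    hawk_days = np.asarray(hawk_days, dtype=float)
    any_bird_days = np.asarray(any_bird_days, dtype=float)
    ratio = np.full(hawk_days.shape, np.nan)
    observed = any_bird_days > 0
    ratio[observed] = hawk_days[observed] / any_bird_days[observed]
    return ratio


def drop_zero_residency(ratio):
    """Keep only cells where hawks were seen at least once (ratio > 0).
    NaN compares as False, so cells with no observations are dropped too."""
    ratio = np.asarray(ratio, dtype=float)
    return ratio[ratio > 0]


# Four hypothetical cells: no hawks, some hawks, hawks every day, no birds at all.
ratio = residency_ratio([0, 3, 5, 0], [10, 6, 5, 0])
kept = drop_zero_residency(ratio)
```

In this toy example, only the two cells with at least one hawk day survive the filter.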

Removing these 0 values cut the data roughly in half, and I realized that they had probably had a significant influence on the hotspot analyses I ran in the first few weeks of the class. I re-ran the hotspot analysis, and as expected, the hotspots were much more finely articulated. To see what influence other portions of the data might have on the hotspot analysis, I also tried iterations with ratios < 100% (excluding cells where a hawk was seen on every day that any bird was seen), > 0% and < 100%, and > 0% and < 50%. The latter two produced nearly identical hotspot maps, so it seemed appropriate to use all data with ratios > 0% and < 100%.

Hotspot analyses, from upper right, clockwise: all data, < 100%, > 0% and < 100%, and > 0%.
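
The Hotspot Analysis tool is based on the Getis-Ord Gi* statistic. For reference, that statistic can be sketched in numpy for a fixed distance band with binary weights. This is a minimal sketch under those assumptions, not the ArcGIS implementation, and the function names are hypothetical. It builds a full pairwise distance matrix, so it only suits small point sets. The second function shows how restricting the data to a ratio range, as in the iterations above, feeds into the statistic.

```python
import numpy as np


def getis_ord_gi_star(coords, values, distance_band):
    """Gi* z-scores using binary fixed-distance-band weights (self included)."""
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(values, dtype=float)
    n = x.size
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))       # n x n Euclidean distances
    w = (dist <= distance_band).astype(float)      # diagonal is 1, which makes this Gi* (not Gi)
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)      # global standard deviation
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ x - x_bar * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    # Large positive = cluster of high values (hot spot); large negative = cold spot.
    return numerator / denominator


def gi_star_for_range(coords, ratio, low, high, distance_band):
    """Gi* computed only on records with low < ratio < high
    (e.g. low=0, high=1 for ratios strictly between 0% and 100%)."""
    coords = np.asarray(coords, dtype=float)
    ratio = np.asarray(ratio, dtype=float)
    keep = (ratio > low) & (ratio < high)
    return getis_ord_gi_star(coords[keep], ratio[keep], distance_band)
```

Because the global mean and standard deviation are recomputed from whichever records are kept, dropping a large block of zeros changes every z-score. This is one way to see why the hotspots sharpened after filtering.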

Next, I prepared my environmental variable data. For my regression model, I used eight variables:

  • Population – value of the containing census tract, from US Census data
  • Average precipitation – value of the cell, from 2014 PRISM data
  • Minimum temperature – ibid.
  • Percent open space within a 1 km radius – NLCD data reclassified (Herbaceous Upland, Grasslands/Herbaceous, Planted/Cultivated, Pasture/Hay, Row Crops, Small Grains, Fallow, and Urban/Recreational Grasses = 1; everything else = 0), then focal statistics (mean)
  • Dominant land cover within a 500 m radius – focal statistics (majority) on NLCD
  • Average percent canopy cover within a 500 m radius – focal statistics (mean)
  • Average percent impervious surface within a 500 m radius – ibid.
  • Land cover diversity within a 500 m radius – focal statistics (variety)
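
The focal-statistics steps in the list above can be approximated outside ArcGIS with scipy's `ndimage.generic_filter` and a circular footprint. This sketch rests on several assumptions:

  • The NLCD raster is a numpy array of class codes.
  • The radius is given in cells. With 30 m NLCD cells, 500 m is roughly 16.7 cells and 1 km roughly 33.3.
  • Cells beyond the raster edge are ignored.
  • The open-space class codes are a hypothetical mapping to check against the NLCD legend actually used.
  • Majority ties go to the lowest class code, which may differ from ArcGIS.

A Python callback per cell is slow on large rasters, so treat this as illustrative.

```python
import numpy as np
from scipy import ndimage

# Assumed NLCD 1992 codes for the open-space classes listed above
# (hypothetical mapping; verify against the legend of the NLCD edition used).
OPEN_SPACE_CODES = (71, 81, 82, 83, 84, 85)


def circular_footprint(radius_cells):
    """Boolean disk of cells whose centers lie within radius_cells of the center."""
    r = int(np.ceil(radius_cells))
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    return xx ** 2 + yy ** 2 <= radius_cells ** 2


def _valid(values):
    # generic_filter pads outside the raster with cval=NaN; drop those (and NoData) cells
    return values[~np.isnan(values)]


def _mean(values):
    v = _valid(values)
    return v.mean() if v.size else np.nan


def _majority(values):
    v = _valid(values)
    if v.size == 0:
        return np.nan
    classes, counts = np.unique(v, return_counts=True)
    return classes[np.argmax(counts)]  # ties -> lowest class code


def _variety(values):
    v = _valid(values)
    return float(np.unique(v).size) if v.size else np.nan


FOCAL_STATS = {"mean": _mean, "majority": _majority, "variety": _variety}


def focal(raster, radius_cells, stat):
    """Focal statistic over a circular neighborhood, ignoring off-raster cells."""
    raster = np.asarray(raster, dtype=float)
    return ndimage.generic_filter(
        raster, FOCAL_STATS[stat], footprint=circular_footprint(radius_cells),
        mode="constant", cval=np.nan)


def percent_open_space(nlcd, radius_cells):
    """Reclassify to 1 (open space) / 0 (other), then take the focal mean as a percent."""
    binary = np.isin(nlcd, OPEN_SPACE_CODES).astype(float)
    return 100.0 * focal(binary, radius_cells, "mean")
```

For example, under the 30 m assumption, `focal(nlcd, 500 / 30, "variety")` would correspond to the land cover diversity layer. Likewise, `percent_open_space(nlcd, 1000 / 30)` would correspond to the open-space layer.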

I then ran the Ordinary Least Squares (OLS) regression tool. The R-squared value was 0.214. From the report the tool produced, I concluded that the residuals were not randomly distributed.
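
For reference, the R-squared that an OLS fit reports is 1 minus the ratio of the residual sum of squares to the total sum of squares. Below is a minimal numpy sketch. It assumes the explanatory variables are already columns of an array with one row per location. The function name is hypothetical, and the sketch omits the other diagnostics in the ArcGIS report.

```python
import numpy as np


def ols_r_squared(X, y):
    """Fit y = b0 + X @ b by least squares; return coefficients, residuals, R-squared."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)   # one column per variable
    design = np.column_stack([np.ones(len(y)), X])       # prepend an intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = residuals @ residuals                        # residual sum of squares
    ss_tot = ((y - y.mean()) ** 2).sum()                  # total sum of squares
    return beta, residuals, 1.0 - ss_res / ss_tot
```

The residuals this returns are the values that a spatial autocorrelation or hotspot test would then be run on.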

To be sure, I also ran the Spatial Autocorrelation (Global Moran's I) tool on the residuals, which indicated a 1% chance that their distribution could be random. I also ran a hotspot analysis on the residuals at two scales: 1,000 ft (the minimum distance band that wouldn't produce an error) and 47,891 ft (the distance band calculated in my previous analyses). While the 1,000 ft band did not produce anything interpretable, the 47,891 ft band may point to flaws in the model design. The distinct locations where the model over-predicted and under-predicted may suggest environmental variables I should add, or ones I should modify or drop. I haven't figured out what these are yet, though.

Hotspot analyses on residuals at 1,000 ft (left) and 47,891 ft (right).
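
Global Moran's I, the statistic behind the Spatial Autocorrelation tool, can be sketched in numpy for a fixed distance band with binary weights. Note one substitution: this sketch does not compute the analytical z-score and p-value that ArcGIS reports. Instead it uses a permutation test that shuffles the values across locations. The function names and the 999-permutation default are assumptions.

```python
import numpy as np


def distance_band_weights(coords, band):
    """Binary weights: 1 if two distinct points are within `band`, else 0."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    w = (dist <= band).astype(float)
    np.fill_diagonal(w, 0.0)  # a point is not its own neighbor in Moran's I
    return w


def morans_i(values, w):
    """I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = deviations from the mean."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    return len(z) / w.sum() * (z @ w @ z) / (z @ z)


def morans_i_permutation_test(values, w, n_perm=999, seed=0):
    """Observed I and a two-sided pseudo p-value from random relabelings."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    observed = morans_i(values, w)
    expected = -1.0 / (len(values) - 1)  # E[I] under no spatial autocorrelation
    perms = np.array([morans_i(rng.permutation(values), w) for _ in range(n_perm)])
    extreme = np.sum(np.abs(perms - expected) >= abs(observed - expected))
    return observed, (extreme + 1) / (n_perm + 1)
```

A small pseudo p-value means the observed arrangement of residuals is rarely matched by random shuffles. That is the same conclusion the tool's report expresses as a low chance that the pattern is random.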

The Koenker (BP) statistic indicated that my model is heteroscedastic (i.e., the variance of its residuals is not constant, so the model does not fit high and low residency values equally well). To try to understand why, I re-ran the OLS tool on two subsets of my data: one where the residency ratio is < 5% and another where it is > 50%. Each subset contained about 3,500 records. The R-squared value was 0.29 for the > 50% subset and 0.14 for the < 5% subset. The differences between the subsets' histograms are also telling.

< 5% subset: population, average precipitation, minimum temperature, percent open space within 1 km, dominant land cover

> 50% subset: population, average precipitation, minimum temperature, percent open space within 1 km, dominant land cover

< 5% subset: average percent canopy within 500 m, average percent impervious surface within 500 m, land cover diversity

> 50% subset: average percent canopy within 500 m, average percent impervious surface within 500 m, land cover diversity

From these plots, I will try to develop other environmental variables that may be better predictors of residency.
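
The Koenker (BP) statistic can be sketched as Koenker's studentized Breusch-Pagan test, in three steps:

  • Regress the squared OLS residuals on the same explanatory variables.
  • Multiply that auxiliary regression's R-squared by the number of observations.
  • Compare the result to a chi-squared distribution with one degree of freedom per explanatory variable.

This is a minimal sketch with hypothetical names, not ArcGIS's code. Running it separately on the < 5% and > 50% subsets would parallel the comparison above.

```python
import numpy as np
from scipy import stats


def _ols_fit(X, y):
    """Residuals and R-squared of y regressed on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    centered = y - y.mean()
    return resid, 1.0 - (resid @ resid) / (centered @ centered)


def koenker_bp(X, y):
    """Koenker's studentized Breusch-Pagan statistic (n * R^2 of the
    auxiliary regression) and its chi-squared p-value."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    resid, _ = _ols_fit(X, y)
    _, aux_r2 = _ols_fit(X, resid ** 2)   # auxiliary regression on squared residuals
    stat = len(y) * aux_r2
    return stat, stats.chi2.sf(stat, df=X.shape[1])
```

Simulated data whose noise grows with an explanatory variable should give a large statistic and a small p-value. That pattern is what a significant Koenker result flags.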