1 Research Question
Counties reacted differently to the Great Recession (officially December 2007 to June 2009). Economic resilience is defined as the regional capacity “to absorb and resist shocks as well as to recover from them” (Han and Goetz, 2015). It has two dimensions: resistance and recovery. This research focuses on resistance.
The research question is: what factors are associated with resistance to the 2008 recession at the county level in the U.S.? Resistance is measured with the variable drop developed by Han and Goetz (2015). Strictly speaking, drop measures vulnerability: it is the amount of impulse that a county experiences from a shock, expressed as the percentage deviation of actual employment from expected employment during the Great Recession.
Figure 1 Regional economic change from a major shock and concepts of drop and rebound
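The drop measure described above can be sketched as follows. This is an illustration only: the function name `drop_measure`, the sign convention (positive values mean employment fell short of the expected path) and the example figures are assumptions, not the exact procedure of Han and Goetz (2015).

```python
def drop_measure(expected_employment: float, actual_employment: float) -> float:
    """Share by which actual employment falls short of expected employment.

    Assumed convention: positive values mean a shortfall (more vulnerable),
    negative values mean employment exceeded the expected path.
    """
    if expected_employment <= 0:
        raise ValueError("expected employment must be positive")
    return (expected_employment - actual_employment) / expected_employment


# Hypothetical county: 10,000 jobs expected, 8,500 observed.
example_drop = drop_measure(10_000, 8_500)
print(f"drop = {example_drop:.3f}")
```

Under this convention a negative drop is possible when a county outperforms its expected path, which is consistent with the negative minimum reported for drop in Table 1.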
Rising income inequality is considered one structural cause of the crisis (Rajan, 2012), and a rising trend in income inequality is observed in Figure 2. Pre-recession income inequality is therefore hypothesized to increase drop (reciprocal of resistance).
Figure 2 Income inequality in the U.S., 1967-2014 Source: Data from U.S. Census Bureau 2014a, 2014b.
Four of the income inequality measures and the income distribution indicator have two values for 2013. A portion of the 2014 CPS ASEC sample, about 30,000 of the 98,000 addresses, received redesigned questions on income and health insurance coverage; the value based on this portion is denoted 2013a. The remaining 68,000 addresses received income questions similar to those used in the 2013 CPS ASEC; the value based on this portion is labeled 2013b (U.S. Census Bureau, 2016a, 2016b).
Spatial autocorrelation is hypothesized to play a role in this relationship. The effect of income inequality or of drop may not be confined to a single county but may spill over to nearby counties and attenuate with distance. Drop in a county might therefore be affected by its own characteristics as well as by those of surrounding counties, which motivates the spatial lag and spatial error models.
2 Description of the dataset:
Income inequality: the Gini coefficient
Income distribution: poverty rate and the share of aggregate income held by households earning $200,000 or more
Control variables: population growth rate from 2001-2005, % Black or African American (2000), % Hispanic or Latino (2000)
Capital Stock variables:
Human Capital: % population with Bachelor’s degree or higher, age group (20-29, 30-49), female civilian labor force participation rate (2000)
Natural Capital: natural amenity scale (1999)
Social Capital: social capital index (2005)
Built Capital: median housing value (2000)
Financial Capital: share of dividends, interest and rent (2000)
Economic structure:
Employment shares of 20 two-digit NAICS industries (manufacturing, construction, etc.), with other services (except public administration) omitted
Table 1 Summary Statistics | |||||||
Variables | Obs | Mean | Std. Dev. | Min | Max | Description | Data source |
Drop, rebound and resilience | |||||||
drop | 2,839 | 0.186 | 0.116 | -0.236 | 0.895 | Wu estimates based on 2003-2014 BLS QCEW | |
Income Distribution | |||||||
Gini coefficient | 3,057 | 0.434 | 0.037 | 0.333 | 0.605 | Census 2000 P052 | |
Poverty rate | 3,055 | 0.142 | 0.065 | 0.000 | 0.569 | Census 2000 P087 | |
% Aggregate income held by HH earning 200K or more | 3,141 | 0.091 | 0.050 | 0.000 | 0.456 | Census 2000 P054 | |
Control Variables | |||||||
Population growth rate 2001 -2005 | 3,055 | 0.020 | 0.056 | -0.203 | 0.428 | BEA 2001-2005 CA5N | |
%Black or African American | 3,055 | 0.087 | 0.145 | 0.000 | 0.865 | Census 2000 SF1 QT-P3 | |
%Hispanic or Latino | 3,055 | 0.063 | 0.121 | 0.001 | 0.975 | ||
Capital stocks | |||||||
% persons with Bachelor’s degree or higher | 3,055 | 0.164 | 0.076 | 0.049 | 0.605 | Human Capital | Census 2000 SF3 P037 |
%Total: 20 to 29 years | 3,055 | 0.118 | 0.033 | 0.030 | 0.346 | Census 2000 P012 | |
%Total: 30 to 49 years | 3,055 | 0.288 | 0.026 | 0.158 | 0.420 | ||
%Female Civilian Labor Force Participation | 3,055 | 0.547 | 0.065 | 0.266 | 0.809 | Census 2000 SF3 QT-P24 | |
Natural amenity scale | 3,055 | 0.056 | 2.296 | -6.400 | 11.170 | Natural capital | 1999 USDA- ERS Natural Amenity Index |
Social capital index 2005 | 3,055 | -0.004 | 1.390 | -3.904 | 14.379 | Social capital | Rupasingha, Goetz and Freshwater, 2006 |
Median value All owner-occupied housing units | 3,055 | 80495.190 | 41744.080 | 12500 | 583500 | Built capital | Census 2000 H085 |
Share of dividend, interest and rent | 3,110 | 0.188 | 0.053 | 0.059 | 0.561 | Financial capital | 2000 BEA CA5 |
Economic Structure | |||||||
Farm employment | 3,107 | 0.090 | 0.087 | 0 | 0.627 | BEA 2001 CA25N | |
Forestry, fishing, and related activities | 3,107 | 0.006 | 0.016 | 0 | 0.232 | ||
Mining, quarrying, and oil and gas extraction | 3,107 | 0.010 | 0.033 | 0 | 0.839 | ||
Utilities | 3,107 | 0.002 | 0.006 | 0 | 0.200 | ||
Construction | 3,107 | 0.058 | 0.032 | 0 | 0.260 | ||
Manufacturing | 3,107 | 0.109 | 0.089 | 0 | 0.558 | ||
Wholesale trade | 3,107 | 0.023 | 0.019 | 0 | 0.239 | ||
Retail trade | 3,107 | 0.109 | 0.030 | 0 | 0.372 | ||
Transportation and warehousing | 3,107 | 0.021 | 0.023 | 0 | 0.262 | ||
Information | 3,107 | 0.011 | 0.010 | 0 | 0.137 | ||
Finance and insurance | 3,107 | 0.027 | 0.017 | 0 | 0.201 | ||
Real estate and rental and leasing | 3,107 | 0.022 | 0.016 | 0 | 0.135 | ||
Professional, scientific, and technical services | 3,107 | 0.025 | 0.028 | 0 | 0.828 | ||
Management of companies and enterprises | 3,107 | 0.003 | 0.006 | 0 | 0.147 | ||
Administrative and support and waste management and remediation services | 3,107 | 0.026 | 0.024 | 0 | 0.192 | ||
Educational services | 3,107 | 0.006 | 0.010 | 0 | 0.112 | ||
Health care and social assistance | 3,107 | 0.052 | 0.048 | 0 | 0.305 | ||
Arts, entertainment, and recreation | 3,107 | 0.012 | 0.018 | 0 | 0.726 | ||
Accommodation and food services | 3,107 | 0.048 | 0.038 | 0 | 0.386 | ||
Other services (except public administration) | 3,107 | 0.054 | 0.020 | 0 | 0.162 | ||
Government and government enterprises | 3,107 | 0.170 | 0.073 | 0.026 | 0.888 | ||
Note: ACS- American Community Survey, BEA- Bureau of Economic Analysis, BLS- Bureau of Labor Statistics, ERS-Economic Research Service, QCEW-Quarterly Census of Employment and Wages.
Analysis units: U.S. counties |
3 Hypotheses
- a) Pre-recession income inequality and other demographic, economic and industrial factors affect drop at the county level. Drop (reciprocal of resistance) is positively related to income inequality.
- b) drop (reciprocal of resistance) is spatially autocorrelated.
- c) drop (reciprocal of resistance) at the county level is related to characteristics of a county and characteristics (here focused on drop) of neighboring counties.
4 Approaches
- a) drop (reciprocal of resistance) is positively related to income inequality and related to other characteristics of a county
Ordinary least squares regression
- b) drop (reciprocal of resistance) is spatially autocorrelated.
Global Moran’s I and Anselin Local Moran’s I
- c) drop (reciprocal of resistance) depends on drop of neighboring counties.
Spatial lag model in GeoDa
5 Results
- a) OLS regression
From the table below, we can see that the Gini coefficient, the poverty rate, and the share of aggregate income held by households earning $200,000 or more are not significant predictors of drop. Pre-recession income inequality and income distribution are therefore not significantly associated with drop (reciprocal of resistance) once the other variables are controlled for.
Several other variables are significant, for example: population growth rate from 2001-2005, % Black or African American, % Hispanic or Latino, % population with Bachelor’s degree or higher, natural amenity scale, and the employment shares of manufacturing, wholesale trade, retail trade, transportation and warehousing, information, finance and insurance, educational services, health care and social assistance, accommodation and food services, and government and government enterprises.
Table 2 Full model of drop | |
Drop | Full Model |
Gini coefficient in 2000 | -0.322 |
(-1.60) | |
Poverty rate | 0.153 |
(1.69) | |
% Aggregate income held by HH earning 200K or more | 0.166 |
(1.48) | |
Population growth rate 2001 -2005 | 0.205*** |
(3.61) | |
%Black or African American | 0.0542* |
(2.33) | |
%Hispanic or Latino | -0.113*** |
(-5.16) | |
% persons with Bachelor’s degree or higher | -0.132* |
(-2.09) | |
%Total: 20 to 29 years | -0.0218 |
(-0.21) | |
%Total: 30 to 49 years | -0.0731 |
(-0.58) | |
%Female Civilian Labor Force Participation | -0.0260 |
(-0.37) | |
Natural amenity scale | 0.00728*** |
(6.13) | |
Social capital index 2005 | -0.00386 |
(-1.32) | |
Median value All owner-occupied housing units | -4.13e-08 |
(-0.47) | |
Share of dividend, interest and rent | 0.0632 |
(0.99) | |
Farm employment | -0.0183 |
(-0.29) | |
Forestry, fishing, and related activities | 0.163 |
(0.99) | |
Mining, quarrying, and oil and gas extraction | -0.0548 |
(-0.49) | |
Utilities | 0.0237 |
(0.06) | |
Construction | 0.0631 |
(0.64) | |
Manufacturing | -0.112* |
(-2.32) | |
Wholesale trade | -0.580*** |
(-4.58) | |
Retail trade | -0.620*** |
(-5.99) | |
Transportation and warehousing | -0.337*** |
(-3.47) | |
Information | -0.590** |
(-2.62) | |
Finance and insurance | -0.710*** |
(-5.04) | |
Real estate and rental and leasing | 0.203 |
(0.87) | |
Professional, scientific, and technical services | 0.0355 |
(0.15) | |
Management of companies and enterprises | -0.207 |
(-0.53) | |
Administrative and support and waste management and remediation services | -0.0719 |
(-0.59) | |
Educational services | -0.894*** |
(-4.85) | |
Health care and social assistance | -0.215*** |
(-4.26) | |
Arts, entertainment, and recreation | -0.00162 |
(-0.01) | |
Accommodation and food services | -0.244** |
(-2.92) | |
Government and government enterprises | -0.231*** |
(-4.12) | |
_cons | 0.527*** |
(5.19) | |
N | 2771 |
adj. R2 | 0.199 |
Log likelihood | 2383.06 |
AICc | -4696.13 |
Schwarz criterion | -4488.7 |
Jarque-Bera | *** non-normality |
Breusch-Pagan test | *** heteroscedasticity |
White Test | *** heteroscedasticity |
For weight Matrix Queen Contiguity 1st order | |
Moran’s I (error) | 10.5178*** |
Lagrange Multiplier (lag) | 506.9370*** |
Robust LM (lag) | 19.9202*** |
Lagrange Multiplier (error) | 1.#INF *** |
Robust LM (error) | 1.#INF *** |
Lagrange Multiplier (SARMA) | 1.#INF *** |
For weight Matrix Queen Contiguity 1st and 2nd order | |
Moran’s I (error) | 7.9411*** |
Lagrange Multiplier (lag) | 569.0604*** |
Robust LM (lag) | 18.1515*** |
Lagrange Multiplier (error) | 1.#INF *** |
Robust LM (error) | 1.#INF *** |
Lagrange Multiplier (SARMA) | 1.#INF *** |
Significant results for the Jarque-Bera, Breusch-Pagan and White tests indicate non-normal errors and heteroscedasticity. The value of Moran’s I on the residuals shows that there is spatial autocorrelation. Five Lagrange Multiplier (LM) test statistics are also reported in the diagnostic output, and they differ along two lines. First, the standard LM tests test for the presence of spatial dependence, whereas the robust LM tests indicate whether lag or error dependence could be at work when the other may also be present. Second, each test refers to either lag or error: lag refers to a spatially lagged dependent variable, whereas error refers to a spatial autoregressive process in the error term. The first two tests (LM-Lag and Robust LM-Lag) pertain to the spatial lag model as the alternative. The next two (LM-Error and Robust LM-Error) refer to the spatial error model as the alternative. In my results, LM-Lag and LM-Error are both significant, while Robust LM-Lag is less significant (Prob 0.00002 with 1st order queen contiguity, 0.00001 with 1st and 2nd order) than Robust LM-Error (Prob 0.00000). The error statistics themselves are reported as 1.#INF, which indicates a numerical overflow rather than a usable value, so only their probabilities can be compared. The last test, LM-SARMA, relates to the higher order alternative of a model with both spatial lag and spatial error terms. This test is only included for the sake of completeness, since it is not that useful in practice.
According to Anselin (2005), I should use the spatial error model, but I chose the spatial lag model. Anselin advises: “In the rare instance that both would be highly significant, go with the model with the largest value for the test statistic. However, in this situation, some caution is needed, since there may be other sources of misspecification. One obvious action to take is to consider the results for different spatial weights and/or to change the basic (i.e., not the spatial part) specification of the model.”
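The model selection logic discussed above can be sketched as a small decision function. This is a simplified reading of the rule, not GeoDa code: the function name `choose_spatial_model`, the 0.05 threshold and the fallback when neither robust test is significant are assumptions. The LM-Lag and LM-Error probabilities are taken as effectively zero because the table only reports them as ***, and 1.#INF is represented as infinity.

```python
def choose_spatial_model(lm_lag_p, lm_error_p,
                         robust_lag_p, robust_error_p,
                         robust_lag_stat, robust_error_stat,
                         alpha=0.05):
    """Pick OLS, spatial lag or spatial error from LM test results."""
    lag_sig = lm_lag_p < alpha
    error_sig = lm_error_p < alpha
    if not lag_sig and not error_sig:
        return "OLS"
    if lag_sig and not error_sig:
        return "spatial lag"
    if error_sig and not lag_sig:
        return "spatial error"
    # Both standard tests are significant: consult the robust versions.
    robust_lag_sig = robust_lag_p < alpha
    robust_error_sig = robust_error_p < alpha
    if robust_lag_sig and not robust_error_sig:
        return "spatial lag"
    if robust_error_sig and not robust_lag_sig:
        return "spatial error"
    # Both (or neither) robust tests significant: take the larger statistic.
    if robust_lag_stat > robust_error_stat:
        return "spatial lag"
    return "spatial error"


# Values for 1st order queen contiguity from the diagnostics table.
choice = choose_spatial_model(
    lm_lag_p=0.0, lm_error_p=0.0,            # assumed ~0 (reported as ***)
    robust_lag_p=0.00002, robust_error_p=0.0,
    robust_lag_stat=19.9202,
    robust_error_stat=float("inf"),          # reported as 1.#INF
)
print(choice)
```

With these inputs the rule points to the spatial error model, which is why the choice of the spatial lag model deserves the caution expressed in the quotation.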
- b) Spatial autocorrelation
Both the global Moran’s I and Anselin Local Moran’s I show that drop and the Gini coefficient are spatially autocorrelated.
Figure 3 and 4 Local Moran’s I of drop and the Gini coefficient 2000.
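For readers unfamiliar with the statistic, the global Moran’s I can be sketched in a few lines. The data below are toy values for four units in a row with binary contiguity weights, not the county data, and the function name `morans_i` is hypothetical. The sketch computes only the statistic and leaves out significance testing.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weights matrix."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between all pairs of units.
    cross = sum(weights[i][j] * deviations[i] * deviations[j]
                for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in deviations)
    return (n / total_weight) * (cross / variance_sum)


# Four units in a row; each unit neighbors the adjacent ones.
line_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 2, 3, 4], line_weights)    # similar values adjacent
alternating = morans_i([1, 4, 1, 4], line_weights)  # dissimilar values adjacent
print(clustered, alternating)
```

A positive value means similar values sit next to each other (clustering), while a negative value means neighbors tend to differ.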
- c) Spatial lag model
To deal with the spatial autocorrelation, I created spatial weights using queen contiguity. Contiguity-based weights are based on shared borders (or vertices) rather than distance. The queen criterion defines neighbors as units that have any point in common, including both common boundaries and common corners. Therefore, the number of neighbors of any given unit under the queen criterion is equal to or greater than that under the rook criterion, which excludes corner neighbors, i.e. counties that do not share a full boundary segment.
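The difference between the queen and rook criteria is easiest to see on a regular grid. The sketch below is an illustration on a hypothetical 3 x 3 grid of cells, not the county polygons used in the analysis, and `grid_neighbors` is a made-up helper name.

```python
def grid_neighbors(rows, cols, criterion="queen"):
    """Return {cell_id: sorted neighbor ids} for a rows x cols grid."""
    if criterion not in ("queen", "rook"):
        raise ValueError("criterion must be 'queen' or 'rook'")
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            found = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is not its own neighbor
                    if criterion == "rook" and dr != 0 and dc != 0:
                        continue  # rook ignores corner-only contacts
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        found.append(rr * cols + cc)
            neighbors[r * cols + c] = sorted(found)
    return neighbors


queen = grid_neighbors(3, 3, "queen")
rook = grid_neighbors(3, 3, "rook")
print("center cell:", len(queen[4]), "queen vs", len(rook[4]), "rook neighbors")
print("corner cell:", queen[0], "queen vs", rook[0], "rook")
```

Every cell has at least as many queen neighbors as rook neighbors, which is the property stated in the paragraph above.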
I tried three orders of queen contiguity: 1st order; 1st and 2nd order; and 1st, 2nd and 3rd order.
A county’s drop is related to drop in neighboring counties, because the spatially weighted drop is significant in all three models. The coefficients are all positive, indicating that drop in a county is positively related to drop in its neighboring counties. Hence a county surrounded by high drop counties is likely to be high in drop, while a county surrounded by low drop counties is likely to be low in drop.
Moreover, the coefficient on the weighted drop increases with the contiguity order. My explanation is that when more counties in the neighborhood are taken into account, the lag reflects whether the wider region is vulnerable or robust overall, so the coefficient is larger.
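The spatially weighted drop (W_drop) that enters the spatial lag model can be sketched as the average drop of a county’s neighbors. This assumes row-standardized weights, so that each neighbor receives an equal share; the four counties and their drop values are hypothetical, and `spatial_lag` is an illustrative helper name.

```python
def spatial_lag(values, neighbors):
    """Average of neighboring values for each unit (row-standardized lag)."""
    lagged = []
    for unit in range(len(values)):
        unit_neighbors = neighbors.get(unit, [])
        if unit_neighbors:
            total = sum(values[j] for j in unit_neighbors)
            lagged.append(total / len(unit_neighbors))
        else:
            lagged.append(0.0)  # units without neighbors get a zero lag
    return lagged


# Four hypothetical counties in a row, each bordering the adjacent ones.
county_drop = [0.10, 0.20, 0.30, 0.40]
county_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
w_drop = spatial_lag(county_drop, county_neighbors)
print([round(v, 2) for v in w_drop])
```

A positive coefficient on this variable means that a county’s own drop tends to rise with the average drop of the counties around it.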
Models | Log likelihood | AICc | Schwarz criterion | W_drop |
OLS | 2383.06 | -4696.13 | -4488.7 | |
Spatial lag 1 (1st queen contiguity order) | 2416.53 | -4761.07 | -4547.71 | 0.215*** |
Spatial lag 2 (2nd queen contiguity order) | 2426.12 | -4780.24 | -4566.88 | 0.356*** |
Spatial lag 3 (3rd queen contiguity order) | 2425.8 | -4779.59 | -4566.24 | 0.436*** |
The log likelihood increases from 2383.06 (OLS) to 2416.53 (spatial lag with 1st order neighbors), 2426.12 (spatial lag with 1st and 2nd order neighbors) and 2425.8 (spatial lag with 1st, 2nd and 3rd order neighbors). After compensating the improved fit for the added variable (the spatially lagged dependent variable), the AICc (from -4696.13 to -4761.07/-4780.24/-4779.59) and the Schwarz criterion (from -4488.7 to -4547.71/-4566.88/-4566.24) both decrease relative to OLS, again suggesting an improvement of fit for the spatial lag specifications.
Among the three spatial lag models, the one with 1st and 2nd order neighbors has the highest log likelihood and the lowest AICc and Schwarz criterion, so spatially lagged drop over 1st and 2nd order neighbors fits the data best.
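As a check on how the fit criteria penalize the extra parameter, the information criteria can be recomputed from the log likelihoods. The sketch assumes the standard formulas AIC = -2 LL + 2k and SC = -2 LL + k ln(n), with k = 35 for OLS (the 34 regressors listed in Table 2 plus the constant, counted by hand), one extra parameter for each spatial lag model, and n = 2771. The function names are illustrative.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion (uncorrected form)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def schwarz(log_likelihood, n_params, n_obs):
    """Schwarz (Bayesian) information criterion."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


N_OBS = 2771
K_OLS = 35  # assumption: 34 regressors in Table 2 plus the constant

models = {
    "OLS": (2383.06, K_OLS),
    "Spatial lag 1": (2416.53, K_OLS + 1),
    "Spatial lag 2": (2426.12, K_OLS + 1),
    "Spatial lag 3": (2425.80, K_OLS + 1),
}
for name, (log_lik, k) in models.items():
    print(f"{name}: AIC = {aic(log_lik, k):.2f}, "
          f"SC = {schwarz(log_lik, k, N_OBS):.2f}")
```

Under these assumptions the recomputed values agree with the reported ones to within rounding, which suggests that the column labeled AICc follows the uncorrected AIC formula.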
Only a limited number of diagnostics are provided with the ML estimation of spatial lag model 2 (using queen contiguity of 1st and 2nd order). The first is a Breusch-Pagan test for heteroscedasticity in the error terms. The highly significant value of 587.2494 suggests that heteroscedasticity is still a serious problem.
The second is the Likelihood Ratio Test, an alternative to the asymptotic significance test on the spatial autoregressive coefficient; it is not a test on remaining spatial autocorrelation. It is one of the three classic specification tests comparing the null model (the classic regression specification) to the alternative spatial lag model. The value of 86.1133 confirms the strong significance of the spatial autoregressive coefficient.
The three classic tests are asymptotically equivalent, but in finite samples they should follow the ordering W > LR > LM. In my example, the Wald test is 9.63145^2 = 92.8 (rounded), the LR test is 86.1133 and the LM-Lag test was 506.9370, which is not compatible with the expected order. This probably suggests that other sources of misspecification may invalidate the asymptotic properties of the ML estimates and test statistics. Given the rather poor fit of the model to begin with, the high degree of non-normality and the strong heteroscedasticity, this is not surprising. Further consideration is needed of alternative specifications, either including new explanatory variables or incorporating different spatial weights.
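The arithmetic behind this comparison can be reproduced directly. The sketch assumes that the Wald statistic is the square of the reported z-value of the spatial autoregressive coefficient and that the LR statistic is twice the difference between the log likelihoods of spatial lag model 2 and OLS; the function names are illustrative.

```python
def wald_from_z(z_value):
    """Wald statistic as the squared asymptotic z-value."""
    return z_value * z_value


def likelihood_ratio(log_lik_null, log_lik_alternative):
    """LR statistic: twice the gain in log likelihood."""
    return 2.0 * (log_lik_alternative - log_lik_null)


wald = wald_from_z(9.63145)
lr = likelihood_ratio(2383.06, 2426.12)  # OLS vs spatial lag model 2
lm_lag = 506.9370                        # LM-Lag value quoted in the text

ordering_holds = wald > lr > lm_lag
print(f"W = {wald:.1f}, LR = {lr:.2f}, LM = {lm_lag:.2f}")
print("W > LR > LM holds:", ordering_holds)
```

The Wald and LR values satisfy W > LR, but the LM-Lag statistic is far larger than both, so the expected ordering fails at the LM step.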
6 Significance
Understanding what factors affect county-level resistance could offer insights for local policies that promote regional resistance and mitigate future downturns.
My results show that pre-recession income inequality is not a significant factor in explaining drop. Drop, which refers to the percentage of employment loss during the Great Recession, is significantly explained by pre-recession population growth rate, race and ethnicity, educational attainment, natural amenity scale and the employment shares of several industries. The full model of drop shows that counties were more resistant, i.e. experienced a smaller employment deviation from the expected growth path during the Great Recession, if they had a lower population growth rate from 2001 to 2005, a lower percentage of Black or African American population, a higher percentage of Hispanic or Latino population, a higher percentage of people with a Bachelor’s degree or higher, a lower natural amenity scale, and higher employment shares in manufacturing, wholesale trade, retail trade, transportation and warehousing, information, finance and insurance, educational services, health care and social assistance, accommodation and food services, and government and government enterprises.
Further investigation is needed into why industries such as manufacturing, retail trade, information, finance and insurance, educational services, and health care and social assistance are associated with a smaller drop.
7 Learning
I learned principal components analysis (PCA) in R, which reduces the number of explanatory variables. However, I find the components hard to interpret in the GWR results. Moreover, my GWR still shows a low local R-squared (fewer than 20 counties have a local R-squared higher than 0.3), so there might be misspecification in my original model.
I also learned the spatial lag and spatial error models in GeoDa. I estimated spatial error models as well, but the results are not shown here. The spatial lag model with 1st and 2nd order queen contiguity yields a higher log likelihood than any of the spatial error models. However, in my earlier LM tests for lag and error, the error test was more significant, so the spatial error model should have given the better result. This needs more investigation. I find the tools very useful. I am also curious to find a way to test for a spatial lag in the income inequality of neighboring counties.
Moreover, I learned a lot from the tutorials and projects from my classmates. They are wonderful!
8 Statistics
Hotspot:
Pros: identify the clustering of values, show patterns, easy to apply, fits both binary and continuous values
Cons: sensitive to geographic outliers, edge effects, requires sample size > 30, univariate analysis, cannot identify outliers as Local Moran’s I can, sensitive to the spatial distribution of points
Spatial autocorrelation (Local Moran’s I):
Pros: similar to hotspot analysis, could identify outliers (high-low outliers, low-high outliers)
Cons: similar to hotspot analysis
Regression (OLS, GWR):
OLS gives a single set of coefficients, i.e. a global relationship. It is a multivariate analysis and can include many explanatory variables.
GWR builds a local regression equation for each feature in the dataset.
Cons: Maps of coefficients could only show one relationship at a time, cannot include large numbers of explanatory variables.
Multivariate methods (PCA):
Pros: can reduce the number of explanatory variables; the resulting PCs (linear combinations of the candidate variables) capture most of the variation in the dataset
Cons: it is hard to interpret the coefficients/results when more than one PC enters the regression.
Learning: The resulting principal components capture most of the variation in the explanatory variables; however, they may not best capture the variation in the dependent variable.
Reference:
Anselin, Luc. “Exploring spatial data with GeoDa™: a workbook.” Urbana 51 (2004): 61801.
Han, Yicheol, and Stephan J. Goetz. “The Economic Resilience of US Counties during the Great Recession.” The Review of Regional Studies 45, no. 2 (2015): 131.
Rajan, Raghuram. Fault lines. HarperCollins Publishers, 2012.
Rupasingha, Anil, Stephan J. Goetz, and David Freshwater. “The production of social capital in US counties.” The journal of socio-economics 35, no. 1 (2006): 83-101. Data Retrieved on March 23rd, 2016 http://aese.psu.edu/nercrd/community/social-capital-resources
U.S. Census Bureau, Current Population Survey, Annual Social and Economic Supplements. “H2 Share of aggregate income received by each fifth and top 5 percent of households all races: 1967 to 2014” and “H4 Gini ratios for households by race and Hispanic origin of householder: 1967 to 2014.” (2014a). Retrieved on May 17th, 2016 from https://www.census.gov/hhes/www/income/data/historical/inequality/
U.S. Census Bureau, Current Population Survey, Annual Social and Economic Supplements. “Table 2 Poverty status of people by family relationship, race, and Hispanic origin: 1959 to 2014.” (2014b). Retrieved on May 17th, 2016 from https://www.census.gov/hhes/www/poverty/data/historical/people.html
U.S. Census Bureau. Historical Income Tables Footnote. (2016a). Retrieved on May 17th, 2016 from http://www.census.gov/hhes/www/income/data/historical/ftnotes.html
U.S. Census Bureau. Historical Poverty Tables Footnote. (2016b). Retrieved on May 17th, 2016 from https://www.census.gov/hhes/www/poverty/histpov/footnotes.html