This is the second in our continuing series of blog posts analyzing trends in apprehensions along the southwest border. The first piece in the series can be found here. In this segment, we discuss our data sources and some of the challenges associated with measuring our quantities of interest.
SWB Apprehensions Data
The Department of Homeland Security publishes official data on the number of immigrants apprehended while attempting to illegally cross the border between ports of entry. For more than fifteen years, these figures have been released annually, reflecting sector-level data for each month of the preceding year.
In our model testing whether policy changes affect rates of illegal immigration, we include the 1-month lag of SWB sector apprehensions. This rests on two hypotheses: that those deported in one month are likely to attempt re-entry the next, and that whatever factors drive immigration but are not captured by our exogenous variables may be reflected in the previous month’s apprehension figures.
We also include several features pertaining to the weather in the border sectors, to account for any changes in immigration attempts due to high temperatures or precipitation. NOAA collects weather information from a variety of US weather stations along the Southwest Border and reports monthly average temperatures for each weather station. Unfortunately, NOAA’s data series are grouped into specific sectors along the SWB, but these sectors differ from CBP’s sectors, so we had to calculate an estimate of the weather in CBP sectors based on the weather in the overlapping NOAA sectors.
We defined a sector’s weather as the complete NOAA weather series nearest to the OBP headquarters for each sector, on the observation that headquarters are typically near the SWB, are likely placed there to facilitate enforcement activities at the border, and are therefore close to attempts at illegal entry. We located each OBP headquarters in Google Maps, calculated the distance between that location and each weather station in the NOAA data, and selected the five closest stations. From those five, we chose the nearest station with a complete data series. Using these series, we calculated average monthly temperature and total monthly precipitation for each sector.
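The station-selection step described above can be sketched roughly as follows. This is an illustration rather than the code we actually ran: the haversine (great-circle) distance, the function names, and the dictionary layout of a station record are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pick_station(hq, stations, expected_months, k=5):
    """Among the k stations closest to hq, return the id of the nearest one
    whose series covers every month in expected_months (None if none does).

    hq is a (lat, lon) tuple; each station is a dict with hypothetical keys
    "id", "lat", "lon", and "months" (the months it actually reports).
    """
    ranked = sorted(
        stations,
        key=lambda s: haversine_km(hq[0], hq[1], s["lat"], s["lon"]),
    )
    for station in ranked[:k]:
        if expected_months <= set(station["months"]):
            return station["id"]
    return None
```

Capping the search at the five closest stations means a sector can end up with no usable station; returning `None` in that case makes the gap visible instead of silently falling back to a distant one.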
Derived Features: Composite Wages & Wage Forecasts
Our approach is strongly inspired by Hanson and Spilimbergo, who compute the difference in relative wages between the United States and Mexico as a rough measure of the monetary gains a Mexican immigrant might expect upon obtaining employment in the United States. This feature is computed in several steps, which are complicated by data availability challenges.
Mexican Labor Participation
First, the shares of Mexican participation in each of seven industries (construction, finance/insurance/real estate, manufacturing, retail, services, transportation, and wholesale) in the United States are estimated using the US Census’ Public Use Microdata 1-Year Sample (PUMS). We use the same data to compute the standard errors for participation rates. PUMS data from before 2000 on the Census’ website are not readable by modern (read: my) computers. Data points for the years 1998 and 1999 are therefore imputed using a weighted linear regression against year (MX Share of Industry = β0 + β1 × year + ϵ), where the weights are the inverse of the standard errors of the estimated Mexican participation in each industry. (Errors are assumed to be uncorrelated between years.) Weighting each year’s observations is necessary because the Census data collection and reporting methodology changes, sometimes radically, between years. Consequently, the standard errors of the estimates in 2012 and 2013 are dramatically smaller than those from the beginning of the millennium. While imperfect, this imputation method does permit annual estimates for 1998 and 1999, which would otherwise be unavailable.
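For readers who want to see the mechanics, here is a minimal sketch of the weighted fit and back-cast for a single industry. The function names are hypothetical, and the closed-form weighted least squares solution for one regressor stands in for whatever statistical package one prefers. The weights follow the description above (inverse standard error); inverse variance is another common choice.

```python
def weighted_linear_fit(years, shares, std_errors):
    """Fit share = beta0 + beta1 * year by weighted least squares,
    weighting each observation by the inverse of its standard error."""
    weights = [1.0 / se for se in std_errors]
    w_sum = sum(weights)
    # Weighted means of the regressor and the response.
    x_bar = sum(w * x for w, x in zip(weights, years)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, shares)) / w_sum
    # Weighted covariance and variance terms give the slope.
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, years, shares))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, years))
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


def impute_share(year, beta0, beta1):
    """Predicted participation share for a year outside the observed range."""
    return beta0 + beta1 * year
```

Fitting on the observed years and calling `impute_share(1998, beta0, beta1)` and `impute_share(1999, beta0, beta1)` yields the two back-cast values; the fit is repeated separately for each industry.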
One shortcoming of this analysis is that the composite wage index omits agricultural wages. The PUMS data show that national Mexican participation in agriculture is consistently higher than Mexican participation in every other industry that we examined. At times, the Mexican participation rate in agriculture is more than twice the rate in the next-largest industry. However, for practical and historical reasons, agricultural wage data are much less robust and are only reported annually, despite the obvious seasonal patterns in agricultural labor. Because agriculture’s participation rate is so large relative to other industries, including agricultural wages would have the effect of “buoying” the composite wage toward the agricultural wage for a given year, suppressing the very month-to-month variation in wages that we are attempting to investigate vis-à-vis immigration flows across the SWB.
Wages in the US
Data on the national average wage in each industry are collected from the US Department of Labor and deflated by the US CPI. For each month, we take the monthly average wage in each industry and compute the composite wage as the sum of the industry wages, each weighted by the Mexican participation share in that industry, producing a single US wage index.
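As a rough illustration, the composite for a single month could be computed as below. The function name and the `base_cpi` parameter are hypothetical, and rescaling the shares so that they sum to one is an assumption of this sketch, not something the description above specifies.

```python
def composite_wage(wages, shares, cpi, base_cpi=100.0):
    """Participation-weighted real wage index for one month.

    wages:  nominal average wage by industry for the month
    shares: Mexican participation share by industry (same keys as wages)
    cpi:    consumer price index for the month, relative to base_cpi
    """
    # Deflate each nominal wage to base-period prices.
    real = {industry: wage * base_cpi / cpi for industry, wage in wages.items()}
    # Assumption: rescale the shares so the weights sum to one.
    total_share = sum(shares.values())
    return sum(shares[industry] / total_share * real[industry]
               for industry in shares)
```

With made-up wages of 10 and 20, shares of 0.25 and 0.75, and a CPI of 200 against a base of 100, the real wages are 5 and 10 and the composite is 0.25 × 5 + 0.75 × 10 = 8.75.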
Wages in Mexico
Data on Mexican wages are more difficult to pin down. For our time period, FY1998-FY2013, monthly wage data are only available for factory workers (through the Mexican statistical bureau, the National Institute of Statistics and Geography, Spanish acronym: INEGI). Other industries do not report wages on a monthly basis or report them inconsistently. For factory workers, the wages are reported in two time series, one spanning 1994 to 1998 and the other spanning 1997 to the present. The wage survey instrument changed dramatically in 1997 in an attempt to measure workers’ (rather than managers’) wages more accurately. The consequence is that the latter series appears to report a dramatic decline in wages. We believe that this sharp decline is due to the change in instrument. Comparing the two surveys over their period of overlap permits counterfactual estimation of the wages in 1994 to 1998 as if the new, revised survey had been administered. We use a simple log-log linear regression for this task.
One peculiarity of the Mexican wage time series is that in December of each year, the wage is dramatically higher than in either January or November. This is true for both the old and the new survey instruments. It’s possible that the Christmas season lifts wages briefly, only for them to return to ordinary values afterward, or that there is some other, unknown effect, but we believe it’s more plausible that the wage surveys are incidentally capturing some kind of Christmas bonus in December. Therefore, the regression model includes an indicator variable for December so that the December estimates are consistent with this pattern. Mexican wages are deflated by the Mexican CPI, as taken from INEGI.
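One plausible form of the splice regression, offered as a sketch under stated assumptions and not as our exact specification, is log(new wage) = a + b × log(old wage) + c × December + ϵ, fit on the months where the two surveys overlap. The function names below are hypothetical, and the ordinary least squares fit is done by solving the normal equations directly so that the example has no dependencies.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_loglog(old_wages, new_wages, months):
    """OLS fit of log(new) on an intercept, log(old), and a December dummy.
    All three inputs cover the overlap period; months are integers 1 to 12."""
    rows = [[1.0, math.log(old), 1.0 if month == 12 else 0.0]
            for old, month in zip(old_wages, months)]
    y = [math.log(new) for new in new_wages]
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear(xtx, xty)


def predict_new(old_wage, month, coefs):
    """Old-survey wage re-expressed on the new survey's scale."""
    a, b, c = coefs
    return math.exp(a + b * math.log(old_wage) + (c if month == 12 else 0.0))
```

Applying `predict_new` to the old-survey months before the overlap gives the counterfactual series. Note that exponentiating a log-scale prediction returns something closer to a median than a mean; the sketch makes no correction for that.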
Data on the US Dollar-Mexican Peso exchange rate are taken from the US Federal Reserve and included as a separate regressor to capture the effect of fluctuations in the value of the Mexican currency.
Catch and Release
We also include a dummy (CARP) which takes on a value of 1 for the years 2006 onward, indicating the administrative action ending “catch-and-release.” A lengthier description of CARP can be found in the first post in this series. Additionally, we cross the CARP dummy with lagged apprehensions to measure the deterrent effect of CARP, on the hypothesis that the change in enforcement policy will change the degree of autocorrelation present in our data. We believe that illegal entrants who have been apprehended, detained, and then repatriated might experience a deterrent effect that makes them less likely to reattempt illegal entry between ports of entry. Alternatively, if there is no deterrent effect, we would expect to estimate this coefficient near zero, implying that the autocorrelation is the same in CARP and non-CARP time periods.
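To make the construction concrete, here is a small sketch of how the lagged apprehensions, the CARP dummy, and their interaction might be assembled from sector-month records. The field names and the function are hypothetical. The sketch also assumes that each sector's records are consecutive months with no gaps, and it switches the dummy on by the year recorded on each row, which may not match the exact month the policy took effect.

```python
def build_features(records, carp_start_year=2006):
    """Add lag1, carp, and carp_x_lag1 to sector-month records.

    Each record is a dict with hypothetical keys "sector", "year", "month",
    and "apprehensions". The first month of each sector is dropped because
    it has no previous month to lag from.
    """
    ordered = sorted(records, key=lambda r: (r["sector"], r["year"], r["month"]))
    previous = {}  # most recent apprehension count seen for each sector
    rows = []
    for rec in ordered:
        lag = previous.get(rec["sector"])
        previous[rec["sector"]] = rec["apprehensions"]
        if lag is None:
            continue
        carp = 1 if rec["year"] >= carp_start_year else 0
        rows.append({**rec, "lag1": lag, "carp": carp, "carp_x_lag1": carp * lag})
    return rows
```

In a regression on these columns, the coefficient on `lag1` captures the autocorrelation before the policy change, and the coefficient on `carp_x_lag1` captures how much that autocorrelation shifts afterward.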
Stay tuned for the third piece in this series where we will discuss the model itself and our results.