Working Papers

Identification of a triangular random coefficient model using a correction function (R&R at Review of Economics and Statistics)

Previously, identification of triangular random coefficient models required either a restriction on the dimension of the first stage heterogeneity or independence assumptions across the different sources of heterogeneity. This note proposes a new identification strategy that relies on neither restriction. Instead, it uses conditional linear projections to construct “correction functions” that address endogeneity and identify the average partial effect. The strategy allows for both continuous and discrete instruments, although the formulation of the correction function differs depending on the setting. Finally, a simple simulation illustrates that the proposed identification strategy is valid in settings where no existing method can identify average partial effects.

Sample Selection in Linear Panel Data Models with Heterogeneous Coefficients. (Joint with Riju Joshi, R&R at Journal of Applied Econometrics) 

We propose a parametric estimation procedure for linear panel data models with sample selection and heterogeneous coefficients that are present in both the outcome model and the selection model. Our two-step estimation procedure uses control function methods to account for endogeneity from the selection process and endogeneity from correlation between the individual unobserved heterogeneity and the observed covariates. Conditional linear projections are used to establish a tractable control function approach that builds upon the original Heckman correction for sample selection. Monte Carlo simulations illustrate the finite sample properties of our estimator and demonstrate that it outperforms standard estimators. We apply the proposed approach to estimate gender differences in high-stakes, time-constrained decisions using Elo ratings data from the World Chess Federation. When addressing both sources of endogeneity, we find a much larger gender skill gap and substantial differences between the genders in strategically selecting into time-constrained matches.
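
For readers unfamiliar with the original Heckman correction mentioned above, the following is a minimal textbook sketch, not the estimator proposed in the paper. It assumes a cross-sectional design with normal errors and, to stay short, treats the selection index as known rather than estimating it with a first-stage probit. The function names (inverse_mills, ols, simulate) and the simulation design are hypothetical and chosen only for illustration.

```python
import math
import random


def inverse_mills(a):
    """Inverse Mills ratio phi(a) / Phi(a) for a standard normal."""
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-a / math.sqrt(2.0))  # erfc stays accurate in the lower tail
    return pdf / cdf


def ols(rows, y):
    """Least squares via the normal equations (X'X) b = X'y (Gauss-Jordan)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n=20000, rho=0.8, seed=0):
    """Return (naive slope, corrected slope); the true slope on x is 2."""
    rng = random.Random(seed)
    naive_rows, corrected_rows, outcomes = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)  # shifts selection only (exclusion restriction)
        v = rng.gauss(0.0, 1.0)  # selection error
        e = rng.gauss(0.0, 1.0)
        index = x + z            # selection index, ASSUMED known in this sketch
        if index + v <= 0.0:
            continue             # outcome is unobserved for this unit
        u = rho * v + math.sqrt(1.0 - rho ** 2) * e  # outcome error, correlated with v
        outcomes.append(1.0 + 2.0 * x + u)
        naive_rows.append([1.0, x])
        # Second step: add the inverse Mills ratio as a control function
        corrected_rows.append([1.0, x, inverse_mills(index)])
    naive = ols(naive_rows, outcomes)
    corrected = ols(corrected_rows, outcomes)
    return naive[1], corrected[1]


if __name__ == "__main__":
    naive_slope, corrected_slope = simulate()
    print(f"naive slope: {naive_slope:.3f}, corrected slope: {corrected_slope:.3f}")
```

In this design the selected-sample regression without the control function underestimates the slope on x, while adding the inverse Mills ratio brings the estimate close to the true value of 2. In practice the selection index would be estimated in a first step, and standard errors would need to account for that estimation.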

Addressing Sample Selection Bias for Machine Learning Methods. (Joint with Dylan Brewer, R&R at Journal of Applied Econometrics)

(Supplementary Appendix)

We study approaches for adjusting machine learning methods when the training sample differs from the prediction sample on unobserved dimensions. The machine learning literature predominantly assumes selection only on observed dimensions, and the common remedies for selection on observables are to re-weight or to control for variables that influence selection. Simulation results show that selection on unobservables increases mean squared prediction error for common machine learning algorithms. Moreover, common machine learning practices such as re-weighting or controlling for variables that influence selection into the training or testing sample often worsen sample selection bias. We suggest two control function approaches that remove the effects of selection bias before training and find that they reduce mean squared prediction error in simulations with a high degree of selection. We apply these approaches to predicting the vote share of the incumbent in gubernatorial elections using previously observed re-election bids. We find that ignoring selection on unobservables leads to substantially higher predicted vote shares for the incumbent than when the control function approach is used.

gtsheckman: Generalized Two Step Heckman Estimator.

In this article I introduce the gtsheckman command, which estimates a generalized two-step Heckman sample selection estimator adjusted for heteroskedasticity. This estimator was previously proposed in Carlson and Joshi (2022), where the presence of heteroskedasticity was motivated by a panel data setting with random coefficients. The gtsheckman command offers several advantages over the heckman, twostep command, including robust inference, a more general control function specification, and the ability to incorporate heteroskedasticity.

Stata command: gtsheckman.ado

Stata help file: gtsheckman.sthlp

Stata examples:

Stata 2022 Conference Slides 

Heckman sample selection estimators under heteroskedasticity (Joint with Wei Zhao)

This paper studies the properties of two Heckman sample selection estimators, full information maximum likelihood (FIML) and limited information maximum likelihood (LIML), under heteroskedasticity. In this case, FIML is inconsistent while LIML can be consistent in certain settings. For the LIML estimator, we provide robust asymptotic variance formulas, which are not currently provided by standard Stata commands. Since heteroskedasticity affects the performance of both estimators, this paper also offers guidance on how to properly test for heteroskedasticity. We propose a new demeaned Breusch-Pagan test to detect general heteroskedasticity in sample selection settings, as well as a test for whether LIML is consistent under heteroskedasticity. Monte Carlo simulations illustrate that both of the proposed test procedures perform well.

Behavior of Pooled and Joint Estimators in Probit Model with Random Coefficients and Serial Correlation. (Joint with Jeffrey Wooldridge and Ying Zhu, draft coming soon)

Works in Progress

Estimation of a Binary Response Dynamic Panel Data Model with Attrition (Joint with Anastasia Semykina)

Estimators for Binary Response Models with Categorical Endogenous Variables