  • 1 Introduction
    • 1.1 Introduction: Matching Estimator
  • 2 Identification
    • 2.1 Matching
    • 2.2 Identification
    • 2.3 ATT and ATE
  • 3 Estimation
    • 3.1 Estimation Methods
    • 3.2 Approach 1: Regression, or Analogue Approach
    • 3.3 Nonparametric Estimation
    • 3.4 Curse of dimensionality
    • 3.5 Parametric Estimation, or going back to linear regression
    • 3.6 Approach 2: MNearest Neighborhood Matching
    • 3.7 Approach 3: Propensity Score Matching

1 Introduction

1.1 Introduction: Matching Estimator

  • Idea: Compare individuals with the same characteristics X across treatment and control groups
  • Key assumption: Treatment is random once we control for the observed characteristics.
  • Do you remember that we already learned a similar idea before?

2 Identification

2.1 Matching

  • Let $X_i$ denote the observed characteristics:
    • age, income, education, race, etc.
  • Assumption 1: $D_i \perp (Y_{0i}, Y_{1i}) \mid X_i$
    • Conditional on $X_i$, there is no selection bias.
    • The selection-on-observables assumption / ignorability.
  • Assumption 2 (overlap): $P(D_i = 1 \mid X_i = x) \in (0, 1)$ for all $x$
    • Given any $x$, we should be able to observe people from both the control and treatment groups.
    • We call $P(D_i = 1 \mid X_i = x)$ the propensity score.

2.2 Identification

  • The unconfoundedness assumption implies that
    $$E[Y_{1i} \mid D_i = 1, X_i] = E[Y_{1i} \mid D_i = 0, X_i] = E[Y_{1i} \mid X_i]$$
    $$E[Y_{0i} \mid D_i = 1, X_i] = E[Y_{0i} \mid D_i = 0, X_i] = E[Y_{0i} \mid X_i]$$

  • The ATT for $X_i = x$ is given by
    $$\begin{aligned}
    E[Y_{1i} - Y_{0i} \mid D_i = 1, X_i] &= E[Y_{1i} \mid D_i = 1, X_i] - E[Y_{0i} \mid D_i = 1, X_i] \\
    &= E[Y_i \mid D_i = 1, X_i] - E[Y_{0i} \mid D_i = 0, X_i] \\
    &= \underbrace{E[Y_i \mid D_i = 1, X_i]}_{\text{avg. with } X_i \text{ in treatment}} - \underbrace{E[Y_i \mid D_i = 0, X_i]}_{\text{avg. with } X_i \text{ in control}}
    \end{aligned}$$

  • The components in the last line are identified (can be estimated).

  • Intuition: Comparing the outcome across control and treatment groups after conditioning on Xi

2.3 ATT and ATE

  • The ATT is given by
    $$\begin{aligned}
    ATT &= E[Y_{1i} - Y_{0i} \mid D_i = 1] = \int E[Y_{1i} - Y_{0i} \mid D_i = 1, X_i = x] \, f_{X_i}(x \mid D_i = 1) \, dx \\
    &= E[Y_i \mid D_i = 1] - \int E[Y_i \mid D_i = 0, X_i = x] \, f_{X_i}(x \mid D_i = 1) \, dx
    \end{aligned}$$

  • The ATE is
    $$\begin{aligned}
    ATE &= E[Y_{1i} - Y_{0i}] = \int E[Y_{1i} - Y_{0i} \mid X_i = x] \, f_{X_i}(x) \, dx \\
    &= \int E[Y_i \mid D_i = 1, X_i = x] \, f_{X_i}(x) \, dx - \int E[Y_i \mid D_i = 0, X_i = x] \, f_{X_i}(x) \, dx
    \end{aligned}$$
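These formulas can be taken to (simulated) data directly when $X_i$ is discrete: estimate the conditional contrasts cell by cell, then weight by $f_{X_i}(x)$ for the ATE or by $f_{X_i}(x \mid D_i = 1)$ for the ATT. A minimal sketch, where all data and variable names are illustrative:

```python
# Identify ATT/ATE by conditioning on a discrete X (simulated data).
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
X = rng.integers(0, 2, N)              # one binary characteristic
p = np.where(X == 1, 0.7, 0.3)         # treatment probability depends on X
D = rng.binomial(1, p)
Y0 = X + rng.normal(0, 1, N)           # untreated potential outcome
Y1 = Y0 + 2.0                          # true treatment effect is 2
Y = np.where(D == 1, Y1, Y0)           # observed outcome

# Conditional means E[Y | D = d, X = x], estimated cell by cell
mu = {(d, x): Y[(D == d) & (X == x)].mean() for d in (0, 1) for x in (0, 1)}

# ATE: weight the conditional contrasts by f_X(x)
ate = sum((mu[1, x] - mu[0, x]) * np.mean(X == x) for x in (0, 1))
# ATT: weight by f_X(x | D = 1) instead
att = sum((mu[1, x] - mu[0, x]) * np.mean(X[D == 1] == x) for x in (0, 1))

# The naive difference in means is biased because X drives both D and Y0
naive = Y[D == 1].mean() - Y[D == 0].mean()
print(ate, att, naive)   # ate and att near 2, naive biased upward
```

The naive comparison overstates the effect here because treated units have higher $X$ on average; conditioning on $X_i$ removes that bias.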

3 Estimation

3.1 Estimation Methods

  • We need to estimate $E[Y_i \mid D_i = 1, X_i = x]$ and $E[Y_i \mid D_i = 0, X_i = x]$
  • Several ways to implement the above idea
    1. Regression: Nonparametric and Parametric
    2. Nearest neighbor matching
    3. Propensity Score Matching

3.2 Approach 1: Regression, or Analogue Approach

  • Let $\hat{\mu}_k(x)$ be an estimator of $\mu_k(x) = E[Y_i \mid D_i = k, X_i = x]$ for $k \in \{0, 1\}$
  • The analogue estimators are
    $$\widehat{ATE} = \frac{1}{N} \sum_{i=1}^N \left[ \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) \right], \qquad \widehat{ATT} = \frac{\sum_{i=1}^N D_i \left( Y_i - \hat{\mu}_0(X_i) \right)}{\sum_{i=1}^N D_i}$$
  • How do we estimate $\mu_k(x) = E[Y_i \mid D_i = k, X_i = x]$?
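A sketch of the analogue estimators on simulated data. To isolate the second stage, I plug in oracle functions for $\hat{\mu}_0, \hat{\mu}_1$; in practice these come from a first-stage nonparametric or parametric estimator, and everything here is illustrative:

```python
# Analogue (plug-in) estimators for ATE and ATT, given mu_hat_k.
import numpy as np

rng = np.random.default_rng(1)
N = 50_000
X = rng.normal(size=N)
D = rng.binomial(1, 1 / (1 + np.exp(-X)))    # selection on X
Y = 2.0 * D + X + rng.normal(size=N)         # true effect is 2

# Stand-ins for the first stage mu_hat_k(x) = E[Y | D = k, X = x]
def mu0_hat(x):
    return x

def mu1_hat(x):
    return x + 2.0

# Plug into the analogue formulas
ate_hat = np.mean(mu1_hat(X) - mu0_hat(X))
att_hat = np.sum(D * (Y - mu0_hat(X))) / np.sum(D)
print(ate_hat, att_hat)   # both near the true effect 2
```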

3.3 Nonparametric Estimation

  • Suppose that $X_i \in \{x_1, \ldots, x_K\}$ is discrete with small $K$
    • Ex: two binary demographic characteristics (male/female, white/non-white), so $K = 4$
  • Then, a nonparametric binning estimator is
    $$\hat{\mu}_k(x) = \frac{\sum_{i=1}^N 1\{D_i = k, X_i = x\} \, Y_i}{\sum_{i=1}^N 1\{D_i = k, X_i = x\}}$$
  • Here, I do not put any parametric assumption on $\mu_k(x) = E[Y_i \mid D_i = k, X_i = x]$.
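The binning estimator is just a cell-by-cell sample average. A minimal sketch on simulated data (variable names are illustrative):

```python
# Nonparametric binning estimator of mu_k(x) for discrete X.
import numpy as np

rng = np.random.default_rng(2)
N = 20_000
X = rng.integers(0, 4, N)              # K = 4 cells (e.g. sex x race)
D = rng.binomial(1, 0.5, N)
Y = 2.0 * D + X + rng.normal(size=N)   # true effect is 2

def mu_hat(k, x):
    """Average of Y among observations with D = k and X = x."""
    cell = (D == k) & (X == x)
    return Y[cell].mean()

print(mu_hat(1, 3))   # near E[Y | D = 1, X = 3] = 2 + 3 = 5
```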

3.4 Curse of dimensionality

  • Issue: poor performance if $K$ is large, which happens quickly with many covariates.
  • With so many potential groups, there are too few observations in each group.
  • With $K$ variables, each of which takes $L$ values, there are $L^K$ possible groups (bins) in total.
  • This is known as the curse of dimensionality.
  • Relatedly, if $X_i$ is a continuous random variable, one can use kernel regression.
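For a continuous scalar $X_i$, a Nadaraya-Watson kernel estimator of $\mu_k(x)$ replaces the cell average with a kernel-weighted average. A hedged sketch with a Gaussian kernel and a hand-picked bandwidth, on simulated data:

```python
# Nadaraya-Watson kernel estimator of mu_k(x) for continuous X.
import numpy as np

rng = np.random.default_rng(3)
N = 20_000
X = rng.uniform(-2, 2, N)
D = rng.binomial(1, 0.5, N)
Y = 2.0 * D + np.sin(X) + rng.normal(0, 0.5, N)

def mu_hat(k, x, h=0.2):
    """Kernel-weighted average of Y among observations with D = k."""
    mask = D == k
    w = np.exp(-0.5 * ((X[mask] - x) / h) ** 2)   # Gaussian kernel weights
    return np.sum(w * Y[mask]) / np.sum(w)

print(mu_hat(1, 0.0))   # near E[Y | D = 1, X = 0] = 2 + sin(0) = 2
```

In practice the bandwidth $h$ should be chosen by cross-validation rather than by hand.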

3.5 Parametric Estimation, or going back to linear regression

  • If you put a parametric assumption such as
    $$E[Y_i \mid D_i = 0, X_i = x] = \beta' x, \qquad E[Y_i \mid D_i = 1, X_i = x] = \beta' x + \tau$$
    then you will have the linear model
    $$y_i = \beta' x_i + \tau D_i + \epsilon_i$$
  • You can think of the matching estimator as controlling for omitted variable bias by adding (many) covariates (control variables) $x_i$.
  • This is one reason why the matching estimator may not be preferred in empirical research.
    • Remember: controlling for those covariates is of course important. This can be combined with other empirical strategies (IV, DID, etc.).
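The parametric version reduces to OLS on the linear model above. A minimal sketch on simulated data, not a full empirical workflow:

```python
# Parametric (linear regression) version: OLS of y on D and covariates.
import numpy as np

rng = np.random.default_rng(4)
N = 10_000
x = rng.normal(size=(N, 2))                       # two covariates
d = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))   # selection on x
y = 2.0 * d + x @ np.array([1.0, -0.5]) + rng.normal(size=N)

# Design matrix: intercept, treatment dummy, covariates
Z = np.column_stack([np.ones(N), d, x])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
tau_hat = coef[1]                                 # coefficient on D
print(tau_hat)   # near the true tau = 2
```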

3.6 Approach 2: M-Nearest Neighbor Matching

  • Idea: find the counterpart in the other group that is closest to me.
  • Let $\hat{y}_i(0)$ and $\hat{y}_i(1)$ be estimators of the (hypothetical) outcomes without and with treatment:
    $$\hat{y}_i(0) = \begin{cases} y_i & \text{if } D_i = 0 \\ \frac{1}{M} \sum_{j \in L_M(i)} y_j & \text{if } D_i = 1 \end{cases}$$
  • $L_M(i)$ is the set of the $M$ individuals in the opposite group who are "close" to individual $i$
    • There are several ways to define the distance between $X_i$ and $X_j$, such as $dist(X_i, X_j) = \lVert X_i - X_j \rVert_2$
  • Need to choose (1) $M$ and (2) the measure of distance
    • R has several packages for this.
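A bare-bones version of M-nearest-neighbor matching with Euclidean distance, estimating the ATT by imputing each treated unit's untreated outcome from its closest controls. Simulated data; in real work a dedicated package (e.g. R's MatchIt) handles ties, variance, and bias correction:

```python
# M-nearest-neighbor matching on X with Euclidean distance (ATT).
import numpy as np

rng = np.random.default_rng(5)
N, M = 2_000, 3
X = rng.normal(size=(N, 2))
D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = 2.0 * D + X[:, 0] + rng.normal(0, 0.5, N)    # true effect is 2

treated = np.where(D == 1)[0]
control = np.where(D == 0)[0]

def y0_hat(i):
    """Impute unit i's untreated outcome: average Y over the M controls
    closest to X_i in Euclidean distance."""
    dist = np.linalg.norm(X[control] - X[i], axis=1)
    nearest = control[np.argsort(dist)[:M]]
    return Y[nearest].mean()

att_hat = np.mean([Y[i] - y0_hat(i) for i in treated])
print(att_hat)   # near the true effect 2
```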

3.7 Approach 3: Propensity Score Matching

  • Use the propensity score $P(D_i = 1 \mid X_i = x)$ as the distance to define who is closest to me.
  • Implementation:
    1. Estimate the propensity score function by logit or probit, using a flexible function of $X_i$.
    2. Calculate the estimated propensity score for each observation and use it to define the matched pairs.
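The two steps can be sketched as follows on simulated data. To keep the example self-contained, I fit the logit by plain gradient ascent on the log-likelihood; in practice you would use a packaged logit/probit routine:

```python
# Propensity score matching: logit first stage, then match on the score.
import numpy as np

rng = np.random.default_rng(6)
N = 2_000
X = rng.normal(size=(N, 2))
D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = 2.0 * D + X[:, 0] + rng.normal(0, 0.5, N)    # true effect is 2

# Step 1: estimate the logit P(D = 1 | X) by gradient ascent
Z = np.column_stack([np.ones(N), X])
beta = np.zeros(3)
for _ in range(500):
    p = 1 / (1 + np.exp(-Z @ beta))
    beta += 0.1 * Z.T @ (D - p) / N              # logit score (gradient)

# Step 2: match each treated unit to the control with the closest score
score = 1 / (1 + np.exp(-Z @ beta))
treated = np.where(D == 1)[0]
control = np.where(D == 0)[0]
gap = np.abs(score[treated][:, None] - score[control][None, :])
matches = control[gap.argmin(axis=1)]

att_hat = np.mean(Y[treated] - Y[matches])
print(att_hat)   # near the true effect 2
```

Matching on the one-dimensional score instead of on $X_i$ itself is what sidesteps the curse of dimensionality discussed above.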