
Paper Review: Optimal Bandwidth Choice for the Regression Discontinuity Estimator

1. Basic model


Potential outcome framework

Notation

Sample size: $N$

$Y_i(1)$: potential outcome for unit $i$ given treatment

$Y_i(0)$: potential outcome for unit $i$ without treatment

$W_i$: whether the treatment is received or not. $W_i = 1$: treatment received, $W_i = 0$: not received

Then, the observed outcome $Y_i$ is

$$Y_i = Y_i(W_i) = \begin{cases} Y_i(0) & \text{if } W_i = 0 \\ Y_i(1) & \text{if } W_i = 1 \end{cases} = W_i Y_i(1) + (1 - W_i) Y_i(0)$$


Regression discontinuity design

$X_i$: forcing variable, a scalar covariate. This variable determines the treatment.

$m(x)$: conditional expectation of $Y_i$ given $X_i = x$,

$$m(x) = E(Y_i \mid X_i = x)$$

In the SRD design, treatment is determined solely by the value of the forcing variable $X_i$ being on either side of a fixed and known threshold $c$, or

$$W_i = I\{X_i \geq c\}$$

Then, we focus on the average effect of the treatment for units with covariate values equal to the threshold:

$$\tau_{SRD} = E(Y_i(1) - Y_i(0) \mid X_i = c)$$

If the conditional distribution functions $F_{Y(0)\mid X}(y \mid x)$ and $F_{Y(1)\mid X}(y \mid x)$ are continuous in $x$ for all $y$, and the conditional first moments $E(Y_i(1) \mid X_i = x)$ and $E(Y_i(0) \mid X_i = x)$ exist and are continuous at $x = c$, then

$$\tau_{SRD} = \mu_+ - \mu_- = \lim_{x \downarrow c} m(x) - \lim_{x \uparrow c} m(x)$$

The estimand is the difference of two regression functions evaluated at boundary points.

We use local linear regression on each side to estimate $\tau_{SRD}$.

Fit local linear regressions

$$(\hat{\alpha}_-(x), \hat{\beta}_-(x)) = \arg\min_{\alpha, \beta} \sum_{i=1}^{N} I(X_i < x)\,(Y_i - \alpha - \beta(X_i - x))^2\, K\!\left(\frac{X_i - x}{h}\right)$$

$$(\hat{\alpha}_+(x), \hat{\beta}_+(x)) = \arg\min_{\alpha, \beta} \sum_{i=1}^{N} I(X_i \geq x)\,(Y_i - \alpha - \beta(X_i - x))^2\, K\!\left(\frac{X_i - x}{h}\right)$$

Then, the estimated regression function $\hat{m}_h(\cdot)$ at $x$ is

$$\hat{m}_h(x) = \begin{cases} \hat{\alpha}_-(x) & \text{if } x < c \\ \hat{\alpha}_+(x) & \text{if } x \geq c \end{cases}$$

Then, the estimated $\tau_{SRD}$ is

$$\hat{\tau}_{SRD} = \hat{\mu}_+ - \hat{\mu}_-, \quad \text{where } \hat{\mu}_- = \lim_{x \uparrow c} \hat{m}_h(x) = \hat{\alpha}_-(c), \; \hat{\mu}_+ = \lim_{x \downarrow c} \hat{m}_h(x) = \hat{\alpha}_+(c)$$
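The two boundary fits above amount to one weighted least-squares regression per side. A minimal sketch (my own illustrative code, not the authors'; the edge kernel and function names are my choices):

```python
import numpy as np

def edge_kernel(u):
    """Edge (triangular) kernel K(u) = (1 - |u|) on |u| <= 1, zero outside."""
    return np.clip(1.0 - np.abs(u), 0.0, None)

def boundary_value(y, x, c, h, side):
    """alpha-hat from the local linear fit on one side of the threshold:
    regress Y on (1, X - c) with kernel weights K((X - c) / h)."""
    mask = (x < c) if side == "left" else (x >= c)
    w = edge_kernel((x[mask] - c) / h)
    X = np.column_stack([np.ones(mask.sum()), x[mask] - c])
    # Weighted least squares: (X'WX)^{-1} X'W y
    coef = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y[mask] * w))
    return coef[0]  # intercept = fitted value at the boundary point c

def tau_srd_hat(y, x, c, h):
    """tau_SRD-hat = mu_plus-hat minus mu_minus-hat."""
    return boundary_value(y, x, c, h, "right") - boundary_value(y, x, c, h, "left")
```

For example, with $Y_i = 0.5 X_i + 2 \cdot I\{X_i \geq 0\} + \varepsilon_i$ and a moderate bandwidth, `tau_srd_hat` recovers a jump close to 2.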


2. Error Criterion and Infeasible optimal bandwidth choice


1) Error criteria


Previous approaches to the optimal choice of the bandwidth $h$: cross-validation or ad hoc methods.

Cross-validation is typically applied with the mean integrated squared error (MISE) criterion:

$$MISE(h) = E\left(\int_x \left(\hat{m}_h(x) - m(x)\right)^2 f(x)\,dx\right)$$

where $f(x)$ is the density of the forcing variable.

Problem

The optimal bandwidth $h$ obtained from $MISE(h)$ is the $h$ that is best for $\hat{m}_h(x)$, not the $h$ that is best for $\hat{\tau}_{SRD}$; the objective functions differ.

Property of $\tau_{SRD}$

(1) $\tau_{SRD}$ only requires the left and right limits of $m(x)$ at $x = c$. Since $m(x)$ is estimated with a local linear regression run separately on each side of $x = c$, only two local linear regressions are run, and only two values ($\hat{\mu}_-$, $\hat{\mu}_+$) are used to estimate $\tau_{SRD}$.

(2) The two estimated values ($\hat{\mu}_-$, $\hat{\mu}_+$) are boundary points!

Therefore, the best $h$ should be chosen with an error criterion other than MISE. That is, define the mean squared error of $\tau_{SRD}$,

$$MSE(h) = E\left(\left(\hat{\tau}_{SRD} - \tau_{SRD}\right)^2\right) = E\left(\left((\hat{\mu}_+ - \mu_+) - (\hat{\mu}_- - \mu_-)\right)^2\right)$$

and take the optimal bandwidth $h$ in the RD design to be the minimizer of this MSE:

$$h = \arg\min_h MSE(h)$$

Problem

Even as the sample size grows, the minimizing $h$ may fail to converge to zero.

(Because estimation is carried out separately on each side, the biases of the two sides can cancel each other out.)

It does not seem appropriate to base estimation on global criteria when identification is local.

  • Focus instead on the bandwidth that minimizes a first-order approximation to $MSE(h)$: the asymptotic mean squared error, $AMSE(h)$.

Second concern: a single bandwidth

Since two local linear regressions are run, each may have its own optimal bandwidth; one could therefore look for the pair $h_-, h_+$ minimizing

$$MSE(h_-, h_+) = E\left(\left((\hat{\mu}_+(h_+) - \mu_+) - (\hat{\mu}_-(h_-) - \mu_-)\right)^2\right)$$

Problem

Suppose the biases of both estimators are strictly increasing in the bandwidth. Then we can set $h_+(h_-)$ such that the biases of the RD estimate cancel out:

$$\left(E(\hat{\mu}_-(h_-)) - \mu_-\right) - \left(E(\hat{\mu}_+(h_+(h_-))) - \mu_+\right) = 0$$

One can set $h_-$ large and then simply pick $h_+$ so that the bias above is zero. That is, the bias can be zero even as the bandwidths grow without bound, so this criterion is problematic in actual applications!


2) An asymptotic expansion of the expected error


Notation

$m^{(k)}_+(c)$: right limit of the $k$th derivative of $m(x)$ at the threshold $c$

$m^{(k)}_-(c)$: left limit of the $k$th derivative of $m(x)$ at the threshold $c$

$\sigma^2_+(c)$: right limit of the conditional variance $\sigma^2(x) = Var(Y_i \mid X_i = x)$ at the threshold $c$

$\sigma^2_-(c)$: left limit of the conditional variance $\sigma^2(x) = Var(Y_i \mid X_i = x)$ at the threshold $c$


Assumption

(1) $(Y_i, X_i)$, for $i = 1, \ldots, N$, are i.i.d.

(2) The marginal distribution of the forcing variable $X_i$, denoted $f(\cdot)$, is continuous and bounded away from zero at the threshold $c$.

(3) The conditional mean $m(x) = E(Y_i \mid X_i = x)$ has at least three continuous derivatives in an open neighbourhood of $X = c$. The right and left limits of the $k$th derivative of $m(x)$ at the threshold $c$ are denoted by $m^{(k)}_+(c)$ and $m^{(k)}_-(c)$.

(4) The kernel $K(\cdot)$ is non-negative, bounded, differs from zero on a compact interval $[0, a]$, and is continuous on $(0, a)$.

(5) The conditional variance function $\sigma^2(x) = Var(Y_i \mid X_i = x)$ is bounded in an open neighbourhood of $X = c$ and right and left continuous at $c$.

(6) The second derivatives from the right and the left differ at the threshold: $m^{(2)}_+(c) \neq m^{(2)}_-(c)$.


**Definition**: AMSE(h)

$$AMSE(h) = C_1 h^4 \left(m^{(2)}_+(c) - m^{(2)}_-(c)\right)^2 + \frac{C_2}{Nh}\left(\frac{\sigma^2_+(c)}{f(c)} + \frac{\sigma^2_-(c)}{f(c)}\right)$$

$C_1, C_2$ are functions of the kernel:

$$C_1 = \frac{1}{4}\left(\frac{\nu_2^2 - \nu_1 \nu_3}{\nu_2 \nu_0 - \nu_1^2}\right)^2, \quad C_2 = \frac{\nu_2^2 \pi_0 - 2\nu_1 \nu_2 \pi_1 + \nu_1^2 \pi_2}{\left(\nu_2 \nu_0 - \nu_1^2\right)^2}$$

where

$$\nu_j = \int_0^\infty u^j K(u)\,du, \quad \pi_j = \int_0^\infty u^j K^2(u)\,du$$

In the AMSE, the first term,

$$C_1 h^4 \left(m^{(2)}_+(c) - m^{(2)}_-(c)\right)^2,$$

corresponds to the square of the bias, and the second term,

$$\frac{C_2}{Nh}\left(\frac{\sigma^2_+(c)}{f(c)} + \frac{\sigma^2_-(c)}{f(c)}\right),$$

corresponds to the variance.

The bias term clarifies the role that assumption (6) plays.

The leading term in the expansion of the bias is of order $h^4$ if assumption (6) holds.

If assumption (6) does not hold, the bias converges to zero faster, allowing estimation of $\tau_{SRD}$ at a faster rate of convergence.

(In practice it is difficult to check whether the two second derivatives are equal, so we proceed under assumption (6). Even if (6) does not hold, the proposed estimator of $\tau_{SRD}$ is still consistent.)

(The way the optimal bandwidth is found can differ between the cases where the second derivatives are equal and where they differ; this paper treats the case where they differ.)


Lemma 1 (Mean Squared Error Approximation and Optimal Bandwidth)

(1) Suppose assumptions (1)-(5) hold. Then

$$MSE(h) = AMSE(h) + o_p\!\left(h^4 + \frac{1}{Nh}\right)$$

(2) Suppose that also assumption (6) holds. Then

$$h_{opt} = \arg\min_h AMSE(h) = C_K \left(\frac{\sigma^2_+(c) + \sigma^2_-(c)}{f(c)\left(m^{(2)}_+(c) - m^{(2)}_-(c)\right)^2}\right)^{1/5} N^{-1/5}$$

where $C_K = \left(\frac{C_2}{4 C_1}\right)^{1/5}$ is indexed by the kernel $K(\cdot)$.

For the edge kernel, with $K(u) = I\{|u| \leq 1\}(1 - |u|)$, the constant is $C_{K,\text{edge}} \approx 3.4375$.

For the uniform kernel, with $K(u) = I\{|u| \leq 1/2\}$, the constant is $C_{K,\text{uniform}} \approx 5.40$.
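These constants are easy to verify numerically from the moment formulas for $\nu_j$ and $\pi_j$ above. A quick stdlib-only check (my own sketch, using composite Simpson integration; the function names are mine):

```python
def integrate(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] (n must be even)."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return s * h / 3

def c_k(K, a):
    """C_K = (C2 / (4 C1))^{1/5} from the one-sided kernel moments nu_j, pi_j."""
    nu = [integrate(lambda u, j=j: u ** j * K(u), 0.0, a) for j in range(4)]
    pi = [integrate(lambda u, j=j: u ** j * K(u) ** 2, 0.0, a) for j in range(3)]
    den = nu[2] * nu[0] - nu[1] ** 2
    c1 = 0.25 * ((nu[2] ** 2 - nu[1] * nu[3]) / den) ** 2
    c2 = (nu[2] ** 2 * pi[0] - 2 * nu[1] * nu[2] * pi[1] + nu[1] ** 2 * pi[2]) / den ** 2
    return (c2 / (4 * c1)) ** 0.2

edge = lambda u: 1.0 - u    # edge kernel restricted to its support [0, 1]
uniform = lambda u: 1.0     # uniform kernel restricted to its support [0, 1/2]
print(c_k(edge, 1.0))       # ~ 3.4375
print(c_k(uniform, 0.5))    # ~ 5.40
```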


3. Feasible optimal bandwidth choice


1) A simple plug-in bandwidth


In

$$h_{opt} = \arg\min_h AMSE(h) = C_K \left(\frac{\sigma^2_+(c) + \sigma^2_-(c)}{f(c)\left(m^{(2)}_+(c) - m^{(2)}_-(c)\right)^2}\right)^{1/5} N^{-1/5}$$

the required unknown quantities are

$$\sigma^2_+(c), \; \sigma^2_-(c), \; f(c), \; m^{(2)}_+(c), \; m^{(2)}_-(c)$$

(the kernel $K(\cdot)$, and hence $C_K$, is chosen by the researcher). Replacing the unknown quantities with consistent estimators gives

$$\tilde{h}_{opt} = C_K \left(\frac{\hat{\sigma}^2_+(c) + \hat{\sigma}^2_-(c)}{\hat{f}(c)\left(\hat{m}^{(2)}_+(c) - \hat{m}^{(2)}_-(c)\right)^2}\right)^{1/5} N^{-1/5}$$

Problem

Difficulties can arise when the first-order bias is very small.

In that case, $m^{(2)}_+(c) \approx m^{(2)}_-(c)$, so the denominator of the $h_{opt}$ expression is close to zero and the resulting bandwidth can be very large; the bandwidth estimate is then imprecise, with large variance.

In addition, the estimator for $\tau_{SRD}$ acquires poor properties, because the true finite-sample bias depends on global properties of the regression function that are not captured by the asymptotic approximation used to calculate the bandwidth.


(1) Regularization

Adjust the denominator of $h_{opt}$ so that it cannot become zero.

The bias in the plug-in estimator for the reciprocal of the squared difference in second derivatives is

$$E\left(\frac{1}{\left(\hat{m}^{(2)}_+(c) - \hat{m}^{(2)}_-(c)\right)^2} - \frac{1}{\left(m^{(2)}_+(c) - m^{(2)}_-(c)\right)^2}\right) = \frac{3\left(Var(\hat{m}^{(2)}_+(c)) + Var(\hat{m}^{(2)}_-(c))\right)}{\left(m^{(2)}_+(c) - m^{(2)}_-(c)\right)^4} + o\!\left(N^{2\alpha}\right)$$

Then, for $r = 3\left(Var(\hat{m}^{(2)}_+(c)) + Var(\hat{m}^{(2)}_-(c))\right)$, the bias in the modified estimator for the reciprocal of the squared difference in second derivatives is of lower order:

$$E\left(\frac{1}{\left(\hat{m}^{(2)}_+(c) - \hat{m}^{(2)}_-(c)\right)^2 + r} - \frac{1}{\left(m^{(2)}_+(c) - m^{(2)}_-(c)\right)^2}\right) = o\!\left(N^{2\alpha}\right)$$

This in turn motivates the modified bandwidth estimator

$$\hat{h}_{opt} = C_K \left(\frac{\hat{\sigma}^2_-(c) + \hat{\sigma}^2_+(c)}{\hat{f}(c)\left(\left(\hat{m}^{(2)}_+(c) - \hat{m}^{(2)}_-(c)\right)^2 + r_+ + r_-\right)}\right)^{1/5} N^{-1/5}$$

where

$$r_+ = 3\,\widehat{Var}\!\left(\hat{m}^{(2)}_+(c)\right), \quad r_- = 3\,\widehat{Var}\!\left(\hat{m}^{(2)}_-(c)\right)$$

Then, this bandwidth will not become infinite even in the cases when the difference in curvatures at the threshold is zero.


(2) Implementing the regularization

We estimate the second derivative m(2)+(c) by fitting a quadratic function to the observations with Xi[c,c+h].

The initial bandwidth $h$ here will be different from the bandwidth $\hat{h}_{opt}$ used in the estimation of $\tau_{SRD}$.

Notation

$N_{h,+}$: the number of units with covariate values in the interval $[c, c+h]$

$$\bar{X} = \frac{1}{N_{h,+}} \sum_{c \leq X_i \leq c+h} X_i$$

$\hat{\mu}_{j,h,+} = \frac{1}{N_{h,+}} \sum_{c \leq X_i \leq c+h} (X_i - \bar{X})^j$: the $j$th centered moment of the $X_i$ in the interval $[c, c+h]$.

Then, we can get $r_+$:

$$r_+ = \frac{12}{N_{h,+}} \cdot \frac{\sigma^2_+(c)}{\hat{\mu}_{4,h,+} - \left(\hat{\mu}_{2,h,+}\right)^2 - \left(\hat{\mu}_{3,h,+}\right)^2 / \hat{\mu}_{2,h,+}}$$

However, since fourth moments are difficult to estimate precisely, we approximate this expression by exploiting the fact that for small $h$ the distribution of the forcing variable can be approximated by a uniform distribution on $[c, c+h]$, so that

$$\mu_{2,h,+} \approx \frac{h^2}{12}, \quad \mu_{3,h,+} \approx 0, \quad \mu_{4,h,+} \approx \frac{h^4}{80}$$

Using these facts,

$$\hat{r}_+ = \frac{2160\,\hat{\sigma}^2_+(c)}{N_{h,+}\, h^4}, \quad \hat{r}_- = \frac{2160\,\hat{\sigma}^2_-(c)}{N_{h,-}\, h^4}$$
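As a quick sanity check on the constant 2160 (my own verification, not from the paper): substituting the uniform-approximation moments $\mu_2 = h^2/12$, $\mu_3 = 0$, $\mu_4 = h^4/80$ into the expression for $r_+$ gives $12 / (1/80 - 1/144) = 12 \times 180 = 2160$.

```python
from fractions import Fraction

# Denominator bracket of r_+ under the uniform approximation (in units of h^4):
# mu4 - mu2^2 - mu3^2 / mu2 = 1/80 - (1/12)^2 - 0 = 1/180
bracket = Fraction(1, 80) - Fraction(1, 12) ** 2
print(12 / bracket)  # -> 2160
```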

Then, using $\hat{r} = \hat{r}_- + \hat{r}_+$, we can get

$$\hat{h}_{opt} = C_K \left(\frac{\hat{\sigma}^2_-(c) + \hat{\sigma}^2_+(c)}{\hat{f}(c)\left(\left(\hat{m}^{(2)}_+(c) - \hat{m}^{(2)}_-(c)\right)^2 + \hat{r}_+ + \hat{r}_-\right)}\right)^{1/5} N^{-1/5}$$
  • Check

We need specific estimators $\hat{\sigma}^2_+(c), \hat{\sigma}^2_-(c), \hat{f}(c), \hat{m}^{(2)}_+(c), \hat{m}^{(2)}_-(c)$.

Any combination of consistent estimators for $\sigma^2_+(c), \sigma^2_-(c), f(c), m^{(2)}_+(c), m^{(2)}_-(c)$ substituted into the expression, with or without the regularization terms, will have the same optimality properties.

The proposed estimator is relatively simple, but the more important point is that it is a specific estimator: it gives a convenient starting point and benchmark for sensitivity analyses regarding bandwidth choice.

The bandwidth selection algorithm turns out to be relatively robust to these choices.


2) An algorithm for bandwidth selection


(1) Step 1. Estimation of the density $f(c)$ and the conditional variances $\sigma^2_-(c)$ and $\sigma^2_+(c)$

First, calculate the sample variance of the forcing variable, $S^2_X = \sum_i (X_i - \bar{X})^2 / (N - 1)$.

Use the Silverman rule to get a pilot bandwidth for calculating the density and variance at $c$.

For a normal kernel and a normal reference density: $h = 1.06\, S_X N^{-1/5}$.

Modification: uniform kernel on $[-1, 1]$ and normal reference density:

$$h_1 = 1.84\, S_X N^{-1/5}$$

Then, calculate

$$N_{h_1,-} = \sum_{i=1}^{N} I\{c - h_1 \leq X_i < c\}, \quad N_{h_1,+} = \sum_{i=1}^{N} I\{c \leq X_i \leq c + h_1\}$$

$$\bar{Y}_{h_1,-} = \frac{1}{N_{h_1,-}} \sum_{c - h_1 \leq X_i < c} Y_i, \quad \bar{Y}_{h_1,+} = \frac{1}{N_{h_1,+}} \sum_{c \leq X_i \leq c + h_1} Y_i$$

Now estimate the density of $X_i$ at $c$ as

$$\hat{f}(c) = \frac{N_{h_1,-} + N_{h_1,+}}{2 N h_1}$$

and estimate the limits of the conditional variance of $Y_i$ given $X_i = x$ at $x = c$:

$$\hat{\sigma}^2_-(c) = \frac{1}{N_{h_1,-} - 1} \sum_{c - h_1 \leq X_i < c} \left(Y_i - \bar{Y}_{h_1,-}\right)^2, \quad \hat{\sigma}^2_+(c) = \frac{1}{N_{h_1,+} - 1} \sum_{c \leq X_i \leq c + h_1} \left(Y_i - \bar{Y}_{h_1,+}\right)^2$$

These estimators are consistent for the density and the conditional variances, respectively.
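Step 1 can be sketched in a few lines (an illustrative implementation with my own function names, not the authors' code):

```python
import numpy as np

def step1_density_and_variances(y, x, c):
    """Pilot bandwidth h1 via the modified Silverman rule, then f-hat(c) and
    the left/right conditional variance estimates at the threshold."""
    n = len(x)
    h1 = 1.84 * np.std(x, ddof=1) * n ** (-1 / 5)
    left = (x >= c - h1) & (x < c)
    right = (x >= c) & (x <= c + h1)
    # Count of observations in the window of total width 2*h1 around c
    f_hat = (left.sum() + right.sum()) / (2 * n * h1)
    s2_minus = np.var(y[left], ddof=1)
    s2_plus = np.var(y[right], ddof=1)
    return h1, f_hat, s2_minus, s2_plus
```

For example, with $X \sim \text{Uniform}[-1, 1]$ the true density at $c = 0$ is $0.5$, which the estimate recovers in large samples.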

(2) Step 2. Estimation of the second derivatives $\hat{m}^{(2)}_+(c)$ and $\hat{m}^{(2)}_-(c)$

First, we need pilot bandwidths $h_{2,-}$ and $h_{2,+}$. Fit a third-order polynomial to the data, including an indicator for $X_i \geq c$:

$$Y_i = \gamma_0 + \gamma_1 I(X_i \geq c) + \gamma_2 (X_i - c) + \gamma_3 (X_i - c)^2 + \gamma_4 (X_i - c)^3 + \epsilon_i$$

and estimate $m^{(3)}(c)$ as $\hat{m}^{(3)}(c) = 6\hat{\gamma}_4$.

Note that $\hat{m}^{(3)}(c)$ is in general not a consistent estimate of $m^{(3)}(c)$, but it will converge to some constant at a parametric rate. However, we do not need a consistent estimate of the third derivative at $c$ here to obtain a consistent estimator of the second derivative.

Calculate $h_{2,+}, h_{2,-}$:

$$h_{2,+} = 3.56 \left(\frac{\hat{\sigma}^2_+(c)}{\hat{f}(c)\left(\hat{m}^{(3)}(c)\right)^2}\right)^{1/7} N_+^{-1/7}, \quad h_{2,-} = 3.56 \left(\frac{\hat{\sigma}^2_-(c)}{\hat{f}(c)\left(\hat{m}^{(3)}(c)\right)^2}\right)^{1/7} N_-^{-1/7}$$

Where N and N+ are the number of observations to the left and right of the threshold, respectively.

$h_{2,-}$ and $h_{2,+}$ are estimates of the optimal bandwidth for calculating the second derivative at a boundary point using a local quadratic fit and a uniform kernel.

Given the pilot bandwidth $h_{2,+}$, we estimate the curvature $m^{(2)}_+(c)$ by a local quadratic fit. To be precise, temporarily discard all observations other than the $N_{2,+}$ observations with $c \leq X_i \leq c + h_{2,+}$.

Label the new data

$$\hat{Y}_+ = \left(Y_1, \ldots, Y_{N_{2,+}}\right)', \quad \hat{X}_+ = \left(X_1, \ldots, X_{N_{2,+}}\right)', \quad T = \begin{bmatrix} \iota & T_1 & T_2 \end{bmatrix}$$

where $\iota$ is a vector of ones and $T_j' = \left((X_1 - c)^j, \ldots, (X_{N_{2,+}} - c)^j\right)$.

The estimated regression coefficients are

$$\hat{\lambda} = (T'T)^{-1} T' \hat{Y}_+$$

and we calculate $\hat{m}^{(2)}_+(c) = 2\hat{\lambda}_3$, where $\hat{\lambda}_3$ is the coefficient on the quadratic term.

Similarly, we can calculate $\hat{m}^{(2)}_-(c)$.
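The local quadratic fit is just ordinary least squares of $Y$ on $(1, X-c, (X-c)^2)$ within the pilot window. A minimal sketch (my own code and names, not the authors'):

```python
import numpy as np

def second_derivative(y, x, c, h2, side):
    """m(2)-hat on one side of the threshold: OLS of Y on (1, X-c, (X-c)^2)
    over the pilot window, then 2 * (coefficient on the quadratic term)."""
    mask = ((x >= c) & (x <= c + h2)) if side == "right" else ((x >= c - h2) & (x < c))
    d = x[mask] - c
    T = np.column_stack([np.ones(d.size), d, d ** 2])
    lam, *_ = np.linalg.lstsq(T, y[mask], rcond=None)
    return 2.0 * lam[2]
```

On an exactly quadratic $m(x)$ the fit recovers the curvature exactly: with $Y_i = 1 + 3X_i^2$ (so $m''(x) = 6$), both sides return 6 up to numerical precision.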

(3) Step 3. Calculation of the regularization terms $\hat{r}_-$ and $\hat{r}_+$ and calculation of $\hat{h}_{opt}$

Given the previous steps, the regularization terms are calculated as

$$\hat{r}_+ = \frac{2160\,\hat{\sigma}^2_+(c)}{N_{2,+}\, h^4_{2,+}}, \quad \hat{r}_- = \frac{2160\,\hat{\sigma}^2_-(c)}{N_{2,-}\, h^4_{2,-}}$$

Then finally, we can get the proposed bandwidth:

$$\hat{h}_{opt} = C_K \left(\frac{\hat{\sigma}^2_-(c) + \hat{\sigma}^2_+(c)}{\hat{f}(c)\left(\left(\hat{m}^{(2)}_+(c) - \hat{m}^{(2)}_-(c)\right)^2 + \hat{r}_+ + \hat{r}_-\right)}\right)^{1/5} N^{-1/5}$$

Given the bandwidth $\hat{h}_{opt}$, we get

$$\hat{\tau}_{SRD} = \lim_{x \downarrow c} \hat{m}_{\hat{h}_{opt}}(x) - \lim_{x \uparrow c} \hat{m}_{\hat{h}_{opt}}(x)$$

where $\hat{m}_h(x)$ is the local linear regression estimator.
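Step 3 then just assembles the pieces. A sketch assuming the quantities from Steps 1-2 are already computed (argument names are mine; `ck` defaults to the edge-kernel constant):

```python
def h_opt_hat(s2m, s2p, f_hat, m2m, m2p, n, n2m, n2p, h2m, h2p, ck=3.4375):
    """Regularized plug-in bandwidth h_opt-hat; ck is C_K for the chosen kernel."""
    r_plus = 2160.0 * s2p / (n2p * h2p ** 4)
    r_minus = 2160.0 * s2m / (n2m * h2m ** 4)
    denom = f_hat * ((m2p - m2m) ** 2 + r_plus + r_minus)
    return ck * ((s2m + s2p) / denom) ** 0.2 * n ** (-1 / 5)
```

Even with `m2p == m2m` (a zero curvature difference) the regularization terms keep the bandwidth finite, which is exactly the point of the modification.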

Properties of the algorithm

First, the resulting RD estimator ˆτSRD is consistent at the best rate for non-parametric regression functions at a point.

Second, the estimated constant term in the reference bandwidth converges to the best constant.

Third, we have a Li type optimality result for the mean squared error and consistency at the optimal rate for the RD estimate.

Theorem: Properties of $\hat{h}_{opt}$

Suppose assumptions (1)-(5) hold. Then:

(1) Consistency: if assumption (6) holds, then

$$\hat{\tau}_{SRD} - \tau_{SRD} = O_p\!\left(N^{-2/5}\right)$$

(2) Consistency: if assumption (6) does not hold, then

$$\hat{\tau}_{SRD} - \tau_{SRD} = O_p\!\left(N^{-3/7}\right)$$

(3) Convergence of the bandwidth:

$$\frac{\hat{h}_{opt} - h_{opt}}{h_{opt}} = o_p(1)$$

(4) Li-type optimality:

$$\frac{MSE(\hat{h}_{opt}) - MSE(h_{opt})}{MSE(h_{opt})} = o_p(1)$$

If assumption (6) does not hold, we can have

$$m^{(2)}_+(c) = m^{(2)}_-(c),$$

implying that the bias term of the AMSE vanishes, which improves the rate of convergence.

(4) DesJardins-McCall bandwidth selection

The objective criterion is different:

$$E\left(\left(\hat{\mu}_+ - \mu_+\right)^2 + \left(\hat{\mu}_- - \mu_-\right)^2\right)$$

The single optimal bandwidth based on the DesJardins and McCall criterion is

$$h_{DM} = C_K \left(\frac{\sigma^2_+(c) + \sigma^2_-(c)}{f(c)\left(\left(m^{(2)}_+(c)\right)^2 + \left(m^{(2)}_-(c)\right)^2\right)}\right)^{1/5} N^{-1/5}$$

In large samples this leads to a smaller bandwidth than our proposed choice whenever the second derivatives are of the same sign, since then $\left(m^{(2)}_+(c)\right)^2 + \left(m^{(2)}_-(c)\right)^2 > \left(m^{(2)}_+(c) - m^{(2)}_-(c)\right)^2$. Also, DesJardins and McCall actually use different bandwidths on the left and the right, and an Epanechnikov kernel.

(5) Ludwig-Miller cross-validation

Let $N_-$ and $N_+$ be the number of observations with $X_i < c$ and $X_i \geq c$, respectively. For $\delta \in (0, 1)$, let $\theta_-(\delta)$ and $\theta_+(\delta)$ be the $\delta$th quantile of the $X_i$ among the subsamples of observations with $X_i < c$ and $X_i \geq c$, respectively, so that

$$\theta_-(\delta) = \arg\min_a \left\{a : \sum_{i=1}^{N} I\{X_i \leq a\} \geq \delta N_-\right\}, \quad \theta_+(\delta) = \arg\min_a \left\{a : \sum_{i=1}^{N} I\{c \leq X_i \leq a\} \geq \delta N_+\right\}$$

Now, the LM cross-validation criterion we use is of the form

$$CV_\delta(h) = \sum_{i=1}^{N} I\{\theta_-(1 - \delta) \leq X_i \leq \theta_+(\delta)\}\left(Y_i - \hat{m}_h(X_i)\right)^2$$

A key feature of $\hat{m}_h(x)$ is that for values of $x < c$ it only uses observations with $X_i < x$ to estimate $m(x)$, and for values of $x \geq c$ it only uses observations with $X_i > x$, so that $\hat{m}_h(X_i)$ does not depend on $Y_i$, as is necessary for cross-validation.

By using a value for δ close to zero, we only use observations close to the threshold to evaluate the cross-validation criterion.

Issue

The LM cross-validation criterion focuses on minimizing

$$E\left(\left(\hat{\mu}_+ - \mu_+\right)^2 + \left(\hat{\mu}_- - \mu_-\right)^2\right)$$

rather than

$$E\left(\left((\hat{\mu}_+ - \hat{\mu}_-) - (\mu_+ - \mu_-)\right)^2\right)$$

Therefore, even letting $\delta \to 0$ with the sample size in the cross-validation procedure will not result in an optimal bandwidth.
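The one-sided fits and the near-threshold evaluation window of the LM criterion can be sketched as follows (my own illustrative code; `numpy` and the edge kernel are assumptions, as elsewhere in this post):

```python
import numpy as np

def mhat_oneside(y, x, x0, c, h):
    """One-sided local linear fit at x0: for x0 < c use only X_i < x0, for
    x0 >= c only X_i > x0, so the fit never uses Y_i at x0 itself."""
    mask = (x < x0) if x0 < c else (x > x0)
    w = np.clip(1.0 - np.abs((x[mask] - x0) / h), 0.0, None)  # edge kernel
    if (w > 0).sum() < 2:
        return np.nan  # not enough neighbours to fit a line
    X = np.column_stack([np.ones(mask.sum()), x[mask] - x0])
    coef = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y[mask] * w))
    return coef[0]

def cv_lm(y, x, c, h, delta):
    """Ludwig-Miller criterion CV_delta(h): squared one-sided prediction
    errors over the delta-fraction of observations nearest the threshold."""
    theta_lo = np.quantile(x[x < c], 1.0 - delta)   # theta_-(1 - delta)
    theta_hi = np.quantile(x[x >= c], delta)        # theta_+(delta)
    sel = (x >= theta_lo) & (x <= theta_hi)
    pred = np.array([mhat_oneside(y, x, xi, c, h) for xi in x[sel]])
    ok = ~np.isnan(pred)
    return float(np.sum((y[sel][ok] - pred[ok]) ** 2))
```

Minimizing `cv_lm` over a grid of `h` gives the LM bandwidth; the issue above is that this targets the sum of the two squared errors rather than the squared error of their difference.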


5. Extension


1) Fuzzy regression design


In the FRD design, the treatment $W_i$ is not a deterministic function of the forcing variable. Instead, the probability $P(W_i = 1 \mid X_i = x)$ changes discontinuously at the threshold $c$. In the FRD design, the treatment effect is

$$\tau_{FRD} = \frac{\lim_{x \downarrow c} E(Y_i \mid X_i = x) - \lim_{x \uparrow c} E(Y_i \mid X_i = x)}{\lim_{x \downarrow c} E(W_i \mid X_i = x) - \lim_{x \uparrow c} E(W_i \mid X_i = x)}$$

In this case, we need to estimate two regression functions, each at two boundary points:

The expected outcome given the forcing variable, $E(Y_i \mid X_i = x)$, to the right and left of the threshold $c$

The expected value of the treatment variable given the forcing variable, $E(W_i \mid X_i = x)$, to the right and left of $c$

Define

$$\tau_Y = \lim_{x \downarrow c} E(Y_i \mid X_i = x) - \lim_{x \uparrow c} E(Y_i \mid X_i = x), \quad \tau_W = \lim_{x \downarrow c} E(W_i \mid X_i = x) - \lim_{x \uparrow c} E(W_i \mid X_i = x)$$

with $\hat{\tau}_Y$, $\hat{\tau}_W$ denoting the corresponding estimators, so that

$$\tau_{FRD} = \frac{\tau_Y}{\tau_W}, \quad \hat{\tau}_{FRD} = \frac{\hat{\tau}_Y}{\hat{\tau}_W}$$

Then, we can approximate the difference $\hat{\tau}_{FRD} - \tau_{FRD}$ by

$$\hat{\tau}_{FRD} - \tau_{FRD} = \frac{1}{\tau_W}\left(\hat{\tau}_Y - \tau_Y\right) - \frac{\tau_Y}{\tau_W^2}\left(\hat{\tau}_W - \tau_W\right) + o_p\!\left((\hat{\tau}_Y - \tau_Y) + (\hat{\tau}_W - \tau_W)\right)$$

This is the basis for the asymptotic approximation to the MSE around $h = 0$:

$$AMSE_{FRD}(h) = C_1 h^4 \left(\frac{1}{\tau_W}\left(m^{(2)}_{Y,+}(c) - m^{(2)}_{Y,-}(c)\right) - \frac{\tau_Y}{\tau_W^2}\left(m^{(2)}_{W,+}(c) - m^{(2)}_{W,-}(c)\right)\right)^2 + \frac{C_2}{N h f(c)}\left(\frac{1}{\tau_W^2}\left(\sigma^2_{Y,+}(c) + \sigma^2_{Y,-}(c)\right) + \frac{\tau_Y^2}{\tau_W^4}\left(\sigma^2_{W,+}(c) + \sigma^2_{W,-}(c)\right) - \frac{2\tau_Y}{\tau_W^3}\left(\sigma_{YW,+}(c) + \sigma_{YW,-}(c)\right)\right)$$

Here $C_1, C_2$ are the same kernel constants as in the SRD case, defined from the moments $\nu_j$ and $\pi_j$ above.

The difference between the SRD and FRD cases is the addition of the treatment-probability function; therefore, we also need to consider the variance of $W_i$ and the covariance of $(W_i, Y_i)$.

The bandwidth that minimizes the AMSE in the fuzzy design is

$$h_{opt,FRD} = C_K N^{-1/5} \times \left(\frac{\left(\sigma^2_{Y,+}(c) + \sigma^2_{Y,-}(c)\right) + \tau_{FRD}^2\left(\sigma^2_{W,+}(c) + \sigma^2_{W,-}(c)\right) - 2\tau_{FRD}\left(\sigma_{YW,+}(c) + \sigma_{YW,-}(c)\right)}{f(c)\left(\left(m^{(2)}_{Y,+}(c) - m^{(2)}_{Y,-}(c)\right) - \tau_{FRD}\left(m^{(2)}_{W,+}(c) - m^{(2)}_{W,-}(c)\right)\right)^2}\right)^{1/5}$$

The analogue of the bandwidth proposed for the SRD is

$$\hat{h}_{opt,FRD} = C_K N^{-1/5} \times \left(\frac{\left(\hat{\sigma}^2_{Y,+}(c) + \hat{\sigma}^2_{Y,-}(c)\right) + \hat{\tau}_{FRD}^2\left(\hat{\sigma}^2_{W,+}(c) + \hat{\sigma}^2_{W,-}(c)\right) - 2\hat{\tau}_{FRD}\left(\hat{\sigma}_{YW,+}(c) + \hat{\sigma}_{YW,-}(c)\right)}{\hat{f}(c)\left(\left(\left(\hat{m}^{(2)}_{Y,+}(c) - \hat{m}^{(2)}_{Y,-}(c)\right) - \hat{\tau}_{FRD}\left(\hat{m}^{(2)}_{W,+}(c) - \hat{m}^{(2)}_{W,-}(c)\right)\right)^2 + \hat{r}_{Y,+} + \hat{r}_{Y,-} + \hat{\tau}_{FRD}\left(\hat{r}_{W,+} + \hat{r}_{W,-}\right)\right)}\right)^{1/5}$$

Implementation

First, using the algorithm described for the SRD case separately for the treatment indicator and the outcome, calculate

$$\hat{\tau}_{FRD},\ \hat{f}(c),\ \hat{\sigma}^2_{Y,+},\ \hat{\sigma}^2_{Y,-},\ \hat{\sigma}^2_{W,+},\ \hat{\sigma}^2_{W,-},\ \hat{m}^{(2)}_{Y,+}(c),\ \hat{m}^{(2)}_{Y,-}(c),\ \hat{m}^{(2)}_{W,+}(c),\ \hat{m}^{(2)}_{W,-}(c),\ \hat{r}_{Y,+},\ \hat{r}_{Y,-},\ \hat{r}_{W,+},\ \hat{r}_{W,-}$$

Second, using the initial Silverman bandwidth, use the deviations from the means to estimate the conditional covariances $\hat{\sigma}_{YW,+}(c)$ and $\hat{\sigma}_{YW,-}(c)$.

Then substitute everything into the expression for the bandwidth.

In practice, this often leads to bandwidth choices similar to those based on the optimal bandwidth for estimation of only the numerator of the RD estimand. One may therefore simply wish to use the basic algorithm ignoring the fact that the regression discontinuity design is fuzzy.
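The final ratio step of the fuzzy estimator can be sketched by running the same local linear boundary fits for both $Y$ and $W$ (illustrative code with my own names; the edge kernel is an assumption):

```python
import numpy as np

def boundary_fit(v, x, c, h, side):
    """Local linear (edge-kernel) boundary value of E[v | X = x] at c."""
    mask = (x < c) if side == "left" else (x >= c)
    kw = np.clip(1.0 - np.abs((x[mask] - c) / h), 0.0, None)  # kernel weights
    X = np.column_stack([np.ones(mask.sum()), x[mask] - c])
    coef = np.linalg.solve(X.T @ (X * kw[:, None]), X.T @ (v[mask] * kw))
    return coef[0]

def tau_frd_hat(y, w, x, c, h):
    """tau_FRD-hat = tau_Y-hat / tau_W-hat (same bandwidth for both, as the
    simplification at the end of this section suggests)."""
    tau_y = boundary_fit(y, x, c, h, "right") - boundary_fit(y, x, c, h, "left")
    tau_w = boundary_fit(w, x, c, h, "right") - boundary_fit(w, x, c, h, "left")
    return tau_y / tau_w
```

For example, if the take-up probability jumps from 0.2 to 0.7 at the threshold and the outcome jump is 0.75, the ratio recovers a treatment effect near $0.75 / 0.5 = 1.5$.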


2) Additional covariates


The presence of additional covariates does not affect the RD analysis very much, provided the distribution of the additional covariates does not exhibit any discontinuity around the threshold for the forcing variable; as a result, those covariates are approximately independent of the treatment indicator for samples constructed to be close to the threshold.

In that case, the covariates only affect the precision of the estimator, and one can modify the previous analysis using the conditional variances of $Y_i$ given all covariates at the threshold, $\sigma^2_-(c \mid x)$ and $\sigma^2_+(c \mid x)$, instead of the variances $\sigma^2_-(c)$ and $\sigma^2_+(c)$ that condition only on the forcing variable.

In practice, this modification does not affect the optimal bandwidth much unless the additional covariates have great explanatory power, and the basic algorithm is likely to perform adequately even in the presence of covariates.

Summary

When applying local linear regression in an RD design, the parameter to choose is the bandwidth $h$.

Previously, as in standard local linear regression, the $h$ minimizing MISE was applied to RD designs.

But the optimal bandwidth found with the MISE criterion is the $h$ that makes the function estimate $\hat{m}_h(x)$ itself best.

Applying this directly to the RD design is problematic:

The quantity estimated in an RD design differs from the quantity estimated in local linear regression. The RD design uses only the estimated values at the cutoff, so rather than the bandwidth that is best for the whole function, we need the $h$ that best estimates the values at the cutoff, or more precisely $\tau_{SRD}$.

$\tau_{SRD}$ is also a somewhat special quantity: it is built from boundary points.

Because of these two issues, the bandwidth selection used in standard local linear regression is inappropriate here!

How was it solved?

Define the MSE of $\tau_{SRD}$, and take the $h$ minimizing the AMSE, an approximation of this MSE, as the optimal bandwidth.

Interpreting the AMSE:

First term: the squared-bias term, combining the bias contributions from the two sides (driven by $m^{(2)}_+(c)$ and $m^{(2)}_-(c)$).

Second term: the variance term, combining the variances from the two sides of the threshold.

Since the AMSE closely approximates the MSE, the $h$ minimizing the AMSE is taken as the optimal $h$.

Actual estimation:

The optimal-$h$ formula involves six unknowns.

$C_K$ is determined once the kernel function is selected.

For the remaining unknowns, any consistent estimators give the same result.

Problem: if the second derivatives of $m$ at $c$ are similar, trouble arises (the bandwidth can grow without bound).

Solution: regularization. Motivated by the bias of the plug-in estimator of the reciprocal in the denominator, adding the resulting term to the denominator makes the error smaller and resolves the problem above. (Why add it to the denominator? I need to think about this a bit more.)

In fact $r$ is also unknown, since $m(x)$ is unknown.

Solution: estimate it using quadratic regression; since that is too complicated, an approximation is used.

How it is actually done: three steps.

1. Estimate $f(c)$, $\sigma^2_-(c)$, $\sigma^2_+(c)$.

The bandwidth used here is provided by the Silverman rule.

With the provided $h$, use the empirical distribution of $X$ and the sample variance estimates.

Point: these are all consistent, and any other consistent estimators may be used.

2. Estimate the second derivatives.

The pilot bandwidths used here are obtained by fitting a third-order polynomial regression (one of the methods used for RD designs before local linear regression).

Why estimate it? Because the third derivative is needed to estimate the pilot bandwidths.

Based on that estimate, compute $h_{2,+}, h_{2,-}$, then estimate the second derivatives by second-order local polynomial regression.

3. Plug everything in to get $h_{opt}$.

It worked well, and the regularized version worked better than the unregularized one.

If I apply this to my own problem:

Mathematical angle: the local likelihood objective differs from the one given in this paper.

Problem: the proofs rely on the local linear regression estimator having a closed form, but the model I want to use probably will not, so I need to think more about this, plus work out the interpretation for categorical outcomes. I understand the overall flow but have not resolved this part; I need to find out how it was solved.

  • Papers found: only that paper, plus one applying the same methodology to ordinal outcomes.

Search notes: plain Google Scholar search gives far too many results; exact-phrase search gives too few (is that right?); searched Scopus and two other sites and found nothing there.

scopus, 다른 사이트 두개에서 찾았는데 안보였음
