Paper Review: Optimal Bandwidth Choice for the Regression Discontinuity Estimator
1. Basic model
Potential outcome framework
Notation
Sample size: $N$
$Y_i(1)$: potential outcome for unit $i$ with treatment
$Y_i(0)$: potential outcome for unit $i$ without treatment
$W_i$: indicator of whether treatment was received; $W_i = 1$ if received, $W_i = 0$ if not
Then, the observed outcome $Y_i$ is

$$Y_i = Y_i(W_i) = \begin{cases} Y_i(0) & \text{if } W_i = 0 \\ Y_i(1) & \text{if } W_i = 1 \end{cases} = W_i Y_i(1) + (1 - W_i) Y_i(0)$$

Regression discontinuity design
$X_i$: scalar forcing variable (covariate); this variable determines the treatment
$m(x)$: conditional expectation of $Y_i$ given $X_i = x$,

$$m(x) = E(Y_i \mid X_i = x)$$

In the SRD design, treatment is determined solely by the value of the forcing variable $X_i$ being on either side of a fixed and known threshold $c$:

$$W_i = I\{X_i \geq c\}$$

Then, we focus on the average effect of the treatment for units with covariate values equal to the threshold:

$$\tau_{SRD} = E(Y_i(1) - Y_i(0) \mid X_i = c)$$

If the conditional distribution functions $F_{Y(0)|X}(y|x)$ and $F_{Y(1)|X}(y|x)$ are continuous in $x$ for all $y$, and the conditional first moments $E(Y_i(1) \mid X_i = x)$ and $E(Y_i(0) \mid X_i = x)$ exist and are continuous at $x = c$, then

$$\tau_{SRD} = \lim_{x \downarrow c} m(x) - \lim_{x \uparrow c} m(x) = \mu_+ - \mu_-$$
The estimand is the difference of two regression functions evaluated at boundary points.
We use local linear regression on each side to estimate $\tau_{SRD}$.

Fit local linear regressions:

$$(\hat{\alpha}_-(x), \hat{\beta}_-(x)) = \arg\min_{\alpha, \beta} \sum_{i=1}^{N} I\{X_i < x\}\,(Y_i - \alpha - \beta(X_i - x))^2\, K\!\left(\frac{X_i - x}{h}\right)$$

$$(\hat{\alpha}_+(x), \hat{\beta}_+(x)) = \arg\min_{\alpha, \beta} \sum_{i=1}^{N} I\{X_i \geq x\}\,(Y_i - \alpha - \beta(X_i - x))^2\, K\!\left(\frac{X_i - x}{h}\right)$$

Then, the estimated regression function $\hat{m}_h(\cdot)$ at $x$ is

$$\hat{m}_h(x) = \begin{cases} \hat{\alpha}_-(x) & \text{if } x < c \\ \hat{\alpha}_+(x) & \text{if } x \geq c \end{cases}$$

and the estimated $\tau_{SRD}$ is

$$\hat{\tau}_{SRD} = \hat{\mu}_+ - \hat{\mu}_-, \quad \text{where } \hat{\mu}_- = \lim_{x \uparrow c} \hat{m}_h(x) = \hat{\alpha}_-(c), \quad \hat{\mu}_+ = \lim_{x \downarrow c} \hat{m}_h(x) = \hat{\alpha}_+(c)$$
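To make the estimator concrete, here is a minimal Python sketch of the two one-sided local linear fits, assuming the edge (triangular) kernel; the function names (`llr_at_boundary`, `tau_srd`) are mine, not from the paper.

```python
import numpy as np

def llr_at_boundary(x, y, c, h, side):
    """One-sided local linear fit; returns alpha-hat, i.e. m-hat at the boundary c."""
    mask = (x < c) if side == "-" else (x >= c)
    u = x[mask] - c
    w = np.maximum(0.0, 1.0 - np.abs(u) / h)   # edge (triangular) kernel weights
    keep = w > 0
    u, w, yy = u[keep], w[keep], y[mask][keep]
    D = np.column_stack([np.ones_like(u), u])  # intercept + slope in (X_i - c)
    alpha, beta = np.linalg.solve(D.T @ (w[:, None] * D), D.T @ (w * yy))
    return alpha                               # estimate of mu_- or mu_+

def tau_srd(x, y, c, h):
    """Sharp RD estimate: difference of the two boundary intercepts."""
    return llr_at_boundary(x, y, c, h, "+") - llr_at_boundary(x, y, c, h, "-")
```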
2. Error Criterion and Infeasible optimal bandwidth choice

1) Error criteria
Existing approaches to the optimal choice of the bandwidth $h$: cross-validation or ad hoc methods.
The criterion used for cross-validation is the mean integrated squared error (MISE):

$$MISE(h) = E\left(\int_x (\hat{m}_h(x) - m(x))^2 f(x)\, dx\right)$$

where $f(x)$ is the density of the forcing variable.
Problem
The optimal bandwidth $h$ obtained from $MISE(h)$ is the $h$ that makes $\hat{m}_h(x)$ best, not the $h$ that makes $\hat{\tau}_{SRD}$ best; in other words, the objective function is different.

Properties of $\tau_{SRD}$

(1) $\tau_{SRD}$ only needs the left and right limits of $m(x)$ at $x = c$. Since $m(x)$ is estimated by running a local linear regression separately on the data from each side of $x = c$, only two local linear regressions are run, and only two values ($\hat{\mu}_-$, $\hat{\mu}_+$) are used to estimate $\tau_{SRD}$.

(2) The two estimated values ($\hat{\mu}_-$, $\hat{\mu}_+$) are evaluated at a boundary point!

Therefore, the best $h$ should be estimated with an error criterion other than the MISE. That is, define the mean squared error for $\tau_{SRD}$,

$$MSE(h) = E\left((\hat{\tau}_{SRD} - \tau_{SRD})^2\right) = E\left(\left((\hat{\mu}_+ - \mu_+) - (\hat{\mu}_- - \mu_-)\right)^2\right)$$

The optimal bandwidth $h$ in the RD design is the $h$ minimizing this $MSE(h)$:
$$h^* = \arg\min_h MSE(h)$$

Problem

Even as the sample size grows, $h^*$ may fail to converge to 0.
(Because estimation is done separately on each side, the biases can cancel out.)
It does not seem appropriate to base estimation on global criteria when identification is local.
- Focus on the bandwidth that minimizes a first-order approximation to $MSE(h)$: the asymptotic mean squared error $AMSE(h)$.
Second concern: a single bandwidth

Since two local linear regressions are run, each regression could have its own optimal bandwidth. Therefore, one could instead find the $h_-$, $h_+$ minimizing

$$MSE(h_-, h_+) = E\left(\left((\hat{\mu}_+(h_+) - \mu_+) - (\hat{\mu}_-(h_-) - \mu_-)\right)^2\right)$$

Problem

Suppose the biases of both estimators are strictly increasing in the bandwidth. Then we can choose $h_+(h_-)$ such that the biases of the RD estimate cancel out:

$$\left(E(\hat{\mu}_-(h_-)) - \mu_-\right) - \left(E(\hat{\mu}_+(h_+(h_-))) - \mu_+\right) = 0$$

After setting $h_-$ large, one only needs to find an appropriate $h_+$ that makes this bias zero. That is, the bias can be zero even as the bandwidths grow without bound, so this criterion can be problematic in practice!
2) An asymptotic expansion of the expected error
Notation
$m^{(k)}_+(c)$: right limit of the $k$th derivative of $m(x)$ at the threshold $c$
$m^{(k)}_-(c)$: left limit of the $k$th derivative of $m(x)$ at the threshold $c$
$\sigma^2_+(c)$: right limit of the conditional variance $\sigma^2(x) = Var(Y_i \mid X_i = x)$ at the threshold $c$
$\sigma^2_-(c)$: left limit of the conditional variance $\sigma^2(x) = Var(Y_i \mid X_i = x)$ at the threshold $c$
Assumptions
(1) $(Y_i, X_i)$, for $i = 1, \ldots, N$, are i.i.d.
(2) The marginal distribution of the forcing variable $X_i$, denoted $f(\cdot)$, is continuous and bounded away from zero at the threshold $c$.
(3) The conditional mean $m(x) = E(Y_i \mid X_i = x)$ has at least three continuous derivatives in an open neighbourhood of $X = c$. The right and left limits of the $k$th derivative of $m(x)$ at the threshold $c$ are denoted by $m^{(k)}_+(c)$ and $m^{(k)}_-(c)$.
(4) The kernel $K(\cdot)$ is non-negative, bounded, differs from zero on a compact interval $[0, a]$, and is continuous on $(0, a)$.
(5) The conditional variance function $\sigma^2(x) = Var(Y_i \mid X_i = x)$ is bounded in an open neighbourhood of $X = c$ and right and left continuous at $c$.
(6) The second derivatives from the right and the left differ at the threshold: $m^{(2)}_+(c) \neq m^{(2)}_-(c)$.
**Definition:** $AMSE(h)$

$$AMSE(h) = C_1 h^4 \left(m^{(2)}_+(c) - m^{(2)}_-(c)\right)^2 + \frac{C_2}{Nh}\left(\frac{\sigma^2_+(c)}{f(c)} + \frac{\sigma^2_-(c)}{f(c)}\right)$$

$C_1$, $C_2$ are functions of the kernel:

$$C_1 = \frac{1}{4}\left(\frac{\nu_2^2 - \nu_1 \nu_3}{\nu_2 \nu_0 - \nu_1^2}\right)^2, \quad C_2 = \frac{\nu_2^2 \pi_0 - 2\nu_1\nu_2\pi_1 + \nu_1^2 \pi_2}{\left(\nu_2 \nu_0 - \nu_1^2\right)^2}$$

where

$$\nu_j = \int_0^\infty u^j K(u)\, du, \quad \pi_j = \int_0^\infty u^j K^2(u)\, du$$

In the AMSE, the first term,

$$C_1 h^4 \left(m^{(2)}_+(c) - m^{(2)}_-(c)\right)^2,$$

corresponds to the square of the bias, and the second term,

$$\frac{C_2}{Nh}\left(\frac{\sigma^2_+(c)}{f(c)} + \frac{\sigma^2_-(c)}{f(c)}\right),$$

corresponds to the variance.
The bias term clarifies the role that assumption (6) plays.
The leading term in the expansion of the bias is of order $h^4$ if assumption (6) holds.
If assumption (6) does not hold, the bias converges to zero faster, allowing estimation of $\tau_{SRD}$ at a faster rate of convergence.
(In practice it is hard to check whether the second derivatives are equal, so we proceed under assumption (6). Even if (6) does not hold, the proposed estimator of $\tau_{SRD}$ remains consistent.)
(The way the optimal bandwidth is found can differ between the equal and unequal second-derivative cases; this paper deals with the case where they differ.)
Lemma 1 (Mean Squared Error Approximation and Optimal Bandwidth)
(1) Suppose assumptions (1) - (5) hold. Then
$$MSE(h) = AMSE(h) + o_p\left(h^4 + \frac{1}{Nh}\right)$$

(2) Suppose that assumption (6) also holds. Then,

$$h_{opt} = \arg\min_h AMSE(h) = C_K \left(\frac{\sigma^2_+(c) + \sigma^2_-(c)}{f(c)\left(m^{(2)}_+(c) - m^{(2)}_-(c)\right)^2}\right)^{1/5} N^{-1/5}$$

where $C_K = \left(\frac{C_2}{4 C_1}\right)^{1/5}$, indexed by the kernel $K(\cdot)$.

For the edge kernel, with $K(u) = I\{|u| \leq 1\}(1 - |u|)$, the constant is $C_{K,\text{edge}} \approx 3.4375$.
For the uniform kernel, with $K(u) = I\{|u| \leq 1/2\}$, the constant is $C_{K,\text{uniform}} \approx 5.40$.
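As a sanity check, these constants can be computed numerically from the one-sided kernel moments $\nu_j$ and $\pi_j$; a sketch (the helper name `ck` is mine) that should print roughly 3.4375 and 5.40:

```python
from scipy.integrate import quad

def ck(kernel, upper):
    """C_K = (C_2 / (4 C_1))^(1/5) from the one-sided kernel moments nu_j, pi_j."""
    nu = [quad(lambda u, j=j: u**j * kernel(u), 0, upper)[0] for j in range(4)]
    pi = [quad(lambda u, j=j: u**j * kernel(u)**2, 0, upper)[0] for j in range(3)]
    c1 = 0.25 * ((nu[2]**2 - nu[1] * nu[3]) / (nu[2] * nu[0] - nu[1]**2))**2
    c2 = (nu[2]**2 * pi[0] - 2 * nu[1] * nu[2] * pi[1] + nu[1]**2 * pi[2]) \
         / (nu[2] * nu[0] - nu[1]**2)**2
    return (c2 / (4.0 * c1))**0.2

print(ck(lambda u: 1.0 - abs(u), 1.0))  # edge kernel:    ~3.4375
print(ck(lambda u: 1.0, 0.5))           # uniform kernel: ~5.40
```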
3. Feasible optimal bandwidth choice

1) A simple plug-in bandwidth
The values needed in $h_{opt}$:
Replace $\sigma^2_+(c)$, $\sigma^2_-(c)$, $f(c)$, $m^{(2)}_+(c)$, $m^{(2)}_-(c)$ with consistent estimators of these unknown quantities (the kernel $K(\cdot)$ is chosen by the user):

$$\tilde{h}_{opt} = C_K \left(\frac{\hat{\sigma}^2_+(c) + \hat{\sigma}^2_-(c)}{\hat{f}(c)\left(\hat{m}^{(2)}_+(c) - \hat{m}^{(2)}_-(c)\right)^2}\right)^{1/5} N^{-1/5}$$

Problem
Problems can arise when the first-order bias is very small.
In that case we may have $m^{(2)}_+(c) \approx m^{(2)}_-(c)$, so the denominator in the $h_{opt}$ formula is close to zero and the resulting bandwidth can be very large; the bandwidth estimate is then imprecise and its variance large.
In addition, the estimator of $\tau_{SRD}$ then has poor properties, because the true finite-sample bias depends on global properties of the regression function that are not captured by the asymptotic approximation used to calculate the bandwidth.
(1) Regularization

Adjust the formula so that the denominator of $h_{opt}$ cannot become zero.
The bias in the plug-in estimator for the reciprocal of the squared difference in second derivatives is
$$E\left(\frac{1}{\left(\hat{m}^{(2)}_+(c) - \hat{m}^{(2)}_-(c)\right)^2} - \frac{1}{\left(m^{(2)}_+(c) - m^{(2)}_-(c)\right)^2}\right) = \frac{3\left(Var\left(\hat{m}^{(2)}_+(c)\right) + Var\left(\hat{m}^{(2)}_-(c)\right)\right)}{\left(m^{(2)}_+(c) - m^{(2)}_-(c)\right)^4} + o\left(N^{-2\alpha}\right)$$

Then, for $r = 3\left(Var(\hat{m}^{(2)}_+(c)) + Var(\hat{m}^{(2)}_-(c))\right)$, the bias in the modified estimator for the reciprocal of the squared difference in second derivatives is of lower order:

$$E\left(\frac{1}{\left(\hat{m}^{(2)}_+(c) - \hat{m}^{(2)}_-(c)\right)^2 + r} - \frac{1}{\left(m^{(2)}_+(c) - m^{(2)}_-(c)\right)^2}\right) = o\left(N^{-2\alpha}\right)$$

This in turn motivates the modified bandwidth estimator

$$\hat{h}_{opt} = C_K \left(\frac{\hat{\sigma}^2_-(c) + \hat{\sigma}^2_+(c)}{\hat{f}(c)\left(\left(\hat{m}^{(2)}_+(c) - \hat{m}^{(2)}_-(c)\right)^2 + r_+ + r_-\right)}\right)^{1/5} N^{-1/5}$$

where

$$r_+ = 3\,\widehat{Var}\left(\hat{m}^{(2)}_+(c)\right), \quad r_- = 3\,\widehat{Var}\left(\hat{m}^{(2)}_-(c)\right)$$

Then this bandwidth will not become infinite even when the difference in curvatures at the threshold is zero.
(2) Implementing the regularization
We estimate the second derivative $m^{(2)}_+(c)$ by fitting a quadratic function to the observations with $X_i \in [c, c+h]$.
The initial bandwidth $h$ here differs from the bandwidth $\hat{h}_{opt}$ used in the estimation of $\tau_{SRD}$.
Notation
$N_{h,+}$: the number of units with covariate values in this interval
$\bar{X} = \frac{1}{N_{h,+}} \sum_{c \leq X_i \leq c+h} X_i$
$\hat{\mu}_{j,h,+} = \frac{1}{N_{h,+}} \sum_{c \leq X_i \leq c+h} (X_i - \bar{X})^j$: $j$th centered moment of the $X_i$ in the interval $[c, c+h]$
Then, we can get $r_+$:

$$r_+ = \frac{12}{N_{h,+}} \cdot \frac{\sigma^2_+(c)}{\hat{\mu}_{4,h,+} - (\hat{\mu}_{2,h,+})^2 - (\hat{\mu}_{3,h,+})^2/\hat{\mu}_{2,h,+}}$$

However, since fourth moments are difficult to estimate precisely, we approximate this expression by exploiting the fact that, for small $h$, the distribution of the forcing variable can be approximated by a uniform distribution on $[c, c+h]$, so that

$$\mu_{2,h,+} \approx h^2/12, \quad \mu_{3,h,+} \approx 0, \quad \mu_{4,h,+} \approx h^4/80$$
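Plugging these approximations into the expression for $r_+$ shows where the constant 2160 below comes from (the $\mu_3^2/\mu_2$ term drops out since $\mu_{3,h,+} \approx 0$):

$$\frac{12}{\mu_{4,h,+} - \mu_{2,h,+}^2} \approx \frac{12}{h^4/80 - h^4/144} = \frac{12 \cdot 720}{4 h^4} = \frac{2160}{h^4}$$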
Using these facts,

$$\hat{r}_+ = \frac{2160\, \hat{\sigma}^2_+(c)}{N_{h,+}\, h^4}, \quad \hat{r}_- = \frac{2160\, \hat{\sigma}^2_-(c)}{N_{h,-}\, h^4}$$

Then, using $\hat{r} = \hat{r}_- + \hat{r}_+$, we get

$$\hat{h}_{opt} = C_K \left(\frac{\hat{\sigma}^2_-(c) + \hat{\sigma}^2_+(c)}{\hat{f}(c)\left(\left(\hat{m}^{(2)}_+(c) - \hat{m}^{(2)}_-(c)\right)^2 + \hat{r}_+ + \hat{r}_-\right)}\right)^{1/5} N^{-1/5}$$

- Check
We need specific estimators $\hat{\sigma}^2_+(c)$, $\hat{\sigma}^2_-(c)$, $\hat{f}(c)$, $\hat{m}^{(2)}_+(c)$, $\hat{m}^{(2)}_-(c)$.
Any combination of consistent estimators for $\sigma^2_+(c)$, $\sigma^2_-(c)$, $f(c)$, $m^{(2)}_+(c)$, $m^{(2)}_-(c)$ substituted into the expression, with or without the regularization terms, will have the same optimality properties.
The proposed estimator is relatively simple, but the more important point is that it is a specific estimator: it gives a convenient starting point and benchmark for sensitivity analyses regarding bandwidth choice.
The bandwidth selection algorithm turns out to be relatively robust to these choices.
2) An algorithm for bandwidth selection

(1) Step 1. Estimation of the density $f(c)$ and the conditional variances $\sigma^2_-(c)$ and $\sigma^2_+(c)$
First, calculate the sample variance of the forcing variable, $S_X^2 = \sum_i (X_i - \bar{X})^2/(N-1)$.
Use the Silverman rule to get a pilot bandwidth for calculating the density and variances at $c$.
For a normal kernel and a normal reference density: $h = 1.06\, S_X N^{-1/5}$.
Modification: uniform kernel on $[-1, 1]$ and a normal reference density:

$$h_1 = 1.84\, S_X N^{-1/5}$$

Then, calculate

$$N_{h_1,-} = \sum_{i=1}^N I\{c - h_1 \leq X_i < c\}, \quad N_{h_1,+} = \sum_{i=1}^N I\{c \leq X_i \leq c + h_1\}$$

$$\bar{Y}_{h_1,-} = \frac{1}{N_{h_1,-}} \sum_{c - h_1 \leq X_i < c} Y_i, \quad \bar{Y}_{h_1,+} = \frac{1}{N_{h_1,+}} \sum_{c \leq X_i \leq c + h_1} Y_i$$

Now estimate the density of $X_i$ at $c$ as

$$\hat{f}(c) = \frac{N_{h_1,-} + N_{h_1,+}}{2 N h_1}$$

and estimate the limits of the conditional variance of $Y_i$ given $X_i = x$ at $x = c$:

$$\hat{\sigma}^2_-(c) = \frac{1}{N_{h_1,-} - 1} \sum_{c - h_1 \leq X_i < c} (Y_i - \bar{Y}_{h_1,-})^2, \quad \hat{\sigma}^2_+(c) = \frac{1}{N_{h_1,+} - 1} \sum_{c \leq X_i \leq c + h_1} (Y_i - \bar{Y}_{h_1,+})^2$$

These estimators are consistent for the density and the conditional variances, respectively.
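A minimal sketch of Step 1 in Python, following the formulas above (the function name `step1` is mine):

```python
import numpy as np

def step1(x, y, c):
    """Pilot bandwidth, density at c, and one-sided conditional variances."""
    n = len(x)
    h1 = 1.84 * np.std(x, ddof=1) * n ** (-0.2)        # modified Silverman rule
    left = (x >= c - h1) & (x < c)
    right = (x >= c) & (x <= c + h1)
    f_c = (left.sum() + right.sum()) / (2.0 * n * h1)  # density estimate f-hat(c)
    var_minus = np.var(y[left], ddof=1)                # sigma^2_-(c) estimate
    var_plus = np.var(y[right], ddof=1)                # sigma^2_+(c) estimate
    return h1, f_c, var_minus, var_plus
```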
(2) Step 2. Estimation of the second derivatives $\hat{m}^{(2)}_+(c)$ and $\hat{m}^{(2)}_-(c)$

First, we need pilot bandwidths $h_{2,-}$, $h_{2,+}$. Fit a third-order polynomial to the data, including an indicator for $X_i \geq c$:

$$Y_i = \gamma_0 + \gamma_1 I\{X_i \geq c\} + \gamma_2 (X_i - c) + \gamma_3 (X_i - c)^2 + \gamma_4 (X_i - c)^3 + \epsilon_i$$

and estimate $m^{(3)}(c)$ as $\hat{m}^{(3)}(c) = 6\hat{\gamma}_4$.
Note that $\hat{m}^{(3)}(c)$ is in general not a consistent estimate of $m^{(3)}(c)$, but it converges to some constant at a parametric rate. However, we do not need a consistent estimate of the third derivative at $c$ here to obtain a consistent estimator of the second derivative.

Calculate $h_{2,+}$, $h_{2,-}$:

$$h_{2,+} = 3.56 \left(\frac{\hat{\sigma}^2_+(c)}{\hat{f}(c)\left(\hat{m}^{(3)}(c)\right)^2}\right)^{1/7} N_+^{-1/7}, \quad h_{2,-} = 3.56 \left(\frac{\hat{\sigma}^2_-(c)}{\hat{f}(c)\left(\hat{m}^{(3)}(c)\right)^2}\right)^{1/7} N_-^{-1/7}$$

where $N_-$ and $N_+$ are the numbers of observations to the left and right of the threshold, respectively.

$h_{2,-}$, $h_{2,+}$ are estimates of the optimal bandwidth for calculating the second derivative at a boundary point using a local quadratic fit and a uniform kernel.

Given the pilot bandwidth $h_{2,+}$, we estimate the curvature $m^{(2)}_+(c)$ by a local quadratic fit. To be precise, temporarily discard all observations other than the $N_{2,+}$ observations with $c \leq X_i \leq c + h_{2,+}$.
Label the new data
$$\hat{\boldsymbol{Y}}_+ = (Y_1, \ldots, Y_{N_{2,+}})', \quad \hat{\boldsymbol{X}}_+ = (X_1, \ldots, X_{N_{2,+}})', \quad \boldsymbol{T} = \left[\boldsymbol{1} \;\; \boldsymbol{T}_1 \;\; \boldsymbol{T}_2\right]$$

where $\boldsymbol{T}_j' = \left((X_1 - c)^j, \ldots, (X_{N_{2,+}} - c)^j\right)$.
The estimated regression coefficients are
$$\hat{\lambda} = \left(\boldsymbol{T}'\boldsymbol{T}\right)^{-1}\boldsymbol{T}'\hat{\boldsymbol{Y}}_+$$

and calculate $\hat{m}^{(2)}_+(c) = 2\hat{\lambda}_3$.

Similarly, we can calculate $\hat{m}^{(2)}_-(c)$.
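A sketch of Step 2, reusing the Step 1 outputs; the global cubic with a jump dummy gives $\hat{m}^{(3)}(c) = 6\hat{\gamma}_4$, and each curvature comes from a one-sided local quadratic fit (function names are mine):

```python
import numpy as np

def step2(x, y, c, f_c, var_minus, var_plus):
    """Pilot bandwidths h2+/h2- and one-sided curvature estimates."""
    d = x - c
    # global cubic with an indicator for X >= c; m^(3)(c) estimated as 6*gamma_4
    D = np.column_stack([np.ones_like(d), (d >= 0).astype(float), d, d**2, d**3])
    gamma = np.linalg.lstsq(D, y, rcond=None)[0]
    m3 = 6.0 * gamma[4]

    n_plus, n_minus = (x >= c).sum(), (x < c).sum()
    h2_plus = 3.56 * (var_plus / (f_c * m3**2)) ** (1 / 7) * n_plus ** (-1 / 7)
    h2_minus = 3.56 * (var_minus / (f_c * m3**2)) ** (1 / 7) * n_minus ** (-1 / 7)

    def curvature(mask):
        # local quadratic fit on one side; second derivative = 2 * lambda_3
        T = np.column_stack([np.ones(mask.sum()), d[mask], d[mask]**2])
        lam = np.linalg.lstsq(T, y[mask], rcond=None)[0]
        return 2.0 * lam[2], mask.sum()

    m2_plus, n2_plus = curvature((x >= c) & (x <= c + h2_plus))
    m2_minus, n2_minus = curvature((x >= c - h2_minus) & (x < c))
    return m2_plus, m2_minus, h2_plus, h2_minus, n2_plus, n2_minus
```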
(3) Step 3. Calculation of the regularization terms $\hat{r}_-$ and $\hat{r}_+$ and of $\hat{h}_{opt}$
Given the previous steps, the regularization terms are calculated as follows:
$$\hat{r}_+ = \frac{2160\, \hat{\sigma}^2_+(c)}{N_{2,+}\, h_{2,+}^4}, \quad \hat{r}_- = \frac{2160\, \hat{\sigma}^2_-(c)}{N_{2,-}\, h_{2,-}^4}$$

Then, finally, we get the proposed bandwidth:

$$\hat{h}_{opt} = C_K \left(\frac{\hat{\sigma}^2_-(c) + \hat{\sigma}^2_+(c)}{\hat{f}(c)\left(\left(\hat{m}^{(2)}_+(c) - \hat{m}^{(2)}_-(c)\right)^2 + \hat{r}_+ + \hat{r}_-\right)}\right)^{1/5} N^{-1/5}$$

Given the bandwidth $\hat{h}_{opt}$, we get

$$\hat{\tau}_{SRD} = \lim_{x \downarrow c} \hat{m}_{\hat{h}_{opt}}(x) - \lim_{x \uparrow c} \hat{m}_{\hat{h}_{opt}}(x)$$

where $\hat{m}_h(x)$ is the local linear regression estimator.
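Step 3 then just assembles the pieces; a sketch, with $C_K = 3.4375$ for the edge kernel, using `step1`, `step2`, and `tau_srd` from the earlier sketches:

```python
def h_opt_ik(x, y, c, ck=3.4375):
    """Assemble the regularized plug-in bandwidth from the step1/step2 sketches."""
    h1, f_c, var_minus, var_plus = step1(x, y, c)
    m2_plus, m2_minus, h2p, h2m, n2p, n2m = step2(x, y, c, f_c, var_minus, var_plus)
    r_plus = 2160.0 * var_plus / (n2p * h2p**4)    # regularization terms
    r_minus = 2160.0 * var_minus / (n2m * h2m**4)
    denom = f_c * ((m2_plus - m2_minus)**2 + r_plus + r_minus)
    return ck * ((var_minus + var_plus) / denom) ** 0.2 * len(x) ** (-0.2)

# usage: h = h_opt_ik(x, y, c); tau_hat = tau_srd(x, y, c, h)
```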
3) Properties of the algorithm
First, the resulting RD estimator $\hat{\tau}_{SRD}$ is consistent at the best rate for non-parametric regression functions at a point.
Second, the estimated constant term in the reference bandwidth converges to the best constant.
Third, we have a Li-type optimality result for the mean squared error and consistency at the optimal rate for the RD estimate.
Theorem: Properties of $\hat{h}_{opt}$

Suppose assumptions (1) - (5) hold. Then:

(1) Consistency: if assumption (6) holds, then

$$\hat{\tau}_{SRD} - \tau_{SRD} = O_p\left(N^{-2/5}\right)$$

(2) Consistency: if assumption (6) does not hold, then

$$\hat{\tau}_{SRD} - \tau_{SRD} = O_p\left(N^{-3/7}\right)$$

(3) Convergence of the bandwidth:

$$\frac{\hat{h}_{opt} - h_{opt}}{h_{opt}} = o_p(1)$$

(4) Li's optimality:

$$\frac{MSE(\hat{h}_{opt}) - MSE(h_{opt})}{MSE(h_{opt})} = o_p(1)$$

If assumption (6) does not hold, we can have

$$m^{(2)}_+(c) = m^{(2)}_-(c)$$

implying that the bias term of the AMSE vanishes, which improves the rate of convergence.
4) DesJardins-McCall bandwidth selection

The objective criterion is different:

$$E\left((\hat{\mu}_+ - \mu_+)^2 + (\hat{\mu}_- - \mu_-)^2\right)$$

The single optimal bandwidth based on the DesJardins and McCall criterion is

$$h_{DM} = C_K \left(\frac{\sigma^2_+(c) + \sigma^2_-(c)}{f(c)\left(m^{(2)}_+(c)^2 + m^{(2)}_-(c)^2\right)}\right)^{1/5} N^{-1/5}$$

In large samples this leads to a smaller bandwidth than our proposed choice if the second derivatives are of the same sign. Also, DesJardins and McCall actually use different bandwidths on the left and the right, as well as an Epanechnikov kernel.
5) Ludwig-Miller cross-validation

Let $N_-$ and $N_+$ be the numbers of observations with $X_i < c$ and $X_i \geq c$. For $\delta \in (0, 1)$, let $\theta_-(\delta)$ and $\theta_+(\delta)$ be the $\delta$th quantile of the $X_i$ among the subsamples of observations with $X_i < c$ and $X_i \geq c$, respectively, so that

$$\theta_-(\delta) = \arg\min_a \left\{a : \sum_{i=1}^N I\{X_i \leq a\} \geq \delta N_-\right\}, \quad \theta_+(\delta) = \arg\min_a \left\{a : \sum_{i=1}^N I\{c \leq X_i \leq a\} \geq \delta N_+\right\}$$

Then the LM cross-validation criterion is of the form

$$CV_\delta(h) = \sum_{i=1}^N I\{\theta_-(1-\delta) \leq X_i \leq \theta_+(\delta)\}\,(Y_i - \hat{m}_h(X_i))^2$$

A key feature of $\hat{m}_h(x)$ is that for values of $x < c$ it only uses observations with $X_i < x$ to estimate $m(x)$, and for values of $x \geq c$ it only uses observations with $X_i > x$, so that $\hat{m}_h(X_i)$ does not depend on $Y_i$, as is necessary for cross-validation.

By using a value of $\delta$ close to zero, only observations close to the threshold are used to evaluate the cross-validation criterion.
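A sketch of the LM criterion, reusing `llr_at_boundary` from the earlier sketch; `np.quantile` stands in for the paper's $\theta_\pm(\delta)$ definitions:

```python
import numpy as np

def cv_lm(x, y, c, h, delta):
    """Ludwig-Miller CV criterion with one-sided fits near the threshold."""
    lo = np.quantile(x[x < c], 1.0 - delta)   # theta_-(1 - delta)
    hi = np.quantile(x[x >= c], delta)        # theta_+(delta)
    err = 0.0
    for xi, yi in zip(x, y):
        if not (lo <= xi <= hi):
            continue                          # only points near the threshold
        if xi < c:                            # left of c: use only X_i < x
            mask, side = x < xi, "-"
        else:                                 # right of c: use only X_i > x
            mask, side = x > xi, "+"
        m_hat = llr_at_boundary(x[mask], y[mask], xi, h, side)
        err += (yi - m_hat) ** 2
    return err
```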
Issue
By using LM cross-validation, the criterion focuses on minimizing

$$E\left((\hat{\mu}_+ - \mu_+)^2 + (\hat{\mu}_- - \mu_-)^2\right)$$

rather than

$$E\left(\left((\hat{\mu}_+ - \hat{\mu}_-) - (\mu_+ - \mu_-)\right)^2\right)$$

Therefore, even letting $\delta \to 0$ with the sample size in the cross-validation procedure will not result in an optimal bandwidth.
5. Extension

1) Fuzzy regression discontinuity design
In the FRD design, the treatment $W_i$ is not a deterministic function of the forcing variable. Instead, the probability $P(W_i = 1 \mid X_i = x)$ changes discontinuously at the threshold $c$. In the FRD design, the treatment effect is the ratio of the jump in the outcome to the jump in the treatment probability, $\tau_{FRD} = \tau_Y / \tau_W$, defined below.

In this case, we need to estimate two regression functions, each at two boundary points:
- The expected outcome given the forcing variable, $E(Y_i \mid X_i = x)$, to the right and left of the threshold $c$
- The expected value of the treatment variable given the forcing variable, $E(W_i \mid X_i = x)$, to the right and left of $c$
Define
$$\tau_Y = \lim_{x \downarrow c} E(Y_i \mid X_i = x) - \lim_{x \uparrow c} E(Y_i \mid X_i = x), \quad \tau_W = \lim_{x \downarrow c} E(W_i \mid X_i = x) - \lim_{x \uparrow c} E(W_i \mid X_i = x)$$

with $\hat{\tau}_Y$, $\hat{\tau}_W$ denoting the corresponding estimators, so that

$$\tau_{FRD} = \frac{\tau_Y}{\tau_W}, \quad \hat{\tau}_{FRD} = \frac{\hat{\tau}_Y}{\hat{\tau}_W}$$

Then, we can approximate the difference $\hat{\tau}_{FRD} - \tau_{FRD}$ by

$$\hat{\tau}_{FRD} - \tau_{FRD} = \frac{1}{\tau_W}(\hat{\tau}_Y - \tau_Y) - \frac{\tau_Y}{\tau_W^2}(\hat{\tau}_W - \tau_W) + o_p\left((\hat{\tau}_Y - \tau_Y) + (\hat{\tau}_W - \tau_W)\right)$$

This is the basis for the asymptotic approximation to the MSE around $h = 0$:

$$AMSE_{FRD}(h) = C_1 h^4 \left(\frac{1}{\tau_W}\left(m^{(2)}_{Y,+}(c) - m^{(2)}_{Y,-}(c)\right) - \frac{\tau_Y}{\tau_W^2}\left(m^{(2)}_{W,+}(c) - m^{(2)}_{W,-}(c)\right)\right)^2 + \frac{C_2}{N h f(c)}\left(\frac{1}{\tau_W^2}\left(\sigma^2_{Y,+}(c) + \sigma^2_{Y,-}(c)\right) + \frac{\tau_Y^2}{\tau_W^4}\left(\sigma^2_{W,+}(c) + \sigma^2_{W,-}(c)\right) - \frac{2\tau_Y}{\tau_W^3}\left(\sigma_{YW,+}(c) + \sigma_{YW,-}(c)\right)\right)$$

where $C_1$ and $C_2$ are the same kernel constants defined earlier.
The difference between the SRD and FRD cases is the addition of the treatment probability; therefore, we also need to consider the variance of $W_i$ and the covariance of $W_i$ and $Y_i$.
The bandwidth that minimizes the AMSE in the fuzzy design is
$$h_{opt,FRD} = C_K N^{-1/5} \times \left(\frac{\left(\sigma^2_{Y,+}(c) + \sigma^2_{Y,-}(c)\right) + \tau_{FRD}^2\left(\sigma^2_{W,+}(c) + \sigma^2_{W,-}(c)\right) - 2\tau_{FRD}\left(\sigma_{YW,+}(c) + \sigma_{YW,-}(c)\right)}{f(c)\left(\left(m^{(2)}_{Y,+}(c) - m^{(2)}_{Y,-}(c)\right) - \tau_{FRD}\left(m^{(2)}_{W,+}(c) - m^{(2)}_{W,-}(c)\right)\right)^2}\right)^{1/5}$$

The analogue of the bandwidth proposed for the SRD is

$$\hat{h}_{opt,FRD} = C_K N^{-1/5} \times \left(\frac{\left(\hat{\sigma}^2_{Y,+}(c) + \hat{\sigma}^2_{Y,-}(c)\right) + \hat{\tau}_{FRD}^2\left(\hat{\sigma}^2_{W,+}(c) + \hat{\sigma}^2_{W,-}(c)\right) - 2\hat{\tau}_{FRD}\left(\hat{\sigma}_{YW,+}(c) + \hat{\sigma}_{YW,-}(c)\right)}{\hat{f}(c)\left(\left(\left(\hat{m}^{(2)}_{Y,+}(c) - \hat{m}^{(2)}_{Y,-}(c)\right) - \hat{\tau}_{FRD}\left(\hat{m}^{(2)}_{W,+}(c) - \hat{m}^{(2)}_{W,-}(c)\right)\right)^2 + \hat{r}_{Y,+} + \hat{r}_{Y,-} + \hat{\tau}_{FRD}\left(\hat{r}_{W,+} + \hat{r}_{W,-}\right)\right)}\right)^{1/5}$$
Implementation
First, using the algorithm described for the SRD case separately for the treatment indicator and the outcome, calculate

$$\hat{\tau}_{FRD},\ \hat{f}(c),\ \hat{\sigma}^2_{Y,+}(c),\ \hat{\sigma}^2_{Y,-}(c),\ \hat{\sigma}^2_{W,+}(c),\ \hat{\sigma}^2_{W,-}(c),\ \hat{m}^{(2)}_{Y,+}(c),\ \hat{m}^{(2)}_{Y,-}(c),\ \hat{m}^{(2)}_{W,+}(c),\ \hat{m}^{(2)}_{W,-}(c),\ \hat{r}_{Y,+},\ \hat{r}_{Y,-},\ \hat{r}_{W,+},\ \hat{r}_{W,-}$$

Second, using the initial Silverman bandwidth, use the deviations from the means to estimate the conditional covariances $\hat{\sigma}_{YW,+}(c)$ and $\hat{\sigma}_{YW,-}(c)$.
Then substitute everything into the expression for the bandwidth.
In practice, this often leads to bandwidth choices similar to those based on the optimal bandwidth for estimation of only the numerator of the RD estimand. One may therefore simply wish to use the basic algorithm ignoring the fact that the regression discontinuity design is fuzzy.
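Following that remark, a pragmatic sketch is to reuse the SRD machinery: pick the bandwidth from the outcome equation alone, then form the ratio of the two jumps (`tau_srd` and `h_opt_ik` are from the earlier sketches):

```python
def tau_frd(x, y, w, c):
    """Fuzzy RD estimate: ratio of outcome jump to treatment-probability jump."""
    h = h_opt_ik(x, y, c)                 # bandwidth from the outcome only
    return tau_srd(x, y, c, h) / tau_srd(x, w, c, h)
```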
2) Additional covariates
The presence of additional covariates does not affect the RD analysis very much, provided the distribution of the additional covariates does not exhibit a discontinuity around the threshold for the forcing variable; in that case, those covariates are approximately independent of the treatment indicator for samples constructed to be close to the threshold.
The covariates then only affect the precision of the estimator, and one can modify the previous analysis using the conditional variances of $Y_i$ given all covariates at the threshold, $\sigma^2_-(c \mid x)$ and $\sigma^2_+(c \mid x)$, instead of the variances $\sigma^2_-(c)$ and $\sigma^2_+(c)$ that condition only on the forcing variable.
In practice, this modification does not affect the optimal bandwidth much unless the additional covariates have great explanatory power, and the basic algorithm is likely to perform adequately even in the presence of covariates.
Summary

When applying local linear regression in an RD design, the parameter to choose is the bandwidth $h$.
Previously, as in bandwidth selection for ordinary local linear regression, the $h$ minimizing the MISE was applied to RD designs.
But the optimal bandwidth found with the MISE criterion is the $h$ that makes the function estimate $\hat{m}_h(x)$ (the local linear estimator) itself best.
Applying it directly to RD designs is problematic:
The quantity to estimate in an RD design differs from the target of ordinary local linear regression. Since the RD design only uses the estimates at the cutoff, we need the $h$ that best estimates the value at the cutoff, more precisely $\tau_{SRD}$, rather than the bandwidth that is best for the whole function.
$\tau_{SRD}$ is a somewhat special quantity: it is estimated at a boundary point.
Because of these two issues, bandwidth selection as used in ordinary local linear regression is problematic!
How was it solved?
Define the MSE for $\tau_{SRD}$, and call the $h$ minimizing the AMSE, which approximates that MSE, the optimal bandwidth!
Interpreting the AMSE
First term: the squared-bias term
Composition: the biases of the right-side and left-side estimates at $c$
Second term: the variance term
Composition: the variances of the right-side and left-side estimates at $c$
Since the AMSE closely approximates the MSE,
the $h$ minimizing the AMSE is the optimal $h$.
----- Actual estimation

There are six unknown quantities in the optimal-$h$ formula:
$C_K$: determined once the kernel function is selected
The remaining unknowns: as long as they are replaced by consistent estimators, the optimality results carry over
Problem: if the second derivatives of $m$ at $c$ are similar, trouble arises (the bandwidth can grow without bound)
Solution: regularization. Motivated by the bias of the reciprocal term in the denominator; adding the resulting term to the denominator keeps the error small and resolves the problem above. (Why add it to the denominator? Needs a bit more thought.)
In fact $r$ is unknown too, because $m(x)$ is unknown.
Solution: estimate it with a quadratic regression; since that is too complicated, an approximation is used.
How it is actually done: 3 steps
- Estimate $f(c)$, $\sigma^2_-(c)$, $\sigma^2_+(c)$
The $h$ used here is supplied by the Silverman rule
With the supplied $h$, use the empirical distribution of $X$ and the sample variance estimates
Point: these are all consistent! And other consistent estimators may be used instead.
- Estimate the second derivatives
The $h$ used here is obtained by fitting a third-order polynomial regression (one of the methods used for RD designs before local linear regression)
Why fit it? Because the third derivative is needed to estimate this $h$
Based on that estimate, compute $h_{2,+}$ and $h_{2,-}$, then estimate the second derivatives by second-order local polynomial regression
- Plug everything in to obtain $\hat{h}_{opt}$
It performed well, and the regularized version is better than the unregularized one
If I were to apply this
Mathematical issue: the local likelihood equations differ from the formulas given in this paper.
Problem: the proofs here go through because the local linear regressor has a closed form, but the model I want to use is probably not closed form, so I need to think more; I also still need to work out the interpretation for categorical outcomes. I understand the flow, but I have not solved this part; I need to find out how others worked it out.
- Papers found: that paper, plus only one applying the same methodology to ordinal outcomes
Search method: Google Scholar: far too many hits
With exact-phrase quotes (""): too few. Is that right?
Searched Scopus and two other sites, but it did not turn up there.