# ECON395 Leave-One-Out Regression

## Leave-One-Out Regression

Several statistical procedures (residual analysis, jackknife variance estimation, cross-validation, two-step estimation, and hold-out sample evaluation) make use of estimators constructed on sub-samples. Of particular importance is the case where we exclude a single observation and then repeat this for all observations. This is called leave-one-out (LOO) regression.

Specifically, the leave-one-out estimator of the regression coefficient $\beta$ is the least squares estimator constructed using the full sample excluding a single observation $i$. This can be written as
$$\begin{aligned} \widehat{\beta}_{(-i)} &=\left(\sum_{j \neq i} X_j X_j^{\prime}\right)^{-1}\left(\sum_{j \neq i} X_j Y_j\right) \\ &=\left(\boldsymbol{X}^{\prime} \boldsymbol{X}-X_i X_i^{\prime}\right)^{-1}\left(\boldsymbol{X}^{\prime} \boldsymbol{Y}-X_i Y_i\right) \\ &=\left(\boldsymbol{X}_{(-i)}^{\prime} \boldsymbol{X}_{(-i)}\right)^{-1} \boldsymbol{X}_{(-i)}^{\prime} \boldsymbol{Y}_{(-i)} . \end{aligned}$$
Here, $\boldsymbol{X}_{(-i)}$ and $\boldsymbol{Y}_{(-i)}$ are the data matrices omitting the $i^{\text{th}}$ row. The notation $\widehat{\beta}_{(-i)}$ or $\widehat{\beta}_{-i}$ is commonly used to denote an estimator with the $i^{\text{th}}$ observation omitted. There is a leave-one-out estimator for each observation, $i=1, \ldots, n$, so we have $n$ such estimators.
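As a rough illustration of the definition, the sketch below computes all $n$ leave-one-out estimators in two ways: by deleting row $i$ and refitting, and by subtracting the contribution of observation $i$ from $\boldsymbol{X}^{\prime}\boldsymbol{X}$ and $\boldsymbol{X}^{\prime}\boldsymbol{Y}$. The simulated data, the seed, and the function names `loo_coefficients` and `loo_coefficients_downdate` are assumptions made for this example, not part of the text.

```python
import numpy as np

def loo_coefficients(X, Y):
    """Row i holds the least squares estimate computed without observation i."""
    n, k = X.shape
    betas = np.empty((n, k))
    for i in range(n):
        keep = np.arange(n) != i              # boolean mask that drops row i
        X_sub, Y_sub = X[keep], Y[keep]
        betas[i] = np.linalg.solve(X_sub.T @ X_sub, X_sub.T @ Y_sub)
    return betas

def loo_coefficients_downdate(X, Y):
    """Same estimates via (X'X - X_i X_i')^{-1} (X'Y - X_i Y_i)."""
    XtX, XtY = X.T @ X, X.T @ Y
    return np.array([
        np.linalg.solve(XtX - np.outer(x_i, x_i), XtY - x_i * y_i)
        for x_i, y_i in zip(X, Y)
    ])

# Hypothetical simulated sample: an intercept plus one regressor.
rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.uniform(1, 10, n)])
Y = X @ np.array([0.0, 1.0]) + rng.normal(0, 2, n)

b_direct = loo_coefficients(X, Y)             # shape (n, k): one estimate per i
b_downdate = loo_coefficients_downdate(X, Y)
print(np.allclose(b_direct, b_downdate))
```

Both functions solve a $k \times k$ system $n$ times; the second simply avoids rebuilding the sub-sample matrices for each $i$.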

The leave-one-out predicted value for $Y_i$ is $\widetilde{Y}_i=X_i^{\prime} \widehat{\beta}_{(-i)}$. This is the predicted value obtained by estimating $\beta$ on the sample without observation $i$ and then using the covariate vector $X_i$ to predict $Y_i$. Notice that $\widetilde{Y}_i$ is an authentic prediction as $Y_i$ is not used to construct $\widetilde{Y}_i$. This is in contrast to the fitted values $\widehat{Y}_i$, which are functions of $Y_i$.
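One way to see why $\widetilde{Y}_i$ is an authentic prediction is to change $Y_i$ alone and check which quantities move. The following sketch does this on simulated data; the helper names `ols` and `loo_prediction`, the seed, the index `i = 5`, and the size of the perturbation are all illustrative assumptions.

```python
import numpy as np

def ols(X, Y):
    """Least squares coefficients (X'X)^{-1} X'Y."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

def loo_prediction(X, Y, i):
    """Predict Y_i from X_i using coefficients estimated without observation i."""
    keep = np.arange(len(Y)) != i
    return X[i] @ ols(X[keep], Y[keep])

# Hypothetical simulated sample: an intercept plus one regressor.
rng = np.random.default_rng(1)
n = 20
X = np.column_stack([np.ones(n), rng.uniform(1, 10, n)])
Y = X[:, 1] + rng.normal(0, 2, n)

i = 5
Y_perturbed = Y.copy()
Y_perturbed[i] += 10.0                 # change only Y_i

fitted_before = X[i] @ ols(X, Y)       # fitted value uses Y_i
fitted_after = X[i] @ ols(X, Y_perturbed)
loo_before = loo_prediction(X, Y, i)   # leave-one-out prediction never sees Y_i
loo_after = loo_prediction(X, Y_perturbed, i)

print("fitted value changed:", not np.isclose(fitted_before, fitted_after))
print("leave-one-out prediction changed:", not np.isclose(loo_before, loo_after))
```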

## Influential Observations

Another use of the leave-one-out estimator is to investigate the impact of influential observations, sometimes called outliers. We say that observation $i$ is influential if its omission from the sample induces a substantial change in a parameter estimate of interest.

For illustration consider Figure 3.4, which shows a scatter plot of realizations $\left(Y_i, X_i\right)$. The 25 observations shown with open circles are generated by $X_i \sim U[1,10]$ and $Y_i \sim \mathrm{N}\left(X_i, 4\right)$. The $26^{\text{th}}$ observation, shown with the filled circle, is $X_{26}=9$, $Y_{26}=0$. (Imagine that $Y_{26}=0$ was incorrectly recorded due to a mistaken key entry.) The figure shows both the least squares fitted line from the full sample and the line obtained after deleting the $26^{\text{th}}$ observation. In this example the $26^{\text{th}}$ observation (the "outlier") greatly tilts the least squares fitted line towards itself. Including it lowers the slope coefficient from $0.97$ (close to the true value of $1.00$) to $0.56$, a substantial reduction. Neither $Y_{26}$ nor $X_{26}$ is unusual relative to its marginal distribution, so this outlier would not have been detected by examining the marginal distributions of the data. The change in the slope coefficient of $-0.41$ is meaningful and should concern an applied economist.
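The experiment can be imitated with a short simulation. This is only a sketch under stated assumptions: the seed is arbitrary, the second argument of $\mathrm{N}\left(X_i, 4\right)$ is read as a variance (standard deviation 2), and the regression is assumed to include an intercept. Because the draws differ from those behind the figure, the slopes it prints will not match $0.97$ and $0.56$.

```python
import numpy as np

def ols(X, Y):
    """Least squares coefficients (X'X)^{-1} X'Y."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

rng = np.random.default_rng(42)        # arbitrary seed, not from the text

# 25 regular observations: X ~ U[1, 10], Y ~ N(X, 4) with 4 read as the variance.
x = rng.uniform(1, 10, 25)
y = x + rng.normal(0.0, 2.0, 25)

# 26th observation: the miskeyed point (X, Y) = (9, 0).
x = np.append(x, 9.0)
y = np.append(y, 0.0)

X = np.column_stack([np.ones_like(x), x])   # assumed design: intercept and slope

beta_full = ols(X, y)                  # all 26 observations
beta_drop = ols(X[:-1], y[:-1])        # 26th observation deleted

print(f"slope with the outlier:    {beta_full[1]:.2f}")
print(f"slope without the outlier: {beta_drop[1]:.2f}")
```

Since $X_{26}=9$ lies above the sample mean of the regressor while $Y_{26}=0$ lies far below the fitted line, including the point pulls the slope down, which is the qualitative pattern described above.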

From (3.43) we know that
$$\widehat{\beta}-\widehat{\beta}_{(-i)}=\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} X_i \widetilde{e}_i .$$
Here $\widetilde{e}_i=Y_i-\widetilde{Y}_i$ is the leave-one-out prediction error. Calculating this quantity for each observation $i$ reveals directly whether a specific observation is influential for a coefficient estimate of interest.
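The coefficient changes can be computed for every observation without refitting the regression $n$ times. The sketch below relies on the standard identity $\widetilde{e}_i=\widehat{e}_i /\left(1-h_{ii}\right)$, where $\widehat{e}_i$ is the full-sample residual and $h_{ii}=X_i^{\prime}\left(\boldsymbol{X}^{\prime}\boldsymbol{X}\right)^{-1}X_i$; that identity is not stated in this section and is brought in as an assumption here. The function names and the simulated data are likewise illustrative, and the refitting version is included only as a check.

```python
import numpy as np

def coefficient_influence(X, Y):
    """Row i is beta_hat - beta_hat_(-i) = (X'X)^{-1} X_i e_tilde_i."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y
    resid = Y - X @ beta_hat                          # full-sample residuals
    leverage = np.sum((X @ XtX_inv) * X, axis=1)      # h_ii
    e_tilde = resid / (1.0 - leverage)                # assumed shortcut identity
    # (X'X)^{-1} is symmetric, so row i of X @ XtX_inv is ((X'X)^{-1} X_i)'.
    return (X @ XtX_inv) * e_tilde[:, None]

def coefficient_influence_refit(X, Y):
    """Same quantity by deleting each observation and refitting (for checking)."""
    n = len(Y)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    out = np.empty(X.shape)
    for i in range(n):
        keep = np.arange(n) != i
        beta_loo = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ Y[keep])
        out[i] = beta_hat - beta_loo
    return out

# Hypothetical simulated sample: an intercept plus one regressor.
rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), rng.uniform(1, 10, n)])
Y = X[:, 1] + rng.normal(0, 2, n)

fast = coefficient_influence(X, Y)
slow = coefficient_influence_refit(X, Y)
print(np.allclose(fast, slow))
```

Sorting the rows of `fast` by the absolute value of the column of interest is one simple way to list the observations that move that coefficient most.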

For a general assessment, we can focus on the predicted values. The difference between the full-sample and leave-one-out predicted values is
$$\widehat{Y}_i-\widetilde{Y}_i=X_i^{\prime} \widehat{\beta}-X_i^{\prime} \widehat{\beta}_{(-i)}=X_i^{\prime}\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} X_i \widetilde{e}_i=h_{i i} \widetilde{e}_i$$
which is a simple function of the leverage values $h_{i i}$ and prediction errors $\widetilde{e}_i$. Observation $i$ is influential for the predicted value if $\left|h_{i i} \widetilde{e}_i\right|$ is large, which requires that both $h_{i i}$ and $\left|\widetilde{e}_i\right|$ are large.
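A minimal sketch of this diagnostic follows. It computes the leverage values, obtains the prediction errors from their definition by refitting without each observation, and checks numerically that $\widehat{Y}_i-\widetilde{Y}_i$ equals $h_{ii}\widetilde{e}_i$. The function name `prediction_influence`, the seed, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def prediction_influence(X, Y):
    """Return leverage h_ii, prediction errors e_tilde_i, and h_ii * e_tilde_i."""
    n = len(Y)
    XtX_inv = np.linalg.inv(X.T @ X)
    leverage = np.sum((X @ XtX_inv) * X, axis=1)   # h_ii = X_i'(X'X)^{-1} X_i
    e_tilde = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_loo = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ Y[keep])
        e_tilde[i] = Y[i] - X[i] @ beta_loo        # Y_i minus its LOO prediction
    return leverage, e_tilde, leverage * e_tilde

# Hypothetical simulated sample: an intercept plus one regressor.
rng = np.random.default_rng(3)
n = 25
X = np.column_stack([np.ones(n), rng.uniform(1, 10, n)])
Y = X[:, 1] + rng.normal(0, 2, n)

h, e_tilde, influence = prediction_influence(X, Y)

fitted = X @ np.linalg.solve(X.T @ X, X.T @ Y)     # full-sample fitted values
loo_pred = Y - e_tilde                             # leave-one-out predictions
print(np.allclose(fitted - loo_pred, influence))

# Index of the observation with the largest |h_ii * e_tilde_i|.
most_influential = int(np.argmax(np.abs(influence)))
print(most_influential)
```

An observation with a large prediction error but small leverage, or large leverage but a small prediction error, produces a small product, which is why both must be large for the predicted value to be affected.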


