## Multicollinearity

As discussed in Section 3.24, if $\boldsymbol{X}^{\prime} \boldsymbol{X}$ is singular then $\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}$ and $\widehat{\beta}$ are not defined. This situation is called strict multicollinearity, as the columns of $\boldsymbol{X}$ are linearly dependent, i.e., there is some $\alpha \neq 0$ such that $\boldsymbol{X} \alpha=0$. Most commonly this arises when the regressors include sets of variables that are identically related, for example an intercept together with a full set of mutually exclusive and exhaustive dummy variables. In Section 3.24 we discussed possible causes of strict multicollinearity, as well as the related problem of ill-conditioning, which can cause numerical inaccuracies in severe cases.
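A minimal sketch of strict multicollinearity, assuming the classic "dummy variable trap" as the source of the exact linear relation: an intercept plus dummies for two exhaustive groups, so that intercept $= d_1 + d_2$ and $\boldsymbol{X}\alpha = 0$ for $\alpha = (1, -1, -1)^{\prime}$. The group labels and helper names (`xtx`, `det3`) are hypothetical; the point is only that $\det(\boldsymbol{X}^{\prime}\boldsymbol{X}) = 0$.

```python
# Sketch: strict multicollinearity via the dummy variable trap (hypothetical data).

def xtx(X):
    """Return X'X for a matrix X stored as a list of rows."""
    k = len(X[0])
    return [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

# Columns: intercept, d1 (member of group 1), d2 (member of group 2).
groups = [1, 2, 1, 1, 2, 2, 1]
X = [[1, int(g == 1), int(g == 2)] for g in groups]

# alpha = (1, -1, -1) satisfies X alpha = 0 because intercept = d1 + d2.
alpha = [1, -1, -1]
X_alpha = [sum(r * a for r, a in zip(row, alpha)) for row in X]

print(X_alpha)        # all zeros: the columns of X are linearly dependent
print(det3(xtx(X)))   # 0: X'X is singular, so (X'X)^{-1} does not exist
```

Dropping either dummy (or the intercept) removes the exact linear relation and restores a nonsingular $\boldsymbol{X}^{\prime}\boldsymbol{X}$.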

A related and common situation is near multicollinearity, which is often called “multicollinearity” for brevity. This is the situation when the regressors are highly correlated. An implication of near multicollinearity is that individual coefficient estimates will be imprecise. This is not necessarily a problem for econometric analysis if the reported standard errors are accurate. However, robust standard errors can be sensitive to large leverage values, which can occur under near multicollinearity. This leads to the undesirable situation where the coefficient estimates are imprecise yet the standard errors are misleadingly small.
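The imprecision can be seen in a small Monte Carlo sketch. The data-generating process is an assumption chosen for illustration: standard normal regressors with correlation $\rho$, true coefficients $\beta_1 = \beta_2 = 1$, homoskedastic $N(0,1)$ errors and no intercept; the function name `simulate_sd_beta1` is hypothetical. It only looks at the spread of the point estimate $\widehat{\beta}_1$, not at the behavior of robust standard errors.

```python
import math
import random

def simulate_sd_beta1(rho, n=200, reps=500, seed=0):
    """Monte Carlo standard deviation of the OLS estimate of beta1.

    Assumed DGP (illustrative only):
        X1 ~ N(0,1), X2 = rho*X1 + sqrt(1 - rho^2)*Z with Z ~ N(0,1),
        Y = X1 + X2 + e, e ~ N(0,1), no intercept.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        s11 = s12 = s22 = s1y = s2y = 0.0
        for _ in range(n):
            x1 = rng.gauss(0, 1)
            x2 = rho * x1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
            y = x1 + x2 + rng.gauss(0, 1)
            # Accumulate the entries of X'X and X'Y.
            s11 += x1 * x1
            s12 += x1 * x2
            s22 += x2 * x2
            s1y += x1 * y
            s2y += x2 * y
        # First element of (X'X)^{-1} X'Y for a 2x2 system (Cramer's rule).
        det = s11 * s22 - s12 * s12
        estimates.append((s22 * s1y - s12 * s2y) / det)
    mean = sum(estimates) / reps
    return math.sqrt(sum((b - mean) ** 2 for b in estimates) / (reps - 1))

for rho in (0.0, 0.5, 0.9, 0.99):
    print(rho, round(simulate_sd_beta1(rho), 3))
```

The simulated standard deviation should grow roughly in proportion to $1/\sqrt{1-\rho^2}$ as the regressors become more correlated.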

We can see the impact of near multicollinearity on precision in a simple homoskedastic linear regression model with two regressors
$$Y=X_1 \beta_1+X_2 \beta_2+e$$
and
$$\frac{1}{n} \boldsymbol{X}^{\prime} \boldsymbol{X}=\left(\begin{array}{ll} 1 & \rho \\ \rho & 1 \end{array}\right)$$
In this case
$$\operatorname{var}[\widehat{\beta} \mid \boldsymbol{X}]=\frac{\sigma^2}{n}\left(\begin{array}{cc} 1 & \rho \\ \rho & 1 \end{array}\right)^{-1}=\frac{\sigma^2}{n\left(1-\rho^2\right)}\left(\begin{array}{cc} 1 & -\rho \\ -\rho & 1 \end{array}\right) .$$
The correlation $\rho$ indexes collinearity, since as $|\rho|$ approaches 1 the matrix becomes singular. We can see the effect of collinearity on precision by observing that the variance of each coefficient estimate, $\sigma^2\left[n\left(1-\rho^2\right)\right]^{-1}$, approaches infinity as $|\rho|$ approaches 1. Thus, the more “collinear” the regressors, the worse the precision of the individual coefficient estimates.
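As a numerical check of the closed form, the sketch below inverts $\frac{1}{n}\boldsymbol{X}^{\prime}\boldsymbol{X}$ directly with the $2\times 2$ adjugate formula and compares the diagonal of $\operatorname{var}[\widehat{\beta} \mid \boldsymbol{X}]$ with $\sigma^2\left[n\left(1-\rho^2\right)\right]^{-1}$. The values $\sigma^2 = 1$ and $n = 100$ and the helper names are arbitrary choices for illustration. The last column is the inflation factor $1/(1-\rho^2)$ relative to uncorrelated regressors.

```python
def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] via the adjugate formula."""
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def var_beta_hat(rho, sigma2=1.0, n=100):
    """var[beta_hat | X] = (sigma^2 / n) * [[1, rho], [rho, 1]]^{-1}."""
    inv = inverse_2x2(1.0, rho, rho, 1.0)
    return [[sigma2 / n * v for v in row] for row in inv]

for rho in (0.0, 0.5, 0.9, 0.99):
    V = var_beta_hat(rho)
    closed_form = 1.0 / (100 * (1 - rho ** 2))  # sigma^2 / [n (1 - rho^2)], sigma^2 = 1
    inflation = 1.0 / (1 - rho ** 2)           # variance relative to rho = 0
    print(rho, round(V[0][0], 5), round(closed_form, 5), round(inflation, 2))
```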

## Clustered Sampling

In Section 4.2 we briefly mentioned clustered sampling as an alternative to the assumption of random sampling. We now introduce the framework in more detail and extend the primary results of this chapter to encompass clustered dependence.

It might be easiest to understand the idea of clusters by considering a concrete example. Duflo, Dupas and Kremer (2011) investigate the impact of tracking (assigning students based on initial test score) on educational attainment in a randomized experiment. An extract of their data set is available on the textbook webpage in the file DDK2011.

In 2005, 140 primary schools in Kenya received funding to hire an extra first-grade teacher to reduce class sizes. In half of the schools (selected randomly), students were assigned to classrooms based on an initial test score (“tracking”); in the remaining schools, students were randomly assigned to classrooms. For their analysis the authors restricted attention to the 121 schools that initially had a single first-grade class.
The key regression ${ }^5$ in the paper is
$$\text{TestScore}_{ig}=-0.071+0.138\, \text{Tracking}_g+e_{ig}$$
where $\text{TestScore}_{ig}$ is the standardized test score (normalized to have mean 0 and variance 1) of student $i$ in school $g$, and $\text{Tracking}_g$ is a dummy equal to 1 if school $g$ was tracking. The OLS estimates indicate that schools which tracked their students had an overall increase in test scores of about 0.14 standard deviations, which is meaningful. More general versions of this regression are also estimated, many of which take the form
$$\text{TestScore}_{ig}=\alpha+\gamma\, \text{Tracking}_g+X_{ig}^{\prime} \beta+e_{ig}$$
where $X_{i g}$ is a set of controls specific to the student (including age, gender, and initial test score).
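In the first regression, $\text{Tracking}_g$ is the only regressor besides the intercept, so OLS reduces to a comparison of group means: the intercept is the mean of $\text{TestScore}_{ig}$ among students in non-tracking schools, and the slope is the difference between the tracking and non-tracking means. Applied to the reported estimates, this implies a mean standardized score of $-0.071$ in non-tracking schools and $-0.071+0.138=0.067$ in tracking schools (in the estimation sample). The sketch below checks the identity on made-up numbers; the scores are hypothetical and are not the DDK2011 data.

```python
def ols_intercept_slope(x, y):
    """OLS estimates (a, b) in y = a + b*x + e with a single regressor."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    return ybar - b * xbar, b

def group_mean(x, y, value):
    """Mean of y over observations with x == value."""
    ys = [yi for xi, yi in zip(x, y) if xi == value]
    return sum(ys) / len(ys)

# Hypothetical standardized scores (NOT the DDK2011 data).
tracking = [0, 0, 0, 1, 1, 1, 1]
score = [-0.5, 0.2, -0.3, 0.4, -0.1, 0.6, 0.0]

a, b = ols_intercept_slope(tracking, score)
m0 = group_mean(tracking, score, 0)
m1 = group_mean(tracking, score, 1)
print(a, m0)        # intercept equals the non-tracking mean
print(b, m1 - m0)   # slope equals the difference in means

# The same identity applied to the reported coefficients.
print(round(-0.071 + 0.138, 3))  # implied tracking-school mean: 0.067
```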
A difficulty with applying the classical regression framework is that student achievement is likely correlated within a given school. Student achievement may be affected by local demographics, individual teachers, and classmates, all of which imply dependence. These concerns, however, do not suggest that achievement will be correlated across schools, so it seems reasonable to model achievement across schools as mutually independent. We call such dependence clustered.
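One simple way to generate clustered dependence, assumed here purely for illustration, is a school-level shock shared by every student in the school: $e_{ig} = u_g + \varepsilon_{ig}$ with $u_g$ and $\varepsilon_{ig}$ mutually independent. Errors of two students in the same school then have correlation $\sigma_u^2/(\sigma_u^2+\sigma_\varepsilon^2)$, while errors of students in different schools are independent. The clustered framework itself allows arbitrary dependence within a cluster; this shared-shock model is only one special case. The parameters and function names below are hypothetical.

```python
import random

def simulate_clusters(G=2000, n_g=2, sd_school=0.5, sd_student=1.0, seed=1):
    """Errors e_ig = u_g + eps_ig for G schools with n_g students each.

    u_g is a school shock shared within school g; eps_ig is an idiosyncratic
    student shock. Schools are generated independently of one another.
    """
    rng = random.Random(seed)
    clusters = []
    for _ in range(G):
        u = rng.gauss(0, sd_school)
        clusters.append([u + rng.gauss(0, sd_student) for _ in range(n_g)])
    return clusters

def corr(x, y):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

clusters = simulate_clusters()
# Within-school pairs: the two students of the same school.
within = corr([c[0] for c in clusters], [c[1] for c in clusters])
# Across-school pairs: student 1 of school g with student 2 of school g + 1.
across = corr([c[0] for c in clusters[:-1]], [c[1] for c in clusters[1:]])
theory = 0.5 ** 2 / (0.5 ** 2 + 1.0 ** 2)  # sigma_u^2 / (sigma_u^2 + sigma_eps^2)
print(round(within, 3), round(across, 3), theory)
```

The within-school correlation should be close to the theoretical value of 0.2, while the across-school correlation should be close to zero.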