# The Linear Regression Function, and Why It Is Wrong

Usually, when people learn regression, they learn to understand the relationship between $Y$ and $X$ as a linear function. Specifically, the linearity assumption states that the means of the conditional distributions $p(y \mid x)$ fall precisely on a straight line of the form $\beta_0+\beta_1 x$, i.e., that $\mu_x=\mathrm{E}(Y \mid X=x)=\beta_0+\beta_1 x$.

See Figure 1.7 above for a graphic illustration of what this assumption tells you about the means of the conditional distributions: In that graph, four conditional distributions are shown, corresponding to four distinct values $X=x$. The linearity assumption states that the means of those four distributions, as well as the means for all other conditional distributions that are not shown in Figure 1.7, fall precisely on a straight line $\beta_0+\beta_1 x$, for some values of the parameters $\beta_0$ and $\beta_1$. The linearity assumption does not require that you know the numerical values of $\beta_0$ and $\beta_1$; rather, it simply states that the conditional means fall on some line $\beta_0+\beta_1 x$, for some (usually unknown) numerical values of the parameters $\beta_0$ and $\beta_1$.
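
To make the assumption concrete, the following R sketch simulates data from a model in which linearity holds exactly and compares the sample conditional means with the line. The parameter values (`beta0 = 1`, `beta1 = 0.5`, `sigma = 1`), the four `x` values, and the normal conditional distributions are illustrative assumptions, not values from the text; the linearity assumption itself says nothing about the shape of $p(y \mid x)$.

```r
# Hypothetical parameter values, chosen only for illustration
beta0 = 1
beta1 = 0.5
sigma = 1

set.seed(12345)              # make the simulation reproducible
x.vals = c(2, 4, 6, 8)       # four distinct X values (assumed)
n.per = 10000                # draws from each conditional distribution

# Draw Y | X = x from N(beta0 + beta1*x, sigma^2) for each x
x = rep(x.vals, each = n.per)
y = rnorm(length(x), mean = beta0 + beta1 * x, sd = sigma)

# Compare the sample conditional means with the line beta0 + beta1*x
cond.means = tapply(y, x, mean)
true.line = beta0 + beta1 * x.vals
print(cbind(x = x.vals, sample.mean = cond.means, line = true.line))
```

With this many draws per `x` value, the sample means should sit very close to the line, which is what the linearity assumption asserts about the true conditional means.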

The parameter $\beta_0$ is called the intercept of the line. When $\mathrm{E}(Y \mid X=x)=\beta_0+\beta_1 x$, it follows that $\mathrm{E}(Y \mid X=0)=\beta_0+\beta_1(0)=\beta_0$. In words, if the linearity assumption is true, then the mean of the distribution of $Y$ when $X=0$ is equal to $\beta_0$. Often, the range of $X$ does not include 0, in which case that interpretation is not particularly useful. In such cases, you can vaguely interpret $\beta_0$ as a parameter related to the unconditional mean of $Y$: If the mean of $Y$ is larger, then $\beta_0$ will be larger to reflect the vertical height, or distance from zero, of the regression function.

The parameter $\beta_1$ tells you something about the relationship between $Y$ and $X$. If the linearity assumption is true, then this parameter is the difference between the conditional means of the distributions of $Y$ where the $X$ variable differs by 1.0, which can be demonstrated as follows:
$$
\begin{aligned}
\mathrm{E}(Y \mid X=x+1)-\mathrm{E}(Y \mid X=x) &= \left\{\beta_0+\beta_1(x+1)\right\}-\left(\beta_0+\beta_1 x\right) \\
&= \left\{\beta_0+\beta_1 x+\beta_1\right\}-\beta_0-\beta_1 x \\
&= \beta_0+\beta_1 x+\beta_1-\beta_0-\beta_1 x \\
&= \beta_1
\end{aligned}
$$
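
The same algebra can be checked numerically. In this minimal sketch, `beta0`, `beta1`, the function name `mu`, and the starting points in `x` are hypothetical choices made only for illustration.

```r
# Hypothetical parameter values, chosen only for illustration
beta0 = 1
beta1 = 0.5

# Conditional mean function under the linearity assumption
mu = function(x) beta0 + beta1 * x

# Arbitrary starting points, including values outside any realistic range
x = c(-3, 0, 2.5, 10)

# Difference in conditional means when x increases by 1.0
slope.check = mu(x + 1) - mu(x)
print(slope.check)

# The conditional mean at x = 0 is the intercept
intercept.check = mu(0)
print(intercept.check)
```

Every element of `slope.check` equals `beta1` regardless of the starting point, and `intercept.check` equals `beta0`, matching the interpretations of the two parameters given above.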

## LOESS: An Estimate of the True (Curved) Mean Function

So, the linearity assumption $\mathrm{E}(Y \mid X=x)=\beta_0+\beta_1 x$ is wrong. What is right? What is right is that $\mathrm{E}(Y \mid X=x)=f(x)$, which is some function $f(x)$ that you do not know. However, data allow you to estimate such unknown quantities.

If your data set had many repeats on particular $x$ values, you could use the average of the $Y$ data values where $X=x$ to estimate the function $f(x)$. For example, consider the data in Table 1.6 below obtained from a survey of students in a class. The $Y$ variable is “rating of the instructor,” on a discrete 1 to 5 scale (where 5 means “best”), and the $X$ variable is “expected grade in course,” where 0 = “F”, 1 = “D”, 2 = “C”, 3 = “B”, and 4 = “A”.

Using the data shown in Table 1.6, an obvious estimate of $\mathrm{E}(Y \mid X=2)$ is $\hat{f}(2)=(2+3) / 2=2.5$ (the hat, “^”, signifies that this is just an estimate, not the true expected value). Similarly, intuitively obvious estimates are $\hat{f}(3)=(5+2+4+4) / 4=3.75$ and $\hat{f}(4)=(5+4+4+5) / 4=4.5$.
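
These averages can be computed in one step rather than by hand. The sketch below uses the same `x` and `y` values that appear in the R code for Figure 1.14; using `tapply` is one convenient way to do the grouping, not necessarily the approach used in the text.

```r
# Expected grade (x) and instructor rating (y), as in the code for Figure 1.14
x = c(2, 2, 3, 3, 3, 3, 4, 4, 4, 4)
y = c(2, 3, 5, 2, 4, 4, 5, 4, 4, 5)

# Average of the y values within each distinct x value
f.hat = tapply(y, x, mean)
print(f.hat)
```

The result is a named vector whose entries for `x` = 2, 3, and 4 are 2.5, 3.75, and 4.5, the same estimates obtained above.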

The data and the estimated mean function are shown in Figure 1.14. Notice that the function $\hat{f}(x)$ is not perfectly linear, as is expected since there are three distinct $X$ values.
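R code for Figure 1.14

```r
# Expected grade (x) and instructor rating (y) for the ten students
x = c(2,2,3,3,3,3,4,4,4,4)
y = c(2,3,5,2,4,4,5,4,4,5)

# Distinct x values and the estimated conditional means at each
x1 = c(2,3,4)
f.hat = c(2.5,3.75,4.5)

# Scatterplot; y is jittered so that tied points remain visible
plot(x, jitter(y, 5), ylab = "Rating of Instructor (jittered)",
     xlab = "Expected Grade", cex.axis = 0.8, cex.lab = 0.8)
points(x1, f.hat, pch = "X")            # mark the estimated means
points(x1, f.hat, type = "l", lty = 2)  # connect them with a dashed line
```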
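
One quick way to see that the estimated mean function is not a straight line is to compare the slopes of its two segments. This is a minimal sketch; the variable name `slopes` is introduced here for illustration.

```r
# Distinct x values and the estimated conditional means from the text
x1 = c(2, 3, 4)
f.hat = c(2.5, 3.75, 4.5)

# Slope of each segment joining consecutive estimated means
slopes = diff(f.hat) / diff(x1)
print(slopes)
```

The segment slopes are 1.25 and 0.75. If the three estimated means fell exactly on one line, the two slopes would be equal.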


