
## avatest™ Helps You Pass Your Exams

avatest™'s subject-matter experts have helped students pass thousands of exams. We guarantee fast, on-time completion of exams of any length and type, including in-class, take-home, online, and proctored exams. Our writers gather all kinds of resources, or teach you from your own school's materials, create mock exams, and provide example solutions for every type of problem, to ensure that your pass rate on the real exam is above 85%. Whether you have an upcoming weekly quiz, term test, midterm, or final exam, we can help!

• Delivery in as fast as 12 hours

• 200+ native English-speaking tutors

• Full refund for scores below 70

## The Variable Inclusion Principle

Earlier in this chapter and in prior chapters, we alluded to this concept, which is stated specifically as follows:
The variable inclusion principle
Include all lower-order terms related to higher-order terms in a polynomial regression model (regardless of "significance" or lack thereof).
Recall that the "order" of a polynomial term is its exponent, or the sum of its exponents. Lower-order terms are terms in the same variable or variables with smaller order. Here are some applications of the variable inclusion principle:

• Quadratic term, $x^2$. Lower order terms are $x^1=x$ and $x^0=1$. The variable inclusion principle states that if you have $x^2$ in the model, then you should also include $x$ (the linear term) and 1 (the intercept).
• Cubic term, $x^3$. Lower order terms are $x^2, x^1=x$, and $x^0=1$. The variable inclusion principle states that if you have $x^3$ in the model, then you should also include $x^2$ (the quadratic term), $x$ (the linear term), and 1 (the intercept).
• Interaction term, $x_1 x_2$. Write this term as $x_1^1 x_2^1$. Lower order terms are $x_1^1 x_2^0=x_1$, $x_1^0 x_2^1=x_2$, and $x_1^0 x_2^0=1$. The variable inclusion principle states that if you have $x_1 x_2$ in the model, then you should also include $x_1, x_2$ and 1 (the intercept).
• Linear term, $x$. The only lower-order term is $x^0=1$ (the intercept). The variable inclusion principle states that you should always put an intercept in the model.

## Why You Should Include the Linear Term in a Quadratic Model

The model is truly linear, and not curved, in the simulation above. In the model that obeys the inclusion principle, the quadratic term is correctly deemed "insignificant" $(p=0.525)$. The linear term is also "insignificant" $(p=0.161)$, so one might be tempted to remove it from the model, thus violating the inclusion principle. In the resulting model that does not obey the inclusion principle, where the linear term is excluded, the quadratic term is highly "significant" $\left(p<2 \times 10^{-16}\right)$.

Again, if you violate the variable inclusion principle, then the coefficient of the higher-order term does not measure what you want it to measure. Here, leaving out the linear term (thus violating the inclusion principle) means that the coefficient of the quadratic term does not measure curvature. Rather, since the linear term was excluded, the quadratic term simply acts as a surrogate for the linear term. Instead of measuring only curvature, the coefficient of the quadratic term also measures the linear effect, if you violate the variable inclusion principle.
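A minimal numerical sketch of this surrogate effect (not the book's simulation: the uniform design, seed, and coefficients below are invented) fits a truly linear process both with and without the linear term, and compares the estimated quadratic coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 10, n)                  # x > 0, so x and x^2 are strongly correlated
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)    # truly linear: no curvature at all

# Model obeying the inclusion principle: intercept + x + x^2
X_full = np.column_stack([np.ones(n), x, x**2])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

# Model violating the principle: intercept + x^2 only
X_viol = np.column_stack([np.ones(n), x**2])
b_viol, *_ = np.linalg.lstsq(X_viol, y, rcond=None)

print(b_full[2])   # quadratic coefficient near 0, as it should be
print(b_viol[1])   # quadratic coefficient pulled away from 0: a surrogate for x
```

With the linear term included, the estimated quadratic coefficient is essentially zero; with it excluded, the quadratic coefficient is forced away from zero because it must absorb the linear trend.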


## Multicollinearity

Multicollinearity $(\mathrm{MC})$ refers to the $X$ variables being “collinear” to varying degrees. In the case of two $X$ variables, $X_1$ and $X_2$, collinearity means that the two variables are close to linearly related. A “perfect” multicollinearity means that they are perfectly linearly related. See Figure 8.4.
Often, “multicollinearity” with just two $X$ variables is called simply “collinearity.” Figure 8.4, right panel, illustrates the meaning of the term “collinear.”
With more $X$ variables, it is not so easy to visualize multicollinearity. But if one of the $X$ variables, say $X_j$, is closely related to all the other $X$ variables via
$$X_j \cong a_0 X_0+a_1 X_1+\ldots+a_{j-1} X_{j-1}+a_{j+1} X_{j+1}+\ldots+a_k X_k$$
then there is multicollinearity. And if the "$\cong$" is, in fact, an "$=$" in the equation above, then there is a perfect multicollinearity. (Note that the variable $X_0$ is the intercept column having all 1's.)
A perfect multicollinearity causes the $\mathbf{X}^{\mathrm{T}} \mathbf{X}$ matrix to be non-invertible, implying that there are no unique least-squares estimates. Equations 0 through $k$ shown in Section 7.1 can still be solved for estimates of the $\beta$ ‘s, but some equation or equations will be redundant with others, implying that there are infinitely many solutions for $\hat{\beta}_0, \hat{\beta}_1, \ldots$, and $\hat{\beta}_k$. Thus the effects of the individual $X_j$ variables on $Y$ are not identifiable when there is a perfect multicollinearity.

To understand the notion that there can be an infinity of solutions for the estimated $\beta$ ‘s, consider the case where there is only one $X$ variable. A perfect multicollinearity, in this case, means that $X_1=a_0 X_0$, so that the $X_1$ column is all the same number, $a_0$. Figure 8.5 shows how data might look in this case, where $x_i=10$ for every $i=1, \ldots, n$, and also shows several possible least-squares fits, all of which have the same minimum sum of squared errors.
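The $x_i=10$ situation of Figure 8.5 is easy to reproduce numerically. The sketch below (with made-up $y$ values) shows that $\mathbf{X}^{\mathrm{T}}\mathbf{X}$ is singular, and that several distinct coefficient pairs attain the same minimum sum of squared errors:

```python
import numpy as np

# Perfect multicollinearity in its simplest form: the single predictor is
# constant, x_i = 10 for all i, so the X1 column is 10 times the intercept X0.
n = 5
X = np.column_stack([np.ones(n), np.full(n, 10.0)])
y = np.array([3.0, 4.0, 5.0, 4.0, 4.0])   # made-up responses

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))          # rank 1 < 2: X^T X is not invertible

# Any (b0, b1) with b0 + 10*b1 = ybar minimizes the sum of squared errors,
# so there are infinitely many least-squares solutions.
ybar = y.mean()
sses = []
for b0, b1 in [(ybar, 0.0), (0.0, ybar / 10), (ybar - 10.0, 1.0)]:
    sses.append(np.sum((y - (b0 + b1 * X[:, 1]))**2))
print(sses)                                # the same minimum SSE every time
```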

## The Quadratic Model in One $X$ Variable

The simplest of polynomial models is the simple quadratic model,
$$f(x)=\beta_0+\beta_1 x+\beta_2 x^2$$
These models are quite flexible; see Figure 9.1 for various examples.
R code for Figure 9.1

```r
x = seq(0.2, 10, .1)
EY1 = -3 + 0.3*x + 0.1*x^2; EY2 = 2 - 0.9*x + 0.3*x^2
EY3 = -1 + 3.0*x - 0.4*x^2; EY4 = 1 + 1.2*x - 0.1*x^2
plot(x, EY1, type="l", lty=1, ylab="E(Y|X=x)")
points(x, EY2, type="l", lty=2); points(x, EY3, type="l", lty=3)
points(x, EY4, type="l", lty=4)
legend(0, 10, c("b0  b1  b2", "-3.0  0.3  0.1", " 2.0 -0.9  0.3",
                "-1.0  3.0 -0.4", " 1.0  1.2 -0.1"), lty=c(0,1,2,3,4))
```

As is the case for all models in this chapter, the "$\beta$" coefficients cannot be interpreted in the way discussed in Chapter 8, where you increase the value of one $X$ variable while keeping the others fixed, because there are functional relationships among the various terms in the model. Specifically, in the example of a quadratic polynomial function, you cannot increase $x^2$ while keeping $x$ fixed. But you can still interpret the parameters by understanding the graphs in Figure 9.1. In particular, $\beta_2$ measures curvature: when $\beta_2<0$, the curvature is concave; when $\beta_2>0$, it is convex; and when $\beta_2=0$, there is no curvature. Further, the larger $\left|\beta_2\right|$ is, the more extreme the curvature.

The intercept term $\beta_0$ has the same meaning as before: it is the value of $f(x)$ when $x=0$. This interpretation is correct but, as always, it is not a useful interpretation when the range of the $x$ data does not cover 0. Still, the coefficient is needed in the model as a "fitting constant," which shifts the function up or down as needed to match the observable data.

To interpret $\beta_1$, note that it is possible to increase $x$ by 1 while $x^2$ is held fixed, but the only way that can happen is when you move from $x=-0.5$ to $x=+0.5$. Consider the solid graph shown in Figure 9.1: here, $f(x)=-3+0.3 x+0.1 x^2$, so that $f(-0.5)=-3+0.3(-0.5)+0.1(-0.5)^2=-3.125$, and $f(+0.5)=-2.825$; these values differ by exactly 0.3, the coefficient $\beta_1$ that multiplies $x$. While this math gives a correct way to interpret $\beta_1$ in the quadratic model, it is not useful if the range of the $X$ data does not cover zero.
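This arithmetic can be checked directly; the small sketch below simply re-evaluates the solid curve of Figure 9.1 at $x=\pm 0.5$:

```python
# Coefficients of the solid curve in Figure 9.1: f(x) = -3 + 0.3x + 0.1x^2
def f(x, b0=-3.0, b1=0.3, b2=0.1):
    return b0 + b1 * x + b2 * x**2

# x^2 is the same at x = -0.5 and x = +0.5, so the quadratic term cancels
# and the difference is exactly beta1 = 0.3 (up to floating-point rounding).
print(f(-0.5))            # approximately -3.125
print(f(0.5))             # approximately -2.825
print(f(0.5) - f(-0.5))   # approximately 0.3
```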


## The Causal Model

If you want the coefficient of $X$ to be a causal effect, you need a different model than the conditional mean model, one that includes many more $X$'s. Here is one:
$$Y=\gamma_0+\gamma_1 X_1+\gamma_2 X_2+\ldots+\gamma_k X_k+\varepsilon^{\prime}$$
In this model:

$X_1$ is your main $X$ variable of interest, the one whose causal effect on $Y$ you wish to measure.

$X_2$ through $X_k$ are all other (usually unmeasured) variables that also causally affect $Y$. In this model, changes (manipulations) in $X_1$ cause changes in the distribution of $Y$ when all other possible causal variables $X_2, \ldots, X_k$ are held fixed.

$\varepsilon^{\prime}$ is a random error term. This term might be identically zero, in which case the causal model is a deterministic model; this does not change any of the arguments below. Otherwise, with enough $X$'s, it is reasonable to assume that this term is uncorrelated with everything; e.g., $\varepsilon^{\prime}$ might be subatomic quantum noise.
You can re-arrange the causal model as follows:
$$Y=\gamma_0+\gamma_1 X+\delta$$
where $X=X_1$ and
$$\delta=\gamma_2 X_2+\ldots+\gamma_k X_k+\varepsilon^{\prime}$$
In this model, $\operatorname{Cov}(X, \delta) \neq 0$. Instead, $\operatorname{Cov}(X, \delta)=\sum_{j=2}^k \gamma_j \operatorname{Cov}\left(X, X_j\right)$. Thus, applying Theorem 6.5, the OLS estimate $\hat{\sigma}_{xy} / \hat{\sigma}_x^2$ is inconsistent for $\gamma_1$, with probability limit $\gamma_1+\sum_{j=2}^k \gamma_j \operatorname{Cov}\left(X, X_j\right) / \sigma_x^2$.
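A quick numerical check of this probability limit, with invented values $\gamma_1=1$, $\gamma_2=2$, and a single unmeasured confounder $X_2$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
g1, g2 = 1.0, 2.0                            # hypothetical causal coefficients
x2 = rng.normal(0.0, 1.0, n)                 # unmeasured confounder
x = 0.5 * x2 + rng.normal(0.0, 1.0, n)       # Cov(X, X2) = 0.5, Var(X) = 1.25
y = g1 * x + g2 * x2 + rng.normal(0.0, 1.0, n)

ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # sigma_xy_hat / sigma_x_hat^2
plim = g1 + g2 * 0.5 / 1.25                    # probability limit: 1.8, not gamma1 = 1
print(ols, plim)
```

The OLS slope settles near the probability limit 1.8 rather than near the causal effect $\gamma_1=1$, exactly as the formula predicts.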

## The Instrumental Variable Method

The goal is to come up with an estimator whose probability limit is $\gamma_1$. If you could measure all the relevant unobserved confounders $X_2, \ldots, X_k$, then the simple OLS multiple regression estimate of $\gamma_1$ in model (1) would do the trick. But you usually cannot. And even if you could, there might be hundreds of such variables, and you would not want to run OLS with so many predictors. What to do? Try to find an instrumental variable.

Consider the model $Y=\gamma_0+\gamma_1 X+\delta$, where $\gamma_1$ is the causal effect of $X$. An instrumental variable is a variable $Z$ such that:

1. $Z$ is correlated with $X$ (preferably reasonably strongly correlated), and
2. $Z$ is uncorrelated with $\delta$.

The instrumental variable (IV) estimator of $\gamma_1$
The instrumental variable estimator of $\gamma_1$ is given by
$$\hat{\gamma}_1=\frac{\hat{\sigma}_{zy}}{\hat{\sigma}_{zx}}$$

Theorem 6.6: Consistency of the IV Estimator

Assume the data triples $\left(X_i, Y_i, Z_i\right)$ are sampled iid from $p(x, y, z)$, with all variances finite. Assume in addition that $Z$ is an instrumental variable for the causal model $Y=\gamma_0+\gamma_1 X+\delta$. Then $\hat{\gamma}_1=\hat{\sigma}_{zy} / \hat{\sigma}_{zx}$ is a consistent estimator of $\gamma_1$.
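A sketch of the IV estimator in action. The instrument, confounder, and coefficients below are invented for illustration, with true causal effect $\gamma_1=1$; OLS is biased by the confounder, while the IV ratio is not:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
g0, g1 = 0.0, 1.0                              # hypothetical causal model
x2 = rng.normal(0.0, 1.0, n)                   # confounder, part of delta
z = rng.normal(0.0, 1.0, n)                    # instrument: drives X, not delta
x = z + 0.5 * x2 + rng.normal(0.0, 1.0, n)
delta = 2.0 * x2 + rng.normal(0.0, 1.0, n)
y = g0 + g1 * x + delta

ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # inconsistent: Cov(X, delta) != 0
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]   # sigma_zy_hat / sigma_zx_hat
print(ols, iv)                                 # ols well above 1; iv near gamma1 = 1
```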


## The $\ln(Y)$ Transformation and Its Use for Heteroscedastic Processes

If the regression model $\ln (Y)=\beta_0+\beta_1 X+\varepsilon$ is true, then (as discussed above) the conditional mean and variance of the untransformed $Y$ variable are given as follows:
$$\mathrm{E}(Y \mid X=x)=\exp \left(\beta_0+\beta_1 x\right) \times \exp \left(\sigma^2 / 2\right)$$
and
$$\operatorname{Var}(Y \mid X=x)=\left\{\exp \left(\sigma^2\right)-1\right\} \times \exp \left(\sigma^2\right) \times \exp \left(2 \beta_0+2 \beta_1 x\right)$$

Notice that when $\sigma^2=0$, then $\left\{\exp \left(\sigma^2\right)-1\right\}=0$, so that $\operatorname{Var}(Y \mid X=x)=0$, as expected. Notice also that if you assume the lognormal model, you also assume that the variance of $Y \mid X=x$ is nonconstant, i.e., heteroscedastic: the formula for $\operatorname{Var}(Y \mid X=x)$ in the lognormal case shows that, if $\beta_1>0$, then the variance increases for larger $X=x$.
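These formulas are easy to verify by simulation. The sketch below uses the same parameter values as the R code for Figures 5.9 and 5.10 ($\beta_0=-2$, $\beta_1=0.05$, $\sigma=0.30$), evaluated at the arbitrary point $x=70$ (near the center of that code's simulated $X$ data):

```python
import numpy as np

rng = np.random.default_rng(3)
b0, b1, sigma = -2.0, 0.05, 0.30
x = 70.0                                    # an arbitrary fixed value of X

# Draw Y | X = x from the lognormal model: ln(Y) = b0 + b1*x + N(0, sigma^2)
lny = b0 + b1 * x + sigma * rng.standard_normal(1_000_000)
y = np.exp(lny)

mean_formula = np.exp(b0 + b1 * x) * np.exp(sigma**2 / 2)
var_formula = (np.exp(sigma**2) - 1) * np.exp(sigma**2) * np.exp(2 * b0 + 2 * b1 * x)
print(y.mean(), mean_formula)               # simulated vs. theoretical mean
print(y.var(), var_formula)                 # simulated vs. theoretical variance
```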

Figures 5.9 and 5.10 demonstrate the heteroscedasticity of the lognormal regression model, as well as the homoscedasticity of the model for the log-transformed data.
R code for Figures 5.9 and 5.10

```r
n = 100; beta0 = -2.00; beta1 = 0.05; sigma = 0.30
set.seed(12345); X = rnorm(n, 70, 10)
lnY = beta0 + beta1*X + rnorm(n, 0, sigma); Y = exp(lnY)
par(mfrow=c(1,2)); plot(X, Y); abline(v=c(60,80), col="gray")
plot(X, lnY); abline(v=c(60,80), col="gray")
y.seq = seq(0.01, 30, .01)
dy1 = dlnorm(y.seq, beta0 + beta1*60, sigma)
dy2 = dlnorm(y.seq, beta0 + beta1*80, sigma)
par(mfrow=c(1,2))
plot(y.seq, dy1, type="l", xlim=c(0,20), yaxs="i", ylim=c(0,.6),
     ylab="lognormal density", xlab="Untransformed y")
points(y.seq, dy2, type="l", lty=2)
legend("topright", c("X=60","X=80"), lty=c(1,2), cex=0.8)
ly.seq = log(y.seq)
dly1 = dnorm(ly.seq, beta0 + beta1*60, sigma)
dly2 = dnorm(ly.seq, beta0 + beta1*80, sigma)
plot(ly.seq, dly1, type="l", xlim=c(-.2,4), yaxs="i", ylim=c(0,1.6),
     ylab="normal density", xlab="Log Transformed y")
points(ly.seq, dly2, type="l", lty=2)
legend("topright", c("X=60","X=80"), lty=c(1,2), cex=0.8)
```

## An Example Where the Inverse Transformation $1/Y$ Is Needed

With ratio data, the units of measurement are ($a$ per $b$), and the inverse transformation often makes sense simply because the measurements become ($b$ per $a$), which is just as easy to interpret. For example, a car that gets 30 miles per gallon of gasoline can equivalently be said to take $(1/30)$ gallons per mile. You could use either measure in a statistical analysis without question from any critical reviewer: miles per gallon and gallons per mile convey the same information. Which form to use? Simply choose the form that least violates the model assumptions.

The following code replicates the analyses shown in Figure 5.6 for these data, but using the $W=1/Y$ transformation, a variable called "speed" because higher values indicate a speedier computer.
R code for Figure 5.11

```r
comp = read.table("https://raw.githubusercontent.com/andrea2719/URA-Datasets/master/compspeed.txt")
attach(comp)
reg.orig = lm(time ~ GB); summary(reg.orig)
par(mfrow=c(2,2)); plot(GB, time); add.loess(GB, time)
qqnorm(reg.orig$residuals); qqline(reg.orig$residuals)
speed = 1/time; reg.trans = lm(speed ~ GB)
summary(reg.trans)
plot(GB, speed); add.loess(GB, speed)
qqnorm(reg.trans$residuals); qqline(reg.trans$residuals)
```


## MATLAB Assignment Help

MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include: mathematics and computation; algorithm development; modeling, simulation, and prototyping; data analysis, exploration, and visualization; scientific and engineering graphics; and application development, including building graphical user interfaces. MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This lets you solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar, non-interactive language such as C or Fortran. The name MATLAB stands for "matrix laboratory." MATLAB was originally written to provide easy access to the matrix software developed by the LINPACK and EISPACK projects, which together represented the state of the art in software for matrix computation. MATLAB has evolved over the years with input from many users. In university settings, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-productivity research, development, and analysis. MATLAB features a family of application-specific solutions called toolboxes. Very important to most users of MATLAB, toolboxes let you learn and apply specialized techniques. Toolboxes are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems. Areas in which toolboxes are available include signal processing, control systems, neural networks, fuzzy logic, wavelets, simulation, and many others.


## Comparing Transformations of $X$ with the Car Sales Data

Consider the Car Sales data discussed in Chapter 4. The test for curvature using the quadratic function showed “significant” curvature, corroborating both the curved LOESS smooth, and the subject matter theory, which suggests that as interest rates increase, the mean sales function should flatten because cash sales are not affected by interest rates.
So, first of all, why bother transforming $X=$ Interest Rate? We have already shown two different estimates of curvature, one using LOESS and the other using the quadratic function. Why not use either the LOESS fit or the quadratic model? You could. But there are compelling reasons not to use either when you have curvature, and these same reasons explain why you might wish to use a transformation to model curvature instead.
Problems with the LOESS fit
The LOESS function cannot be written as a simple function such as a linear, quadratic, or exponential function. Having a simple functional form such as $\mathrm{E}(Y \mid X=x)=\beta_0+\beta_1 \ln (x)$ makes the model easier to interpret and use.
Problems with the quadratic fit
Quadratic and higher-order polynomial functions are notoriously bad at the extreme low and high values of the $X$ data. In addition, quadratic models have an extra parameter $\left(\beta_2\right)$ that must be estimated, which can cause loss of accuracy.

Just because a model has a higher maximized likelihood $(L)$ than another model does not validate the model assumptions. In the case of the classical model, higher $L$ means that the $Y$ data have a smaller sum of squared deviations from the fitted function, but not that the assumptions of the model are valid. You still need to evaluate the assumptions of the transformed model, even when the model has a relatively high likelihood.
Checking the assumptions of the transformed model
To check assumptions of the model when you transform $X$, simply apply the techniques you learned in the previous chapter with the transformed $X$ variable. In other words, let $U=f(X)$ and check the assumptions of the $(U, Y)$ data in the same way that you check the assumptions with the $(X, Y)$ data.

## The Car Sales Data $(t, e_t)$ and $(e_{t-1}, e_t)$ Plots

The Car Sales data are pure time-series since the data are collected in 120 consecutive months. The following code shows the relevant plots to check for uncorrelated (specifically, non-autocorrelated) errors.
```r
CarS = read.table("https://raw.githubusercontent.com/andrea2719/URA-DataSets/master/Cars.txt")
attach(CarS); n = nrow(CarS)
fit = lm(NSOLD ~ INTRATE)
resid = fit$residuals
par(mfrow=c(1,2))
plot(1:n, resid, xlab="month", ylab="residual")
points(1:n, resid, type="l"); abline(h=0)
lag.resid = c(NA, resid[1:n-1])
plot(lag.resid, resid, xlab="lagged residual", ylab="residual")
abline(lsfit(lag.resid, resid))
```
The results are shown in Figure 4.8. There is overwhelming evidence of autocorrelation shown by both plots.

What are the consequences of such an extreme violation of assumptions? According to the mathematical theorems summarized in Chapter 3, if the data-generating process is truly given by the regression model, then the confidence intervals and $p$-values behave precisely as advertised, with precisely 95% confidence and precisely 5% significance levels. When the independence assumption is grossly violated, as seen here, the true confidence levels may be far from 95% and the true significance levels may be far from 5%. How far? You guessed it: you can find out by using simulation.
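One way to carry out such a simulation is sketched below. The settings are invented for illustration (AR(1) errors with $\rho=0.9$, a time-trend regressor, and a true slope of zero); the point is that the nominal 95% interval for the slope covers the true value far less than 95% of the time:

```python
import numpy as np

rng = np.random.default_rng(4)
T, reps, rho = 100, 2000, 0.9
x = np.arange(T, dtype=float)            # time-trend regressor, like month 1..T
sxx = np.sum((x - x.mean())**2)

cover = 0
for _ in range(reps):
    # AR(1) errors with lag-1 correlation rho and stationary variance 1
    shocks = rng.normal(0.0, np.sqrt(1 - rho**2), T)
    e = np.empty(T)
    e[0] = rng.normal()
    for t in range(1, T):
        e[t] = rho * e[t - 1] + shocks[t]
    y = e                                # true slope is 0: y does not depend on x

    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    resid = y - y.mean() - b1 * (x - x.mean())
    se = np.sqrt(np.sum(resid**2) / (T - 2) / sxx)
    cover += abs(b1) < 1.96 * se         # does the nominal 95% interval cover 0?

print(cover / reps)                      # well below 0.95 under strong autocorrelation
```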


## Evaluating the Uncorrelated Errors Assumption Using Graphical Methods

To evaluate the uncorrelated errors assumption, you first have to consider the type of data set you have, whether it is pure time-series, cross-sectional time-series, spatial, repeated measures, or multilevel (grouped) data. With pure time-series data, it is common to let $t$ denote the observation indicator rather than $i$, and it is common to let $T$ denote the number of time points in the data set rather than $n$, so the set of observations is indexed by $t=1,2, \ldots, T$, rather than by $i=1,2, \ldots, n$.

The uncorrelated errors assumption is often badly violated with pure time-series processes, because, e.g., today is similar to yesterday, but not so similar to five years ago. Thus, the potentially observable values of today’s error term, $\varepsilon_t$, are often highly correlated with potentially observable values of yesterday’s error term, $\varepsilon_{t-1}$, implying a violation of the uncorrelated errors assumption.

To diagnose correlated errors with pure time-series data, you should first examine the time-series residual graph, or $\left(t, e_t\right)$ plot. Look for systematic, non-random patterns, such as trends or sinusoidal-type functional patterns, which suggest failure of this assumption. A completely random appearance of this graph is consistent with uncorrelated errors.

The most common type of residual correlation is the correlation of the current error $\varepsilon_t$ with the previous error $\varepsilon_{t-1}$, which is called the “lagged” error term. Such correlation is called autocorrelation because it refers to the correlation of a variable with itself. Thus, the second graph you can view is the lag scatterplot, or $\left(e_{t-1}, e_t\right)$, upon which you can superimpose the OLS or LOESS fit to see the trend. A trend in this plot suggests dependence between the current residual and the immediately preceding residual, a violation of the uncorrelated errors assumption. A random scatter with no trend is consistent with uncorrelated errors.

A third kind of plot is the autocorrelation function of the residuals, which displays the lag 1, lag 2, lag 3, and higher-lag autocorrelations; thus you can use this plot to examine autocorrelations at lags greater than 1.
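A minimal sketch of the lag-1 autocorrelation computation behind these plots (the AR(1) coefficient 0.8 and the series length are invented for illustration):

```python
import numpy as np

def lag1_autocorr(r):
    """Lag-1 autocorrelation: correlation of e_t with the lagged series e_{t-1}."""
    r = r - r.mean()
    return np.sum(r[1:] * r[:-1]) / np.sum(r**2)

rng = np.random.default_rng(5)
T = 500
e = np.zeros(T)
for t in range(1, T):                    # autocorrelated errors: e_t = 0.8*e_{t-1} + shock
    e[t] = 0.8 * e[t - 1] + rng.standard_normal()

print(lag1_autocorr(e))                       # close to 0.8
print(lag1_autocorr(rng.standard_normal(T)))  # close to 0 for independent errors
```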

For data other than pure time-series data, different methods are needed. For spatial data (points in “space,” e.g., data with geographic coordinates), you can use a variogram to check for error correlation, in this case called “spatial autocorrelation.” With multilevel (grouped) data, you can examine scatterplots where data are labeled by group to diagnose correlation structure; Chapter 10 touches upon this issue. For now, we will discuss only pure time-series data.

## 统计代写|回归分析代写Regression Analysis代考|The Car Sales Data $\left(t, e_t\right)$ and $\left(e_{t-1}, e_t\right)$ Plots

The Car Sales data are pure time-series since the data are collected in 120 consecutive months. The following code shows the relevant plots to check for uncorrelated (specifically, non-autocorrelated) errors.
URA-DataSets/master/Cars.txt”)
attach(CarS); $\mathrm{n}=$ nrow(CarS)
fit $=$ lm(NSOLD $~$ INTRATE)
resid $=$ fit\$residuals par(mfrow=c(1,2)) plot($1: n$, resid, xlab=”month”, ylab=”residual”) points($1: n$, resid, type=”l”); abline(h=0) lag.resid = c(NA, resid[1:n-1]) plot(lag.resid, resid, xlab=”lagged residual”, ylab= “residual”) abline(lsfit(lag.resid, resid)) Cars$=$read.table$($“https://raw.githubusercontent. com/andrea$2719 /$URA-DataSets/master/Cars.txt”) attach (Cars);$n=\operatorname{nrow}(\operatorname{Cars})\mathrm{fit}=\operatorname{lm}(\mathrm{NSOLD} \sim$INTRATE$)$resid$=$fit\$residuals
par (mfrow=c $(1,2))$
plot ( $1: n$, resid, $x l a b=$ “month”, ylab=”residual”)
points $(1: \mathrm{n}$, resid, type=”I”); abline $(\mathrm{h}=0)$
lag.resid $=c(N A, r e s i d[1: n-1])$
plot (lag.resid, resid, $x l a b=$ “lagged residual”, ylab = “residual”)
abline(lsfit (lag.resid, resid))
The results are shown in Figure 4.8. There is overwhelming evidence of autocorrelation shown by both plots.

What are the consequences of such an extreme violation of assumptions? According to the mathematical theorems summarized in Chapter 3 , if the data-generating process is truly given by the regression model, then the confidence intervals and $p$-values behave precisely as advertised, with precisely 95\% confidence, and precisely 5\% significance levels. When the independence assumption is grossly violated as seen here, the true confidence levels may be far from 95\% and the true significance levels may be far from 5\%. How far? You guessed it: You can find out by using simulation.

## 统计代写|回归分析代写Regression Analysis代考|The Car Sales Data $\left(t, e_t\right)$ and $\left(e_{t-1}, e_t\right)$ Plots

Car Sales数据是纯时间序列，因为数据是连续120个月收集的。下面的代码显示了检查不相关(特别是非自相关)错误的相关图。
“URA-DataSets/master/Cars.txt”)
attach(CarS);$\mathrm{n}=$ nrow(CarS)
fit $=$ lm(NSOLD $~$ INTRATE)
Resid $=$ fit＄残差
par(mfrow=c(1,2))

abline(lsfit);残留，残留))

“URA-DataSets/master/Cars.txt”)

$\mathrm{fit}=\operatorname{lm}(\mathrm{NSOLD} \sim$ INTRATE $)$
Resid $=$ fit＄残差
Par (mfrow=c $(1,2))$

points $(1: \mathrm{n}$, resid, type=”I”);在线$(\mathrm{h}=0)$

Abline (lsfit (lag))残留，残留))

## MATLAB代写

MATLAB 是一种用于技术计算的高性能语言。它将计算、可视化和编程集成在一个易于使用的环境中，其中问题和解决方案以熟悉的数学符号表示。典型用途包括：数学和计算算法开发建模、仿真和原型制作数据分析、探索和可视化科学和工程图形应用程序开发，包括图形用户界面构建MATLAB 是一个交互式系统，其基本数据元素是一个不需要维度的数组。这使您可以解决许多技术计算问题，尤其是那些具有矩阵和向量公式的问题，而只需用 C 或 Fortran 等标量非交互式语言编写程序所需的时间的一小部分。MATLAB 名称代表矩阵实验室。MATLAB 最初的编写目的是提供对由 LINPACK 和 EISPACK 项目开发的矩阵软件的轻松访问，这两个项目共同代表了矩阵计算软件的最新技术。MATLAB 经过多年的发展，得到了许多用户的投入。在大学环境中，它是数学、工程和科学入门和高级课程的标准教学工具。在工业领域，MATLAB 是高效研究、开发和分析的首选工具。MATLAB 具有一系列称为工具箱的特定于应用程序的解决方案。对于大多数 MATLAB 用户来说非常重要，工具箱允许您学习应用专业技术。工具箱是 MATLAB 函数（M 文件）的综合集合，可扩展 MATLAB 环境以解决特定类别的问题。可用工具箱的领域包括信号处理、控制系统、神经网络、模糊逻辑、小波、仿真等。


## 统计代写|回归分析代写Regression Analysis代考|Practical Versus Statistical Significance

While the Car Sales example above is a case where statistically significant $(p<0.05)$ curvature corresponds to practically significant (which means practically important) curvature, it is not always the case that statistical significance corresponds to practical significance. This can easily happen in data sets where $n$ is large (e.g., with “big data”), because with large data sets you have the ability to estimate even slight curvature very precisely, with small standard error.

The following simulation illustrates this situation: There is statistically significant ( $p=0.000166$ ) curvature, as shown by the hypothesis test for the quadratic term. However, the curvature is practically insignificant, as can be seen by graphing the linear and quadratic fitted functions. The sample size in this simulation is large, $n=1,000,000$, but not unusually large for “big data” applications.

4.5.1 Simulation Study to Demonstrate Practical vs. Statistical Significance
set.seed(54321)  # For perfect replicability of the random simulation.
x = 10 + 2*rnorm(1000000); xsq = x^2
y = 2 + .6*x + .003*xsq + 4*rnorm(1000000)  # beta2 = .003 does not equal 0!
fit.quad = lm(y ~ x + xsq)
summary(fit.quad)  # Significant curvature: p-value = 0.000166
## A 0.1% random sample from the data set is selected to make the scatterplot
## more legible. Otherwise, the points are too dense to view.
select = runif(1000000); x1 = x[select < .001]; y1 = y[select < .001]
plot(x1, y1, main = "Scatterplot of a 0.1% Subsample")
abline(lsfit(x, y), col="gray")

Figure 4.5 shows that the linear fit (solid line) is adequate, even though the true model is quadratic, and not linear.

## 统计代写|回归分析代写Regression Analysis代考|Evaluating the Constant Variance (Homoscedasticity) Assumption Using Graphical Methods

The first graph you should use to evaluate the constant variance assumption is the $\left(\hat{y}_i, e_i\right)$ scatterplot. Look for changes in the pattern of vertical variability of the $e_i$ for different $\hat{y}_i$. The most common indications of constant variance assumption violation are shapes that indicate either increasing variability of $Y$ for larger $\mathrm{E}(Y \mid X=x)$, or shapes that indicate decreasing variability of $Y$ for larger $\mathrm{E}(Y \mid X=x)$. Increasing variability of $Y$ for larger $\mathrm{E}(Y \mid X=x)$ is indicated by greater variability in the vertical ranges of the $e_i$ when $\hat{y}_i$ is larger.
Recall again that the constant variance assumption (like all assumptions) refers to the data-generating process, not the data. The statement “the data are homoscedastic” makes no sense. By the same logic, the statements “the data are linear” and “the data are normally distributed” also are nonsense. Thus, whichever pattern of variability that you decide to claim based on the $\left(\hat{y}_i, e_i\right)$ scatterplot, you should try to make sense of it in the context of the subject matter that determines the data-generating process. As one example, physical boundaries on data force smaller variance when the data are closer to the boundary. As another, when income increases, people have more choice as to whether or not they choose to purchase an item. Thus, there should be more variability in expenditures among people with more money than among people with less money. Whatever pattern you see in the $\left(\hat{y}_i, e_i\right)$ scatterplot should make sense to you from a subject matter standpoint.

While the LOESS smooth of the $\left(\hat{y}_i, e_i\right)$ scatterplot is useful for checking the linearity assumption, it is not useful for checking the constant variance assumption. Instead, you should use the LOESS smooth over the plot of $\left(\hat{y}_i,\left|e_i\right|\right)$. When the variability in the residuals is larger, they will tend to be farther from zero, giving larger mean absolute residuals $\left|e_i\right|$. An increasing trend in the $\left(\hat{y}_i,\left|e_i\right|\right)$ plot suggests larger variability in $Y$ for larger $\mathrm{E}(Y \mid X=x)$, and a flat trend line for the $\left(\hat{y}_i,\left|e_i\right|\right)$ plot suggests that the variability in $Y$ is nearly unrelated to $\mathrm{E}(Y \mid X=x)$. However, as always, do not over-interpret. Data are idiosyncratic (random), so even if homoscedasticity is true in reality, the LOESS fit to the $\left(\hat{y}_i,\left|e_i\right|\right)$ graph will not be a perfectly flat line, due to chance alone. To understand "chance alone" in this case you can simulate data from a homoscedastic model, construct the $\left(\hat{y}_i,\left|e_i\right|\right)$ graph, and add the LOESS smooth. You will see that the LOESS smooth is not a perfect flat line, and you will know that such deviations are explained by chance alone.
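As a concrete version of that exercise, here is a Python sketch (the book's code is R; all numbers are illustrative, and binned means of $|e_i|$ stand in for a LOESS smooth). Even though the errors are homoscedastic by construction, the smoothed absolute residuals are not perfectly flat.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, n)   # homoscedastic by construction
# OLS fit and absolute residuals
b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()
abs_resid = np.abs(y - b0 - b1 * x)
# Crude stand-in for a LOESS smooth: mean |e| within 10 bins of x
# (fitted values are monotone in x here, so binning on x is equivalent)
order = np.argsort(x)
bin_means = [abs_resid[order][i:i + 100].mean() for i in range(0, n, 100)]
print(min(bin_means), max(bin_means))   # unequal, by chance alone
```

The bin means wobble around the theoretical mean absolute error, and that wobble is entirely explained by chance.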

The hypothesis test for homoscedasticity will help you to decide whether the observed deviation from a flat line is explainable by chance alone, but recall that the test does not answer the real question of interest, which is “Is the heteroscedasticity so bad that we cannot use the homoscedastic model?” (That question is best answered by simulating data sets having the type of heteroscedasticity you expect with your real data, then by performing the types of analyses you plan to perform on your real data, then by evaluating the performance of those analyses.)
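One such performance check can be sketched in Python (assumed setup, not from the book: error standard deviation proportional to $|x|$). It compares the actual variability of the OLS slope estimate, over repeated simulated data sets, with the standard error computed under the homoscedasticity assumption.

```python
import numpy as np

rng = np.random.default_rng(2024)
n, reps = 200, 3000
x = np.linspace(-3, 3, n)
xc = x - x.mean()
sxx = xc @ xc
slopes, naive_ses = [], []
for _ in range(reps):
    # Heteroscedastic: error sd = |x|, so high-leverage points are the noisiest
    y = 1.0 + 0.5 * x + np.abs(x) * rng.normal(size=n)
    b1 = (xc @ y) / sxx
    resid = y - y.mean() - b1 * xc
    slopes.append(b1)
    naive_ses.append(np.sqrt((resid @ resid) / (n - 2) / sxx))
ratio = np.std(slopes) / np.mean(naive_ses)
print(ratio)   # > 1: the homoscedastic formula understates the slope's variability
```

Here the reported standard error is systematically too small, so intervals built from it would be too narrow; whether that degree of understatement is tolerable is exactly the practical question the simulation answers.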



## 统计代写|回归分析代写Regression Analysis代考|Hypothesis Testing and $p$-Values: Is the Observed Effect of $X$ on $Y$ Explainable by Chance Alone?

Some researchers will do nearly anything to get a publication. The incentives are great: Fame, tenure, promotion, annual salary, raises, prime class assignments, and clout in one’s department are a function of quality and quantity of publications.

Historically, statistical results were required to be “statistically significant” to be publishable. In terms of confidence intervals, this means that the interval for the effect (e.g., the $\beta$ ) in question must exclude 0 so that you can confidently state the direction of the effect (positive or negative) of the given $X$ variable on $Y$.

Researchers used the $p$-values that are reported routinely by regression software to determine “statistical significance.” But $p$-values are easily manipulated, and unscrupulous researchers can analyze data “creatively” to get nearly any $p$-value they would like to see. This has led to an unfortunate practice known as p-hacking, where researchers try analyses many different ways until they get a $p$-value that is statistically significant, and then try to publish the results. Because of their potential for misuse, there is a strong movement in the scientific community away from use of $p$-values, as well as the phrase “statistical significance,” in favor of other statistics and characterizations.

When interpreted correctly and not misused, the $p$-value does provide interesting and somewhat useful information. Thus, we insist that you understand $p$-values very well, so that you can use them correctly and effectively, and so that you will not become a "$p$-hacker."

To interpret the $p$-value correctly, you must consider the question, “Is the estimate of the effect of $X$ on $Y$ explainable by chance alone?” But to answer that question, you must first understand what it means for an estimated effect to be explained by chance alone. The following example explains this concept.

## 统计代写|回归分析代写Regression Analysis代考|Is the Last Digit of a Person’s Identification Number Related to Their Height?

On the surface of it, this is a silly question. But it provides a great example to help you to understand what it means for a phenomenon to be “explained by chance alone,” which is the first thing you need to know before you can ascertain whether a phenomenon is “explainable by chance alone.”

Suppose you have a data set containing heights of 100 adult males in the United States, along with the last digit of their social security number (SSN). Since adult male heights are approximately normally distributed with mean 70 inches and standard deviation 4 inches, and since the last digit of the SSN is uniformly distributed on the numbers $0,1,2, \ldots, 9$, the following code simulates a quite realistic example of how such a data set would look.
## Simulation of data relating Height to SSN
n = 100
set.seed(12345)  # so that the results will replicate perfectly
height = round(rnorm(100, 70, 4))
ssn = sample(0:9, 100, replace=T)
ssn.data = data.frame(ssn, height)
This code gives you the following (hypothetical but realistic) data on last digit of social security number (SSN) and Height:
$\begin{array}{lrr} & \text{ssn} & \text{height} \\ 1 & 5 & 72 \\ 2 & 8 & 73 \\ 3 & 1 & 70 \\ 4 & 5 & 68 \\ 5 & 6 & 72 \\ 6 & 7 & 63 \end{array}$
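Any such data set will exhibit some nonzero estimated relationship between height and SSN digit, purely by chance. The Python sketch below (paralleling the R simulation above; the data-generation settings are the same) regenerates many such data sets and records the fitted slope of height on SSN digit each time, even though no true relationship exists.

```python
import numpy as np

rng = np.random.default_rng(12345)
reps, n = 1000, 100
slopes = []
for _ in range(reps):
    height = np.round(rng.normal(70, 4, n))   # heights, as in the R simulation
    ssn = rng.integers(0, 10, n)              # last SSN digit, uniform on 0..9
    sc = ssn - ssn.mean()
    slopes.append((sc @ height) / (sc @ sc))  # OLS slope of height on ssn
slopes = np.array(slopes)
print(slopes.mean(), slopes.std())   # centered near 0, but individual slopes are not 0
```

The distribution of these slopes is what "chance alone" looks like: centered at zero, but any single data set gives a nonzero estimate.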



## 统计代写|回归分析代写Regression Analysis代考|The Classical Model and Its Consequences

The classical regression model assumes normality, independence, constant variance, and linearity of the conditional mean function, and is (once again) stated as follows:
$$Y_i \mid X_i=x_i \quad \sim_{\text {independent }} \mathrm{N}\left(\beta_0+\beta_1 x_i, \sigma^2\right) \text {, for } i=1,2, \ldots, n .$$
Whether you like it or not, this model is also what your computer assumes when you ask it to analyze your data via standard regression methods. The parameter estimates you get from the computer are best under this model, and the inferences ($p$-values and confidence intervals) are exactly correct under this model. If the assumptions of the model are not true, then the estimates are not best, and the inferences are incorrect. You might think we are saying that assumptions must be true in order to use statistical methods that make such assumptions, but we are not. As we noted in Chapter 1, it is not necessarily a problem that any or all of the assumptions of the model are wrong; it depends on how badly the assumption is violated. And the easiest way to understand whether an assumption is violated "too badly" is to use simulation.

We have found that students in statistics classes often resist learning simulation. After all, the data that researchers use is usually real, and not simulated, so the students wonder, what is the point of using simulation? Here are some answers:

• Simulation shows you, clearly and concretely, how to interpret the regression analysis of your real (not simulated) data.
• Simulation helps you to understand how a regression model can be useful even when the model is wrong.
• Simulation models help you to understand the meaning of the regression model parameters.
• Simulation models help you to understand the meaning of the regression model assumptions.
• Simulation models help you to understand the meaning of a "research hypothesis."
• Simulation helps you to understand how to interpret your data in the presence of chance effects.
• Simulation helps you to understand all the commonly misunderstood concepts in statistics, like "unbiasedness," "standard error," "$p$-value," and "confidence interval."
• Simulation methods are commonly used in the analysis of real data; examples include the bootstrap and Markov Chain Monte Carlo.
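For instance, the bootstrap mentioned in the last bullet can be sketched in a few lines of Python (the data here are illustrative, and the percentile method shown is only one of several bootstrap intervals):

```python
import numpy as np

rng = np.random.default_rng(99)
# A small illustrative (x, y) data set; the true slope is 2
x = rng.uniform(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 2.0, 50)

def ols_slope(x, y):
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)

b1 = ols_slope(x, y)
# Pairs bootstrap: resample (x_i, y_i) rows with replacement and refit each time
boot = [ols_slope(x[idx], y[idx])
        for idx in (rng.integers(0, 50, 50) for _ in range(2000))]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(b1, (lo, hi))   # percentile bootstrap interval for the slope
```

The bootstrap replaces the mathematical sampling-distribution derivation with brute-force resampling, which is simulation applied directly to real data.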

## 统计代写|回归分析代写Regression Analysis代考|Unbiasedness

The Gauss-Markov (G-M) theorem states that, under certain model assumptions (the premise, "A," of the theorem), the OLS estimator has minimum variance among linear unbiased estimators (the consequence, "B," of the theorem). To understand the G-M theorem, you first need to understand what "unbiasedness" means. Recall the view of regression data shown in Chapter 2, shown again in Table 3.1.

To be specific, please consider the Production Cost data set from Chapter 1. The actual data are shown in Table 3.2, along with the random data-generation assumption of the regression model.

In particular, the value 2,224 is assumed to be produced at random from a distribution of potentially observable Cost values among jobs having 1,500 widgets, the value 1,660 is assumed to be produced at random from a distribution of potentially observable Cost values among jobs having 800 widgets, and so on. If you are having trouble visualizing these different distributions, just have a look at Figure 1.7 again, and put yourself in the position of the job manager at this company: In two different jobs where the number of widgets is the same, will the costs also be the same? Of course not; see the first and third observations in the data set, for example. There is an entire distribution of potentially observable Cost values when Widgets $=1500$, and this is what is meant by $p(y \mid X=1500)$.

Now, use your imagination. Imagine another collection of 40 jobs, from the same process that produced the data above, with the widgets data exactly as observed, but with specific costs not observed. Further, imagine that the classical model is true so that the distribution $p(y \mid X=x)$ is the $\mathrm{N}\left(\beta_0+\beta_1 x, \sigma^2\right)$ distribution. The specific costs are not observed, but the potentially observable data will appear as shown in Table 3.3.

In Table 3.3, the $Y_i$ are random variables, coming from the same distributions that produced the original data. Again, use your imagination: there are infinitely many potentially observable data sets as shown in Table 3.3, because there are infinitely many sequences of potentially observable values for $Y_1$, for $Y_2, \ldots$, and for $Y_{40}$. Again, if you are having a hard time visualizing this, just look at Figure 1.7: there are infinitely many possible values under each of the normal curves shown there. The $n=40$ values $Y_i$ in Table 3.3 are one set of random selections from such distributions.
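This "imagined replication" view is exactly what a simulation carries out. In the Python sketch below (the parameter values are hypothetical, not estimated from the Production Cost data), the Widgets values stay fixed while the Cost values are regenerated many times; the average of the resulting OLS slope estimates is very close to the true $\beta_1$, which is what unbiasedness means.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 40, 2000
widgets = rng.uniform(500, 2000, n)          # fixed X values, as in Table 3.3
beta0, beta1, sigma = 55.0, 1.5, 300.0       # hypothetical "true" parameters
wc = widgets - widgets.mean()
est = []
for _ in range(reps):
    cost = beta0 + beta1 * widgets + rng.normal(0, sigma, n)  # new Y's each time
    est.append((wc @ cost) / (wc @ wc))      # OLS slope for this imagined data set
print(sum(est) / reps)   # very close to the true beta1 = 1.5: OLS is unbiased
```

Any single estimate misses the true slope, but the estimates are right on average over the infinitely many potentially observable data sets.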


## 统计代写|回归分析代写Regression Analysis代考|The Linear Regression Function, and Why It Is Wrong

Usually, when people learn regression, they learn to understand the relationship between $Y$ and $X$ as a linear function. Specifically, the linearity assumption states that the means of the conditional distributions $p(y \mid x)$ fall precisely on a straight line of the form $\beta_0+\beta_1 x$, i.e., that $\mu_x=\mathrm{E}(Y \mid X=x)=\beta_0+\beta_1 x$.

See Figure 1.7 above for a graphic illustration of what this assumption tells you about the means of the conditional distributions: In that graph, four conditional distributions are shown, corresponding to four distinct values $X=x$. The linearity assumption states that the means of those four distributions, as well as the means for all other conditional distributions that are not shown in Figure 1.7, fall precisely on a straight line $\beta_0+\beta_1 x$, for some values of the parameters $\beta_0$ and $\beta_1$. The linearity assumption does not require that you know the numerical values of $\beta_0$ and $\beta_1$; rather, it simply states that the conditional means fall on some line $\beta_0+\beta_1 x$, for some (usually unknown) numerical values of the parameters $\beta_0$ and $\beta_1$.

The parameter $\beta_0$ is called the intercept of the line. When $\mathrm{E}(Y \mid X=x)=\beta_0+\beta_1 x$, it follows that $\mathrm{E}(Y \mid X=0)=\beta_0+\beta_1(0)=\beta_0$. In words, if the linearity assumption is true, then the mean of the distribution of $Y$ when $X=0$ is equal to $\beta_0$. Often, the range of $X$ does not include 0, in which case that interpretation is not particularly useful. In such cases, you can vaguely interpret $\beta_0$ as a parameter related to the unconditional mean of $Y$: if the mean of $Y$ is larger, then $\beta_0$ will be larger to reflect the vertical height, or distance from zero, of the regression function.

The parameter $\beta_1$ tells you something about the relationship between $Y$ and $X$. If the linearity assumption is true, then this parameter is the difference between the conditional means of the distributions of $Y$ where the $X$ variable differs by 1.0, which can be demonstrated as follows:
\begin{aligned} \mathrm{E}(Y \mid X=x+1)-\mathrm{E}(Y \mid X=x) & =\left\{\beta_0+\beta_1(x+1)\right\}-\left(\beta_0+\beta_1 x\right) \\ & =\left\{\beta_0+\beta_1 x+\beta_1\right\}-\beta_0-\beta_1 x \\ & =\beta_0+\beta_1 x+\beta_1-\beta_0-\beta_1 x \\ & =\beta_1 \end{aligned}

## 统计代写|回归分析代写Regression Analysis代考|LOESS: An Estimate of the True (Curved) Mean Function

So, the linearity assumption $\mathrm{E}(Y \mid X=x)=\beta_0+\beta_1 x$ is wrong. What is right? What is right is that $\mathrm{E}(Y \mid X=x)=f(x)$, which is some function $f(x)$ that you do not know. However, data allow you to estimate such unknown quantities.

If your data set had lots of repeats on particular $x$ values, you could use the average of the $Y$ data values where $X=x$ to estimate the function $f(x)$. For example, consider the data in Table 1.6 below obtained from a survey of students in a class. The $Y$ variable is "rating of the instructor," on a discrete 1 to 5 scale (where 5 means "best"), and the $X$ variable is "expected grade in course," where $0=$ "F", $1=$ "D", $2=$ "C", $3=$ "B", and $4=$ "A".

Using the data shown in Table 1.6, an obvious estimate of $\mathrm{E}(Y \mid X=2)$ is $\hat{f}(2)=(2+3)/2=2.5$ (the hat, "$\wedge$", signifies that this is just an estimate, not the true expected value). Similar, intuitively obvious estimates are $\hat{f}(3)=(5+2+4+4)/4=3.75$ and $\hat{f}(4)=(5+4+4+5)/4=4.5$.
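These by-hand averages can be checked with a few lines of Python (the book's own code is R; the data are exactly those of Table 1.6):

```python
# Average rating at each expected-grade value, from the Table 1.6 data
ratings = {2: [2, 3], 3: [5, 2, 4, 4], 4: [5, 4, 4, 5]}
f_hat = {x: sum(ys) / len(ys) for x, ys in ratings.items()}
print(f_hat)   # {2: 2.5, 3: 3.75, 4: 4.5}
```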

The data and the estimated mean function are shown in Figure 1.14. Notice that the function $\hat{f}(x)$ is not perfectly linear, as is expected since there are three distinct $X$ values.
R code for Figure 1.14
x = c(2,2,3,3,3,3,4,4,4,4)
y = c(2,3,5,2,4,4,5,4,4,5)
x1 = c(2,3,4)
f.hat = c(2.5, 3.75, 4.5)
plot(x, jitter(y, 5), ylab = "Rating of Instructor (jittered)",
     xlab = "Expected Grade", cex.axis = 0.8, cex.lab = 0.8)
points(x1, f.hat, pch = "X")
points(x1, f.hat, type = "l", lty = 2)

