Remark: The overall fit of a linear regression model (simple or multiple) can be summarized by using an analysis of variance (ANOVA). The results of this analysis are often presented in a table to show how variability in the response data $Y_1, Y_2, \ldots, Y_n$ is partitioned into different sources. This partition allows us to assess the overall fit of the model.
Recall: Consider the linear regression model
$$Y_i=\beta_0+\beta_1 x_{i 1}+\beta_2 x_{i 2}+\cdots+\beta_k x_{i k}+\epsilon_i,$$

for $i=1,2, \ldots, n$, or, in matrix notation,
$$\mathbf{Y}=\mathbf{X} \boldsymbol{\beta}+\boldsymbol{\epsilon} .$$
Recall $\mathbf{H}=\mathbf{X}\left(\mathbf{X}^{\prime} \mathbf{X}\right)^{-1} \mathbf{X}^{\prime}$ is the hat matrix, and $\widehat{\mathbf{Y}}=\mathbf{H Y}$ and $\mathbf{e}=(\mathbf{I}-\mathbf{H}) \mathbf{Y}$ are the vectors of fitted values and residuals, respectively. The matrix $\mathbf{I}$ is the $n \times n$ identity matrix.
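As a minimal numerical sketch of these definitions, the hat matrix, fitted values, and residuals can be computed directly with NumPy. The design matrix, coefficients, and noise level below are simulated purely for illustration and are assumptions, not data from these notes; the check against `np.linalg.lstsq` is only a sanity test that $\mathbf{H}\mathbf{Y}$ reproduces the least-squares fitted values.

```python
import numpy as np

# Simulated data (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
n, k = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept + k predictors
beta = np.array([1.0, 2.0, -0.5])
Y = X @ beta + rng.normal(scale=0.3, size=n)

# Hat matrix H = X (X'X)^{-1} X', computed via a linear solve instead of an explicit inverse
H = X @ np.linalg.solve(X.T @ X, X.T)

Y_hat = H @ Y               # fitted values
e = (np.eye(n) - H) @ Y     # residuals

# Sanity checks: H Y matches the least-squares fit, and Y = Y_hat + e
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(Y_hat, X @ beta_hat))
print(np.allclose(Y, Y_hat + e))
```

Forming the $n \times n$ matrix $\mathbf{H}$ explicitly is convenient for small examples like this one; for large $n$ one would usually work with the fitted values directly.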

Approach: To create an analysis of variance partition, start with the simple quadratic form $\mathbf{Y}^{\prime} \mathbf{Y}=\mathbf{Y}^{\prime} \mathbf{I} \mathbf{Y}$. Note that
$$
\begin{aligned}
\mathbf{Y}^{\prime} \mathbf{Y}=\mathbf{Y}^{\prime} \mathbf{I Y} & =\mathbf{Y}^{\prime}(\mathbf{H}+\mathbf{I}-\mathbf{H}) \mathbf{Y} \\
& =\mathbf{Y}^{\prime} \mathbf{H Y}+\mathbf{Y}^{\prime}(\mathbf{I}-\mathbf{H}) \mathbf{Y} \\
& =\mathbf{Y}^{\prime} \mathbf{H H} \mathbf{Y}+\mathbf{Y}^{\prime}(\mathbf{I}-\mathbf{H})(\mathbf{I}-\mathbf{H}) \mathbf{Y} \\
& =\widehat{\mathbf{Y}}^{\prime} \widehat{\mathbf{Y}}+\mathbf{e}^{\prime} \mathbf{e}
\end{aligned}
$$
because both $\mathbf{H}$ and $\mathbf{I}-\mathbf{H}$ are symmetric and idempotent. This equation can be expressed equivalently as
$$\sum_{i=1}^n Y_i^2=\sum_{i=1}^n \widehat{Y}_i^2+\sum_{i=1}^n\left(Y_i-\widehat{Y}_i\right)^2 .$$
We use the following terminology:
$$
\begin{aligned}
& \text { (uncorrected) total sum of squares } \longrightarrow \mathbf{Y}^{\prime} \mathbf{I Y}=\mathbf{Y}^{\prime} \mathbf{Y}=\sum_{i=1}^n Y_i^2 \\
& \text { (uncorrected) regression sum of squares } \longrightarrow \mathbf{Y}^{\prime} \mathbf{H Y}=\widehat{\mathbf{Y}}^{\prime} \widehat{\mathbf{Y}}=\sum_{i=1}^n \widehat{Y}_i^2 \\
& \text { error (residual) sum of squares } \longrightarrow \mathbf{Y}^{\prime}(\mathbf{I}-\mathbf{H}) \mathbf{Y}=\mathbf{e}^{\prime} \mathbf{e}=\sum_{i=1}^n\left(Y_i-\widehat{Y}_i\right)^2 .
\end{aligned}
$$
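The partition of the uncorrected total sum of squares can be checked numerically. The sketch below uses simulated data (the predictors, coefficients, and sample size are assumptions chosen only for illustration) and verifies that $\mathbf{H}$ is symmetric and idempotent and that the regression and error sums of squares add up to $\mathbf{Y}^{\prime}\mathbf{Y}$.

```python
import numpy as np

# Simulated data (hypothetical values, for illustration only)
rng = np.random.default_rng(1)
n = 30
X = np.column_stack([
    np.ones(n),                    # intercept column
    rng.uniform(0, 10, size=n),    # first predictor
    rng.uniform(0, 5, size=n),     # second predictor
])
Y = 3.0 + 1.5 * X[:, 1] - 2.0 * X[:, 2] + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
I = np.eye(n)

# H and I - H are symmetric and idempotent (up to floating-point error)
assert np.allclose(H, H.T) and np.allclose(H @ H, H)
assert np.allclose((I - H) @ (I - H), I - H)

# The three quadratic forms from the partition
ss_total = Y @ Y              # Y'Y: uncorrected total sum of squares
ss_reg = Y @ H @ Y            # Y'HY: uncorrected regression sum of squares
ss_err = Y @ (I - H) @ Y      # Y'(I-H)Y: error (residual) sum of squares

print(ss_total, ss_reg + ss_err)   # the two numbers agree up to rounding
```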

Survival Analysis

Remark: The statistical analysis of lifetime data is important in many areas, including biomedical applications (e.g., clinical trials), engineering, and actuarial science. The term “lifetime” means “time to event,” where an event may refer to death, part failure, insurance claim, natural disaster, eradication of infection, etc.

• In chronic disease clinical trials (e.g., trials involving cancer, diabetes, or cardiovascular disease), the primary endpoint (variable) of interest may be time to death, time to relapse of disease, time to disease progression, etc. For such trials, we are usually interested in comparing the distribution of the time to event among two or more treatments.
• Typically, clinical trials occur over a finite period of time; therefore, the time to event is not measured on all patients in the study. This results in what is referred to as censored data. Also, because patients generally enter a clinical trial at different calendar times (staggered entry), the amount of follow-up time varies for different individuals.
• The combination of censoring and staggered entry means that basic statistical techniques cannot be applied directly to such data. The area of statistics that addresses these challenges is called survival analysis.
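To make censoring and staggered entry concrete, here is a small simulation sketch. Everything in it is an assumption for illustration: the accrual window, the analysis date, and the exponential lifetime distribution are hypothetical choices, not taken from any real trial. Each patient enters at a random calendar time, and the time to event is observed only if the event occurs before the study ends.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
accrual_end = 2.0     # hypothetical: patients enroll during the first 2 years
study_end = 5.0       # hypothetical: calendar time at which the data are analyzed
mean_lifetime = 4.0   # hypothetical: true mean time to event, in years

entry = rng.uniform(0.0, accrual_end, size=n)         # staggered calendar entry times
event_time = rng.exponential(mean_lifetime, size=n)   # true time from entry to event
follow_up = study_end - entry                          # maximum observable time per patient

# What is actually recorded: the smaller of the event time and the follow-up time,
# plus an indicator of whether the event was observed (1) or censored (0)
observed = np.minimum(event_time, follow_up)
delta = (event_time <= follow_up).astype(int)

print("proportion censored:", 1 - delta.mean())
print("mean of observed times:", observed.mean())
print("mean of true event times:", event_time.mean())
```

Because each censored observation records only a lower bound on the true time to event, the naive average of the observed times cannot exceed the average of the true times, which illustrates why methods that ignore censoring give biased summaries.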

