
# Generalized Linear Models: A Simulated Example

## A Simulated Example

Consider a nested error regression (NER) model that can be expressed as
$$y_{i j}=\beta_0+\beta_1 x_{i j}+\alpha_i+\epsilon_{i j},$$

$i=1, \ldots, m$, $j=1, \ldots, n_i$, where the $\alpha_i$'s are i.i.d. random effects with mean 0 and distribution $F_1$, and the $\epsilon_{ij}$'s are i.i.d. errors with mean 0 and distribution $F_0$. The model might be associated with a survey, where $\alpha_i$ is a random effect related to the $i$th family in the sample and $n_i$ is the sample size for the family (e.g., the family size, if all family members are surveyed). The $x_{ij}$'s are covariates associated with the individuals sampled from the family and, in this case, correspond to people's ages. The ages are categorized into the following groups: $0-4, 5-9, \ldots, 55-59$, so that $x_{ij}=k$ if the person's age falls into the $k$th category (people aged 60 or over are not included in the survey). The true values of $\beta_0$ and $\beta_1$ are 2.0 and 0.2, respectively.

In the simulation, four combinations of the distributions $F_0, F_1$ are considered. These are Case I, $F_0=F_1=N(0,1)$; Case II, $F_0=F_1=t_3$; Case III, $F_0=$ logistic [the distribution of $\log \{U /(1-U)\}$, where $U \sim \operatorname{Uniform}(0,1)$] and $F_1=$ centralized lognormal [the distribution of $e^X-\sqrt{e}$, where $X \sim N(0,1)$]; and Case IV, $F_0=$ double exponential [the distribution of $X_1-X_2$, where $X_1, X_2$ are independent $\sim$ exponential(1)] and $F_1=$ a mixture of $N(-4,1)$ and $N(4,1)$ with equal probability. Note that Cases II-IV correspond to the following types of departure from normality: heavy tails, asymmetry, and bimodality. In each case, the following sample size configuration is considered: $m=100$, $k_1=\cdots=k_{m / 2}=2$, and $k_{m / 2+1}=\cdots=k_m=6$, where $k_i$ denotes the size of the $i$th group (written $n_i$ above). Finally, for each of the above cases, three prediction intervals are considered. The first is the prediction interval based on the least squares (LS) estimator, that is, the ordinary least squares (OLS) estimator of $\beta$; the second is that based on the EBLUE of $\beta$, where the variance components are estimated by REML (see Sect. 1.4.1); and the third is the linear regression (LR) prediction interval (e.g., Casella and Berger 2002, p. 558), which assumes that the observations are independent and normally distributed. The third one is not related to the prediction interval developed here; it is included for comparison.
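
The four distribution pairs can be sampled directly from their constructive definitions. The following is a minimal sketch, assuming NumPy; the function names, the `CASES` dictionary, and the seed are illustrative choices, not part of the original study.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility

def logistic(size):
    # log{U / (1 - U)} with U ~ Uniform(0, 1)
    u = rng.uniform(size=size)
    return np.log(u / (1.0 - u))

def centered_lognormal(size):
    # e^X - sqrt(e) with X ~ N(0, 1); sqrt(e) is the mean of e^X
    return np.exp(rng.standard_normal(size)) - np.sqrt(np.e)

def double_exponential(size):
    # X1 - X2 with X1, X2 independent exponential(1)
    return rng.exponential(1.0, size) - rng.exponential(1.0, size)

def normal_mixture(size):
    # equal-probability mixture of N(-4, 1) and N(4, 1)
    centers = rng.choice([-4.0, 4.0], size=size)
    return centers + rng.standard_normal(size)

def t3(size):
    # Student t with 3 degrees of freedom
    return rng.standard_t(3, size)

# case label -> (sampler for F0, the errors; sampler for F1, the random effects)
CASES = {
    "I": (rng.standard_normal, rng.standard_normal),
    "II": (t3, t3),
    "III": (logistic, centered_lognormal),
    "IV": (double_exponential, normal_mixture),
}

# Empirical check that every distribution has mean close to 0
for name, (f0, f1) in CASES.items():
    print(name, f0(200_000).mean(), f1(200_000).mean())
```

Each printed mean should be close to zero, which is the centering the model requires of both $F_0$ and $F_1$.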

For each of the four cases, 1,000 datasets are generated. First, the following are independently generated:
(i) $x_{i j}, 1 \leq i \leq m, 1 \leq j \leq k_i$, uniformly from the integers $1, \ldots, 12$ (12 age categories);
(ii) $\alpha_i, 1 \leq i \leq m$, from $F_1$;
(iii) $\epsilon_{i j}, 1 \leq i \leq m, 1 \leq j \leq k_i$, from $F_0$.
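
Steps (i)-(iii), combined with the model equation, can be sketched as follows. This is an illustrative sketch assuming NumPy, not the original simulation code; `simulate_ner` is a hypothetical helper, and Case I (standard normal $F_0$ and $F_1$) is used in the usage lines only as an example.

```python
import numpy as np

def simulate_ner(rng, f0, f1, m=100, beta0=2.0, beta1=0.2):
    """Generate one dataset from y_ij = beta0 + beta1 * x_ij + alpha_i + eps_ij."""
    # Group sizes: k_i = 2 for the first m/2 groups, k_i = 6 for the rest
    k = np.repeat([2, 6], m // 2)
    n_total = int(k.sum())
    # Group label of every observation (0-based)
    group = np.repeat(np.arange(m), k)
    # (i) age category, uniform on the integers 1..12 (upper bound is exclusive)
    x = rng.integers(1, 13, size=n_total)
    # (ii) one random effect per group, drawn from F1
    alpha = f1(m)
    # (iii) one error per observation, drawn from F0
    eps = f0(n_total)
    y = beta0 + beta1 * x + alpha[group] + eps
    return group, x, y

rng = np.random.default_rng(1)  # arbitrary seed
group, x, y = simulate_ner(rng, rng.standard_normal, rng.standard_normal)
print(y.shape)  # (400,) since 50 * 2 + 50 * 6 = 400
```

Repeating the call 1,000 times per case, with the appropriate samplers for $F_0$ and $F_1$, would reproduce the data-generation layout described above.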

## CMMP of Mixed Effects

Suppose that we have a set of training data, $y_{ij}, i=1, \ldots, m, j=1, \ldots, n_i$, in the sense that their classifications are known; that is, one knows the group $i$ to which $y_{ij}$ belongs. The assumed LMM for the training data is a longitudinal LMM (see Sect. 1.2.1.2):
$$y_i=X_i \beta+Z_i \alpha_i+\epsilon_i$$
where $y_i=\left(y_{ij}\right)_{1 \leq j \leq n_i}$, $X_i=\left(x_{ij}^{\prime}\right)_{1 \leq j \leq n_i}$ is a matrix of known covariates, $\beta$ is a vector of unknown regression coefficients (the fixed effects), $Z_i$ is a known $n_i \times q$ matrix, $\alpha_i$ is a $q \times 1$ vector of group-specific random effects, and $\epsilon_i$ is an $n_i \times 1$ vector of errors. It is assumed that the $\alpha_i$'s and $\epsilon_i$'s are independent, with $\alpha_i \sim N(0, G)$ and $\epsilon_i \sim N\left(0, R_i\right)$, where the covariance matrices $G$ and $R_i$ depend on a vector $\psi$ of variance components.
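
Under these assumptions, independence of $\alpha_i$ and $\epsilon_i$ gives the marginal covariance $\operatorname{Var}(y_i)=Z_i G Z_i^{\prime}+R_i$. A small numerical sketch, assuming NumPy, illustrates this for a random-intercept group; the helper name `marginal_cov` and all numeric values are hypothetical.

```python
import numpy as np

def marginal_cov(Z_i, G, R_i):
    # Var(y_i) = Z_i G Z_i' + R_i, which follows from independence of alpha_i and eps_i
    return Z_i @ G @ Z_i.T + R_i

# Hypothetical random-intercept group: q = 1, n_i = 3
n_i = 3
Z_i = np.ones((n_i, 1))      # every observation shares the group's intercept
G = np.array([[2.0]])        # assumed variance of the random effect
R_i = 0.5 * np.eye(n_i)      # assumed error covariance (independent errors)

V_i = marginal_cov(Z_i, G, R_i)
print(V_i)
```

Here the diagonal entries equal $2.0+0.5=2.5$ and the off-diagonal entries equal $2.0$: observations in the same group are correlated only through the shared random effect.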

Our goal is to make a classified prediction for a mixed effect associated with a set of new observations, $y_{\mathrm{n}, j}, 1 \leq j \leq n_{\text {new }}$ (the subscript $\mathrm{n}$ refers to “new”). Suppose that the new observations satisfy a similar LMM:
$$y_{\mathrm{n}, j}=x_{\mathrm{n}}^{\prime} \beta+z_{\mathrm{n}}^{\prime} \alpha_I+\epsilon_{\mathrm{n}, j}, \quad 1 \leq j \leq n_{\mathrm{new}},$$
where $x_{\mathrm{n}}, z_{\mathrm{n}}$ are known vectors; the index $I$ is assumed to be one of $1, \ldots, m$, but one does not know which one it is, or even whether such an actual “match” exists (i.e., it may not be true at all that $I$ matches one of the indexes $1, \ldots, m$). Furthermore, $\epsilon_{\mathrm{n}, j}, 1 \leq j \leq n_{\mathrm{new}}$, are new errors that are independent with $\mathrm{E}\left(\epsilon_{\mathrm{n}, j}\right)=0$ and $\operatorname{var}\left(\epsilon_{\mathrm{n}, j}\right)=R_{\mathrm{new}}$, and are independent of the $\alpha_i$'s and $\epsilon_i$'s associated with the training data. Note that the normality assumption is not always needed for the new errors, unless a prediction interval is of concern (see below). Also, the variance $R_{\mathrm{new}}$ of the new errors does not have to be the same as the variance of $\epsilon_{ij}$, the $j$th component of $\epsilon_i$ associated with the training data. The mixed effect that we wish to predict is
$$\theta=\mathrm{E}\left(y_{\mathrm{n}, j} \mid \alpha_I\right)=x_{\mathrm{n}}^{\prime} \beta+z_{\mathrm{n}}^{\prime} \alpha_I .$$
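
To make the target $\theta$ concrete, the sketch below (assuming NumPy) simulates new observations for one fixed, hypothetical realization of $\alpha_I$ and compares their average with $\theta$. All numeric values are illustrative assumptions, and normal errors are used only for convenience; the text above does not require normality here.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed

beta = np.array([2.0, 0.2])   # hypothetical fixed effects
alpha_I = np.array([0.7])     # hypothetical realized random effect of the unknown group I
x_n = np.array([1.0, 5.0])    # known covariate vector for the new observations
z_n = np.array([1.0])         # known random-effect design vector
R_new = 1.5                   # assumed variance of the new errors
n_new = 10_000

# The mixed effect to be predicted: theta = x_n' beta + z_n' alpha_I
theta = x_n @ beta + z_n @ alpha_I

# New observations scatter around theta with variance R_new
y_new = theta + rng.normal(0.0, np.sqrt(R_new), size=n_new)
print(theta, y_new.mean())
```

Conditional on $\alpha_I$, the sample mean of the new observations is unbiased for $\theta$ with variance $R_{\mathrm{new}} / n_{\mathrm{new}}$, so the two printed values should be close when $n_{\mathrm{new}}$ is large.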

