Generalized Linear Model | Empirical BLUP


In practice, the fixed effects and variance components are typically unknown. Therefore, in most cases neither the best predictor nor the BLUP is computable, even though they are known to be the best in their respective senses. In such cases, it is customary to replace the vector of variance components, $\theta$, which is involved in the expression of BLUP, by a consistent estimator, $\hat{\theta}$. The resulting predictor is often called empirical BLUP, or EBLUP.
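The plug-in idea can be sketched in the simplest setting. The example below assumes a balanced one-way random effects model $y_{ij}=\mu+\alpha_i+\epsilon_{ij}$ with $\operatorname{var}(\alpha_i)=\sigma^2$ and $\operatorname{var}(\epsilon_{ij})=\tau^2$, where the BLUP of $\mu+\alpha_i$ is $\bar{y}_{\cdot \cdot}+\{k \sigma^2 /(\tau^2+k \sigma^2)\}(\bar{y}_{i \cdot}-\bar{y}_{\cdot \cdot})$. The function names and the choice of ANOVA estimators (truncated at zero) are illustrative assumptions, not taken from the text.

```python
from statistics import mean


def anova_estimates(groups):
    """ANOVA estimators (sigma2_hat, tau2_hat) for a balanced one-way layout.

    groups: list of m lists, each holding k observations.
    sigma2 is the random-effect variance, tau2 the error variance.
    """
    m, k = len(groups), len(groups[0])
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    msa = k * sum((gm - grand) ** 2 for gm in group_means) / (m - 1)
    mse = sum((x - gm) ** 2
              for g, gm in zip(groups, group_means) for x in g) / (m * (k - 1))
    # Truncate at zero so the estimate stays in the parameter space.
    return max((msa - mse) / k, 0.0), mse


def eblup_one_way(groups):
    """EBLUP of mu + alpha_i per group: the BLUP with theta replaced by theta_hat.

    Assumes the data are not all identical, so the denominator is positive.
    """
    k = len(groups[0])
    sigma2_hat, tau2_hat = anova_estimates(groups)
    grand = mean(x for g in groups for x in g)
    shrink = k * sigma2_hat / (tau2_hat + k * sigma2_hat)
    return [grand + shrink * (mean(g) - grand) for g in groups]


predictions = eblup_one_way([[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]])
```

Each group mean is pulled toward the grand mean by the estimated shrinkage factor; when the estimated $\sigma^2$ is zero, every prediction collapses to the grand mean.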

Kackar and Harville (1981) showed that, if $\hat{\theta}$ is an even and translation-invariant estimator and the data are normal, the EBLUP remains unbiased. An estimator $\hat{\theta}=\hat{\theta}(y)$ is even if $\hat{\theta}(-y)=\hat{\theta}(y)$, and it is translation invariant if $\hat{\theta}(y-X \beta)=\hat{\theta}(y)$ for any $\beta$. Some of the well-known estimators of $\theta$, including ANOVA, ML, and REML estimators (see Sects. 1.3, 1.3.1, and 1.5), are even and translation invariant. In their arguments, however, Kackar and Harville assumed the existence of the expected value of EBLUP, which is not obvious because, unlike BLUP, EBLUP is not linear in $y$. The existence of the expected value of EBLUP was proved by Jiang (1999b, 2000a). See Sect. 2.7 for details.
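The two properties are easy to check numerically. The sketch below assumes a balanced one-way layout with $X$ equal to a column of ones, so that subtracting $X \beta$ shifts every observation by the same constant; the helper names and the simulated data are hypothetical, and the ANOVA estimators are left untruncated.

```python
import random


def anova_theta_hat(y, m, k):
    """ANOVA estimators (sigma2_hat, tau2_hat) for y laid out as m groups of size k."""
    groups = [y[i * k:(i + 1) * k] for i in range(m)]
    grand = sum(y) / (m * k)
    means = [sum(g) / k for g in groups]
    msa = k * sum((gm - grand) ** 2 for gm in means) / (m - 1)
    mse = sum((x - gm) ** 2
              for g, gm in zip(groups, means) for x in g) / (m * (k - 1))
    return ((msa - mse) / k, mse)


def close(u, v, tol=1e-9):
    """Componentwise comparison up to floating-point error."""
    return all(abs(p - q) <= tol for p, q in zip(u, v))


rng = random.Random(0)
m, k = 5, 4
y = [rng.gauss(10.0, 3.0) for _ in range(m * k)]
theta = anova_theta_hat(y, m, k)

# Evenness: theta_hat(-y) equals theta_hat(y).
is_even = close(anova_theta_hat([-v for v in y], m, k), theta)

# Translation invariance: with X a column of ones, y - X*beta shifts all of y by beta.
beta = 7.5
is_translation_invariant = close(anova_theta_hat([v - beta for v in y], m, k), theta)
```

Both checks succeed because the estimators depend on the data only through squared deviations from group means and the grand mean, which are unchanged by a sign flip or a common shift.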

Harville (1991) considered the one-way random effects model of Example 1.1 and showed that, in this case, the EBLUP of the mixed effect, $\mu+\alpha_i$, is identical to a parametric empirical Bayes (PEB) estimator. In the meantime, Harville noted some differences between these two approaches, PEB and EBLUP. One of the differences is that much of the work on PEB has been carried out by professional statisticians and has been theoretical in nature. The work has tended to focus on relatively simple models, such as the one-way random effects model, because it is only these models that are tractable from a theoretical standpoint. On the other hand, much of the work on EBLUP has been carried out by practitioners, such as researchers in the animal breeding industry, and has been applied to relatively complex models.

A problem of practical interest is estimation of the MSPE of EBLUP. Such a problem arises, for example, in small area estimation (e.g., Rao and Molina 2015), where the EBLUP is used to estimate the small area means, which can be expressed as mixed effects under a mixed effects model. However, the MSPE of EBLUP is complicated. A naive estimator of the MSPE of EBLUP may be obtained by replacing $\theta$ by $\hat{\theta}$ in the expression of the MSPE of BLUP. However, this naive estimator underestimates the true MSPE. To see this, let $\hat{\eta}=a^{\prime} \hat{\alpha}+b^{\prime} \hat{\beta}$ denote the EBLUP of a mixed effect $\eta=a^{\prime} \alpha+b^{\prime} \beta$, where $\hat{\alpha}$ and $\hat{\beta}$ are the BLUP of $\alpha$, given by (2.35), and the BLUE of $\beta$, given by (2.33), respectively, with the variance components $\theta$ replaced by $\hat{\theta}$. Kackar and Harville (1981) showed that, under the normality assumption, one has

$$
\operatorname{MSPE}(\hat{\eta})=\operatorname{MSPE}(\tilde{\eta})+\mathrm{E}\left\{(\hat{\eta}-\tilde{\eta})^2\right\}, \tag{2.38}
$$

where $\tilde{\eta}$ is the BLUP of $\eta$ given by (2.34). It is seen that the MSPE of BLUP is only the first term on the right side of (2.38). In fact, it can be shown that $\operatorname{MSPE}(\tilde{\eta})=g_1(\theta)+g_2(\theta)$, where
$$
\begin{aligned}
& g_1(\theta)=a^{\prime}\left(G-G Z^{\prime} V^{-1} Z G\right) a, \\
& g_2(\theta)=\left(b-X^{\prime} V^{-1} Z G a\right)^{\prime}\left(X^{\prime} V^{-1} X\right)^{-1}\left(b-X^{\prime} V^{-1} Z G a\right)
\end{aligned}
$$
(e.g., Henderson 1975). It is clear that using $g_1(\hat{\theta})+g_2(\hat{\theta})$ as an estimator would underestimate the MSPE of $\hat{\eta}$, because it does not take into account the additional variation associated with the estimation of $\theta$, represented by the second term on the right side of (2.38). Such a problem may become particularly important when, for example, large amounts of funds are involved. For example, over $\$7$ billion in funds was allocated annually based on EBLUP estimators of school-age children in poverty at the county and school district levels (National Research Council 2000).
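The two terms $g_1(\theta)$ and $g_2(\theta)$ can be evaluated directly from the model matrices. The following is a minimal sketch that assumes NumPy is available and that $V=Z G Z^{\prime}+R$ with $R=\operatorname{Var}(\epsilon)$; the function name and the balanced one-way example (with $\sigma^2=2$, $\tau^2=1$) are illustrative choices, not part of the text.

```python
import numpy as np


def blup_mspe_terms(X, Z, G, R, a, b):
    """Return (g1, g2); the MSPE of the BLUP of a'alpha + b'beta is g1 + g2."""
    V = Z @ G @ Z.T + R
    Vinv = np.linalg.inv(V)
    # g1 = a'(G - G Z' V^{-1} Z G) a
    g1 = a @ (G - G @ Z.T @ Vinv @ Z @ G) @ a
    # d = b - X' V^{-1} Z G a, and g2 = d' (X' V^{-1} X)^{-1} d
    d = b - X.T @ Vinv @ Z @ G @ a
    g2 = d @ np.linalg.solve(X.T @ Vinv @ X, d)
    return float(g1), float(g2)


# Balanced one-way layout: m groups with k observations each (illustrative values).
m, k, sigma2, tau2 = 3, 2, 2.0, 1.0
X = np.ones((m * k, 1))                      # intercept only
Z = np.kron(np.eye(m), np.ones((k, 1)))      # group indicators
G = sigma2 * np.eye(m)
R = tau2 * np.eye(m * k)
a = np.array([1.0, 0.0, 0.0])                # selects alpha_1
b = np.array([1.0])                          # selects mu
g1, g2 = blup_mspe_terms(X, Z, G, R, a, b)
```

In this balanced case the results can be checked by hand: $g_1=\sigma^2 \tau^2 /(\tau^2+k \sigma^2)$ and $g_2=\tau^4 /\{m k(\tau^2+k \sigma^2)\}$. Evaluating the same function at $\hat{\theta}$ gives the naive estimator, which omits the second term of (2.38).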

Generalized Linear Model | Observed Best Prediction

A practical issue regarding prediction of mixed effects is robustness to model misspecification. Typically, the best predictor, (2.31) or (2.32), is derived under an assumed model. What if the assumed model is incorrect? Quite often, there are consequences. Of course, one may try to avoid model misspecification by carefully choosing the assumed model via a statistical model selection procedure. For example, if a plot of the data shows some nonlinear trend, then, perhaps, some nonlinear terms, such as polynomials or splines, can be added to the model (e.g., Jiang and Nguyen 2016, sec. 6.2). On the other hand, there are practical, sometimes even political, reasons that a simpler model, such as a linear model, is preferred. Such a model is simple to use and interpret, and it utilizes auxiliary information in a simple way. Note that the auxiliary data are often collected using taxpayers’ money; therefore, it might be “politically incorrect” not to use them, even if that is a result of the model selection. For such reasons, one often has little choice but to stay with the model that has been adopted. The question then is how to deal with the potential model misspecification.

Jiang et al. (2011) proposed a new method of predicting a mixed effect that “stands ground” at the assumed model, even if it is potentially misspecified. The method then considers how to estimate the model parameters in order to reduce the impact of model misspecification. It is called observed best prediction, or OBP. For the most part, OBP entertains two models: one is the assumed model, and the other is a broader model that requires no assumptions, or very weak assumptions. The broader model is always, or almost always, correct; yet, it is useless in terms of utilizing the auxiliary information. The assumed model is used to derive the best predictor (BP) of the mixed effect, which is no longer the BP when the assumed model fails. The broader model, on the other hand, is only used to derive a criterion for estimating the parameters under the assumed model, and this criterion is not model-dependent. As a result, OBP is more robust than BLUP in case of model misspecification. Note that the parameter estimators associated with the BLUP, such as the MLE of $\beta$ given by (2.33) when the variance components are known, and the ML or REML estimators of the variance components when the latter are unknown, are model-dependent.

Below we describe the OBP procedure for a special case of LMM, namely, the Fay-Herriot model. More details, and further developments, of OBP can be found in Chapter 5 of Jiang (2019).


Generalized Linear Model | Approximate Confidence Intervals for Variance Components

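Satterthwaite (1946) proposed a method, which extended an earlier approach of Smith (1936), for balanced ANOVA models. The goal is to construct a confidence interval for a quantity of the form $\zeta=\sum_{i=1}^h c_i \lambda_i$, where $\lambda_i=\mathrm{E}\left(S_i^2\right)$ and $S_i^2$ is the mean sum of squares corresponding to the $i$th factor (fixed or random) in the model (e.g., Scheffé 1959). Note that many variance components can be expressed in this form; for example, the variance of $y_{i j}$, $\sigma^2+\tau^2$, in Example 2.3 can be expressed as $(1 / k) \mathrm{E}\left(S_1^2\right)+(1-1 / k) \mathrm{E}\left(S_2^2\right)$, where $S_1^2$ is the mean sum of squares corresponding to $\alpha$ and $S_2^2$ is that corresponding to $\epsilon$. The idea is to find an appropriate "degrees of freedom," say, $d$, such that the first two moments of the random variable $d \sum_{i=1}^h c_i S_i^2 / \zeta$ match those of a $\chi_d^2$ random variable. This approach is known as Satterthwaite's procedure. Graybill and Wang (1980) proposed a method that improves on Satterthwaite's procedure. The authors called their method the modified large sample (MLS) method. The method provides an approximate confidence interval for a nonnegative linear combination of the $\lambda_i$s, which is exact when all but one of the coefficients in the linear combination are zero. We describe the Graybill-Wang method for the special case of the balanced one-way random effects model (Example 2.2).

Suppose that one is interested in constructing a confidence interval for $\zeta=c_1 \lambda_1+c_2 \lambda_2$, where $c_1 \geq 0$ and $c_2>0$. This problem is equivalent to constructing a confidence interval for $\zeta=c \lambda_1+\lambda_2$, where $c \geq 0$. A uniformly minimum variance unbiased estimator (UMVUE; e.g., Lehmann and Casella 1998) of this quantity is given by $\hat{\zeta}=c S_1^2+S_2^2$. Furthermore, it can be shown that $\hat{\zeta}$ is asymptotically normal, such that $(\hat{\zeta}-\zeta) / \sqrt{\operatorname{var}(\hat{\zeta})}$ has a limiting $N(0,1)$ distribution (Exercise 2.16).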
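The moment-matching step in Satterthwaite's procedure can be made concrete. The sketch below assumes the usual normal-theory setting in which the $S_i^2$ are independent with $d_i S_i^2 / \lambda_i \sim \chi_{d_i}^2$, so that $\operatorname{var}(S_i^2)=2 \lambda_i^2 / d_i$. The mean of $d \sum_i c_i S_i^2 / \zeta$ is then $d$ automatically, and matching the variance to $2 d$ gives $d=\zeta^2 / \sum_i\left(c_i \lambda_i\right)^2 / d_i$; the unknown $\lambda_i$ are replaced by $S_i^2$. The function name and inputs are hypothetical.

```python
def satterthwaite_df(c, s2, df):
    """Approximate degrees of freedom d for zeta_hat = sum_i c_i * S_i^2.

    c  : coefficients c_i
    s2 : observed mean squares S_i^2 (plugged in for the unknown lambda_i)
    df : degrees of freedom d_i of each mean square
    Assumes independent S_i^2 with d_i * S_i^2 / lambda_i ~ chi-square(d_i).
    """
    zeta_hat = sum(ci * si for ci, si in zip(c, s2))
    denom = sum((ci * si) ** 2 / di for ci, si, di in zip(c, s2, df))
    return zeta_hat ** 2 / denom


# A single nonzero coefficient recovers that component's own degrees of freedom.
d_single = satterthwaite_df([1.0, 0.0], [2.0, 3.0], [4, 10])

# Two equally weighted, equal mean squares with 4 df each.
d_equal = satterthwaite_df([0.5, 0.5], [2.0, 2.0], [4, 4])
```

An approximate interval for $\zeta$ would then use quantiles of the $\chi_d^2$ distribution with the computed $d$, which is generally not an integer; that step is omitted here because it needs a chi-square quantile routine.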

Generalized Linear Model | Simultaneous Confidence Intervals

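Hartley and Rao (1967) derived a simultaneous confidence region for the variance ratios $\gamma_r=\sigma_r^2 / \tau^2, r=1, \ldots, s$ (i.e., the Hartley-Rao form of variance components; see the section introducing this form) in a Gaussian mixed ANOVA model, based on maximum likelihood estimation. The Hartley-Rao confidence region is quite general; that is, it applies to a general mixed ANOVA model, balanced or unbalanced. On the other hand, in some special cases, different methods may lead to confidence intervals that are easier to interpret. For example, Khuri (1981) developed a method of constructing simultaneous confidence intervals for all continuous functions of the variance components in the balanced random effects model (see the end of Sect. 1.2.1), which is a special case of the mixed ANOVA model.

It should be noted that, as long as one knows how to construct confidence intervals for the individual variance components, one can always construct a conservative simultaneous confidence interval for the variance components via the Bonferroni inequality. Suppose that $\left[L_k, U_k\right]$ is a $100\left(1-\rho_k\right) \%$ confidence interval for the variance component $\theta_k, k=1, \ldots, q$. Then, by the Bonferroni inequality, the set of intervals $\left[L_k, U_k\right], k=1, \ldots, q$, is a (conservative) simultaneous confidence interval for $\theta_k, k=1, \ldots, q$, with confidence coefficient greater than or equal to $1-\sum_{k=1}^q \rho_k$.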
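The Bonferroni bookkeeping is short enough to sketch. The helpers below are hypothetical names: one splits an overall error rate equally across $q$ intervals (a common choice, though any split with the same sum gives the same bound), and the other returns the lower bound $1-\sum_k \rho_k$ on the simultaneous confidence coefficient.

```python
def bonferroni_split(alpha, q):
    """Equal split: per-interval error rates rho_k = alpha / q."""
    return [alpha / q] * q


def simultaneous_confidence_bound(rhos):
    """Bonferroni lower bound 1 - sum(rho_k) on the joint confidence coefficient."""
    return 1.0 - sum(rhos)


# Five variance components, targeting at least 95% simultaneous coverage:
# each individual interval is built at level 1 - 0.01 = 99%.
rhos = bonferroni_split(0.05, 5)
bound = simultaneous_confidence_bound(rhos)

# An unequal split is also allowed; the bound depends only on the sum.
bound_unequal = simultaneous_confidence_bound([0.01, 0.04])
```

The bound holds without any independence assumption on the individual intervals, which is why the resulting simultaneous interval is conservative.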


