Posted on Categories: Hypothesis Testing, Data Science, Statistics

## R and S-PLUS Function

The function
$$\mathrm{qse}(\mathrm{x}, \mathrm{q}=.5, \mathrm{op}=3)$$
estimates the standard error of $\hat{x}_q$ using Eq. (3.11). As indicated, the default value for $q$ is .5. The argument op determines which density estimator is used to estimate $f\left(x_q\right)$. The choices are:

• $\mathrm{op}=1$, Rosenblatt's shifted histograms,
• $\mathrm{op}=2$, the expected frequency curve,
• $\mathrm{op}=3$, the adaptive kernel method.

For example, storing the data in Table 3.2 in the S-PLUS vector $\mathrm{x}$, the command $\mathrm{qse}(\mathrm{x}, \mathrm{op}=1)$ returns the value 64.3. In contrast, using $\mathrm{op}=2$ and $\mathrm{op}=3$ yields the estimates 58.94 and 47.95, respectively. So the choice of density estimator can make a practical difference.

## The Maritz-Jarrett Estimate of the Standard Error of $\hat{x}_q$

Maritz and Jarrett (1978) derived an estimate of the standard error of the sample median, which is easily extended to the more general case involving $\hat{x}_q$. That is, when using a single order statistic, its standard error can be estimated using the method outlined here. It is based on the fact that $E\left(\hat{x}_q\right)$ and $E\left(\hat{x}_q^2\right)$ can be related to a beta distribution. The beta probability density function, when $a$ and $b$ are positive integers, is
$$f(x)=\frac{(a+b+1) !}{a ! b !} x^a(1-x)^b, \quad 0 \leq x \leq 1 .$$

Details about the beta distribution are not important here. Interested readers can refer to N. L. Johnson and Kotz (1970, Ch. 24).

As before, let $m=[q n+.5]$. Let $Y$ be a random variable having a beta distribution with $a=m-1$ and $b=n-m$, and let
$$W_i=P\left(\frac{i-1}{n} \leq Y \leq \frac{i}{n}\right) .$$
Many statistical computing packages have functions that evaluate the beta distribution, so evaluating the $W_i$ values is relatively easy to do. In S-PLUS (and R), there is the function $\mathrm{pbeta}(\mathrm{x}, \mathrm{a}, \mathrm{b})$, which computes $P(Y \leq x)$. Thus, $W_i$ can be computed by setting $x=i / n$, $y=(i-1) / n$, in which case $W_i$ is $\mathrm{pbeta}(x, m-1, n-m)$ minus $\mathrm{pbeta}(y, m-1, n-m)$.
Let
$$C_k=\sum_{i=1}^n W_i X_{(i)}^k$$
When $k=1$, $C_k$ is a linear combination of the order statistics. Linear sums of order statistics are called L-estimators. Other examples of L-estimators are the trimmed and Winsorized means already discussed. The point here is that $C_k$ can be shown to estimate $E\left(X_{(m)}^k\right)$, the $k$th moment of the $m$th order statistic. Consequently, the standard error of the $m$th order statistic, $X_{(m)}=\hat{x}_q$, is estimated with
$$\sqrt{C_2-C_1^2} .$$
Note that when $n$ is odd, this last equation provides an alternative to the McKean-Schrader estimate of the standard error of $M$ described in Section 3.3.4. Based on limited studies, it seems that when computing confidence intervals or testing hypotheses based on $M$, the McKean-Schrader estimator is preferable.
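
The computation above can be sketched in Python as follows. This is a minimal illustration, not the book's own R/S-PLUS code, and the names `beta_cdf_int` and `maritz_jarrett_se` are hypothetical. One interpretive assumption is made: the density written above is proportional to $x^a(1-x)^b$, which corresponds to shape parameters $a+1=m$ and $b+1=n-m+1$ in the usual shape parameterization of the beta distribution, so the sketch uses those shapes. If a different parameterization is intended, the weights $W_i$ would differ.

```python
from math import comb, floor, sqrt


def beta_cdf_int(x, shape1, shape2):
    """CDF at x of a beta distribution with positive integer shape parameters.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a),
    which avoids any dependency outside the standard library.
    """
    n_trials = shape1 + shape2 - 1
    return sum(
        comb(n_trials, j) * x**j * (1.0 - x) ** (n_trials - j)
        for j in range(shape1, n_trials + 1)
    )


def maritz_jarrett_se(data, q=0.5):
    """Sketch of the Maritz-Jarrett estimate of the standard error of x_q-hat."""
    xs = sorted(data)                      # order statistics X_(1) <= ... <= X_(n)
    n = len(xs)
    m = floor(q * n + 0.5)                 # m = [qn + .5]
    # Assumption: density x^(m-1) (1-x)^(n-m) means shapes m and n-m+1.
    shape1, shape2 = m, n - m + 1
    cdf = [beta_cdf_int(i / n, shape1, shape2) for i in range(n + 1)]
    w = [cdf[i] - cdf[i - 1] for i in range(1, n + 1)]   # W_i
    c1 = sum(wi * xi for wi, xi in zip(w, xs))           # C_1
    c2 = sum(wi * xi**2 for wi, xi in zip(w, xs))        # C_2
    # Guard against a tiny negative value caused by floating-point rounding.
    return sqrt(max(0.0, c2 - c1**2))
```

Because the weights $W_i$ are nonnegative and sum to 1, $C_2-C_1^2$ is the variance of a discrete distribution on the order statistics, so the estimate is zero only when the heavily weighted observations coincide.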

## The Sample Trimmed Mean

As already indicated, the standard error of the sample mean can be relatively large when sampling from a heavy-tailed distribution, and the sample mean estimates a nonrobust measure of location, $\mu$. The sample trimmed mean addresses these problems.

The sample trimmed mean, which estimates the population trimmed mean $\mu_t$ (described in Section 2.2.3), is computed as follows. Let $X_1, \ldots, X_n$ be a random sample and let $X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}$ be the observations written in ascending order. The value $X_{(i)}$ is called the $i$th order statistic. Suppose the desired amount of trimming has been chosen to be $\gamma$, $0 \leq \gamma<.5$. Let $g=[\gamma n]$, where $[\gamma n]$ is the value of $\gamma n$ rounded down to the nearest integer. For example, $[10.9]=10$. The sample trimmed mean is computed by removing the $g$ largest and $g$ smallest observations and averaging the values that remain. In symbols, the sample trimmed mean is
$$\bar{X}_t=\frac{X_{(g+1)}+\cdots+X_{(n-g)}}{n-2 g} .$$
In essence, the empirical distribution is trimmed in a manner consistent with how the probability density function was trimmed when defining $\mu_t$. As indicated in Chapter 2, two-sided trimming is assumed unless stated otherwise.
The definition of the sample trimmed mean given by Eq. (3.1) is the one most commonly used. However, for completeness, it is noted that the term trimmed mean sometimes refers to a slightly different estimator (e.g., Reed, 1998; cf. Hogg, 1974), namely,
$$\frac{1}{n(1-2 \gamma)}\left(\sum_{i=g+1}^{n-g} X_{(i)}+(g-\gamma n)\left(X_{(g)}+X_{(n-g+1)}\right)\right)$$
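
The first definition of the sample trimmed mean (remove the $g$ smallest and $g$ largest observations, then average) can be sketched in a few lines of Python. This is an illustration only, not the book's tmean function; the name `trimmed_mean` and the example data are assumptions made for the sketch.

```python
from math import floor


def trimmed_mean(data, tr=0.2):
    """Sample trimmed mean: drop the g smallest and g largest values, average the rest."""
    if not 0 <= tr < 0.5:
        raise ValueError("the amount of trimming must satisfy 0 <= tr < .5")
    xs = sorted(data)          # order statistics
    n = len(xs)
    g = floor(tr * n)          # g = [gamma * n], rounded down
    kept = xs[g:n - g]         # X_(g+1), ..., X_(n-g)
    return sum(kept) / len(kept)


values = [2, 4, 6, 7, 8, 10, 14, 19, 21, 28]
print(trimmed_mean(values))           # 20% trimming: averages 6, 7, 8, 10, 14, 19
print(trimmed_mean(values, tr=0.0))   # no trimming: the ordinary sample mean
```

With $n=10$ and $\gamma=.2$, $g=2$, so replacing the largest value 28 by an arbitrarily large number leaves the 20% trimmed mean unchanged, while the ordinary mean grows without bound. Note that `floor(tr * n)` is computed in floating point, so for some combinations of `tr` and `n` the product can fall just below an integer.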

## R and S-PLUS Function tmean

Because it is common to use $20 \%$ trimming, for convenience the R (and S-PLUS) function
$$\operatorname{tmean}(\mathrm{x}, \mathrm{tr}=.2)$$
has been supplied, which computes a $20 \%$ trimmed mean by default using the data stored in the S-PLUS vector $\mathrm{x}$. Here, $\mathrm{x}$ can be any R or S-PLUS vector containing data. The amount of trimming can be altered using the argument tr. So tmean(blob) will compute a $20 \%$ trimmed mean for the data stored in blob, and tmean(blob, tr=.3) will use $30 \%$ trimming instead. For convenience, the function
$$\operatorname{lloc}(\mathrm{x}, \mathrm{est}=\mathrm{tmean}, \ldots)$$
is supplied for computing a trimmed mean when data are stored in list mode or a matrix. If $\mathrm{x}$ is a matrix, lloc computes the trimmed mean for each column. Other measures of location can be used via the argument est. (For example, est=median will compute the median.) The argument … means that any optional arguments associated with est can be used.

## A Bootstrap Estimate of a Standard Error

It is convenient to begin with a description of the most basic bootstrap method for estimating a standard error. Let $\hat{\theta}$ be any estimator based on a random sample of observations, $X_1, \ldots, X_n$. The goal is to estimate $\operatorname{VAR}(\hat{\theta})$, the squared standard error of $\hat{\theta}$. The strategy used by the bootstrap method is based on a very simple idea. Temporarily assume that observations are randomly sampled from some known distribution, $F$. Then for a given sample size, $n$, the sampling distribution of $\hat{\theta}$ could be determined by randomly generating $n$ observations from $F$, computing $\hat{\theta}$, randomly generating another set of $n$ observations, computing $\hat{\theta}$, and repeating this process many times. Suppose this is done $B$ times and the resulting values for $\hat{\theta}$ are labeled $\hat{\theta}_1, \ldots, \hat{\theta}_B$. If $B$ is large enough, the values $\hat{\theta}_1, \ldots, \hat{\theta}_B$ provide a good approximation of the distribution of $\hat{\theta}$. In particular, they provide an estimate of the squared standard error of $\hat{\theta}$, namely,
$$\frac{1}{B-1} \sum_{b=1}^B\left(\hat{\theta}_b-\bar{\theta}\right)^2,$$
where
$$\bar{\theta}=\frac{1}{B} \sum_{b=1}^B \hat{\theta}_b .$$
That is, $\operatorname{VAR}(\hat{\theta})$ is estimated with the sample variance of the values $\hat{\theta}_1, \ldots, \hat{\theta}_B$. If, for example, $\hat{\theta}$ is taken to be the sample mean, $\bar{X}$, then the squared standard error would be found to be $\sigma^2 / n$, approximately, provided $B$ is reasonably large. Of course when working with the mean, it is known that its squared standard error is $\sigma^2 / n$, so the method just described is unnecessary. The only point is that a reasonable method for estimating the squared standard error of $\hat{\theta}$ has been described.
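
The thought experiment above (sampling repeatedly from a known $F$) can be sketched in Python. The function name `simulated_squared_se`, the choice of a standard normal $F$, and the values $n=25$ and $B=2000$ are all assumptions made for illustration.

```python
import random
from statistics import mean, variance


def simulated_squared_se(estimator, draw, n, B=2000, seed=0):
    """Approximate VAR(theta-hat) by generating B samples of size n from a known F.

    `draw(rng)` returns one observation from F; `estimator` maps a sample to theta-hat.
    """
    rng = random.Random(seed)
    estimates = [estimator([draw(rng) for _ in range(n)]) for _ in range(B)]
    return variance(estimates)     # sample variance, divisor B - 1


# Sample mean under a standard normal F: the target is sigma^2 / n = 1 / 25.
approx = simulated_squared_se(mean, lambda rng: rng.gauss(0.0, 1.0), n=25)
print(approx, 1 / 25)
```

For the sample mean the simulated value should land close to $\sigma^2/n$, which serves as a sanity check on the procedure before it is applied to estimators whose standard error has no simple formula.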

## R and S-PLUS Function bootse

As explained in Section 1.7, $\mathrm{R}$ and S-PLUS functions have been written for applying the methods described in this book. The software written for this book is free, and a single command incorporates them into your version of $\mathrm{R}$ or S-PLUS. Included is the function
$$\operatorname{bootse}(\mathrm{x}, \mathrm{nboot}=1000, \mathrm{est}=\mathrm{median}),$$
which can be used to compute a bootstrap estimate of the standard error of virtually any estimator covered in this book. Here, $\mathrm{x}$ is any R (or S-PLUS) variable containing the data. The argument nboot represents $B$, the number of bootstrap samples, and defaults to 1000 if not specified. (As is done with all R and S-PLUS functions, optional arguments are indicated by an $=$ and they default to the value shown. Here, for example, the value of nboot is taken to be 1000 if no value is specified by the user.) The argument est indicates the estimator for which the standard error is to be computed. If not specified, est defaults to the median. That is, the standard error of the usual sample median will be estimated. So, for example, if data are stored in the R variable blob, the command bootse(blob) will return the estimated standard error of the usual sample median.
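
A rough Python analog of such a function is sketched below. It is not the book's bootse code, whose internals are not shown here. It assumes the standard bootstrap recipe, in which the unknown $F$ is replaced by the observed data and each bootstrap sample is drawn from the data with replacement. The name `bootstrap_se` and the `seed` argument are hypothetical.

```python
import random
from statistics import median, stdev


def bootstrap_se(x, nboot=1000, est=median, seed=None):
    """Bootstrap estimate of the standard error of `est` (assumed standard recipe)."""
    rng = random.Random(seed)
    n = len(x)
    # Each bootstrap sample: n values drawn with replacement from the data.
    estimates = [est(rng.choices(x, k=n)) for _ in range(nboot)]
    # Square root of the sample variance (divisor nboot - 1) of the estimates.
    return stdev(estimates)


blob = [2, 4, 6, 7, 8, 10, 14, 19, 21, 28]
print(bootstrap_se(blob, seed=1))   # standard error of the sample median
```

Passing a different callable as `est`, for example `statistics.mean`, estimates the standard error of that estimator instead, mirroring the role of the est argument described above.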

## Basic Tools for Judging Robustness

There are three basic tools used to establish whether quantities such as measures of location and scale have good properties: qualitative robustness, quantitative robustness, and infinitesimal robustness. This section describes these tools in the context of location measures, but they are relevant to measures of scale, as will become evident. These tools not only provide formal methods for judging a particular measure, they can be used to help derive measures that are robust.

Before continuing, it helps to be more formal about what is meant by a measure of location. A quantity that characterizes a distribution, such as the population mean, is said to be a measure of location if it satisfies four conditions, and a fifth is sometimes added. To describe them, let $X$ be a random variable with distribution $F$, and let $\theta(X)$ be some descriptive measure of $F$. Then $\theta(X)$ is said to be a measure of location if for any constants $a$ and $b$,

The first condition is called location equivariance. It simply requires that if a constant $b$ is added to every possible value of $X$, a measure of location should be increased by the same amount. Let $E(X)$ denote the expected value of $X$. From basic principles, the population mean is location equivariant. That is, if $\theta(X)=E(X)=\mu$, then $\theta(X+b)=E(X+b)=\mu+b$. The first three conditions, taken together, imply that a measure of location should have a value within the range of possible values of $X$. The fourth condition is called scale equivariance. If the scale by which something is measured is altered by multiplying all possible values of $X$ by $a$, a measure of location should be altered by the same amount. In essence, results should be independent of the scale of measurement. As a simple example, if the typical height of a man is to be compared to the typical height of a woman, it should not matter whether the comparisons are made in inches or feet.
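
Location and scale equivariance can be checked numerically on a sample, as an empirical analog of the population conditions. The sketch below is illustrative only: the helper names and the height data are invented, and passing a check on one sample does not prove the property in general.

```python
from statistics import mean, median, pvariance


def is_location_equivariant(theta, xs, b, tol=1e-9):
    """Check theta(X + b) == theta(X) + b on the sample xs."""
    return abs(theta([x + b for x in xs]) - (theta(xs) + b)) < tol


def is_scale_equivariant(theta, xs, a, tol=1e-9):
    """Check theta(aX) == a * theta(X) on the sample xs."""
    return abs(theta([a * x for x in xs]) - a * theta(xs)) < tol


heights_in = [64, 66, 67, 70, 72, 75]     # hypothetical heights in inches

for theta in (mean, median):
    print(theta.__name__,
          is_location_equivariant(theta, heights_in, b=3),
          is_scale_equivariant(theta, heights_in, a=1 / 12))   # inches to feet

# The variance is unchanged by adding a constant, so it fails the first check.
print(is_location_equivariant(pvariance, heights_in, b=3))
```

The mean and median pass both checks, so conclusions based on them do not depend on whether heights are recorded in inches or feet.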

## Qualitative Robustness

To understand qualitative robustness, it helps to begin by considering any function $f(x)$, not necessarily a probability density function. Suppose it is desired to impose a restriction on this function so that it does not change drastically with small changes in $x$. One way of doing this is to insist that $f(x)$ be continuous. If, for example, $f(x)=0$ for $x \leq 1$, but $f(x)=10,000$ for any $x>1$, the function is not continuous, and if $x=1$, an arbitrarily small increase in $x$ results in a large increase in $f(x)$.

A similar idea can be used when judging a measure of location. This is accomplished by viewing parameters as functionals. In the present context, a functional is just a rule that maps every distribution into a real number. For example, the population mean can be written as
$$T(F)=E(X),$$
where the expected value of $X$ depends on $F$. The role of $F$ becomes more explicit if expectation is written in integral form, in which case this last equation becomes
$$T(F)=\int_{-\infty}^{\infty} x d F(x) .$$
If $X$ is discrete and the probability function corresponding to $F(x)$ is $f(x)$,
$$T(F)=\sum x f(x),$$
where the summation is over all possible values $x$ of $X$.
One advantage of viewing parameters as functionals is that the notion of continuity can be extended to them. Thus, if the goal is to have measures of location that are relatively unaffected by small shifts in $F$, a requirement that can be imposed is that when viewed as a functional, it is continuous. Parameters with this property are said to have qualitative robustness.
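
For a discrete distribution, the functional $T(F)=\sum x f(x)$ is easy to evaluate directly, which makes it possible to see how $T(F)$ responds to a small change in $F$. The sketch below is an illustration under assumptions: a fair die is used as $F$, and the "small shift" is modeled by moving a fraction `eps` of the probability to a single distant point. The function names are hypothetical.

```python
def mean_functional(pmf):
    """T(F) = sum of x * f(x), for a discrete distribution given as {x: f(x)}."""
    return sum(x * p for x, p in pmf.items())


def contaminate(pmf, eps, point):
    """Return the mixture (1 - eps) * F + eps * (point mass at `point`)."""
    mixed = {x: (1 - eps) * p for x, p in pmf.items()}
    mixed[point] = mixed.get(point, 0.0) + eps
    return mixed


die = {k: 1 / 6 for k in range(1, 7)}
print(mean_functional(die))                            # about 3.5
print(mean_functional(contaminate(die, 0.001, 1e6)))   # about 1003.5
```

Here the two distribution functions differ by at most .001 at any point, yet the value of the functional changes by roughly 1000, and it can be made to change by any amount by moving the contaminating point farther out.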

## The Influence Curve

This section gives one more indication of why robust methods are of interest by introducing the influence curve, as described by Mosteller and Tukey (1977). It bears a close resemblance to the influence function, which plays an important role in subsequent chapters, but the influence curve is easier to understand. In general, the influence curve indicates how any statistic is affected by an additional observation having value $x$. In particular it graphs the value of a statistic versus $x$.

As an illustration, let $\bar{X}$ be the sample mean corresponding to the random sample $X_1, \ldots, X_n$. Suppose we add an additional value, $x$, to the $n$ values already available, so now there are $n+1$ observations. Of course this additional value will in general affect the sample mean, which is now $\left(x+\sum X_i\right) /(n+1)$. It is evident that as $x$ gets large, the sample mean of all $n+1$ observations increases. The influence curve plots $x$ versus
$$\frac{1}{n+1}\left(x+\sum X_i\right)$$
the idea being to illustrate how a single value can influence the value of the sample mean. Note that for the sample mean, the graph is a straight line with slope $1 /(n+1)$, the point being that the curve increases without bound. Of course, as $n$ gets large, the slope decreases, but in practice there might be two or more unusual values that dominate the value of $\bar{X}$.

Now consider the usual sample median, $M$. Let $X_{(1)} \leq \cdots \leq X_{(n)}$ be the observations written in ascending order. If $n$ is odd, let $m=(n+1) / 2$, in which case $M=X_{(m)}$, the $m$th order statistic. If $n$ is even, let $m=n / 2$, in which case $M=\left(X_{(m)}+X_{(m+1)}\right) / 2$. To be more concrete, consider the values

$\begin{array}{llllllllll}2 & 4 & 6 & 7 & 8 & 10 & 14 & 19 & 21 & 28 .\end{array}$
Then $n=10$ and $M=(8+10) / 2=9$. Suppose an additional value, $x$, is added so that now $n=11$. If $x>10$, then $M=10$, regardless of how large $x$ might be. If $x<8$, $M=8$ regardless of how small $x$ might be. As $x$ increases from 8 to 10, $M$ increases from 8 to 10 as well. The main point is that in contrast to the sample mean, the median has a bounded influence curve. In general, if the goal is to minimize the influence of a relatively small number of observations on a measure of location, attention might be restricted to those measures having a bounded influence curve. A concern with the median, however, is that its standard error is large relative to the standard error of the mean when sampling from a normal distribution, so there is interest in searching for other measures of location having a bounded influence curve but that have reasonably small standard errors when distributions are normal.
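
The two influence curves can be traced numerically for the ten values above. This Python sketch is illustrative; the helper name `influence_curve` and the grid of added values are assumptions.

```python
from statistics import mean, median

values = [2, 4, 6, 7, 8, 10, 14, 19, 21, 28]


def influence_curve(statistic, data, xs):
    """Value of `statistic` after adding one extra observation x, for each x in xs."""
    return [(x, statistic(data + [x])) for x in xs]


grid = [-1000, 0, 8, 9, 10, 50, 1000]

for x, m in influence_curve(median, values, grid):
    print("median", x, m)    # stays between 8 and 10 for every x

for x, m in influence_curve(mean, values, grid):
    print("mean", x, m)      # grows linearly in x with slope 1/(n+1) = 1/11
```

The median of the eleven values never leaves the interval from 8 to 10, whereas the mean follows the line $(x+119)/11$ and is unbounded in both directions.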

## The Central Limit Theorem

When working with means or least squares regression, certainly the best-known method for dealing with nonnormality is to appeal to the central limit theorem. Put simply, under random sampling, if the sample size is sufficiently large, the distribution of the sample mean is approximately normal under fairly weak assumptions. A practical concern is the description "sufficiently large." Just how large must $n$ be to justify the assumption that $\bar{X}$ has a normal distribution? Early studies suggested that $n=40$ is more than sufficient, and there was a time when even $n=25$ seemed to suffice. These claims were not based on wild speculations, but more recent studies have found that these early investigations overlooked two crucial aspects of the problem.

The first is that early studies looking into how quickly the sampling distribution of $\bar{X}$ approaches a normal distribution focused on very light-tailed distributions, where the expected proportion of outliers is relatively low. In particular, a popular way of illustrating the central limit theorem was to consider the distribution of $\bar{X}$ when sampling from a uniform or exponential distribution. These distributions look nothing like a normal curve, yet the distribution of $\bar{X}$ based on $n=40$ is approximately normal, so a natural speculation is that this will continue to be the case when sampling from other nonnormal distributions. But more recently it has become clear that as we move toward more heavy-tailed distributions, a larger sample size is required.

The second aspect being overlooked is that when making inferences based on Student’s $t$, the distribution of $t$ can be influenced more by nonnormality than the distribution of $\bar{X}$. Even when sampling from a relatively light-tailed distribution, practical problems arise when using Student’s $t$, as will be illustrated in Section 4.1. When sampling from heavy-tailed distributions, even $n=300$ might not suffice when computing a .95 confidence interval.

## Problems with Assuming Normality

To begin, distributions are never normal. For some this seems obvious, hardly worth mentioning. But an aphorism given by Cramér (1946) and attributed to the mathematician Poincaré remains relevant: “Everyone believes in the [normal] law of errors, the experimenters because they think it is a mathematical theorem, the mathematicians because they think it is an experimental fact.” Granted, the normal distribution is the most important distribution in all of statistics. But in terms of approximating the distribution of any continuous distribution, it can fail to the point that practical problems arise, as will become evident at numerous points in this book. To believe in the normal distribution implies that only two numbers are required to tell us everything about the probabilities associated with a random variable: the population mean $\mu$ and population variance $\sigma^2$. Moreover, assuming normality implies that distributions must be symmetric.

Of course, nonnormality is not, by itself, a disaster. Perhaps a normal distribution provides a good approximation of most distributions that arise in practice, and of course there is the central limit theorem, which tells us that under random sampling, as the sample size gets large, the limiting distribution of the sample mean is normal. Unfortunately, even when a normal distribution provides a good approximation to the actual distribution being studied (as measured by the Kolmogorov distance function, described later), practical problems arise. Also, empirical investigations indicate that departures from normality that have practical importance are rather common in applied work (e.g., M. Hill and Dixon, 1982; Micceri, 1989; Wilcox, 1990a). Even over a century ago, Karl Pearson and other researchers were concerned about the assumption that observations follow a normal distribution (e.g., Hand, 1998, p. 649). In particular, distributions can be highly skewed, they can have heavy tails (tails that are thicker than a normal distribution), and random samples often have outliers (unusually large or small values among a sample of observations). Outliers and heavy-tailed distributions are a serious practical problem because they inflate the standard error of the sample mean, so power can be relatively low when comparing groups. Modern robust methods provide an effective way of dealing with this problem. Fisher (1922), for example, was aware that the sample mean could be inefficient under slight departures from normality.

A classic way of illustrating the effects of slight departures from normality is with the contaminated, or mixed, normal distribution (Tukey, 1960). Let $X$ be a standard normal random variable having distribution $\Phi(x)=P(X \leq x)$. Then for any constant $K>0$, $\Phi(x / K)$ is a normal distribution with standard deviation $K$. Let $\epsilon$ be any constant, $0 \leq \epsilon \leq 1$. The contaminated normal distribution is
$$H(x)=(1-\epsilon) \Phi(x)+\epsilon \Phi(x / K),$$
which has mean 0 and variance $1-\epsilon+\epsilon K^2$. (Stigler, 1973, finds that the use of the contaminated normal dates back at least to Newcomb, 1896.) In other words, the contaminated normal arises by sampling from a standard normal distribution with probability $1-\epsilon$; otherwise sampling is from a normal distribution with mean 0 and standard deviation $K$.
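
The contaminated normal is straightforward to evaluate and to sample from. The Python sketch below follows the definition of $H(x)$ and the two-stage sampling description above. The function names are hypothetical, and the default values $\epsilon=.1$ and $K=10$ are chosen only for illustration.

```python
import random
from math import erf, sqrt


def std_normal_cdf(x):
    """Phi(x), the standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def contaminated_normal_cdf(x, eps=0.1, K=10.0):
    """H(x) = (1 - eps) * Phi(x) + eps * Phi(x / K)."""
    return (1 - eps) * std_normal_cdf(x) + eps * std_normal_cdf(x / K)


def contaminated_normal_variance(eps=0.1, K=10.0):
    """Variance 1 - eps + eps * K^2 of the contaminated normal."""
    return 1 - eps + eps * K**2


def sample_contaminated_normal(n, eps=0.1, K=10.0, seed=None):
    """With probability 1 - eps draw from N(0, 1); otherwise from N(0, K^2)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, K if rng.random() < eps else 1.0) for _ in range(n)]


print(contaminated_normal_variance())   # about 10.9, versus 1 for the standard normal
print(contaminated_normal_cdf(1.0), std_normal_cdf(1.0))
```

With these illustrative values only 10% of the observations come from the wide component, yet the variance is roughly 10.9 rather than 1, while $H(x)$ and $\Phi(x)$ remain close at every $x$. This is the sense in which a slight departure from normality can have a large effect on the variance.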

## Transformations

Transforming data has practical value in a variety of situations. Emerson and Stoto (1983) provide a fairly elementary discussion of the various reasons one might transform data and how it can be done. The only important point here is that simple transformations can fail to deal effectively with outliers and heavy-tailed distributions. For example, the popular strategy of taking logarithms of all the observations does not necessarily reduce problems due to outliers, and the same is true when using Box-Cox transformations instead (e.g., Rasmussen, 1989; Doksum and Wong, 1983). Other concerns were expressed by G. L. Thompson and Amman (1990). Better strategies are described in subsequent chapters.

Perhaps it should be noted that when using simple transformations on skewed data, if inferences are based on the mean of the transformed data, then attempts at making inferences about the mean of the original data, $\mu$, have been abandoned. That is, if the mean of the transformed data is computed and we transform back to the original data, in general we do not get an estimate of $\mu$.
