# CIS654: A Reality Check Approach

## A Reality Check Approach

To avoid data-snooping problems, one can use the reality check of White (2000), together with the modification for nested models proposed in Clark and McCracken (2012a, 2012b).

For a given loss function, the reality check tests the null hypothesis that a benchmark model (i.e., model 0) performs as well as or better than all competitor models (i.e., models $1, \ldots, k$). The alternative is that at least one competitor performs better than the benchmark. Formally, we have
$$H_0: \max_{j=1, \ldots, k} \theta_j \leqslant 0 \quad \text{vs} \quad H_1: \max_{j=1, \ldots, k} \theta_j>0 .$$
Following common practice in selecting the best predictive model, the sample of size $N$ is split into $N=R+P$ observations, where $R$ observations are used for estimation and $P$ observations are used for predictive evaluation. Let $\hat{u}_{j, i}=Y_i-f\left(\mathbf{x}_i^j, \hat{w}_R^j\right)$, $i=R+1, \ldots, N$, where $f\left(\mathbf{x}_i^j, \hat{w}_R^j\right)$ is model $j$ estimated on the data set $\left\{\left(Y_i, \mathbf{X}_i^j\right), i=1, \ldots, R\right\}$. Following White (2000), define the statistic
$$S_P=\max_{j=1, \ldots, k} S_P(0, j).$$
Let the matrix $V$ be defined as
$$V=\lim_{N \rightarrow \infty} \operatorname{var}\left(\frac{1}{\sqrt{P}} \sum_{i=R+1}^N \mathbf{v}_i\right),$$
where the generic element of the vector $\mathbf{v}_i$ is defined as $v_{i, j}=h\left(u_{0, i}\right)-h\left(u_{j, i}\right)$. The matrix $V$ is assumed to be positive semi-definite.

## Numerical Examples Using the Reality Check

To evaluate the ability of the procedure to select a proper model for a given data-generating process, we use simulated data sets with known structure. The first is a linear model (M1) with two regressors, defined as:
$$
Y=\mathbf{X} \mathbf{1}+\varepsilon
$$
where $\mathbf{X}=\left(X_1, X_2\right)^T$ is drawn from the uniform distribution, $\varepsilon$ is standard Gaussian, and $\mathbf{1}$ denotes a column vector of ones of appropriate length. This model can be represented exactly by a network with a skip layer, two input units, and zero hidden units. Model M2 is the same model used in Tibshirani (1996), and model M3 is the same model used in De Veaux et al. (1998). Both models have already been used in previous sections. We consider $N=600$, $R=400$, $P=200$, and $B=4999$.

Table 1.2 reports the values of the test statistics for different numbers of input neurons, from $X_1$ to $X_6$, and different hidden layer sizes, from 1 to 6. For models M1 and M2, the proposed procedure is able to identify the correct data-generating process. For M1, the p-values of the tests are all $>0.50$, so the null hypothesis that the benchmark (i.e., the linear model) predicts at least as well as neural networks of all orders and sizes is not rejected. For M2, the values of the test statistics do not change significantly starting from a neural network model with 4 inputs and 2 hidden layer neurons. For M3, the test statistics clearly stabilize starting from a model with 3 inputs (as expected) and 4 hidden layer neurons. The small increases in some test statistics are possibly not significant.
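As a rough illustration of the procedure described in this section, the following Python sketch computes a reality-check statistic and a bootstrap p-value for data simulated from M1. Several things here are assumptions that the text does not state: $S_P(0, j)$ is taken to be $P^{-1/2} \sum_{i=R+1}^N\left[h\left(u_{0, i}\right)-h\left(u_{j, i}\right)\right]$ with squared-error loss $h$, $B$ is read as the number of bootstrap replications, the regressors are drawn from $U(0,1)$, resampling is i.i.d. rather than a dependent-data bootstrap, polynomial regressions stand in for the neural-network competitors, and the nested-model modification of Clark and McCracken is not applied. The function names `reality_check`, `simulate_m1`, and `ols_losses` are hypothetical.

```python
import numpy as np


def reality_check(loss_benchmark, loss_competitors, n_boot=999, seed=0):
    """Bootstrap reality check on out-of-sample losses.

    loss_benchmark:   shape (P,)   losses h(u_{0,i}) of the benchmark
    loss_competitors: shape (P, k) losses h(u_{j,i}) of the k competitors
    Returns (S_P, p_value).
    """
    rng = np.random.default_rng(seed)
    # Loss differentials v_{i,j} = h(u_{0,i}) - h(u_{j,i}), shape (P, k).
    v = loss_benchmark[:, None] - loss_competitors
    P = v.shape[0]
    v_bar = v.mean(axis=0)
    # Assumed form: S_P(0, j) = sqrt(P) * mean_i v_{i,j}; S_P is the max over j.
    s_p = np.sqrt(P) * v_bar.max()
    # Assumption: i.i.d. resampling of evaluation points, which suits the
    # independent simulated data used here but not dependent data.
    s_boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, P, size=P)
        # Recentre each bootstrap mean at the sample mean before taking the max.
        s_boot[b] = np.sqrt(P) * (v[idx].mean(axis=0) - v_bar).max()
    # Share of bootstrap statistics at least as large as S_P (with +1 correction).
    p_value = (1 + np.sum(s_boot >= s_p)) / (n_boot + 1)
    return s_p, p_value


def simulate_m1(n, rng):
    """Draw n observations from M1: Y = X 1 + eps, with X ~ U(0,1) (assumed)."""
    X = rng.uniform(size=(n, 2))
    y = X.sum(axis=1) + rng.standard_normal(n)
    return X, y


def ols_losses(features, y, R):
    """Fit OLS with intercept on the first R rows; return squared errors on the rest."""
    A = np.column_stack([np.ones(len(y)), features])
    w, *_ = np.linalg.lstsq(A[:R], y[:R], rcond=None)
    return (y[R:] - A[R:] @ w) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    N, R, B = 600, 400, 4999  # so P = N - R = 200
    X, y = simulate_m1(N, rng)
    x1, x2 = X[:, 0], X[:, 1]
    benchmark = ols_losses(X, y, R)  # linear benchmark (model 0)
    # Stand-in competitors: richer polynomial regressions, not neural networks.
    competitors = np.column_stack([
        ols_losses(np.column_stack([X, X**2]), y, R),
        ols_losses(np.column_stack([X, x1 * x2]), y, R),
        ols_losses(np.column_stack([X, X**2, x1 * x2]), y, R),
    ])
    s_p, p = reality_check(benchmark, competitors, n_boot=B)
    print(f"S_P = {s_p:.3f}, bootstrap p-value = {p:.3f}")
```

A large p-value means the data give no evidence that any competitor beats the linear benchmark. Taking the maximum over competitors before comparing with the bootstrap distribution is what guards against data snooping: the p-value accounts for the fact that the best of $k$ models was picked after looking at all of them.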

