Introduction to Econometrics | Hypothesis testing

To test a set of linear restrictions on the model coefficients, the most common approach is to use a Wald test. We write the set of $q$ linearly independent restrictions as
$$H_0: R \beta=r$$
with the alternative that at least one restriction is violated. Here, $R$ denotes a $q \times K$ matrix of constants, and $r$ a $q$-dimensional vector of constants. For example, if we wish to test that $\beta_2=0$ and $\beta_3=0$, we have
$$R=\left(\begin{array}{llll} 0 & 1 & 0 & \ldots \\ 0 & 0 & 1 & \ldots \end{array}\right)$$
and $r=(0,0)^{\prime}$. The Wald test statistic is given by a quadratic form in $R \hat{\beta}-r$, weighted by the inverse of the corresponding estimated covariance matrix. That is,
$$\xi_W=(R \hat{\beta}-r)^{\prime}\left[R \hat{V}(\hat{\beta}) R^{\prime}\right]^{-1}(R \hat{\beta}-r)$$
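The quadratic form above can be computed directly. The following is a minimal sketch, not a definitive implementation: the data are simulated, the function name `wald_statistic` is hypothetical, and $\hat{V}(\hat{\beta})$ is taken to be the usual homoskedastic OLS covariance estimate $s^2(X^{\prime}X)^{-1}$, which is an assumption made only for this illustration. It relies on the standard result that, under the null hypothesis and suitable regularity conditions, $\xi_W$ is asymptotically chi-squared distributed with $q$ degrees of freedom.

```python
import math
import numpy as np


def wald_statistic(beta_hat, V_hat, R, r):
    """Return (R b - r)' [R V R']^{-1} (R b - r)."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    V_hat = np.asarray(V_hat, dtype=float)
    R = np.atleast_2d(np.asarray(R, dtype=float))
    r = np.asarray(r, dtype=float)
    d = R @ beta_hat - r              # deviation from the restrictions
    middle = R @ V_hat @ R.T          # estimated covariance of R b
    # Solve a linear system rather than forming the inverse explicitly.
    return float(d @ np.linalg.solve(middle, d))


# Simulated data (hypothetical): K = 4 coefficients, beta_2 = beta_3 = 0 holds.
rng = np.random.default_rng(0)
n, K = 500, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta_true = np.array([1.0, 0.0, 0.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

# OLS estimates and the homoskedastic covariance estimate (an assumption).
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
s2 = resid @ resid / (n - K)
V_hat = s2 * XtX_inv

# H0: beta_2 = 0 and beta_3 = 0, so q = 2.
R = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
r = np.zeros(2)

xi_w = wald_statistic(beta_hat, V_hat, R, r)
# For q = 2 the chi-squared upper tail has the closed form exp(-x / 2).
p_value = math.exp(-xi_w / 2.0)
print(f"Wald statistic: {xi_w:.3f}, asymptotic p-value: {p_value:.3f}")
```

For a general $q$ the closed-form tail no longer applies, and a chi-squared distribution function from a statistics library would be used instead. A heteroskedasticity-robust estimate could be passed as `V_hat` without changing the function.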

Introduction to Econometrics | p-values and p-hacking

Most modern software reports a $p$-value with every test it performs. A $p$-value is the probability, under the null hypothesis, of obtaining the reported value of the test statistic or a more extreme one. If the $p$-value is smaller than the significance level (e.g., $5\%$), the null hypothesis is rejected. Because $p$-values let researchers draw conclusions without consulting the appropriate critical values, they are a convenient piece of information. They also show how sensitive the decision to reject the null hypothesis is to the choice of significance level. However, $p$-values are often misinterpreted or misused, as stressed by a statement of the American Statistical Association (Wasserstein and Lazar, 2016). For example, it is inappropriate (though a common mistake) to interpret a $p$-value as the probability that the null hypothesis is true.
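As a small illustration of the definition, the sketch below computes a two-sided $p$-value for a test statistic that is assumed to be standard normal under the null hypothesis, and compares it with several significance levels. The function name `two_sided_p_value` and the value 1.96 are hypothetical choices for the example.

```python
import math


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))


z_stat = 1.96  # hypothetical value of the test statistic
p = two_sided_p_value(z_stat)

# The same p-value leads to different decisions at different significance levels.
for alpha in (0.10, 0.05, 0.01):
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"alpha = {alpha:.2f}: p = {p:.4f} -> {decision}")
```

Here the null hypothesis is rejected at the 10% and 5% levels but not at the 1% level, which is the sensitivity to the significance level that the $p$-value makes visible.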

Unfortunately, in empirical work some researchers (and journal editors) are overly preoccupied with obtaining “significant” results and finding $p$-values smaller than $0.05$. If publication decisions depend on the statistical significance of research findings, the literature as a whole will overstate the size of the true effect. This is referred to as publication bias (or “file drawer” bias). For example, investigating more than 50,000 tests published in three leading economics journals, Brodeur et al. (2016) conclude that the distribution of $p$-values indicates both selection by journals and a tendency of researchers to inflate the value of almost-rejected tests by choosing slightly more “significant” specifications. Their analysis is extended in Brodeur et al. (2020), with a focus on inference methods used in causal analysis.

The problem of publication bias relates to the broader problem of $p$-hacking. Even if the null hypothesis is correct, there is always a small probability of rejecting it (corresponding to the size of the test). Such type I errors are rather likely to occur if we use a sequence of many tests to select the regressors to include in the model. This process is referred to as data snooping, data mining or $p$-hacking (see Leamer, 1978; Lovell, 1983). As a result, an extensive specification search may pick up accidental patterns in the data and deliver a seemingly “significant” result with no genuine interpretation or meaning.

This is potentially a serious issue in empirical finance, where many scholars use the same databases (such as the Center for Research in Security Prices (CRSP) and Compustat). For example, Lo and MacKinlay (1990) analyse data snooping biases in tests of financial asset pricing models, while Sullivan et al. (2001) analyse the extent to which the presence of calendar effects in stock returns can be attributed to data snooping. Harvey et al. (2016) provide a critical account of the literature on factor models explaining the cross-section of asset returns. To account for the inherent data mining, they suggest that a new factor needs to clear a much higher hurdle, with a $t$-statistic greater than 3.0. However, as argued by Harvey (2017), simply raising the threshold for significance is insufficient, and may unintentionally increase the amount of data mining and, in turn, publication bias. More recently, Mitton (2021) documents large variation in empirical methodology in corporate finance regressions in top finance journals, which enables selective reporting that results from $p$-hacking and publication bias.
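The claim that type I errors become likely over a sequence of many tests can be made concrete with a short calculation. The sketch below assumes that the $m$ tests are independent and that every null hypothesis is true, so that each $p$-value is uniformly distributed on $[0, 1]$; both assumptions are simplifications for illustration, and the function names are hypothetical. It compares the analytical probability of at least one rejection, $1-(1-\alpha)^m$, with a Monte Carlo estimate.

```python
import random


def familywise_error_rate(alpha, m):
    """Probability of at least one false rejection in m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


def simulate_any_rejection(alpha, m, n_rep=20000, seed=0):
    """Monte Carlo estimate of the same probability.

    Under a true null hypothesis a p-value is uniform on [0, 1], so each
    replication draws m uniform p-values and checks whether the smallest
    one falls below alpha.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        smallest_p = min(rng.random() for _ in range(m))
        if smallest_p < alpha:
            hits += 1
    return hits / n_rep


alpha, m = 0.05, 20  # hypothetical: 20 candidate specifications, 5% level
analytic = familywise_error_rate(alpha, m)
simulated = simulate_any_rejection(alpha, m)
print(f"analytic: {analytic:.3f}, simulated: {simulated:.3f}")
```

With 20 independent tests at the 5% level, the chance of finding at least one "significant" result when nothing is there is roughly 0.64. Tests in an actual specification search are typically correlated, so this figure is only indicative.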

