# Statistical Significance as an Indicator of Research Quality

In recent years, a number of writers (e.g., Atkinson, Furlong, & Wampold, 1982; Greenwald, 1975; Sterling, 1959; Walster & Cleary, 1970) have drawn attention to the fact that we treat statistical significance as a desirable object in itself and as an indication of the quality of research. Sterling (1959) and Atkinson et al. (1982) have documented that journals overwhelmingly publish articles reporting statistically significant results. Atkinson et al. submitted a manuscript, less the discussion section, to 50 consulting editors, and found that the same piece of research was rather clearly accepted or rejected according to whether the results were significant or not. And certainly many graduate students can testify to having been sent back to collect more data for their dissertations or to revise their hypotheses when the results were not significant.

Logically, one would think that the merit and informativeness of a piece of research depend on such characteristics as the importance of the question that was asked and the care taken in the design and measures, rather than on whether the answer to the question turned out to be yes or no. The rationale for this curious value system was stated by no less portentous an authority than the APA Publication Manual:

> Negative results lacking a theoretical context are basically uninterpretable. Even when the theoretical basis for the prediction is clear and defensible, the burden of methodological precision falls heavily on the investigator who reports negative results. ... Failure to replicate results of a previous investigator, using the same method but a different sample, is generally of questionable value. A single failure may merely testify to sampling error or to the conclusion that one of the two samples had unique characteristics responsible for the reported effect, or the lack of effect. (American Psychological Association [APA], 1974, p. 21)

Atkinson et al. make the obvious but important point that sampling variability is an equally valid explanation for the original, significant result. This passage was deleted from the third edition of the Publication Manual, but the field as a whole has not responded so nimbly.
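The point made by Atkinson et al. can be illustrated with a small simulation. The sketch below is not taken from any of the cited papers; it assumes a two-sided z-test with known standard deviation, a true effect of exactly zero, and hypothetical study sizes. Under those assumptions, roughly a fraction `alpha` of original studies come out significant purely through sampling variability, and nearly all of their replications then "fail".

```python
import random
from statistics import NormalDist, mean


def z_test_p(sample, sigma=1.0):
    """Two-sided p-value for H0: population mean = 0, with known sigma."""
    z = mean(sample) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_studies=20000, n=30, alpha=0.05, seed=0):
    """Simulate studies of a null effect, replicating each significant one.

    Returns (share of original studies that were significant,
             share of those significant originals whose replication
             was also significant).
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    originals_sig = 0
    replications_sig = 0
    for _ in range(n_studies):
        # Original study: the true mean is 0, so any "effect" is sampling error.
        original = [rng.gauss(0, 1) for _ in range(n)]
        if z_test_p(original) < alpha:
            originals_sig += 1
            # Same method, different sample.
            replication = [rng.gauss(0, 1) for _ in range(n)]
            if z_test_p(replication) < alpha:
                replications_sig += 1
    return originals_sig / n_studies, replications_sig / max(originals_sig, 1)


if __name__ == "__main__":
    fp_rate, rep_rate = simulate()
    print(f"significant originals: {fp_rate:.3f}")
    print(f"replications that also reached significance: {rep_rate:.3f}")
```

In this setup the failed replication is the correct result and the original finding is the artifact, which is why sampling error cannot be invoked against the replication alone.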

## Epistemic Versus Behavioral Orientation

Statements like Guilford's and Kerlinger's, quoted above, indicate very clearly their belief (and the sense of the statements is surely close to most psychologists' understanding) that we need statistical inference fundamentally for epistemic purposes, for the evaluation of hypotheses. Hence, if either the Fisherian or the Neyman-Pearson rationale for significance testing were relevant to psychology, it would clearly be the former. And, indeed, one searches the literature in vain for any argument in favor of the Neyman-Pearson approach, as against the Fisherian, in psychological research. If such arguments were to be found, they would presumably be given in the statistics textbooks; but there, however careful the exposition of the Neyman-Pearson doctrine, the rationale presented seems inevitably to be Fisherian. Kempthorne (1972), in an essay in honor of George Snedecor (who was nearly 90 at the time), notes that the Neyman-Pearson theory is seldom practiced as it is preached, and that research workers do not in fact take the decision orientation seriously.

> From the viewpoint of the Neyman-Pearson theory of testing hypotheses - or as this author prefers, the Neyman-Pearson theory of accept-reject rules - an inspector is not permitted the following thought process. Suppose two particular data points $D_1$ and $D_2$ fall in the rejection region of size $\alpha=0.05$. Suppose also that $D_1$ falls in the region of size $\alpha=0.01$ and $D_2$ does not. Then it is very natural to take the view that $D_1$ disagrees with the null hypothesis more than $D_2$. But to use phraseology that is becoming current, this would be an evidential conclusion. It appears that no such conclusions are permitted in the Neyman-Pearson theory. Indeed, it can happen that a sample point is in the rejection region of size 0.01 and is not in the rejection region of size 0.05. It may be true that those who use the Neyman-Pearson theory will reach the evidential conclusion above, and indeed many of the ideas of the theory have been taken over and used in an evidential way. But nothing in the Neyman-Pearson theory permits this activity. (pp. 179-180)

> I hazard the opinion that Snedecor's Statistical Methods has had some appeal to scientists and has not been modified in basic outlook by the development of decision theory, because decision theory deals with problems that are so simple (e.g., how to approach the problem of making scrambled eggs) and so simplified as to have no essential relevance to the problems of research and development. (p. 182)
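Kempthorne's example of two data points can be made concrete with a minimal sketch. It assumes a two-sided z-test, for which the rejection regions happen to be nested, and two hypothetical observed statistics standing in for $D_1$ and $D_2$; none of these numbers come from Kempthorne. With $\alpha$ fixed in advance at 0.05, the accept-reject rule returns the same decision for both points. Ranking them by p-value is the evidential (Fisherian) step that, on Kempthorne's reading, the Neyman-Pearson theory does not license.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def in_rejection_region(z, alpha):
    """Accept-reject rule: True if a two-sided z-test of size alpha rejects."""
    critical = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return abs(z) >= critical


def p_value(z):
    """Two-sided p-value, the quantity used in an evidential reading."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


# Hypothetical observed z statistics standing in for D_1 and D_2.
d1, d2 = 2.8, 2.1

for name, z in (("D1", d1), ("D2", d2)):
    print(
        name,
        "reject at 0.05:", in_rejection_region(z, 0.05),
        "reject at 0.01:", in_rejection_region(z, 0.01),
        "p-value:", round(p_value(z), 4),
    )
```

Both points lead to rejection at the 0.05 level, and only the first does at 0.01. Concluding from this that the first point "disagrees with the null hypothesis more" is exactly the inference Kempthorne says researchers draw in practice.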
