# HEIGHTS OF UNION SOLDIERS

Figure 2.9 shows the heights of more than 20,000 Union soldiers at the time of the Civil War. The area of the histogram covering heights from $65.0$ inches to just below $67.0$ inches has been shaded. In all, 5,578 soldiers fell in this height range, that is, $27.6\%$ of the soldiers in the sample, so the shaded area occupies $27.6\%$ of the total area of the histogram. The probability that a soldier chosen at random falls within this height interval is therefore $0.276$. This is another empirical distribution, but it conforms very closely to what a Gaussian distribution would lead us to expect.

The survey was organised by Benjamin Gould: astronomer, one of the founders of the US National Academy of Sciences in 1863 and, among his many other scientific roles, actuary to the US Sanitary Commission. In the face of often determined bureaucratic resistance, he collected and tabulated the vital statistics of troops in order to test 'those hygienic and physiological laws which are already known' and 'to discover and apply such other laws as might affect the welfare and success of our soldiers' (Comstock, 1922, p. 162). The data he compiled proved an immensely valuable resource for the study of health, demography and many other fields.

Soldiers' heights were measured to the nearest half inch. The mean height of the 20,207 soldiers Gould studied was 67 inches, with a standard deviation of $2.58$ inches. Because about $95\%$ of a Gaussian distribution lies within $1.96$ standard deviations of its mean, we would expect around $0.05 \times 20{,}207 \approx 1010$ soldiers to be either taller than $67 + 1.96 \times 2.58 \approx 72$ inches or shorter than $67 - 1.96 \times 2.58 \approx 62$ inches. Gould observed 1,055 such soldiers, remarkably close to what we would expect if height followed a perfectly Gaussian distribution.
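The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that uses only the summary figures quoted in the text (sample size, counts, mean and standard deviation) and assumes that height follows a Gaussian distribution with that mean and standard deviation. The helper `normal_cdf` is our own illustrative function, built on the standard library's `math.erf`.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of a Gaussian with mean mu and s.d. sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


N_SOLDIERS = 20_207     # sample size reported in the text
IN_BAND = 5_578         # soldiers from 65.0 to just under 67.0 inches
MEAN, SD = 67.0, 2.58   # summary statistics reported in the text
OBSERVED_TAILS = 1_055  # soldiers beyond mean +/- 1.96 s.d., as reported

# Empirical probability of a randomly chosen soldier landing in the shaded band.
p_band = IN_BAND / N_SOLDIERS

# Cut-offs enclosing the central 95% of a Gaussian with this mean and s.d.
lower = MEAN - 1.96 * SD
upper = MEAN + 1.96 * SD

# Gaussian probability mass outside the cut-offs (assumes exact normality),
# and the number of soldiers that would imply.
p_tails = normal_cdf(lower, MEAN, SD) + (1.0 - normal_cdf(upper, MEAN, SD))
expected_tails = p_tails * N_SOLDIERS

print(f"P(65.0 <= height < 67.0) = {p_band:.3f}")
print(f"cut-offs: {lower:.2f} and {upper:.2f} inches")
print(f"expected in tails: {expected_tails:.0f}, observed: {OBSERVED_TAILS}")
```

Running it prints a band probability of 0.276, cut-offs of about 61.94 and 72.06 inches, and an expected tail count of about 1010 against the 1,055 Gould observed. Using the exact Gaussian tail probability rather than the rounded $0.05$ makes essentially no difference here.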

## PROBABILITY DISTRIBUTIONS AND VARIABLES

By now, it should have become clear that probability distributions and variables are the same beast. Each variable describes a trial with a sample space, and the values of the variable correspond to the trial outcomes in that sample space. Each observation or case is one instance of the trial. This raises the question: 'How do we ensure that the trials are identical and independent?' If they were not, there would be no point in repeating them, because we would have no means of comparing them or of combining the results obtained from more than one trial. It would be rather like trying to add apples and bicycles together.

If I flip a coin a few times, I can reasonably claim to be repeating an identical and independent trial. But in what way is asking a collection of different people a survey question the repetition of an identical and independent trial? I must be able to claim that the respondents share something in common that gives them the same status as the coin I flipped, and that it is this shared property that I can make probability statements about, just as I could about the coin. What they have in common is membership of a target population. Just as my probability statements about coin flips refer to the coin, and not to any individual outcome of a flip, so my statements about survey respondents, whether expressed as proportions or probabilities, refer to them as a group and not to individual members of that group. The target population can be $48\%$ male, and my probability of picking a man at random from it can be $0.48$, but there are no individuals in it who are $48\%$ male, who agree $67\%$ with a statement, or who possess precisely the mean age, income or height of the target population. Nor are there individual soldiers who have $2.5\%$ of their height above 6 feet, or babies whose individual weight has any standard deviation.
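The point that a proportion describes the group rather than any of its members can be illustrated with a small simulation. The sketch below assumes a hypothetical target population that is 48% male, as in the example in the text, and treats each respondent as an independent draw from it; `draw_sample` is an illustrative helper, not part of any library.

```python
import random


def draw_sample(p_male: float, n: int, seed: int = 1) -> list[int]:
    """Hypothetical helper: simulate n independent, identical draws from a
    population in which a proportion p_male are men.
    Each draw records 1 (a man) or 0 (not a man)."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [1 if rng.random() < p_male else 0 for _ in range(n)]


sample = draw_sample(p_male=0.48, n=10_000)

# Every individual outcome is all-or-nothing: no draw is "48% male".
print(sorted(set(sample)))

# The proportion describes the collection of draws, not any single draw,
# and with many draws it settles close to 0.48.
print(sum(sample) / len(sample))
```

Each element of `sample` is 0 or 1, while the printed proportion lands near $0.48$; the probability is a property of the population being sampled, just as a coin's probability of heads is a property of the coin rather than of any single flip.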

