## Gaussian prior

Looking at Figure 3.7(a-b), it seems clear that life-spans and movie run-times can be well modeled by a Gaussian, $\mathcal{N}\left(T \mid \mu, \sigma^2\right)$. Unfortunately, we cannot compute the posterior median in closed form if we use a Gaussian prior, but we can still evaluate it numerically by solving a one-dimensional integration problem. The resulting plot of $\hat{T}(t)$ vs $t$ is shown in Figure 3.8 (bottom left). For values of $t$ much less than the prior mean, $\mu$, the predicted value of $T$ is about equal to $\mu$, so the left part of the curve is flat. For values of $t$ much greater than $\mu$, the predicted value converges to a line slightly above the diagonal, i.e., $\hat{T}(t)=t+\epsilon$ for some small (and decreasing) $\epsilon>0$.

To see why this behavior makes intuitive sense, consider encountering a man at age 18, 39, or 51: in all cases, a reasonable prediction is that he will live to about $\mu=75$ years. But now imagine meeting a man at age 80: we probably would not expect him to live much longer, so we predict $\hat{T}(80) \approx 80+\epsilon$.
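The numerical evaluation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method used to produce Figure 3.8: it assumes the likelihood contributes a factor $1/T$ for $T \geq t$ (consistent with the $T^{-(\gamma+1)}$ integrand in the power-law derivation), so that $p(T \mid t) \propto \frac{1}{T}\, \mathcal{N}(T \mid \mu, \sigma^2)$ on $T \geq t$. The values $\mu=75$ and $\sigma=16$, the function name, and the grid settings are all hypothetical choices made for illustration.

```python
import math


def posterior_median_gaussian(t, mu=75.0, sigma=16.0, n=20000):
    """Grid-based median of p(T | t) ∝ (1/T) * N(T | mu, sigma^2) for T >= t.

    Assumptions (not from the source): the 1/T likelihood factor, the
    default mu and sigma, and truncating the tail 10 sigma above max(t, mu).
    Requires t > 0.
    """
    upper = max(t, mu) + 10.0 * sigma
    step = (upper - t) / n
    grid = [t + i * step for i in range(n + 1)]
    # Unnormalized posterior density on the grid.
    dens = [math.exp(-0.5 * ((T - mu) / sigma) ** 2) / T for T in grid]
    # Cumulative integral via the trapezoid rule.
    cdf = [0.0]
    for i in range(1, n + 1):
        cdf.append(cdf[-1] + 0.5 * (dens[i - 1] + dens[i]) * step)
    half = 0.5 * cdf[-1]
    # Find the first grid cell where the cumulative mass crosses one half,
    # then interpolate linearly inside that cell.
    for i in range(1, n + 1):
        if cdf[i] >= half:
            frac = (half - cdf[i - 1]) / (cdf[i] - cdf[i - 1])
            return grid[i - 1] + frac * step
    return upper


# Young ages give nearly the same prediction; old ages give t plus a small gap.
predictions = {t: posterior_median_gaussian(t) for t in (18, 39, 51, 80, 120)}
```

Under these assumptions the predictions for $t=18$ and $t=39$ should sit close together, somewhat below $\mu$ because the $1/T$ factor tilts the posterior toward smaller $T$, while the gap $\hat{T}(t)-t$ shrinks as $t$ grows past $\mu$.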

## Power-law prior

Looking at Figure 3.7(c-d), it seems clear that movie grosses and poem lengths can be modeled by a power-law distribution of the form $p(T) \propto T^{-\gamma}$ for $\gamma>0$. (If $\gamma>1$, this is called a Pareto distribution.) Power laws are characterized by very long tails. This captures the fact that most movies make very little money, but a few blockbusters make a lot. The number of lines in various poems also has this shape, since there are a few epic poems, such as Homer’s Odyssey, but most are short, like haikus. Wealth has a similarly skewed distribution in many countries, especially in plutocracies such as the USA (see, e.g., inequality.org).

In the case of a power-law prior, $p(T) \propto T^{-\gamma}$, we can compute the posterior median analytically. We have

$$p(t) \propto \int_t^{\infty} T^{-(\gamma+1)} \, dT=-\left.\frac{1}{\gamma} T^{-\gamma}\right|_t^{\infty}=\frac{1}{\gamma} t^{-\gamma}$$

Hence the posterior becomes

$$p(T \mid t)=\frac{T^{-(\gamma+1)}}{\frac{1}{\gamma} t^{-\gamma}}=\frac{\gamma t^\gamma}{T^{\gamma+1}}$$

for values of $T \geq t$. We can derive the posterior median as follows:

$$P\left(T>T_M \mid t\right)=\int_{T_M}^{\infty} \frac{\gamma t^\gamma}{T^{\gamma+1}} \, dT=-\left.\left(\frac{t}{T}\right)^\gamma\right|_{T_M}^{\infty}=\left(\frac{t}{T_M}\right)^\gamma$$

Solving for $T_M$ such that $P\left(T>T_M \mid t\right)=0.5$ gives $T_M=2^{1 / \gamma} t$.
This is plotted in Figure 3.8 (bottom middle). We see that the predicted duration is some constant multiple of the observed duration. For the particular value of $\gamma$ that best fits the empirical distribution of movie grosses, the optimal prediction is about $50 \%$ larger than the observed quantity. So if we observe that a movie has made $\$40\mathrm{M}$ to date, we predict that it will make $\$60\mathrm{M}$ in total.
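The closed-form rule $T_M = 2^{1/\gamma} t$ is easy to check numerically. The sketch below is illustrative only: the function names are hypothetical, and the exponent is not taken from the source but obtained by inverting $2^{1/\gamma} = 1.5$, which gives $\gamma = \ln 2 / \ln 1.5 \approx 1.71$ under the assumption that the multiplier is exactly 1.5.

```python
import math


def power_law_posterior_median(t, gamma):
    """Closed-form posterior median T_M = 2**(1/gamma) * t."""
    return 2.0 ** (1.0 / gamma) * t


def power_law_posterior_tail(T_M, t, gamma):
    """Posterior tail mass P(T > T_M | t) = (t / T_M)**gamma, for T_M >= t."""
    return (t / T_M) ** gamma


# Hypothetical exponent, chosen so that the multiplier 2**(1/gamma) is 1.5.
gamma_movies = math.log(2.0) / math.log(1.5)

# A movie that has grossed 40 (in $M) so far.
predicted_total = power_law_posterior_median(40.0, gamma_movies)

# By construction, half of the posterior mass should lie above the median.
tail_mass = power_law_posterior_tail(predicted_total, 40.0, gamma_movies)
```

Here `predicted_total` reproduces the \$60M figure from the text up to floating-point error, and `tail_mass` should come out at one half, confirming that $T_M$ is indeed the median of the posterior.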

As Griffiths and Tenenbaum point out, this rule is inappropriate for quantities that follow a Gaussian prior, such as people’s ages. As they write, “Upon meeting a 10-year-old girl and her 75-year-old grandfather, we would never predict that the girl will live a total of 15 years $(1.5 \times 10)$ and that the grandfather will live to be 112 $(1.5 \times 75)$.” This shows that people implicitly know what kind of prior to use when solving prediction problems of this kind.


