## Parameter estimation

Quite often, we are interested in finding a single estimate of the value of an unknown parameter, even if this means discarding all uncertainty. This is called estimation: determining the values of some unknown variables from observed data. In this chapter, we outline the problem and describe some of the main approaches to it, including Maximum A Posteriori (MAP) and Maximum Likelihood (ML) estimation. Estimation is the most common form of learning: given some data from the world, we wish to “learn” how the world behaves, which we will describe in terms of a set of unknown variables.

Strictly speaking, parameter estimation is not justified by Bayesian probability theory, and can lead to a number of problems, such as overfitting and nonsensical results in extreme cases. Nonetheless, it is widely used in many problems.

## MAP, ML, and Bayes’ Estimates

We can now define the MAP learning rule: choose the parameter value $\theta$ that maximizes the posterior, i.e.,
$$
\begin{aligned}
\hat{\theta} &= \arg\max_{\theta} p(\theta \mid \mathcal{D}) \\
&= \arg\max_{\theta} P(\mathcal{D} \mid \theta)\, p(\theta)
\end{aligned}
$$
Note that we don’t need to be able to evaluate the evidence term $p(\mathcal{D})$ for MAP learning, since there are no $\theta$ terms in it.
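
A minimal sketch of this point, assuming a coin-flip (Bernoulli) likelihood and a Beta(2, 2) prior, neither of which comes from the text. The data, the grid, and the helper names `unnormalized_posterior` and `argmax` are hypothetical. It searches a grid of $\theta$ values and checks that dividing by an approximation of the evidence does not move the maximizer.

```python
def unnormalized_posterior(theta, heads, tails, a=2.0, b=2.0):
    """Likelihood times prior for a coin with bias theta, up to a constant.

    The Bernoulli likelihood and Beta(a, b) prior are assumptions made for
    illustration; all normalizing constants are deliberately left out.
    """
    likelihood = theta ** heads * (1.0 - theta) ** tails
    prior = theta ** (a - 1.0) * (1.0 - theta) ** (b - 1.0)
    return likelihood * prior


def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical data: 7 heads and 3 tails.
heads, tails = 7, 3
step = 0.001
grid = [i * step for i in range(1, 1000)]  # candidate thetas in (0, 1)

scores = [unnormalized_posterior(t, heads, tails) for t in grid]

# Approximate the evidence with a Riemann sum and normalize the scores.
evidence = sum(scores) * step
posterior = [s / evidence for s in scores]

theta_map_unnormalized = grid[argmax(scores)]
theta_map_normalized = grid[argmax(posterior)]
```

Both searches return the same grid point, close to $8/12$, which is the mode of the Beta(9, 5) posterior that this particular model implies. Dividing every score by the same positive number rescales the curve but cannot change where its peak lies.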

Very often, we make no prior assumptions about the value of $\theta$, which we express as a uniform prior: $p(\theta)$ is a uniform distribution over some suitably large range. In this case, $p(\theta)$ is constant over that range, so it can also be dropped from the MAP objective, and we are left with maximizing the likelihood alone. Hence, the Maximum Likelihood (ML) learning principle (i.e., estimator) is
$$\hat{\theta}_{\text{ML}}=\arg\max_{\theta} P(\mathcal{D} \mid \theta)$$
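
As a rough illustration of the ML estimator, the sketch below again assumes a coin-flip (Bernoulli) model, which is not part of the text. The function names `likelihood` and `ml_estimate` and the head and tail counts are hypothetical. It finds the maximizer by a simple grid search.

```python
def likelihood(theta, heads, tails):
    """P(D | theta) for independent coin flips (a Bernoulli model, assumed here)."""
    return theta ** heads * (1.0 - theta) ** tails


def ml_estimate(heads, tails, steps=1000):
    """Grid search for the theta in [0, 1] that maximizes the likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: likelihood(t, heads, tails))


# Hypothetical data: 7 heads, 3 tails. The maximizer is the observed frequency.
theta_ml = ml_estimate(heads=7, tails=3)

# Hypothetical extreme case: 3 heads, 0 tails. The maximizer sits at the boundary.
theta_extreme = ml_estimate(heads=3, tails=0)
```

For 7 heads and 3 tails the search lands on the observed frequency 0.7. With three heads and no tails it returns 1.0, so the fitted model treats tails as impossible after only three flips. This is one small instance of the extreme-case behavior mentioned earlier.
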
It often turns out to be more convenient to minimize the negative log of the objective function. Because $-\ln$ is a monotonically decreasing function, maximizing the objective is equivalent to minimizing its negative log, so we can pose MAP estimation as:
$$
\begin{aligned}
\hat{\theta}_{\text{MAP}} &= \arg\max_{\theta} P(\mathcal{D} \mid \theta)\, p(\theta) \\
&= \arg\min_{\theta} -\ln\bigl(P(\mathcal{D} \mid \theta)\, p(\theta)\bigr) \\
&= \arg\min_{\theta} -\ln P(\mathcal{D} \mid \theta) - \ln p(\theta)
\end{aligned}
$$
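
The equivalence can be checked numerically. The sketch below assumes a coin-flip (Bernoulli) likelihood with a Beta(2, 2) prior, a choice made here for illustration and not taken from the text. The function names and data are hypothetical. It also hints at one practical reason the negative-log form is convenient, which the text does not state: for large data sets the raw product of probabilities underflows in floating point, while the sum of logs does not.

```python
import math


def neg_log_likelihood(theta, heads, tails):
    """-ln P(D | theta) for independent coin flips (Bernoulli model, assumed)."""
    return -(heads * math.log(theta) + tails * math.log(1.0 - theta))


def neg_log_prior(theta, a=2.0, b=2.0):
    """-ln p(theta) for a Beta(a, b) prior, dropping the constant normalizer."""
    return -((a - 1.0) * math.log(theta) + (b - 1.0) * math.log(1.0 - theta))


def map_objective(theta, heads, tails):
    """The sum minimized in the last line of the derivation."""
    return neg_log_likelihood(theta, heads, tails) + neg_log_prior(theta)


grid = [i / 1000 for i in range(1, 1000)]  # open interval, so log() is defined

# Hypothetical small data set (7 heads, 3 tails): the argmin of the negative
# log agrees with the argmax of likelihood * prior = theta^8 * (1 - theta)^4.
theta_min = min(grid, key=lambda t: map_objective(t, 7, 3))
theta_max = max(grid, key=lambda t: t ** 8 * (1.0 - t) ** 4)

# Hypothetical large data set (7000 heads, 3000 tails): the raw likelihood
# underflows to 0.0 even near its peak, but the log form is still usable.
raw_product = 0.7 ** 7000 * 0.3 ** 3000
theta_big = min(grid, key=lambda t: map_objective(t, 7000, 3000))
```

Here `theta_min` and `theta_max` are the same grid point, as the derivation predicts. In the large-data case `raw_product` evaluates to `0.0`, so a search over the product would have nothing to compare, whereas minimizing the negative log still finds an estimate near 0.7.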

