# COMP5318 Density Estimation

As we have seen in the discussion of the statistical data-modeling procedure, before we can apply the plug-in MAP decision rule, we must first estimate the unknown data distribution from a finite set of training samples that are presumed to be drawn from it. This is a standard problem in statistics, namely, density estimation. We normally take the so-called parametric approach to this problem: we first choose a parametric probabilistic model, and then estimate its parameters from the finite set of training samples. The advantage of this approach is that it converts the extremely challenging problem of density estimation into a relatively simple parameter-estimation problem. By estimating the parameters, we find the best fit to the unknown data distribution within the family of prespecified generative models. As with discriminative models, parameter estimation for generative models can be formulated as a standard optimization problem. The major difference is that generative models rely on different criteria to construct the objective function. In the following, we explore the most popular method for parametric density estimation, namely, maximum-likelihood estimation (MLE).

## Maximum-Likelihood Estimation

Assume that we are interested in estimating an unknown data distribution $p(\mathbf{x})$ based on some samples randomly drawn from this distribution; that is, $\mathscr{D}_N=\left\{\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_N\right\}$, where each sample $\mathbf{x}_i \sim p(\mathbf{x})$ $(\forall i=1,2, \cdots, N)$. An important assumption in density estimation is that these samples are independent and identically distributed (i.i.d.), which means that all of them are drawn from the same probability distribution and are mutually independent. As we will see later, the i.i.d. assumption significantly simplifies the parameter-estimation problem in density estimation. In a parametric density-estimation method, we first choose a probabilistic model, $\hat{p}_{\boldsymbol{\theta}}(\mathbf{x})$, to approximate the unknown distribution $p(\mathbf{x})$, where $\boldsymbol{\theta}$ denotes the parameters of the chosen model. The unknown model parameters $\boldsymbol{\theta}$ are then estimated from the collected training samples $\mathscr{D}_N$. The most popular method for this parameter-estimation problem is the so-called MLE. The basic idea of MLE is to estimate the unknown parameters $\boldsymbol{\theta}$ by maximizing the joint probability of observing all training samples in $\mathscr{D}_N$ under the presumed probabilistic model. That is,

$$
\begin{aligned}
\boldsymbol{\theta}_{\mathrm{MLE}} &= \arg\max_{\boldsymbol{\theta}} \hat{p}_{\boldsymbol{\theta}}\left(\mathscr{D}_N\right) \\
&= \arg\max_{\boldsymbol{\theta}} \hat{p}_{\boldsymbol{\theta}}\left(\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_N\right) \\
&= \arg\max_{\boldsymbol{\theta}} \prod_{i=1}^N \hat{p}_{\boldsymbol{\theta}}\left(\mathbf{x}_i\right).
\end{aligned}
$$
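
To make the MLE criterion concrete, the following is a minimal sketch under one added assumption: the chosen model is a univariate Gaussian with parameters $\boldsymbol{\theta} = (\mu, \sigma)$. The Gaussian is picked only for illustration and is not prescribed by the text. The code works with the logarithm of the product, which has the same maximizer because the logarithm is monotonic and which avoids numerical underflow for large $N$. The function names `gaussian_log_likelihood` and `gaussian_mle` are hypothetical.

```python
import math
import random


def gaussian_log_likelihood(samples, mu, sigma):
    """Log of the product of N(x_i | mu, sigma^2) over i.i.d. samples."""
    n = len(samples)
    sq_err = sum((x - mu) ** 2 for x in samples)
    return -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - sq_err / (2.0 * sigma ** 2)


def gaussian_mle(samples):
    """Closed-form maximizer of the Gaussian likelihood (assumed model)."""
    n = len(samples)
    mu_hat = sum(samples) / n
    # The MLE of the variance divides by N, not N - 1.
    var_hat = sum((x - mu_hat) ** 2 for x in samples) / n
    return mu_hat, math.sqrt(var_hat)


# Synthetic i.i.d. samples; the "true" parameters 2.0 and 1.5 are made up.
rng = random.Random(0)
samples = [rng.gauss(2.0, 1.5) for _ in range(1000)]

mu_hat, sigma_hat = gaussian_mle(samples)
best = gaussian_log_likelihood(samples, mu_hat, sigma_hat)

# Any nearby parameter setting should score a lower log-likelihood.
perturbations = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
nearby = [
    gaussian_log_likelihood(samples, mu_hat + d_mu, sigma_hat + d_sigma)
    for d_mu, d_sigma in perturbations
]

print("mu_hat =", mu_hat, "sigma_hat =", sigma_hat)
print("MLE is a local maximizer:", all(value < best for value in nearby))
```

For models without a closed-form solution, the same log-likelihood would typically be maximized with a numerical optimizer instead.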

