# ENGG3300 Probability Density Functions (PDFs)

## Probability Density Functions (PDFs)

In many cases, we wish to handle data that can be represented as a real-valued random variable, or a real-valued vector $\mathbf{x}=\left[x_1, x_2, \ldots, x_n\right]^T$. Most of the intuitions from discrete variables transfer directly to the continuous case, although there are some subtleties.

We describe the probabilities of a real-valued scalar variable $x$ with a Probability Density Function (PDF), written $p(x)$. Any real-valued function $p(x)$ that satisfies:
$$\begin{aligned} p(x) &\geq 0 \quad \text{for all } x \\ \int_{-\infty}^{\infty} p(x)\, dx &= 1 \end{aligned}$$
is a valid PDF. I will use the convention of upper-case $P$ for discrete probabilities, and lower-case $p$ for PDFs.
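As a rough illustration of the two conditions, the sketch below checks them numerically for a candidate function. It is a minimal sketch, assuming pure Python and a finite interval `[lo, hi]` outside which the function is (effectively) zero. The helper name `is_valid_pdf`, the midpoint-rule integration and the tolerance are illustrative choices. A numerical check on a grid is evidence rather than a proof. The uniform example also shows that a valid PDF can take values greater than 1.

```python
import math

def is_valid_pdf(p, lo, hi, n=100_000, tol=1e-3):
    """Numerically check p(x) >= 0 and integral of p(x) dx == 1 on [lo, hi].

    Assumes p is (effectively) zero outside [lo, hi]. The integral is
    approximated with the midpoint rule on n equal sub-intervals.
    """
    h = (hi - lo) / n
    values = [p(lo + (k + 0.5) * h) for k in range(n)]
    nonnegative = all(v >= 0.0 for v in values)
    total = sum(values) * h
    return nonnegative and abs(total - 1.0) < tol

def uniform_half(x):
    """Uniform density on [0, 0.5]: its value is 2 everywhere on that interval."""
    return 2.0 if 0.0 <= x <= 0.5 else 0.0

def std_normal(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

print(is_valid_pdf(uniform_half, -1.0, 1.0))  # True, even though p(x) = 2 > 1
print(is_valid_pdf(std_normal, -10.0, 10.0))  # True
print(is_valid_pdf(lambda x: 1.0, 0.0, 2.0))  # False: it integrates to 2
```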

With the PDF we can specify the probability that the random variable $x$ falls within a given range:
$$P\left(x_0 \leq x \leq x_1\right)=\int_{x_0}^{x_1} p(x) d x$$
This can be visualized by plotting the curve $p(x)$. Then, to determine the probability that $x$ falls within a range, we compute the area under the curve for that range.
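The sketch below computes this area numerically. It assumes a standard Gaussian as the example density and a simple midpoint rule. The names `prob_in_range` and `normal_cdf` are hypothetical helpers. The closed-form Gaussian CDF via `math.erf` serves only as a cross-check.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Closed-form Gaussian CDF via the error function (used as a cross-check)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_in_range(p, x0, x1, n=10_000):
    """Approximate P(x0 <= x <= x1) as the area under p on [x0, x1] (midpoint rule)."""
    h = (x1 - x0) / n
    return sum(p(x0 + (k + 0.5) * h) for k in range(n)) * h

area = prob_in_range(normal_pdf, -1.0, 1.0)
exact = normal_cdf(1.0) - normal_cdf(-1.0)
print(f"area = {area:.4f}, closed form = {exact:.4f}")  # area = 0.6827, closed form = 0.6827
```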

The PDF can be thought of as the limit of a discrete distribution as the number of possible outcomes goes to infinity. Specifically, suppose we create a discrete distribution with $N$ possible outcomes, each corresponding to a small interval (bin) on the real number line. As we increase $N$ towards infinity, each bin shrinks toward a single real number, and the probability of each individual bin shrinks toward zero. What stays finite is the probability per unit length. At points where $p$ is continuous, the PDF is the limit of a bin's probability divided by its width:
$$p(x_0)=\lim_{\Delta \to 0} \frac{P\left(x_0 \leq x \leq x_0+\Delta\right)}{\Delta}$$
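The limiting argument can be checked numerically. For illustration only, the sketch below assumes an exponential distribution with rate 2, whose CDF $1 - e^{-2x}$ is known in closed form. It computes the probability of a single bin $[x_0, x_0 + \Delta)$ at $x_0 = 0.5$ for shrinking widths $\Delta$. The bin probability itself tends to zero. The ratio of bin probability to bin width approaches $p(0.5) = 2e^{-1} \approx 0.7358$. The names `RATE` and `bin_probability` are hypothetical.

```python
import math

RATE = 2.0  # illustrative choice: exponential distribution with rate 2

def cdf(x):
    """P(X <= x) for the exponential distribution."""
    return 1.0 - math.exp(-RATE * x) if x > 0 else 0.0

def pdf(x):
    """Density p(x) = RATE * exp(-RATE * x) for x >= 0."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

def bin_probability(x0, width):
    """Probability of the discrete outcome 'X falls in [x0, x0 + width)'."""
    return cdf(x0 + width) - cdf(x0)

x0 = 0.5
for width in (0.1, 0.01, 0.001, 0.0001):
    prob = bin_probability(x0, width)
    # The bin probability shrinks, but probability / width settles near p(x0).
    print(f"width={width:<7} P(bin)={prob:.6f}  P(bin)/width={prob / width:.6f}")
print(f"p({x0}) = {pdf(x0):.6f}")
```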

There is an important subtlety here: a probability density is not a probability per se. For one thing, there is no requirement that $p(x) \leq 1$. Moreover, the probability that $x$ attains any one specific value out of the infinite set of possible values is always zero. For example, $P(x=5)=\int_5^5 p(x)\, dx=0$ for any PDF $p(x)$. People (myself included) are sometimes sloppy in referring to $p(x)$ as a probability, but it is not one. Rather, it is a function that can be used to compute probabilities.
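Here is a small sketch that makes both points concrete. It assumes a narrow Gaussian with mean 5 and standard deviation 0.1, an illustrative choice. The density at the peak is about 3.99, well above 1. Yet the probability of an interval around 5 shrinks to exactly zero as the interval collapses to the single point $x = 5$. The interval probabilities use the closed-form Gaussian CDF via `math.erf`. The helper `prob_within` is hypothetical.

```python
import math

MU, SIGMA = 5.0, 0.1  # illustrative narrow Gaussian centered at 5

def pdf(x):
    """Gaussian density with mean MU and standard deviation SIGMA."""
    z = (x - MU) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2.0 * math.pi))

def prob_within(eps):
    """P(5 - eps <= x <= 5 + eps) for this Gaussian, via the error function."""
    return math.erf(eps / (SIGMA * math.sqrt(2.0)))

print(f"p(5) = {pdf(5.0):.4f}")  # a density value well above 1
for eps in (0.1, 0.01, 0.001, 0.0):
    print(f"P(|x - 5| <= {eps}) = {prob_within(eps):.6f}")
```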

## Mathematical expectation, mean, and variance

Some very brief definitions of ways to describe a PDF:
Given a function $f(\mathbf{x})$ of an unknown variable $\mathbf{x}$, the expected value of the function with respect to a PDF $p(\mathbf{x})$ is defined as:
$$E_{p(\mathbf{x})}[f(\mathbf{x})] \equiv \int f(\mathbf{x}) p(\mathbf{x}) d \mathbf{x}$$
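Intuitively, this is the value that we roughly “expect” $f(\mathbf{x})$ to have. It is an average of $f(\mathbf{x})$ over all values of $\mathbf{x}$, weighted by the density $p(\mathbf{x})$.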
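The definition suggests two ways to compute an expectation. One is to integrate $f(x)p(x)$ numerically. The other is to average $f$ over random samples drawn from $p$, which gives a Monte Carlo estimate. The sketch below assumes $p$ is the standard Gaussian and $f(x) = x^2$, whose expectation under that density is exactly 1. The helper names, integration range and sample counts are illustrative. This is a scalar sketch, not a general-purpose integrator.

```python
import math
import random

def normal_pdf(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def square(x):
    """The function f(x) = x^2 whose expectation we want."""
    return x * x

def std_normal_sample(rng):
    """Draw one sample from the standard Gaussian."""
    return rng.gauss(0.0, 1.0)

def expectation_quad(f, p, lo=-10.0, hi=10.0, n=100_000):
    """E_p[f(x)] as the integral of f(x) p(x) over [lo, hi], midpoint rule."""
    h = (hi - lo) / n
    return sum(f(x) * p(x) for x in (lo + (k + 0.5) * h for k in range(n))) * h

def expectation_mc(f, sampler, n=200_000, seed=0):
    """Monte Carlo estimate of E_p[f(x)]: average f over n samples drawn from p."""
    rng = random.Random(seed)
    return sum(f(sampler(rng)) for _ in range(n)) / n

print(expectation_quad(square, normal_pdf))       # very close to 1
print(expectation_mc(square, std_normal_sample))  # close to 1, up to sampling noise
```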
The mean $\boldsymbol{\mu}$ of a distribution $p(\mathbf{x})$ is the expected value of $\mathbf{x}$ :
$$\boldsymbol{\mu}=E_{p(\mathbf{x})}[\mathbf{x}]=\int \mathbf{x} p(\mathbf{x}) d \mathbf{x}$$
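In the scalar case, the mean is just the expectation with $f(x) = x$. The sketch below assumes an exponential density with rate 2, chosen because its mean is known to be $1/2$. It truncates the integral to $[0, 40]$, where the remaining tail mass is negligible. The helper `mean` is hypothetical.

```python
import math

RATE = 2.0  # illustrative: exponential distribution, whose mean is 1 / RATE

def pdf(x):
    """Exponential density p(x) = RATE * exp(-RATE * x) for x >= 0."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

def mean(p, lo, hi, n=100_000):
    """mu = integral of x p(x) dx over [lo, hi], midpoint rule."""
    h = (hi - lo) / n
    return sum(x * p(x) for x in (lo + (k + 0.5) * h for k in range(n))) * h

print(mean(pdf, 0.0, 40.0), 1.0 / RATE)  # both approximately 0.5
```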
The variance of a scalar variable $x$ is the expected squared deviation from the mean:
$$\operatorname{var}(x)=E_{p(x)}\left[(x-\mu)^2\right]=\int(x-\mu)^2 p(x)\, d x$$
The variance of a distribution tells us how uncertain, or “spread out,” the distribution is. For a very narrow distribution, $E_{p(x)}\left[(x-\mu)^2\right]$ will be small.
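The sketch below illustrates that a narrower distribution has a smaller variance. It assumes zero-mean Gaussians with standard deviations 0.5 and 2, whose true variances are 0.25 and 4. The variance is computed straight from the definition. First it finds the mean, then the expected squared deviation, both by midpoint-rule integration over a truncated range. The helpers `integrate` and `variance` are illustrative.

```python
import math

def normal_pdf(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(g, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (k + 0.5) * h) for k in range(n)) * h

def variance(p, lo, hi):
    """var = E[(x - mu)^2] with mu = E[x], both computed by integration."""
    mu = integrate(lambda x: x * p(x), lo, hi)
    return integrate(lambda x: (x - mu) ** 2 * p(x), lo, hi)

for sigma in (0.5, 2.0):
    v = variance(lambda x: normal_pdf(x, sigma), -20.0 * sigma, 20.0 * sigma)
    print(f"sigma = {sigma}: variance = {v:.4f}")  # about sigma ** 2
```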
The covariance of a vector $\mathbf{x}$ is a matrix:
$$\boldsymbol{\Sigma}=\operatorname{cov}(\mathbf{x})=E_{p(\mathbf{x})}\left[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T\right]=\int(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T p(\mathbf{x})\, d \mathbf{x}$$
By inspection, we can see that the diagonal entries of the covariance matrix are the variances of the individual entries of the vector:
$$\boldsymbol{\Sigma}_{ii}=\operatorname{var}\left(x_i\right)=E_{p(\mathbf{x})}\left[\left(x_i-\mu_i\right)^2\right]$$
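The covariance matrix and its diagonal property can be checked on samples. For illustration, the sketch below assumes a 2-D Gaussian built as $\mathbf{x} = A\mathbf{z}$ with $\mathbf{z}$ standard normal, so its true covariance is $AA^T$. It estimates $\boldsymbol{\Sigma}$ by replacing the expectation with an average over samples, using $1/N$ normalization. It then compares each diagonal entry with the variance of the corresponding component, computed by `statistics.pvariance`. The matrix `A`, the sample count and the helper `sample_covariance` are hypothetical choices.

```python
import random
import statistics

def sample_covariance(samples):
    """Estimate Sigma = E[(x - mu)(x - mu)^T] by averaging over samples (1/N normalization)."""
    n, d = len(samples), len(samples[0])
    mu = [sum(s[i] for s in samples) / n for i in range(d)]
    return [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / n
             for j in range(d)] for i in range(d)]

# Illustrative 2-D Gaussian: x = A z with z ~ N(0, I), so the true covariance is A A^T.
A = [[1.0, 0.0],
     [0.8, 0.6]]  # A A^T = [[1.0, 0.8], [0.8, 1.0]]
rng = random.Random(0)
samples = []
for _ in range(50_000):
    z = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    samples.append([A[0][0] * z[0] + A[0][1] * z[1],
                    A[1][0] * z[0] + A[1][1] * z[1]])

sigma = sample_covariance(samples)
print(sigma)  # approximately [[1.0, 0.8], [0.8, 1.0]]
# Diagonal entries match the (population) variances of the individual components.
for i in range(2):
    print(sigma[i][i], statistics.pvariance([s[i] for s in samples]))
```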

