# General Model for a Channel Code

We first generalize some of the ideas in the above example. We still have Alice trying to communicate with Bob, but this time, she wants to be able to transmit a larger set of messages with asymptotically perfect reliability, rather than merely sending “0” or “1.” Suppose that she selects messages from a message set $[M]$ that consists of $M$ messages:
$$[M] \equiv \{1, \ldots, M\}$$
Suppose furthermore that Alice chooses a particular message $m$ with uniform probability from the set $[M]$. This assumption of a uniform distribution for Alice’s messages indicates that we do not really care much about the content of the actual message that she is transmitting. We just assume total ignorance of her message because we only really care about her ability to send any message reliably. The message set $[M]$ requires $\log (M)$ bits to represent it, where the logarithm is again base two. This number becomes important when we calculate the rate of a channel code.
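As a small illustration of the message set and its size in bits, the following Python sketch builds $[M]$, draws a message uniformly, and evaluates $\log_2 M$. The function names and the choice $M = 16$ are assumptions made for this example only.

```python
import math
import random


def message_set(M):
    """Return the message set [M] = {1, ..., M} as a list."""
    return list(range(1, M + 1))


def bits_to_represent(M):
    """Number of bits log2(M) needed to index M messages (may be fractional)."""
    return math.log2(M)


def draw_message(M, rng):
    """Draw a message m uniformly at random from [M]."""
    return rng.randint(1, M)  # randint includes both endpoints


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    M = 16                  # illustrative size, not taken from the text
    print("bits needed:", bits_to_represent(M))
    print("message drawn:", draw_message(M, rng))
```

When $M$ is not a power of two, `bits_to_represent` returns a non-integer value, which is the quantity that matters when messages are counted per channel use rather than individually.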

The next aspect of the model that we need to generalize is the noisy channel that connects Alice to Bob. We used the bit-flip channel before, but this channel is not general enough for our purposes. A simple way to extend the channel model is to represent it as a conditional probability distribution involving an input random variable $X$ and an output random variable $Y$ :
$$\mathcal{N}: \quad p_{Y \mid X}(y \mid x)$$
We use the symbol $\mathcal{N}$ to represent this more general channel model. One assumption that we make about random variables $X$ and $Y$ is that they are discrete, but the respective sizes of their outcome sets do not have to match. The other assumption that we make concerning the noisy channel is that it is i.i.d. Let $X^n \equiv X_1 X_2 \cdots X_n$ and $Y^n \equiv Y_1 Y_2 \cdots Y_n$ be the random variables associated with respective sequences $x^n \equiv x_1 x_2 \cdots x_n$ and $y^n \equiv y_1 y_2 \cdots y_n$. If Alice inputs the sequence $x^n$ to the $n$ inputs of $n$ respective uses of the noisy channel, a possible output sequence may be $y^n$. The i.i.d. assumption allows us to factor the conditional probability of the output sequence $y^n$ :
$$
\begin{aligned}
p_{Y^n \mid X^n}\left(y^n \mid x^n\right) & = p_{Y_1 \mid X_1}\left(y_1 \mid x_1\right) p_{Y_2 \mid X_2}\left(y_2 \mid x_2\right) \cdots p_{Y_n \mid X_n}\left(y_n \mid x_n\right) \\
& = p_{Y \mid X}\left(y_1 \mid x_1\right) p_{Y \mid X}\left(y_2 \mid x_2\right) \cdots p_{Y \mid X}\left(y_n \mid x_n\right) \\
& = \prod_{i=1}^n p_{Y \mid X}\left(y_i \mid x_i\right) .
\end{aligned}
$$
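The factorization can be checked numerically. The sketch below stores a channel $p_{Y \mid X}$ as a nested dictionary and multiplies the per-use probabilities. The particular channel (binary input, ternary output, with made-up transition probabilities) and the names `CHANNEL` and `sequence_likelihood` are hypothetical; they are chosen only to show that the input and output alphabets need not have the same size.

```python
import itertools
import math

# Hypothetical channel: CHANNEL[x][y] = p_{Y|X}(y | x).
# Binary input alphabet {0, 1}, ternary output alphabet {"a", "b", "c"}.
CHANNEL = {
    0: {"a": 0.8, "b": 0.1, "c": 0.1},
    1: {"a": 0.1, "b": 0.1, "c": 0.8},
}


def sequence_likelihood(channel, xs, ys):
    """p(y^n | x^n) for n independent uses of the same channel."""
    if len(xs) != len(ys):
        raise ValueError("input and output sequences must have equal length")
    # Product over i of p_{Y|X}(y_i | x_i), as in the last line of the equation.
    return math.prod(channel[x][y] for x, y in zip(xs, ys))


if __name__ == "__main__":
    xs = [0, 1, 0]
    print(sequence_likelihood(CHANNEL, xs, ["a", "c", "b"]))

    # Sanity check: for a fixed input sequence, the probabilities of all
    # possible output sequences sum to one.
    total = sum(
        sequence_likelihood(CHANNEL, xs, ys)
        for ys in itertools.product("abc", repeat=len(xs))
    )
    print(total)
```

For long sequences the product underflows in floating point, so a practical version would likely sum logarithms instead; the direct product is kept here to mirror the equation.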

## Proof Sketch of Shannon’s Channel Coding Theorem

We are now ready to present an overview of Shannon’s technique for proving the existence of a code that can achieve the capacity of a given noisy channel. Some of the methods that Shannon uses in his outline of a proof are similar to those in the first coding theorem. We again use the channel a large number of times so that the law of large numbers from probability theory comes into play, and we allow for a small probability of error that vanishes as the number of channel uses becomes large. Since the notion of typical sequences is so important in the first coding theorem, we might suspect that it is important in the noisy channel coding theorem as well. The typical set captures a certain notion of efficiency: it is small compared to the set of all sequences, yet it carries almost all of the probability. Thus, we should expect this efficiency to come into play somehow in the channel coding theorem.
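The claim that the typical set is small yet carries almost all of the probability can be illustrated numerically. The sketch below makes several assumptions that are not stated in this section: the source is i.i.d. Bernoulli($p$), and a sequence counts as typical when its sample entropy $-\frac{1}{n} \log_2 p(x^n)$ lies within $\delta$ of the entropy $H(p)$. The parameter values and function names are likewise chosen only for illustration.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) source, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def typical_set_stats(n, p, delta):
    """Return (log2 of the typical-set size, probability of the typical set).

    Sequences are grouped by their number of ones k, since all sequences
    with the same k have the same probability.
    """
    h = binary_entropy(p)
    size = 0
    prob = 0.0
    for k in range(n + 1):
        log2_seq_prob = k * math.log2(p) + (n - k) * math.log2(1 - p)
        sample_entropy = -log2_seq_prob / n
        if abs(sample_entropy - h) <= delta:
            count = math.comb(n, k)  # number of sequences with k ones
            size += count
            # Work in the log domain to avoid overflow and underflow.
            prob += 2.0 ** (math.log2(count) + log2_seq_prob)
    log2_size = math.log2(size) if size > 0 else float("-inf")
    return log2_size, prob


if __name__ == "__main__":
    n, p, delta = 1000, 0.1, 0.1  # illustrative values
    log2_size, prob = typical_set_stats(n, p, delta)
    print("log2 |typical set| :", log2_size)
    print("log2 |all sequences|:", n)
    print("Pr{typical set}    :", prob)
```

With these values the typical set has far fewer than $2^n$ elements, while its total probability stays close to one, which is the sense of "efficiency" described above.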

The aspect of Shannon’s technique for proving the noisy channel coding theorem that is different from the other ideas in the first theorem is the idea of random coding. Shannon’s technique adds a third layer of randomness to the model given above (recall that the first two are Alice’s random message and the random nature of the noisy channel).

The third layer of randomness is to choose the codewords themselves in a random fashion according to a random variable $X$, where we choose each letter $x_i$ of a given codeword $x^n$ independently according to the distribution $p_X\left(x_i\right)$. It is for this reason that we model the channel inputs as a random variable. We can then write each codeword as a random variable $X^n(m)$. The probability distribution for choosing a particular codeword $x^n(m)$ is
$$
\begin{aligned}
\operatorname{Pr}\left\{X^n(m)=x^n(m)\right\} & = p_{X_1, X_2, \ldots, X_n}\left(x_1(m), x_2(m), \ldots, x_n(m)\right) \\
& = p_X\left(x_1(m)\right) p_X\left(x_2(m)\right) \cdots p_X\left(x_n(m)\right) \\
& = \prod_{i=1}^n p_X\left(x_i(m)\right) .
\end{aligned}
$$
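The random selection of codewords can be sketched as follows. Each of the $M$ codewords is built by drawing its $n$ letters independently from $p_X$, and the probability of a given codeword is the product of its letter probabilities. The distribution, the sizes $M$ and $n$, and the function names are assumptions made for this illustration; the text does not fix any of them.

```python
import math
import random


def random_codebook(p_x, M, n, rng):
    """Draw M codewords of length n, each letter i.i.d. according to p_x.

    p_x maps each input symbol to its probability. The result maps each
    message m in {1, ..., M} to its codeword x^n(m), stored as a tuple.
    """
    symbols = list(p_x)
    weights = [p_x[s] for s in symbols]
    return {
        m: tuple(rng.choices(symbols, weights=weights, k=n))
        for m in range(1, M + 1)
    }


def codeword_probability(p_x, codeword):
    """Pr{X^n(m) = x^n(m)} = product over i of p_X(x_i(m))."""
    return math.prod(p_x[x] for x in codeword)


if __name__ == "__main__":
    rng = random.Random(0)   # fixed seed so the run is repeatable
    p_x = {0: 0.9, 1: 0.1}   # hypothetical input distribution
    codebook = random_codebook(p_x, M=4, n=8, rng=rng)
    for m, codeword in codebook.items():
        print(m, codeword, codeword_probability(p_x, codeword))
```

Because the draws are independent, two messages may receive the same codeword; the sketch keeps such collisions rather than rejecting them, since rejecting them would change the distribution of the codebook.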


