SYMMETRIC CHANNELS

The capacity of the binary symmetric channel is $C=1-H(p)$ bits per transmission, and the capacity of the binary erasure channel is $C=1-\alpha$ bits per transmission. Now consider the channel with transition matrix:
$$p(y \mid x)=\left[\begin{array}{lll} 0.3 & 0.2 & 0.5 \\ 0.5 & 0.3 & 0.2 \\ 0.2 & 0.5 & 0.3 \end{array}\right] . \tag{7.17}$$
Here the entry in the $x$th row and the $y$th column denotes the conditional probability $p(y \mid x)$ that $y$ is received when $x$ is sent. In this channel, all the rows of the probability transition matrix are permutations of each other, and so are the columns. Such a channel is said to be symmetric. Another example of a symmetric channel is one of the form
$$Y=X+Z \quad(\bmod c),$$

where $Z$ has some distribution on the integers $\{0,1,2, \ldots, c-1\}$, $X$ has the same alphabet as $Z$, and $Z$ is independent of $X$.
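To see why the additive-noise channel is symmetric, it can help to build its transition matrix explicitly. The sketch below assumes NumPy and uses a hypothetical noise distribution `pz` (not taken from the text); the helper name `additive_noise_channel` is also just illustrative. Each row turns out to be a cyclic shift of `pz`, so rows and columns are permutations of one another.

```python
import numpy as np

def additive_noise_channel(pz):
    """Return P with P[x, y] = p(y | x) for Y = (X + Z) mod c.

    pz[k] is the probability that the noise Z equals k, so
    p(y | x) = P(Z = (y - x) mod c).
    """
    pz = np.asarray(pz, dtype=float)
    c = len(pz)
    P = np.empty((c, c))
    for x in range(c):
        for y in range(c):
            P[x, y] = pz[(y - x) % c]
    return P

# Hypothetical noise distribution on {0, 1, 2}, chosen only for illustration.
pz = [0.5, 0.3, 0.2]
P = additive_noise_channel(pz)
print(P)

# Every row is a cyclic shift of pz, and every column sums to 1.
print("row sums:   ", P.sum(axis=1))
print("column sums:", P.sum(axis=0))
```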

In both these cases, we can easily find an explicit expression for the capacity of the channel. Letting $\mathbf{r}$ be a row of the transition matrix, we have
$$\begin{aligned} I(X ; Y) & =H(Y)-H(Y \mid X) \\ & =H(Y)-H(\mathbf{r}) \\ & \leq \log |\mathcal{Y}|-H(\mathbf{r}) \end{aligned}$$
with equality if the output distribution is uniform. But $p(x)=1 /|\mathcal{X}|$ achieves a uniform distribution on $Y$, as seen from
$$p(y)=\sum_{x \in \mathcal{X}} p(y \mid x) p(x)=\frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} p(y \mid x)=c \frac{1}{|\mathcal{X}|}=\frac{1}{|\mathcal{Y}|},$$
where $c$ is the sum of the entries in one column of the probability transition matrix.
Thus, the channel in (7.17) has the capacity
$$C=\max _{p(x)} I(X ; Y)=\log 3-H(0.5,0.3,0.2)$$
and $C$ is achieved by a uniform distribution on the input.
The transition matrix of the symmetric channel defined above is doubly stochastic. In the computation of the capacity, we used the facts that the rows were permutations of one another and that all the column sums were equal.
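The capacity formula for this channel can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper names `entropy_bits` and `mutual_information_bits` are illustrative and not from the text. It evaluates $\log 3-H(0.5,0.3,0.2)$ in bits (roughly 0.0995) and confirms that a uniform input yields a uniform output and attains that value, while a non-uniform input (chosen arbitrarily here) does worse.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits, using the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information_bits(px, P):
    """I(X;Y) = H(Y) - H(Y|X) for input distribution px and matrix P[x, y]."""
    px = np.asarray(px, dtype=float)
    P = np.asarray(P, dtype=float)
    py = px @ P  # output distribution p(y)
    h_y_given_x = sum(px[x] * entropy_bits(P[x]) for x in range(len(px)))
    return entropy_bits(py) - h_y_given_x

# Transition matrix of the symmetric channel from the text.
P = np.array([[0.3, 0.2, 0.5],
              [0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3]])

uniform = np.full(3, 1 / 3)
closed_form = np.log2(3) - entropy_bits(P[0])  # log|Y| - H(r)

print("output under uniform input:", uniform @ P)
print("closed-form capacity (bits):", closed_form)
print("I(X;Y) at uniform input:   ", mutual_information_bits(uniform, P))
print("I(X;Y) at a skewed input:  ", mutual_information_bits([0.6, 0.3, 0.1], P))
```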

PROPERTIES OF CHANNEL CAPACITY

1. $C \geq 0$ since $I(X ; Y) \geq 0$.
2. $C \leq \log |\mathcal{X}|$ since $C=\max I(X ; Y) \leq \max H(X)=\log |\mathcal{X}|$.
3. $C \leq \log |\mathcal{Y}|$ for the same reason.
4. $I(X ; Y)$ is a continuous function of $p(x)$.
5. $I(X ; Y)$ is a concave function of $p(x)$ (Theorem 2.7.4).

Since $I(X ; Y)$ is a concave function over a closed convex set, a local maximum is a global maximum. From properties 2 and 3, the maximum is finite, and we are justified in using the term maximum rather than supremum in the definition of capacity. The maximum can then be found by standard nonlinear optimization techniques such as gradient search. Some of the methods that can be used include the following:
• Constrained maximization using calculus and the Kuhn-Tucker conditions.
• The Frank-Wolfe gradient search algorithm.
• An iterative algorithm developed by Arimoto [25] and Blahut [65]. We describe the algorithm in Section 10.8.
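As a rough illustration of the last method, here is a minimal sketch of the commonly stated form of the Blahut-Arimoto iteration, assuming NumPy. It is not the presentation from Section 10.8; the function name `blahut_arimoto`, the tolerance, and the iteration cap are assumptions made for this example. At each step the input distribution is reweighted by $\exp D(p(\cdot \mid x) \,\|\, q)$, where $q$ is the current output distribution, and the loop stops once the standard lower and upper bounds on capacity agree to within the tolerance.

```python
import numpy as np

def blahut_arimoto(P, tol=1e-9, max_iter=10_000):
    """Estimate channel capacity (bits) and a maximizing input distribution.

    P[x, y] = p(y | x). tol and max_iter are illustrative defaults.
    """
    P = np.asarray(P, dtype=float)
    r = np.full(P.shape[0], 1.0 / P.shape[0])  # start from the uniform input
    lower = 0.0
    for _ in range(max_iter):
        q = r @ P  # output distribution induced by r
        # D[x] = KL divergence (nats) between row x and q, with 0 log 0 = 0.
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(P > 0, np.log(P / q), 0.0)
        D = np.sum(P * log_ratio, axis=1)
        w = r * np.exp(D)
        total = w.sum()
        lower, upper = np.log(total), D.max()  # lower <= C <= upper (nats)
        r = w / total                          # reweighted input distribution
        if upper - lower < tol:
            break
    return lower / np.log(2), r

# The symmetric channel from the text: the uniform input is already optimal.
P_sym = np.array([[0.3, 0.2, 0.5],
                  [0.5, 0.3, 0.2],
                  [0.2, 0.5, 0.3]])
C_sym, r_sym = blahut_arimoto(P_sym)
print("symmetric channel:", C_sym, r_sym)

# A non-symmetric example (a Z-channel, chosen here only for illustration),
# where the maximizing input is not uniform.
P_z = np.array([[1.0, 0.0],
                [0.5, 0.5]])
C_z, r_z = blahut_arimoto(P_z)
print("Z-channel:", C_z, r_z)
```

For the symmetric channel the iteration stops immediately, since every row has the same divergence from the uniform output; the Z-channel shows the reweighting actually moving the input distribution away from uniform.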

In general, there is no closed-form solution for the capacity. But for many simple channels it is possible to calculate the capacity using properties such as symmetry. Some of the examples considered earlier are of this form.

So far, we have defined the information capacity of a discrete memoryless channel. In the next section we prove Shannon’s second theorem, which gives an operational meaning to the definition of capacity as the number of bits we can transmit reliably over the channel. But first we will try to give an intuitive idea as to why we can transmit $C$ bits of information over a channel. The basic idea is that for large block lengths, every channel looks like the noisy typewriter channel (Figure 7.4) and the channel has a subset of inputs that produce essentially disjoint sequences at the output.
For each (typical) input $n$-sequence, there are approximately $2^{n H(Y \mid X)}$ possible $Y$ sequences, all of them equally likely (Figure 7.7). We wish to ensure that no two $X$ sequences produce the same $Y$ output sequence. Otherwise, we will not be able to decide which $X$ sequence was sent.
The total number of possible (typical) $Y$ sequences is $\approx 2^{n H(Y)}$. This set has to be divided into sets of size $2^{n H(Y \mid X)}$ corresponding to the different input $X$ sequences. The total number of disjoint sets is less than or equal to $2^{n(H(Y)-H(Y \mid X))}=2^{n I(X ; Y)}$. Hence, we can send at most $\approx 2^{n I(X ; Y)}$ distinguishable sequences of length $n$.
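To put numbers on this counting argument, the sketch below uses a hypothetical setting that is not taken from the text: a binary symmetric channel with crossover probability $p=0.1$, a uniform input, and block length $n=100$. Under those assumptions $H(Y)=1$ and $H(Y \mid X)=H(p)$, so the exponents can be compared directly.

```python
import math

def binary_entropy(p):
    """H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hypothetical parameters, chosen only for illustration.
n = 100   # block length
p = 0.1   # crossover probability of the binary symmetric channel

h_y = 1.0                        # uniform input gives a uniform output
h_y_given_x = binary_entropy(p)  # noise entropy per channel use

log2_typical_outputs = n * h_y            # about 2^(n H(Y)) typical Y sequences
log2_fan_size = n * h_y_given_x           # about 2^(n H(Y|X)) outputs per input
log2_num_codewords = log2_typical_outputs - log2_fan_size  # n I(X;Y)

print(f"typical outputs    ~ 2^{log2_typical_outputs:.1f}")
print(f"outputs per input  ~ 2^{log2_fan_size:.1f}")
print(f"distinguishable    ~ 2^{log2_num_codewords:.1f} input sequences")
```

With these values, roughly $2^{53}$ of the $2^{100}$ binary input sequences can be kept distinguishable, which corresponds to a rate of about 0.53 bits per transmission.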

Although the above derivation outlines an upper bound on the capacity, a stronger version of the above argument will be used in the next section to prove that this rate $I$ is achievable with an arbitrarily low probability of error.

Before we proceed to the proof of Shannon’s second theorem, we need a few definitions.

