## ARITHMETIC CODING

We could alleviate this loss by using blocks of input symbols; however, the complexity of this approach increases exponentially with block length. We now describe a method of encoding without this inefficiency. In arithmetic coding, instead of using a sequence of bits to represent a symbol, we represent it by a subinterval of the unit interval.
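To make the subinterval idea concrete, here is a minimal sketch in Python. It assumes an i.i.d. source with known symbol probabilities and orders the symbols alphabetically; the function name `sequence_interval` and the example distribution are hypothetical choices for illustration, not part of the text.

```python
from fractions import Fraction


def sequence_interval(sequence, probs):
    """Return (low, width) of the subinterval of [0, 1) assigned to `sequence`.

    Assumes an i.i.d. model: `probs` maps each symbol to its probability.
    """
    # Cumulative probability of all symbols ordered before each symbol.
    cumulative = {}
    total = Fraction(0)
    for symbol in sorted(probs):
        cumulative[symbol] = total
        total += probs[symbol]

    low, width = Fraction(0), Fraction(1)
    for symbol in sequence:
        # Narrow the current interval to the slice belonging to `symbol`.
        low += width * cumulative[symbol]
        width *= probs[symbol]
    return low, width


probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
low, width = sequence_interval("ab", probs)
print(low, width)  # 1/4 1/8
```

The width of the interval equals the probability of the sequence, so each added symbol shrinks it, and the interval for an extended sequence always lies inside the interval for its prefix.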

The code for a sequence of symbols is an interval whose length decreases as we add more symbols to the sequence. This property allows us to have a coding scheme that is incremental (the code for an extension to a sequence can be calculated simply from the code for the original sequence) and for which the codeword lengths are not restricted to be integral. The motivation for arithmetic coding is based on Shannon-Fano-Elias coding (Section 5.9) and the following lemma:

Lemma 13.3.1 Let $Y$ be a random variable with continuous probability distribution function $F(y)$. Let $U=F(Y)$ (i.e., $U$ is a function of $Y$ defined by its distribution function). Then $U$ is uniformly distributed on $[0,1]$.

Proof: Since $F(y) \in[0,1]$, the range of $U$ is $[0,1]$. Also, for $u \in[0,1]$,

$$
\begin{aligned}
F_U(u) & =\operatorname{Pr}(U \leq u) \\
& =\operatorname{Pr}(F(Y) \leq u) \\
& =\operatorname{Pr}\left(Y \leq F^{-1}(u)\right) \\
& =F\left(F^{-1}(u)\right) \\
& =u,
\end{aligned}
$$

which proves that $U$ has a uniform distribution in $[0,1]$.
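The lemma can be checked numerically. The sketch below is only an illustration under an assumed distribution: it takes $Y$ to be exponential with rate 1, so that $F(y)=1-e^{-y}$, and compares the empirical distribution of $U=F(Y)$ with the uniform one. The helper names `uniformized_samples` and `empirical_cdf` are hypothetical.

```python
import math
import random


def uniformized_samples(n, rate=1.0, seed=0):
    """Draw n exponential samples Y and return U = F(Y) = 1 - exp(-rate * Y)."""
    rng = random.Random(seed)
    return [1.0 - math.exp(-rate * rng.expovariate(rate)) for _ in range(n)]


def empirical_cdf(samples, u):
    """Fraction of samples that are less than or equal to u."""
    return sum(s <= u for s in samples) / len(samples)


samples = uniformized_samples(20000)
for u in (0.1, 0.25, 0.5, 0.9):
    # For a uniform U, the empirical value should be close to u itself.
    print(u, round(empirical_cdf(samples, u), 3))
```

If the lemma holds, each printed empirical value should be close to the corresponding $u$, up to sampling noise.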

## LEMPEL-ZIV CODING

In Section 13.3 we discussed the basic ideas of arithmetic coding and mentioned some results on worst-case redundancy for coding a sequence from an unknown distribution. We now discuss a popular class of techniques for source coding that are universally optimal (their asymptotic compression rate approaches the entropy rate of the source for any stationary ergodic source) and simple to implement. This class of algorithms is termed Lempel-Ziv, named after the authors of two seminal papers $[603,604]$ that describe the two basic algorithms that underlie this class. The algorithms could also be described as adaptive dictionary compression algorithms.

The notion of using dictionaries for compression dates back to the invention of the telegraph. At the time, companies were charged by the number of letters used, and many large companies produced codebooks for the frequently used phrases and used the codewords for their telegraphic communication. Another example is the notion of greetings telegrams that are popular in India: there is a set of standard greetings such as “25:Merry Christmas” and “26:May Heaven’s choicest blessings be showered on the newly married couple.” A person wishing to send a greeting only needs to specify the number, which is used to generate the actual greeting at the destination.

The idea of adaptive dictionary-based schemes was not explored until Ziv and Lempel wrote their papers in 1977 and 1978. The two papers describe two distinct versions of the algorithm. We refer to these versions as LZ77 or sliding window Lempel-Ziv and LZ78 or tree-structured Lempel-Ziv. (They are sometimes called LZ1 and LZ2, respectively.)

We first describe the basic algorithms in the two cases and describe some simple variations. We later prove their optimality, and end with some practical issues. The key idea of the Lempel-Ziv algorithm is to parse the string into phrases and to replace phrases by pointers to where the same string has occurred in the past. The algorithms differ in the set of possible match locations (and match lengths) that each allows.
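As a rough illustration of parsing into phrases and replacing them by pointers, the sketch below implements a simple tree-structured (LZ78-style) parse in Python. Each phrase is the shortest string not seen before and is emitted as a pair (index of an earlier phrase, new symbol). The function names and the handling of a trailing incomplete phrase are assumptions made for this example, not details taken from the text.

```python
def lz78_parse(text):
    """Parse `text` into (prefix_index, symbol) pairs; index 0 is the empty phrase."""
    dictionary = {"": 0}  # phrase -> index
    pairs = []
    phrase = ""
    for symbol in text:
        candidate = phrase + symbol
        if candidate in dictionary:
            phrase = candidate  # still a known phrase, keep extending
        else:
            # Shortest new phrase: pointer to its known prefix plus one symbol.
            pairs.append((dictionary[phrase], symbol))
            dictionary[candidate] = len(dictionary)
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit a pointer with no new symbol.
        pairs.append((dictionary[phrase], ""))
    return pairs


def lz78_unparse(pairs):
    """Rebuild the original string from the (prefix_index, symbol) pairs."""
    phrases = [""]
    output = []
    for index, symbol in pairs:
        phrase = phrases[index] + symbol
        phrases.append(phrase)
        output.append(phrase)
    return "".join(output)


pairs = lz78_parse("ABBABBABBBAABABAA")
print(pairs)
print(lz78_unparse(pairs) == "ABBABBABBBAABABAA")  # True
```

For the example string, the phrases are A, B, BA, BB, AB, BBA, ABA, BAA. The dictionary is built from the data itself, so the decoder can rebuild it from the pairs alone without any side information.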
