Linear Machines

The natural generalization of a (two-category) TLU to an $R$-category classifier is the structure, shown in Fig. 4.8, called a linear machine. Here, to use more familiar notation, the $\mathbf{W}$ and $\mathbf{X}$ are meant to be augmented vectors (with an $(n+1)$-st component). Such a structure is also sometimes called a “competitive” net or a “winner-take-all” net. The output of the linear machine is one of the numbers $\{1, \ldots, R\}$, corresponding to whichever of the dot products $\mathbf{W}_1 \cdot \mathbf{X}, \ldots, \mathbf{W}_R \cdot \mathbf{X}$ is largest. Note that when $R=2$, the linear machine reduces to a TLU with weight vector $\mathbf{W}=\left(\mathbf{W}_1-\mathbf{W}_2\right)$.
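The winner-take-all rule can be sketched in a few lines of Python. This is a minimal illustration, not code from the text: the function names, the example weights, and the convention that ties go to the lowest-numbered category are all assumptions made here.

```python
def augment(x):
    """Append the constant 1 as the (n+1)-st component of a pattern."""
    return list(x) + [1.0]


def dot(w, x):
    """Plain dot product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def linear_machine(weights, x):
    """Return the category in 1..R whose weight vector gives the largest dot product.

    Ties go to the lowest-numbered category (an assumption of this sketch).
    """
    xa = augment(x)
    scores = [dot(w, xa) for w in weights]
    return scores.index(max(scores)) + 1


def tlu(w, x):
    """Two-category TLU on augmented vectors: output 1 if W.X >= 0, else 0."""
    return 1 if dot(w, augment(x)) >= 0 else 0


# Hypothetical augmented weight vectors for R = 2 in two dimensions.
W1 = [1.0, -1.0, 0.5]
W2 = [-0.5, 2.0, 0.0]
W_DIFF = [a - b for a, b in zip(W1, W2)]  # W = W1 - W2
```

With these definitions, `linear_machine([W1, W2], x)` returns category 1 exactly when `tlu(W_DIFF, x)` outputs 1, which is the $R=2$ reduction stated above.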

The diagram in Fig. 4.9 shows the character of the regions in a 2-dimensional space created by a linear machine for $R=5$. In $n$ dimensions, every pair of regions is either separated by a section of a hyperplane or is non-adjacent.

To train a linear machine, there is a straightforward generalization of the 2-category error-correction rule. Assemble the patterns in the training set into a sequence as before.

1. If the machine classifies a pattern correctly, no change is made to any of the weight vectors.
2. If the machine mistakenly classifies a category $u$ pattern, $\mathbf{X}_i$, in category $v$ (with $u \neq v$), then:
$$\mathbf{W}_u \longleftarrow \mathbf{W}_u+c_i \mathbf{X}_i$$

and
$$\mathbf{W}_v \longleftarrow \mathbf{W}_v-c_i \mathbf{X}_i$$
and all other weight vectors are not changed.
This correction increases the value of the $u$-th dot product and decreases the value of the $v$-th dot product. Just as in the 2-category fixed-increment procedure, this procedure is guaranteed to terminate, for constant $c_i$, if there exist weight vectors that make correct separations of the training set. Note that when $R=2$, this procedure reduces to the ordinary TLU error-correction procedure. A proof that this procedure terminates is given in [Nilsson, 1990, pp. 88-90] and in [Duda & Hart, 1973, pp. 174-177].
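The two-step rule above might be implemented roughly as follows. This is a sketch under stated assumptions rather than the procedure as given in the cited proofs: weights start at zero, the increment `c` is constant, ties go to the lowest-numbered category, and the `max_epochs` cap, the function names, and the toy data are all invented for illustration.

```python
def classify(W, x):
    """Winner-take-all output in 1..R; ties go to the lowest-numbered category."""
    xa = list(x) + [1.0]  # augmented pattern
    scores = [sum(wi * xi for wi, xi in zip(w, xa)) for w in W]
    return scores.index(max(scores)) + 1


def train_linear_machine(patterns, labels, R, c=1.0, max_epochs=1000):
    """Fixed-increment error correction for an R-category linear machine.

    Cycles through the training sequence until a full pass makes no
    correction, or until max_epochs passes (a safety cap assumed here).
    """
    n = len(patterns[0])
    W = [[0.0] * (n + 1) for _ in range(R)]
    for _ in range(max_epochs):
        corrections = 0
        for x, u in zip(patterns, labels):
            v = classify(W, x)
            if v == u:
                continue  # step 1: correct, so no weight vector changes
            xa = list(x) + [1.0]
            # step 2: W_u <- W_u + c X_i and W_v <- W_v - c X_i
            W[u - 1] = [w + c * xi for w, xi in zip(W[u - 1], xa)]
            W[v - 1] = [w - c * xi for w, xi in zip(W[v - 1], xa)]
            corrections += 1
        if corrections == 0:
            break
    return W


# Hypothetical, linearly separable toy data with three categories.
PATTERNS = [(0, 0), (0, 1), (5, 0), (5, 1), (0, 5), (1, 5)]
LABELS = [1, 1, 2, 2, 3, 3]
W_TRAINED = train_linear_machine(PATTERNS, LABELS, R=3)
```

Only the two weight vectors involved in a mistake are touched on each correction, so every other category's dot product for that pattern is unchanged.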

Motivation and Examples

To classify correctly all of the patterns in non-linearly-separable training sets requires separating surfaces more complex than hyperplanes. One way to achieve more complex surfaces is with networks of TLUs. Consider, for example, the 2-dimensional, even parity function, $f=x_1 x_2+\overline{x_1} \overline{x_2}$. No single line through the 2-dimensional square can separate the vertices $(1,1)$ and $(0,0)$ from the vertices $(1,0)$ and $(0,1)$; the function is not linearly separable and thus cannot be implemented by a single TLU. But the network of three TLUs shown in Fig. 4.10 does implement this function. In the figure, we show the weight values along input lines to each TLU and the threshold value inside the circle representing the TLU.
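As a concrete check, here is one possible three-TLU network for even parity. The weights and thresholds below are chosen for this sketch and are not necessarily those of Fig. 4.10; a TLU is assumed to output 1 when its weighted sum reaches or exceeds its threshold.

```python
def tlu(weights, threshold, inputs):
    """Threshold logic unit: 1 if the weighted sum is >= threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def even_parity(x1, x2):
    """Two-layer network of three TLUs computing x1 x2 + (not x1)(not x2)."""
    # Hidden unit 1 fires only for (1, 1): implements x1 AND x2.
    h1 = tlu([1.0, 1.0], 1.5, [x1, x2])
    # Hidden unit 2 fires only for (0, 0): implements (NOT x1) AND (NOT x2).
    h2 = tlu([-1.0, -1.0], -0.5, [x1, x2])
    # Output unit fires if either hidden unit fires: implements OR.
    return tlu([1.0, 1.0], 0.5, [h1, h2])
```

Each hidden unit draws one line across the square, cutting off a single vertex, and the output unit takes the disjunction of the two cut-off corners.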

The function implemented by a network of TLUs depends on its topology as well as on the weights of the individual TLUs. Feedforward networks have no cycles; in a feedforward network no TLU’s input depends (through zero or more intermediate TLUs) on that TLU’s output. (Networks that are not feedforward are called recurrent networks.) If the TLUs of a feedforward network are arranged in layers, with the elements of layer $j$ receiving inputs only from TLUs in layer $j-1$, then we say that the network is a layered, feedforward network. The network shown in Fig. 4.10 is a layered, feedforward network having two layers (of weights). (Some people count the layers of TLUs and include the inputs as a layer also; they would call this network a three-layer network.) In general, a feedforward, layered network has the structure shown in Fig. 4.11. All of the TLUs except the “output” units are called hidden units (they are “hidden” from the output).
Implementing DNF Functions by Two-Layer Networks
We have already defined $k$-term DNF functions: they are DNF functions having $k$ terms. A $k$-term DNF function can be implemented by a two-layer network with $k$ units in the hidden layer (to implement the $k$ terms) and one output unit to implement the disjunction of these terms. Since any Boolean function has a DNF form, any Boolean function can be implemented by some two-layer network of TLUs. As an example, consider the function $f=x_1 x_2+x_2 \overline{x_3}+x_1 \overline{x_3}$. The form of the network that implements this function is shown in Fig. 4.12. (We leave it to the reader to calculate appropriate values of weights and thresholds.) The 3-cube representation of the function is shown in Fig. 4.13. The network of Fig. 4.12 can be designed so that each hidden unit implements one of the planar boundaries shown in Fig. 4.13.
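One way to work the exercise is sketched below; it is a possible answer, not the only one, and the encoding of terms and all function names are assumptions of this example. Each hidden unit gets weight $+1$ on an unnegated variable, $-1$ on a negated one, and a threshold of $p - 0.5$, where $p$ is the number of unnegated literals in the term. The output unit is an OR with unit weights and threshold $0.5$.

```python
from itertools import product


def tlu(weights, threshold, inputs):
    """Threshold logic unit: 1 if the weighted sum is >= threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def term_unit(term, n):
    """Weights and threshold for a TLU computing one conjunction of literals.

    term is a list of (index, positive) pairs; positive=False means negated.
    """
    weights = [0.0] * n
    positives = 0
    for index, positive in term:
        weights[index] = 1.0 if positive else -1.0
        positives += 1 if positive else 0
    # The sum equals `positives` only when every literal is satisfied;
    # violating any literal lowers it by at least 1.
    return weights, positives - 0.5


def dnf_network(terms, n):
    """Build a two-layer network: one hidden TLU per term, one OR output."""
    hidden = [term_unit(term, n) for term in terms]

    def f(x):
        h = [tlu(w, th, x) for w, th in hidden]
        return tlu([1.0] * len(h), 0.5, h)

    return f


# f = x1 x2 + x2 (not x3) + x1 (not x3), with 0-based variable indices.
TERMS = [
    [(0, True), (1, True)],
    [(1, True), (2, False)],
    [(0, True), (2, False)],
]
f = dnf_network(TERMS, 3)
TRUTH_TABLE = {x: f(x) for x in product([0, 1], repeat=3)}
```

With this choice the three hidden units have weight vectors $(1,1,0)$, $(0,1,-1)$, and $(1,0,-1)$ with thresholds $1.5$, $0.5$, and $0.5$ respectively, so each one corresponds to a single plane through the 3-cube.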

