
# The Widrow-Hoff Procedure



The Widrow-Hoff procedure (also called the LMS or the delta procedure) attempts to find weights that minimize a squared-error function between the pattern labels and the dot product computed by a TLU. For this purpose, the pattern labels are assumed to be either +1 or -1 (instead of 1 or 0). The squared error for a pattern, $\mathbf{X}_i$, with label $d_i$ (for desired output) is:
$$\varepsilon_i=\left(d_i-\sum_{j=1}^{n+1} x_{i j} w_j\right)^2$$
where $x_{i j}$ is the $j$-th component of $\mathbf{X}_i$. The total squared error (over all patterns in a training set, $\Xi$, containing $m$ patterns) is then:
$$\varepsilon=\sum_{i=1}^m\left(d_i-\sum_{j=1}^{n+1} x_{i j} w_j\right)^2$$
We want to choose the weights $w_j$ to minimize this squared error. One way to find such a set of weights is to start with an arbitrary weight vector and move it along the negative gradient of $\varepsilon$ as a function of the weights. Since $\varepsilon$ is quadratic in the $w_j$, we know that it has a global minimum, and thus this steepest-descent procedure is guaranteed to find the minimum. Each component of the gradient is the partial derivative of $\varepsilon$ with respect to one of the weights. One problem with taking the partial derivative of $\varepsilon$ is that $\varepsilon$ depends on all the input vectors in $\Xi$. Often, it is preferable to use an incremental procedure in which we try the TLU on just one element, $\mathbf{X}_i$, of $\Xi$ at a time, compute the gradient of the single-pattern squared error, $\varepsilon_i$, make the appropriate adjustment to the weights, and then try another member of $\Xi$. Of course, the results of the incremental version can only approximate those of the batch one, but the approximation is usually quite effective. We will be describing the incremental version here.
The $j$-th component of the gradient of the single-pattern error is:
$$\frac{\partial \varepsilon_i}{\partial w_j}=-2\left(d_i-\sum_{k=1}^{n+1} x_{i k} w_k\right) x_{i j}$$
An adjustment in the direction of the negative gradient would then change each weight as follows:
$$w_j \leftarrow w_j+c_i\left(d_i-f_i\right) x_{i j}$$
where $f_i=\sum_{j=1}^{n+1} x_{i j} w_j$, and $c_i$ governs the size of the adjustment. The entire weight vector (in augmented, or $\mathbf{V}$, notation) is thus adjusted according to the following rule:
$$\mathbf{V} \leftarrow \mathbf{V}+c_i\left(d_i-f_i\right) \mathbf{Y}_i$$
where, as before, $\mathbf{Y}_i$ is the $i$-th augmented pattern vector.
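The incremental rule can be sketched in a few lines of code. This is an illustrative sketch, not from the text: the function name, the fixed learning rate (a constant $c_i = c$ for every adjustment), and the toy data are all assumptions. Patterns are assumed already augmented, with labels in $\{+1, -1\}$:

```python
import numpy as np

def widrow_hoff(Y, d, c=0.05, epochs=100):
    """Incremental Widrow-Hoff (LMS) training.

    Y : (m, n+1) array whose rows are augmented pattern vectors Y_i
    d : (m,) array of desired outputs, each +1 or -1
    c : learning-rate constant (a fixed c_i for every adjustment)
    """
    V = np.zeros(Y.shape[1])              # arbitrary starting weight vector
    for _ in range(epochs):
        for i in range(len(Y)):           # one pattern at a time (incremental)
            f_i = Y[i] @ V                # dot product computed by the TLU
            V += c * (d[i] - f_i) * Y[i]  # V <- V + c (d_i - f_i) Y_i
    return V

# Toy 1-D problem: patterns x = 0, 1, 2, 3 augmented with a constant 1,
# labeled -1 for x < 1.5 and +1 otherwise.
Y = np.array([[0., 1.], [1., 1.], [2., 1.], [3., 1.]])
d = np.array([-1., -1., 1., 1.])
V = widrow_hoff(Y, d)
```

For a small enough $c$ the weights approach the least-squares solution (for this toy set, roughly $\mathbf{V} \approx (0.8, -1.2)$), and thresholding $\mathbf{Y}_i \cdot \mathbf{V}$ at zero then classifies all four patterns correctly.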

## Training a TLU on Non-Linearly-Separable Training Sets

When the training set is not linearly separable (perhaps because of noise or perhaps inherently), it may still be desired to find a “best” separating hyperplane. Typically, the error-correction procedures will not do well on non-linearly-separable training sets because they will continue to attempt to correct inevitable errors, and the hyperplane will never settle into an acceptable place.

Several methods have been proposed to deal with this case. First, we might use the Widrow-Hoff procedure, which (although it will not converge to zero error on non-linearly-separable problems) will give us a weight vector that minimizes the mean squared error. A mean-squared-error criterion often gives unsatisfactory results, however, because it prefers many small errors to a few large ones. As an alternative, error correction with a learning-rate constant, $c$, that decreases continuously toward zero will result in ever-decreasing changes to the hyperplane. Duda [Duda, 1966] has suggested keeping track of the average value of the weight vector during error correction and using this average to give a separating hyperplane that performs reasonably well on non-linearly-separable problems. Gallant [Gallant, 1986] proposed what he called the "pocket algorithm." As described in [Hertz, Krogh, & Palmer, 1991, p. 160]:
> . . . the pocket algorithm . . . consists simply in storing (or "putting in your pocket") the set of weights which has had the longest unmodified run of successes so far. The algorithm is stopped after some chosen time $t$ . . .

After stopping, the weights in the pocket are used as a set that should give a small number of errors on the training set. Error correction proceeds as usual with the ordinary set of weights.

*Introduction to Machine Learning* © 1996 Nils J. Nilsson. All rights reserved.
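The pocket algorithm can be sketched as follows. This is an illustrative reconstruction, not Gallant's published code: the function name, the random presentation order, and the error-correction step size are assumptions.

```python
import numpy as np

def pocket(Y, d, t=2000, c=0.5, seed=0):
    """Pocket algorithm (sketch).

    Runs ordinary perceptron error correction on randomly chosen
    patterns, keeping in the "pocket" the weights that have had the
    longest unmodified run of successes so far; stops after time t.
    """
    rng = np.random.default_rng(seed)
    V = np.zeros(Y.shape[1])              # ordinary (working) weight vector
    pocket_V, run, best_run = V.copy(), 0, 0
    for _ in range(t):
        i = rng.integers(len(Y))          # present a random training pattern
        if np.sign(Y[i] @ V) == d[i]:     # success: extend the current run
            run += 1
            if run > best_run:            # longest run so far: pocket it
                best_run, pocket_V = run, V.copy()
        else:                             # error: correct V as usual
            V = V + c * d[i] * Y[i]
            run = 0
    return pocket_V

# Non-linearly-separable 1-D set: the pattern at x = 4 is mislabeled,
# so no threshold classifies all five points correctly.
Y = np.array([[0., 1.], [1., 1.], [2., 1.], [3., 1.], [4., 1.]])
d = np.array([-1., -1., 1., 1., -1.])
W = pocket(Y, d)
errors = int(np.sum(np.sign(Y @ W) != d))
```

On this set, plain error correction never settles (it keeps trying to fix the inevitable error), while the pocketed weights typically misclassify only the noisy pattern at $x = 4$.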

