
# Optimization for Machine Learning: Steepest Descent Direction


## Steepest Descent Direction

The Taylor expansion (7) computes an affine approximation of the function $f$ near $x$, since it can be written as
$$f(z)=T_x(z)+o(\|x-z\|) \quad \text { where } \quad T_x(z) \stackrel{\text { def. }}{=} f(x)+\langle\nabla f(x), z-x\rangle,$$
see Fig. 8. First order methods operate by locally replacing $f$ by $T_x$.
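The quality of the affine model $T_x$ can be probed numerically. The sketch below uses an assumed test function $f(x)=\|x\|^2/2$ (an illustrative choice, not from the text); the approximation error $f(z)-T_x(z)$ should vanish faster than $\|z-x\|$.

```python
import numpy as np

# Assumed smooth test function f(x) = ||x||^2 / 2 and its gradient.
def f(x):
    return 0.5 * np.dot(x, x)

def grad_f(x):
    return x

def T(x, z):
    """First-order (affine) Taylor model of f around x."""
    return f(x) + np.dot(grad_f(x), z - x)

x = np.array([1.0, -2.0])
d = np.array([1.0, 1.0])
for eps in [1e-1, 1e-2, 1e-3]:
    z = x + eps * d
    # The error f(z) - T(x, z) is o(||z - x||): the ratio below shrinks with eps.
    print(eps, abs(f(z) - T(x, z)) / eps)
```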
The gradient $\nabla f(x)$ should be understood as a direction along which the function increases. This means that to improve the value of the function, one should move in the direction $-\nabla f(x)$. Given some fixed $x$, let us look at the function $f$ along the 1-D half line
$$\tau \in \mathbb{R}^{+}=[0,+\infty[\longmapsto f(x-\tau \nabla f(x)) \in \mathbb{R}$$

If $f$ is differentiable at $x$, one has
$$f(x-\tau \nabla f(x))=f(x)-\tau\langle\nabla f(x), \nabla f(x)\rangle+o(\tau)=f(x)-\tau\|\nabla f(x)\|^2+o(\tau) .$$
So there are two possibilities: either $\nabla f(x)=0$, in which case $x$ is a critical point (the global minimum if $f$ is convex, but possibly only a local minimizer if it is not), or $\nabla f(x) \neq 0$ and, if $\tau$ is chosen small enough,
$$f(x-\tau \nabla f(x))<f(x)$$
which means that moving from $x$ to $x-\tau \nabla f(x)$ has improved the objective function.
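This first-order expansion can be checked numerically. Below is a minimal sketch on an assumed quadratic test function (an illustrative choice, not from the text): for small $\tau$, the actual decrease $f(x)-f(x-\tau\nabla f(x))$ approaches the prediction $\tau\|\nabla f(x)\|^2$.

```python
import numpy as np

# Assumed test function f(x) = x1^2 + 3*x2^2 and its gradient.
def f(x):
    return x[0]**2 + 3 * x[1]**2

def grad_f(x):
    return np.array([2 * x[0], 6 * x[1]])

x = np.array([1.0, 1.0])
g = grad_f(x)
for tau in [1e-1, 1e-2, 1e-3]:
    decrease = f(x) - f(x - tau * g)   # actual improvement of the objective
    predicted = tau * np.dot(g, g)     # first-order prediction tau * ||grad f(x)||^2
    print(tau, decrease / predicted)   # ratio tends to 1 as tau -> 0
```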
Remark 2 (Orthogonality to level sets). The level sets of $f$ are the sets of points sharing the same value of $f$, i.e. for any $s \in \mathbb{R}$
$$\mathcal{L}_s \stackrel{\text { def. }}{=}\{x \,;\, f(x)=s\} .$$
The gradient $\nabla f(x)$ is orthogonal to the level set $\mathcal{L}_{f(x)}$ passing through $x$.
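A minimal numerical check of this orthogonality, using the assumed test function $f(x)=\|x\|^2/2$, whose level sets are circles centered at the origin:

```python
import numpy as np

# For the assumed function f(x) = ||x||^2 / 2, the level sets L_s are circles.
# At a point x, a direction tangent to the level circle is t = (-x2, x1),
# and the gradient grad f(x) = x should be orthogonal to it.
x = np.array([3.0, 4.0])
grad = x                            # gradient of ||x||^2 / 2 at x
tangent = np.array([-x[1], x[0]])   # tangent to the level set through x
print(np.dot(grad, tangent))        # 0.0: the gradient is normal to the level set
```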

## Gradient Descent

The gradient descent algorithm reads, starting with some $x_0 \in \mathbb{R}^p$
$$x_{k+1} \stackrel{\text { def. }}{=} x_k-\tau_k \nabla f\left(x_k\right)$$
where $\tau_k>0$ is the step size (also called the learning rate). For a small enough $\tau_k$, the previous discussion shows that $f$ decreases along the iterations. So intuitively, to ensure convergence, $\tau_k$ should be chosen small enough, but not too small, so that the algorithm is as fast as possible. In general, one uses a fixed step size $\tau_k=\tau$, or tries to adapt $\tau_k$ at each iteration (see Fig. 9).
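A minimal implementation sketch of this iteration, on an assumed quadratic $f(x)=\frac{1}{2} x^\top A x - b^\top x$ (so $\nabla f(x)=Ax-b$); the matrix, vector, and step size below are illustrative choices, not from the text.

```python
import numpy as np

def gradient_descent(grad, x0, tau, n_iter):
    """Fixed-step gradient descent: x_{k+1} = x_k - tau * grad f(x_k)."""
    x = x0.copy()
    for _ in range(n_iter):
        x = x - tau * grad(x)
    return x

# Assumed quadratic f(x) = 0.5 x^T A x - b^T x, with grad f(x) = A x - b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b

x = gradient_descent(grad, np.zeros(2), tau=0.2, n_iter=200)
x_star = np.linalg.solve(A, b)           # exact minimizer, where grad f(x*) = 0
print(np.allclose(x, x_star))            # True: the iterates converge
```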

Remark 4 (Greedy choice). Although this is in general too costly to perform exactly, one can use a “greedy” choice, where the step size is optimal at each iteration, i.e.
$$\tau_k \stackrel{\text { def. }}{=} \underset{\tau}{\operatorname{argmin}} h(\tau) \stackrel{\text { def. }}{=} f\left(x_k-\tau \nabla f\left(x_k\right)\right) .$$
Here $h(\tau)$ is a function of a single variable. One can compute the derivative of $h$ as
$$h(\tau+\delta)=f\left(x_k-\tau \nabla f\left(x_k\right)-\delta \nabla f\left(x_k\right)\right)=f\left(x_k-\tau \nabla f\left(x_k\right)\right)-\delta\left\langle\nabla f\left(x_k-\tau \nabla f\left(x_k\right)\right), \nabla f\left(x_k\right)\right\rangle+o(\delta) .$$
Note that at $\tau=\tau_k$, $\nabla f\left(x_k-\tau \nabla f\left(x_k\right)\right)=\nabla f\left(x_{k+1}\right)$ by definition of $x_{k+1}$ in (13). Such an optimal $\tau=\tau_k$ is thus characterized by
$$h^{\prime}\left(\tau_k\right)=-\left\langle\nabla f\left(x_k\right), \nabla f\left(x_{k+1}\right)\right\rangle=0 .$$
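For a quadratic $f$, the greedy step has a closed form, which makes this orthogonality condition easy to check numerically. The sketch below uses an assumed quadratic $f(x)=\frac{1}{2} x^\top A x - b^\top x$ (illustrative, not from the text): with $g_k=\nabla f(x_k)$, solving $h'(\tau)=0$ gives $\tau_k = \|g_k\|^2 / (g_k^\top A g_k)$.

```python
import numpy as np

# Assumed quadratic f(x) = 0.5 x^T A x - b^T x, so grad f(x) = A x - b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b

x = np.array([5.0, -3.0])
g = grad(x)

# Exact minimizer of h(tau) = f(x - tau * g): h'(tau) = -||g||^2 + tau * g^T A g = 0.
tau = np.dot(g, g) / np.dot(g, A @ g)
x_next = x - tau * g

# At the optimal step, successive gradients are orthogonal.
print(np.dot(grad(x), grad(x_next)))   # ~ 0 up to rounding
```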

