Strongly Convex Case

Recall from (2.19) that the smooth function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is strongly convex with modulus $m$ if there is a scalar $m>0$ such that
$$f(z) \geq f(x)+\nabla f(x)^T(z-x)+\frac{m}{2}\|z-x\|^2 . \tag{3.9}$$
Strong convexity asserts that $f$ can be lower bounded by quadratic functions. These functions change from point to point, but only in the linear term. It also tells us that the curvature of the function is bounded away from zero. Note that if $f$ is strongly convex and $L$-smooth, then $f$ is bounded above and below by simple quadratics (see (2.9) and (2.19)). This “sandwiching” effect enables us to prove the linear convergence of the steepest-descent method.

The simplest strongly convex function is the squared Euclidean norm $\|x\|^2$. Any convex function can be perturbed to form a strongly convex function by adding any small positive multiple of the squared Euclidean norm. In fact, if $f$ is any $L$-smooth function, then
$$f_\mu(x)=f(x)+\mu\|x\|^2$$
is strongly convex for $\mu$ large enough. (Exercise: Prove this!)
As another canonical example, note that a quadratic function $f(x)=\frac{1}{2} x^T Q x$ is strongly convex if and only if the smallest eigenvalue of $Q$ is strictly positive. We saw in Theorem 2.8 that a strongly convex $f$ has a unique minimizer, which we denote by $x^*$.
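The eigenvalue criterion for quadratics can be checked numerically. The following is a minimal sketch, not part of the original text: the matrix `Q`, the helper names, and the random sampling are all illustrative assumptions. It takes $m$ to be the smallest eigenvalue of a symmetric $2 \times 2$ matrix and tests the strong convexity inequality at randomly chosen pairs $(x, z)$.

```python
import math
import random

def min_eig_2x2(a, b, c):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]] (closed form)."""
    return 0.5 * (a + c) - math.hypot(0.5 * (a - c), b)

def f(Q, x):
    """Quadratic f(x) = 0.5 * x^T Q x for a 2x2 matrix Q given as nested lists."""
    return 0.5 * sum(x[i] * Q[i][j] * x[j] for i in range(2) for j in range(2))

def grad(Q, x):
    """Gradient Q x (valid because Q is symmetric)."""
    return [sum(Q[i][j] * x[j] for j in range(2)) for i in range(2)]

def strong_convexity_gap(Q, m, x, z):
    """f(z) minus the quadratic lower bound built at x; nonnegative when the inequality holds."""
    d = [z[i] - x[i] for i in range(2)]
    g = grad(Q, x)
    lower = f(Q, x) + sum(g[i] * d[i] for i in range(2)) + 0.5 * m * sum(di * di for di in d)
    return f(Q, z) - lower

# Hypothetical example matrix; its smallest eigenvalue is (5 - sqrt(5)) / 2 > 0.
Q = [[3.0, 1.0], [1.0, 2.0]]
m = min_eig_2x2(3.0, 1.0, 2.0)

rng = random.Random(0)
worst_gap = min(
    strong_convexity_gap(
        Q, m,
        [rng.uniform(-5, 5), rng.uniform(-5, 5)],
        [rng.uniform(-5, 5), rng.uniform(-5, 5)],
    )
    for _ in range(1000)
)
print("modulus m =", m, " smallest gap over samples =", worst_gap)
```

For a quadratic, the gap equals $\frac{1}{2}(z-x)^T Q (z-x) - \frac{m}{2}\|z-x\|^2$, so it is nonnegative (up to rounding) exactly when $m$ does not exceed the smallest eigenvalue of $Q$. Random sampling only illustrates the inequality; it does not prove it.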

Strongly convex functions are, in essence, the “easiest” functions to optimize by first-order methods. First, the norm of the gradient provides useful information about how far away we are from optimality. Suppose we minimize both sides of the inequality (3.9) with respect to $z$. The minimizer on the left-hand side is clearly attained at $z=x^*$, while on the right-hand side, it is attained at $z=x-\nabla f(x) / m$. By plugging these optimal values into (3.9), we obtain
$$\begin{aligned} f(x^*) & \geq f(x)-\nabla f(x)^T\left(\frac{1}{m} \nabla f(x)\right)+\frac{m}{2}\left\|\frac{1}{m} \nabla f(x)\right\|^2 \\ & =f(x)-\frac{1}{2 m}\|\nabla f(x)\|^2 . \end{aligned}$$
By rearrangement, we obtain
$$\|\nabla f(x)\|^2 \geq 2 m\left[f(x)-f(x^*)\right] .$$
If $\|\nabla f(x)\|<\delta$, we have
$$f(x)-f(x^*) \leq \frac{\|\nabla f(x)\|^2}{2 m} \leq \frac{\delta^2}{2 m} .$$
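To make the gradient-based bound on the optimality gap concrete, here is a small sketch under stated assumptions: the objective is a hypothetical diagonal quadratic $f(x)=\frac{1}{2} \sum_i q_i x_i^2$ (so $x^*=0$, $f(x^*)=0$, and $m=\min_i q_i$), and the values in `q` are made up for illustration. It counts how often $f(x)-f(x^*)$ exceeds $\|\nabla f(x)\|^2 /(2 m)$ over random points.

```python
import random

q = [0.5, 2.0, 10.0]   # diagonal of Q; hypothetical values chosen for illustration
m = min(q)             # strong convexity modulus of f(x) = 0.5 * sum(q_i * x_i^2)
f_star = 0.0           # the unique minimizer is x* = 0, so f(x*) = 0

def f(x):
    """Objective value 0.5 * sum(q_i * x_i^2)."""
    return 0.5 * sum(qi * xi * xi for qi, xi in zip(q, x))

def grad_norm_sq(x):
    """Squared Euclidean norm of the gradient, whose i-th entry is q_i * x_i."""
    return sum((qi * xi) ** 2 for qi, xi in zip(q, x))

def gap_bound(x):
    """Upper bound ||grad f(x)||^2 / (2 m) on the optimality gap f(x) - f(x*)."""
    return grad_norm_sq(x) / (2.0 * m)

rng = random.Random(1)
points = [[rng.uniform(-3, 3) for _ in q] for _ in range(1000)]

# Count sampled points where the true gap exceeds the bound (allowing for rounding).
violations = sum(1 for x in points if f(x) - f_star > gap_bound(x) * (1 + 1e-12))
print("violations:", violations)

# The bound is tight along the direction with the smallest curvature.
x_tight = [1.0, 0.0, 0.0]
print(f(x_tight) - f_star, gap_bound(x_tight))
```

In this example the bound holds with equality along the coordinate whose curvature equals $m$, and it is looser in directions of larger curvature.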

Comparison between Rates

It is straightforward to convert these convergence expressions into complexities using the techniques of Appendix A.2. We have, from (3.7), that an iteration $k$ will be found such that $\|\nabla f(x^k)\| \leq \epsilon$ for some $k \leq T$, where
$$T \geq \frac{2 L\left(f(x^0)-f^*\right)}{\epsilon^2} .$$
For the general convex case, we have from (3.8) that $f(x^k)-f^* \leq \epsilon$ when
$$k \geq \frac{L\|x^0-x^*\|^2}{2 \epsilon} .$$
For the strongly convex case, we have from (3.15) that $f(x^k)-f^* \leq \epsilon$ for all $k$ satisfying
$$k \geq \frac{L}{m} \log \left(\left(f(x^0)-f^*\right) / \epsilon\right) .$$

Note that in all three cases, we can get bounds in terms of the initial distance to optimality $\|x^0-x^*\|$ rather than the initial optimality gap $f(x^0)-f^*$ by using the inequality
$$f(x^0)-f^* \leq \frac{L}{2}\|x^0-x^*\|^2 .$$
The linear rate (3.17) depends only logarithmically on $\epsilon$, whereas the sublinear rates depend on $1 / \epsilon$ or $1 / \epsilon^2$. When $\epsilon$ is small (for example, $\epsilon=10^{-6}$), the linear rate would appear to be dramatically faster, and, indeed, this is usually the case. The only exception would be when $m$ is extremely small, so that $m / L$ is of the same order as $\epsilon$. The problem is extremely ill-conditioned in this case, and there is little difference between the linear rate (3.17) and the sublinear rate (3.16).
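The three iteration bounds can be compared by plugging in numbers. The sketch below is illustrative only: the function names and the values of $L$, $m$, the initial gap, and the initial squared distance are assumptions, chosen to be consistent with $f(x^0)-f^* \leq \frac{L}{2}\|x^0-x^*\|^2$.

```python
import math

def iters_nonconvex(L, gap0, eps):
    """Smallest integer T with T >= 2 L (f(x^0) - f^*) / eps^2 (target: gradient norm <= eps)."""
    return math.ceil(2.0 * L * gap0 / eps**2)

def iters_convex(L, dist0_sq, eps):
    """Smallest integer k with k >= L ||x^0 - x^*||^2 / (2 eps) (target: f gap <= eps)."""
    return math.ceil(L * dist0_sq / (2.0 * eps))

def iters_strongly_convex(L, m, gap0, eps):
    """Smallest nonnegative integer k with k >= (L / m) log((f(x^0) - f^*) / eps)."""
    return max(0, math.ceil((L / m) * math.log(gap0 / eps)))

# Hypothetical problem constants, for illustration only.
L, m = 10.0, 0.1
gap0, dist0_sq = 1.0, 1.0   # satisfies gap0 <= (L / 2) * dist0_sq
eps = 1e-6

k_nc = iters_nonconvex(L, gap0, eps)
k_c = iters_convex(L, dist0_sq, eps)
k_sc = iters_strongly_convex(L, m, gap0, eps)
print("nonconvex:", k_nc, " convex:", k_c, " strongly convex:", k_sc)

# Ill-conditioned case: m / L of the same order as eps.
k_ill = iters_strongly_convex(L, L * eps, gap0, eps)
print("strongly convex with m = L * eps:", k_ill)
```

With these assumed values, the strongly convex bound is several orders of magnitude smaller than the other two. Setting $m = L\epsilon$ brings it to the same order of magnitude as the general convex bound, which matches the remark about ill-conditioned problems.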

All of these bounds depend on knowledge of $L$. What happens when we do not know $L$? Even when we do know it, is the steplength $\alpha_k \equiv 1 / L$ good in practice? We have reason to suspect not, since the inequality (3.5) on which it is based uses the conservative global upper bound $L$ on curvature. (A sharper bound could be obtained in terms of the curvature in the neighborhood of the current iterate $x^k$.) In the remainder of this chapter, we expand our view to more general choices of search directions and steplengths.


