Consider the function $f:\mathbb{R}^n \to \mathbb{R}$ defined by
$$f(x) = 10(x_1 - 1)^2 + \sum_{i=2}^n x_i^2.$$
The optimal minimizer of $f$ is $x^* = (1, 0, 0, \dots, 0)$. Show that the GD algorithm with an arbitrary starting point $x_0$ and a suitable step size (choose the step size based on smoothness) satisfies the following: after $k$ iterations, we have
$$\|x_k - x^*\|_2^2 \le \|x_0 - x^*\|_2^2 \, (1 + c)^{-k}$$
for some universal constant $c > 0$ (it doesn't matter what constant $c$ you get; any $c > 0$ receives full credit). In other words, the distance of the $k$-th iterate to the optimum decreases exponentially in the number of iterations. [4 points]

[Hint: Show that the function is smooth, and see if you can show a complementary inequality:
$$f(x) - f(x^*) \ge c\,\|x - x^*\|_2^2$$
for some constant $c$. Now use this together with the main inequality from the analysis of GD (the one obtained by combining the smoothness upper bound and convexity). These combine to give a multiplicative decrease in the distance to the optimum at every step.]