Assignment 4 — Inverse Kinematics

Due Thursday, 10/26 at 23:59:59. You must work individually.

Goal

In this assignment, you will be implementing inverse kinematics with your own nonlinear optimizers:

GD: Gradient Descent
BFGS: Broyden–Fletcher–Goldfarb–Shanno
NM: Newton’s Method

The first part of the assignment (Tasks A.\(1\)-A.\(5\)) will be on writing the optimizers and testing them with the Rosenbrock function. The second part of the assignment (Tasks B.\(1\)-B.\(5\)) will be on using these optimizers on inverse kinematics.

For IK (Task B), grades will be given out depending on how far you get along in the assignment. For example, if your program works with \(4\) links, then you will get full points for the \(1\)- and \(2\)-link tasks. Here is an example with \(10\) links:

You may work in C++ or Python.

C++
- The skeleton code uses Eigen for linear algebra. You’ll have to install Eigen in the same way you installed GLM in lab 0. You’ll need to create a new environment variable called EIGEN3_INCLUDE_DIR that points to the Eigen directory. Once you have it installed, you may want to look at this quick tutorial.
- The command line arguments should be as follows.
  - To run tasks A.1-A.5 (Rosenbrock):
```
> ./A4 <RESOURCE_DIR> A.x
```
    The last argument can be A.1 through A.5.
  - To run tasks B.1-B.5 (IK):
```
> ./A4 <RESOURCE_DIR> B.x
```
    The last argument can be B.1 through B.5.
  - Optionally, you can run the built-in OpenGL visualizer for IK by supplying an additional argument at the end:
```
> ./A4 <RESOURCE_DIR> B.x 1
```
- Use double instead of float.
Python
- There is no skeleton code provided, but it might be helpful to look at the C++ skeleton code to get a sense of how to organize the code. In particular, it is very important to be able to reuse the optimizer code for both the Rosenbrock task and the IK task.
- The command line arguments should be as follows.
  - To run tasks A.1-A.5 (Rosenbrock)
```
> python A4.py <RESOURCE_DIR> A.x
```
    The last argument can be A.1 through A.5.
  - To run tasks B.1-B.5 (IK)
```
> python A4.py <RESOURCE_DIR> B.x
```
    The last argument can be B.1 through B.5.
- Use Python’s built-in float type, which has 64 bits of precision.

Task A.\(1\): Rosenbrock Function

We will be using the Rosenbrock function as the objective: \[ \displaylines{ f(\vec{x}) = (a-x_0)^2 + b(x_1-x_0^2)^2\\ \vec{g}(\vec{x}) = 2 \begin{pmatrix} -(a-x_0) - 2bx_0(x_1-x_0^2)\\ b(x_1-x_0^2) \end{pmatrix}\\ H(\vec{x}) = 2 \begin{pmatrix} 2b(3x_0^2 - x_1) + 1 & -2bx_0\\ -2bx_0 & b \end{pmatrix} } \] with \(a=1\) and \(b=1\). The minimum is at \(\vec{x} = (1 \;\; 1)^\top\).

There are three eval() methods in the class ObjectiveRosenbrock. The first one returns just the objective value, the second one returns the objective and the gradient, and the third one returns the objective, gradient, and Hessian. Fill these methods to compute the expression given above. With the input argument \(\vec{x} = (-1 \;\; 0)^\top\), the returned values should be: \[ \displaylines{ f(\vec{x}) = 5\\ \vec{g}(\vec{x}) = \begin{pmatrix} -8\\ -2 \end{pmatrix}\\ H(\vec{x}) = \begin{pmatrix} 14 & 4\\ 4 & 2 \end{pmatrix}. } \] Running A4 <RESOURCE_DIR> A.1 should write these values to resources/outputA1.txt:

The first line is for \(f(\vec{x})\), the next two lines are for \(\vec{g}(\vec{x})\), and the last two lines are for \(H(\vec{x})\). With Eigen, the vector and matrix classes have operator<< defined, so you can simply write output << x << endl; to write the variable x to the stream output.

Task A.\(2\): Gradient Descent

Implement the following pseudocode in OptimizerGD.cpp:

for iter = 1 : iterMax
    % Search direction
    Evaluate f and g at x
    p = -g
    % Step
    Δx = α*p
    x += Δx
    if |g| < ε
        break
    end
end

Use \(\varepsilon=10^{-6}\), and \(\alpha=10^{-1}\). Start the search at \(\vec{x} = (-1 \;\; 0)^\top\). After one iteration, \(\vec{x}\) should be \((-0.2 \;\; 0.2)^\top\). Now set iterMax=50. Running the command A4 <RESOURCE_DIR> A.2 should write the number of iterations and the computed \(\vec{x}\) to resources/outputA2.txt:

50
0.943086
0.864814

(The number of iterations may be off by \(1\) depending on \(0\)-indexing or \(1\)-indexing is used.)

The choice of \(\alpha\) matters a lot. Setting it to be \(1.0\), for example, makes the optimizer diverge. We thus turn our attention next to line search.

Task A.\(3\): Line Search

Modify OptimizerGD.cpp to implement the following pseudocode:

for iter = 1 : iterMax
    % Search direction
    Evaluate f and g at x
    p = -g
    % Line search
    α = α0
    for iterLS = 1 : iterMaxLS
        Δx = α*p
        Evaluate f1 at x + Δx
        if f1 < f
            break
        end
        α *= γ
    end
    % Step
    x += Δx
    if |g| < ε
        break
    end
end

Use iterMax=50, \(\varepsilon=10^{-6}\), \(\alpha_{0}=1\), \(\gamma=0.8\), and iterMaxLS=20. Start the search at \(\vec{x} = (-1 \;\; 0)^\top\). Running A4 <RESOURCE_DIR> A.3 should write the number of outer iterations and the computed \(\vec{x}\) to resources/outputA3.txt:

50
0.990237
0.981615

(The number of iterations may be off by \(1\) depending on \(0\)-indexing or \(1\)-indexing is used.)

Gradient descent is slow to converge, especially near the solution. We thus turn to Newton’s Method next.

Task A.\(4\): Newton’s Method

Implement the following pseudocode in OptimizerNM.cpp:

for iter = 1 : iterMax
    % Search direction
    Evaluate f, g, and H at x
    p = -H\g
    % Line search
    α = α0
    for iterLS = 1 : iterMaxLS
        Δx = α*p
        Evaluate f1 at x + Δx
        if f1 < f
            break
        end
        α *= γ
    end
    % Step
    x += Δx
    if |g| < ε
        break
    end
end

The notation x = A\b is the MATLAB notation for solving the linear system \(Ax=b\). It is important to use an appropriate numerical solver for this operation. (Do not use the inverse!) Assuming that Hessian is symmetric positive definite (nice and convex), we can use Eigen’s LDLT solver: x = A.ldlt().solve(b).

Use iterMax=50, \(\varepsilon=10^{-6}\), \(\alpha_{0}=1\), \(\gamma=0.8\), and iterMaxLS=20. Start the search at \(\vec{x} = (-1 \;\; 0)^\top\). Running A4 <RESOURCE_DIR> A.4 should write the number of outer iterations and the computed \(\vec{x}\) to resources/outputA4.txt:

8
1.000000
1.000000

(The number of iterations may be off by \(1\) depending on \(0\)-indexing or \(1\)-indexing is used.)

Newton’s Method is fast to converge, especially near the solution. However, forming the Hessian matrix and solving with it can become the bottleneck. We thus turn to BFGS next.

Task A.\(5\): BFGS

Implement the following pseudocode in OptimizerBFGS.cpp:

A = I
x0 = 0
g0 = 0
for iter = 1 : iterMax
    % Search direction
    Evaluate f and g at x
    if iter > 1
        s = x - x0
        y = g - g0
        ρ = 1/(y'*s)
        A = (I - ρ*s*y')*A*(I - ρ*y*s') + ρ*s*s'
    end
    p = -A*g
    % Line search
    α = α0
    for iterLS = 1 : iterMaxLS
        Δx = α*p
        Evaluate f1 at x + Δx
        if f1 < f
            break
        end
        α *= γ
    end
    % Step
    x0 = x
    g0 = g
    x += Δx
    if |g| < ε
        break
    end
end

Use iterMax=50, \(\varepsilon=10^{-6}\), \(\alpha_{0}=1\), \(\gamma=0.8\), and iterMaxLS=20. Start the search at \(\vec{x} = (-1 \;\; 0)^\top\). Running A4 <RESOURCE_DIR> A.5 should write the number of outer iterations and the computed \(\vec{x}\) to resources/outputA5.txt:

12
1.000000
1.000000

(The number of iterations may be off by \(1\) depending on \(0\)-indexing or \(1\)-indexing is used.)

From this point on, do not modify your optimizers. The goal of the next set of tasks is all about coming up with a new objective function for IK.

Task B.\(1\): \(1\)-link IK

Take a look at the Link class. It has the following member variables:

A pointer to the parent link.
A pointer to the child link. For this assignment, we will assume that we have a serial chain of links without any branching.
An int for the “depth” of the link, with the root having index \(0\).
A Vector2d representing the XY position \(\vec{t}\) of the link joint wrt the parent joint. For this assignment, we will assume that \(\vec{t} = (1 \;\; 0)^\top\) for all joints other than the root. The root is at the origin, so \(\vec{t} = (0 \;\; 0)^\top\).
A double for the current angle, \(\theta\). This is what IK will optimize. In other words, this is the \(\vec{x}\) that the optimizer takes in.
A Matrix4d, \(^i_{Mi}E\), for the transform of the mesh with respect to this link joint. This is only used for visualizing the mesh.

The length of each link is implicitly set to \(1\). Since this is a 2D planar mechanism, only a single scalar is needed for the rotation. For convenience, the origin of each link is going to be where the joint is. For example, for the “upper arm” link, the origin will be at the shoulder, and for the “lower arm” link the origin is going to be at the elbow. Using the position \(\vec{p}\) and the rotation \(\theta\) member variables, the transformation matrix from the parent link’s joint to this link’s joint is: \[ ^p_iE(\theta) = T(\vec{t}) R(\theta). \] The leading superscript \(p\) is for “parent” (not “position”), and the leading subscript \(i\) is for the ith body.

We will start slowly. First, we will solve inverse kinematics with only \(1\) joint using GD or BFGS. We will define the objective function \(f(\theta)\) and its gradient \(g(\theta)\), and then use the optimizers you wrote in the previous task. The Hessian \(H(\theta)\) takes a bit of extra work, so we will leave it for later.

Each link is of length \(1.0\), so place an end-effector \(\vec{r}\) in the local coordinate space of the link at \((1,0)\). Keep in mind that there should be a homogeneous coordinate of \(1\) for this point. In your code, it is a good idea to use Vector3d instead of Vector2d and explicitly store the homogeneous coordinate. Once \(\vec{p}\), \(\vec{p}'\), and \(\vec{p}''\) are computed, it is ok to drop the homogeneous coordinate from these quantities.

The world-space position, \(\vec{p}\), of the end-effector and its derivatives are: \[ \displaylines{ \vec{p}(\theta) = T R(\theta) \vec{r}\;\\ \vec{p}'(\theta) = T R'(\theta) \vec{r}\;\\ \vec{p}''(\theta) = T R''(\theta) \vec{r}, } \] where \(T\) is the usual 2D translation matrix: \[ T = \begin{pmatrix} 1 & 0 & \vec{t}_x\\ 0 & 1 & \vec{t}_y\\ 0 & 0 & 1 \end{pmatrix}, \] and \(R(\theta)\) is the usual rotation matrix about the Z-axis: \[ \displaylines{ R(\theta) = \begin{pmatrix} \cos(\theta) & -\sin(\theta) & 0\\ \sin(\theta) & \cos(\theta) & 0\\ 0 & 0 & 1 \end{pmatrix}\;\;\,\\ R'(\theta) = \begin{pmatrix} -\sin(\theta) & -\cos(\theta) & 0\\ \cos(\theta) & -\sin(\theta) & 0\\ 0 & 0 & 0 \end{pmatrix}\,\\ R''(\theta) = \begin{pmatrix} -\cos(\theta) & \sin(\theta) & 0\\ -\sin(\theta) & -\cos(\theta) & 0\\ 0 & 0 & 0 \end{pmatrix}. } \] The second derivative, \(\vec{p}''(\theta)\) and \(R''(\theta)\), will not be needed until we use Newton’s Method in Task B.5.

In the original rest state (figure above), the numerical values for the \(1\)-link system are (homogeneous coordinate not shown): \[ \displaylines{ \vec{p}(\theta) = \begin{pmatrix} 1 \\ 0 \end{pmatrix}\;\;\\ \vec{p}'(\theta) = \begin{pmatrix} 0 \\ 1 \end{pmatrix}\;\;\,\\ \vec{p}''(\theta) = \begin{pmatrix} -1 \\ 0 \end{pmatrix}. } \]

For Gradient Descent and BFGS, we need the objective function value \(f(\theta)\) and its gradient, \(\vec{g}(\theta)\). (Remember, we don’t need the Hessian, \(H(\theta),\) until we get to Newton’s Method.) The objective value is \[ \displaylines{ f(\theta) = \frac{1}{2} w_\text{tar} \Delta \vec{p}^\top \Delta \vec{p} + \frac{1}{2} w_\text{reg} \theta^2\\ \Delta \vec{p} = \vec{p}(\theta) - \vec{p}_\text{target}.\qquad\qquad\;\; } \]

The two weights, \(w_\text{tar}\) and \(w_\text{reg}\), can be set to arbitrary positive values, depending on the desired strength of the “targeting” energy and the “regularizer” energy.

The targeting energy causes the optimizer to minimize the squared distance between the end-effector and the target position.
The regularizer energy causes the optimizer to prefer small angles.

For now, you can use \(w_\text{tar} = 1e3\) and \(w_\text{reg} = 1e0\).

The gradient is a scalar for this \(1\)-link task: \[ g(\theta) = w_\text{tar} \Delta \vec{p}^\top \vec{p}' + w_\text{reg} \theta. \] The Hessian (not needed yet) is also a scalar for this \(1\)-link task: \[ H(\theta) = w_\text{tar} (\vec{p}'^\top \vec{p}' + \Delta \vec{p}^\top \vec{p}'') + w_\text{reg}. \] With the initial angle set to \(\theta = 0\), and with the goal position set to \(\vec{p}_\text{target} = (0 \;\; 1)^\top\), you should get the following values: \[ f(\theta) = 1000, \quad g(\theta) = -1000, \quad H(\theta) = 1. \]

Run BFGS with the following settings:

Weights
- Target weight: 1e3
- Regularizer weight: 1e0
BFGS
- Iterations: 5
- Tolerance: 1e-6
- Line search iterations: 20
- \(\alpha_\text{init}\): 1.0
- \(\gamma\): 0.5

The returned answer should be close to \(\theta = \pi/2\). (See Wrapping \(\theta\) below.) Write the final angle to resources/outputB1.txt:

1.56295

Note that Gradient Descent takes longer to converge. For example, it gives 1.49157 after 5 iterations.

User Interface

If you’re using C++, the skeleton code comes with a built-in visualizer. The code is set up so that pressing the space bar toggles between forward kinematics and inverse kinematics.

While in forward kinematics mode, you should be able to increase/decrease the joint angle(s) by pressing the . and , keys.
While in inverse kinematics mode, you should be able to move the mouse and drive the joint angles.
In either mode, pressing the r key should reset all the angles to 0.0.

Wrapping \(\theta\)

Since the minimization variables are joint angles, these variables may wrap around beyond (clockwise or counterclockwise) many times over. To prevent this, wrap the angles to be between \(-\pi\) and \(+\pi\) after each call to the optimizer with the following pseudocode. (Remember that you should not modify your optimizers after finishing Task A.)

while angle > pi
    angle -= 2*pi
while angle < -pi
    angle += 2*pi

Debugging the Derivatives

One of the biggest sources of bugs is the derivatives, so it is helpful to have a routine that tests them against finite differences. To test the gradient vector, use the following pseudocode inside the main loop of GD. (This testing code should be commented out once testing is finished.)

e = 1e-7;
g_ = zeros(n,1);
for i = 1 : n
    theta_ = theta;
    theta_(i) += e;
    f_ = objective(theta_);
    g_(i) = (f_ - f)/e;

This pseudocode works in any number of dimensions (including the scalar case \(n=1\)). The finite differenced version of the gradient, g_, should closely match the analytical version of the gradient, g (at least within 1e-3 or so).

Once the gradient has been tested, use the following pseudocode inside the main loop of NM to test the Hessian matrix. (This testing code should be commented out once testing is finished.)

e = 1e-7;
H_ = zeros(n, n);
for i = 1 : n
    theta_ = theta;
    theta_(i) += e;
    [f_,g_] = objective(theta_);
    H_(:,i) = (g_ - g)/e;

The H_(:,i) notation above means the ith column of the matrix. With Eigen, this can be written as H_.col(i) = .... The finite differenced version of the Hessian, H_, should closely match the analytical version of the Hessian, H.

Task B.\(2\): \(2\)-link IK

Next, we’re going to write a version that supports \(2\) links. If you’re feeling ambitious, you can skip to the \(4\)-link or the \(n\)-link tasks.

With two links, the minimization variable, \(\vec{\theta}\), is now a vector. The world-space position, \(\vec{p}\), of the end-effector is: \[ \vec{p}(\vec\theta) = T_0 R_0(\theta_0) T_1 R_1(\theta_1) \vec{r}. \] The derivatives are no longer scalars. For the gradient, compute the derivatives of \(\vec{p}\), drop the homogeneous coordinate, and then form a matrix \(P'\) of size \(2 \times 2\) (or \(1 \times 2\) block vector of \(2 \times 1\) vectors): \[ \displaylines{ \vec{p}'_0 = T_0 R'_0(\theta_0) T_1 R_1(\theta_1) \vec{r}\\ \vec{p}'_1 = T_0 R_0(\theta_0) T_1 R'_1(\theta_1) \vec{r}\\ P'(\vec\theta) = \begin{pmatrix} \vec{p}'_0 & \vec{p}'_1 \end{pmatrix} \in \mathbb{R}^{2 \times 2}.\qquad\; } \] Similarly, for the Hessian, compute the second derivatives of \(\vec{p}\), drop the homogeneous coordinate, and then form a matrix \(P''\) of size \(4 \times 2\) (or \(2 \times 2\) block matrix of \(2 \times 1\) vectors): \[ \displaylines{ \vec{p}''_{00} = T_0 R''_0(\theta_0) T_1 R_1(\theta_1) \vec{r}\\ \vec{p}''_{01} = T_0 R'_0(\theta_0) T_1 R'_1(\theta_1) \vec{r}\\ \vec{p}''_{10} = T_0 R'_0(\theta_0) T_1 R'_1(\theta_1) \vec{r}\\ \vec{p}''_{11} = T_0 R_0(\theta_0) T_1 R''_1(\theta_1) \vec{r}\\ P''(\vec\theta) = \begin{pmatrix} \vec{p}''_{00} & \vec{p}''_{01} \\ \vec{p}''_{10} & \vec{p}''_{11} \end{pmatrix} \in \mathbb{R}^{4 \times 2}.\;\;\, } \]

The objective function, \(f(\vec\theta)\), is still a scalar. In fact, the objective function for an optimization problem always returns a scalar. Compared to the \(1\)-link case, the only difference is that \(\vec\theta\), the deviation of the current joint angles from the rest joint angles, is now a vector. Therefore, instead of squaring this difference, we take the dot product by itself (i.e., transpose and multiply). The second term is also an inner product but with a regularization matrix sandwiched in between: \[ f(\vec\theta) = \frac{1}{2} w_\text{tar} \Delta \vec{p}^\top \Delta \vec{p} + \frac{1}{2} \vec\theta^\top W_\text{reg} \vec\theta, \] where \(W_\text{reg}\) is a diagonal matrix of regularization weights \[ W_\text{reg} = \begin{pmatrix} w_{\text{reg}_0} & 0 \\ 0 & w_{\text{reg}_1} \end{pmatrix}. \] For now, set \(w_{\text{reg}_0}\) and \(w_{\text{reg}_1}\) to 1e0.

Now the gradient is a \(2 \times 1\) vector: \[ \displaylines{ \vec{g}(\vec\theta) = w_\text{tar} \left( \Delta \vec{p}^\top P' \right)^\top + W_\text{reg} \vec\theta \qquad\qquad\quad\;\\ = w_\text{tar} \begin{pmatrix} \Delta \vec{p}^\top \vec{p}'_0\\ \Delta \vec{p}^\top \vec{p}'_1 \end{pmatrix} + W_\text{reg} \begin{pmatrix} \theta_0\\ \theta_1 \end{pmatrix}. } \] And the Hessian is a \(2 \times 2\) matrix: \[ \displaylines{ H(\vec\theta) = w_\text{tar} \left( P'^\top P' + \Delta \vec{p}^\top \odot P'' \right)^\top + W_\text{reg}\qquad\qquad\qquad\qquad\qquad\qquad\quad\\ = w_\text{tar} \left( \begin{pmatrix} \vec{p}_0^{'\top} \vec{p}'_0 & \vec{p}_0^{'\top} \vec{p}'_1\\ \vec{p}_1^{'\top} \vec{p}'_0 & \vec{p}_1^{'\top} \vec{p}'_1 \end{pmatrix} + \begin{pmatrix} \Delta \vec{p}^\top \vec{p}''_{00} & \Delta \vec{p}^\top \vec{p}''_{01}\\ \Delta \vec{p}^\top \vec{p}''_{10} & \Delta \vec{p}^\top \vec{p}''_{11} \end{pmatrix} \right) + W_\text{reg}. } \]

The \(\odot\) symbol above is used to mean that \(\Delta \vec{p}\) is dotted with each vector element of \(P''\).

With the initial angle set to \(\vec{\theta} = (0 \;\; 0)^\top\), and with the goal position set to \(\vec{p}_\text{target} = (1 \;\; 1)^\top\), you should get the following values: \[ \vec{p}(\vec{\theta}) = \begin{pmatrix} 2\\ 0 \end{pmatrix}, \quad P'(\vec{\theta}) = \begin{pmatrix} 0 & 0\\ 2 & 1\\ \end{pmatrix}, \quad P''(\vec{\theta}) = \begin{pmatrix} -2 & -1\\ 0 & 0\\ -1 & -1\\ 0 & 0 \end{pmatrix}, \quad f(\vec{\theta}) = 1000, \quad \vec{g}(\vec{\theta}) = \begin{pmatrix} -2000\\ -1000 \end{pmatrix}, \quad H(\vec{\theta}) = \begin{pmatrix} 2001 & 1000\\ 1000 & 1 \end{pmatrix}. \]

Run BFGS with the same settings as before:

Weights
- Target weight: 1e3
- Regularizer weights: 1e0 and 1e0
BFGS
- Iterations: 5
- Tolerance: 1e-6
- Line search iterations: 20
- \(\alpha_\text{init}\): 1.0
- \(\gamma\): 0.5

Write the final angles to resources/outputB2.txt. The output should be:

-0.000994772
1.54115

Task B.\(3\): \(4\)-link IK

In this task, you’re going to hard code a \(4\)-link system, which follows the same steps as in the \(2\)-link system. If you’re feeling ambitious, you can skip this task and go straight to the \(n\)-link system.

If you skipped \(2\)-link IK, make sure to read the Wrapping and Debugging sections.

The minimization variable is now \(\vec\theta = (\theta_0 \;\; \theta_1 \;\; \theta_2 \;\; \theta_3)^\top\). The world-space position, \(\vec{p}\), of the end-effector is: \[ \vec{p}(\vec\theta) = T_0 R_0(\theta_0) T_1 R_1(\theta_1) T_2 R_2(\theta_2) T_3 R_3(\theta_3) \vec{r}. \] For the gradient, compute the derivatives of \(\vec{p}\), drop the homogeneous coordinate, and then form a matrix \(P'\) of size \(2 \times 4\) (or \(1 \times 4\) block vector of \(2 \times 1\) vectors): \[ \displaylines{ \vec{p}'_0 = T_0 R'_0(\theta_0) T_1 R_1(\theta_1) T_2 R_2(\theta_2) T_3 R_3(\theta_3) \vec{r}\\ \vec{p}'_1 = T_0 R_0(\theta_0) T_1 R'_1(\theta_1) T_2 R_2(\theta_2) T_3 R_3(\theta_3) \vec{r}\\ \vec{p}'_2 = T_0 R_0(\theta_0) T_1 R_1(\theta_1) T_2 R'_2(\theta_2) T_3 R_3(\theta_3) \vec{r}\\ \vec{p}'_3 = T_0 R_0(\theta_0) T_1 R_1(\theta_1) T_2 R_2(\theta_2) T_3 R'_3(\theta_3) \vec{r}\\ P'(\vec\theta) = \begin{pmatrix} \vec{p}'_0 & \vec{p}'_1 & \vec{p}'_2 & \vec{p}'_3 \end{pmatrix}.\qquad\qquad\qquad\qquad\quad\;\; } \] Similarly, for the Hessian, compute the second derivatives of \(\vec{p}\), drop the homogeneous coordinate, and then form a matrix \(P''\) of size \(8 \times 4\) (or \(4 \times 4\) block matrix of \(2 \times 1\) vectors): \[ \displaylines{ \vec{p}''_{00} = T_0 R''_0(\theta_0) T_1 R_1(\theta_1) T_2 R_2(\theta_2) T_3 R_3(\theta_3) \vec{r}\\ \vec{p}''_{01} = T_0 R'_0(\theta_0) T_1 R'_1(\theta_1) T_2 R_2(\theta_2) T_3 R_3(\theta_3) \vec{r}\\ \vec{p}''_{02} = T_0 R'_0(\theta_0) T_1 R_1(\theta_1) T_2 R'_2(\theta_2) T_3 R_3(\theta_3) \vec{r}\\ \vdots\\ \vec{p}''_{32} = T_0 R_0(\theta_0) T_1 R_1(\theta_1) T_2 R'_2(\theta_2) T_3 R'_3(\theta_3) \vec{r}\\ \vec{p}''_{33} = T_0 R_0(\theta_0) T_1 R_1(\theta_1) T_2 R_2(\theta_2) T_3 R''_3(\theta_3) \vec{r}\\ P''(\vec\theta) = \begin{pmatrix} \vec{p}''_{00} & \vec{p}''_{01} & \vec{p}''_{02} & \vec{p}''_{03}\\ \vec{p}''_{10} & \vec{p}''_{11} & \vec{p}''_{12} & \vec{p}''_{13}\\ \vec{p}''_{20} & \vec{p}''_{21} & \vec{p}''_{22} & \vec{p}''_{23}\\ \vec{p}''_{30} & \vec{p}''_{31} & \vec{p}''_{32} & \vec{p}''_{33} \end{pmatrix}.\qquad\qquad\qquad\quad\, } \]

The scalar objective function is: \[ f(\vec\theta) = \frac{1}{2} w_\text{tar} \Delta \vec{p}^\top \Delta \vec{p} + \frac{1}{2} \vec\theta^\top W_\text{reg} \vec\theta. \] The gradient, expressed as a column vector is: \[ \displaylines{ \vec{g}(\vec\theta) = w_\text{tar} \left( \Delta \vec{p}^\top P' \right)^\top + W_\text{reg} \vec\theta \qquad\qquad\qquad\quad\,\\ = w_\text{tar} \begin{pmatrix} \Delta \vec{p}^\top \vec{p}'_0\\ \Delta \vec{p}^\top \vec{p}'_1\\ \Delta \vec{p}^\top \vec{p}'_2\\ \Delta \vec{p}^\top \vec{p}'_3 \end{pmatrix} + W_\text{reg} \begin{pmatrix} \theta_0\\ \theta_1\\ \theta_2\\ \theta_3 \end{pmatrix}. } \] And the Hessian is a matrix: \[ \displaylines{ H(\vec\theta) = w_\text{tar} \left( P'^\top P' + \Delta \vec{p}^\top \odot P'' \right)^\top + W_\text{reg}\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\;\\ = w_\text{tar} \left( \begin{pmatrix} \vec{p}_0^{'\top} \vec{p}'_0 & \vec{p}_0^{'\top} \vec{p}'_1 & \vec{p}_0^{'\top} \vec{p}'_2 & \vec{p}_0^{'\top} \vec{p}'_3\\ \vec{p}_1^{'\top} \vec{p}'_0 & \vec{p}_1^{'\top} \vec{p}'_1 & \vec{p}_1^{'\top} \vec{p}'_2 & \vec{p}_1^{'\top} \vec{p}'_3\\ \vec{p}_2^{'\top} \vec{p}'_0 & \vec{p}_2^{'\top} \vec{p}'_1 & \vec{p}_2^{'\top} \vec{p}'_2 & \vec{p}_2^{'\top} \vec{p}'_3\\ \vec{p}_3^{'\top} \vec{p}'_0 & \vec{p}_3^{'\top} \vec{p}'_1 & \vec{p}_3^{'\top} \vec{p}'_2 & \vec{p}_3^{'\top} \vec{p}'_3 \end{pmatrix} + \begin{pmatrix} \Delta \vec{p}^\top \vec{p}''_{00} & \Delta \vec{p}^\top \vec{p}''_{01} & \Delta \vec{p}^\top \vec{p}''_{02} & \Delta \vec{p}^\top \vec{p}''_{03}\\ \Delta \vec{p}^\top \vec{p}''_{10} & \Delta \vec{p}^\top \vec{p}''_{11} & \Delta \vec{p}^\top \vec{p}''_{12} & \Delta \vec{p}^\top \vec{p}''_{13}\\ \Delta \vec{p}^\top \vec{p}''_{20} & \Delta \vec{p}^\top \vec{p}''_{21} & \Delta \vec{p}^\top \vec{p}''_{22} & \Delta \vec{p}^\top \vec{p}''_{23}\\ \Delta \vec{p}^\top \vec{p}''_{30} & \Delta \vec{p}^\top \vec{p}''_{31} & \Delta \vec{p}^\top \vec{p}''_{32} & \Delta \vec{p}^\top \vec{p}''_{33} \end{pmatrix} \right) + W_\text{reg}. } \]

The \(\odot\) symbol above is used to mean that \(\Delta \vec{p}\) is dotted with each vector element of \(P''\).

With the initial angle set to \(\vec{\theta} = (0 \;\; 0 \;\; 0 \;\; 0)^\top\), and with the goal position set to \(\vec{p}_\text{target} = (3 \;\; 1)^\top\), you should get the following values: \[ \displaylines{ \vec{p}(\vec{\theta}) = \begin{pmatrix} 4\\ 0 \end{pmatrix}, \quad P'(\vec{\theta}) = \begin{pmatrix} 0 & 0 & 0 & 0\\ 4 & 3 & 2 & 1 \end{pmatrix}, \quad P''(\vec{\theta}) = \begin{pmatrix} -4 & -3 & -2 & -1\\ 0 & 0 & 0 & 0\\ -3 & -3 & -2 & -1\\ 0 & 0 & 0 & 0\\ -2 & -2 & -2 & -1\\ 0 & 0 & 0 & 0\\ -1 & -1 & -1 & -1\\ 0 & 0 & 0 & 0 \end{pmatrix},\\ f(\vec{\theta}) = 1000, \quad \vec{g}(\vec{\theta}) = \begin{pmatrix} -4000\\ -3000\\ -2000\\ -1000 \end{pmatrix}, \quad H(\vec{\theta}) = \begin{pmatrix} 12000 & 9000 & 6000 & 3000\\ 9000 & 6001 & 4000 & 2000\\ 6000 & 4000 & 2001 & 1000\\ 3000 & 2000 & 1000 & 1 \end{pmatrix}. } \]

Run BFGS with the following settings. The only difference from before are the regularizer weights and the number of BFGS iterations.

Weights
- Target weight: 1e3
- Regularizer weight for the root joint: 0
- Regularizer weights for the other joints: 1e0
BFGS
- Iterations: 50
- Tolerance: 1e-6
- Line search iterations: 20
- \(\alpha_\text{init}\): 1.0
- \(\gamma\): 0.5

Write the final angles to resources/outputB3.txt. The output should be:

Task B.\(4\): \(n\)-link IK

Convert your code to support an arbitrary number of links. Note that as you increase \(n\), the code will get slower and slower. Make sure to run your code in release mode. It is recommended that you first work on GD/BFGS and then NM.

If you skipped \(2\)-link IK, make sure to read the Wrapping and Debugging sections.

Run your code with the following settings. The only difference from before is the number of BFGS iterations:

Number of links: 10
Target position: (9.0, 1.0)
Weights
- Target weight: 1e3
- Regularizer weight for root joint: 0
- Regularizer weights for other joints: 1e0
BFGS
- Iterations: 150
- Tolerance: 1e-6
- Line search iterations: 20
- \(\alpha_\text{init}\): 1.0
- \(\gamma\): 0.5

Your code should generate resources/outputB4.txt with the values for the 10 angles:

Task B.\(5\): \(n\)-link IK with Newton (bonus for 450, required for 689)

Implement the Hessian calculation. Then run 2 iterations of BFGS followed by NM:

Number of links: 10
Target position: (9.0, 1.0)
Weights
- Target weight: 1e3
- Regularizer weight for root joint: 0
- Regularizer weights for other joints: 1e0
Optimizer
- BFGS
  - Iterations: 2
  - Tolerance: 1e-6
  - Line search iterations: 20
  - \(\alpha_\text{init}\): 1.0
  - \(\gamma\): 0.5
- NM
  - Iterations: 30
  - Tolerance: 1e-6
  - Line search iterations: 20
  - \(\alpha_\text{init}\): 1.0
  - \(\gamma\): 0.5

Write the result to resources/outputB5.txt. This should produce the same result as the previous task but with fewer iterations. Observe that if we don’t start with BFGS, NM gets stuck in a local minimum.

The following plots compare the three optimizers. The left plot shows how the objective decreases, and the right plot shows how the norm of the gradient decreases.

Rubric

CSCE 450

[10] Task A.\(1\): Rosenbrock function
[10] Task A.\(2\): Gradient descent
[10] Task A.\(3\): Line search
[10] Task A.\(4\): Newton’s Method
[10] Task A.\(5\): BFGS
[10] Task B.\(1\): Hard-coded \(1\)-link IK
[10] Task B.\(2\): Hard-coded \(2\)-link IK
[10] Task B.\(3\): Hard-coded \(4\)-link IK
[15] Task B.\(4\): \(n\)-link IK
[5] Coding style and general execution
[10] BONUS Task B.\(5\): \(n\)-link IK with Newton

Total: 100 + 10 points

You can skip \(1\)-, \(2\)-, and \(4\)-link IK and go straight to \(n\)-link IK. Grades for lower tasks will be automatically given if the higher task is completed.

CSCE 689

[5] Task A.\(1\): Rosenbrock function
[5] Task A.\(2\): Gradient descent
[10] Task A.\(3\): Line search
[10] Task A.\(4\): Newton’s Method
[10] Task A.\(5\): BFGS
[10] Task B.\(1\): Hard-coded \(1\)-link IK
[10] Task B.\(2\): Hard-coded \(2\)-link IK
[10] Task B.\(3\): Hard-coded \(4\)-link IK
[15] Task B.\(4\): \(n\)-link IK
[10] Task B.\(5\): \(n\)-link IK with Newton
[5] Coding style and general execution

Total: 100 points

You can skip \(1\)-, \(2\)-, and \(4\)-link IK and go straight to \(n\)-link IK. Grades for lower tasks will be automatically given if the higher task is completed.

What to hand in

Failing to follow these points may decrease your “general execution” score.

Do not hand in any of the output files.

If you’re using Mac/Linux, make sure that your code compiles and runs by typing:

> mkdir build
> cd build
> cmake ..
> make
> ./A4 ARGUMENTS

If you’re using Windows, make sure your code builds using the steps described in Lab 0.

Put your name on the program window.
Include an ASCII README that includes
- Your name
- The highest stage you’ve completed
- Any bonus?
- Anything else of note
Make sure you don’t get any compiler warnings.
Remove unnecessary debug printouts.
Remove unnecessary debug code, including commented out code.
Do not hand in:
- The build directory
- The executable
- Old save files (*.~)
- Object files (*.o)
- Visual Studio files (.vs)
- Git folder (.git)
Create a single zip file of all the required files.
- The filename of this zip file should be UIN.zip (e.g., 12345678.zip).
- The zip file should extract a single top-level folder named UIN/ (e.g. 12345678/).
- This top-level folder should contain your README, src/, CMakeLists.txt, etc.
- Use the standard .zip format (not .gz, .7z, .rar, etc.).

Generated on Sat Oct 21 09:52:29 CDT 2023