
Visual SLAM on Jetbots using Extended Kalman Filters

Cover image: Sunset at Joshua Tree National Park. Shot on Mamiya C33 with Portra 400.

This is a project for CSE 276A - "Introduction to Robotics" at UC San Diego. It's taught by Professor Henrik I. Christensen and is definitely one of the most challenging classes I took that quarter. And like they say, "What doesn't kill you makes you stronger". After surviving this quarter, I definitely learned a lot more.

1. The Objective

For this project, we implemented Simultaneous Localization and Mapping (SLAM) using an Extended Kalman Filter (EKF). The only sensor available is the camera module on the Jetbot differential drive robot. In this setting, we update our estimate of values that we cannot directly measure (the robot's pose) from measurements of values that we can observe (the landmarks). The landmarks are a set of 12 AprilTags. To make things more challenging, only two tag patterns are used, which means we need to implement data association to figure out which of the identical-looking tags we are currently seeing.

The layout of the landmarks we use (the numbers are the AprilTag IDs)

2. Formulation of the SLAM Problem

In this section, we use the notation and algorithm presented in the Robot Mapping course from the University of Freiburg [1].

At each timestep, given the robot control signal $u_{t}$ and the observations $z_{t}$, we try to estimate the state $\mu_{t}$, consisting of the robot's pose in 2D plus the x, y coordinates of each landmark:

$$\mu_{t} = \{\, x_{robot},\ y_{robot},\ \theta_{robot},\ x_{tag1},\ y_{tag1},\ \dots,\ x_{tagN},\ y_{tagN} \,\}$$
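As a concrete picture of this layout, here is a minimal sketch (in Python/NumPy, with hypothetical helper names) of how the state vector can be indexed; the 0-based landmark index used here is the same $i$ that shows up in the update step later:

```python
import numpy as np

N_LANDMARKS = 12                      # 12 AprilTags in our layout
STATE_DIM = 3 + 2 * N_LANDMARKS       # robot pose (x, y, theta) + (x, y) per landmark

mu = np.zeros(STATE_DIM)              # the full state estimate

def robot_pose(mu):
    """Return the robot pose (x, y, theta) stored at the front of the state vector."""
    return mu[0], mu[1], mu[2]

def landmark_xy(mu, i):
    """Return the (x, y) estimate of landmark i (0-based index)."""
    return mu[3 + 2 * i], mu[4 + 2 * i]
```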

3. Extended Kalman Filter

The EKF process can be broken down into the initialization, prediction, and update steps. The following pseudocode describes the EKF algorithm.

The pseudo code is from the slides of the course in [1]

3.1. Initialization

We initialize the robot's position to be at the origin of the world coordinate frame, $(x, y, \theta) = (0, 0, 0)$. Initially, all landmarks are unobserved, so we initialize them to $(x, y) = (0, 0)$.

For the noise covariance matrix $\Sigma_{0}$, since we have no uncertainty about the robot's initial position, we set the corresponding variances to 0. As for the unseen landmarks, the uncertainty is a very large number that represents infinity (we chose 1e6):

$$\Sigma_{0}= \begin{pmatrix} 0 & 0 & 0 & \dots & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \dots & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \dots & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \dots & \infty & 0 & 0 & 0 \\ 0 & 0 & 0 & \dots & 0 & \infty & 0 & 0 \\ 0 & 0 & 0 & \dots & 0 & 0 & \infty & 0 \\ 0 & 0 & 0 & \dots & 0 & 0 & 0 & \infty \end{pmatrix}$$

As for the motion noise $R_{t}$ and the measurement noise $Q_{t}$, we simply set them to diagonal matrices (assuming independent noise) with reasonable, hand-picked variance values.
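A minimal initialization sketch, assuming NumPy and the state layout above; the concrete variance values for the noise matrices are placeholders, not the exact ones we used:

```python
import numpy as np

N_LANDMARKS = 12
STATE_DIM = 3 + 2 * N_LANDMARKS
LARGE_VARIANCE = 1e6                  # stands in for "infinite" landmark uncertainty

# Robot starts at the origin with zero uncertainty; landmarks start unknown.
mu_0 = np.zeros(STATE_DIM)
sigma_0 = np.zeros((STATE_DIM, STATE_DIM))
sigma_0[3:, 3:] = LARGE_VARIANCE * np.eye(2 * N_LANDMARKS)

# Diagonal (independent) noise with hand-picked variances -- placeholder values.
R_pose = np.diag([0.01, 0.01, 0.005])   # motion noise on (x, y, theta)
Q = np.diag([0.05, 0.02])               # measurement noise on (range, bearing)
```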

3.2. Prediction

In the prediction step, we use the motion model to predict the robot's position. Here, we are using a velocity-based model. The control signal consists of the linear velocity $v$ and the angular velocity $\omega$. The equations are:

$$x_{t} = x_{t-1} - \frac{v_{t}}{\omega_{t}}\sin\theta + \frac{v_{t}}{\omega_{t}}\sin(\theta + \omega_{t}\Delta t)$$
$$y_{t} = y_{t-1} + \frac{v_{t}}{\omega_{t}}\cos\theta - \frac{v_{t}}{\omega_{t}}\cos(\theta + \omega_{t}\Delta t)$$
$$\theta_{t} = \theta_{t-1} + \omega_{t}\Delta t$$

Since the motion model is non-linear, we have to use its Jacobian to propagate the prediction uncertainty.

$$\overline{\Sigma}_{t} = G_{t}\Sigma_{t-1}G_{t}^T + R_{t}$$

Where

$$G_{t}=\begin{pmatrix} J_{t}^x & 0 \\ 0 & I \end{pmatrix}$$

And the Jacobian

$$J_{t}^x = I + \begin{pmatrix} 0 & 0 & -\frac{v_{t}}{\omega_{t}}\cos\theta + \frac{v_{t}}{\omega_{t}}\cos(\theta + \omega_{t}\Delta t) \\ 0 & 0 & -\frac{v_{t}}{\omega_{t}}\sin\theta + \frac{v_{t}}{\omega_{t}}\sin(\theta + \omega_{t}\Delta t) \\ 0 & 0 & 0 \end{pmatrix}$$
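Putting the motion model and its Jacobian together, a minimal sketch of the prediction step might look as follows (assuming NumPy, the state layout above, and a 3x3 motion noise block `R_pose` added only to the robot-pose part of the covariance; the near-zero $\omega$ branch is an extra guard, not part of the equations above):

```python
import numpy as np

def predict(mu, sigma, v, omega, dt, R_pose):
    """EKF prediction with the velocity motion model (sketch)."""
    n = len(mu)
    theta = mu[2]

    if abs(omega) > 1e-6:
        dx = -v / omega * np.sin(theta) + v / omega * np.sin(theta + omega * dt)
        dy = v / omega * np.cos(theta) - v / omega * np.cos(theta + omega * dt)
        # Jacobian of the motion model w.r.t. the robot pose (the non-identity part)
        J = np.array([
            [0, 0, -v / omega * np.cos(theta) + v / omega * np.cos(theta + omega * dt)],
            [0, 0, -v / omega * np.sin(theta) + v / omega * np.sin(theta + omega * dt)],
            [0, 0, 0],
        ])
    else:
        # Straight-line motion when omega is (close to) zero
        dx, dy = v * dt * np.cos(theta), v * dt * np.sin(theta)
        J = np.array([
            [0, 0, -v * dt * np.sin(theta)],
            [0, 0, v * dt * np.cos(theta)],
            [0, 0, 0],
        ])

    mu_bar = mu.copy()
    mu_bar[0] += dx
    mu_bar[1] += dy
    mu_bar[2] += omega * dt

    # G_t = [[J_x, 0], [0, I]]; only the top-left 3x3 block differs from identity.
    G = np.eye(n)
    G[:3, :3] = np.eye(3) + J

    sigma_bar = G @ sigma @ G.T
    sigma_bar[:3, :3] += R_pose        # motion noise only affects the robot pose block
    return mu_bar, sigma_bar
```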

3.3. Update

The detected AprilTags' poses are first transformed into the robot's frame, then converted into observations of the form $(r, \phi, i)$, where $i$ denotes the position of the tag in the state vector. It is obtained in the data association step, which we will go over in its own section.

The expected observations are calculated from the estimated positions of the robot and the landmark:

$$\delta = \begin{pmatrix} \overline{\mu}_{i, x} - \overline{\mu}_{robot, x} \\ \overline{\mu}_{i, y} - \overline{\mu}_{robot, y} \end{pmatrix}$$
$$\hat{z}_{t}^i = \begin{pmatrix} \sqrt{\delta^T \delta} \\ \tan^{-1}(\delta_{y}, \delta_{x}) - \overline{\mu}_{robot, \theta} \end{pmatrix}$$

If this is the first time we've observed a landmark, its entries in the covariance matrix still have a variance of $\infty$ (in other words, we have no estimate of the landmark's position yet). In that case, we simply initialize the estimate from the observation.
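For example, a new landmark estimate can be filled in directly from its first range-bearing observation (a sketch, assuming the state layout above):

```python
import numpy as np

def init_landmark(mu_bar, i, r, phi):
    """Fill in landmark i's estimate from its first (range, bearing) observation."""
    x, y, theta = mu_bar[0], mu_bar[1], mu_bar[2]
    mu_bar[3 + 2 * i] = x + r * np.cos(theta + phi)
    mu_bar[4 + 2 * i] = y + r * np.sin(theta + phi)
    return mu_bar
```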

Since the observation model is also non-linear, we need its Jacobian to calculate the Kalman gain (the matrix $H_{t}$ in step 4 of the algorithm):

$$H_{t}^i= J_{obs}^i F_{x, i}$$

Where

$$J_{obs}^i = \frac{1}{q} \begin{pmatrix} -\sqrt{q}\, \delta_{x} & -\sqrt{q}\, \delta_{y} & 0 & \sqrt{q}\, \delta_{x} & \sqrt{q}\, \delta_{y} \\ \delta_{y} & -\delta_{x} & -q & -\delta_{y} & \delta_{x} \end{pmatrix}, \qquad q = \delta^T \delta$$

And $F_{x, i}$ is a matrix that maps the low-dimensional Jacobian back into the full state space:

$$F_{x, i} = \begin{pmatrix} 1 & 0 & 0 & 0 & \dots & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & 0 & \dots & 0 & 0 & \dots & 0 \\ 0 & 0 & 1 & 0 & \dots & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & 0 & \dots & 1 & 0 & \dots & 0 \\ 0 & 0 & 0 & 0 & \dots & 0 & 1 & \dots & 0 \end{pmatrix}$$

The columns where we put the $1$ in the 4th and 5th rows of $F_{x, i}$ are $2i + 3$ and $2i + 4$ respectively (recall that $i$ is the index of the landmark in the state vector).

Plugging these into the algorithm, we can iteratively update $\mu$ and $\Sigma$ to get a better estimate.
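Here is a minimal sketch of the update step, assuming NumPy, the state layout above, and observations already associated with a landmark index $i$; the bearing normalization is an implementation detail added here for robustness:

```python
import numpy as np

def update(mu_bar, sigma_bar, observations, Q):
    """EKF correction step (sketch). `observations` is a list of (r, phi, i) tuples,
    where i is the landmark index from data association."""
    n = len(mu_bar)
    for r, phi, i in observations:
        lx, ly = mu_bar[3 + 2 * i], mu_bar[4 + 2 * i]
        delta = np.array([lx - mu_bar[0], ly - mu_bar[1]])
        q = delta @ delta
        sqrt_q = np.sqrt(q)

        # Expected observation from the current estimate
        z_hat = np.array([sqrt_q, np.arctan2(delta[1], delta[0]) - mu_bar[2]])

        # Low-dimensional Jacobian (2x5), mapped into the full state with F_{x,i}
        J_obs = (1.0 / q) * np.array([
            [-sqrt_q * delta[0], -sqrt_q * delta[1], 0, sqrt_q * delta[0], sqrt_q * delta[1]],
            [delta[1], -delta[0], -q, -delta[1], delta[0]],
        ])
        F = np.zeros((5, n))
        F[:3, :3] = np.eye(3)
        F[3, 3 + 2 * i] = 1.0
        F[4, 4 + 2 * i] = 1.0
        H = J_obs @ F

        # Kalman gain, then state and covariance update
        S = H @ sigma_bar @ H.T + Q
        K = sigma_bar @ H.T @ np.linalg.inv(S)
        innovation = np.array([r, phi]) - z_hat
        innovation[1] = (innovation[1] + np.pi) % (2 * np.pi) - np.pi   # normalize bearing
        mu_bar = mu_bar + K @ innovation
        sigma_bar = (np.eye(n) - K @ H) @ sigma_bar
    return mu_bar, sigma_bar
```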

4. Data Association

As mentioned earlier, since there are multiple landmarks with the same AprilTag pattern, we need to perform data association to determine which estimate each observation maps to.

At each timestep, when we receive a list of detected AprilTags, we scan through the positions of the previously seen tags. If the estimated position of a known tag is close enough to the tag we are currently observing, we *assume that they are the same tag*. The seen tags are kept in separate lists according to their AprilTag ID to further reduce the possibility of confusion and for new-landmark detection.

The criterion for "close enough" is simply the Euclidean distance. Since the x, y positions are assumed to be independent, we didn't implement the Mahalanobis distance.

When implementing data association, we kept running into cases where landmarks could not be effectively distinguished. To combat this, we ended up with an implementation that assumes we know the number of landmarks carrying each tag pattern: once we have seen the expected number of landmarks for a tag_id, any subsequent detection of that tag is associated with the previously seen landmark at the lowest distance, as sketched below.
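Below is a minimal sketch of that logic, assuming NumPy; the world-frame transform of detections, the `seen`/`max_count` bookkeeping structures, and the distance threshold are illustrative assumptions rather than our exact implementation:

```python
import numpy as np

def associate(mu, detections, seen, max_count, dist_threshold=0.3):
    """Nearest-neighbour data association (sketch).

    detections : list of (tag_id, x_world, y_world) for tags detected this frame,
                 already transformed into world coordinates via the current pose estimate
    seen       : dict mapping tag_id -> list of landmark indices already in the state
    max_count  : dict mapping tag_id -> known number of landmarks with that tag pattern
    Returns a list of (detection, landmark_index) pairs; new landmarks get a fresh index.
    """
    associations = []
    next_index = sum(len(v) for v in seen.values())
    for tag_id, x, y in detections:
        candidates = seen.setdefault(tag_id, [])
        # Euclidean distance to every landmark estimate with the same tag pattern
        dists = [np.hypot(mu[3 + 2 * i] - x, mu[4 + 2 * i] - y) for i in candidates]
        if dists and (min(dists) < dist_threshold or len(candidates) >= max_count[tag_id]):
            # Close enough to an existing landmark, or all copies of this tag are already
            # known: associate with the nearest one.
            associations.append(((tag_id, x, y), candidates[int(np.argmin(dists))]))
        else:
            # Otherwise treat it as a new landmark.
            candidates.append(next_index)
            associations.append(((tag_id, x, y), next_index))
            next_index += 1
    return associations
```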

5. Results

For the experiments, we planned two movement patterns for the Jetbot and let it draw out the map using the SLAM algorithm we implemented.

5.1. Single Circular Path

In this experiment, the robot goes around a circle in the center of the map once. For the circular motion, we went with a very naive control scheme, fixing the linear and angular velocities.
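For reference, a fixed $(v, \omega)$ command can be turned into differential-drive wheel speeds with the standard kinematics; the JetBot API call and the scaling constants below are illustrative assumptions, not our exact tuning:

```python
from jetbot import Robot   # JetBot's motor interface

WHEEL_BASE = 0.12          # distance between the wheels in meters (approximate)
SPEED_SCALE = 0.5          # maps m/s to the robot's normalized motor command (assumed)

def drive(robot, v, omega):
    """Drive with fixed linear velocity v (m/s) and angular velocity omega (rad/s)."""
    v_left = v - omega * WHEEL_BASE / 2.0
    v_right = v + omega * WHEEL_BASE / 2.0
    robot.set_motors(v_left * SPEED_SCALE, v_right * SPEED_SCALE)

# Naive circular motion: keep v and omega constant.
# robot = Robot()
# drive(robot, v=0.15, omega=0.5)
```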

The following image is a plot of the resulting map after exactly one full circle (the blue 'x' markers denote tag_id=0 and the green 'x' markers denote tag_id=42):

We can see that after one loop, having seen all the tags, we come up with a map that represents the correct relative positions of the robot's starting point and the landmarks. However, due to the noise in driving and measuring, the exact positions of the landmarks are off by a couple of centimeters.

5.2. Multiple Circular Path

The following image is a plot of the resulting map after looping three times:

Surprisingly, the result was worse despite the use of the Kalman Filter. We suspect that the reason is our naive driving model: we observed that the robot's circular trajectory tends to gradually drift in a certain direction (in this particular case, it shifted in the y direction). This introduces a structural bias into the system, which cannot be handled by a Kalman Filter. That is why, as the number of loops increases, the error builds up and pushes the estimates further away from their true positions.

5.3. Figure 8 Path

For the figure-8 motion, we also went with a very simple scheme: we go in a circle using the circular control described above and turn once we think we're back at the origin. The following image is the result of the figure-8 drive.

As expected, the generated maps have more errors. Again we suspect that this is due to the non-random noise in our driving.

6. Conclusion & Discussion

In this assignment, we implemented SLAM using AprilTags and an EKF. While we did manage to successfully produce several maps, the results were generally quite inconsistent. Due to occasional bursts of inaccurate readings from the AprilTag detections, we could get very bad initializations of the landmark positions, which could cause the robot to get lost completely.

Furthermore, the naive, open-loop motion model proved to limit the performance as well. A more sophisticated motion model is needed for more consistent results.

7. References

[1] Cyrill Stachniss, "Robot Mapping" lecture slides on EKF SLAM, University of Freiburg.