
Multiview Geometry

I’ve been learning about Multiview Geometry recently, and one thing that hasn’t helped is starting from Hartley and Zisserman’s bible on the first go 😉.

Some things I wish I had gotten a stronger grasp of earlier: the coordinate systems and the homogeneous/inhomogeneous representations of primitives.

As has been covered many a time, the camera matrix \(\mathbf{P}\) is simply a mapping from (homogeneous) world coordinates to (homogeneous) image coordinates.

\[\mathbf{P} = \mathbf{K}[\mathbf{R}|\mathbf{t}]\]

It comprises, first, the extrinsics \([\mathbf{R} \vert \mathbf{t}]\), a rototranslation that maps world coordinates into the camera coordinate system.

Then come the camera intrinsics \(\mathbf{K}\), which first handle the focal-length-induced scaling (due to similar triangles) and then shift the coordinate system so that image coordinates are positive (origin at the top-left corner of the image).
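
To make that concrete, here is a minimal NumPy sketch of composing \(\mathbf{P}\) and projecting a point; the focal length, principal point, and pose are made-up values for illustration:

```python
import numpy as np

# Made-up intrinsics: focal length f, principal point (cx, cy).
f, cx, cy = 800.0, 320.0, 240.0
K = np.array([[f, 0, cx],
              [0, f, cy],
              [0, 0,  1]])

# Made-up extrinsics: identity rotation, camera shifted back along z.
R = np.eye(3)
t = np.array([[0.0], [0.0], [5.0]])

# Compose the 3x4 camera matrix P = K [R | t].
P = K @ np.hstack([R, t])

# Project a homogeneous world point and dehomogenise.
X = np.array([1.0, 2.0, 3.0, 1.0])  # (X, Y, Z, 1)
x = P @ X
x = x[:2] / x[2]                    # pixel coordinates (u, v)
print(x)
```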

I found that coding and visualising the objects (using something like NumPy and Open3D for visualisation) also forced me to think about the coordinate systems and was beneficial for understanding.
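
For instance, a few lines of Open3D are enough to look at a point cloud together with a coordinate frame for the world axes (the random points here are stand-in data):

```python
import numpy as np
import open3d as o3d

# Random 3D points standing in for the scene.
pts = np.random.uniform(-1.0, 1.0, size=(100, 3))

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(pts)

# A coordinate frame makes the world axes visible in the viewer.
frame = o3d.geometry.TriangleMesh.create_coordinate_frame(size=0.5)
o3d.visualization.draw_geometries([pcd, frame])
```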

Epipolar Geometry

Figure: the two-camera simulation I made.

In this simulation, I generated several points and then simulated the view of these points from the perspective of two pinhole cameras. We can find the fundamental matrix \(\mathbf{F}\) of the two cameras using the image coordinates of the points in the two frames. After that, given the camera intrinsics, we can compute the essential matrix \(\mathbf{E} = \mathbf{K}_2^\top \mathbf{F} \mathbf{K}_1\) and subsequently decompose it to find the rototranslation from camera 1 to camera 2.
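
A sketch of those two steps in NumPy, assuming matched pixel coordinates `x1` and `x2` (each of shape `(N, 2)`) and a shared intrinsics matrix `K`; this uses the unnormalised eight-point algorithm, so it is illustrative rather than robust, and it is not necessarily what the linked code does:

```python
import numpy as np

def eight_point_F(x1, x2):
    """Unnormalised eight-point estimate of the fundamental matrix
    from N >= 8 correspondences x1 <-> x2, each of shape (N, 2)."""
    n = x1.shape[0]
    # Each correspondence contributes one row of A f = 0, obtained by
    # writing x2^T F x1 = 0 out in the nine entries of F.
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0],            x1[:, 1],            np.ones(n),
    ])
    # Least-squares null vector: the right singular vector belonging
    # to the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce the rank-2 constraint by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt

def decompose_E(E):
    """Decompose an essential matrix into the four candidate
    rototranslations (R, t); the correct one is the candidate that
    puts the triangulated points in front of both cameras."""
    U, _, Vt = np.linalg.svd(E)
    # Keep the rotations proper (determinant +1).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]  # translation, up to sign and scale
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

# F from point correspondences, then E given the (shared) intrinsics K:
# F = eight_point_F(x1, x2)
# E = K.T @ F @ K
# candidates = decompose_E(E)
```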

Figure: locations of the points in camera 1, the result of forward projection into a single pinhole camera.

This means that we have the relative pose of camera 2 with respect to camera 1. By triangulating the matched image points, we are then able to recover the locations of the original 3D points, up to scale.
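
The triangulation itself can be done linearly; a minimal DLT sketch, assuming 3×4 camera matrices `P1`, `P2` and one matched pixel pair `u1`, `u2`:

```python
import numpy as np

def triangulate(P1, P2, u1, u2):
    """Linear (DLT) triangulation of one correspondence.
    P1, P2: 3x4 camera matrices; u1, u2: matching pixels of shape (2,).
    Returns the inhomogeneous 3D point."""
    # Each view contributes two rows of the homogeneous system A X = 0,
    # obtained from the cross product u x (P X) = 0.
    A = np.vstack([
        u1[0] * P1[2] - P1[0],
        u1[1] * P1[2] - P1[1],
        u2[0] * P2[2] - P2[0],
        u2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenise
```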

Code can be found here on GitHub.

Further resources

This is also the first article with MathJax :)
