Course Note: Tübingen-CVML-Image formation 1

Image Formation 1: Primitives and Transformations

Credits: Tübingen Machine Learning | Computer Vision - Andreas Geiger
Computer Vision - Lecture 2.1 (Image Formation: Primitives and Transformations) - YouTube

1.1 Primitives and Transformations

2D points

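2D points can be written in inhomogeneous coordinates as

$$\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2$$

or in homogeneous coordinates as

$$\tilde{\mathbf{x}} = \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{w} \end{pmatrix} \in \mathbb{P}^2$$

where $\mathbb{P}^2 = \mathbb{R}^3 \setminus \{(0,0,0)\}$ is called projective space.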

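An inhomogeneous vector $\mathbf{x}$ is converted to a homogeneous vector $\tilde{\mathbf{x}}$ as follows:

$$\tilde{\mathbf{x}} = \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{w} \end{pmatrix} = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} \mathbf{x} \\ 1 \end{pmatrix} = \bar{\mathbf{x}}$$

with augmented vector $\bar{\mathbf{x}}$. To convert in the opposite direction, we divide by $\tilde{w}$:

$$\bar{\mathbf{x}} = \begin{pmatrix} \mathbf{x} \\ 1 \end{pmatrix} = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \frac{1}{\tilde{w}} \tilde{\mathbf{x}} = \frac{1}{\tilde{w}} \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{w} \end{pmatrix} = \begin{pmatrix} \tilde{x}/\tilde{w} \\ \tilde{y}/\tilde{w} \\ 1 \end{pmatrix}$$

Homogeneous points whose last element is $\tilde{w} = 0$ are called ideal points or points at infinity. These points can't be represented with inhomogeneous coordinates.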
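
The two conversions can be sketched in a few lines of NumPy. The function names `to_homogeneous` and `from_homogeneous` and the tolerance `eps` are illustrative choices, not part of the lecture.

```python
import numpy as np

def to_homogeneous(x):
    """Append a 1 to inhomogeneous point(s) of shape (..., n)."""
    x = np.asarray(x, dtype=float)
    ones = np.ones(x.shape[:-1] + (1,))
    return np.concatenate([x, ones], axis=-1)

def from_homogeneous(x_tilde, eps=1e-12):
    """Divide by the last element; ideal points (w = 0) are rejected."""
    x_tilde = np.asarray(x_tilde, dtype=float)
    w = x_tilde[..., -1:]
    if np.any(np.abs(w) < eps):
        raise ValueError("ideal point (w = 0) has no inhomogeneous representation")
    return x_tilde[..., :-1] / w

x_bar = to_homogeneous([2.0, 3.0])        # augmented vector (2, 3, 1)
x = from_homogeneous([4.0, 6.0, 2.0])     # (4/2, 6/2) = (2, 3)
```

Both `(2, 3, 1)` and `(4, 6, 2)` describe the same 2D point, which illustrates that homogeneous vectors are only defined up to scale.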

2D lines

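2D lines can also be expressed using homogeneous coordinates $\tilde{\mathbf{l}} = (a, b, c)^\top$:

$$\{\bar{\mathbf{x}} \mid \tilde{\mathbf{l}}^\top \bar{\mathbf{x}} = 0\} \quad\Leftrightarrow\quad \{x, y \mid ax + by + c = 0\}$$

We can normalize $\tilde{\mathbf{l}}$ so that $\tilde{\mathbf{l}} = (n_x, n_y, d)^\top = (\mathbf{n}, d)^\top$ with $\|\mathbf{n}\|_2 = 1$. In this case, $\mathbf{n}$ is the normal vector perpendicular to the line and $d$ is its distance to the origin.

An exception is the line at infinity $\tilde{\mathbf{l}}_\infty = (0, 0, 1)^\top$ which passes through all ideal points.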
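
As a small illustration of the normalization, the sketch below rescales a line so that its normal has unit length. The helper name `normalize_line` is hypothetical, and the distance to the origin is read off as the absolute value of the third entry.

```python
import numpy as np

def normalize_line(l):
    """Scale l = (a, b, c) so that the normal (a, b) has unit length."""
    l = np.asarray(l, dtype=float)
    n_norm = np.linalg.norm(l[:2])
    if n_norm == 0.0:
        raise ValueError("the line at infinity (a = b = 0) cannot be normalized")
    return l / n_norm

l = normalize_line([3.0, 4.0, -10.0])        # 3x + 4y - 10 = 0
dist_origin = abs(l[2])                      # distance of the line to the origin
residual = l @ np.array([2.0, 1.0, 1.0])     # the point (2, 1) lies on the line
```

Here `l` becomes `(0.6, 0.8, -2.0)`, so the line is 2 units away from the origin, and `residual` vanishes because 3·2 + 4·1 - 10 = 0.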

Cross Product

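The cross product can be expressed as the product of a skew-symmetric matrix and a vector:

$$\mathbf{a} \times \mathbf{b} = [\mathbf{a}]_\times \mathbf{b} = \begin{bmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{bmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{pmatrix}$$

Remark: Square brackets denote matrices.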
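
A quick numerical check of this identity, as a sketch: the helper name `skew` is an illustrative choice, and the result is compared against NumPy's built-in `np.cross`.

```python
import numpy as np

def skew(a):
    """Return the skew-symmetric matrix [a]_x such that [a]_x @ b == a x b."""
    a1, a2, a3 = a
    return np.array([[0.0, -a3,  a2],
                     [ a3, 0.0, -a1],
                     [-a2,  a1, 0.0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
via_matrix = skew(a) @ b      # (-3, 6, -3)
via_numpy = np.cross(a, b)    # same result
```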

2D Line Arithmetic

In homogeneous coordinates, the intersection of two lines is given by:

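$$\tilde{\mathbf{x}} = \tilde{\mathbf{l}}_1 \times \tilde{\mathbf{l}}_2$$

Similarly, the line joining two points can be compactly written as:

$$\tilde{\mathbf{l}} = \bar{\mathbf{x}}_1 \times \bar{\mathbf{x}}_2$$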

The symbol × denotes the cross product.
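
Both operations can be tried out with `np.cross`. The following is a minimal sketch; the helper names `line_through` and `intersect` are made up for this example.

```python
import numpy as np

def line_through(p1, p2):
    """Homogeneous line joining two inhomogeneous 2D points."""
    return np.cross(np.append(p1, 1.0), np.append(p2, 1.0))

def intersect(l1, l2):
    """Homogeneous intersection point of two homogeneous 2D lines."""
    return np.cross(l1, l2)

l1 = line_through([0.0, 0.0], [1.0, 1.0])   # y = x
l2 = line_through([0.0, 2.0], [2.0, 0.0])   # x + y = 2
x_tilde = intersect(l1, l2)
point = x_tilde[:2] / x_tilde[2]            # (1, 1)

l3 = line_through([0.0, 1.0], [1.0, 2.0])   # y = x + 1, parallel to l1
ideal = intersect(l1, l3)                   # last element is 0
```

The parallel lines `l1` and `l3` intersect in an ideal point: the last element of `ideal` is zero, and its first two elements give the shared direction of the lines up to scale.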

2D Conics

More complex algebraic objects can be represented using polynomial homogeneous equations. For example, conic sections (arising as the intersection of a plane and a 3D cone) can be written using quadric equations:

$$\{\bar{\mathbf{x}} \mid \bar{\mathbf{x}}^\top \mathbf{Q}\, \bar{\mathbf{x}} = 0\}$$
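
As a concrete case, a circle with center $(c_x, c_y)$ and radius $r$ is a conic, because $(x - c_x)^2 + (y - c_y)^2 - r^2 = 0$ can be written in the form above. The sketch below builds the corresponding matrix; the helper names `circle_conic` and `conic_residual` are illustrative only.

```python
import numpy as np

def circle_conic(cx, cy, r):
    """Symmetric 3x3 matrix Q with x_bar^T Q x_bar = (x-cx)^2 + (y-cy)^2 - r^2."""
    return np.array([[1.0, 0.0, -cx],
                     [0.0, 1.0, -cy],
                     [-cx, -cy, cx**2 + cy**2 - r**2]])

def conic_residual(Q, x):
    """Evaluate x_bar^T Q x_bar for an inhomogeneous 2D point x."""
    x_bar = np.append(np.asarray(x, dtype=float), 1.0)
    return x_bar @ Q @ x_bar

Q = circle_conic(1.0, 2.0, 3.0)
on_circle = conic_residual(Q, [4.0, 2.0])   # 0: the point lies on the circle
at_center = conic_residual(Q, [1.0, 2.0])   # -r^2 = -9 at the center
```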

3D Points

3D points can be written in inhomogeneous coordinates as

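$$\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \in \mathbb{R}^3$$

or in homogeneous coordinates as

$$\tilde{\mathbf{x}} = \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{z} \\ \tilde{w} \end{pmatrix} \in \mathbb{P}^3$$

with projective space $\mathbb{P}^3 = \mathbb{R}^4 \setminus \{(0,0,0,0)\}$.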

3D Planes

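3D planes can also be represented in homogeneous coordinates $\tilde{\mathbf{m}} = (a, b, c, d)^\top$:

$$\{\bar{\mathbf{x}} \mid \tilde{\mathbf{m}}^\top \bar{\mathbf{x}} = 0\} \quad\Leftrightarrow\quad \{x, y, z \mid ax + by + cz + d = 0\}$$

Again, we can normalize $\tilde{\mathbf{m}}$ so that $\tilde{\mathbf{m}} = (n_x, n_y, n_z, d)^\top = (\mathbf{n}, d)^\top$ with $\|\mathbf{n}\|_2 = 1$. In this case, $\mathbf{n}$ is the normal perpendicular to the plane and $d$ is its distance to the origin.

An exception is the plane at infinity $\tilde{\mathbf{m}}_\infty = (0, 0, 0, 1)^\top$ which passes through all ideal points (= points at infinity), for which $\tilde{w} = 0$.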

3D lines

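3D lines are less elegant than either 2D lines or 3D planes. One possible representation is to express points on a line as a linear combination of two points $\mathbf{p}$ and $\mathbf{q}$ on the line:

$$\{\mathbf{x} \mid \mathbf{x} = (1 - \lambda)\,\mathbf{p} + \lambda\,\mathbf{q} \;\wedge\; \lambda \in \mathbb{R}\}$$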

However, this representation uses 6 parameters for 4 degrees of freedom.
Alternative minimal representations are the two-plane parameterization or Pluecker coordinates. See Szeliski, Chapter 2.1.

3D Quadrics

The 3D analog of 2D conics is a quadric surface:

$$\{\bar{\mathbf{x}} \mid \bar{\mathbf{x}}^\top \mathbf{Q}\, \bar{\mathbf{x}} = 0\}$$

Quadrics are useful in the study of multi-view geometry. They also serve as modeling primitives (spheres, ellipsoids, cylinders).

2D Transformations

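Translation: (2D Translation of the Input, 2 DF)

$$\mathbf{x}' = \mathbf{x} + \mathbf{t} \quad\Leftrightarrow\quad \bar{\mathbf{x}}' = \begin{bmatrix} \mathbf{I} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \bar{\mathbf{x}}$$

Euclidean: (2D Translation + 2D Rotation, 3 DF)

$$\mathbf{x}' = \mathbf{R}\mathbf{x} + \mathbf{t} \quad\Leftrightarrow\quad \bar{\mathbf{x}}' = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \bar{\mathbf{x}}$$

Affine: (2D Linear Transformation, 6 DF)

$$\mathbf{x}' = \mathbf{A}\mathbf{x} + \mathbf{t} \quad\Leftrightarrow\quad \bar{\mathbf{x}}' = \begin{bmatrix} \mathbf{A} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \bar{\mathbf{x}}$$

Perspective: (Homography, 8 DF)

$$\tilde{\mathbf{x}}' = \tilde{\mathbf{H}}\,\tilde{\mathbf{x}} \qquad \left(\bar{\mathbf{x}} = \frac{1}{\tilde{w}}\,\tilde{\mathbf{x}}\right)$$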
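
The transformations above all fit into a single 3×3 matrix acting on homogeneous points. The following is a minimal sketch; the helper names (`translation`, `euclidean`, `affine`, `apply`) and the example values are illustrative assumptions.

```python
import numpy as np

def translation(t):
    T = np.eye(3)
    T[:2, 2] = t
    return T

def euclidean(theta, t):
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(3)
    T[:2, :2] = [[c, -s], [s, c]]   # 2D rotation by angle theta
    T[:2, 2] = t
    return T

def affine(A, t):
    T = np.eye(3)
    T[:2, :2] = A                   # arbitrary 2x2 linear part
    T[:2, 2] = t
    return T

def apply(H, x):
    """Apply a 3x3 transformation to an inhomogeneous 2D point."""
    x_tilde = H @ np.append(np.asarray(x, dtype=float), 1.0)
    return x_tilde[:2] / x_tilde[2]  # normalize by w

# Rotate (1, 0) by 90 degrees, then translate by (1, 2): result (1, 3).
p = apply(euclidean(np.pi / 2, [1.0, 2.0]), [1.0, 0.0])

# A homography with a non-trivial last row: (2, 2) maps to (2, 2, 2) ~ (1, 1).
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.5, 0.0, 1.0]])
q = apply(H, [2.0, 2.0])
```

For the first three transformations the last row is (0, 0, 1), so the division by w in `apply` has no effect; it only matters for the perspective case.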

2D Transformations on Co-Vectors

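Considering any perspective 2D transformation

$$\tilde{\mathbf{x}}' = \tilde{\mathbf{H}}\,\tilde{\mathbf{x}}$$

the transformed 2D line equation is given by:

$$\tilde{\mathbf{l}}'^\top \tilde{\mathbf{x}}' = \tilde{\mathbf{l}}'^\top \tilde{\mathbf{H}}\,\tilde{\mathbf{x}} = (\tilde{\mathbf{H}}^\top \tilde{\mathbf{l}}')^\top \tilde{\mathbf{x}} = \tilde{\mathbf{l}}^\top \tilde{\mathbf{x}} = 0$$

Therefore, we have:

$$\tilde{\mathbf{l}}' = \tilde{\mathbf{H}}^{-\top}\,\tilde{\mathbf{l}}$$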

Thus, the action of a projective transformation on a co-vector such as a 2D line or 3D normal can be represented by the transposed inverse of the matrix.
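
This rule can be verified numerically: points on a line, mapped by a homography, should lie on the line transformed by the transposed inverse. The matrix `H` and the line `l` below are arbitrary example values chosen for this sketch.

```python
import numpy as np

H = np.array([[1.0,   0.2,   3.0],
              [0.1,   1.0,  -2.0],
              [0.001, 0.002, 1.0]])       # example homography (invertible)

l = np.array([1.0, -1.0, 0.5])            # line x - y + 0.5 = 0
l_prime = np.linalg.inv(H).T @ l          # transformed line H^{-T} l

# Two points on l, in augmented form.
points = [np.array([0.0, 0.5, 1.0]), np.array([2.0, 2.5, 1.0])]

# Residuals l'^T (H x) should vanish up to floating-point error.
residuals = [l_prime @ (H @ x) for x in points]
```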

Overview of 2D Transformation

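| Transformation | Matrix | DF | Preserves |
|---|---|---|---|
| translation | $[\mathbf{I} \;\; \mathbf{t}]_{2 \times 3}$ | 2 | orientation |
| rigid | $[\mathbf{R} \;\; \mathbf{t}]_{2 \times 3}$ | 3 | lengths |
| similarity | $[s\mathbf{R} \;\; \mathbf{t}]_{2 \times 3}$ | 4 | angles |
| affine | $[\mathbf{A}]_{2 \times 3}$ | 6 | parallelism |
| projective | $[\tilde{\mathbf{H}}]_{3 \times 3}$ | 8 | straight lines |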

Overview of 3D Transformation

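| Transformation | Matrix | DF | Preserves |
|---|---|---|---|
| translation | $[\mathbf{I} \;\; \mathbf{t}]_{3 \times 4}$ | 3 | orientation |
| rigid | $[\mathbf{R} \;\; \mathbf{t}]_{3 \times 4}$ | 6 | lengths |
| similarity | $[s\mathbf{R} \;\; \mathbf{t}]_{3 \times 4}$ | 7 | angles |
| affine | $[\mathbf{A}]_{3 \times 4}$ | 12 | parallelism |
| projective | $[\tilde{\mathbf{H}}]_{4 \times 4}$ | 15 | straight lines |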

Direct Linear Transform for Homography Estimation

Q: How can we estimate a homography from a set of 2D correspondences?
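Let $\mathcal{X} = \{\tilde{\mathbf{x}}_i, \tilde{\mathbf{x}}_i'\}_{i=1}^{N}$ denote a set of $N$ 2D-to-2D correspondences related by $\tilde{\mathbf{x}}_i' = \tilde{\mathbf{H}}\,\tilde{\mathbf{x}}_i$. As the correspondence vectors are homogeneous, they have the same direction but differ in magnitude. Thus, the equation above can be expressed as $\tilde{\mathbf{x}}_i' \times \tilde{\mathbf{H}}\,\tilde{\mathbf{x}}_i = \mathbf{0}$.
Using $\tilde{\mathbf{h}}_k^\top$ to denote the $k$-th row of $\tilde{\mathbf{H}}$, this can be rewritten as a linear equation in $\tilde{\mathbf{h}}$:

$$\underbrace{\begin{bmatrix} \mathbf{0}^\top & -\tilde{w}_i'\,\tilde{\mathbf{x}}_i^\top & \tilde{y}_i'\,\tilde{\mathbf{x}}_i^\top \\ \tilde{w}_i'\,\tilde{\mathbf{x}}_i^\top & \mathbf{0}^\top & -\tilde{x}_i'\,\tilde{\mathbf{x}}_i^\top \\ -\tilde{y}_i'\,\tilde{\mathbf{x}}_i^\top & \tilde{x}_i'\,\tilde{\mathbf{x}}_i^\top & \mathbf{0}^\top \end{bmatrix}}_{\mathbf{A}_i} \underbrace{\begin{bmatrix} \tilde{\mathbf{h}}_1 \\ \tilde{\mathbf{h}}_2 \\ \tilde{\mathbf{h}}_3 \end{bmatrix}}_{\tilde{\mathbf{h}}} = \mathbf{0}$$

Each point correspondence yields two linearly independent equations, since the third row of $\mathbf{A}_i$ is a linear combination of the first two. Stacking all equations into a $2N \times 9$ dimensional matrix $\mathbf{A}$ leads to the following constrained least squares problem:

$$\tilde{\mathbf{h}}^* = \operatorname*{argmin}_{\tilde{\mathbf{h}}} \; \|\mathbf{A}\tilde{\mathbf{h}}\|_2^2 + \lambda\,(\|\tilde{\mathbf{h}}\|_2^2 - 1) = \operatorname*{argmin}_{\tilde{\mathbf{h}}} \; \tilde{\mathbf{h}}^\top \mathbf{A}^\top \mathbf{A}\,\tilde{\mathbf{h}} + \lambda\,(\tilde{\mathbf{h}}^\top \tilde{\mathbf{h}} - 1)$$

where we have fixed $\|\tilde{\mathbf{h}}\|_2^2 = 1$ as $\tilde{\mathbf{H}}$ is homogeneous (i.e., defined only up to scale) and the trivial solution $\tilde{\mathbf{h}} = \mathbf{0}$ is not of interest. The solution to the above optimization problem is the singular vector corresponding to the smallest singular value of $\mathbf{A}$ (i.e., the last column of $\mathbf{V}$ when decomposing $\mathbf{A} = \mathbf{U}\mathbf{D}\mathbf{V}^\top$, see also Deep Learning lecture 11.2). The resulting algorithm is called Direct Linear Transformation.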
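
The algorithm can be sketched with NumPy's SVD as follows. This is a minimal illustration, not a production implementation: the function name `estimate_homography_dlt` is made up, target points are assumed to be given in inhomogeneous form (so $\tilde{w}_i' = 1$), and the coordinate normalization that is commonly recommended for noisy data is omitted.

```python
import numpy as np

def estimate_homography_dlt(src, dst):
    """Estimate H (up to scale) from N >= 4 correspondences src[i] -> dst[i]."""
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        xt = np.array([x, y, 1.0])   # source point, augmented
        wp = 1.0                     # w' of the target point (assumed 1)
        # First two rows of A_i; the third row is linearly dependent.
        rows.append(np.concatenate([np.zeros(3), -wp * xt, yp * xt]))
        rows.append(np.concatenate([wp * xt, np.zeros(3), -xp * xt]))
    A = np.stack(rows)               # shape (2N, 9)
    _, _, Vt = np.linalg.svd(A)      # singular values sorted in descending order
    h = Vt[-1]                       # last row of V^T = last column of V
    return h.reshape(3, 3)           # rows of H are h_1, h_2, h_3

# Synthetic check: generate correspondences from a known homography.
H_true = np.array([[1.2, 0.1, 0.5],
                   [-0.2, 0.9, 0.3],
                   [0.1, 0.2, 1.0]])
src = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.2]])
src_h = np.hstack([src, np.ones((len(src), 1))])
dst_h = src_h @ H_true.T
dst = dst_h[:, :2] / dst_h[:, 2:]

H_est = estimate_homography_dlt(src, dst)
H_est = H_est / H_est[2, 2]          # remove the scale ambiguity for comparison
```

Because the estimate is only defined up to scale (and sign), it is divided by its bottom-right entry before being compared to `H_true`.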

1.2 Geometric Image Formation

Origins of the Pinhole Camera

Projection Models

Orthographic Projection

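Orthographic projection of a 3D point $\mathbf{x}_c \in \mathbb{R}^3$ to pixel coordinates $\mathbf{x}_s \in \mathbb{R}^2$

An orthographic projection simply drops the z component of the 3D point in camera coordinates $\mathbf{x}_c$ to obtain the corresponding 2D point on the image plane (= screen) $\mathbf{x}_s$.

$$\mathbf{x}_s = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \mathbf{x}_c \quad\Leftrightarrow\quad \bar{\mathbf{x}}_s = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \bar{\mathbf{x}}_c$$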

Orthography is exact for telecentric lenses and an approximation for telephoto lenses. After projection, the distance of the 3D point from the image can't be recovered.

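In practice, scaled orthography is commonly used:

$$\mathbf{x}_s = \begin{bmatrix} s & 0 & 0 \\ 0 & s & 0 \end{bmatrix} \mathbf{x}_c \quad\Leftrightarrow\quad \bar{\mathbf{x}}_s = \begin{bmatrix} s & 0 & 0 & 0 \\ 0 & s & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \bar{\mathbf{x}}_c$$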

Here, the unit for s is px/m or px/mm to convert metric 3D points into pixels.
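
A small sketch of scaled orthographic projection follows. The function name `project_orthographic` and the scale of 100 px/m are example assumptions.

```python
import numpy as np

def project_orthographic(x_c, s=1.0):
    """Drop the z component of a camera-space point and scale by s (e.g. px/m)."""
    P = np.array([[s, 0.0, 0.0],
                  [0.0, s, 0.0]])
    return P @ np.asarray(x_c, dtype=float)

near = project_orthographic([0.5, -0.25, 2.0], s=100.0)    # (50, -25)
far = project_orthographic([0.5, -0.25, 20.0], s=100.0)    # (50, -25) as well
```

Both points land on the same pixel although their depths differ by a factor of ten, which is why depth cannot be recovered after projection.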

Perspective Projection

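Perspective projection of a 3D point $\mathbf{x}_c \in \mathbb{R}^3$ to pixel coordinates $\mathbf{x}_s \in \mathbb{R}^2$

In perspective projection, 3D points in camera coordinates are mapped to the image plane by dividing by their z component and multiplying with the focal length:

$$\begin{pmatrix} x_s \\ y_s \end{pmatrix} = \begin{pmatrix} f\,x_c / z_c \\ f\,y_c / z_c \end{pmatrix} \quad\Leftrightarrow\quad \tilde{\mathbf{x}}_s = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \bar{\mathbf{x}}_c$$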

Note that this projection is linear when using homogeneous coordinates. After the projection, it is not possible to recover the distance of the 3D point from the image.

The complete perspective projection model is given by:

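$$\begin{pmatrix} x_s \\ y_s \end{pmatrix} = \begin{pmatrix} f_x\,x_c / z_c + s\,y_c / z_c + c_x \\ f_y\,y_c / z_c + c_y \end{pmatrix} \quad\Leftrightarrow\quad \tilde{\mathbf{x}}_s = \begin{bmatrix} f_x & s & c_x & 0 \\ 0 & f_y & c_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \bar{\mathbf{x}}_c$$

Here, $f_x$ and $f_y$ are the focal lengths along the two image axes, $s$ is the skew, and $(c_x, c_y)$ is the principal point.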
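
The complete model can be sketched as a matrix product followed by normalization. The function name `project_perspective` and the intrinsic values (focal length 800 px, principal point (320, 240), zero skew) are example assumptions.

```python
import numpy as np

def project_perspective(x_c, fx, fy, cx, cy, s=0.0):
    """Project a camera-space 3D point to pixel coordinates."""
    K0 = np.array([[fx,  s,   cx,  0.0],
                   [0.0, fy,  cy,  0.0],
                   [0.0, 0.0, 1.0, 0.0]])        # 3x4 matrix [K | 0]
    x_tilde = K0 @ np.append(np.asarray(x_c, dtype=float), 1.0)
    return x_tilde[:2] / x_tilde[2]              # divide by z_c

# x_s = 800 * 0.5 / 2 + 320 = 520, y_s = 800 * (-0.25) / 2 + 240 = 140
x_s = project_perspective([0.5, -0.25, 2.0], fx=800.0, fy=800.0, cx=320.0, cy=240.0)

# Scaling the 3D point along its viewing ray does not change the pixel.
x_s_far = project_perspective([1.0, -0.5, 4.0], fx=800.0, fy=800.0, cx=320.0, cy=240.0)
```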

Chaining Transformations

Let K be the calibration matrix (intrinsics) and [R|t] the camera pose (extrinsics).
We chain both transformations to project a point in world coordinates to the image:

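$$\tilde{\mathbf{x}}_s = \begin{bmatrix} \mathbf{K} & \mathbf{0} \end{bmatrix} \bar{\mathbf{x}}_c = \begin{bmatrix} \mathbf{K} & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \bar{\mathbf{x}}_w = \mathbf{K} \begin{bmatrix} \mathbf{R} & \mathbf{t} \end{bmatrix} \bar{\mathbf{x}}_w = \mathbf{P}\,\bar{\mathbf{x}}_w$$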
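
A sketch of the chained projection, with example intrinsics and an example pose (a 90 degree rotation about the z axis and a translation of 4 units along z). All numeric values are assumptions for illustration.

```python
import numpy as np

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])            # intrinsics
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])                 # rotation by 90 degrees about z
t = np.array([0.0, 0.0, 4.0])                    # translation

P = K @ np.hstack([R, t[:, None]])               # 3x4 projection matrix K [R | t]

x_w = np.array([1.0, 0.0, 0.0])                  # point in world coordinates
x_tilde_s = P @ np.append(x_w, 1.0)
x_s = x_tilde_s[:2] / x_tilde_s[2]               # pixel coordinates (320, 440)

# Same result in two explicit steps: world -> camera, then camera -> image.
x_c = R @ x_w + t                                # (0, 1, 4)
x_s_two_step = (K @ x_c)[:2] / (K @ x_c)[2]
```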

Full Rank Representation

It is sometimes preferable to use a full rank 4×4 projection matrix:

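$$\tilde{\mathbf{x}}_s = \begin{bmatrix} \mathbf{K} & \mathbf{0} \\ \mathbf{0}^\top & 1 \end{bmatrix} \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \bar{\mathbf{x}}_w = \tilde{\mathbf{P}}\,\bar{\mathbf{x}}_w$$

Now, the homogeneous vector $\tilde{\mathbf{x}}_s$ is a 4D vector and must be normalized with respect to its third entry to obtain inhomogeneous image pixels:

$$\bar{\mathbf{x}}_s = \tilde{\mathbf{x}}_s / z_s = (x_s / z_s,\; y_s / z_s,\; 1,\; 1 / z_s)^\top$$

Note that the fourth component of the normalized 4D vector is the inverse depth. If the inverse depth is known, a 3D point can be retrieved from its pixel coordinates via $\tilde{\mathbf{x}}_w = \tilde{\mathbf{P}}^{-1}\,\bar{\mathbf{x}}_s$ and subsequent normalization of $\tilde{\mathbf{x}}_w$ with respect to its fourth entry.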
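
The round trip from a world point to its pixel plus inverse depth and back can be sketched as follows. The intrinsics, pose, and world point are example values assumed for illustration.

```python
import numpy as np

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])

K4 = np.eye(4)
K4[:3, :3] = K                      # [K 0; 0^T 1]
T = np.eye(4)
T[:3, :3] = R
T[:3, 3] = t                        # [R t; 0^T 1]
P_tilde = K4 @ T                    # full-rank 4x4 projection matrix

x_w = np.array([0.5, -0.25, 0.0])
x_tilde_s = P_tilde @ np.append(x_w, 1.0)
x_bar_s = x_tilde_s / x_tilde_s[2]  # (pixel x, pixel y, 1, inverse depth)

# Back-projection: invert P_tilde, then normalize by the fourth entry.
x_tilde_w = np.linalg.inv(P_tilde) @ x_bar_s
x_w_rec = x_tilde_w[:3] / x_tilde_w[3]
```

With this pose the point has depth 2 in camera coordinates, so `x_bar_s` is `(520, 140, 1, 0.5)`, and `x_w_rec` reproduces the original world point.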

Lens Distortion

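The assumption of linear projection is violated in practice due to the properties of the camera lens, which introduces distortions. Both radial and tangential distortion effects can be modeled relatively easily: Let $x = x_c / z_c$, $y = y_c / z_c$ and $r^2 = x^2 + y^2$. The distorted point is obtained as:

$$\mathbf{x}' = \underbrace{(1 + k_1 r^2 + k_2 r^4) \begin{pmatrix} x \\ y \end{pmatrix}}_{\text{Radial Distortion}} + \underbrace{\begin{pmatrix} 2 k_3\,x y + k_4\,(r^2 + 2x^2) \\ 2 k_4\,x y + k_3\,(r^2 + 2y^2) \end{pmatrix}}_{\text{Tangential Distortion}}$$

$$\mathbf{x}_s = \begin{pmatrix} f_x\,x' + c_x \\ f_y\,y' + c_y \end{pmatrix}$$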

Images can be undistorted such that the perspective projection model applies. More complex distortion models must be used for wide-angle lenses (e.g., fisheye).
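
The forward distortion model can be sketched directly from the formula above. The function name `distort_and_project`, the coefficient values, and the intrinsics are example assumptions; undistortion (the inverse mapping) is not covered here.

```python
import numpy as np

def distort_and_project(x_c, k, fx, fy, cx, cy):
    """Apply radial (k[0], k[1]) and tangential (k[2], k[3]) distortion, then project."""
    x, y = x_c[0] / x_c[2], x_c[1] / x_c[2]      # normalized image coordinates
    r2 = x * x + y * y
    radial = 1.0 + k[0] * r2 + k[1] * r2**2
    x_d = radial * x + 2.0 * k[2] * x * y + k[3] * (r2 + 2.0 * x * x)
    y_d = radial * y + 2.0 * k[3] * x * y + k[2] * (r2 + 2.0 * y * y)
    return np.array([fx * x_d + cx, fy * y_d + cy])

x_c = np.array([0.5, -0.25, 2.0])
intr = dict(fx=800.0, fy=800.0, cx=320.0, cy=240.0)

ideal = distort_and_project(x_c, [0.0, 0.0, 0.0, 0.0], **intr)    # pinhole result
barrel = distort_and_project(x_c, [-0.2, 0.0, 0.0, 0.0], **intr)  # k_1 < 0

center = np.array([intr["cx"], intr["cy"]])
```

With all coefficients set to zero the model reduces to the plain perspective projection, giving (520, 140). A negative $k_1$ pulls the point toward the principal point, which corresponds to barrel distortion.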