Course Note: Tübingen-CVML-Image formation 1
Image Formation 1: Primitives and Transformations
Credits: Tübingen Machine Learning | Computer Vision - Andreas Geiger
Computer Vision - Lecture 2.1 (Image Formation: Primitives and Transformations) - YouTube
1.1 Primitives and Transformations
- Geometric primitives are the basic building blocks used to describe 3D shapes
- Introduction of points, lines and planes
- Introduction of the most basic transformations
2D points
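2D points can be written in inhomogeneous coordinates as
$$\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2$$
or in homogeneous coordinates as
$$\tilde{\mathbf{x}} = \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{w} \end{pmatrix} \in \mathbb{P}^2$$
where $\mathbb{P}^2 = \mathbb{R}^3 \setminus \{(0, 0, 0)\}$ is called projective space. Homogeneous vectors that differ only by scale are considered equivalent, so they are defined only up to scale.
An inhomogeneous vector $\mathbf{x}$ is converted to a homogeneous vector $\tilde{\mathbf{x}}$ as follows:
$$\tilde{\mathbf{x}} = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} \mathbf{x} \\ 1 \end{pmatrix} = \bar{\mathbf{x}}$$
with augmented vector $\bar{\mathbf{x}}$. To convert in the opposite direction, we divide by $\tilde{w}$: $\bar{\mathbf{x}} = \tilde{\mathbf{x}} / \tilde{w}$.
Homogeneous points whose last element is $\tilde{w} = 0$ are called ideal points or points at infinity. These points cannot be represented with inhomogeneous coordinates.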
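The conversion between the two representations can be sketched in a few lines of NumPy. The helper names `to_homogeneous` and `from_homogeneous` are illustrative and not part of the lecture; the sketch assumes points are stored as 1D arrays.

```python
import numpy as np

def to_homogeneous(x):
    """Append a 1 to an inhomogeneous point, giving the augmented vector."""
    return np.append(np.asarray(x, dtype=float), 1.0)

def from_homogeneous(x_tilde):
    """Divide by the last element; ideal points (last element 0) are rejected."""
    x_tilde = np.asarray(x_tilde, dtype=float)
    w = x_tilde[-1]
    if np.isclose(w, 0.0):
        raise ValueError("ideal point has no inhomogeneous representation")
    return x_tilde[:-1] / w

x = np.array([2.0, 3.0])
x_bar = to_homogeneous(x)            # augmented vector (2, 3, 1)
x_scaled = 5.0 * x_bar               # same point, different scale
x_back = from_homogeneous(x_scaled)  # back to (2, 3)
```

Scaling the homogeneous vector by any non-zero factor leaves the recovered inhomogeneous point unchanged, which is the "defined only up to scale" property in practice.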
2D lines
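2D lines can also be expressed using homogeneous coordinates $\tilde{\mathbf{l}} = (a, b, c)^\top$:
$$\{\bar{\mathbf{x}} \mid \tilde{\mathbf{l}}^\top \bar{\mathbf{x}} = 0\} \quad\Leftrightarrow\quad \{x, y \mid ax + by + c = 0\}$$
We can normalize $\tilde{\mathbf{l}}$ so that $\tilde{\mathbf{l}} = (n_x, n_y, d)^\top = (\mathbf{n}, d)^\top$ with $\|\mathbf{n}\|_2 = 1$. In this case, $\mathbf{n}$ is the normal vector perpendicular to the line and $d$ is its distance to the origin.
An exception is the line at infinity $\tilde{\mathbf{l}}_\infty = (0, 0, 1)^\top$, which passes through all ideal points.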
Cross Product
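The cross product can be expressed as the product of a skew-symmetric matrix and a vector:
$$\mathbf{a} \times \mathbf{b} = [\mathbf{a}]_\times \mathbf{b} = \begin{bmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{bmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{pmatrix}$$
Remark: In these notes, square brackets denote matrices and round brackets denote vectors.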
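As a quick sanity check, the skew-symmetric form can be compared against NumPy's built-in cross product. The function name `skew` is an illustrative choice.

```python
import numpy as np

def skew(a):
    """Return the 3x3 skew-symmetric matrix [a]_x so that skew(a) @ b == a x b."""
    a1, a2, a3 = a
    return np.array([[0.0, -a3,  a2],
                     [ a3, 0.0, -a1],
                     [-a2,  a1, 0.0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

via_matrix = skew(a) @ b     # matrix-vector form
via_numpy = np.cross(a, b)   # reference result
```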
2D Line Arithmetic
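In homogeneous coordinates, the intersection of two lines is given by:
$$\tilde{\mathbf{x}} = \tilde{\mathbf{l}}_1 \times \tilde{\mathbf{l}}_2$$
Similarly, the line joining two points can be compactly written as:
$$\tilde{\mathbf{l}} = \bar{\mathbf{x}}_1 \times \bar{\mathbf{x}}_2$$
The symbol $\times$ denotes the cross product.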
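A small numerical example may help here. The lines and points below are arbitrary choices for illustration; lines are stored as $(a, b, c)$ with $ax + by + c = 0$.

```python
import numpy as np

# Two lines a*x + b*y + c = 0, stored as (a, b, c)
l1 = np.array([1.0, 0.0, -1.0])   # x = 1
l2 = np.array([0.0, 1.0, -2.0])   # y = 2

x_tilde = np.cross(l1, l2)             # homogeneous intersection point
x_inhom = x_tilde[:2] / x_tilde[2]     # inhomogeneous point (1, 2)

# Line joining two points given as augmented vectors
p1 = np.array([0.0, 0.0, 1.0])
p2 = np.array([1.0, 1.0, 1.0])
l_join = np.cross(p1, p2)              # proportional to (-1, 1, 0), i.e. y = x

# Parallel lines intersect in an ideal point (last element 0)
l3 = np.array([1.0, 0.0, -3.0])        # x = 3, parallel to l1
x_ideal = np.cross(l1, l3)
```

The last case shows why homogeneous coordinates are convenient: parallel lines need no special handling, their intersection is simply a point at infinity.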
2D Conics
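More complex algebraic objects can be represented using polynomial homogeneous equations. For example, conic sections (arising as the intersection of a plane and a 3D cone) can be written using quadric equations with a symmetric $3 \times 3$ matrix $\mathbf{Q}$:
$$\{\bar{\mathbf{x}} \mid \bar{\mathbf{x}}^\top \mathbf{Q}\, \bar{\mathbf{x}} = 0\}$$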
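As an illustration, the unit circle $x^2 + y^2 - 1 = 0$ corresponds to a conic with matrix $\mathbf{Q} = \mathrm{diag}(1, 1, -1)$. The sketch below evaluates the quadratic form $\bar{\mathbf{x}}^\top \mathbf{Q} \bar{\mathbf{x}}$ for a point on and a point off the circle; the helper name `conic_residual` is hypothetical.

```python
import numpy as np

# Unit circle x^2 + y^2 - 1 = 0 written as a conic matrix
Q = np.diag([1.0, 1.0, -1.0])

def conic_residual(Q, x_h):
    """Evaluate x^T Q x for a homogeneous 2D point; zero means the point is on the conic."""
    x_h = np.asarray(x_h, dtype=float)
    return float(x_h @ Q @ x_h)

on_circle = np.array([np.cos(0.3), np.sin(0.3), 1.0])
off_circle = np.array([2.0, 0.0, 1.0])

r_on = conic_residual(Q, on_circle)            # approximately 0
r_on_scaled = conic_residual(Q, 4 * on_circle) # still approximately 0 (scale invariance)
r_off = conic_residual(Q, off_circle)          # 3.0
```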
3D Points
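3D points can be written in inhomogeneous coordinates as
$$\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \in \mathbb{R}^3$$
or in homogeneous coordinates as
$$\tilde{\mathbf{x}} = \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{z} \\ \tilde{w} \end{pmatrix} \in \mathbb{P}^3$$
with projective space $\mathbb{P}^3 = \mathbb{R}^4 \setminus \{(0, 0, 0, 0)\}$.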
3D Planes
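3D planes can also be represented in homogeneous coordinates $\tilde{\mathbf{m}} = (a, b, c, d)^\top$:
$$\{\bar{\mathbf{x}} \mid \tilde{\mathbf{m}}^\top \bar{\mathbf{x}} = 0\} \quad\Leftrightarrow\quad \{x, y, z \mid ax + by + cz + d = 0\}$$
Again, we can normalize $\tilde{\mathbf{m}}$ so that $\tilde{\mathbf{m}} = (n_x, n_y, n_z, d)^\top = (\mathbf{n}, d)^\top$ with $\|\mathbf{n}\|_2 = 1$. In this case, $\mathbf{n}$ is the normal perpendicular to the plane and $d$ is its distance to the origin.
An exception is the plane at infinity $\tilde{\mathbf{m}}_\infty = (0, 0, 0, 1)^\top$, which passes through all ideal points (points at infinity), i.e., those with $\tilde{w} = 0$.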
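The normalization can be sketched as follows. The function names are illustrative, and the sign convention is an assumption of this sketch: for a normalized plane $(\mathbf{n}, d)$, the expression $\mathbf{n}^\top \mathbf{x} + d$ is the signed distance of a point $\mathbf{x}$ from the plane, so $|d|$ is the distance of the origin.

```python
import numpy as np

def normalize_plane(m):
    """Scale a plane (a, b, c, d) so that its normal (a, b, c) has unit length."""
    m = np.asarray(m, dtype=float)
    n_norm = np.linalg.norm(m[:3])
    if np.isclose(n_norm, 0.0):
        raise ValueError("the plane at infinity cannot be normalized")
    return m / n_norm

def signed_distance(m, x):
    """Signed distance of the 3D point x from plane m (positive on the normal side)."""
    m = normalize_plane(m)
    return float(m[:3] @ np.asarray(x, dtype=float) + m[3])

m = normalize_plane([0.0, 0.0, 2.0, -4.0])                     # plane z = 2 -> (0, 0, 1, -2)
dist = signed_distance([0.0, 0.0, 2.0, -4.0], [0.0, 0.0, 5.0]) # 3.0
```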
3D lines
3D lines are less elegant than either 2D lines or 3D planes. One possible representation is to express points on a line as a linear combination of two points $\mathbf{p}$ and $\mathbf{q}$ on the line:
$$\{\mathbf{x} \mid \mathbf{x} = (1 - \lambda)\,\mathbf{p} + \lambda\,\mathbf{q} \;\wedge\; \lambda \in \mathbb{R}\}$$
However, this representation uses 6 parameters for 4 degrees of freedom.
Alternative minimal representations are the two-plane parameterization or Plücker coordinates. See Szeliski, Chapter 2.1.
3D Quadrics
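The 3D analog of 2D conics is a quadric surface, defined by a symmetric $4 \times 4$ matrix $\mathbf{Q}$:
$$\{\bar{\mathbf{x}} \mid \bar{\mathbf{x}}^\top \mathbf{Q}\, \bar{\mathbf{x}} = 0\}$$
Quadrics are useful in the study of multi-view geometry and also serve as modeling primitives (spheres, ellipsoids, cylinders).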
2D Transformations
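Translation: (2D Translation of the Input, 2 DF)
$$\mathbf{x}' = \mathbf{x} + \mathbf{t} \quad\Leftrightarrow\quad \bar{\mathbf{x}}' = \begin{bmatrix} \mathbf{I} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \bar{\mathbf{x}}$$
- Using homogeneous representations allows transformations to be chained and inverted via matrix multiplication and matrix inversion
- Augmented vectors $\bar{\mathbf{x}}$ can always be replaced by general homogeneous ones $\tilde{\mathbf{x}}$
Euclidean: (2D Translation + 2D Rotation, 3 DF)
$$\mathbf{x}' = \mathbf{R}\mathbf{x} + \mathbf{t} \quad\Leftrightarrow\quad \bar{\mathbf{x}}' = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \bar{\mathbf{x}}$$
- $\mathbf{R} \in SO(2)$ is an orthonormal rotation matrix with $\mathbf{R}\mathbf{R}^\top = \mathbf{I}$ and $\det(\mathbf{R}) = 1$
- Euclidean transformations preserve Euclidean distances
Similarity: (2D Translation + Scaled 2D Rotation, 4 DF)
$$\mathbf{x}' = s\mathbf{R}\mathbf{x} + \mathbf{t} \quad\Leftrightarrow\quad \bar{\mathbf{x}}' = \begin{bmatrix} s\mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \bar{\mathbf{x}}$$
- $\mathbf{R} \in SO(2)$ is a rotation matrix and $s$ is an arbitrary scale factor
- The similarity transform preserves angles between lines
Affine: (2D Linear Transformation, 6 DF)
$$\mathbf{x}' = \mathbf{A}\mathbf{x} + \mathbf{t} \quad\Leftrightarrow\quad \bar{\mathbf{x}}' = \begin{bmatrix} \mathbf{A} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \bar{\mathbf{x}}$$
- $\mathbf{A} \in \mathbb{R}^{2 \times 2}$ is an arbitrary $2 \times 2$ matrix
- Parallel lines remain parallel under affine transformations
Perspective: (Homography, 8 DF)
$$\tilde{\mathbf{x}}' = \tilde{\mathbf{H}}\,\tilde{\mathbf{x}} \qquad \left(\bar{\mathbf{x}} = \tfrac{1}{\tilde{w}}\,\tilde{\mathbf{x}}\right)$$
- $\tilde{\mathbf{H}} \in \mathbb{R}^{3 \times 3}$ is an arbitrary homogeneous $3 \times 3$ matrix (specified up to scale)
- Perspective transformations preserve straight lines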
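To illustrate why the homogeneous form is convenient, the following sketch builds two transformations as $3 \times 3$ matrices, chains them by matrix multiplication and inverts the result. The helper `similarity_2d` and the chosen angles and offsets are illustrative assumptions.

```python
import numpy as np

def similarity_2d(theta, t, s=1.0):
    """3x3 homogeneous matrix [[s*R, t], [0, 1]]; s = 1 gives a Euclidean transform."""
    c, si = np.cos(theta), np.sin(theta)
    R = np.array([[c, -si],
                  [si,  c]])
    T = np.eye(3)
    T[:2, :2] = s * R
    T[:2, 2] = t
    return T

T1 = similarity_2d(np.pi / 2, [1.0, 0.0])     # rotate by 90 degrees, then shift by (1, 0)
T2 = similarity_2d(0.0, [0.0, 0.0], s=2.0)    # pure scaling by 2
T = T2 @ T1                                   # chained: T1 is applied first, then T2

x_bar = np.array([1.0, 0.0, 1.0])             # point (1, 0) as augmented vector
y_bar = T @ x_bar                             # (2, 2, 1)
x_rec = np.linalg.inv(T) @ y_bar              # inverse transform recovers (1, 0, 1)
```

Note the order of the product: the transformation applied first stands rightmost.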
2D Transformations on Co-Vectors
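Considering any perspective 2D transformation
$$\tilde{\mathbf{x}}' = \tilde{\mathbf{H}}\,\tilde{\mathbf{x}}$$
the transformed 2D line equation is given by:
$$\tilde{\mathbf{l}}'^\top \tilde{\mathbf{x}}' = \tilde{\mathbf{l}}'^\top \tilde{\mathbf{H}}\,\tilde{\mathbf{x}} = (\tilde{\mathbf{H}}^\top \tilde{\mathbf{l}}')^\top \tilde{\mathbf{x}} = \tilde{\mathbf{l}}^\top \tilde{\mathbf{x}} = 0$$
Therefore, we have:
$$\tilde{\mathbf{l}}' = \tilde{\mathbf{H}}^{-\top}\, \tilde{\mathbf{l}}$$
Thus, the action of a projective transformation on a co-vector such as a 2D line or 3D normal can be represented by the transposed inverse of the matrix.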
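The rule can be checked numerically: points on a line, mapped by a homography, should lie on the line mapped by the transposed inverse. The matrix `H` and the line below are arbitrary example values.

```python
import numpy as np

# An arbitrary invertible homography (example values)
H = np.array([[1.0,   0.2,   3.0],
              [0.1,   1.5,  -1.0],
              [0.001, 0.002, 1.0]])

l = np.array([1.0, -1.0, 0.5])        # line x - y + 0.5 = 0
x1 = np.array([0.0, 0.5, 1.0])        # two points on that line
x2 = np.array([1.0, 1.5, 1.0])

l_prime = np.linalg.inv(H).T @ l      # lines transform with the transposed inverse
x1_prime = H @ x1                     # points transform with H itself
x2_prime = H @ x2

# Incidence is preserved: both residuals are zero up to round-off
res1 = l_prime @ x1_prime
res2 = l_prime @ x2_prime

# Cross-check: the join of the transformed points is the same line up to scale
l_join = np.cross(x1_prime, x2_prime)
```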
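Overview of 2D Transformations
| Transformation | Matrix | DF | Preserves |
|---|---|---|---|
| translation | $[\mathbf{I} \;\; \mathbf{t}]_{2 \times 3}$ | 2 | orientation |
| rigid (Euclidean) | $[\mathbf{R} \;\; \mathbf{t}]_{2 \times 3}$ | 3 | lengths |
| similarity | $[s\mathbf{R} \;\; \mathbf{t}]_{2 \times 3}$ | 4 | angles |
| affine | $[\mathbf{A}]_{2 \times 3}$ | 6 | parallelism |
| projective | $[\tilde{\mathbf{H}}]_{3 \times 3}$ | 8 | straight lines |
- Transformations form a nested set of groups (closed under composition and inversion)
- Interpret them as restricted $3 \times 3$ matrices operating on 2D homogeneous coordinates
- Each transformation also preserves the properties listed in the rows below it (e.g., a similarity also preserves parallelism and straight lines)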
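Overview of 3D Transformations
| Transformation | Matrix | DF | Preserves |
|---|---|---|---|
| translation | $[\mathbf{I} \;\; \mathbf{t}]_{3 \times 4}$ | 3 | orientation |
| rigid (Euclidean) | $[\mathbf{R} \;\; \mathbf{t}]_{3 \times 4}$ | 6 | lengths |
| similarity | $[s\mathbf{R} \;\; \mathbf{t}]_{3 \times 4}$ | 7 | angles |
| affine | $[\mathbf{A}]_{3 \times 4}$ | 12 | parallelism |
| projective | $[\tilde{\mathbf{H}}]_{4 \times 4}$ | 15 | straight lines |
- 3D transformations are defined analogously to 2D transformations
- $3 \times 4$ matrices are extended with a fourth row for homogeneous transforms
- Each transformation also preserves the properties listed in the rows below it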
Direct Linear Transform for Homography Estimation
Q: How can we estimate a homography from a set of 2D correspondences?
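Let $\mathcal{X} = \{(\tilde{\mathbf{x}}_i, \tilde{\mathbf{x}}'_i)\}_{i=1}^{N}$ denote a set of $N$ 2D-to-2D correspondences related by $\tilde{\mathbf{x}}'_i = \tilde{\mathbf{H}}\,\tilde{\mathbf{x}}_i$. As the correspondence vectors are homogeneous, they have the same direction but may differ in magnitude, so the relation can be expressed as $\tilde{\mathbf{x}}'_i \times \tilde{\mathbf{H}}\,\tilde{\mathbf{x}}_i = \mathbf{0}$.
Using $\tilde{\mathbf{h}}_k^\top$ to denote the $k$-th row of $\tilde{\mathbf{H}}$, this can be rewritten as a linear equation in $\tilde{\mathbf{h}}$:
$$\underbrace{\begin{bmatrix} \mathbf{0}^\top & -\tilde{w}'_i\,\tilde{\mathbf{x}}_i^\top & \tilde{y}'_i\,\tilde{\mathbf{x}}_i^\top \\ \tilde{w}'_i\,\tilde{\mathbf{x}}_i^\top & \mathbf{0}^\top & -\tilde{x}'_i\,\tilde{\mathbf{x}}_i^\top \\ -\tilde{y}'_i\,\tilde{\mathbf{x}}_i^\top & \tilde{x}'_i\,\tilde{\mathbf{x}}_i^\top & \mathbf{0}^\top \end{bmatrix}}_{\mathbf{A}_i} \underbrace{\begin{pmatrix} \tilde{\mathbf{h}}_1 \\ \tilde{\mathbf{h}}_2 \\ \tilde{\mathbf{h}}_3 \end{pmatrix}}_{\tilde{\mathbf{h}}} = \mathbf{0}$$
The last row is linearly dependent (up to scale) on the first two and can be dropped.
Each point correspondence yields two equations. Stacking all equations into a $2N \times 9$ matrix $\mathbf{A}$ leads to the following constrained least squares system:
$$\tilde{\mathbf{h}}^* = \operatorname*{argmin}_{\tilde{\mathbf{h}}} \; \|\mathbf{A}\tilde{\mathbf{h}}\|_2^2 + \lambda\,(\|\tilde{\mathbf{h}}\|_2^2 - 1) = \operatorname*{argmin}_{\tilde{\mathbf{h}}} \; \tilde{\mathbf{h}}^\top \mathbf{A}^\top \mathbf{A}\,\tilde{\mathbf{h}} + \lambda\,(\tilde{\mathbf{h}}^\top \tilde{\mathbf{h}} - 1)$$
where we have fixed $\|\tilde{\mathbf{h}}\|_2^2 = 1$, as $\tilde{\mathbf{H}}$ is homogeneous (i.e., defined only up to scale) and the trivial solution $\tilde{\mathbf{h}} = \mathbf{0}$ is not of interest. The solution is the singular vector corresponding to the smallest singular value of $\mathbf{A}$, i.e., the last column of $\mathbf{V}$ when decomposing $\mathbf{A} = \mathbf{U}\mathbf{D}\mathbf{V}^\top$. The resulting algorithm is called the Direct Linear Transformation.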
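A minimal sketch of this procedure with NumPy's SVD is given below. It assumes noise-free correspondences given as rows of homogeneous 3-vectors and at least four points in general position; the function name `estimate_homography_dlt` is illustrative, and practical refinements such as coordinate normalization or outlier handling are left out.

```python
import numpy as np

def estimate_homography_dlt(x, x_prime):
    """Estimate H (up to scale) with x_prime ~ H @ x from (N, 3) homogeneous points, N >= 4."""
    zero = np.zeros(3)
    rows = []
    for xs, (xp, yp, wp) in zip(x, x_prime):
        # Two independent rows of the constraint x_prime x (H @ x) = 0
        rows.append(np.concatenate([zero, -wp * xs, yp * xs]))
        rows.append(np.concatenate([wp * xs, zero, -xp * xs]))
    A = np.asarray(rows)                 # shape (2N, 9)
    _, _, Vt = np.linalg.svd(A)
    h = Vt[-1]                           # right singular vector of the smallest singular value
    return h.reshape(3, 3)               # rows of H stacked in h, unit norm

# Synthetic check with a known homography (example values)
H_true = np.array([[ 1.2,   0.1,   5.0],
                   [-0.2,   0.9,   3.0],
                   [ 0.001, 0.002, 1.0]])
pts = np.array([[0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1], [2, 3, 1]], dtype=float)
pts_prime = (H_true @ pts.T).T

H_est = estimate_homography_dlt(pts, pts_prime)
H_est = H_est / H_est[2, 2]              # fix the free scale for comparison
```

Because the estimate is only defined up to scale (and sign), it has to be normalized, here by its bottom-right entry, before it can be compared with the ground truth.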
1.2 Geometric Image Formation
Origins of the Pinhole Camera
- In a physical pinhole camera, the image is projected upside down onto the image plane, which is located behind the focal point
- When modeling perspective projection, we assume the image plane in front
- Both models are equivalent, with appropriate change of image coordinates
Projection Models
- Orthographic Projection
- Perspective Projection
Orthographic Projection
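Orthographic projection of a 3D point $\mathbf{x}_c \in \mathbb{R}^3$ to pixel coordinates $\mathbf{x}_s \in \mathbb{R}^2$:
- The x and y axes of the camera and image coordinate systems are shared
- Light rays are parallel to the z-axis of the camera coordinate system
- During projection, the z-coordinate is dropped, x and y remain the same
An orthographic projection simply drops the z component of the 3D point in camera coordinates $\mathbf{x}_c$ to obtain the corresponding 2D point on the image plane (= screen) $\mathbf{x}_s$:
$$\mathbf{x}_s = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \mathbf{x}_c \quad\Leftrightarrow\quad \bar{\mathbf{x}}_s = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \bar{\mathbf{x}}_c$$
Orthography is exact for telecentric lenses and an approximation for telephoto lenses. After projection, the distance of the 3D point from the image can't be recovered.
In practice, world coordinates (e.g., measured in meters) must be scaled to fit the image sensor (measured in pixels), which gives scaled orthography:
$$\mathbf{x}_s = \begin{bmatrix} s & 0 & 0 \\ 0 & s & 0 \end{bmatrix} \mathbf{x}_c \quad\Leftrightarrow\quad \bar{\mathbf{x}}_s = \begin{bmatrix} s & 0 & 0 & 0 \\ 0 & s & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \bar{\mathbf{x}}_c$$
Here, the unit for $s$ is px/m or px/mm, converting metric 3D points into pixels.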
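Scaled orthography can be sketched as a single matrix product on augmented vectors. The function name and the scale of 1000 px/m are illustrative assumptions.

```python
import numpy as np

def scaled_orthographic(x_c, s=1.0):
    """Project a 3D point in camera coordinates by dropping z and scaling x, y by s."""
    P = np.array([[s,   0.0, 0.0, 0.0],
                  [0.0, s,   0.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    x_bar_c = np.append(np.asarray(x_c, dtype=float), 1.0)  # augmented 3D point
    x_bar_s = P @ x_bar_c                                   # augmented 2D point, last entry 1
    return x_bar_s[:2]

# Two points that differ only in depth (metres), scale assumed to be 1000 px/m
near = scaled_orthographic([0.1, 0.2, 1.0], s=1000.0)
far = scaled_orthographic([0.1, 0.2, 50.0], s=1000.0)
```

Both points land on the same pixel, which illustrates that depth is lost under orthographic projection.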
Perspective Projection
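Perspective projection of a 3D point $\mathbf{x}_c \in \mathbb{R}^3$ to pixel coordinates $\mathbf{x}_s \in \mathbb{R}^2$:
- The light ray passes through the camera center, the pixel $\mathbf{x}_s$ and the point $\mathbf{x}_c$
- Convention: the principal axis (orthogonal to the image plane) aligns with the z-axis
In perspective projection, 3D points in camera coordinates are mapped to the image plane by dividing them by their z component and multiplying with the focal length:
$$\begin{pmatrix} x_s \\ y_s \end{pmatrix} = \begin{pmatrix} f\,x_c / z_c \\ f\,y_c / z_c \end{pmatrix} \quad\Leftrightarrow\quad \tilde{\mathbf{x}}_s = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \bar{\mathbf{x}}_c$$
Note that this projection is linear when using homogeneous coordinates. After the projection, it is not possible to recover the distance of the 3D point from the image.
- To ensure positive pixel coordinates, a principal point offset $\mathbf{c}$ is usually added
- This moves the image coordinate system to the corner of the image plane
The complete perspective projection model is given by:
$$\begin{pmatrix} x_s \\ y_s \end{pmatrix} = \begin{pmatrix} f_x\,x_c / z_c + s\,y_c / z_c + c_x \\ f_y\,y_c / z_c + c_y \end{pmatrix} \quad\Leftrightarrow\quad \tilde{\mathbf{x}}_s = \begin{bmatrix} f_x & s & c_x & 0 \\ 0 & f_y & c_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \bar{\mathbf{x}}_c$$
- The left $3 \times 3$ submatrix of the projection matrix is called the calibration matrix $\mathbf{K}$
- The parameters of $\mathbf{K}$ are called camera intrinsics (as opposed to extrinsic pose)
- Here, $f_x$ and $f_y$ are independent, allowing for different pixel aspect ratios
- The skew $s$ arises when the sensor is not mounted perpendicular to the optical axis
- In practice, we often set $f_x = f_y$ and $s = 0$, but model $\mathbf{c} = (c_x, c_y)^\top$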
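The complete model can be sketched as follows, assuming focal lengths `fx`, `fy`, principal point `(cx, cy)` and skew `s` as intrinsics. The function name and the numeric values are hypothetical.

```python
import numpy as np

def project_perspective(x_c, fx, fy, cx, cy, s=0.0):
    """Project a 3D point in camera coordinates to pixel coordinates with a pinhole model."""
    K = np.array([[fx,  s,   cx],
                  [0.0, fy,  cy],
                  [0.0, 0.0, 1.0]])
    P = np.hstack([K, np.zeros((3, 1))])                     # 3x4 matrix [K | 0]
    x_tilde = P @ np.append(np.asarray(x_c, dtype=float), 1.0)
    return x_tilde[:2] / x_tilde[2]                          # divide by the third entry (z_c)

# Example intrinsics (illustrative values, in pixels)
px_near = project_perspective([0.5, -0.25, 2.0], fx=800, fy=800, cx=320, cy=240)
px_far = project_perspective([1.0, -0.5, 4.0], fx=800, fy=800, cx=320, cy=240)
# Both points lie on the same ray through the camera center and map to (520, 140)
```

The matrix product itself is linear; the only non-linear step is the final division by the third homogeneous coordinate.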
Chaining Transformations
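Let $\mathbf{K}$ be the calibration matrix (intrinsics) and $[\mathbf{R} \;\; \mathbf{t}]$ the camera pose (extrinsics).
We chain both transformations to project a point in world coordinates to the image:
$$\tilde{\mathbf{x}}_s = \begin{bmatrix} \mathbf{K} & \mathbf{0} \end{bmatrix} \bar{\mathbf{x}}_c = \begin{bmatrix} \mathbf{K} & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \bar{\mathbf{x}}_w = \mathbf{K} \begin{bmatrix} \mathbf{R} & \mathbf{t} \end{bmatrix} \bar{\mathbf{x}}_w = \mathbf{P}\,\bar{\mathbf{x}}_w$$
The $3 \times 4$ projection matrix $\mathbf{P}$ can be pre-computed.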
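The chained projection can be sketched by pre-computing a $3 \times 4$ matrix from intrinsics and pose and comparing it with the two-step computation. All numeric values below are example assumptions.

```python
import numpy as np

# Intrinsics (illustrative values)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Extrinsics: 90 degree rotation about the z-axis, then a shift of 5 units along z
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([0.0, 0.0, 5.0])

P = K @ np.hstack([R, t[:, None]])          # pre-computed 3x4 projection matrix

x_w = np.array([1.0, 0.0, 0.0])             # point in world coordinates
x_tilde_s = P @ np.append(x_w, 1.0)         # homogeneous pixel
x_s = x_tilde_s[:2] / x_tilde_s[2]          # pixel coordinates (320, 400)

# Two-step version: world -> camera, then camera -> image
x_c = R @ x_w + t
x_s_two_step = (K @ x_c)[:2] / x_c[2]
```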
Full Rank Representation
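It is sometimes preferable to use a full rank $4 \times 4$ projection matrix:
$$\tilde{\mathbf{x}}_s = \begin{bmatrix} \mathbf{K} & \mathbf{0} \\ \mathbf{0}^\top & 1 \end{bmatrix} \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \bar{\mathbf{x}}_w = \tilde{\mathbf{P}}\,\bar{\mathbf{x}}_w$$
Now, the homogeneous vector $\tilde{\mathbf{x}}_s$ is a 4D vector and must be normalized with respect to its 3rd entry to obtain inhomogeneous image pixels:
$$\bar{\mathbf{x}}_s = \tilde{\mathbf{x}}_s / z_s = (x_s / z_s,\; y_s / z_s,\; 1,\; 1 / z_s)^\top$$
Note that the 4th component of the inhomogeneous 4D vector is the inverse depth. If the inverse depth is known, a 3D point can be retrieved from its pixel coordinates via $\tilde{\mathbf{x}}_w = \tilde{\mathbf{P}}^{-1}\,\bar{\mathbf{x}}_s$ and subsequent normalization of $\tilde{\mathbf{x}}_w$ with respect to its 4th entry.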
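The round trip from a world point to a pixel with inverse depth and back can be sketched as follows. The intrinsics, pose and point are example values chosen for illustration.

```python
import numpy as np

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([0.0, 0.0, 5.0])

# Build the two 4x4 factors and the full rank projection matrix
K4 = np.eye(4)
K4[:3, :3] = K
T4 = np.eye(4)
T4[:3, :3] = R
T4[:3, 3] = t
P_full = K4 @ T4

x_w = np.array([1.0, 0.0, 0.0])
x_tilde_s = P_full @ np.append(x_w, 1.0)     # 4D homogeneous vector
x_bar_s = x_tilde_s / x_tilde_s[2]           # (x_s, y_s, 1, inverse depth)

# Back-projection: possible because P_full is invertible and inverse depth is kept
x_tilde_w = np.linalg.inv(P_full) @ x_bar_s
x_w_rec = x_tilde_w[:3] / x_tilde_w[3]       # normalize by the 4th entry
```

With the $3 \times 4$ matrix this inversion is not possible, because the depth information is discarded during projection.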
Lens Distortion
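The assumption of linear projection is violated in practice due to the properties of the camera lens, which introduces distortions. Both radial and tangential distortion effects can be modeled relatively easily: Let $x = x_c / z_c$, $y = y_c / z_c$ and $r^2 = x^2 + y^2$. The distorted point is obtained as:
$$\mathbf{x}' = (1 + \kappa_1 r^2 + \kappa_2 r^4) \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 2\kappa_3\,xy + \kappa_4 (r^2 + 2x^2) \\ 2\kappa_4\,xy + \kappa_3 (r^2 + 2y^2) \end{pmatrix}, \qquad \mathbf{x}_s = \begin{pmatrix} f_x\,x' + c_x \\ f_y\,y' + c_y \end{pmatrix}$$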
Images can be undistorted such that the perspective projection model applies. More complex distortion models must be used for wide-angle lenses (e.g., fisheye).
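A minimal sketch of a radial plus tangential distortion model is shown below. It assumes a common parameterization with two radial coefficients (`k1`, `k2`) and two tangential coefficients (`k3`, `k4`) applied to normalized image coordinates; the function name, the coefficient ordering and all numeric values are assumptions of this sketch and may differ from a particular calibration toolbox.

```python
import numpy as np

def project_with_distortion(x_c, kappa, fx, fy, cx, cy):
    """Project a camera-frame 3D point with radial (k1, k2) and tangential (k3, k4) distortion."""
    k1, k2, k3, k4 = kappa
    x, y = x_c[0] / x_c[2], x_c[1] / x_c[2]       # normalized image coordinates
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2
    x_d = radial * x + 2.0 * k3 * x * y + k4 * (r2 + 2.0 * x * x)
    y_d = radial * y + 2.0 * k4 * x * y + k3 * (r2 + 2.0 * y * y)
    return np.array([fx * x_d + cx, fy * y_d + cy])

point = np.array([0.5, -0.25, 2.0])
intr = dict(fx=800.0, fy=800.0, cx=320.0, cy=240.0)
c = np.array([intr["cx"], intr["cy"]])

ideal = project_with_distortion(point, (0.0, 0.0, 0.0, 0.0), **intr)    # plain pinhole result
barrel = project_with_distortion(point, (-0.2, 0.0, 0.0, 0.0), **intr)  # negative k1
```

With all coefficients set to zero the model reduces to the pinhole projection, and a negative first radial coefficient pulls the pixel toward the principal point (barrel distortion).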