Image processing

Dr. Oliver Fleischmann | Inka Krischke,

3D vision made easy

Too little computing power, high prices and insufficient accuracy have held back earlier 3D systems in many applications. Today, however, the technology is finding its way into more and more industries thanks to greater computing power and high-resolution sensors.

© The Imaging Source Europe

Whether it is the smart industrial robot of the Industry 4.0 era that uses three-dimensional information to orient itself in space, the reverse vending machine that counts the bottles in a drinks crate, or the surface inspection system that detects the smallest material defects: three-dimensional information about the environment and objects, acquired with modern 3D sensor technology, is the future of many industrial applications. Various technologies for capturing three-dimensional information from a scene are now on the market. A fundamental distinction is made between active and passive methods. Active methods, such as 'lidar' (light detection and ranging) or time-of-flight sensors, use their own light source to determine distance information; passive methods rely solely on the image information captured by cameras, much like depth perception in the human visual system.

All methods have their advantages and disadvantages: while time-of-flight systems generally require little computing power and impose hardly any restrictions on scene structure, the spatial resolution of current time-of-flight systems is rather low, at most 800 × 600 pixels, and their use outdoors is severely limited by the sun's infrared radiation. Passive multi-view stereo systems offer very high spatial resolution thanks to the image sensors now available, but they require considerable computing effort and struggle with weakly textured or highly repetitive scenes. Nevertheless, today's computing resources and optional pattern projectors make real-time operation of stereo systems at high spatial and depth resolution possible. This is precisely why they are among the most popular and versatile systems for acquiring 3D information.

(Multi-view) stereo systems consist of two or more cameras that observe a scene simultaneously. If the cameras are calibrated and the image points of an object point can be found in the individual camera views, the three-dimensional object point can be reconstructed from these image points by triangulation. The achievable accuracy depends on the distance between the cameras (baseline), the vergence angle between them, the pixel size of the sensor and the focal length. Calibration and correspondence search, the two essential components, already place high demands on the underlying image processing algorithms.
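The influence of baseline, focal length and pixel size can be made concrete with the usual first-order error estimate for a rectified, parallel stereo pair. From Z = f · B / d (depth Z, focal length f in pixels, baseline B, disparity d) it follows that |dZ/dd| = Z² / (f · B), so a disparity error Δd maps to a depth error of roughly ΔZ ≈ Z² · Δd / (f · B). The Python sketch below evaluates this estimate. It assumes parallel optical axes (the vergence angle is ignored), and the function name and parameters are purely illustrative, not part of any SDK.

```python
def depth_resolution(z_m, baseline_m, focal_mm, pixel_um, disparity_step_px=1.0):
    """First-order depth uncertainty of a rectified, parallel stereo pair.

    Uses dZ ~ Z^2 * dd / (f_px * B), derived from Z = f_px * B / d.
    Vergence is ignored (parallel optical axes assumed).
    """
    # Focal length in pixels = focal length / pixel pitch (both converted to metres).
    focal_px = (focal_mm * 1e-3) / (pixel_um * 1e-6)
    return (z_m ** 2) * disparity_step_px / (focal_px * baseline_m)


# Hypothetical setup: 8 mm lens, 3.45 um pixels, 10 cm baseline, object at 1 m.
print(f"{depth_resolution(1.0, 0.10, 8.0, 3.45) * 1000:.1f} mm per pixel of disparity")
```

For these hypothetical values the estimate is about 4.3 mm per pixel of disparity. Doubling the working distance quadruples the error, while doubling the baseline or the focal length halves it; sub-pixel disparity estimation corresponds to a smaller disparity_step_px.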


Stereo systems in real-time use

Example detection results for a calibration pattern in different positions and orientations. The internal and external parameters of the cameras are determined from the detected points of the calibration pattern.

© The Imaging Source Europe

Camera calibration determines the positions and orientations of the individual cameras (external parameters) as well as their focal lengths, principal points and distortion parameters (internal parameters), which are strongly influenced by the optics used.
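How the external and internal parameters enter the imaging process can be illustrated with the common pinhole camera model plus radial lens distortion. The following is a sketch under stated assumptions: it uses a two-coefficient radial distortion model (k1, k2), ignores tangential distortion, and the function names are hypothetical. Real calibration toolkits may use additional distortion terms.

```python
def world_to_camera(point_world, rotation, translation):
    """External parameters: p_cam = R * p_world + t (R as a 3x3 nested list)."""
    return tuple(
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def project_point(point_cam, fx, fy, cx, cy, k1=0.0, k2=0.0):
    """Internal parameters: pinhole projection with two-term radial distortion."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    # Normalized image coordinates on the plane z = 1.
    xn, yn = x / z, y / z
    # Radial distortion grows with the squared distance from the optical axis.
    r2 = xn * xn + yn * yn
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    # Focal lengths (in pixels) scale the coordinates, the principal point (cx, cy) shifts them.
    return fx * xn * factor + cx, fy * yn * factor + cy


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
p_cam = world_to_camera((0.1, 0.0, 0.0), identity, (0.0, 0.0, 1.0))
print(project_point(p_cam, fx=1000.0, fy=1000.0, cx=640.0, cy=480.0, k1=0.1))
```

Calibration inverts this model: given many known 3D points and their observed pixel positions, it searches for the R, t, fx, fy, cx, cy and distortion coefficients that best reproduce the observations.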

Camera calibration is usually carried out with two-dimensional calibration patterns such as checkerboard or dot patterns, whose distinctive points can be detected as easily and unambiguously as possible. The dimensions of the pattern, such as the distances between the distinctive points, are known exactly. Image sequences of the pattern in varying positions and orientations are then acquired. Image processing algorithms detect the distinctive points of the calibration pattern in each image; corner and edge detectors, for example, serve as the basis for simple checkerboard patterns, and blob detectors for dot patterns. This yields a large number of 3D-2D correspondences between the calibration object and the individual images. An optimization process then estimates the camera parameters from these correspondences.
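The exactly known pattern dimensions supply the 3D half of each 3D-2D correspondence. A minimal sketch, assuming a planar checkerboard whose inner corners lie on the plane z = 0 of the pattern coordinate system. The detected 2D corners would come from a corner detector (OpenCV's cv2.findChessboardCorners is one widely used option) and are simply passed in here as hypothetical input; paired object and image points of this kind are what an optimizer such as OpenCV's cv2.calibrateCamera consumes. The helper names are illustrative.

```python
def checkerboard_object_points(cols, rows, square_size_mm):
    """3D coordinates of the inner corners of a planar checkerboard (pattern plane z = 0)."""
    return [
        (c * square_size_mm, r * square_size_mm, 0.0)
        for r in range(rows)
        for c in range(cols)
    ]


def build_correspondences(object_points, detected_corners):
    """Pair known 3D pattern points with the 2D detections of one image (same order assumed)."""
    if len(object_points) != len(detected_corners):
        raise ValueError("every pattern point needs exactly one detected image point")
    return list(zip(object_points, detected_corners))


# Hypothetical board with 9 x 6 inner corners and 25 mm squares.
board = checkerboard_object_points(9, 6, 25.0)
print(len(board), board[0], board[-1])
```

Because the same object points are reused for every image, each new pattern pose adds another set of correspondences at no extra modelling cost; only the detected image points change.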

While calibration is carried out only once (provided the camera parameters do not change during operation of the system), the much more computationally demanding correspondence search between the views must be carried out for every image in order to determine the 3D information of the scene. In a stereo system, correspondences are determined between two views. As a pre-processing step, the images are usually undistorted using the calibrated distortion parameters. For an image point in the reference view, the point in the target view that depicts the same object point is then searched for. Assuming the 'Lambertian reflectance model', i.e. diffusely reflecting surfaces, the local neighborhoods of corresponding pixels in the two views should be very similar. For a given similarity measure (normalized cross-correlation is a common choice), similarity values are computed between the local neighborhood of a point in the reference view and candidate neighborhoods in the target view.
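Normalized cross-correlation subtracts the mean of each neighborhood and normalizes by its spread, so it tolerates brightness offsets and gain differences between the two cameras. A minimal Python sketch for two equally sized patches given as flat intensity lists (the function name is illustrative):

```python
import math


def ncc(patch_a, patch_b):
    """Normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    if len(patch_a) != len(patch_b) or not patch_a:
        raise ValueError("patches must be non-empty and of equal size")
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    # Zero-mean versions make the measure invariant to brightness offsets.
    da = [a - mean_a for a in patch_a]
    db = [b - mean_b for b in patch_b]
    num = sum(x * y for x, y in zip(da, db))
    # Dividing by both standard deviations makes it invariant to gain (contrast).
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # Constant (textureless) patches have zero variance: NCC is undefined, report 0.
    return num / den if den > 0 else 0.0


print(ncc([1, 2, 3, 4], [13, 16, 19, 22]))  # same pattern, different brightness and contrast
```

A score of 1 means the patches agree up to brightness and contrast. For constant, textureless patches the measure is undefined, which is one reason weakly textured scenes are difficult; returning 0 in that case is a design choice of this sketch.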

Corresponding points

Not every point in the target view is a possible candidate: geometrically, possible corresponding points in the undistorted views lie on a straight line, the so-called epipolar line. Corresponding points therefore only need to be searched for along this line. To speed up the search further, the undistorted input images are often rectified: they are transformed so that, for every point in the reference view, the epipolar line in the target view has the same vertical coordinate as the reference point. For a point in the reference view, corresponding points then only need to be searched for along the same image row in the target view. While the complexity of the search remains the same, the prior rectification enables a more efficient implementation of the correspondence search. If the minimum and maximum working distances in the scene are also known, the search along the epipolar line can be restricted further and thus accelerated.
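For a rectified pair with focal length f (in pixels) and baseline B, depth and disparity are related by d = f · B / Z, so known minimum and maximum working distances translate directly into a disparity interval. The sketch below assumes that the left camera is the reference view, so a reference pixel in column x can only correspond to target columns x - d on the same row; the function names are hypothetical.

```python
import math


def disparity_search_range(focal_px, baseline_m, z_min_m, z_max_m):
    """Integer disparity interval [d_min, d_max] covering working distances z_min..z_max."""
    if not 0 < z_min_m <= z_max_m:
        raise ValueError("require 0 < z_min <= z_max")
    # Disparity is inversely proportional to depth: far limit -> small d, near limit -> large d.
    d_min = math.floor(focal_px * baseline_m / z_max_m)
    d_max = math.ceil(focal_px * baseline_m / z_min_m)
    return d_min, d_max


def candidate_columns(x_ref, d_min, d_max, width):
    """Target-image columns on the same row that can still match reference column x_ref."""
    return [x_ref - d for d in range(d_min, d_max + 1) if 0 <= x_ref - d < width]


d_min, d_max = disparity_search_range(1000.0, 0.10, 0.5, 2.0)
print(d_min, d_max)
```

With these hypothetical values (f = 1000 px, B = 10 cm, working distance 0.5 to 2 m) only disparities from 50 to 200 need to be tested, instead of every column of the row.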

Top: original image pair from a stereo system by The Imaging Source. Bottom: rectified image pair. For a point in the reference view (left), corresponding points can only lie along the same image row in the target view (right).

© The Imaging Source Europe

Once all candidate neighborhoods along the epipolar line have been compared with the reference neighborhood, local stereo algorithms usually select the candidate with the highest similarity as the final correspondence. After the correspondence search, the distance information for each pixel in the reference view of a rectified stereo system (provided an unambiguous correspondence was found) is available in the form of disparity, i.e. the offset in pixels along the corresponding image row. The result is referred to as a disparity image or disparity map.
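A local, winner-takes-all matcher along one rectified row can be sketched as follows. It is a deliberately simple illustration (one-dimensional windows, NCC as similarity measure, no sub-pixel refinement or consistency checks), not the algorithm of any particular SDK. It assumes the left image is the reference view and d_min >= 0, and it repeats a compact NCC so the block is self-contained; all names are illustrative.

```python
import math


def ncc(patch_a, patch_b):
    """Normalized cross-correlation of two equally sized patches (0.0 for flat patches)."""
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    da = [a - mean_a for a in patch_a]
    db = [b - mean_b for b in patch_b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den if den > 0 else 0.0


def match_scanline(ref_row, tgt_row, half_window, d_min, d_max):
    """Winner-takes-all disparity per pixel of one rectified row (None near the borders)."""
    width = len(ref_row)
    disparities = [None] * width
    for x in range(half_window, width - half_window):
        ref_patch = ref_row[x - half_window : x + half_window + 1]
        best_d, best_score = None, -math.inf
        for d in range(d_min, d_max + 1):
            xt = x - d  # candidate column in the target row (left image = reference)
            if xt - half_window < 0:
                break  # larger disparities would leave the image on the left
            tgt_patch = tgt_row[xt - half_window : xt + half_window + 1]
            score = ncc(ref_patch, tgt_patch)
            if score > best_score:  # keep the most similar candidate
                best_d, best_score = d, score
        disparities[x] = best_d
    return disparities


# Toy rows: a short textured segment, shifted 3 pixels to the left in the target row.
ref = [0.0] * 20
tgt = [0.0] * 20
ref[10:13] = [9.0, 2.0, 5.0]
tgt[7:10] = [9.0, 2.0, 5.0]
print(match_scanline(ref, tgt, half_window=1, d_min=0, d_max=6)[9:14])
```

In the textured segment (columns 10 to 12) the sketch recovers disparity 3. In the flat regions every candidate scores 0 and simply the first one tested wins, a small-scale example of the ambiguity that arises in homogeneous scenes.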

Using the calibrated internal and external parameters, the disparity can in turn be converted into metric distance information. If this distance is calculated for every point at which a disparity could be estimated, the result is a three-dimensional model in the form of a so-called point cloud. In homogeneous or highly repetitive scenes, local stereo methods can produce incorrect estimates, because several points with equivalent similarity may exist in the target view. Global stereo methods, which impose additional constraints on the final disparity map, for example that neighboring depth values should be as similar as possible, can provide a remedy, but they are also significantly more computationally demanding. It is often easier to use a projector to project artificial structures onto the object and thus obtain unambiguous correspondences (projected texture stereo). The projector does not need to be calibrated relative to the cameras, as it serves only as a source of artificial texture.
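For a rectified pair, the conversion uses Z = f · B / d; the lateral coordinates then follow from the pinhole model as X = (u - cx) · Z / f and Y = (v - cy) · Z / f, where (cx, cy) is the principal point of the rectified reference camera. The sketch assumes equal focal lengths in x and y and a disparity map given as nested lists with None for pixels without a valid correspondence; the names are illustrative. Libraries such as OpenCV offer equivalent functionality, for example cv2.reprojectImageTo3D.

```python
def disparity_to_points(disparity, focal_px, baseline_m, cx, cy):
    """Reproject a disparity map (rows of floats, None = no match) into 3D points in metres."""
    points = []
    for v, row in enumerate(disparity):
        for u, d in enumerate(row):
            if d is None or d <= 0:
                continue  # no valid correspondence, hence no 3D point
            z = focal_px * baseline_m / d  # depth from disparity: Z = f * B / d
            x = (u - cx) * z / focal_px    # lateral position from the pinhole model
            y = (v - cy) * z / focal_px
            points.append((x, y, z))
    return points


# Hypothetical 2 x 2 disparity map; f = 1000 px, B = 10 cm, principal point at the origin.
cloud = disparity_to_points([[None, 50.0], [100.0, 0.0]], 1000.0, 0.10, 0.0, 0.0)
print(cloud)
```

Pixels without a disparity simply produce no point, which is why point clouds from stereo systems typically contain holes in textureless or occluded regions.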

Acceleration through GPUs

Disparity estimation and the final point cloud, visualized with an SDK from The Imaging Source: the disparity map relative to the reference view (left), a 3D view of the textured point cloud (center) and the color-coded point cloud (right).

© The Imaging Source Europe

When high frame rates and high spatial resolution must be guaranteed at the same time, modern GPUs can significantly accelerate the computation of 3D information.

For the final integration of a stereo system into existing environments, The Imaging Source relies on modular solutions: either the company's own C++ SDK with optional GPU acceleration, in conjunction with cameras from The Imaging Source's portfolio, or 'Halcon' from MVTec can serve as the environment for obtaining the 3D information. While the SDK makes it possible to calibrate stereo systems and to acquire and view 3D information with little effort, Halcon offers further options such as hand-eye calibration for integration into robot systems, or advanced algorithms such as the registration of CAD models with the acquired 3D data.

Author:
Dr. Oliver Fleischmann is Project Manager at The Imaging Source in Bremen.
