IDS Imaging Development Systems
AI-based pick and place
Traditional pick-and-place tasks in industry are still largely carried out by humans. Artificial intelligence can change this decisively: robotic processes in the machine vision environment can also be automated using AI.
Industrial automation is undergoing a paradigm shift across all sectors: on the one hand, there are improved computing and enormously high data transmission capacities, which are also exponentially driving the further development of artificial intelligence. On the other hand, these developments are coming up against factors such as a shortage of skilled workers and companies' increasing reshoring activities.
Since AI can also automate processes that could otherwise only be handled and controlled through human decisions, conditions are favorable for faster development of automation technology across all industries. In view of the volatility of the markets, this acceleration is also necessary. To date, automation in the industrial environment has been synonymous with recurring processes carried out with the aid of robotics, such as pick-and-place.
There is hardly an industry in which classic delta robots do not have their place in the production process. Vision systems are usually a central component of such pick-and-place applications. Their task is to clearly identify the products fed onto the conveyor belt based on specific parameters and thus support handling by the robot.
However, some industries still cannot manage without employees at this point: for example, they check for the smallest deviations, which image processing without AI cannot detect, or can detect only to a limited extent. It is often the last 10% that requires human interaction for quality assurance. Employees therefore pre-sort products or correct errors that have occurred, a physically and mentally demanding task that should not be underestimated.
Countering the shortage of skilled workers
This is where artificial intelligence can help: It is fast, robust, works almost error-free and does not take breaks. This means that it is already superior to humans where work processes need to be carried out continuously with consistently high performance and quality. So why not use AI in the machine vision environment in conjunction with robotics? The example of a "smart gripping process" may serve to illustrate this: In this process, different disciplines have to work together optimally.
For example, if the task is to sort products of different sizes and/or shapes, different materials or varying quality using robots, they must not only be gripped, but also identified, analyzed and localized beforehand. This is often not only very time-consuming with rule-based image processing systems, especially in small batch sizes, but also hardly economically feasible. This is different with AI-based inference, where industrial robots are trained with the necessary skills and product knowledge of a skilled worker.
The AI is able to draw conclusions from new facts that it derives from existing data. It is no longer even necessary to "reinvent the wheel" for the individual subtasks - it is sufficient to have the right products working together effectively across disciplines as a "smart robot vision system".
Data processing of the present
The so-called "Vision Guided Robot" - also known as Eyebot - is an example of how pick-and-place tasks can be intelligently automated: Thanks to a smart camera system with integrated AI-based image processing, the compact embedded vision platform does not require a PC. The vision solution can perform everything from image acquisition, image analysis and processing to the control of industrial production machines. The smart gripping process in a production line then works as follows: Objects are randomly scattered on a conveyor belt.
They are recognized, selected and then, for example, placed in packaging or passed on in the correct position to a processing or analysis station. The basis for the automated application described here was a PC-based solution developed by the software company urobots for detecting objects and controlling robots. The AI model trained by urobots is able to recognize the position and orientation of objects in camera images. Grip coordinates for the robot are determined from this data.
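As a rough illustration of how grip coordinates can follow from a detected position and orientation, the Python sketch below maps a detection in image pixels to a planar robot pose. It is not the urobots implementation: the names Detection, GripPose and to_grip_pose, the millimetre-per-pixel scale and the belt origin are all assumptions made for this example, and it presumes a camera looking straight down on a flat belt with no lens distortion.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object in image coordinates (hypothetical structure)."""
    x_px: float       # object centre, pixels
    y_px: float
    angle_deg: float  # in-plane orientation as seen in the image
    score: float      # detection confidence, 0..1


@dataclass
class GripPose:
    """Planar grip target in robot coordinates (hypothetical structure)."""
    x_mm: float
    y_mm: float
    angle_deg: float


def to_grip_pose(det, mm_per_px, origin_mm, camera_rotation_deg=0.0):
    """Map an image detection to a planar robot pose.

    Assumption: scale, rotation and offset are enough, i.e. the camera
    looks straight down and perspective effects can be ignored.
    """
    theta = math.radians(camera_rotation_deg)
    # Scale pixels to millimetres in the camera frame.
    x = det.x_px * mm_per_px
    y = det.y_px * mm_per_px
    # Rotate into the robot frame and shift by the calibrated origin.
    x_r = origin_mm[0] + x * math.cos(theta) - y * math.sin(theta)
    y_r = origin_mm[1] + x * math.sin(theta) + y * math.cos(theta)
    # Wrap the gripper angle into the range [-180, 180).
    angle = (det.angle_deg + camera_rotation_deg + 180.0) % 360.0 - 180.0
    return GripPose(x_r, y_r, angle)


# Example call with made-up calibration values.
pose = to_grip_pose(Detection(100.0, 200.0, 30.0, 0.9),
                    mm_per_px=0.5, origin_mm=(10.0, 20.0))
print(pose)
```

In a real cell the scale, rotation and origin would come from a camera-to-robot calibration, and belt movement between image capture and gripping would also have to be compensated.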
In the next step, this solution was ported directly to the AI-based embedded vision system from IDS Imaging Development Systems. Consisting of an intelligent camera plus a comprehensive software environment with easy-to-use tools, the complete system also enables users without AI expertise to adapt various use cases themselves. The machine vision tasks are processed "on device", i.e. on the camera itself. "Apps", which can be loaded and executed on the camera as easily as on a smartphone, determine the tasks.
Whenever conditions change in production, such as the lighting, the appearance of objects or the addition of new object types, the user should be able to take action themselves. In addition, the overall system should function through direct communication between the device components, so that a PC with all the integration tasks and the interface connection can be dispensed with.
The technical approach
A trained neural network identifies all objects in the image and also detects their position and orientation. Thanks to AI, this is possible not only for rigid objects that always look the same, but also when there is a lot of natural variance, such as with food, plants or other flexible objects. This results in very stable position and orientation recognition of the objects. In this example, urobots trained the network for the customer using its own software and was able to easily convert it into a format compatible with the "IDS NXT inference camera" using a tool provided by IDS.
Each layer of the CNN became a fully specified node descriptor, and these descriptors were concatenated into a complete list that represents the CNN in binary form. A CNN accelerator based on an FPGA core, specially developed for the camera, can then execute this universal CNN format in an optimized manner. The vision app developed by urobots calculates optimal grip positions from the detection data. But the task was not yet complete. In addition to the results of "what", "where" and "how" to grip, it was also necessary to establish direct communication between the camera and the robot.
This task in particular should not be underestimated. After all, this is often where it is decided how much time, money and manpower need to be invested in a solution. In this specific application, an XMLRPC-based network protocol was implemented in the camera's vision app in order to pass the specific work instructions directly to the robot. The final AI vision application detects objects in around 200 ms and achieves a positional accuracy of ±2°. The camera's neural network localizes the objects and determines their exact position. Based on this image information, the robot can pick and place the objects independently.
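To give an idea of what passing a work instruction over XML-RPC can look like, the following Python sketch uses only the standard library modules xmlrpc.server and xmlrpc.client. It is an illustration under assumptions, not the protocol used in the application: the method name pick_and_place, its parameters and the reply structure are invented for this example, and a local stub stands in for the robot controller.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def pick_and_place(x_mm, y_mm, angle_deg):
    """Hypothetical robot-side handler that accepts one work instruction."""
    # A real controller would queue a motion here; the stub only echoes.
    return {"accepted": True, "target": [x_mm, y_mm, angle_deg]}


def start_robot_stub():
    """Start a stand-in for the robot controller on a free local port."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(pick_and_place)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_instruction(url, x_mm, y_mm, angle_deg):
    """Camera-side call that passes one grip pose to the robot."""
    with ServerProxy(url) as robot:
        return robot.pick_and_place(x_mm, y_mm, angle_deg)


# Demo: the "camera" sends one made-up grip pose to the stub "robot".
server = start_robot_stub()
host, port = server.server_address
reply = send_instruction(f"http://{host}:{port}", 60.0, 120.0, 30.0)
print(reply)
server.shutdown()
server.server_close()
```

XML-RPC carries each call as a small XML document over HTTP, so both sides need little more than an HTTP stack and an XML parser, which suits a direct camera-to-robot link without a PC in between.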
Lower development and follow-up costs
It is not just the artificial intelligence that makes this use case smart. The fact that the solution works completely without an additional PC is also interesting in two respects. First, because the camera itself generates image processing results and does not just deliver images, the PC hardware and the associated infrastructure can be dispensed with. This ultimately reduces the acquisition and maintenance costs of the system. Second, it is often just as important that process decisions are made directly on site and "in time".
This allows subsequent processes to be executed faster and without added latency, which in some cases enables the clock rate to be increased. Another aspect relates to development costs. AI vision, or rather the training of a neural network, works very differently from classic, rule-based image processing, which changes how image processing tasks are approached and handled. The quality of the results is no longer the product of program code developed manually by image processing experts and application developers.
This means that if an application can be solved using AI, a comprehensive and user-friendly software environment can save the time and cost of involving the relevant experts. Each user group can train a neural network, design the appropriate vision app and run it on the camera.















