Sabo Mobile IT
LLM as the basis for voice control
Sabo Mobile IT relies on large language models (LLMs) for its Sabot voice control system. Thomas Sykora talks about the technology behind it and about data protection in an industrial environment.
Why is voice control still so rare in industry?
Thomas Sykora: Because the technology wasn't ready until recently. Only the new large language models (LLMs), such as BERT from Google or GPT from OpenAI, are able to correctly interpret what the speaker intends. Operational safety and data protection were further barriers that had to be overcome. And last but not least, users had to be prepared for it. To this day, talking to machines is sometimes perceived as annoying: keyboard input doesn't bother anyone, but everyone around me has to listen in on my conversation with the machine. Yet working at a computer screen was once considered particularly stressful, which led to special allowances. This shows what habituation can do. Good voice assistants in our phones and other digital assistants are in the process of changing our attitudes. Young smartphone users show where the journey is heading: why type texts when there is Siri?
How do voice control systems differ?
First-generation voice control systems could respond to only a few spoken commands: a number, yes or no. Alexa came onto the market in 2015 and, like Siri and Cortana, responded to programmed phrases; AI wasn't quite ready yet. Even common chatbots still essentially search for text matches against prepared sentences and questions and return memorized answers. This often works, but quickly reaches its limits with special cases, complicated contexts and specific problems. The advent of LLMs was therefore a quantum leap, and it is the opportunity for the current generation of assistants, which includes Sabot as well as the latest Alexa. While Alexa has to serve an extremely broad range of dialogues, Sabot is focused on one machine and its context. Sabot must therefore be particularly reliable and secure. Large internet companies sell the data we give them simply by using their services. That is prohibited in an industrial environment: Sabot must not pass on what it hears without express permission. That's why we use a wake word: it is recognized locally on the device, and only after that are voice signals forwarded over the Internet.
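The wake-word principle described above can be illustrated with a minimal sketch. It is not Sabot's implementation: the class `WakeWordGate`, the `forward` callback and the `session_frames` parameter are hypothetical, and audio frames are represented as short text snippets so the example stays self-contained. A real device would run an on-device keyword-spotting model on raw audio.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WakeWordGate:
    """Hypothetical gate: nothing leaves the device until the wake word is heard."""
    wake_word: str
    forward: Callable[[str], None]  # hypothetical uplink, e.g. to a cloud service
    session_frames: int = 3         # frames forwarded after each wake-word detection
    _remaining: int = field(default=0, init=False)

    def detect_locally(self, frame: str) -> bool:
        # Stand-in for an on-device keyword spotter (simple substring match).
        return self.wake_word in frame.lower()

    def process(self, frame: str) -> None:
        if self._remaining > 0:
            # An active session: forward the frame and count it down.
            self.forward(frame)
            self._remaining -= 1
        elif self.detect_locally(frame):
            # Wake word recognized on the device: open a short session.
            # The wake-word frame itself is not forwarded in this sketch.
            self._remaining = self.session_frames
        # Any other frame is discarded and never leaves the device.


sent = []
gate = WakeWordGate("sabot", sent.append, session_frames=2)
for frame in ["private chat", "hey sabot", "start spindle", "speed 1200", "more private talk"]:
    gate.process(frame)
print(sent)  # ['start spindle', 'speed 1200']
```

The design choice worth noting is that the default path is "discard": only an explicit local detection opens a short, bounded forwarding window, so conversations before and after the command stay on the device.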
You rely on an AI-controlled chatbot. Can you explain in more detail how it works?
Sabot is a modular platform: the machine-control interface, the acoustics and the conversion of the voice signal into text differ depending on the task and can therefore be implemented in several ways. We use AI in several places, for example to suppress background noise. The core, however, is an AI that recognizes the operator's intentions and, if necessary, clarifies and confirms them through a dialogue. Sabot builds on existing LLMs, which are supplemented as required by the machine's domain knowledge and a specially trained neural network. Sabot is a self-learning system that can monitor inputs, optimize itself, warn of operating errors and make recommendations for action. We were already conducting research in this direction with Fraunhofer IPA in 2021.
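To make "recognize the intention, then clarify and confirm it through a dialogue" more concrete, here is a minimal sketch under stated assumptions. The intent names, the `INTENTS` catalogue, the confidence threshold and the keyword-based `recognize` function are all hypothetical; `recognize` merely stands in for the LLM-based recognizer and is not how Sabot works internally.

```python
import re
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Intent:
    name: str
    slots: Dict[str, str]
    confidence: float


# Hypothetical intent catalogue: required parameters and whether the
# operator must confirm before the command reaches the machine.
INTENTS = {
    "set_speed": {"required": ["rpm"], "confirm": True},
    "stop_machine": {"required": [], "confirm": False},
    "show_status": {"required": [], "confirm": False},
}


def recognize(text: str) -> Intent:
    """Toy stand-in for an LLM-based intent recognizer (assumption)."""
    t = text.lower()
    if "stop" in t:
        return Intent("stop_machine", {}, 0.95)
    if "speed" in t:
        m = re.search(r"(\d+)", t)
        return Intent("set_speed", {"rpm": m.group(1)} if m else {}, 0.9)
    if "status" in t:
        return Intent("show_status", {}, 0.8)
    return Intent("unknown", {}, 0.0)


def next_step(intent: Intent, confirmed: bool = False,
              threshold: float = 0.7) -> Tuple[str, str]:
    """Return (action, message) where action is 'clarify', 'confirm' or 'execute'."""
    if intent.confidence < threshold or intent.name not in INTENTS:
        return ("clarify", "Sorry, what should the machine do?")
    spec = INTENTS[intent.name]
    missing = [s for s in spec["required"] if s not in intent.slots]
    if missing:
        # Ask for the first missing parameter instead of guessing it.
        return ("clarify", f"Please specify {missing[0]}.")
    if spec["confirm"] and not confirmed:
        return ("confirm", f"Execute {intent.name} with {intent.slots}?")
    return ("execute", intent.name)


print(next_step(recognize("set speed")))          # ('clarify', 'Please specify rpm.')
print(next_step(recognize("set speed to 1200")))  # ('confirm', "Execute set_speed with {'rpm': '1200'}?")
print(next_step(recognize("stop now")))           # ('execute', 'stop_machine')
```

In this sketch a stop command is executed without a confirmation round, on the assumption that delaying a stop would be less safe than acting on it, while parameter-changing commands are read back to the operator first.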
How much effort is involved in equipping an existing machine with voice control? What technical requirements need to be met?
That depends on the machine interface that Sabot accesses. It must be adapted or replaced, and Sabo Mobile IT GmbH and Grossenbacher Systeme AG offer two solutions for this. One is retrofit hardware that implements the audio part, the intelligence and the communication with the control system. However, the future-oriented and cost-optimized solution is a new controller that already has the necessary hardware and meets future cybersecurity requirements. In both cases, the machine and the connected process must be captured in the AI and tested, which Sabo Mobile IT offers as a service. This service is based on training and testing tools currently being developed in research projects, which will make it more effective and benefit customers.
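The two hardware paths can be pictured as two implementations behind one common interface. The following is a minimal sketch under explicit assumptions: `MachineInterface`, `RetrofitGateway`, `NativeController`, `CommandSpec` and the register number are hypothetical names for illustration, and the "captured machine model" is reduced to a table of permitted commands with value ranges that every command is checked against before it is written.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CommandSpec:
    """One entry of a hypothetical captured machine model: a permitted range."""
    min_value: float
    max_value: float


class MachineInterface(ABC):
    """Common interface the voice system talks to, whatever the hardware path."""

    def __init__(self, model: Dict[str, CommandSpec]):
        self.model = model

    def send(self, command: str, value: float) -> None:
        # Validate against the machine model before anything reaches the control system.
        spec = self.model.get(command)
        if spec is None:
            raise ValueError(f"unknown command: {command}")
        if not spec.min_value <= value <= spec.max_value:
            raise ValueError(f"{command}={value} outside [{spec.min_value}, {spec.max_value}]")
        self._write(command, value)

    @abstractmethod
    def _write(self, command: str, value: float) -> None:
        ...


class RetrofitGateway(MachineInterface):
    """Hypothetical retrofit box mapping commands onto an existing controller's registers."""

    def __init__(self, model: Dict[str, CommandSpec], register_map: Dict[str, int]):
        super().__init__(model)
        self.register_map = register_map
        self.writes: List[Tuple[int, float]] = []  # stands in for the real bus traffic

    def _write(self, command: str, value: float) -> None:
        self.writes.append((self.register_map[command], value))


class NativeController(MachineInterface):
    """Hypothetical new controller with the voice hardware built in."""

    def __init__(self, model: Dict[str, CommandSpec]):
        super().__init__(model)
        self.log: List[Tuple[str, float]] = []

    def _write(self, command: str, value: float) -> None:
        self.log.append((command, value))


model = {"spindle_rpm": CommandSpec(0, 3000)}
gateway = RetrofitGateway(model, {"spindle_rpm": 40001})
gateway.send("spindle_rpm", 1200)
print(gateway.writes)  # [(40001, 1200)]
```

Keeping the range check in the shared base class means both paths enforce the same machine model, so the tests run against that model apply regardless of whether a retrofit box or a new controller is used.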