Big Data
Machine learning - getting started!
Google, Facebook, Netflix and Amazon are already using machine learning. The possibilities of these technologies are also widely available in the industrial environment - but companies are hesitating. There is no time to lose.
Data is the oil or raw material of the 21st century! This statement has been circulating through the automation industry for many years. However, if you look at the spread of practical data mining applications or applications with integrated machine learning in mechanical and plant engineering, the oil rush - at least in this industry - seems to have failed to materialize so far. This cannot be due to a lack of suitable use cases, as there are many optimization and prediction problems in the development and operation of machines and systems.
This raises the question of whether such optimized products are not (yet) being accepted by customers, or whether companies are questioning the economic viability of such solutions. Or are tangible implementation and organizational issues simply acting as a brake?
In the hardware-driven industry, software, including software development, often has a tough time of it. The practical application of data analysis techniques and machine learning is yet another step out of the comfort zone. This two-part series of articles is intended to help answer fundamental questions about a still young discipline in engineering.
The status quo
The starting point is the question: Where do we really stand with analytics technology? This question can be answered by looking at the current Gartner Hype Cycle 2016 from a strategic perspective: machine learning for specific use cases has reached the 'Peak of Inflated Expectations'. This means that the first phase of the topic is coming to an end. At the peak of inflated expectations, so much is projected into the topic due to the great attention it generates that disappointment over teething troubles in implementation and unfulfillable promises is practically pre-programmed. Only after a subsequent phase of disillusionment does it become clear whether the new technology will establish itself or disappear again.
However, companies that believe they can wait and see before jumping on the bandwagon should not be too sure. According to Gartner, the estimated time frame for the establishment of machine learning is only two to five years. It is therefore important not to lose any time and to answer the questions with regard to a specific company and product application without becoming disillusioned.
What is actually feasible?
Figure 1: The project steps of an analysis project: data preparation, modeling using data mining and machine learning, interpretation of the interim results and evaluation and validation of the models.
© Engineering office lean-digital-transformation
Machine learning is a technology that undoubtedly enables innovative and highly optimized products and processes. This is demonstrated by companies such as Google, Facebook, Netflix and Amazon. Their business models and products would simply not work without machine learning. We also come into contact with machine learning on a daily basis and use functions such as facial recognition for photos, image search or virtual assistants such as Siri and Cortana. These technologies can also be used on a broad scale in an industrial environment, although they have not yet been implemented as consistently or are still in the implementation phase. In general, there are two main fields of application: knowledge discovery and predictive analytics. The aim of industrial knowledge discovery is to identify causal relationships from data (Figure 1) in order to derive specific measures.
The most frequently asked questions in this context are probably: "Which process parameters lead to high quality?" And: "What is the cause of the recently increased reject rate?". Established, interpretable methods - such as learning and analyzing decision trees and multivariate regression methods (with integrated feature selection) - can be used here. For example, a cooperation between AMS Engineering and SCCH was able to identify an excessively thick oil film residue on the starting material as the cause of sub-optimal weld seams in a laser welding process. The integration of technical experts into the knowledge discovery process is essential for the success of these methods.
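As a minimal sketch of how an interpretable method can point to the parameter behind a quality problem, the following Python example searches for the single split (the root of a decision tree) that best separates good parts from rejects. The parameter names, the measurements and the helper functions `gini` and `best_split` are invented for illustration and are not taken from the AMS Engineering and SCCH project.

```python
from typing import Dict, List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels (0 = reject, 1 = good part)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows: List[Dict[str, float]], labels: List[int]) -> Tuple[str, float, float]:
    """Return (parameter, threshold, weighted impurity) of the best single split."""
    best = ("", 0.0, float("inf"))
    for name in rows[0]:
        values = sorted({r[name] for r in rows})
        # Candidate thresholds: midpoints between neighbouring observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[name] <= t]
            right = [y for r, y in zip(rows, labels) if r[name] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best[2]:
                best = (name, t, score)
    return best


# Invented process data: one dictionary per welded part.
rows = [
    {"oil_film_um": 1.0, "laser_power_kw": 3.0, "feed_mm_s": 50.0},
    {"oil_film_um": 1.2, "laser_power_kw": 3.4, "feed_mm_s": 55.0},
    {"oil_film_um": 0.8, "laser_power_kw": 3.2, "feed_mm_s": 60.0},
    {"oil_film_um": 1.1, "laser_power_kw": 3.6, "feed_mm_s": 52.0},
    {"oil_film_um": 2.9, "laser_power_kw": 3.1, "feed_mm_s": 58.0},
    {"oil_film_um": 3.4, "laser_power_kw": 3.5, "feed_mm_s": 51.0},
    {"oil_film_um": 3.1, "laser_power_kw": 3.3, "feed_mm_s": 57.0},
    {"oil_film_um": 2.7, "laser_power_kw": 3.0, "feed_mm_s": 54.0},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = good weld seam, 0 = reject

parameter, threshold, impurity = best_split(rows, labels)
print(f"Best separating parameter: {parameter} (threshold {threshold:.2f}, impurity {impurity:.3f})")
```

In this invented data set, only the oil film thickness separates good seams from rejects cleanly. Whether such a split reflects a real cause still has to be judged by the technical experts.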
Machine learning methods can be used to evaluate the quality of the models, but not the value of the correlations found and the hidden potential for improvement. On the one hand, users can ensure that the data is prepared correctly and that the algorithms are applied correctly. On the other hand, automatic differentiation between causal relationships and often irrelevant statistical relationships is very difficult and often requires manual intervention. The latter is still a very active area of research and the methods developed to date are not yet sufficiently stable for productive industrial use.
Generate predictions
Figure 2: Data analytics can occur at all levels of the automation pyramid. Accordingly, different groups of experts need to be involved.
© Engineering office lean-digital-transformation
Tasks in the field of predictive analytics go beyond the mere generation of knowledge. Their aim is to generate prediction models for future events. The field of industrial application of such models is very diverse. It ranges from virtual sensors, fault detection, diagnosis and prediction to the prediction of critical quality attributes and model-predictive control at the lower levels of the automation pyramid (Figure 2). At the upper levels of the pyramid, these methods are used, for example, to support logistical processes or to better predict material requirements and the necessary stock levels. The range of methods used here is very broad and includes well-known machine learning methods such as neural networks and support vector machines as well as related methods that have their origins in areas such as system identification or control engineering. Hybrid approaches, in which expert knowledge about the system and the process serves as the basis for an analytical model whose potential systematic shortcomings can be corrected using a data-driven approach, are particularly promising.
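The hybrid idea can be sketched in a few lines of Python: an analytical model supplies the baseline prediction, and a simple data-driven model is fitted to its residuals. The analytical relationship `y = 2 * x`, the measurement data and all function names are assumptions made purely for illustration. A real application would use a process-specific model and usually a more flexible correction than a straight line.

```python
from statistics import mean
from typing import Callable, List


def analytical_model(x: float) -> float:
    """Hypothetical first-principles model based on expert knowledge: y = 2 * x."""
    return 2.0 * x


def fit_residual_correction(xs: List[float], ys: List[float]) -> Callable[[float], float]:
    """Fit a straight line to the residuals y - analytical_model(x) by least squares."""
    residuals = [y - analytical_model(x) for x, y in zip(xs, ys)]
    x_bar, r_bar = mean(xs), mean(residuals)
    slope = sum((x - x_bar) * (r - r_bar) for x, r in zip(xs, residuals)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = r_bar - slope * x_bar
    return lambda x: intercept + slope * x


def hybrid_model(x: float, correction: Callable[[float], float]) -> float:
    """Analytical prediction plus the learned correction of its systematic error."""
    return analytical_model(x) + correction(x)


# Invented measurements: the real system has an offset and a slightly higher gain
# than the analytical model assumes.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.5 + 2.1 * x for x in xs]

correction = fit_residual_correction(xs, ys)
x_new = 5.0
print("analytical only:", analytical_model(x_new))
print("hybrid:         ", hybrid_model(x_new, correction))
```

The data-driven part only has to learn the deviation from the analytical model, which typically requires less data than learning the whole relationship from scratch.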
94 % accuracy
For example, the Software Competence Center Hagenberg (SCCH) recently created a neural network for fault prediction that can predict major faults of devices in the field with 94% accuracy. Compared to the subjective assessment of service personnel or rigid service planning, this is a significant advantage that can directly save money and time.
The software tools available for developing such applications, whether locally installed data mining suites or cloud-based industrial IoT platforms, already make it much easier to connect small to very large (big data) data sources and to combine and apply existing analytics algorithms. Creating the neural network for fault prediction mentioned above, based on a few terabytes of historical data, was carried out on a powerful desktop PC, while learning a fault detection model for 100,000 devices in the field with hundreds of high-resolution sensor signals can only be achieved with a corresponding big data architecture. However, the development of the specific model, that is, the model creation process, is still a manual, demanding and often lengthy process. Data preparation in particular - the necessary foundation of any successful analytics project - is a key factor from a project perspective. It accounts for between 50 % and sometimes as much as 80 % of the total effort involved in an analytics project. If the analysis task is approached with the wrong expectations, data preparation alone can be a stumbling block. As already mentioned, automated data preparation is a very active area of research and is not yet suitable for productive industrial use.
Evaluate the benefits
However, in many cases it is still very difficult to make an entrepreneurial decision based solely on the knowledge of what would be technically feasible. Questions of profitability or unclear market potential quickly become very high hurdles - and the first step is often not taken.
In order to be able to start with a relatively low level of information and a high decision risk, there are methods for decision support. One method with a particularly high practical benefit is 'Applied Information Economics', or AIE for short (source: Hubbard, "How to measure anything"). With this method, potential users can pragmatically estimate the expected losses and gains as well as the costs that may be incurred for an analysis, training or consultation, for example, in order to reduce the decision risk within a reasonable framework.
The basic prerequisite for the successful use of this method is to get a feeling for your own uncertainty and not to lie to yourself. The technical term for this is calibrated estimation. A simple interactive example on the following website lets you check your own estimates to see whether you should invest in the development of a machine learning process and what the reduction in decision risk might cost, for example through a pilot project or a consultancy project: https://lean-dt.shinyapps.io/shinyDecisionSupport.
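The kind of calculation behind such a decision aid can be sketched as follows. The example assumes that the benefit of a machine learning project has been estimated as a calibrated 90 % interval, treats this interval as a normal distribution, and uses a Monte Carlo simulation to estimate the expected loss of each decision. The figures, the function name `assess_investment` and the normal assumption are illustrative only. This is neither the code behind the website mentioned above nor a complete implementation of AIE.

```python
import random
from statistics import NormalDist, mean
from typing import Dict


def assess_investment(
    benefit_low: float, benefit_high: float, cost: float, n: int = 20_000, seed: int = 1
) -> Dict[str, float]:
    """Simulate the net benefit of a project from a calibrated 90 % interval."""
    # A 90 % interval of a normal distribution spans about +/- 1.645 standard deviations.
    z = NormalDist().inv_cdf(0.95)
    mu = (benefit_low + benefit_high) / 2.0
    sigma = (benefit_high - benefit_low) / (2.0 * z)

    rng = random.Random(seed)
    net = [rng.gauss(mu, sigma) - cost for _ in range(n)]

    # Expected opportunity loss: the average amount lost when the decision turns out wrong.
    loss_if_invest = mean(max(0.0, -v) for v in net)
    loss_if_skip = mean(max(0.0, v) for v in net)
    return {
        "expected_net": mean(net),
        "loss_if_invest": loss_if_invest,
        "loss_if_skip": loss_if_skip,
        # Loss of the better decision: an upper bound on what eliminating the
        # uncertainty (for example by a pilot project) can be worth.
        "value_of_perfect_information": min(loss_if_invest, loss_if_skip),
    }


# Invented figures: benefit between 50,000 and 350,000 EUR (90 % interval), cost 150,000 EUR.
result = assess_investment(50_000.0, 350_000.0, 150_000.0)
for key, value in result.items():
    print(f"{key}: {value:,.0f} EUR")
```

If the value of perfect information is small compared to the price of a pilot or consultancy project, further analysis is hardly worthwhile and the decision can be made on the current estimates.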
Authors:
Dr.-Ing. Hans Egermeier is an independent management consultant specializing in lean digital transformation;
Dr. Thomas Natschläger is Scientific Head of Data Analysis Systems Group at the Software Competence Center Hagenberg;
Markus Riedenbauer is a project manager in the Research & Development department at Siemens Transformers Austria.















