Real-time analysis

Big Data

Dominik Ulmer | Lukas Dehling, 23.08.2016, 15:50

If fast response times are required for data analysis, a cloud solution can reach its limits. An alternative is supercomputer technology: In combination with customized analysis software, it promises analyses in real time.

The hype surrounding big data is huge: simply purchasing a big data analytics solution is often seen as a panacea for business intelligence and ROI, although many other factors have to be considered. Despite all the discussion, one thing is certain: as far as the impact of big data is concerned, we are currently seeing only the tip of the iceberg. Big data is the basis of digitalization and will therefore affect more and more areas in the future. Both IT and business must find ways to tap its potential: converting data into information, and information in turn into knowledge and added value.

After all, what is the point of having huge amounts of data if it is not put to good use and companies can make the corresponding business decisions only when it is far too late? This is why the time factor is crucial. In many cases, data must be processed within a very short time for a decision to be profitable at all. Often, analysis in real time is required.

Fundamental problems

Big data analyses are inherently difficult. The data volumes are immense, and the data itself is extremely diverse because it arrives in every conceivable format. Whether measured by the size of the data records, their scope or their complexity, big data analytics is growing almost explosively. This poses additional problems for companies that are already struggling with the unchecked proliferation of clusters, the flood of new applications and the growing demand for ever faster insights. What's more, technological development in the world of big data is anything but standing still. Technologies such as Spark, Hadoop and graph databases are now ubiquitous in many industries, and approaches such as machine learning and deep learning are on the rise.

Against this backdrop, solutions are needed that make mountains of data quickly comprehensible and that can be successfully applied in a scalable environment. In addition, a correspondingly high level of computing power is required, which conventional computing architectures are generally unable to deliver.

The 'Urika-GX' system is available in three versions, the most powerful with up to 1728 cores per system. It is housed in a standard 42U/19-inch rack measuring 2000 mm × 600 mm × 1600 mm (H×W×D).

Fusion of software and hardware

To address these problems, Cray has developed the agile big data analytics platform 'Urika-GX', which is designed to help tackle the biggest big data challenges despite ever-increasing data volumes, growing complexity and a rising number of application areas. To achieve this, the characteristics of a supercomputer, namely enormous computing speed, scalability and throughput, were combined with standardized enterprise hardware and an open source software environment (OpenStack for data management and Apache Mesos for dynamic configuration). For the user, this ultimately means more convenience and flexibility. In contrast to the often cited 'shadow IT', in which different cluster architectures are used for different workloads and thus make it difficult to integrate applications, the focus here is on uniform, open industry standards. This makes it much easier to integrate new analytics tools.

The 'Urika-GX' system comes with pre-integrated industry-standard software to simplify deployment and operation.

The hardware appliance is designed for demanding analysis workloads and allows multiple analysis tasks, whether Hadoop, Apache Spark or graph analytics, to run simultaneously on a single platform. Because even very extensive and complex graph analyses are possible, users have a powerful tool for quickly gaining insights from large volumes of unstructured data.

The Aries interconnect chip

On the hardware side, the system has Intel Xeon Broadwell cores, 22 TByte of RAM and 35 TByte of local SSD storage as well as the Aries interconnect chip.

How can this be realized? It is made possible by components that are already successfully in use in the 'Cray XC' supercomputers, including the Aries interconnect chip. This high-speed internal network is a distributed interconnect designed for low latency and high bandwidth and optimized for high messaging rates. As a result, network-dependent workloads such as Spark or graph-based analyses run faster, because data packets can be fed into the network continuously without waiting for a response. The network is able to keep a very large number of data packets 'in flight' at the same time. This is a prerequisite for so-called 'one-way' communication, in which the sender no longer waits for an acknowledgement from the recipient before sending the next data packet, so that different communication streams can overlap. The result is a very high rate of small data packets on the network.
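The effect of 'one-way' communication can be illustrated with a rough timing model. The following is a minimal sketch, not a description of how Aries works internally: the function names and all latency and gap values are assumptions chosen for illustration, not measurements. It compares a sender that waits for every acknowledgement with a sender that injects packets back to back.

```python
def stop_and_wait_time_ns(n_packets, one_way_latency_ns, send_gap_ns):
    """Total time if the sender waits for an acknowledgement after every packet.

    Each packet costs the time to inject it plus a full round trip
    (packet out, acknowledgement back).
    """
    round_trip_ns = 2 * one_way_latency_ns
    return n_packets * (send_gap_ns + round_trip_ns)


def one_way_time_ns(n_packets, one_way_latency_ns, send_gap_ns):
    """Total time if packets are injected back to back without waiting.

    Many packets are in flight at once, so the network latency is paid
    only once, for the last packet to arrive.
    """
    return n_packets * send_gap_ns + one_way_latency_ns


if __name__ == "__main__":
    # Hypothetical values: 1 million small packets, 1000 ns latency, 10 ns gap.
    n, latency, gap = 1_000_000, 1_000, 10
    blocking = stop_and_wait_time_ns(n, latency, gap)
    pipelined = one_way_time_ns(n, latency, gap)
    print(f"stop-and-wait: {blocking / 1e6:.1f} ms")
    print(f"one-way:       {pipelined / 1e6:.1f} ms")
```

Under these assumed values, the waiting sender is dominated by round-trip latency, while the one-way sender is limited only by how fast it can inject packets. This is why workloads made up of many small messages benefit most.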

The Aries interconnect chip replaces connections between nodes via Ethernet or InfiniBand. This eliminates the need to build a separate network fabric between individual nodes, which costs time, support effort and capital.

Graph analyses in database

Once the large amount of unstructured data has been brought 'into shape', graph analyses come into play. They are a particular strength of the new platform. Graph databases remain the fastest-growing type of database. One reason for their increasing popularity is the realization that they can map relationships between entities much better than relational databases. Graph databases can be used to recognize patterns and relationships between individual variables, which is often very difficult or even impossible with relational databases.

While graph analyses have long been considered one of the most difficult tasks for modern analytics systems in terms of scaling and performance, they can now be performed up to 100 times faster thanks to state-of-the-art technology. In the case described here, the 'Cray Graph Engine' takes over the calculations and enables the fast, complex, iterative deep search that is required. In this environment, it is important that every scenario, from a single processor to thousands of processors, is supported without loss of performance. Another important factor is the ability to process data sets of several terabytes without unnecessary data movement.
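An iterative search over relationships of this kind can be pictured as a breadth-first traversal of a graph. The following is a minimal sketch on a tiny in-memory graph, not the implementation of the 'Cray Graph Engine'; the function name, the entity names and the edges are invented for illustration. It finds every entity that is connected to a starting entity within a given number of hops, a query that would require repeated self-joins in a relational database.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return {node: hop_count} for all nodes reachable from start
    in at most max_hops edges, using breadth-first search."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Hypothetical relationships, stored as an adjacency list.
graph = {
    "alice": ["acme"],
    "acme": ["alice", "bob", "server-1"],
    "bob": ["acme"],
    "server-1": ["acme", "carol"],
    "carol": ["server-1"],
}

if __name__ == "__main__":
    print(within_hops(graph, "alice", 2))
```

With a limit of two hops, the search starting at "alice" reaches "acme", "bob" and "server-1", while "carol" is three hops away and only appears once the limit is raised.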

The graph engine can be used to recognize new patterns within data, identify correlations between data points and then formulate corresponding hypotheses. The analytics workflows on which these hypotheses are based can be run in parallel in order to compare results in real time and adapt the workflows flexibly depending on the outcome.

The difference from conventional cluster architectures is that calculations on this platform do not slow down as the graphs become larger. On traditional clusters, such slowdowns can occur even when additional compute nodes are added, which generally bring no additional performance benefit anyway.

Author:
Dominik Ulmer is Vice President EMEA Business Operations at Cray.

Application scenarios for big data analysis

Data scientists, IT departments and researchers can use the graph analysis capabilities to first build and then query graphs with tens of billions of relationships, compiled from all kinds of data sources. This opens up new applications in many industries:

Graph analyses in cancer research:
In cancer research, graph analytics in particular and big data analytics as a whole are used to analyze genomic data from genome sequencing. Here, too, one of the biggest challenges is that the medical data to be collected is very diverse and fragmented. This is precisely why a standardized platform for recording, analyzing, retrieving and querying data is so essential. The Broad Institute of the Massachusetts Institute of Technology (MIT) and Harvard, a non-profit research institute in the United States that strives for a better understanding of diseases and progress in their treatment, used the new system to cut the time needed to obtain quality score recalibration (QSR) results from the Apache Spark pipeline of its genome analysis toolkit 'GATK4' from 40 to 9 minutes.
Predictive maintenance in manufacturing:
Big data analytics also holds enormous potential for the manufacturing industry. A prime example is predictive maintenance, in which data from sensors and machine control systems is analyzed in order to time maintenance intervals and avoid breakdowns. For this use case, a hardware appliance is advisable instead of a cloud solution for two reasons. Firstly, the latency of the cloud is too high to deliver analysis results quickly enough. Secondly, the data must first be moved to the cloud, which ties up resources and is often not recommended, especially when business-critical data has to be protected.
Fending off cyber attacks:
Ensuring a secure network for uninterrupted business operations is more important than ever in today's hyper-connected world. However, IT departments and security managers also face the problem of coping with the sheer volume of machine-generated data, and conventional technologies often reach their limits at this point. Cyber security is therefore another key area of application for big data analytics and graph databases in particular. Fast reactions are especially important here, as otherwise a company's reputation and progress may be at stake. To detect cyber attacks or anomalies, hundreds of millions of log entries must be analyzed. If an attack on a company network then occurs, the company must be able to react immediately, i.e. in real time.
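The kind of analysis behind scenarios such as predictive maintenance can be sketched in a few lines. The following is a minimal illustration, not a production method and not part of the platform described here: the function name, the window size, the threshold and the vibration values are all assumptions. Each new sensor reading is compared with the mean and standard deviation of the readings immediately before it, and readings that deviate strongly are flagged.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return the indices of readings that deviate from the mean of the
    preceding `window` readings by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # A perfectly constant history has sigma == 0 and is skipped.
        if sigma > 0 and abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Hypothetical vibration readings with one obvious spike.
vibration = [1.0, 1.1, 0.9, 1.0, 1.05, 1.0, 0.95, 3.2, 1.0, 1.1]

if __name__ == "__main__":
    print(flag_anomalies(vibration))
```

On this sample data, only the spike of 3.2 is flagged. The same pattern, a comparison of each new event against recent history, applies in principle to log entries in the cyber security scenario, although at the data volumes mentioned above the computation has to be distributed.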
