SSV Software Systems

Klaus-Dieter Walter

Avoid Crowdstrike errors

The cyber security of networked applications can be improved with the help of automatic attack detection systems or intrusion detection systems (IDS). However, as the Crowdstrike incident makes clear, an IDS is only suitable for use in industrial IoT applications under certain conditions.

© AdobeStock-Robert

On July 19, 2024, Crowdstrike delivered a faulty Falcon sensor update to over 8 million Windows PCs and servers worldwide within 78 minutes. The update was installed automatically and led to the total failure of the target systems. The Windows 'Blue Screen of Death' (BSoD) at least enabled those affected to recognize quickly that something was wrong and to begin damage limitation.

Crowdstrike Falcon is highly developed security software that protects computer systems in real time against cyber attacks of all kinds. It is not intended to serve as the source of a massive denial of service. The speed with which such a faulty update can now spread is remarkable: according to Crowdstrike, only Windows systems that were online on July 19 between 4:09 and 5:27 UTC and downloaded the update from the Crowdstrike servers were affected. On a positive note, the response was quick and the faulty update was removed from the servers. As the BSoD also appeared on large screens in New York's Times Square, on displays at numerous airports, bus stations and train stations, and at self-service checkouts in supermarkets, the Crowdstrike incident gave a cyber security problem something close to maximum visibility.


The subtleties of Falcon

Figure 1: Example of a functionally redundant IoT gateway: two independent computer systems (Sys1, Sys2) with separate power supplies (U1, U2) are connected to the same local OT network (OT-LAN). Each computer system has its own wireless wide area network (WWAN) interface, so the IoT gateway has two different connection options to an external IoT platform. © SSV Software Systems

Falcon is predominantly used by larger companies and organizations as an endpoint security platform to detect, prevent and respond to malware, zero-day attacks, behaviour-based anomalies and other threats in real time, but also to meet ISO 27001 compliance requirements, for example. Falcon, together with its extensive accessories, can therefore also be classified as Enterprise Endpoint Detection and Response (EDR) software. Falcon is based on a SaaS architecture with agent-based sensor technology in the endpoints on site and therefore requires a cloud connection to these endpoints. For this purpose, a sensor software function is installed on the customer devices, which collects real-time data and checks it for certain characteristics. If suspicious activities are detected, the agent transmits the relevant data to the Crowdstrike cloud for further analysis. If necessary, the attack detection algorithms are adapted to new findings and so-called 'rapid response content configuration updates' are generated there. These updates are automatically downloaded and installed by the agents, sometimes several times a day. As a result of these activities, dynamic attack detection is created in the individual endpoints.

How was it tested?

Figure 2: The two computer systems (Sys1, Sys2) are synchronized with the help of system watchdog software (SysWd) to form a functionally redundant IoT gateway. In this example, the distributed software function is based on a state diagram with four states. © SSV Software Systems
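The watchdog synchronization named in the caption of Figure 2 could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the SysWd implementation: the article does not name the four states, so the states `INIT`, `STANDBY`, `ACTIVE` and `FAILED`, the function `next_state` and its two inputs are hypothetical.

```python
from enum import Enum


class State(Enum):
    # Hypothetical state names; the article only says there are four states.
    INIT = "init"
    STANDBY = "standby"
    ACTIVE = "active"
    FAILED = "failed"


def next_state(state: State, self_ok: bool, peer_active: bool) -> State:
    """Compute the next state of one node for one watchdog tick.

    self_ok:     the local health check passed
    peer_active: a heartbeat from a peer in ACTIVE state was seen recently
    """
    if not self_ok:
        return State.FAILED          # any local fault takes the node out
    if state is State.FAILED:
        return State.INIT            # fault cleared: start over
    if state is State.INIT:
        return State.STANDBY         # self-test passed, wait as backup
    if state is State.STANDBY:
        # Take over only if no healthy active peer is visible.
        return State.STANDBY if peer_active else State.ACTIVE
    return State.ACTIVE              # a healthy active node stays active
```

Each of the two systems would run this function periodically with its own health status and the last heartbeat received from the other system. The sketch leaves out tie-breaking for the case where both nodes leave `STANDBY` at the same time, which a real design would need to handle.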

An update that only has to be installed to produce an immediate BSoD raises the question of whether anything was tested at all, or whether other causes are also possible. Crowdstrike has since published various details on this. They show that very extensive quality assurance measures are in use, including unit and integration tests as well as performance and stress tests (see web tip 1).

However, at Crowdstrike these tests are probably only applied to the sensor and driver code. In addition to the code, there is other data, the so-called rapid response content. This consists of binary key-value pairs that the Falcon sensor code uses to detect suspicious activities. This signature data was probably not fully covered by the code-oriented test concepts, so that incorrect data from a so-called channel file ultimately led to the BSoD incident on over 8 million Windows computers.
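A data check of the kind that was probably missing can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and does not reflect the real channel file format or Crowdstrike tooling: the record layout, the field names in `REQUIRED_FIELDS`, the set `ALLOWED_ACTIONS` and the function `validate_content` are all hypothetical. The point is only that content data can be checked against the expectations of the consuming code before it is released.

```python
from typing import Any, Dict, List

# Hypothetical schema: fields the consuming sensor code expects in every
# content record, with their expected types. Not a real Falcon format.
REQUIRED_FIELDS: Dict[str, type] = {
    "rule_id": int,
    "pattern": str,
    "action": str,
}
ALLOWED_ACTIONS = {"log", "alert", "block"}  # assumed for illustration


def validate_content(records: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems found; an empty list means the data is OK."""
    problems: List[str] = []
    if not records:
        problems.append("content file contains no records")
    for index, record in enumerate(records):
        for name, expected_type in REQUIRED_FIELDS.items():
            if name not in record:
                problems.append(f"record {index}: missing field '{name}'")
            elif not isinstance(record[name], expected_type):
                problems.append(f"record {index}: field '{name}' has wrong type")
        action = record.get("action")
        if isinstance(action, str) and action not in ALLOWED_ACTIONS:
            problems.append(f"record {index}: unknown action '{action}'")
    return problems
```

A release pipeline would refuse to publish a content file for which `validate_content` returns a non-empty list.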

Customers who use Crowdstrike for compliance reasons will generally not give too much thought to the scope and details of the update test processes. In the critical infrastructure environment, however, the question must be asked as to whether the update tests on the provider side are sufficient or whether appropriate tests are not also required on the user side as part of adapted CI/CD processes before a new IDS attack detection version is installed in the respective sensors. A two-stage test concept is actually necessary: The IDS partner creates an update and tests it in accordance with documented quality assurance measures. This update is then sent to the user. There, the software component undergoes further tests. If no anomalies are found, the software is deployed. Although such two-stage software update test concepts incur additional costs, they result in a significantly higher level of security.
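The two-stage concept can be sketched as a simple release gate. The following Python example is a sketch under stated assumptions, not a description of any existing CI/CD product: the class `Stage`, the function `release_update` and the two example checks are hypothetical, and `survives_test_install` merely stands in for a real user-side test that installs the update on a test system.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A check receives the update payload and returns True if it passes.
Check = Callable[[bytes], bool]


@dataclass
class Stage:
    name: str
    checks: List[Check] = field(default_factory=list)

    def failed_checks(self, payload: bytes) -> List[str]:
        """Run all checks and return the names of those that failed."""
        failed = []
        for check in self.checks:
            try:
                passed = check(payload)
            except Exception:
                passed = False  # a crashing check counts as a failure
            if not passed:
                failed.append(getattr(check, "__name__", repr(check)))
        return failed


def release_update(payload: bytes, vendor: Stage, user: Stage) -> Tuple[bool, str]:
    """Approve deployment only if the vendor stage and then the user stage pass."""
    for stage in (vendor, user):
        failed = stage.failed_checks(payload)
        if failed:
            return False, f"{stage.name} stage rejected update: {', '.join(failed)}"
    return True, "update approved for deployment"


# Hypothetical example checks.
def not_empty(payload: bytes) -> bool:
    return len(payload) > 0


def survives_test_install(payload: bytes) -> bool:
    # Placeholder for installing the update on a user-side test system
    # and verifying that the system still boots and runs.
    return payload.startswith(b"OK")
```

With `vendor = Stage("vendor", [not_empty])` and `user = Stage("user", [survives_test_install])`, an update reaches the production sensors only when `release_update` returns `True`. The user-side stage runs only after the provider-side stage has passed.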

Resilience optimization and redundancy check

Effective cyber resilience and meeting compliance requirements are not necessarily the same thing in practice. This is why the Crowdstrike outage also affected numerous organizations that are part of the critical infrastructure. As such incidents must be reported in Germany (KRITIS Regulation and soon also NIS-2), the BSI was called in. This federal authority is now dealing intensively with the Crowdstrike incident and also wants to work with the companies responsible to optimize resilience. To this end, a multi-stage action plan has been published to help improve the EDR tools "by establishing cooperation between the BSI and Crowdstrike and Microsoft" (see web tip 2).

The fact that KRITIS organizations implement important functions at all with Windows computers that are protected only by standard EDR software with automatic remote updates is thought-provoking. With the current state of the art, such operators have several options, in addition to a two-stage software update test, to better protect themselves against total failures: A/B boot concepts and redundant system designs are solutions that have been used successfully in other areas to improve reliability. Here are two examples:

The author: Klaus-Dieter Walter, SSV Software Systems © SSV Software Systems
  • A/B boot strategy: With two software partitions, a faulty update would have had no significant impact on operations. If booting the faulty software partition fails, the "old", unchanged software stack is simply reactivated instead of a blue screen being displayed.
  • Redundancy: The two figures illustrate a functional redundancy example for the multiple design of an IoT gateway function using two largely identical computer modules that synchronize with the help of a distributed watchdog application. In the event of a computer failure, the system switches over automatically.
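The A/B boot strategy from the first example can be sketched as follows. This Python example is a minimal sketch under stated assumptions and does not describe a specific bootloader: the class `BootState`, the functions `select_slot` and `confirm_boot` and the limit `MAX_BOOT_ATTEMPTS` are hypothetical. The idea is that an updated partition must confirm a healthy start. Otherwise the unchanged partition is booted again.

```python
from dataclasses import dataclass
from typing import Optional

MAX_BOOT_ATTEMPTS = 3  # assumed limit before falling back to the old slot


@dataclass
class BootState:
    active: str = "A"              # slot with the known-good software
    pending: Optional[str] = None  # slot with a freshly installed update
    attempts: int = 0              # unconfirmed boots of the pending slot


def select_slot(state: BootState) -> str:
    """Called by the (hypothetical) bootloader on every start."""
    if state.pending is None:
        return state.active
    if state.attempts >= MAX_BOOT_ATTEMPTS:
        # The update never confirmed a healthy boot: discard it.
        state.pending = None
        state.attempts = 0
        return state.active
    state.attempts += 1
    return state.pending


def confirm_boot(state: BootState) -> None:
    """Called by the running system once it is up and healthy."""
    if state.pending is not None:
        state.active, state.pending = state.pending, None
        state.attempts = 0
```

If the software in the pending slot crashes before it calls `confirm_boot`, the attempt counter runs out after three starts and `select_slot` returns the old slot again. A successful start promotes the updated slot to the active one.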

Who or what is Crowdstrike?

Crowdstrike is a publicly listed US cyber security technology company with around 8,000 employees. Its main product is a hybrid IDS called "Falcon", which detects various types of cyber attacks in IT systems with Windows, macOS or Linux operating systems. Crowdstrike sees itself as a market leader for platforms for "risk-based vulnerability management". Fast and scalable threat detection and response is emphasized as a user benefit. This enables the responsible security teams to react immediately on site at the user's premises, isolate compromised systems and carry out forensic investigations.

ah
