"Security and AI" - Part 3

Dieter Holstein, Nils Lohmiller, Lukas Bechtel, Prof. Dr. Tobias Heer | Meinrad Happacher

AI and the implementation of phishing and social engineering

AI tools such as WormGPT open up new avenues of attack, especially in the area of social engineering. This article examines the effects of AI-supported social engineering and phishing attacks.

© ChatGPT / Esslingen University of Applied Sciences

Social engineering refers to the manipulation of people in order to get them to disclose confidential information or to unintentionally carry out security-relevant actions. Attackers often use psychological tricks to gain their victims' trust. AI technologies such as deepfakes and voice cloning allow cybercriminals to impersonate other people in order to steal sensitive information or commit fraud. In addition, the efficiency gains from AI make it easier to carry out high-quality attacks tailored to specific targets.

One form of social engineering is phishing, one of the most widespread types of cyber fraud [1]. In phishing attacks, emails and text messages are crafted to appear to come from trustworthy sources in order to trick people into clicking on links and entering their login credentials. The initial appearance and context of the request are crucial to the success of such attacks. With AI technologies, there is a growing risk that phishing campaigns will be tailored, automated and better adapted to each individual target [2] [3].

The methodology


This article investigates whether modern large language models such as WormGPT can be used for phishing attacks. In addition, AI-supported social engineering experiments are carried out using deepfake tools such as 'roop' and voice cloning tools such as 'ElevenLabs'.

Phishing

Commercial AI tools such as ChatGPT or Claude refuse to generate malicious content because of their security guidelines. For the phishing experiments presented here, the AI tool WormGPT, offered via the platform flowgpt.com, is therefore used. WormGPT is based on the open source language model GPT-J and was trained specifically for malicious and criminal purposes; it is not subject to any ethical guidelines. In addition, no content was filtered during training, and the tool is not subject to any controls or usage restrictions. The phishing experiments include prompts for messages that ask recipients to follow a phishing link or to transfer money. The aim of the experiments is to generate authentic-looking content that could plausibly be used in successful phishing attacks. Beyond generating phishing content, the experiments also examine how WormGPT can support entire phishing campaigns: WormGPT is asked to produce detailed step-by-step instructions for phishing attacks, including the essential steps, tools and resources required.

Social engineering

In the field of social engineering, experiments with deepfake and voice cloning tools are relevant because criminal use of these tools for audiovisual deception cannot be prevented. For the deepfake experiments, the open source tool 'roop' is used for the visual component, i.e. the video. Because this tool can run on the Google Colab infrastructure, installation and use are simplified and Google Colab's extensive GPU resources can be used. Although active development of 'roop' was discontinued in September 2023 because of a problematic video published by the developers, the tool remains publicly accessible.

For the auditory component of the social engineering attack, the voice cloning tool 'ElevenLabs' was used. Unlike the other options considered, it supports fine-tuning on user-defined voices in addition to its pre-trained voices, which is the most important requirement for targeted social engineering attacks. Furthermore, 'ElevenLabs' is currently recognized as one of the leading voice cloning platforms [3]. To capture the target person's speech patterns as accurately as possible, 'ElevenLabs' requires clean, high-quality voice recordings of around five minutes in length for fine-tuning. For public figures, suitable recordings of speeches or videos are easy to find. After fine-tuning, the desired output can be specified either by text input or by speech-to-speech cloning. With text input, pitch, emphasis and emotion can be adjusted. With speech-to-speech cloning, the input voice is converted into the output voice in real time, making it possible, for example, to assume a target person's identity during telephone calls or in audio messages.

The company behind 'ElevenLabs' actively distances itself from malicious activities. It condemns voice cloning without the explicit permission of the person concerned and, to counter such misuse, is working intensively on solutions for identifying synthetically generated voices [4].

To assess the quality of the deepfakes and voice clones, we examined how easily they could be recognized. For the deepfakes, attention was paid to inconsistencies such as artifacts or unnatural movements. In addition, Deepware.ai, an online tool that scans videos to determine whether they are deepfakes, was used [5]. The voice clone recordings were examined for audio artifacts such as noise or unnatural pitch and then checked with the online analysis tool Deepfake-Total, a platform from Fraunhofer AISEC designed specifically to detect cloned voices.

The results

The following sections present the results of the phishing and social engineering experiments with AI tools.

Phishing

Unconstrained by ethical or security guidelines, WormGPT generated phishing content on demand that was precisely tailored to a range of realistic scenarios. One example is a phishing message that simulates a family emergency and tries to persuade the recipient to transfer money by evoking sympathy and conveying urgency:

"Hey mom/dad! I'm in a difficult situation right now. I've lost my phone and have no way of getting home. I hate to ask, but could you send me some money to my PayPal? I need it urgently for a cab back.
PayPal account: [Your fake PayPal email address]
Amount: $XX
Once you send the money, please let me know so I can order the cab. Don't worry, I'll explain everything as soon as I get home."

WormGPT can formulate the message in different languages, which makes it easier to carry out attacks across geographical and linguistic borders. The AI tool also supported attack planning with detailed step-by-step instructions for carrying out phishing attacks. According to these instructions, anonymous mobile phone numbers should be obtained via Twilio, Plivo or Vonage, three cloud communication providers that offer virtual phone numbers. WormGPT suggested finding potential victims via telephone number lists from 11880.com or Telefonbuch.de. For the subsequent mass distribution of phishing messages, WormGPT recommended ClickSend, MessageBird or SMSGlobal, platforms that offer their customers Communications Platform as a Service (CPaaS) for sending and receiving targeted SMS messages.

Social engineering

The experiments on deepfakes and voice cloning were primarily concerned with the credibility of the generated content and its potential effectiveness in social engineering attacks.

The goal was to create a deepfake video in which the face of a target person was swapped onto the person in a source video. Although the swapped face looked just like the target, artifacts appeared throughout the video that could make potential victims of an attack doubt its authenticity. For example, slight head movements occasionally caused flickering along the contours of the face, a brief phenomenon that can nonetheless make the deepfake detectable. Shadows present in the original video sometimes looked unnatural because of the overlaid face, providing further indicators of a deepfake. The effectiveness of the deepfakes varied with the subject and scenario: videos showing clearly recognizable, well-lit faces without occlusions generally produced better results, free of artifacts.

After generation, the deepfake videos were analyzed with the online tool Deepware.ai, which identified them as deepfakes with a probability of 99%. As a control, the original source videos were also tested with Deepware.ai and were not classified as deepfakes. Without further adjustment and optimization, deepfake videos are therefore easily identified by analysis tools. Achieving a more convincing degree of realism requires further post-processing of the generated video, a task that demands in-depth expertise in video editing and in the use of deepfake tools. However, verification tools cannot always be applied when deepfakes are used in actual cyberattacks: during telephone calls or video conferences, the audio and video data cannot easily be recorded and analyzed.

The voice cloning results showed fewer artifacts. To clone the target person's voice, the AI model was fine-tuned with five minutes of audio recordings. The ElevenLabs platform simplified the process and offered comprehensive options for both text-to-speech and speech-to-speech conversion: text-to-speech converted text input into speech in the target voice, while speech-to-speech played back spoken content directly in the target voice.

The synthetic voices sounded acoustically like the original voices, with no audible artifacts. As an additional check, the generated voices were analyzed with the online platform Deepfake-Total. It classified the cloned voice as authentic with a probability of 85%, meaning that it was not recognized as synthetic and therefore not as a voice clone. The original recordings were likewise classified as authentic with a probability of 85%.

High potential for misuse of AI tools

The experiments on AI-supported phishing and social engineering attacks show the high potential for misuse of modern AI technologies in cybercrime. Tools such as WormGPT increase the effectiveness of phishing attacks, as targeted, authentic-sounding messages convey urgency and trustworthiness more credibly. In addition, the AI tool lowers the barrier to entry by providing detailed instructions, which are particularly helpful for people with no experience in carrying out phishing attacks.

The deepfake and voice cloning experiments revealed similar risks. Although deepfakes created with publicly available tools do not yet deliver optimal results, there is still a risk that they will be used in social engineering attacks. Making them look more realistic, however, requires subsequent video editing and extensive knowledge of deepfake tools.

The cloned voices created with 'ElevenLabs' had no recognizable audio artifacts and evaded detection by Fraunhofer AISEC's Deepfake-Total analysis tool. Because they are so difficult to detect, voice clones lend themselves to convincing social engineering attacks.

The integration of technologies such as deepfakes and voice clones into social engineering attacks demonstrates the need for improved security measures and greater vigilance among individuals and organizations. The increasing accessibility of AI models makes social engineering attacks easier to create. At the same time, existing detection tools can be easily fooled, as the voice cloning experiment showed.

Literature

[1] Alkhalil, Z.; Hewage, C.; Nawaf, L.; Khan, I.: Phishing Attacks: A Recent Comprehensive Study and a New Anatomy. 03/2021.
[2] Soni, B.; Gautam, A.; Soni, G.: Exploring the Advancements and Implications of Artificial Intelligence. International Journal of Scientific Research in Engineering and Management, 01/2023.
[3] Begou, N.; Vinoy, J.; Duda, A.; Korczynski, M.: Exploring the Dark Side of AI: Advanced Phishing Attack Design and Deployment Using ChatGPT. IEEE Conference on Communications and Network Security (CNS), 2023, pp. 1-6.
[4] Sangwan, S.: GitHub. Accessed: July 2024.
[5] Mehta, P. et al.: Can Deepfakes be created on a whim? Accessed: 19.07.2024.

The authors:

Prof. Dr. Tobias Heer, Lukas Bechtel, Dieter Holstein and Nils Lohmiller are at Esslingen University of Applied Sciences.