AI-Automated Threat Hunting Brings GhostPenguin Out of the Shadows

In this blog entry, Trend™ Research provides a comprehensive breakdown of GhostPenguin, a previously undocumented Linux backdoor with low detection rates that was discovered through AI-powered threat hunting and in-depth malware analysis.


By: Aliakbar Zahravi

Dec 08, 2025


Key takeaways

GhostPenguin is a multi-threaded Linux backdoor written in C++ that provides remote shell access and comprehensive file system operations over an RC5-encrypted UDP channel. It establishes communication through a structured session handshake mechanism and synchronizes multiple threads to handle registration, heartbeat signaling, and reliable command delivery.

GhostPenguin was discovered using Trend™ Research’s AI-driven, automated threat hunting pipeline that collected and analyzed zero-detection Linux samples from VirusTotal. The investigation involved building a structured database of extracted artifacts, using AI to automate profiling, and employing VirusTotal hunting queries to surface zero-detection samples for deeper analysis.

This approach allowed us to extract artifacts from thousands of malware samples, generate structured profiles, and use custom YARA rules and VirusTotal queries to surface undetected threats like GhostPenguin.

Our analysis showed GhostPenguin is still in development, with debug artifacts and unused functions, highlighting the importance of advanced AI and automation in uncovering sophisticated, evasive threats.

Trend Vision One™ detects and blocks the specific indicators of compromise (IoCs) mentioned in this blog entry, and offers customers access to hunting queries, threat insights, and intelligence reports related to the GhostPenguin backdoor.

Hunting high-impact, advanced malware is a difficult task. It becomes even harder and more time-consuming when defenders focus on low-detection or zero-detection samples. Every day, a huge number of files are sent to platforms like VirusTotal, and the relevant ones often get lost in all that noise. Identifying malware with low or no detections is a particularly challenging process, especially when the malware is new, undocumented, and built largely from scratch. When threat actors avoid publicly available libraries, known GitHub code, or code borrowed from other malware families, they create previously unseen samples that can evade detection and make hunting them significantly harder.

In these cases, the threat actors carefully craft both the code and the network communication to minimize noise and keep the malware as inconspicuous as possible. They often use multi-stage architectures and secure communication channels that do not reveal subsequent stages unless the communication sequence unfolds exactly as expected. As a result, only a very small amount of data is transferred between the infected host and the command-and-control (C&C) server, further complicating detection and analysis.

Previously, Trend™ Research reported on the effects of offensive GitHub projects and open-source red-teaming tools on the modern malware development ecosystem, and how defenders can use this as a chance to improve detection patterns and their overall approach to threat hunting. Our analysis also showed how artificial intelligence (AI) and automation can speed up detection and improve its accuracy when a new malware family shares code with those open-source repositories.

In this blog entry, we demonstrate how AI can be utilized to find low-detection samples from VirusTotal and how this was used to analyze the GhostPenguin Linux backdoor.

Threat hunting approach

Our approach focused on collecting, processing, and analyzing a large number of malware samples from known and reported attacks. The goal was to extract useful artifacts that help hunt for new, undetected threats.

Hunting workflow

1. Collect and extract artifacts

We gather many malware samples from known and reported attacks and extract key information from them such as strings, API calls, behaviors, function names, variable names, and constants. All collected data is stored in a structured database. Afterwards, we tag and categorize the samples so they are easier to search and compare.
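As a rough illustration of the extraction step, the sketch below pulls printable ASCII strings out of a sample and wraps them in a record that could be stored in a database. It covers only strings, not API calls or behaviors, and the function names, the minimum string length, and the record fields are assumptions made for this example rather than details of the pipeline described here.

```python
import hashlib
import re

MIN_LEN = 6  # assumed minimum length for a string to count as an artifact


def extract_strings(data: bytes, min_len: int = MIN_LEN) -> list[str]:
    """Return runs of printable ASCII characters at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


def build_artifact_record(data: bytes, tags: list[str]) -> dict:
    """Bundle the hash, size, strings, and tags of one sample (hypothetical layout)."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "strings": extract_strings(data),
        "tags": sorted(set(tags)),
    }


if __name__ == "__main__":
    sample = b"\x7fELF\x00\x01/bin/sh -c %s\x00\x02abc\x00"
    record = build_artifact_record(sample, ["linux", "backdoor", "linux"])
    print(record["strings"], record["tags"])
```

Deduplicating and sorting the tags keeps records comparable across samples, which makes later searching and similarity checks simpler.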

2. Build VirusTotal hunting queries

Using the extracted artifacts, we create VirusTotal hunting rules and run them against samples with zero detections. When we find potential candidates, we pass the samples to the profiling stage.
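One way to turn extracted strings into a hunting rule is to generate YARA text programmatically. The sketch below is a minimal example under stated assumptions: the rule name, the helper names, and the "N of them" condition are illustrative, and the ELF magic check simply restricts matches to Linux binaries. The zero-detection filter is deliberately left out because it depends on VirusTotal-specific features that are not described here.

```python
def yara_escape(value: str) -> str:
    """Escape backslashes and double quotes for a YARA text string."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_hunting_rule(name: str, strings: list[str], min_hits: int) -> str:
    """Build a YARA rule matching ELF files that contain min_hits of the strings.

    Assumes the strings are printable; control characters are not escaped.
    """
    if not strings or not 1 <= min_hits <= len(strings):
        raise ValueError("min_hits must be between 1 and the number of strings")
    lines = [f"rule {name}", "{", "    strings:"]
    for index, value in enumerate(strings):
        lines.append(f'        $s{index} = "{yara_escape(value)}" ascii wide')
    lines += [
        "    condition:",
        # 0x464c457f is the ELF magic (7f 45 4c 46) read as a little-endian uint32
        f"        uint32(0) == 0x464c457f and {min_hits} of them",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder strings, not real GhostPenguin artifacts
    print(build_hunting_rule("hunt_example", ["/bin/sh -c %s", "heartbeat"], 2))
```

Requiring several strings to match at once is a common way to keep a rule built from generic artifacts from firing on benign files.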

3. Profiling and analysis

Binary files are sent to IDA Pro (Hex-Rays) for decompilation and further artifact extraction. CAPA is also used to identify specific capabilities, with a custom rule generated from the artifacts collected during Stage 1. Non-binary files such as scripts or source code are passed directly to the profiler for feature extraction. The profiler then generates a unified profile in JSON format for each file and forwards it to the next stage of analysis.

The AI agent Quick Inspect reviews the JSON profile created during the profiling stage. It analyzes the artifacts, scores the file, and determines whether the file is malicious. Files below the threshold go into a monitoring list for later review, while files above the threshold are tagged as malicious and move to the next stage.
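The routing decision at the end of this stage can be sketched in a few lines. This is only an illustration of the threshold logic: the score is taken as an input rather than produced by an AI agent, and the threshold value, the profile fields, and the treatment of a score exactly equal to the threshold are all assumptions.

```python
import json
from dataclasses import dataclass, field

THRESHOLD = 0.7  # assumed value; the actual threshold is not given in this entry


@dataclass
class TriageQueues:
    monitoring: list = field(default_factory=list)       # kept for later review
    deep_inspection: list = field(default_factory=list)  # tagged as malicious


def route(profile_json: str, score: float, queues: TriageQueues,
          threshold: float = THRESHOLD) -> str:
    """Place a profiled file in a queue according to its score."""
    profile = json.loads(profile_json)
    sha256 = profile["sha256"]  # hypothetical profile field
    # Assumption: a score equal to the threshold counts as malicious.
    if score >= threshold:
        queues.deep_inspection.append(sha256)
        return "malicious"
    queues.monitoring.append(sha256)
    return "monitor"


if __name__ == "__main__":
    queues = TriageQueues()
    print(route(json.dumps({"sha256": "a" * 64}), 0.91, queues))
    print(route(json.dumps({"sha256": "b" * 64}), 0.20, queues))
```

Keeping low-scoring files in a monitoring list instead of discarding them allows them to be rescored later, for example after new artifacts are added to the database.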

The Deep Inspector agent performs a deeper analysis on files that pass the threshold and are tagged as malicious. It generates a detailed analysis report for the file based on the decompiled code and the metadata created during the profiling stage. The agent reviews the file profile and produces a code-analysis report that includes:

A short summary

Identified capabilities

Code execution flow

Technical analysis

MITRE ATT&CK framework mapping

We used this pipeline to hunt for a VirusTotal zero-detection sample that we named GhostPenguin. The sample was submitted on July 7, 2025, and remained undetected in VirusTotal for more than four months.

If a file is packed or obfuscated, the YARA scanner and AI model usually detect this and tag it. If you have automated scripts for unpacking, you can set up an MCP server that routes these files to your unpacking pipeline for dynamic, static, or manual unpacking. Simple obfuscation and packing can often be handled directly by AI (by an AI resolver or an AI-generated deobfuscation or unpacking script), but heavy or complex obfuscation should be handled by external automation, custom scripts, or manual effort.

Phase 1

Before we can hunt new and unknown threats, we first need to gather as much intelligence as possible. In this phase, we built a structured database and populated it with detailed information about each sample. The database stores the file metadata, category, tags, capabilities, MITRE techniques, strings, and Malware Behavior Catalog (MBC) behaviors of collected malware samples. This database is extremely valuable, as it can be used for AI model fine-tuning, context-based AI search, RAG workflows, building a knowledge base, malware similarity matching, APT attribution, and more.
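A database of this kind could be laid out in many ways. The sketch below uses SQLite with one table for samples and one for artifacts, so that tags, capabilities, MITRE techniques, strings, and MBC behaviors all share the same lookup path. The table names, column names, and helper functions are assumptions made for illustration, not the schema used in this research.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS samples (
    sha256    TEXT PRIMARY KEY,
    platform  TEXT NOT NULL,
    file_type TEXT NOT NULL,
    category  TEXT
);
CREATE TABLE IF NOT EXISTS artifacts (
    sha256 TEXT NOT NULL REFERENCES samples(sha256),
    kind   TEXT NOT NULL,   -- e.g. tag, capability, mitre, string, mbc
    value  TEXT NOT NULL,
    UNIQUE (sha256, kind, value)
);
CREATE INDEX IF NOT EXISTS idx_artifacts_kind_value ON artifacts(kind, value);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_sample(conn, sha256, platform, file_type, category, artifacts):
    """Insert one sample and its (kind, value) artifact pairs in one transaction."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO samples VALUES (?, ?, ?, ?)",
            (sha256, platform, file_type, category),
        )
        conn.executemany(
            "INSERT OR IGNORE INTO artifacts VALUES (?, ?, ?)",
            [(sha256, kind, value) for kind, value in artifacts],
        )


def samples_sharing(conn, kind, value):
    """Return the hashes of all samples that carry a given artifact."""
    rows = conn.execute(
        "SELECT sha256 FROM artifacts WHERE kind = ? AND value = ? ORDER BY sha256",
        (kind, value),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_db()
    # Placeholder hashes and artifacts, not real samples
    add_sample(conn, "a" * 64, "Linux", "Binary", "backdoor",
               [("mitre", "T1059"), ("string", "/bin/sh")])
    add_sample(conn, "b" * 64, "Linux", "Script", "dropper", [("mitre", "T1059")])
    print(samples_sharing(conn, "mitre", "T1059"))
```

Storing every artifact as a (kind, value) row means a single indexed query can answer which samples share a given string or technique, which is the basic operation behind similarity matching.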

We began by defining the main categories for our hunting workflow, using Google Magika to help classify files automatically:

Platform categories

Windows

Linux

macOS

File types

Binary

Script

[...]
