
Unveiling Hidden Connections: JA4 Client Fingerprinting on VirusTotal

18 October 2024 at 11:48
VirusTotal has incorporated a powerful new tool in the fight against malware: JA4 client fingerprinting. This feature allows security researchers to track and identify malicious files based on the unique characteristics of their TLS client communications.

JA4: A More Robust Successor to JA3

JA4, developed by FoxIO, represents a significant advancement over the older JA3 fingerprinting method. JA3's effectiveness had been hampered by the increasing use of TLS extension randomization in HTTPS clients, which made its fingerprints less consistent. JA4 was specifically designed to be resilient to this randomization (it sorts the cipher suites and extensions before hashing them), resulting in more stable and reliable fingerprints.

Unveiling the Secrets of the Client Hello

JA4 fingerprinting focuses on analyzing the TLS Client Hello packet, which is sent unencrypted from the client to the server at the start of a TLS connection. This packet contains a treasure trove of information that can uniquely identify the client application or its underlying TLS library. Some of the key elements extracted by JA4 include:
  • TLS Version: The version of TLS supported by the client.
  • Cipher Suites: The list of cryptographic algorithms the client can use.
  • TLS Extensions: Additional features and capabilities supported by the client.
  • ALPN (Application-Layer Protocol Negotiation): The application-level protocol, such as HTTP/2 or HTTP/3, that the client wants to use after the TLS handshake.
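To make the structure of a JA4 fingerprint concrete, the following Python sketch builds one from Client Hello fields that are assumed to be already parsed. The function `ja4` and its parameters are hypothetical, not a VirusTotal or FoxIO API. It follows the published JA4 layout as we understand it: JA4_A encodes transport, TLS version, SNI type, cipher and extension counts and ALPN; JA4_B is a truncated SHA-256 of the sorted cipher suites; JA4_C is a truncated SHA-256 of the sorted extensions (minus SNI and ALPN) plus the signature algorithms in their original order. Edge cases such as non-alphanumeric ALPN values are deliberately simplified, so FoxIO's reference implementation remains the authority for real fingerprints.

```python
import hashlib

# GREASE values (RFC 8701) look like 0x0a0a, 0x1a1a, ..., 0xfafa.
def is_grease(value: int) -> bool:
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)

# Highest offered TLS version -> two-character JA4 code (subset only).
TLS_VERSIONS = {0x0304: "13", 0x0303: "12", 0x0302: "11", 0x0301: "10"}

def _hash12(text: str) -> str:
    """First 12 hex characters of SHA-256, or twelve zeros for empty input."""
    if not text:
        return "000000000000"
    return hashlib.sha256(text.encode("ascii")).hexdigest()[:12]

def ja4(version, ciphers, extensions, sig_algs, alpn="", sni_is_domain=True, transport="t"):
    """Hypothetical sketch: compute a JA4 string from pre-parsed Client Hello fields.

    version: highest TLS version offered (from supported_versions if present).
    ciphers, extensions, sig_algs: integer code points in wire order.
    alpn: first ALPN value, e.g. "h2" (simplified: assumed ASCII alphanumeric).
    """
    ciphers = [c for c in ciphers if not is_grease(c)]
    extensions = [e for e in extensions if not is_grease(e)]

    # JA4_A: readable prefix; the extension count includes SNI and ALPN.
    alpn_code = alpn[0] + alpn[-1] if alpn else "00"
    ja4_a = (
        f"{transport}{TLS_VERSIONS.get(version, '00')}"
        f"{'d' if sni_is_domain else 'i'}"
        f"{min(len(ciphers), 99):02d}{min(len(extensions), 99):02d}{alpn_code}"
    )

    # JA4_B: cipher suites are sorted, so reordering does not change the hash.
    ja4_b = _hash12(",".join(f"{c:04x}" for c in sorted(ciphers)))

    # JA4_C: sorted extensions without SNI (0x0000) and ALPN (0x0010),
    # then "_" and the signature algorithms in their original order.
    ext_part = ",".join(f"{e:04x}" for e in sorted(extensions) if e not in (0x0000, 0x0010))
    sig_part = ",".join(f"{s:04x}" for s in sig_algs)
    ja4_c = _hash12(f"{ext_part}_{sig_part}" if sig_part else ext_part)

    return f"{ja4_a}_{ja4_b}_{ja4_c}"
```

Because the cipher and extension lists are sorted before hashing, a client that shuffles its extension order on every connection (the behavior that hurt JA3) still produces the same JA4_B and JA4_C.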

JA4 in Action: Pivoting and Hunting on VirusTotal

VirusTotal has integrated JA4 fingerprinting into its platform through the behavior_network file search modifier. This allows analysts to quickly discover relationships between files based on their JA4 fingerprints.

To find the JA4 value, navigate to the "behavior" section of the desired sample and locate the TLS subsection. In addition to JA4, you might also find JA3 or JA3S there.

Example Search: Let's say you've encountered a suspicious file that exhibits the JA4 fingerprint "t10d070600_c50f5591e341_1a3805c3aa63" during VirusTotal's behavioral analysis.

Clicking this JA4 value pivots with the search query behavior_network:t10d070600_c50f5591e341_1a3805c3aa63, which returns other files with the same fingerprint. Samples that share a JA4 fingerprint might be related: they could be part of the same malware family, share a common developer, or simply use a common TLS library.

Wildcard Searches

To broaden your search, you can use wildcards within the JA4 fingerprint. For instance, the search behaviour_network:t13d190900_*_97f8aa674fd9 returns files that match the JA4_A and JA4_C components of the fingerprint while allowing any value in the middle section (JA4_B), which is the hash of the cipher suites. This technique is useful for identifying files that use different cipher suites but share other JA4 characteristics.
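The wildcard search can be mimicked locally, for example to pre-filter fingerprints exported from your own sandbox runs. The sketch below uses Python's `fnmatch.fnmatchcase`, whose `*` matches any run of characters; it only approximates VirusTotal's search semantics, and the helper names and the placeholder JA4_B values ("aaaaaaaaaaaa", "bbbbbbbbbbbb") are made up for illustration.

```python
from fnmatch import fnmatchcase

def ja4_parts(fingerprint: str) -> dict:
    """Split a JA4 fingerprint into its JA4_A, JA4_B and JA4_C sections."""
    a, b, c = fingerprint.split("_")
    return {"ja4_a": a, "ja4_b": b, "ja4_c": c}

def wildcard_filter(fingerprints, pattern):
    """Keep fingerprints matching a shell-style pattern (case-sensitive)."""
    return [fp for fp in fingerprints if fnmatchcase(fp, pattern)]

def same_except_ciphers(fp1: str, fp2: str) -> bool:
    """True if two fingerprints share JA4_A and JA4_C, whatever their JA4_B."""
    p1, p2 = ja4_parts(fp1), ja4_parts(fp2)
    return p1["ja4_a"] == p2["ja4_a"] and p1["ja4_c"] == p2["ja4_c"]

# The first two JA4_B values are placeholders for illustration only.
observed = [
    "t13d190900_aaaaaaaaaaaa_97f8aa674fd9",
    "t13d190900_bbbbbbbbbbbb_97f8aa674fd9",
    "t12d190800_d83cc789557e_7af1ed941c26",
]
print(wildcard_filter(observed, "t13d190900_*_97f8aa674fd9"))  # first two entries
```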

YARA Hunting Rules: Automating JA4-Based Detection

YARA hunting rules using the "vt" module can be written to automatically detect files based on their JA4 fingerprints. Two example rules illustrate this: the first matches only the exact fingerprint "t12d190800_d83cc789557e_7af1ed941c26" observed during behavioral analysis, while the second matches the regular expression /t10d070600_.*_1a3805c3aa63/, which fixes the JA4_A and JA4_C components and ignores the JA4_B cipher suite hash. Either rule flags any file submitted to VirusTotal that exhibits a matching JA4 fingerprint. Such fingerprints could be linked to known malware, a suspicious application, or any TLS client behavior that security analysts consider risky.
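The matching logic of the two rules can be emulated in Python, which is handy for testing a pattern against fingerprints you already have before deploying a hunting rule. This sketch is not the `vt` module: it assumes a plain list of JA4 strings per sample, and the helper name is hypothetical. To our understanding, YARA's `matches` operator looks for the regex anywhere in the string, so `re.search` (rather than `re.fullmatch`) is the closer equivalent.

```python
import re

EXACT_JA4 = "t12d190800_d83cc789557e_7af1ed941c26"
# Same regex as the second rule: fixed JA4_A and JA4_C, any JA4_B.
JA4_REGEX = re.compile(r"t10d070600_.*_1a3805c3aa63")

def matching_rules(ja4_values):
    """Return which of the two example conditions a sample's JA4 list satisfies."""
    hits = []
    if any(value == EXACT_JA4 for value in ja4_values):
        hits.append("exact")
    if any(JA4_REGEX.search(value) for value in ja4_values):
        hits.append("regex")
    return hits

print(matching_rules(["t10d070600_c50f5591e341_1a3805c3aa63"]))  # ['regex']
```

A tighter pattern such as `t10d070600_[0-9a-f]{12}_1a3805c3aa63` would keep `.*` from spanning unexpected characters.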



JA4: Elevating Threat Hunting on VirusTotal

VirusTotal's adoption of JA4 client fingerprinting gives users a valuable tool for dissecting and tracking TLS client behavior, enabling better threat hunting, pivoting and more robust malware identification.

Happy Hunting.

Analyse, hunt and classify malware using .NET metadata

By: Bart
25 March 2024 at 20:13

Introduction

Last week, I ran into a sample that turned out to be PureCrypter, a loader and obfuscator for many different kinds of malware, such as Agent Tesla and RedLine.

Upon further investigation, I developed Yara rules for the various stages, which can be found here (excluding the final payload):

This reminded me that we can also write Yara rules for unique identifiers specific to malware written in .NET, or to any other .NET assembly for that matter.

A bit of history

This isn’t my first encounter with analysing .NET malware at scale: several years ago, I co-authored a presentation with Santiago on hunting SteamStealer malware, which was surging exponentially at the time (the malware aimed to steal your Steam inventory items and/or your account). A huge thanks goes to Brian Wallace, who at the time had developed a tool called GetNetGUIDs that made it trivial to extract all GUID types and start clustering to identify patterns: basically, which malware samples are likely authored by the same person or belong to the same attack campaign.

.NET assemblies or binaries often contain all sorts of metadata, such as the internal assembly name and GUIDs, specifically the MVID and the TYPELIB.

  • GUID: Also known as the TYPELIB ID, generated when creating a new project.

  • MVID: Module Version ID, a unique identifier for a .NET module, generated at build time.

  • TYPELIB: the TYPELIB version, or version number of the type library (think major & minor version).

These specific identifiers can be parsed with the strings command and a simple regular expression (regex): [a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}
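The same regular expression can be applied directly to a file's bytes, covering both ASCII and UTF-16LE ("wide") strings, much like Yara's `ascii wide` modifiers. This is a minimal sketch of the manual approach, and `find_guid_strings` is a hypothetical name: it returns every GUID-looking string, not just the Typelib, and it cannot see the MVID, which is stored as raw bytes.

```python
import re

# ASCII form of the GUID regex from the text.
ASCII_GUID = re.compile(
    rb"[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}"
)

def _wide(count: int) -> bytes:
    # One hex character followed by a NUL byte, repeated `count` times (UTF-16LE).
    return rb"(?:[a-fA-F0-9]\x00){%d}" % count

# Wide form: the five hex groups joined by a UTF-16LE dash ("-\x00").
WIDE_GUID = re.compile(rb"-\x00".join(_wide(n) for n in (8, 4, 4, 4, 12)))

def find_guid_strings(data: bytes) -> set:
    """Return all GUID-like strings (lowercased) found as ASCII or UTF-16LE text."""
    found = {m.group().decode("ascii").lower() for m in ASCII_GUID.finditer(data)}
    found |= {m.group().decode("utf-16-le").lower() for m in WIDE_GUID.finditer(data)}
    return found
```

For a file on disk, `find_guid_strings(open("sample.bin", "rb").read())` lists the candidates (the file name is a placeholder); deciding which one is the Typelib is still up to you.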

Taking a sample of PureLogStealer posted by James_in_the_box, you could then write a Yara rule based on the MVID or Typelib detected.

As shown on VirusTotal for this sample:

Figure 1 - Sample with MVID 9066ee39-87f9-4468-9d70-b57c25f29a67

The resulting (simple) Yara rule could then be as follows:

rule PureLogStealer_GUID
{
    strings:
        $mvid = "9066ee39-87f9-4468-9d70-b57c25f29a67" ascii wide fullword
    condition:
        $mvid
}

There are, however, some issues with this:

  • The MVID is stored as a binary value rather than as a string, whereas the Typelib GUID is effectively stored as a string. Since we only have the MVID here, the rule above will not detect the sample.

  • It is important to note that VirusTotal does not seem to report the Typelib.

  • Doing it “the manual way” with strings and regex is cumbersome, especially on larger data sets, and it is prone to issues such as:

    • false positives: if you run "strings" on the sample and extract GUIDs with the regex above (for example, in a CyberChef recipe), we get plenty of GUIDs, but only one is the actual Typelib;

    • false negatives: we miss out on unique identifiers, which means we might miss detection of samples, campaigns or actors.
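To act on the first point, the MVID can be turned into the byte sequence that actually appears in the file. .NET writes GUIDs to the #GUID metadata heap in the mixed-endian layout that Python's `uuid.UUID.bytes_le` produces (the first three fields little-endian, the last eight bytes as-is), so a Yara hex string can be generated from it. A minimal sketch; `guid_to_yara_hex` is a hypothetical helper.

```python
import uuid

def guid_to_yara_hex(guid: str) -> str:
    """Render a GUID as a Yara hex string in its on-disk (bytes_le) layout."""
    raw = uuid.UUID(guid).bytes_le  # first three fields little-endian
    return "{ " + " ".join(f"{b:02X}" for b in raw) + " }"

print(guid_to_yara_hex("9066ee39-87f9-4468-9d70-b57c25f29a67"))
# { 39 EE 66 90 F9 87 68 44 9D 70 B5 7C 25 F2 9A 67 }
```

The result could be used as `$mvid = { 39 EE 66 90 F9 87 68 44 9D 70 B5 7C 25 F2 9A 67 }` in the strings section instead of the text string, keeping in mind that such a pattern only says the 16 bytes occur somewhere in the file.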

Note that with tools such as ILSpy or dnSpy(Ex), you can also view the Typelib GUID and MVID; however, not all tools display all data, for example:

Figure 2 - dnSpy detects the Typelib GUID of the sample

And if we go the "oldschool" route using ildasm:

Figure 3 - ildasm displays the MVID or Module Version ID


For all the above reasons, let’s go beyond and do more: both with Yara, and with a new Python tool I’ve created.

The now and the tooling

Before we dive into the tooling, one final bit of history: Yara has evolved, and thanks to the following modules, we can now hunt and detect more effectively:

  • 2017: introduction of the .NET (dotnet) module

  • 2022: introduction of the console module

This means that using the .NET module, we can now write a Yara rule like so instead:

import "dotnet"

rule PureLogStealer_GUID
{
    condition:
        dotnet.guids[0] == "9066ee39-87f9-4468-9d70-b57c25f29a67"
}

And indeed:

Figure 4 - Yara now detects the sample
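For context on what `dotnet.guids` contains: it exposes the GUIDs stored in the #GUID stream of the .NET metadata, and in typical compiler output the MVID is usually the first entry, which is why `guids[0]` matches here. The Python sketch below walks the metadata root (the "BSJB" header described in ECMA-335, Partition II, section 24.2.1) to list that heap. It assumes the metadata bytes have already been located via the PE's CLI header, which is not shown, and `guid_heap` is a hypothetical name, not part of Yara or of the tool in this post.

```python
import struct
import uuid

def guid_heap(metadata: bytes) -> list:
    """List the GUIDs in the #GUID stream of a CLI metadata root."""
    (signature,) = struct.unpack_from("<I", metadata, 0)
    if signature != 0x424A5342:  # "BSJB"
        raise ValueError("not a CLI metadata root")
    (version_len,) = struct.unpack_from("<I", metadata, 12)  # padded to 4 bytes
    pos = 16 + version_len
    _flags, stream_count = struct.unpack_from("<HH", metadata, pos)
    pos += 4
    for _ in range(stream_count):
        offset, size = struct.unpack_from("<II", metadata, pos)
        pos += 8
        name_end = metadata.index(b"\x00", pos)
        name = metadata[pos:name_end].decode("ascii")
        # Stream name plus terminating NUL, rounded up to a multiple of 4 bytes.
        pos += (name_end - pos + 4) & ~3
        if name == "#GUID":
            heap = metadata[offset:offset + size]
            # Each entry is 16 bytes in the same layout as uuid's bytes_le.
            return [str(uuid.UUID(bytes_le=heap[i:i + 16]))
                    for i in range(0, len(heap) - 15, 16)]
    return []
```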

Yara rule

Let’s now leverage the power of Yara and its dotnet and console modules to write a new Yara rule that displays useful data about any given .NET sample, which can then be used to create meaningful rules: for example, the assembly name, Typelib and MVID.

Figure 5 - Yara rule to display .NET information to the console

We first verify whether the binary is a compiled .NET file. If so, the rule logs some Portable Executable (PE) and general binary information to the console, and then displays all relevant .NET information.

And the output will be, again for the same sample:

Figure 6 - Yara rule output: sample metadata!


This means we can now write a rule as follows:

import "dotnet"

rule PureLogStealer_GUID
{
    condition:
        dotnet.guids[0] == "9066ee39-87f9-4468-9d70-b57c25f29a67" or
        dotnet.typelib == "856e9a70-148f-4705-9549-d69a57e669b0"
}

Python tool

But what if we want to run this on a large set of samples and produce statistics, which we can then use to hunt or classify malware families, or cluster campaigns?

A newly developed Python tool does exactly that. It supports both a single file and a whole folder of samples, such as your malware repository. It skips any non-.NET binary and simply reports the typelib, MVID and typelib ID (if present, which is seldom the case and rarely useful).


If we run it on our single sample like before:

Figure 7 - New tool output on single sample


The tool (or script) has the following capabilities:

Figure 8 - Run the tool with -h to display usage or help

You need Python 3, pythonnet and a compiled dnlib.dll in order for it to work.

You are of course not limited to just using the MVID or Typelib for .NET malware hunting: you can also use the assembly name and other features that could be unique, using either the Yara rule or the Python tool to extract the data you’d like.
Both the Yara rule and the Python tool are published on the following GitHub page: https://github.com/bartblaze/DotNet-MetaData 

I highly recommend using the tool rather than the Yara rule, as it detects .NET metadata more reliably. Both the Yara rule and the Python tool can be adapted to display less or more information according to your needs.


Clustering

Tracking attackers’ campaigns is always an exercise, and it can be both fun and exhausting, depending on how many rabbit holes you (want to) go down. An example of clustering both campaigns and malware developers is the work I did with Santiago, mentioned earlier, which resulted in the following graphics:

Figure 9 - Statistics from 2016 research (bonus obfuscation stats)


This was a pretty large dataset (1,300 samples!) and specific to SteamStealers at the time.

For our analysis purposes, I took four of the currently most popular malware families (that are .NET-based or at least have a .NET variant) according to Any.run’s Malware Trends: https://any.run/malware-trends/. These are:

  • RedLine

  • Agent Tesla

  • Quasar

  • Pure*: basically anything related to PureCrypter, PureLogs, …

Downloading the latest available samples per family from MalwareBazaar, running my DotNetMetadata Python script on them, and playing around with pandas and matplotlib, we can create the following graphs per family:



RedLine – 56 samples

Figure 10 - RedLine Typelib GUID frequency


Figure 11 - RedLine MVID frequency


Agent Tesla – 140 samples

Figure 12 - Agent Tesla Typelib GUID frequency



Figure 13 - Agent Tesla MVID frequency





Quasar – 141 samples


Figure 14 - Quasar Typelib GUID frequency



Figure 15 - Quasar MVID frequency




Pure* family - 194 samples 


Figure 16 - Pure* Typelib GUID frequency



Figure 17 - Pure* MVID frequency




While these pie charts are certainly hypnotic, they also show how often the same Typelib or MVID occurs, and we can leverage this to create meaningful Yara rules for clustering samples per family. In the case of Quasar especially, the MVID "60f5dce2-4de4-4c86-aa69-383ebe2f504c" looks like a good candidate.
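Picking candidates such as that Quasar MVID can be automated by grouping samples on a metadata field and keeping only values shared by several samples. The sketch assumes the extracted metadata has been collected into a list of dictionaries with `sha256`, `mvid` and `typelib` keys; that record format and the `shared_values` helper are hypothetical, not the DotNetMetadata script's output format. It also skips the all-zero GUID, which makes a poor pivot.

```python
from collections import defaultdict

EMPTY_GUID = "00000000-0000-0000-0000-000000000000"

def shared_values(records, field, min_samples=2):
    """Map each non-empty value of `field` to the samples sharing it (>= min_samples)."""
    groups = defaultdict(set)
    for record in records:
        value = (record.get(field) or "").lower()
        if value and value != EMPTY_GUID:  # empty GUIDs are useless for hunting
            groups[value].add(record["sha256"])
    return {
        value: sorted(hashes)
        for value, hashes in groups.items()
        if len(hashes) >= min_samples
    }
```

Each value that survives the threshold is a starting point for a `dotnet.guids` or `dotnet.typelib` condition; raising `min_samples` trades coverage for confidence.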

You might think that while these charts look visually appealing (depending on your art preferences), they are not particularly useful because they don't scale well with larger datasets. You’re exactly right! By limiting the number of results displayed, we can produce much more useful output. For our sample dataset of the four malware families above, a total of 531 samples, let’s run our visualisations again, and this time we will:

  • Run it on the whole sample set

  • Extract the assembly name

  • List only the top 10 of assembly names

  • Use a bar chart instead of a pie


And the result:

Figure 18 - Assembly name frequency - looking better right?
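The top-10 selection behind a chart like this is a simple frequency count. The charts here were made with pandas and matplotlib; the sketch below uses only the standard library's `collections.Counter` so it runs without extra dependencies, and it assumes a hypothetical list-of-dictionaries format with an `assembly_name` key. In pandas, `df["assembly_name"].value_counts().head(10).plot.bar()` would be the rough equivalent.

```python
from collections import Counter

def top_assembly_names(records, n=10):
    """Return the n most frequent assembly names as (name, count) pairs."""
    counts = Counter(r["assembly_name"] for r in records if r.get("assembly_name"))
    return counts.most_common(n)

def text_bars(pairs, width=30):
    """Render (name, count) pairs as a plain-text bar chart, scaled to the top entry."""
    if not pairs:
        return ""
    top = pairs[0][1]
    return "\n".join(
        f"{name:<20} {'#' * max(1, round(count * width / top))} {count}"
        for name, count in pairs
    )
```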

The top three are:

  • “Client”: Quasar family

  • “Product Design 1”: Pure family

  • “Sample Design 1”: Pure family

Client is likely the default assembly name when compiling the Quasar malware (project), and Product Design and Sample Design are likely default assembly names from the PureCrypter builder. 

If we then want to write a Yara rule for Quasar based on the default assembly name:

import "dotnet"

rule Quasar_AssemblyName
{
    condition:
        dotnet.assembly.name == "Client"
}


But why stop there? We can build a Yara rule to classify our malware dataset or repository:

import "dotnet"
import "console"

rule DotNet_Malware_Classifier
{
    condition:
        (dotnet.assembly.name == "Client" and console.log("Likely Quasar, assembly name: ", dotnet.assembly.name)) or
        (dotnet.assembly.name == "Product Design 1" and console.log("Likely Pure family, assembly name: ", dotnet.assembly.name)) or
        (dotnet.assembly.name == "Sample Design 1" and console.log("Likely Pure family, assembly name: ", dotnet.assembly.name))
}


And we run this new Yara rule on the combined samples of the Pure family and Quasar:

Figure 19 - Simple "malware classifier"


We can combine sets of Yara rules based on assembly name, Typelib, MVID and so on to create rules with higher confidence, and use these in further hunting, classification and... much more.
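Combining indicators can be expressed as a simple score: count how many independent indicators, such as assembly name, MVID and Typelib, match per family, and require a minimum before labelling a sample. The indicator values below come from this post, but the scoring scheme, the `classify` helper and the record format are illustrative assumptions. Requiring two hits matters for a generic name like "Client", which also shows up in Async RAT samples.

```python
# Indicator values taken from the examples above; the structure is hypothetical.
INDICATORS = {
    "Quasar": {
        "assembly_name": {"Client"},
        "mvid": {"60f5dce2-4de4-4c86-aa69-383ebe2f504c"},
    },
    "Pure*": {
        "assembly_name": {"Product Design 1", "Sample Design 1"},
    },
}

def classify(sample, min_hits=1):
    """Return (family, hits) pairs with at least min_hits matching indicators."""
    scores = []
    for family, fields in INDICATORS.items():
        hits = sum(1 for field, values in fields.items() if sample.get(field) in values)
        if hits >= min_hits:
            scores.append((family, hits))
    # Highest number of matching indicators first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```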


Bonus

If you’ve made it this far, it only makes sense to add one extra use case for all of this: finding new crypters or obfuscators!

When I ran the script on the 500+ samples, one assembly stood out:

Figure 20 - Potential new crypter "Cronos"

Making a simple Yara rule again:

import "dotnet"

rule cronos_crypter
{
    strings:
        $cronos = "Cronos-Crypter" ascii wide nocase
    condition:
        dotnet.is_dotnet and $cronos
}


Running this on the Unpac.me dataset yields:

Figure 21 - Unpac.me Yara hunt results


Four matches in 12 weeks: this crypter does not appear to be popular (yet), with two Async RAT samples and two PovertyStealer samples using it so far.


Bonus on Bonus


Let’s go with a final bonus round: improving the previous “classification” rule by also reviewing results for Async RAT. Since the crypter above was used on at least two Async RAT samples, I wanted to see some statistics for this malware as well, for the assembly name only. Based on 86 samples, this results in the following:

Figure 22 - Another pie chart: AsyncRat top used assembly names

 

Jumping out are the following assembly names:

  • AsyncClient

  • Client --> Also seen in Quasar!

  • XClient

  • Output

  • Loader

  • Stub


AsyncClient is likely the default name when building the Async RAT project. But we are interested in widening the net: from the previous rule DotNet_Malware_Classifier, let’s update it with these new “generic” or default assembly names:


import "dotnet"
import "console"

rule DotNet_Malware_Classifier
{
    condition:
        (dotnet.assembly.name == "Client" and console.log("Suspicious assembly name: ", dotnet.assembly.name)) or
        (dotnet.assembly.name == "Output" and console.log("Suspicious assembly name: ", dotnet.assembly.name)) or
        (dotnet.assembly.name == "Loader" and console.log("Suspicious assembly name: ", dotnet.assembly.name)) or
        (dotnet.assembly.name == "Stub" and console.log("Suspicious assembly name: ", dotnet.assembly.name))
}




Figure 23 - Classifier Yara rule results


Conclusion

In this blog post, two new tools were presented to extract metadata from .NET malware samples. Specifically, we can now reliably extract 2 unique GUIDs: the Typelib and the MVID.

The Python script can extract the desired data from a large set of .NET assemblies, whereas the Yara rule is tailored for use on one sample at a time. Of course, they can be used interchangeably: you can still fine-tune the Yara rule for a large set and work that way if you don’t want to rely on an external script. Similarly, the script can be extended to extract more data.

Based on the output of these tools, you can then create Yara hunting rules, combine them with your existing rule sets, or use them to attempt to classify malware families or specific attack campaigns.

Some closing remarks:

  • GUIDs could be spoofed or even removed. No method is 100% reliable.

  • However, this method can enhance existing rulesets, especially where .NET obfuscators (e.g. SmartAssembly) obfuscate (user) strings, modules and more, making it harder to write Yara rules for a malware family. Detection based on GUIDs can work regardless of the obfuscation method.

  • That said, obfuscating or deobfuscating may also alter the GUIDs. Keep this in mind when creating your detection rules based on an original or unpacked/deobfuscated sample.

  • If you encounter a GUID comprised entirely of zeros, such as 00000000-0000-0000-0000-000000000000, avoid using it for hunting since it's an empty GUID. This indicates the value may not be set or has been altered. This would make for a poor hunting rule as it can be a default value for any .NET project.

  • You can also use this methodology and tooling for .NET assemblies that are not malicious: extract developer information and other metadata per your use case or purpose.

    In addition, the Python tool, just like the Yara rule, allows for analysing, classifying and hunting on much more .NET (meta)data.

     

Happy .NET hunting! You can find the tools and some of the example Yara rules in the repository: https://github.com/bartblaze/DotNet-MetaData 

As always, feedback is welcomed.


Fara: Faux YARA

By: Bart
4 December 2023 at 20:09

FARA, or Faux YARA, is a simple repository that contains a set of purposefully erroneous Yara rules. It is meant as a training vehicle for new security analysts, those that are new to Yara and even Yara veterans that want to keep their rule writing (and debugging) sharp.


Example "faux" rule


Find it over on Github:

https://github.com/bartblaze/FARA 


Yara rules collection

By: Bart
10 December 2022 at 16:20

Quite a while ago, I published some of my private Yara rules online, on Github.

They can be found here:

https://github.com/bartblaze/Yara-rules

There are two workflows running on that Github repository:

  • YARA-CI: runs automatically to detect signature errors, as well as false positives and negatives.
  • Package Yara rules: allows download of a complete rules file (all Yara rules from this repo in one file) for convenience from the Actions tab > Artifacts (see image below).


The Yara rules are divided into:

  • APT
  • Crimeware
  • Generic
  • Hacktools
  • Ransomware

Furthermore, the rules work natively with AssemblyLine thanks to their adoption of the CCCS Yara rule standard.

PRs are welcome where you see fit.

Avoiding Memory Scanners

Kyle Avery // Introduction This post complements a presentation I gave at DEF CON 30 – “Avoiding Memory Scanners: Customizing Malware to Evade YARA, PE-sieve, and More,” which included the […]

The post Avoiding Memory Scanners appeared first on Black Hills Information Security, Inc..
