
DEW #145 - Modified Z-Score for Anomaly Detection, Watermarking for Audit Logs -> SIEM and Zack gives you all an RFC for homework

Welcome to Issue #145 of Detection Engineering Weekly!

Every week, I read, watch and listen to all the Detection Engineering content so you can consume it all in 10 minutes. Subscribe and get a weekly digest of the latest and greatest in threat detection engineering!

✍️ Musings from the life of Zack:

  • I’ve been tinkering a ton with Anthropic’s Opus 4.6, and the agentic swarm mode is gratifying and terrifying to watch in action. I recommend trying it out!

  • My life the last two weeks has been sickness and travel. I got COVID before my office visit trip in NY (I went in negative!), came home, got a sinus infection 2 days later, and I’m sitting here writing this with a fever. Go figure.

  • For those who watched the Super Bowl: When the Patriots lose, America wins.

Sponsor: runZero

Master KEV Prioritization with Evidence-Based Intelligence

The CISA KEV Catalog tells you what to patch, but not how urgently or why it matters to your environment. 68% of KEV entries need additional context to prioritize effectively, yet most teams patch in order without understanding true operational risk.

A new KEVology report by former CISA KEV Section Chief Tod Beardsley reveals what KEV entries actually mean for defenders. Plus, the free KEV Collider tool from runZero helps you prioritize based on evidence, not assumptions.

Get The Report


💎 Detection Engineering Gem 💎

The Detection Engineering Baseline: Hypothesis and Structure (Part 1) by Brandon Lyons

Baselining is an overused term in this field because, at least in my experience, it’s a hand-wavy marketing term. You’ll read about a product that’ll perform baselines of your behavior and environment, and it’ll alert you if it detects something abnormal or outside that baseline. In practice, this works, but the opaqueness of some of these methods makes it hard to understand how it happens.

This is why posts like Lyons’ help cut through the opaqueness and show the receipts of how to do this in practice. To be honest, it’s nothing groundbreaking, but only in the sense that the concepts Lyons proposes here are part of entry-level statistics literacy. Which is why I’m pretty opinionated about the “engineer” in detection engineer. Don’t get it twisted: although the concepts in this post are entry-level statistics, applying them requires deep security expertise.

Lyons lays out a 7-step, repeatable process to establish a detection baseline, quoted here:

  • Backtesting of rule logic: Validate your detection against historical data before deploying

  • Codified thought process: Document why you chose specific thresholds and methods

  • Historical context: Capture what your environment looked like when the baseline was created

  • Reproducible process: Enable re-running when tuning or validating detection logic

  • Foundation for the ADS: Feed directly into your Alerting Detection Strategy documentation

  • Cross-team collaboration fuel: Surface insecure patterns and workflows with data-backed evidence

  • Threat hunting runway: When alert precision isn’t achievable, convert the baseline into a scheduled hunt

This process succinctly captures a well-thought-out detection process. Without data, how can anyone possibly deploy detections that will fire? Without context around that data, how can anyone possibly believe the rules that are firing outside of the baseline?

Lyons steps through the 7 steps using a CloudTrail API example, mapping out what anomalous behavior looks like for CloudTrail access across an environment. The statistics section focuses on a modified Z-Score. Here’s the rundown:

Security metrics (API calls per day, login attempts per hour, file accesses) approximate a normal distribution (a bell curve), especially when aggregated over time. This means that:

  • Most values cluster around the median (middle value)

  • Extreme values become increasingly rare as you move away from the center

  • The distribution is symmetric

To establish a baseline, Lyons collects historical data, such as 30 days of activity, and computes two key statistics:

  • Median - the middle value

  • MAD (Median Absolute Deviation) - measures spread around the median

When a new value enters your queue, you compute the Modified Z-score, which measures the distance of that value from the median in units of MAD. The Modified Z-score is much better at capturing outliers than the regular Z-score, which measures standard deviations from the mean: because the mean and standard deviation are themselves skewed by extreme values, the regular Z-score is sensitive to the very outliers you’re trying to find.
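To make the mechanics concrete, here’s a minimal sketch of the computation (my own Python, not Lyons’ notebook; the 0.6745 constant scales MAD so scores are comparable to standard Z-scores on normal data, and the 3.5 cutoff is the commonly cited Iglewicz–Hoaglin threshold, so Lyons’ exact parameters may differ):

```python
import statistics

def modified_z(x: float, history: list[float]) -> float:
    """Modified Z-score: distance of x from the median, in MAD units."""
    med = statistics.median(history)
    mad = statistics.median([abs(v - med) for v in history])
    if mad == 0:
        return 0.0  # degenerate baseline (no spread); widen the window instead
    return 0.6745 * (x - med) / mad

# Baseline: 30 days of daily API call counts (synthetic)
baseline = [102, 98, 100, 97, 103, 99, 101, 100, 98, 104] * 3

def is_anomalous(x: float) -> bool:
    return abs(modified_z(x, baseline)) > 3.5  # common outlier cutoff
```

With this baseline, a day with 1,000 API calls lands hundreds of MADs from the median and flags immediately, while 105 calls does not.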

An outlier can be, according to Lyons, anything from creating administrative credentials at 3am to an abnormal number of S3 bucket accesses, perhaps used for exfiltration. Here’s a graphic I prompted Claude to create to drive this point home:

If my stats professor put normal distribution computation problems in the context of finding Russian threat actors, I probably would have aced the class

This type of rigor removes the guessing game of judging events by absolute measurements. Is 1,000 API calls weird, or is 100? Is 10 pm an acceptable window for Administrator access, or is 5 pm? By looking at how many MADs a value sits from the median, you focus on relative measurement. It removes the human judgment about the absolute weirdness of an event, and whenever you remove a human from a large data problem, you get a bit closer to sanity.

Lyons created a follow-along Jupyter notebook with synthetic data to recreate the measurements in his blog. I’ll link that repository below in the Open Source section!


🔬 State of the Art

Building a Production-Ready Snowflake Audit Log Pipeline to S3 by xcal

Centralizing logs to your SIEM is a full-time endeavor, and requires expertise in so many areas, such as:

  • Data formats of the logs you are extracting, transforming, and loading into the SIEM

  • Telemetry source peculiarities, such as APIs, subsystems on hosts, or weird licensing issues

  • Choosing a technology stack that can normalize logs and send them into the SIEM

  • Navigating technological barriers due to inherent design choices, especially between data lakes or SaaS products

This is why I really enjoyed reading this post about moving audit log data from Snowflake into a SIEM. It focuses on the software engineering component of detection engineering, because many of the design choices made in this post are things you’ll hear about in a software engineering interview.

The first half of this blog details the design choices behind moving data from Snowflake to S3 and then to a SIEM, with clear architectural “gotchas” you need to design around. The most interesting one to me is the watermark strategy.

Snowflake audit logs have built-in latency: an event can occur at 12:00, but the audit log record may not appear until 12:03. A watermark tracks the boundary of what you’ve already processed; for example, a watermark of 12:00 means you’ve processed events up to 11:59. The catch is which timestamp you advance the watermark with: if you key it off when the export runs rather than what you’ve actually observed, late-arriving records fall through the cracks.

In the purple example, 3 export runs for logs came in, and the watermark is updated based on the export time. When the “late arrival” log comes in, the watermark is later than that log’s event time, so the log is lost forever. The second, yellow example fixes this by advancing the watermark to the maximum observed event time in the logs, not the time the export is run.
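The fix is easy to illustrate (my own sketch, not xcal’s code; the `event_time` field and the datetimes are invented for the example): advance the watermark to the maximum event timestamp you actually observed, and late arrivals whose timestamps are still ahead of that watermark survive the filter:

```python
from datetime import datetime

def export_batch(rows: list[dict], watermark: datetime):
    """Export rows newer than the watermark, then advance the watermark to
    the max observed event time -- not the export run's wall-clock time."""
    new_rows = [r for r in rows if r["event_time"] > watermark]
    if new_rows:
        watermark = max(r["event_time"] for r in new_rows)
    return new_rows, watermark

t = lambda h, m: datetime(2026, 1, 1, h, m)

# Run 1 (wall clock 12:05): only events up to 11:58 have landed so far
exported, wm = export_batch(
    [{"event_time": t(11, 55)}, {"event_time": t(11, 58)}], t(11, 0)
)

# Run 2: the 12:00 event arrives late. The observed-time watermark is 11:58,
# so it gets exported; a wall-clock watermark of 12:05 would have dropped it.
late, wm = export_batch([{"event_time": t(12, 0)}], wm)
```

This works because observed event times always lag wall-clock time by at least the ingestion latency, so the watermark never outruns the data.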

What’s beautiful about this blog, too, is how it sets up a “configuration-as-data” design pattern. They use a static stored procedure for the export logic and a table that maps each target view, such as SESSION or LOGIN, to the timestamp column used to compute the watermark.

This design choice makes it easy to add more views: specify the view in VIEW_NAME and the target timestamp column in TS_COLUMN_NAME, and the watermark is stored in LAST_TS. A single INSERT into the EXPORT_WATERMARK table adds another audit log view to export, without changing any code.
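As a sketch of the pattern (Python standing in for the SQL; the VIEW_NAME, TS_COLUMN_NAME, and LAST_TS fields are from the post, while the ACCOUNT_USAGE schema and the specific timestamp column names are my guesses): the export logic is one generic template, and the table rows supply the specifics:

```python
# Each row of the watermark table drives one export -- adding a view is a
# data change (an INSERT), not a code change.
EXPORT_WATERMARK = [
    {"VIEW_NAME": "SESSION", "TS_COLUMN_NAME": "CREATED_ON",
     "LAST_TS": "2026-01-01 00:00:00"},
    {"VIEW_NAME": "LOGIN", "TS_COLUMN_NAME": "EVENT_TIMESTAMP",
     "LAST_TS": "2026-01-01 00:00:00"},
]

def build_export_queries(config: list[dict]) -> list[str]:
    """One static query template; the config table supplies view and column."""
    return [
        f"SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.{row['VIEW_NAME']}"
        f" WHERE {row['TS_COLUMN_NAME']} > '{row['LAST_TS']}'"
        for row in config
    ]
```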


Detection Rule Fragility: Design Pitfalls Every Detection Engineer Must Know by SOCLabs

Detection rule fragility occurs when your rules become too precise for a single detection scenario and miss variants that achieve the same outcome. In this post, SOCLabs details several “gotcha” scenarios on the command line where classic detection on strings can be circumvented by operating-system-level trickery.

My favorite examples they list involve URL detection with cURL. There’s something about the concept of URL parsing that is so fascinating on the operating system level, because it’s a little known attack path that can have some hilarious results. For example, if you want some light reading, check out RFC3986 - Uniform Resource Identifier (URI): Generic Syntax.

Let’s say you want to detect access to a local IP address, such as http://192.168.x.x. Your operating system and browser parse it and can navigate to it, so you write a rule to detect local subnet usage in cURL. But you can also write 192.168 in hex, http://0xC0.0xA8, or even octal, http://0300.0250. So, did you write a rule for those? :)
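One way to defang these tricks in a detection pipeline is to normalize hosts before matching. Here’s a sketch using inet_aton-style parsing rules (`0x` prefix means hex, leading zero means octal; note classic parsers also accept fewer than four parts and a bare 32-bit integer, which this sketch doesn’t handle):

```python
def parse_octet(p: str) -> int:
    """inet_aton-style octet parsing: 0x.. = hex, leading 0 = octal, else decimal."""
    if p.lower().startswith("0x"):
        return int(p, 16)
    if len(p) > 1 and p.startswith("0"):
        return int(p, 8)
    return int(p)

def canonical_ipv4(host: str) -> str:
    """Collapse hex/octal spellings to dotted-quad; pass non-IPs through."""
    try:
        octets = [parse_octet(p) for p in host.split(".")]
    except ValueError:
        return host  # not an IP-shaped host
    if len(octets) == 4 and all(0 <= o <= 255 for o in octets):
        return ".".join(map(str, octets))
    return host
```

Now `0xC0.0xA8.0.1`, `0300.0250.0.1`, and `192.168.0.1` all normalize to the same string before your subnet rule ever runs.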


How I Use LLMs for Security Work by Josh Rickard

This is a cool, battle-tested approach by Rickard for prompting an LLM to do security work. I think people can become overwhelmed by what to prompt an LLM with, because LLMs are generally really good at taking vanilla prompt sessions and running with whatever work you assign them. But as your work gets more complex, there are some nifty strategies, which Rickard lays out, to make the best use of what they have to offer.

Giving context is probably the biggest takeaway here: Rickard describes role-stacking, explaining your technology stack, clarifying the model’s current understanding of the ask, and giving it time to execute the ask.


What AI Really Looks Like Inside the SOC: Notes from a Fireside Chat by Daniel Santiago

In this post, Santiago shares his notes from a SOC fireside chat he attended during a Simply Cyber event. The cool part of his synopsis was seeing the “ground reality” of AI working and not working in a SOC environment. Most of the insights aren’t surprising to me, but it’s good to hear some of our feelings validated. For example, Santiago points out how these agents raise the baseline for analysts, rather than replace them.


☣️ Threat Landscape

Beyond the Battlefield: Threats to the Defense Industrial Base by Google Threat Intelligence Group (GTIG)

GTIG published a large survey of the threats they are tracking against defense firms and organizations, such as contractors, critical infrastructure, and government entities. They have four large takeaways and specify which threat actor groups are behind each:

  • Targeting of critical infrastructure by Russian-nexus threat actor groups to introduce physical and security effects

  • Hiring of fake IT Workers and DPRK’s focus on espionage using IT workers and malware campaigns

  • China-nexus threat actors representing the largest campaigns targeting these sectors by volume

  • An uptick of data leak sites and extortion groups against manufacturing firms that may supply the defense industrial base


VoidLink: Dissecting an AI-Generated C2 Implant by Rhys Downing

VoidLink is a post-exploitation and implant framework that focuses on cloud-native infrastructure. It was in the headlines around a month ago, and the main headline was that it was likely LLM-generated. Downing pulled apart the payloads and tried to confirm this finding, so it’s nice to see proof rather than believing the hype. The fun part is that within the binary, several clues suggested it was LLM-generated, primarily in the code comments.

According to Downing, and I tend to agree here, adding comments to your malware seems like a rookie move because you want operational security and anti-research capabilities, so this likely suggests it’s LLM-generated and the operators were careless.


New Clickfix variant ‘CrashFix’ deploying Python Remote Access Trojan by Microsoft Defender Security Research Team

Microsoft Security Research uncovered a new style of ClickFix social engineering technique, dubbed CrashFix. When victims are funneled to the malicious site, they are tricked into thinking their computer is crashing and are directed to run the malicious payload.

this screams the age-old Runescape scam of “LET ME HOLD YOUR GOLD FOR YOU REAL QUICK”

The rest of the campaign is well-researched, but nothing particularly different from other ClickFix and infostealer campaigns. I imagine we’ll continue to see these social engineering threats evolve until we blow up command-line access for people and move to something else. Perhaps Claude Cowork social engineering?


Malicious use of virtual machine infrastructure by Sophos Counter Threat Unit Research Team

This piece by the Sophos Threat Research Team began with a security incident in which they uncovered attacker infrastructure with unique Windows hostnames. When the team dug into these hostnames, they found they were out-of-the-box names from a legitimate IT provider, ISPSystem. At first, it seemed like a single actor was leveraging ISPSystem to quickly deploy infrastructure, but when the team pivoted to Shodan, they found several thousand instances of ISPSystem infrastructure in use across many different malware campaigns.

Windows hostnames are a cool pivot that I haven’t really seen much of in my years of threat research. It worked in Sophos’ favor here because ISPSystem is virtual machine software whose ease of use made it attractive to several threat actor groups.


ClawdBot Skills Just Ganked Your Crypto by Open Source Malware

This ClawdBot malware post is a little different from the VirusTotal one I posted last week, mostly because it shows some of the conversations with the creator of ClawdBot on X about removing them. Hint: it doesn’t look good, and you should avoid using these skills registries until they put much better security and governance practices in place.

Peter Steinberger admits he can't secure ClawHub
we need to deploy an army of OpenClaw agents to battle OpenClaw agents that are malicious or zombies

🔗 Open Source

Btlyons1/Detection-Engineering-Baseline

Link to Brandon Lyons’ modified Z-score lab listed above in the Gem. Contains a Jupyter notebook to help readers follow along, as well as loads of synthetic data to try out the detections.


moltenbit/NotepadPlusPlus-Attack-Triage

PowerShell cmdlet to test whether you ran a compromised version of NotepadPlusPlus from their incident announcement last week. It only checks known IOCs, so there’s no guarantee they are still relevant, or that a clean result means you weren’t compromised.


S1lkys/PhantomFS

This is a clever technique that abuses Windows ProjFS. ProjFS allows processes to project filesystems based on several attributes, so it’s used for things like OneDrive where you connect out to a drive hosted on a cloud provider. S1lkys built this in a way that it’ll project an encrypted payload, like Mimikatz, if it detects a source process coming from the command line versus EDR tools.


wardgate/wardgate

Wardgate is an Agentic proxy that stores secrets and API keys on your agent’s behalf. The idea here is that the Agent is aware it has API access to some external service, you have it use Wardgate, and Wardgate will serve as the API proxy. This is especially helpful if you are afraid of attacks on Agents that steal local or cached credentials.


praetorian-inc/augustus

Augustus is an LLM penetration testing harness that integrates with dozens of LLMs. It has hundreds of attacks in 47 attack categories that you can let loose on models you are using from foundational labs, or some that you are training on top of the foundational models.


DEW #144 - Pyramid of Permanence and 🦞OpenClaw 🦞 Security Dumpster Fires

Welcome to Issue #144 of Detection Engineering Weekly!

Every week, I read, watch and listen to all the Detection Engineering content so you can consume it all in 10 minutes. Subscribe and get a weekly digest of the latest and greatest in threat detection engineering!

✍️ Musings from the life of Zack:

  • I’m in beautiful New York City this week, and finally made the move to get a hotel away from Times Square. Best decision ever; even in Manhattan, anywhere is quieter than Times Square

  • I got OpenClaw up and running, and made a Moltbook account with it. This issue is also heavy on OpenClaw security because it’s a dumpster fire

  • I flew to my hometown and it was colder than New England and New York. The jet bridge at our arrival gate was frozen to the ground, and they spent 30 mins trying to get it moving. We eventually moved to a different jet bridge

Sponsor: Adaptive Security

Stop Deepfake Phishing Before It Tricks Your Team

Today’s phishing attacks involve AI voices, videos, and deepfakes of executives.

Adaptive is the security awareness platform built to stop AI-powered social engineering.

Protect your team with:

  • AI-driven risk scoring that reveals what attackers can learn from public data

  • Deepfake attack simulations featuring your executives

Take a Free Self-Guided Tour


💎 Detection Engineering Gem 💎

TTPI’s: Extending the Classic Model by Andrew VanVleet

Tactics, Techniques & Procedures (TTPs) is a table-stakes term in our industry. It binds our understanding of attacker behavior into a common lexicon. Within this lexicon, MITRE ATT&CK reigns supreme, and they have some generally agreed-upon definitions within their ATT&CK FAQ. Basically, in order to understand MITRE ATT&CK, you have to understand their nomenclature of TTPs, where:

  • Tactics describe an adversarial objective, such as initial access

  • Techniques describe how an attacker can execute some operation to achieve that objective

  • Procedures describe the implementation details of a technique in a given environment

In this post, VanVleet challenges this model because the specific details of how an attack is carried out at the Procedure level can sometimes be vague. I think this is by design on MITRE’s part, because the procedure to achieve it can differ depending on the environmental context I mentioned earlier. He makes the analogy that Procedures are like a cake, not necessarily a recipe. He proposes the concept of Instance, which is the recipe itself, to achieve that procedure.

ATT&CK does get close to this via Detection Strategies. As an example, VanVleet looks at T1070.001, Indicator Removal: Clear Windows Event Logs. The MITRE page includes a description of how this can be achieved, but it seems high-level enough that some more detail on the recipe would be helpful. The detection strategy can provide more clues from an event-ID perspective, but without the technical implementation, it may be hard to recreate and test. Here’s his idea of what an Instance section could look like:

This could be helpful for detection engineers who want to recreate the attack in their own environment to test their telemetry generation and detection rules.

I’ve always had a hard time with the Pyramid of Pain for this exact reason. The “TTPs” part at the top of the Pyramid can encapsulate so much work, without any ability to reverse-engineer how the attack is captured. In fact, I’ve always thought TTPs/Tools should be combined, because almost every Procedure contains some level of tooling to capture the attack.

In the spirit of alliteration, and perhaps more as a thought exercise, he proposes the “Pyramid of Permanence”.

Basically, Procedures are what we want to capture, and everything below the tip of the Pyramid consists of Instances that support the Procedures. It’s an interesting thought experiment, and as long as it serves as a lexicon to drive the conversation toward better modeling, I’m all for it.


🔬 State of the Art

The story of the 5-minute-long endpoint by Leónidas Neftalí González Campos

This is more software engineering-related, but I sometimes come across blogs where I can see how security analysts and software engineers alike can commiserate about working in a bureaucracy. Campos is a software engineer working on a customer appointment management product, and a JIRA ticket came in reporting that the simple task of uploading customers started crashing on “large” uploads. They took the ticket, found a terrible pattern in their codebase that uploaded one user at a time, and deployed a fix in record time.

This is a story of how many bad small decisions and only shipping new features can lead to a monstrosity of an issue. My takeaway here for all my security readers is to challenge governance around your security operations, because optimizing decisions around a cool technology or an isolated problem can lead to a lot of heartache and burnout.


OpenClaw Observatory Report #1: Adversarial Agent Interaction & Defense Protocols by Udit Raj Akhouri

OpenClaw is the new hotness right now, and as expected, security researchers are rushing to poke holes in it, both from an architectural security perspective and, in this case, security agent efficacy. I thought this was a unique pentesting report: Akhouri set up a red team/blue team exercise to test the blue team agent’s ability to prevent abuse of its Lethal Trifecta trust relationships. In the first scenario, the red team agent sends a “helpful” threat detection template for setting up a CI/CD project for detection testing. Within that CI/CD pipeline, a malicious cURL command and a bash script would download a payload and infect the blue team. In the second scenario, they tried something similar with a JSON template injection payload.

OpenClaw caught the first attack and, according to Akhouri, analysis from the blue team agent on the second attack is still pending. I’m not too surprised that the blue team agent caught these types of attacks, but it goes to show how important it is for emerging technologies and agent orchestration platforms to undergo security testing to see how well they handle these scenarios.


Work travel means more podcasts, and it was great to dive back in with Jack Naglieri’s detection engineering-focused podcast, Detection at Scale. In this episode, Jack interviews Ryan Glynn from Compass and picks his brain on the use of LLMs in his day-to-day work as a staff security engineer.

I appreciated how Glynn grounds the LLM hype in what works and what doesn’t. At the beginning of the episode, he makes a great point about using LLMs to make binary decisions as an investigation technique. Basically, it’s much easier to look at a yes versus a no for an alert investigation and challenge its assumptions than to try to solve a lot of components at once.

He also shared his experience evaluating AI SOC vendors and how hard it was to understand their efficacy. For example, when an AI SOC agent says whether an alert is benign or malicious, it’ll at times make up steps along the way that never happened.

Glynn’s phishing detection setup was super interesting. He compared and contrasted the agony of training ML models for phishing before the advent of LLMs, where you’d need to set up various binary classification and entity extraction capabilities to arrive at a binary feature. Now you can still arrive at that binary feature and use more traditional models, but you use the LLM to generate the flag. It treats the LLM as a feature-extraction tool rather than an all-in-one security tool.
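Here’s a hedged sketch of the shape Glynn describes — everything below is illustrative: the prompt wording, the stubbed `llm_flag` helper, and the weights are mine, and in practice the flag would come from a real LLM call rather than a heuristic:

```python
def llm_flag(email_body: str) -> int:
    """Stub for an LLM prompt like: 'Does this email ask the reader to click
    a link and enter credentials? Answer YES or NO.' Returns a binary feature."""
    # Stand-in heuristic so the sketch runs without an API call
    return int("verify your account" in email_body.lower())

def score_email(email_body: str, sender_reputation: float) -> float:
    """The LLM emits one binary feature; a traditional model (here, a toy
    weighted sum) consumes it alongside conventional features."""
    return 0.7 * llm_flag(email_body) + 0.3 * (1.0 - sender_reputation)
```

The design point is the boundary: the LLM answers one narrow yes/no question, and everything downstream stays a conventional, testable model.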


👊 Quick Hits

Precision & Recall in Detection Engineering by rootxover

It’s cool to see how others interpret the concepts of precision & recall within their own detection writing. In this post, RootXover covers the concepts in the context of detection engineering and provides an example of how to compute them in a phishing alert scenario. I liked their graph of the four “zones” of labels for detections:

  • Alert Storm: low precision, high recall

  • Detection Purgatory: low precision, low recall

  • Quiet but Risky: high precision, low recall

  • Dream Zone: high precision, high recall

I will say, it’s rare that I’ve ever seen the “Dream Zone” in my career. There’s a natural relationship between precision and recall where, in general, as one increases, the other decreases.
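For reference, the two metrics and the four zones boil down to a few lines of Python (my own sketch; the 0.8 cutoff is an arbitrary illustration, not rootxover’s number):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0  # of what fired, how much was real
    recall = tp / (tp + fn) if tp + fn else 0.0     # of what was real, how much fired
    return precision, recall

def zone(precision: float, recall: float, cutoff: float = 0.8) -> str:
    """Map a detection onto rootxover's four labeled zones."""
    if precision >= cutoff and recall >= cutoff:
        return "Dream Zone"
    if precision >= cutoff:
        return "Quiet but Risky"
    if recall >= cutoff:
        return "Alert Storm"
    return "Detection Purgatory"
```

A rule with 90 true positives, 10 false positives, and 60 missed events sits at precision 0.9, recall 0.6: Quiet but Risky.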


Task Management for Agentic Coding by Jimmy Vo

Friend of the newsletter Jimmy Vo dives into Anthropic’s task management framework, formerly to-dos, now called “tasks”. This isn’t a cybersecurity post, but I think the content is important if you are starting to leverage Claude Code to manage task and todo lists. The obvious security example of using tasks is alert triage, but I think it’s important for any security person to have a system for managing how they do work. Jimmy uses gardening tasks as an example, and it was cool to see how Claude can create the tasks, build dependency graphs, and form a plan to achieve whatever task he issues.


☣️ Threat Landscape

I’m back on my Three Buddy Problem listening sprees, and this one was SO good just for the commentary on the wiper attack against Poland. The gang dives deep into a Polish CERT report where a Russian APT targeted 30 wind and solar farms, as well as a power plant, and issued a wiper attack to essentially shut them down. Of note, it was the dead of winter in December in Poland, and the resulting heat and power outage threatened nearly half a million people.

The key argument here is how the reliance on Fortinet leads to these attacks. These appliances are notoriously bad at preventing exploitation due to poor coding practices. But if you want additional security support, you have to pay for services, since they don’t allow any forensic access to the devices.


Notepad++ Hijacked by State-Sponsored Hackers by Notepad++

Notepad++’s update servers were compromised from June 2025 to September 2025, according to Notepad++. Chinese-nexus actors allegedly compromised Notepad++’s hosting provider, leading them to redirect update traffic for downstream compromise. The specific language that the blog author used was that the “Shared Hosting Server” was compromised. It’s hard to say what the difference is between “shared” and their “hosting server”.

Did the APT find a way onto the shared server, escalate privileges, and laterally move to Notepad++? Or is this just semantics about using a VPS, and was Notepad++ specifically targeted? I’d be much more interested in the technical details of the former.


No Place Like Home Network: Disrupting the World's Largest Residential Proxy Network by Google Threat Intelligence Group (GTIG)

GTIG disrupted and took down a massive residential proxy network, IPIDEA. Residential proxy networks are akin to what Google calls Operational Relay Boxes (ORBs), but with a specific commercial application: you can “rent” exit points from unaware victims.

These networks operationalize their proxies by providing SDKs to mobile app providers that enroll devices into their networks. The mobile apps essentially get a cut of their profits, and IPIDEA sells access to these mobile phones for threat actors to abuse. This is especially helpful if you want to perform credential-stuffing attacks, ticket-scalping campaigns, or something more malicious, such as hiding C2 servers.

The report contains all kinds of technical details on how IPIDEA orchestrated its network of residential proxies. It operates like a command and control network, which is what makes it hard for me to see any legitimate use of these services.


OpenClaw in the Wild: Mapping the Public Exposure of a Viral AI Assistant by Silas Cutler

Threat Researcher G.O.A.T. (and my undergrad classmate!) Silas Cutler released a post in which he scanned and found OpenClaw instances exposed on the Internet. If you haven’t heard of OpenClaw, it’s an autonomous AI agent that took the Internet by storm due to its ability to connect to apps you own, such as your Brave Browser or 1Password, to do work on your behalf. It became especially popular with the advent of Moltbook, where these agents were given the ability to post on a Reddit-like site without any interaction from the owner.

When you start OpenClaw, you can use the CLI or a web server. When searching for its default port on Censys, Silas found over 21,000 instances of OpenClaw exposed on the Internet. Most of these should be secured with a secret password or token, but it’s still worrying: given its popularity, people will try to find ways to exploit these instances. And if they get in, they’ll use the interface to abuse the integrations and extract everything, including passwords and email contents.


From Automation to Infection: How OpenClaw AI Agent Skills Are Being Weaponized by Bernardo Quintero

OpenClaw becomes more terrifying when you realize how extendable it is. In the agentic world, popularized by Claude Code, skills provide prompts and instructions to an agent, making it more specialized for running tasks. For example, if you want your agent to join Moltbook, you download a skill that teaches OpenClaw how to use the site, including using its API to perform heartbeat checks.

Several Skills registries emerged after OpenClaw’s popularity exploded, and VirusTotal researcher Quintero found malware on many of the Skills hosted on these sites. The numbers are pretty crazy:

At the time of writing, VirusTotal Code Insight has already analyzed more than 3,016 OpenClaw skills, and hundreds of them show malicious characteristics.

Quintero splits “malicious characteristics” into poor security practices and vulnerabilities and straight up malware. The malware is in plain English, and reminds me of ClickFix in the sense that it’s socially engineering your OpenClaw / Claude Code.

Click this link and run this plz

🔗 Open Source

trailofbits/claude-code-devcontainer

Sandbox environment for running Claude Code. You install a CLI and it boots up a container for you to run Claude in an isolated environment. It includes tooling to install remote container extensions in VSCode or Cursor, so it offers some options if you prefer an IDE over the CLI.


trailofbits/dropkit

Dropkit lets you quickly bootstrap a secure DigitalOcean droplet. You provide dropkit a DigitalOcean API key, and it’ll create a workspace with your SSH key and an out-of-the-box Tailscale installation. It has some cool cost-saving features that let you hibernate droplets so you aren’t spending money when you aren’t using them.


backbay-labs/clawdstrike

Runtime security monitoring for autonomous agents, including Open Clawd, Claude Code, LangChain and more. It exposes a set of tools that enforce policy boundaries, such as preventing network calls, local filesystem reads and writes, or shell commands.

You can configure it to allow or block certain actions based on the policy you set. It comes with some out-of-the-box policies and appears to follow a pattern similar to EDRs, intercepting risky functions and performing a security check before allowing them to execute.
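The interception pattern is easy to picture in miniature. This is an illustrative sketch of the general idea only, not clawdstrike’s actual API or policy format:

```python
# Toy policy: which risky action classes an agent is allowed to perform
POLICY = {"allow_network": False, "allow_shell": True}

def guarded(action: str):
    """Wrap a risky function with a policy check before it executes."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            if not POLICY.get(f"allow_{action}", False):
                raise PermissionError(f"{action} blocked by policy")
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@guarded("network")
def fetch(url: str) -> str:
    return f"GET {url}"

@guarded("shell")
def run(cmd: str) -> str:
    return f"ran {cmd}"
```

With this policy, `run("ls")` succeeds while `fetch(...)` raises before the risky call ever executes — the same intercept-then-check flow EDRs apply to syscalls.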


a2awais/Threat-Hunting

Collection of dozens of threat hunting queries for KQL & Crowdstrike.


toborrm9/malicious_extension_sentry

Threat intelligence list of malicious Chrome extensions removed from the Chrome Web Store. This is especially helpful if you want to test detections in a lab environment on malicious extensions, or build out scanners in your environment to see if you can find net new ones.


DEW #143 - Suppressing False Positives at Scale, Silencing EDRs & Detection Fidelity via Social Network Analysis

Welcome to Issue #143 of Detection Engineering Weekly!

✍️ Musings from the life of Zack:

  • New England got hit hard by a snowstorm, and my town alone recorded over 20 inches/50 cm of snow!

  • I got COVID for the third time in the last 6 years. It definitely was milder, but I can still feel the shortness of breath that I vividly remember from the earlier, more potent strains.

  • If you have 30 mins, check out the blog about Gas Town. It’s written like someone running through an agentic fever dream, and they managed to wake up with an insane orchestration system that makes you run out of Claude credits in 3 minutes.

Sponsor: Permiso Security

ITDR Playbook: Detect & Respond to Non-Human Identity Compromise

Non-human identities are everywhere, and when they’re compromised, attackers blend in as “normal” automation. This ITDR Playbook focuses on detecting and responding to NHI compromise using operational anomalies, not login patterns. Learn how to spot exposed keys, boundary violations, privilege creep, and abnormal service behavior. Plus, get response steps that will contain risk without breaking production.

Download The Playbook


💎 Detection Engineering Gem 💎

Centralized Suppression Management for Detections Using Macros & Lookups by Harrison Pomeroy

Detection rule efficacy is the practice of curating rule sets that balance precision, recall, and the cost of triage. New detection engineers typically treat rules as the only place to apply logic to manage this balance. A more precise query that accounts for benign behaviors, given the tactic or technique, can increase the likelihood of capturing true positives. But SIEM technologies and software engineering practices offer other capabilities that can filter and suppress alerts in more dynamic, context-aware ways, aligned with the threat landscape or your environment.

This post by Harrison Pomeroy details the power of Splunk’s macro and lookup table functionality to suppress alerts without re-deploying rules. A suppression is a capability detection engineers deploy to dynamically mute alerts, reducing both the cost of false-positive generation and the subsequent need to re-tune a rule over minor field values. It also makes the rule more resilient because it can account for external factors related to benign behaviors, such as known service accounts, scheduled tasks, or internal tooling.

Harrison leverages Splunk’s macro and lookup table features to achieve this.

The above Mermaid diagram shows his really clever setup. When you apply macros to each of your Splunk rules, you can start bringing in logic to evaluate whether suppressions are enabled for the rule (the T value), and then specify a lookup table to find additional alert logic to append to your original rule to suppress false positives.

The above example suppresses alerting on any user called svc_backup. The macro executes based on the T value and performs a lookup in a table relevant to the PShell Alert rule. Because svc_backup is in the table, the appended NOT() filter prevents an alert whenever svc_backup is present. The green Suppressed box shows the alert not firing, and the red Alert box fires because the user is jsmith.
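Outside of Splunk, the macro-plus-lookup flow reduces to a small piece of logic. Here’s a hedged Python sketch where the rule name, field, and table contents mirror the svc_backup example, but the function and structure names are my own, not Splunk constructs:

```python
# Toy model of macro-plus-lookup suppression: the macro checks whether
# suppression is enabled (the "T" value) and consults a per-rule lookup
# table of known-benign field/value pairs before letting an alert fire.
SUPPRESSION_LOOKUP = {
    # rule_name -> list of (field, value) pairs that mute the alert
    "PShell Alert": [("user", "svc_backup")],
}

def should_alert(rule_name: str, event: dict, suppression_enabled: bool = True) -> bool:
    """Return True if the alert should fire, False if it is suppressed."""
    if not suppression_enabled:
        return True  # macro disabled: rule behaves as originally written
    for field_name, value in SUPPRESSION_LOOKUP.get(rule_name, []):
        if event.get(field_name) == value:
            return False  # suppressed: matches a known-benign entry
    return True

should_alert("PShell Alert", {"user": "svc_backup"})  # suppressed
should_alert("PShell Alert", {"user": "jsmith"})      # alert fires
```

The appeal is that editing `SUPPRESSION_LOOKUP` (in Splunk, the lookup table) changes behavior without touching or re-deploying the rule itself.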

This type of suppression occurs at query time, before the alert is generated. There are other suppressions you can apply before a log hits the index, or after the alert fires. This is a great topic for my Field Manual series, so thank you, Harrison, for the inspiration!


🔬 State of the Art

EDR Silencing by Pentest Laboratories

EDR Silencing has been a super interesting area of research for security operations and threat actors alike. Typically, when a threat actor lands on a victim box and sees an EDR process running, their top priority is finding a way to evade the EDR to avoid detection. They can employ several techniques, such as:

  • Avoiding EDR detection rules themselves, such as abusing indirect syscalls that EDRs have not accounted for, or using living-off-the-land binaries

  • Obtaining privileged access and installing kernel modules that circumvent EDR hooking logic, avoiding malicious traffic generation

  • Uninstalling (!) the EDR

The last bullet above is the most interesting, because it’s so simple. It makes me think of the adage “don’t let perfect be the enemy of good”. EDR Silencing abuses the same simple-but-effective idea: it disrupts the network connection between the EDR cloud service and the agent. Severing that connection hamstrings the EDR without requiring any evasion of its detection logic.

In this post, Pentest Laboratories provides readers with a fantastic survey of the state of the art of EDR Silencing. A huge part of this research relies on obtaining Local Administrator privileges to leverage everything from Windows Filtering Platform APIs to adding blocking entries in local DNS configuration files.


The End of the “Write & Pray” Era in SIEM: Detection as Code and Purple Team Validation by Ali Sefer

This is a clever introduction to the concept of detection-as-code through the lens of Sefer, a SOC Manager. I enjoyed the framing around moving from the “Craftsmanship” era of rule writing to the “Engineering” era. Detection engineers, at their core, should be part security expert, part data analyst, and part software engineer. This is especially true in Sefer’s day-to-day, where they’ve dealt with analysts who read a threat intelligence report, implement a rule in the SIEM, deploy it, and never test it.

This really is a post about detection rule governance. It’s important that we implement the boring stuff for detection rules, for the sake of managing costs. If an analyst or detection engineer deploys rules without careful validation, education, version control and testing, then operations teams run a huge risk of false positives and analyst burnout. Sefer brings the reader through an example automated test pipeline, where:

  • Analysts write rules

  • Rules are checked into version control with syntax validation and linting

  • Atomic Red Team tests validate that the telemetry matches the rule

  • The rule is deployed into the SIEM

  • Feedback mechanisms are instilled to tune the rule

Sefer ends the blog with a real-world example where an analyst tuned a rule and the logic failed the Atomic Red Team validation check. The cool thing here is that the failure had nothing to do with the detection rule, but with the health of the system itself. Catching log source misconfigurations and matching them against detection logic is just as useful as rule validation itself.


Detection Fidelity & Confidence Framework: Teaching Your SIEM to Score Its Own Homework by Hatim Bakkali

But here’s what I’ve noticed after staring at years of notable event data: detections don’t fire in isolation. They have patterns. They have Friends. And those Friendships tell us something important about fidelity and confidence.

This post is a deep dive into a new framework for measuring detection fidelity and confidence. Rule efficacy is like a garden; it requires constant curation and mindfulness of how you build and maintain detection rules. Bakkali’s approach is more math-heavy and academic but built from practical experience. The concept is around measuring the co-occurrence of alerts with other alerts, similar to how social networks create edges between friends and followers for suggestions.

The equation binds to an entity, much like Risk-Based Alerting (RBA), and Bakkali says it should complement RBA rather than replace it. Their framework calculates two scores: confidence and fidelity.

  • Confidence: scores pairs of alerts based on how often they co-occur within a time window

  • Fidelity: aggregates those pair scores to a detection-level “noise accumulation” score. The lower, the better

They provide a ton of examples and walkthroughs, along with SIEM-agnostic pseudocode, for readers to try themselves. There’s a bake-in period to measure these over time before you can start using them, but it’s a clever approach for a few reasons.

First, it’s an elegant addition to RBA because it’s still technically a GroupBy on an entity, but it looks at pairs of alerts rather than simple aggregates. This leads to my second point: any expert model, such as an arbitrary scoring mechanism applied to alerts, runs the risk of poor model validation. You need to redeploy these models every time you update your scores, which causes profound changes and creates more work. That risk exists here too, but this approach tends to preserve the relationships between pairings, making changes easier to understand.
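To make the idea concrete, here’s a toy Python version of pair confidence and fidelity. The windowing and normalization choices are my own simplifications, not Bakkali’s actual formulas:

```python
# Toy co-occurrence scoring: confidence scores alert pairs that fire on the
# same entity within a time window; fidelity sums a detection's noisy pairings.
from collections import Counter
from itertools import combinations

def pair_confidence(alerts, window=3600):
    """alerts: list of (detection_name, entity, epoch_seconds).
    Returns {frozenset({a, b}): score} for detections co-firing on an entity
    within `window` seconds, normalized by the rarer detection's count."""
    pairs, singles, by_entity = Counter(), Counter(), {}
    for name, entity, ts in alerts:
        by_entity.setdefault(entity, []).append((ts, name))
        singles[name] += 1
    for events in by_entity.values():
        events.sort()
        for (t1, a), (t2, b) in combinations(events, 2):
            if a != b and t2 - t1 <= window:
                pairs[frozenset((a, b))] += 1
    return {p: c / min(singles[x] for x in p) for p, c in pairs.items()}

def fidelity(pair_scores, detection):
    """Accumulate a detection's pairing scores; lower means less noise."""
    return sum(score for pair, score in pair_scores.items() if detection in pair)

alerts = [
    ("brute_force", "host1", 100),
    ("lateral_move", "host1", 500),
    ("brute_force", "host2", 100),
]
scores = pair_confidence(alerts)
# brute_force and lateral_move co-fired once on host1 within the window
```

In practice you would compute these over the bake-in period the post describes before trusting the scores.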


Introducing IDE-SHEPHERD: Your shield against threat actors lurking in your IDE by Tesnim Hamdouni

~ Note: I work at Datadog, and Tesnim is my colleague ~ I’m super excited to post this because it was Tesnim’s internship project, and she now works at Datadog and is releasing it to the world! IDE-SHEPHERD is an IDE extension that helps prevent malicious extension installation, an emerging attack vector over the last year. The cool part of this extension is that it generates telemetry from the extension manifest for reporting and threat hunting, in addition to runtime monitoring.

It has runtime and heuristic detection capabilities. At runtime, it’ll shim Node functions that attempt to spawn processes, detect and block malicious commands, and perform network monitoring. The heuristic functionality analyzes metadata related to extensions and checks for poor developer practices, metadata anomalies, and hidden commands.


From Static Template to Dynamic Forge: Bringing the DCG420 Standard to Life for the Detectioniers by DCG420

DCG420, who wrote and released the Detection Engineering Template, has just launched a platform that serves as a workbench for detection engineers. It has an AI backend to help visualize attack flows, measure coverage and write rules. The intel analyst within me got really excited reading about their Analysis of Competing Hypothesis feature, which combines their tool and LLMs to generate competing hypotheses against your detection rule candidate. This helps check for bias and identify detection engineers who may be stuck in a rabbit hole, trying to get a rule out without considering other options.


The Indirect Realism of Threat Research by Amitai Cohen

This is an excellent commentary by Amitai on information asymmetry in threat research. We tend to (rightly) dunk on large cybersecurity companies as they create, update and hype their lexicon of APT and cybercriminal names. But, the very good ones do this for a reason: they have a lens in which they see threat activity, and they group it within their unique lens because no one else has the visibility that they do.

This bias is ever-present in security operations and detection engineering, where, according to Cohen, we become convinced that what we can measure captures what threat actors generate. By checking this bias, understanding that information asymmetry exists, and obsessing over what you are missing, you can feel more confident that you are addressing gaps on an ongoing basis.

☣️ Threat Landscape

Who Operates the Badbox 2.0 Botnet? by Brian Krebs

In the latest saga of the Kimwolf botnet, it looks like the botnet's operators broke into a rival Chinese-nexus family dubbed Badbox 2.0. The admins of Kimwolf, “Dort” and “Snow”, managed to post a screenshot of the crew taking over a control panel that manages and deploys Badbox. The evolution of these botnets has recently moved away from traditional DDoS-style attacks to operating and selling access to residential proxy networks.

Krebs managed to pull an email address from the “proof” screenshot and worked his way into finding an identity. Email re-use and operational security still seem to be issues for threat actors, and it shows how one screenshot can pull the attribution thread all the way to a full identity.


A Shared Arsenal: Identifying Common TTPs Across RATs by Nasreddine Bencherchali & Teoderick Contreras

This research by Splunk’s threat research team is a survey of 18 infostealer malware families mapped to MITRE ATT&CK TTPs. The emergence of these infostealer families tends to revolve around criminal groups splitting, source code getting sold and leaked, and conversations with each other on criminal forums.

The interesting finding here is how 6 out of the 18 malware strains leverage legitimate services for their command & control infrastructure. So it’s not the worst detection opportunity to alert on anomalous traffic heading to places like GitHub, social networks, Discord, or Steam.


OpenSSL 3.6 Security Release with Vulnerabilities: 10 Vulnerabilities by OpenSSL

OpenSSL had a fairly large security release, with around 10 vulnerabilities disclosed. One vulnerability with a “High” severity rating, CVE-2025-15467, caught my eye because the title mentioned a stack-based buffer overflow. These can theoretically lead to remote code execution, and since OpenSSL is a security technology that underpins the Internet, I thought it was worth calling out.


Kubernetes Remote Code Execution Via Nodes/Proxy GET Permission by Graham Helton

This is a super interesting vulnerability writeup where the (mis)configuration was known for a long time, but a new nuance in the configuration made it much worse. Basically, Helton found a valid Kubernetes configuration that allowed authenticated attackers to access an API that serves as a “catch-all” and proxies potentially dangerous requests to the internal control-plane API for Kubernetes, called the Kubelet API.

By using a WebSocket connection to nodes/proxy with the GET verb, Kubernetes proxies the request to the Kubelet API, and it doesn’t respect its internal configuration that only allows CREATE verbs for the exec command, enabling remote code execution. Helton discovered 69 Helm Charts of well-known vendors using this configuration. The best part? There is no audit logging you can use to detect this!

Here’s the relevant snippet from Helton’s blog:

This should mean consistent behavior of a POST request mapping to the RBAC CREATE verb, and GET requests mapping to the RBAC GET verb. However, when the Kubelet’s /exec endpoint is accessed via a non-HTTP communication protocol such as WebSockets (which, per the RFC, requires an HTTP GET during the initial handshake), the Kubelet makes authorization decisions based on that initial GET, not the command execution operation that follow. The result is nodes/proxy GET incorrectly permits command execution that should require nodes/proxy CREATE.


🔗 Open Source

DataDog/IDE-Shepherd-extension

IDE extension from Tesnim’s research listed above in State of the Art.


zencefilefendi/satguard

Satguard is a Starlink telemetry detection & analysis framework for detecting and visualizing satellite attacks. You feed it Starlink debug logs, and it uses a combination of static rules and anomaly detection to detect spoofing and jamming attacks and measure signal health.


FinkTech/mcp-security

Security rules and best practices for defending MCP servers. It’s structured super well, and has markdown reports with detailed examples, compliance mappings, example vulnerable and secure code, and references. It would be great to feed this into an LLM and check for vulnerabilities as people push code to an MCP server repository.


thpeng/lokis-mcp

PoC MCP server that demonstrates how a malicious MCP server can hijack your local LLM CLI to perform four separate attacks:

  • Tool shadowing: convince your local LLM that this is the preferred tool, and perform prompt injection to take advantage of queries and responses

  • Data exfiltration: hijacks a prompt and exfiltrates it over the tool for further analysis

  • Response injection: injects “hidden instructions” in other tool responses to manipulate behavior

  • Context window flooding: DDoS the context window of the prompt, which can render models with smaller context windows unresponsive


aserper/rtfd

Local MCP server that exposes tools to connect to API documentation across GitHub, npm, GoDocs and several others. This is helpful if you want to run agents locally and don’t want them to hallucinate strategies that don’t match the documentation, or if you want them to use the most up-to-date documentation without searching the Internet.


DEW #142 - Slack's Agentic Triage Architecture, Detection <3's Data and Sigma evals

Welcome to Issue #142 of Detection Engineering Weekly!

✍️ Musings from the life of Zack:

  • I’m not usually a person who does New Year’s resolutions, but I’ve committed to small changes that have already made a positive impact in my life.

    • Using a notebook to take notes and to-dos at work

    • Meditating on Headspace 4 days a week

    • Playing video games twice a week. For some reason, I’m back on Dota2 so I’m sure that’ll be helpful for my mental health

  • There’s a 50/50 chance I’ll make DistrictCon this weekend :( There’s a massive snowstorm hitting Washington, D.C., and as a former Marylander, I can tell you that part of the country cannot handle snow

  • I’ve been messing with local MCP server development via stdio and HTTP APIs, and I’m starting to shill Claude Code to everyone I talk to. It ripped through a malware analysis at work a week or so ago, and we were able to hunt for IOCs in under 5 minutes.


💎 Detection Engineering Gem 💎

Streamlining Security Investigations with Agents by Dominic Marks

In the age of AI SOCs, it’s still hard to understand where agentic triage fits into everyday operations. Products tend to present the problem set and solutions in a clean, understandable way. This is a good thing: having a product company frame the space in clear, concise benefits and downsides helps security operations teams decide how much cost to incur in building or buying one.

Blogs like this show why our industry’s transparency is awesome. Slack’s security operations team published its work on building an in-house agent-based triage system. You see many of the same principles and concepts across products, but because there is no moat or trade secret to protect, there’s a lot more to dig into.

What you see above is their approach to agent-to-agent orchestration. The top of the pyramid starts with a director who leverages high-cost models: thinking models that tend to take their time and deliberate over prompts and results. This makes sense from a planning and analysis perspective.

The critic biases itself toward interrogating individual analyses from telemetry and alerts. It doesn’t require as much model cost, but it should spend a reasonable amount of time challenging assumptions and analyzing the lower-cost model’s output. It presents the amalgamation of data and investigative output back to the Director. The Director likely runs thinking-mode models, where you spend the most money on tokens to verify that the lower parts of the pyramid performed their jobs correctly. This is the gate between a human and the system, so you want only high-quality analysis moving forward.

The phase transition diagram is super interesting because it puts the above “Director Poses Question..” investigation step into practice.

According to Marks, the Director makes decisions for each part of the phase to see whether it needs to close the investigation or continue it further. The “trace” component is where the Director engages an expert within their architecture to perform additional investigative analyses.

Honestly, it’s hard for me to provide my own analysis here, because the blog is just so complete. So, if you are skeptical of these setups, borrow or steal ideas from this Slack blog and try it on your own. The math seems reasonable: if you perform 5 investigations that take 2 hours each, the system reduces 3 of them from 2 hours to 10 minutes, and it catastrophically fails on the other 2, you still saved five and a half hours!


🔬 State of the Art

Data and Detect by Matthew Stevens

This post by Stevens dives a bit deeper into the concept of detection observability. In our field, we tend to focus on the research element of rules and detection opportunities, but leave much less conversation about data quality. Remember, there is no rule without telemetry, and there is a concept Stevens points out around data usefulness that I think demonstrates this point perfectly.

Not all sources are the same when it comes to individual atomic qualities for alerting, but when you map them to techniques, you notice that the composite qualities (the sum of many data sources finding an attack chain) become crucial. The graph above, generated by Stevens, shows how important Process Monitoring is for data usefulness. In fact, without Process Monitoring, you lose close to 30% of the techniques you can combine with other data types to alert on.
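You can sketch this composite-coverage idea with a simple set computation. The data-source-to-technique mapping below is a made-up miniature for illustration, not Stevens’ dataset:

```python
# Measure what fraction of ATT&CK techniques becomes unobservable when a
# single data source drops out of the composite coverage map.
COVERAGE = {
    "process_monitoring": {"T1059", "T1053", "T1218"},
    "network_traffic":    {"T1071", "T1059"},
    "file_monitoring":    {"T1005", "T1218"},
}

def coverage_loss(coverage: dict, removed: str) -> float:
    """Fraction of techniques no longer observable by any remaining source."""
    total = set().union(*coverage.values())
    remaining = set().union(*(v for k, v in coverage.items() if k != removed))
    return len(total - remaining) / len(total)

coverage_loss(COVERAGE, "process_monitoring")  # only T1053 is lost outright
```

Run against a real data-source inventory, this is the kind of calculation that surfaces "losing Process Monitoring costs us ~30% of techniques" findings.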

They also comment on how hard it is to build schemas and normalize telemetry so your teams can operate from a common lexicon when writing rules. This highlights that a large swath of the issues we deal with focus as heavily on the software and data engineering components of our jobs as on the threat research components.


Sigma Detection Classification by Cotool

Continuing Cotool’s research on security AI agent benchmarks, they set up a website for studying performance on their benchmarks and released a new one on Sigma detection classification. The goal of this benchmark was to assess how well foundational models were trained on attack tactics and techniques. The Cotool team fed the full Sigma corpus to 13 foundational models, stripping the MITRE ATT&CK tags to see if the models could correctly map the tags back to the original rules.

Claude’s Opus and Sonnet 4.5 performed the best overall, with the highest F1-scores but also the highest cost, somewhat similar to what we saw in their last benchmark on the Botsv3 dataset. The team provided their analysis of these placements, their prompts, and the tradecraft behind the evaluation, so others can run the same benchmarks as well.
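For readers curious how such a benchmark might be scored, here’s a minimal micro-averaged F1 computation over predicted ATT&CK tags. The rule IDs and tags are invented, and Cotool’s exact scoring may differ:

```python
# Micro-averaged F1 for multi-label tag recovery: pool true/false positives
# and false negatives across all rules, then compute a single score.
def micro_f1(truth: dict, predicted: dict) -> float:
    tp = fp = fn = 0
    for rule_id, true_tags in truth.items():
        pred = predicted.get(rule_id, set())
        tp += len(true_tags & pred)   # tags correctly recovered
        fp += len(pred - true_tags)   # tags hallucinated
        fn += len(true_tags - pred)   # tags missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

truth = {"rule1": {"t1059", "t1027"}, "rule2": {"t1566"}}
pred  = {"rule1": {"t1059"},          "rule2": {"t1566", "t1204"}}
micro_f1(truth, pred)  # 2 TP, 1 FP, 1 FN -> precision = recall = 2/3
```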


5 KQL Queries to Slash Your Containment Time in Microsoft Sentinel by Matt Swann

I have a biased view on what is and what is not a detection rule. Even to the point where I’ve reduced the concept of rules down to one definition: a rule is a search query. There is a rationale behind it: SIEMs and logging technologies require a search query to generate results. But, as I break out of my bubble, I notice that not all search queries have the same value from a detection point of view.

In this post, Swann demonstrates this concept through the lens of a security incident responder. When your goal is containment rather than a balanced cost of alerting, accuracy matters less because the aim is to use your analysis skills to find and evict threat actors as quickly as possible. Swann provides readers with five high-value KQL queries to help responders quickly orient around a potential intrusion. The cool part here is their unique experience in this field, even noting that some queries led to the discovery and containment of an active ransomware actor.


👊 Quick Hits

Detection as Code Home-Lab Architecture by Tobias Castleberry

I love seeing home-lab setups because there are many ways to set up an environment to practice advanced concepts with open-source and free software. This blog is part of a series by Castleberry where they document their journey from an analyst to a detection engineer, and they showcase some of their expertise and how they’ve learned along the way.


Building your own AI SOC? Here’s how to succeed by Monzy Merza

Speaking of demystifying AI SOC and agentic security engineering from Marks’ Gem listed above, this blog by Merza provides an irreverent commentary on the state of building these architectures. There are some non-negotiables Merza points out, such as data normalization, the concept of a “knowledge graph”, and honing foundational models and giving them the right instructions rather than relying on them out of the box.


The Levenshtein Mile by Siddharth Avi Singh

Before the age of LLMs, there was a ton of research into, and implementation of, some pretty clever mathematical techniques for finding and detecting threats. I used to work for a threat intelligence product company that specialized in detecting phishing infrastructure, and one of the key elements of finding phishing is understanding what the victim organization owns, so you can see how threat actors try to abuse it and socially engineer its customers.

In this post, Singh details the Levenshtein Distance algorithm. The basic premise here is that you can measure the similarity between two strings and generate a score. If that score exceeds some threshold of similarity, you can generate an alert to an analyst and investigate whether or not it is phishing. Domain names are the logical data source here, and you can review them from the public domain registries, DNS traffic, or the Certificate Transparency Log and try to proactively block them before they become an issue.
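The algorithm itself fits in a dozen lines. Here’s the classic dynamic-programming implementation, plus the kind of threshold check Singh describes (the threshold value and domains below are my examples, not Singh’s):

```python
# Two-row dynamic-programming Levenshtein distance: the minimum number of
# single-character insertions, deletions, and substitutions turning a into b.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def looks_like_phish(candidate: str, owned: str, max_distance: int = 2) -> bool:
    """Flag near-miss lookalikes; distance 0 is the owned domain itself."""
    d = levenshtein(candidate, owned)
    return 0 < d <= max_distance

looks_like_phish("examp1e.com", "example.com")    # distance 1: flag it
looks_like_phish("unrelated.net", "example.com")  # far away: ignore
```

Fed with candidate domains from registries, DNS traffic, or Certificate Transparency logs, a check like this generates the alerts Singh describes for analyst review.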


☣️ Threat Landscape

After the Takedown: Excavating Abuse Infrastructure with DNS Sinkholes by Max van der Horst

This post by van der Horst helps readers understand what happens after a domain is sinkholed. We typically see news stories about a large botnet or ransomware operation being taken down, and the takedown includes seizing domain names used for command-and-control communications with victims. High fives and good vibes happen and then we focus on the next big thing.

van der Horst challenges this finality, arguing that a sinkhole is more than just a disruption operation; it’s also a forensic artifact that helps discover more victims and additional malicious infrastructure. They downloaded several datasets, combining passive DNS and open-source intelligence feeds, to measure the rate of disruption and perform temporal analysis of these takedowns to discover unreported infrastructure.

It also allows analysts to cluster activity and create new detections as new botnets or campaigns emerge, where many cases involve the reuse of code and infrastructure techniques.


How to Get Scammed (by DPRK Hackers) by OZ

This is a great article showing an individual infection chain done by a Contagious Interview threat actor. OZ accepts the bait on Discord and walks through how the DPRK-nexus threat actor tries to infect him by taking a malicious coding test. OZ brings receipts: there’s a lengthy Discord conversation where the threat actor prods OZ and eventually convinces them to apply for the job.

There’s some cool analysis with cloning the repository and using docker and pspy to inspect the malicious traffic.


What’s in the box !? by NetAskari

NetAskari, a security researcher, stumbled upon a Chinese-nexus threat actor’s “pen-test” machine and managed to download a bunch of their custom tooling for analysis. The Chinese hacker ecosystem is in a bubble, the result of both cultural and artificial barriers imposed by the PRC. These barriers create opportunities to build tooling, exploits, and software in a silo, so when you find a goldmine of tooling available for download, it’s always great to download it and see how other hackers are performing operations.

They found a litany of post-exploitation tools, some of which are custom-written and look similar to the likes of Cobalt Strike or Sliver, a bunch of custom Burp Suite extensions, and some malware families, like Godzilla, that were used in nation-state operations against the U.S.


Dutch police sell fake tickets to show how easily scams work by Danny Bradbury

I think phishing simulations at a professional organization are lame, but I actually think they work at scale against the general populace as a form of education. Apparently, the Dutch police thought the same. They set up a fake ticket sales website and bought ads to trick victims into visiting and purchasing tickets for sold-out shows.

Tens of thousands of people visited the website, and several thousand people bought tickets, which is a wild stat if you want to steal some credit cards. Obviously, the Police did not steal credit cards; they used them as an educational opportunity to help folks understand the risks of online ticket fraud.


CVE-2025-64155 Fortinet FortiSIEM Arbitrary File Write Remote Code Execution Vulnerability by Horizon3.ai

From the blog:

CVE-2025-64155 is a remote code execution vulnerability caused by improper neutralization of user-supplied input to an unauthenticated API endpoint exposed by the FortiSIEM phMonitor service.

Oof. I couldn’t tell you the last time I saw a remote code execution vulnerability in SIEM technology.

The specific service, phMonitor, listens on port 7900. It serves as the control plane for these devices, much like the Kubernetes control plane, and supports orchestration and configuration API calls. I ran a quick scan of likely FortiSIEM devices on Censys and found over 5000 publicly facing servers.

This blog has some details on the vulnerability, and, as with most FortiGuard and edge device vulnerabilities, user-supplied web request data with complex string parsing leads to a command injection deep within the application code.


🔗 Open Source

MHaggis/Security-Detections-MCP

Locally run MCP server for detection engineering. It leverages stdio transport, so nothing leaves your machine, which is always good if your rules or queries contain sensitive information. It exposes 28 tools a local LLM client (Claude, Cursor) can use to look at detection coverage, MITRE classification, KQL queries, and data source classification.


SeanHeelan/anamnesis-release

PoC of an LLM exploit generation harness. The README has an extensive background on how they approached benchmarking Claude Opus and GPT 5.2 on how fast the models can analyze a vulnerability and generate exploit code with no instruction. They introduced several constraints in the test environments to challenge the models, such as removing certain syscalls, adding additional memory and operating system protections, and forcing the agents to generate an exploit with a callback.


tracebit-com/awesome-deception

Yet another awesome-* list on deception technology research, open-source repositories and conference talks.


mr-r3b00t/rmm_from_shotgunners_rmm_lol/main/mega_rmm_query.kql

This repository caught my eye because I’ve never seen a rule that started with the word “mega”. When I say mega, I’m thinking a few hundred lines for something pretty complicated. But this RMM detection query rule is 3,000 lines long. Can you imagine needing to tune this?


ineesdv/Tangled

This is a clever phishing simulation platform that abuses iCalendar rendering to deliver legitimate-looking phishing invites. It leverages research from RenderBender, which abuses Outlook’s insecure parsing of the Organizer field.


DEW #141 - K8s Detection Engineering, macOS EDR evasion, Cloud-native detection handbook

Welcome to Issue #141 of Detection Engineering Weekly!


✍️ Musings from the life of Zack:

  • It was a long but restful month away from you all! I can’t wait to get back into writing every week for y’all

  • 🤝 I am accepting new sponsors for 2026! If you are interested in sponsoring the newsletter, shoot me an email at techy@detectionengineering.net. We are already almost halfway booked for Primary slots and now have Secondary slots so you have options!

  • I’ve started writing again for the Field Manual and I really love encapsulating my experience and knowledge into these posts. If you have ideas for Field Manual posts, comment below. My latest post is below as the last story under State of the Art

This Week’s Primary Sponsor: Push Security

Want to learn how to respond to modern attacks that don’t touch the endpoint?

Modern attacks have evolved—most breaches today don’t start with malware or vulnerability exploitation. Instead, attackers are targeting business applications directly over the internet.

This means that the way security teams need to detect and respond has changed too.

Register for the latest webinar from Push Security on February 11 for an interactive, “choose-your-own-adventure” experience walking through modern IR scenarios, where your inputs will determine the course of our investigations.

Register Now


💎 Detection Engineering Gem 💎

A Brief Deep-Dive into Attacking and Defending Kubernetes by Alexis Obeng

For detection engineers, incident responders, and threat hunters who operate in a cloud-first environment, you’ve probably heard developers in your organization talk about Kubernetes (k8s for short). It’s an extremely popular container orchestration framework that has become the de facto standard for controlling scaling, application isolation, and cost. Whether you have it in your environment or you’ve never worked with it, it’s important to understand how security controls and detection opportunities work inside these environments, because k8s is like an operating system of its own.

When Obeng first shared this research on a Slack server I was on, I was excited to read it because it’s truly a deep dive into Kubernetes security, as the title suggests. She started the blog by describing how unfamiliar this space was, and by the end, you could tell Obeng had become very familiar with detection and hunting scenarios in Kubernetes.

The blog starts with an introduction to k8s and breaks down the jargon, architecture, and nuances of how a Kubernetes environment operates. The most important thing I try to get folks to understand with k8s is that it’s separated into two detection planes. The control plane, as Obeng explains, “is the core of Kubernetes.” It helps control everything from scaling plans, what containers to run, permissions, and health checks.

The other plane, the data plane, is everything else. The hyperscalers describe this as the service’s core functionality. Since k8s’ functionality revolves around running containers, you could argue that it’s about each individual container and the isolation of those containers within k8s.

As you can see from the threat matrix, attacks along MITRE ATT&CK operate in both planes.

After giving this introduction, she jumps into several attack scenarios. But first, the scenario section describes the k8s attack surface. This is my favorite part of the blog. Obeng outlines four major weak points you’ll see in any k8s attack: pod weaknesses, identity and access mechanisms, cluster configuration, and control plane entry points. Notice these are focused on the control plane as the end goal. So, if you can compromise any part of the data plane, for the most part, the main goal is to attack the control plane afterward.

She ends the blog with close to 10 attack scenarios, detection rules using Falco, and a follow-up with her lab for folks who want more hands-on learning.


🔬 State of the Art

EDR Evasion with Lesser-Known Languages & macOS APIs by Olivia Gallucci

~ Note, Olivia is my colleague at Datadog ~

EDR blogs from independent researchers are hard to find. It’s not that the blogs are tucked away in dark corners of the Internet; rather, EDR researchers who don’t work at vendors are few and far between. So, anytime I get to see research that goes deep into the EDR space, I pay close attention.

This is especially true for the macOS world. Microsoft has years of security solutions and a litany of researchers who document all kinds of peculiar malware and EDR behavior. This is logical, since most major security incidents over the last 30 years have been on Windows platforms. But in the last few years, attackers have shifted their focus to macOS. The opaqueness-by-design of EDR vendors AND Apple makes it hard to learn about security internals on this platform.

This technical analysis by Olivia helps break down those barriers by first describing the ecosystem of opaqueness of macOS combined with security vendor technologies. From my understanding (and after lots of stupid questions from me to Olivia), macOS security vendors rely on Apple’s Endpoint Security (ES) framework, which is somewhat equivalent to Linux’s eBPF observability and security framework. Security vendors subscribe to security events, build detections over them, and implement EDR response features, such as blocking a piece of malware from executing.

This has its limitations, and Olivia’s analysis under her “Technical Analysis” section points them out. It’s reminiscent of the early days of Microsoft security, when bypasses emerged from malware families, and it took a lot of effort for vendors and Microsoft to respond to them. The closed ecosystem has its advantages from a security-controls perspective, but IMHO, it starts to do a disservice to organizations when attackers move faster than the controls you try to implement.


The Cloud-Native Detection Engineering Handbook by Ved K

This post is an excellent follow-up to Obeng’s blog, which is under the Gem at the top of the newsletter!

Detection engineering is much more than building detection rules. There are elements of software engineering, data analysis, and threat research that separate a good detection engineer from a great one. I’ve talked about this across my publication, podcasts and conference talks. But if you want a deep dive on how to wear these hats and implement these skillsets, Ved’s blog is a great resource.

Ved defines cloud-native detections as any research, engineering and implementation of a detection rule to identify threat activity in cloud environments (AWS, Azure, GCP) and Kubernetes. He then describes his nine-phase (!) approach to writing detections, and opens each subsection with what “hat” you should be wearing.

The value of this post lies in the diligence put into each phase, especially in the use of real-world examples. They are bite-sized sections so that I wouldn’t be phased (ha!) out by the number. It serves more as a handbook for you to reference as you move through the detection lifecycle.

My favorite section is under Phase 4, titled “Enrichment and Context.” It ties nicely with my piece about context and complexity within rules, and according to Ved, it does require a Software Engineering Hat. Ved lists out five critical pieces of context to help increase the efficacy of rules:

  • Identity Context: who is this (human) or what is this (service-account).

  • Threat Intelligence: what IP addresses, domains, or general knowledge around indicators of compromise do we have to help make decisions on this activity?

  • Resource and asset metadata: What critical asset inventories, compliance tags or posture related information exists to help identify the riskiness of this asset being attacked?

  • Behavioral baselines: is this normal behavior for this type of activity? Think Administrator activity at 2am on Saturday.

  • Temporal context: Attacks aren’t point-in-time, they are over a period-of-time. Can you enrich this alert with other context of events before it occurred?
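
The five context layers above can be sketched as an enrichment step that merges lookups into a raw alert before it reaches an analyst. This is a minimal illustration of the idea, not Ved's implementation; the lookup tables and field names are hypothetical.

```python
# Hypothetical enrichment sketch: merge identity, threat-intel and asset
# context into a raw alert. All tables and field names are illustrative
# assumptions, not from Ved's post.

IDENTITY = {"svc-backup": {"type": "service-account", "owner": "infra-team"}}
THREAT_INTEL = {"203.0.113.7": {"verdict": "known-c2"}}
ASSETS = {"i-0abc123": {"criticality": "high", "tags": ["pci"]}}

def enrich(alert: dict) -> dict:
    enriched = dict(alert)
    enriched["identity"] = IDENTITY.get(alert.get("user"), {"type": "unknown"})
    enriched["threat_intel"] = THREAT_INTEL.get(alert.get("src_ip"), {})
    enriched["asset"] = ASSETS.get(alert.get("resource"), {})
    return enriched

alert = {"user": "svc-backup", "src_ip": "203.0.113.7", "resource": "i-0abc123"}
print(enrich(alert)["threat_intel"]["verdict"])  # known-c2
```

Each lookup here maps to one of Ved's layers; behavioral baselines and temporal context would typically come from stateful queries rather than static tables.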

Ved finishes the rest of the post, writes a detection, tests it, follows it through deployment, and sees how useful the alert is. It looks like this is his first post on his Substack, so I recommend subscribing!


How to defend an exploding AI attack surface when the attackers haven’t shown up (yet) by Joshua Saxe

This is a fantastic commentary on what happens when the security community knows that a new technology is going to bring all kinds of security issues, even though the issues haven’t materialized yet. Saxe’s framing revolves around the growing attack surfaces around AI technologies. It’s hard to parse marketing-speak and LinkedIn ads and messages from startup founders and salespeople claiming that “the bad guys are already using AI at scale to attack you!!11” without much proof. Perhaps they reference a news article about some basic usage of vibecoding malware, or a phishing site that has an HTML comment of “created by Claude Code.”

Saxe has recommendations around what security functions and specific teams can do to help prepare for this, and I will steal his framing around making controls and policies “dialable”. Security should aim to be enablers rather than disablers for our engineering and technology counterparts. So, build controls in security engineering, and implement detection & response processes, but configure them in a way so you can “dial up” the strictness as we see new attacks emerge from real scenarios rather than theoretical ones.


Introducing Pathfinding.cloud by Seth Art

~ Note, Seth is my colleague at Datadog ~

Seth recently released a comprehensive library of privilege escalation scenarios and techniques abusing IAM in AWS environments. There are 65 total paths, and 27 of them are not covered by existing OSS tools. The good news is that the website has a description of each attack and how to perform it, as well as a helpful graph visualization, so you can see the traversal rather than trying to create an image in your head.


📔 Field Manual

I wrote a Field Manual issue on Atomic Detection Rules over break! Please go check it out!


☣️ Threat Landscape

The Mac Malware of 2025 👾 by Patrick Wardle

This blog is a comprehensive look back at Mac malware incidents and research throughout 2025. Maybe I am showing my age, but if you had told me 10 years ago that macOS’s popularity would explode among cybercriminal groups, leading to large-scale compromises, I would have laughed at you. Wardle lists the top malware families, some associated incidents and blogs dissecting the malware, and walks through analysis of the malware using an open-source toolbox.


Researcher Wipes White Supremacist Dating Sites, Leaks Data on okstupid.lol by Waqas Ahmed

lmao


🌊 Trending Vulnerabilities

MongoDB Server Security Update, December 2025

I’m a bit late on this one due to holidays and time off, but MongoDB recently disclosed a critical vulnerability dubbed “MongoBleed” under CVE-2025-14847. It allows an unauthenticated attacker to connect to a MongoDB instance and leak memory contents, which can contain sensitive information such as data stored inside Mongo, authentication data and cryptographic material.

I’m impressed with the transparency and diligence in the post. MongoDB found the vulnerability internally, validated it, built a patch, notified customers and rolled out a post. A researcher at Elastic published a PoC two days later (on Christmas, no less) that I’ll link below.


Ni8mare  -  Unauthenticated Remote Code Execution in n8n (CVE-2026-21858) by Dor Attias

n8n is an open-source workflow framework for building agent-to-agent systems. They recently disclosed two vulnerabilities, CVE-2026-21858 and CVE-2026-21877, scored 9.9 and 10.0, respectively. n8n itself has skyrocketed in popularity primarily due to its ease of use for interfacing with agentic workflows and platforms. The 0.1 difference is 21858’s arbitrary file read, which could allow reading secrets from a target system, versus full remote code execution in 21877.

I really enjoyed the technical detail of this post by Attias, which focuses on the arbitrary file read vulnerability. When you think of arbitrary file reads in a modern application stack like n8n, you can pull many more credentials that grant you access besides dumping password files. Attias created a clever scenario: reading arbitrary files and loading them into n8n’s knowledge base, allowing extraction of the key from the chat interface itself.


🔗 Open Source

heilancoos/k8s-custom-detections

Kubernetes lab environment and corresponding detection rules from Obeng’s gem above.


appsecco/vulnerable-mcp-servers-lab

Hands-on lab for testing security vulnerability knowledge against MCP servers. There are nine scenarios, and each one looks pretty reasonable in its real-world applicability. You’ll need Claude and Python to run each one, and luckily, with MCP, you can specify the singular Python file within the Claude config and get everything you need to get started.


Adversis/tailsnitch

Tailsnitch is a posture management tool for Tailscale configurations. You give it a Tailscale API key, and it connects to your tenant’s API and compares its configuration to secure baselines.


joe-desimone/mongobleed

Original PoC of CVE-2025-14847, a.k.a MongoBleed, dropped right on Christmas :|. Has a docker-compose file so you can safely test it yourself.


kpolley/easy-agents

This is a nice example of what I think will be a normal detection and response engineer’s setup in the next few years. Your org will operate a repository with agent setups for technology like Claude Code, and it’ll contain a standardized list of MCP servers to use and agent instructions. Making it extensible to tweak or add agents and MCP servers should be as easy as another prompt and some glue work for a custom MCP.


What are Composite Detections?

Atomic Detection rules are critical building blocks for a detection engineering function. They provide visibility into singular event or indicator-based threat activity within an environment. The rules are narrow in scope and generally lack context for the blue teamer’s environment and the threat actor performing the malicious action. For example, an atomic detection rule can inspect Administrator logon activity in a cloud environment and generate an alert whenever an Administrator logs in. This captures malicious admin compromises (high recall), but also triggers on every legitimate admin login (low precision), flooding analysts with false positives.

This tradeoff also works in the opposite direction on the precision-recall spectrum. A detection engineer can deploy an atomic rule that is so precise it becomes brittle. It may never generate an alert because the fields it tries to capture are so specific that they offer low operational value.

The Detection Engineering Field Manual is a series dedicated to sharing knowledge and my experience building, operating and scaling a detection engineering organization at a F500 tech company. Please like and subscribe if you find this series useful!

The answer to combat these types of detections is to increase the context around the attack itself. This means capturing more threat activity to group atomic detections together, as well as increasing the context of the environment to differentiate benign and malicious activity. Composite detections, also known as correlated or stateful detections, increase the context and, therefore, complexity of writing and maintaining the rule.

This field manual post covers (ha!) the pros and cons of composite detection rules and begins to explore strategies to expand context around threat activity.

Detection Engineering Interview Questions:

  • What is the MITRE ATT&CK?

  • What is a composite detection rule?

  • Explain a threat activity scenario where a composite detection rule helps reduce false positives?

  • How do composite rules increase operational complexity for a detection engineer?

MITRE ATT&CK

MITRE ATT&CK (pronounced “MY-ter AT-ack”) is the industry standard for modeling threat activity. According to their main website:

“MITRE ATT&CK® is a globally-accessible knowledge base of adversary tactics and techniques based on real-world observations. The ATT&CK knowledge base is used as a foundation for the development of specific threat models and methodologies in the private sector, in government, and in the cybersecurity product and service community.”

There is no modern detection engineering and incident response without MITRE ATT&CK. It serves as a lexicon for security engineers across red and blue teams to standardize on how a specific attack occurs and the telemetry it generates.

Tactics are along the X axis and represent the stages an attacker traverses to achieve an objective, such as exfiltrating sensitive data, deploying ransomware, or causing a denial-of-service attack. Ransomware deployment is the end goal, but it requires a lot of steps to achieve that impact. For example, getting access to a victim machine, laterally moving to a domain controller, collecting secrets and cracking administrator passwords, and finally finding a way to deploy the ransomware.

The Techniques are the Y-axis under each Tactic. Techniques are the how: specific methods adversaries use within each tactic to achieve their objective. For example, Network Share Discovery under Discovery is used by attackers to find interesting files, folders and target machines connected to the current machine. They can leverage this to perform Collection of sensitive information and perform Lateral Movement to a higher privileged victim machine.

The beauty of MITRE ATT&CK is that it directly contradicts the adage “attackers only need to be right once, defenders have to be right 100% of the time.” Each technique listed above has associated telemetry, detection opportunities, and some even have threat groups that leverage the documented techniques.

What does this have to do with Composite Detections?

In my last post on Atomic Detections, I talked about how Atomic Detection rules lack context. These rules can use threat intelligence, such as malicious IP addresses, to generate alerts, but those IP addresses can be rotated, making the rule very noisy. So you wouldn’t want to write that rule unless it existed in the same window where the IP address remains malicious.

On a separate Atomic Detection rule, a detection engineer can write a rule to alert on Network Share Discovery. This is an obvious choice from my example before: the next logical step after Network Share Discovery is Lateral Movement. We want to detect that, right?

The problem here, again, becomes context. What if a legitimate process, such as a File Search or Data Backup tool, performs Network Discovery? You generate an alert, block the activity, and you’ve just killed productivity or a critical business process for one of your users. Does this mean you need to painstakingly investigate every Network Discovery alert? You could, but you would burn out, and the operational costs would be too high.

This is where Composite Detections can help, and where MITRE ATT&CK enables context via chains of events. By correlating Network Share Discovery with subsequent Lateral Movement attempts, we filter out benign activity and surface actual threats.

Composite Detections Tell a Story

Let’s continue to challenge the adage “attackers only need to be right once, defenders have to be right 100% of the time.” We know that writing one Atomic Detection rule can be noisy. So what if you write two? What if you write these rules across every single path along MITRE ATT&CK, under every Tactic? You would have high recall, but terrible precision, and a flurry of alerts that can’t discern between benign and malicious activity.

Let’s look at an example from our previous post on Atomic Detection Rules:

In this scenario, the Atomic Detection rule fires on administrator login activity. We are only looking at the event and ignoring sourceIP, timestamp, and location. These can help tell the story, but the story stops on the singular event. You could write some additional enrichment to tell the story that:

  • The Admin is logging in from a risky location, let’s say outside the U.S. for the sake of example

  • The Admin is logging in past business hours

But these enrichment points can also be part of legitimate business activity. This is where context comes into play.

Let’s say you have two other rules that capture potential threat activity of an Administrator creating a second account and attaching an Administrator policy or profile to it. It’s riskier (it’s further along the ATT&CK chain), but it lacks context. But what if you combine the threat scenarios and create a story?

Here’s the story: an Administrator account gets compromised, and an attacker runs a script to log in to your AWS portal automatically. They are smart cookies and believe in another adage, “two is one, and one is none,” and create a second account to achieve Persistence on your account. They then leverage their Administrator privileges to attach an Administrator policy. Smart, if you reset the original Administrator password, they have a backdoor back into your environment!

By combining the three scenarios via the following rule, in pseudocode:

if user contains 'admin'
AND CreateUser action is called
AND AttachUserPolicy is called and the Policy = 'Admin'
THEN alert

You’ve told your SIEM quite a compelling story to look out for, and it found it!
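
The pseudocode above can be expressed as a predicate over a batch of correlated events. Here's a minimal sketch; the event shape loosely mimics AWS CloudTrail, but the field names are assumptions for illustration, not a real SIEM rule language.

```python
# Sketch of the composite rule as a predicate over correlated events.
# Field names (eventName, user, policyArn) are illustrative assumptions.

def composite_match(events: list[dict]) -> bool:
    admin_login = any(
        e["eventName"] == "ConsoleLogin" and "admin" in e["user"]
        for e in events
    )
    created_user = any(e["eventName"] == "CreateUser" for e in events)
    attached_admin = any(
        e["eventName"] == "AttachUserPolicy"
        and "Admin" in e.get("policyArn", "")
        for e in events
    )
    return admin_login and created_user and attached_admin

events = [
    {"eventName": "ConsoleLogin", "user": "admin"},
    {"eventName": "CreateUser", "user": "admin"},
    {"eventName": "AttachUserPolicy", "user": "admin",
     "policyArn": "arn:aws:iam::aws:policy/AdministratorAccess"},
]
print(composite_match(events))  # True
```

Note that any one of these conditions alone is the noisy atomic rule; the alert fires only when the whole story is present.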

There are some key questions from the above rule, and they emerge from the other data I’ve omitted from my diagram:

  • What is a legitimate amount of time between logging on and calling CreateUser?

  • Is calling CreateUser then attaching an Administrator policy malicious?

  • Does this Admin typically CreateUser and attach policies?

These questions are what adds complexity and cost to writing and maintaining a ruleset. So, a detection engineer must weigh the cost of this complexity versus the cost of false positives from Atomic rules.

In this specific Composite rule, we used Windowing. Windowing is a technique in which we capture activity in time windows and assume that any Composite detection that captures events within that window must be the result of threat activity. The rule assumes that if an Administrator account logs in, creates a secondary account, and attaches a privileged policy to it, it must be malicious. This reduces false positives by:

  • Combining three Atomic rules into one rule

  • Creating a story where these three actions together mean something malicious is happening, or at least requires investigation

  • Assuming threat actors will try to do this quickly, as their access may be revoked within a few minutes
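
A from-scratch sketch of what windowing looks like, keyed per user: alert only when all three stages land inside a 5-minute window. This illustrates the technique generically, not any specific SIEM's feature; the window size and event shape are assumptions.

```python
from datetime import datetime, timedelta

# Windowed correlation sketch: buffer events per user, drop anything
# older than the window, and alert when all three stages appear inside
# it. Window size and event shape are illustrative assumptions.

WINDOW = timedelta(minutes=5)
REQUIRED = {"ConsoleLogin", "CreateUser", "AttachUserPolicy"}

def windowed_alerts(events: list[dict]) -> list[str]:
    alerts = []
    by_user: dict[str, list[dict]] = {}
    for e in sorted(events, key=lambda ev: ev["ts"]):
        bucket = by_user.setdefault(e["user"], [])
        bucket.append(e)
        # evict events that fell out of the 5-minute window
        bucket[:] = [x for x in bucket if e["ts"] - x["ts"] <= WINDOW]
        if REQUIRED <= {x["eventName"] for x in bucket}:
            alerts.append(e["user"])
            bucket.clear()  # avoid re-alerting on the same chain
    return alerts

t0 = datetime(2026, 1, 15, 2, 0)
events = [
    {"user": "admin", "eventName": "ConsoleLogin", "ts": t0},
    {"user": "admin", "eventName": "CreateUser", "ts": t0 + timedelta(minutes=2)},
    {"user": "admin", "eventName": "AttachUserPolicy", "ts": t0 + timedelta(minutes=4)},
]
print(windowed_alerts(events))  # ['admin']
```

If the AttachUserPolicy call instead arrives ten minutes after the login, the first two events are evicted and no alert fires, which is exactly the false-negative assumption discussed below.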

Stories increase complexity

I linked a chart in my previous post about the trade-off between context, operational cost and false-positive reduction.

In this Windowed Composite Detection Case, there are several costs that detection engineers incur:

  • Does my SIEM technology support Windowing?

  • Does the combination of these detection rules capture the threat activity that I want? For example, should I also have a separate atomic rule for CreateUser to catch persistence attempts that don’t fit the 5 minute window? This can lead to false negatives if you only rely on composite rules.

  • Does the window period give me the best value? If I increase it to 15 minutes, what costs do I incur on server usage, indexing and other infrastructure components?

I will say that Detection Engineers I’ve hired, worked with, and spoken with at other companies spend as much time researching cost trade-offs as they do performing pure security research. This is the Engineering component of threat detection, and to me, these types of problems are what make the field exciting. You are part security researcher, part engineer, and part data scientist!

Conclusion

Composite detections shift detection engineers’ focus to reduce false positives by creating stories of attack chains. MITRE ATT&CK is the de facto industry standard for documenting how an attacker progresses through a breach to achieve an objective. Detection engineers can use ATT&CK to build atomic and composite rules to capture threat activity.

Atomic rules lack context by design, but when combined with other atomic rules via composite detections, you can start building a story of an attack. This story is the context you want to decide on whether you should investigate an alert. This story also reduces false positives by capturing the logical progression an attacker may take in your environment, and reduces the likelihood of alerting on benign activity.

The complexity of creating and maintaining composite detections stems from technological capabilities, such as windowing, as well as the hidden costs of assumptions made by the detection engineer. For example, combining three distinct events into a composite detection may miss other alerting scenarios within those events, leading to a false negative.

In the next Field Manual post, we'll explore different alerting mechanisms for composite and atomic detections outside of windowing.


What are Atomic Detection Rules?

In the last post, we discussed the tradeoffs in designing effective rules. Detection efficacy captures the needs of the consumer of your detection rules, because each persona may be more concerned with missing an alert (a false negative) or with having too many alerts that don’t matter (false positives).

Finding attacks is the core value proposition of what detection engineers do, and it’s what makes this field technically challenging. Although difficult, this work has an art and aesthetic that is hard to find anywhere else in security. This is because you aren’t solving a machine-to-machine problem, but a human-to-human problem, and the other human is unwilling to cooperate with you. To me, detection engineering and blue teaming, overall, are studies of behavior.

Detection Engineering Weekly is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

In this post, we’ll begin looking at how rules detect threat activity through atomic detections.

Detection Engineering Interview Questions:

  • What is the Pyramid of Pain?

  • What is an atomic detection rule?

  • Compare and contrast scenarios where an atomic detection rule can be effective or ineffective.

  • What is environmental context?

David Bianco’s Pyramid of Pain

Some attacks generate telemetry that is easy to identify as an attacker on your system or networks. Many attacks, however, require logic that depends on telemetry availability, environmental context, index windows of logs arriving at the SIEM, and understanding of attacker tradecraft or behavior.

Much as detection engineers must consider operational costs when writing rules, threat actors incur costs when carrying out attacks. Framing attack and defense as this cost-versus-cost battle helps you impose as much cost as possible on an attacker’s operations, so they’re in so much pain they deem a tactic or technique not worth their time. This is where the “Pyramid of Pain” by David Bianco becomes a valuable exercise for security teams.

At its core, the Pyramid of Pain challenges defenders to focus on imposing as much pain on attackers as possible. As you traverse up the pyramid, the operational cost of your efforts increases, but so does the amount of pain you cause an attacker. Each layer of the Pyramid represents an operational complexity the threat actor must consider when staging an attack. Near the top of the pyramid: if you detect Tools executing in your environment, your detections are more robust because the order and context of the tool’s execution become irrelevant.

The best state is at the very top, “Tactics, Techniques and Procedures” (TTPs). This layer focuses on the behavioral aspect of an attack. If you detect the behavior of an attack, every layer below it becomes less relevant to your detection (for the most part), and the detection is robust enough to catch changes in Tools, Artifacts, Domains, IP addresses and hashes.

Imagine this: you write a rule that helps detect a known Command-and-control (C2) server you read from a blog post. You deploy that rule and it doesn’t find anything. Great, you aren’t compromised, and you’ll have great coverage for the future if there is a compromise.

Here’s the problem: threat actors are well aware that we find C2 servers, build rules, share them with the community and blog about them. A C2 server is typically either an IP address or a domain. Have you ever rented a droplet on Digital Ocean, or bought a domain from Namecheap? You can spend a few dollars to rent more droplets or buy new domains. This imposes minimal pain on the threat actor’s side, and defenders can no longer block the new C2 server until it is discovered again.

Even worse, the IP address you wrote a rule for may now be leased to a benign client, and your rule is alerting on benign traffic, causing pain to you and your team.

So, how effective is your detection rule now? Not too effective! This is because detecting on a singular value, such as an IP address or a domain, is an Atomic Detection. Atomic Detections are narrowly defined rules that detect activity at a point in time with little to no context. Let’s dive into them in the next section.
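
In its simplest form, an atomic detection is a single-value match. A minimal sketch, assuming logs arrive as dicts; the field names are illustrative, not a particular SIEM's schema:

```python
# Minimal atomic detection sketch: match a single indicator with no
# environmental context. Field names are illustrative assumptions.

C2_IPS = {"198.51.100.23"}  # indicator lifted from a blog post

def atomic_match(log: dict) -> bool:
    return log.get("dst_ip") in C2_IPS

print(atomic_match({"dst_ip": "198.51.100.23"}))  # True
print(atomic_match({"dst_ip": "192.0.2.10"}))     # False
```

Once the indicator is re-leased to a benign tenant, this same one-line predicate starts generating false positives, which is the brittleness described above.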

Atomic Detections Lack Context

Atomic Detections are tactical in nature. They may seem precise in practice, but because they lack context from the environment and incur little pain for attackers, they become brittle and prone to false positives. As soon as an attacker changes their infrastructure or flips one bit in a new build of their malware, which changes the cryptographic hash value, your rule diminishes in quality.

Atomic Detections also exist for host and network activity. The point here is that ignoring environmental context, such as rules that don’t evaluate timestamps, environment specifics, or regular activity, makes atomic rules risky to deploy.

Let’s look at a basic alerting example with Amazon AWS Administrator login activity.

The rule is in purple and only alerts on log activity where the user field value is admin. The SIEM correctly identifies the user field containing admin three times. The 11AM alert is a true positive: the administrator credentials were compromised. The other two are false positives, representing normal administrative work. To make things worse, the compromised login occurred during normal business hours.

So how do you differentiate between the three alerts?

You differentiate them by spending incident response cycles investigating each one. Now imagine 100s or 1000s of these being generated. The atomic rule strategy doesn’t work because there is little to no context on the event.

The same thing can be said for IP-based C2 alerting.

In this example, the detection engineer wrote an atomic detection rule for a known C2 IP address. Perhaps they read a blog some time around December 10 and added it quickly to find exposure. Log 1 enters the SIEM; the rule checks the destination field and generates a true-positive alert.

Fantastic! Let’s keep the rule!

The C2 was removed on December 11 by the leasing company that owns the IP, thanks to the blog post. On January 15, a content delivery network leases the same IP address, and network traffic logs flow through the SIEM, triggering an alert. Every subsequent network log is a false positive.

The context from both of the graphs above sits under the UNUSED field in the purple box. Associated domains, timestamps, and physical location are all useful fields to add to the atomic rule to increase its robustness and remove false positives. It would make sense, then, to start including all of these in your detection rule. But detection engineers need to understand the relationship between detection context and cost.
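A hedged sketch of what that added context could look like, using a hypothetical indicator record with a validity window (the field names are invented for illustration):

```javascript
// A C2 indicator enriched with context instead of a bare IP string.
const indicator = {
  ip: "203.0.113.7",
  validFrom: new Date("2025-12-10"), // when the intel was published
  validUntil: new Date("2025-12-31"), // expire before the lease likely churns
  associatedDomains: ["evil-c2.example"],
};

// The rule now checks the value AND its temporal context before alerting.
function contextualRule(log) {
  const ts = new Date(log.timestamp);
  return (
    log.destination === indicator.ip &&
    ts >= indicator.validFrom &&
    ts <= indicator.validUntil
  );
}

console.log(contextualRule({ destination: "203.0.113.7", timestamp: "2025-12-10" })); // true
console.log(contextualRule({ destination: "203.0.113.7", timestamp: "2026-01-15" })); // false: IP re-leased to a CDN
```

The January 15 CDN traffic no longer alerts, because the indicator aged out of its validity window.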

Imposing cost on ourselves

As you progress up the Pyramid of Pain and add context to your ruleset, the cost increases. Cost can depend on time, resources, maintenance, or the technology needed to add context, such as threat intelligence. The following graph tries to capture this causal relationship:

At the bottom left, you could deploy a rule similar to the examples above. The operational cost of matching on a single value is low, but so is the context. And because the context is low, the risk of false positives is high. As you add context (move to the right), the cost increases, but the false-positive rate decreases.

This is why not every rule can be perfectly accurate. There is a cost-benefit tradeoff, as well as information asymmetry from attacker behavior, that detection engineers must consider. The only way a rule can catch all threat activity is to alert on every piece of activity. That seems costly!

Conclusion

Atomic detection rules generally focus on low-context events or values. They can certainly help a blue team function, such as a SOC or a Detection & Response team, and they have a place in security operations. But they risk generating many noisy alerts when the detection engineer fails to account for a threat actor's behavioral patterns.

The Pyramid of Pain and imposing cost are industry-accepted concepts that help contextualize the competing objectives of blue teamers and threat actors. Writing rules to alert on the bottom parts of the pyramid, which primarily involve threat intelligence indicators (IP addresses, domains, hash values), imposes a greater cost on defenders than on threat actors. Defenders impose more pain on threat actors by climbing The Pyramid and writing rules that detect tools and TTPs.

For the next few parts of this series, I’ll explain the different ways detection engineers can write rules to capture threat actor behavior and the associated operational complexity.

Detection Engineering Weekly is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

DEW #140 - SVG Filter ClickJacking, Detection Engineering "Onboarding" and React2Shell spotlight

Welcome to Issue #140 of Detection Engineering Weekly!

Every week, I read, watch and listen to all the Detection Engineering content so you can consume it all in 10 minutes. Subscribe and get a weekly digest of the latest and greatest in threat detection engineering!

✍️ Musings from the life of Zack:

  • I’m in Paris this week after a quick personal trip to London. None of you told me that there are more people walking around in the West End than Manhattan!

  • I managed to get some great BJJ training in while in London, and tried cold plunging for the first time ever. Low key it’s amazing

  • This issue is vulnerability writeup forward. But, I’m happy for it, because I think people in blue team roles need to see and understand the inner workings of malicious, unintended code paths. IMHO it makes me a better security engineer

Primary Sponsor: Permiso Security

ITDR Playbook: Detect & Respond to Suspicious Authentication Patterns

Credential compromise now drives more than half of today’s breaches—and most teams still miss early warning signs. This Identity Threat Detection & Response Playbook breaks down the highest-value authentication anomalies and provides actionable detection and response steps your team can implement immediately. Strengthen identity defense where it matters most.

Download the Playbook


💎 Detection Engineering Gem 💎

SVG Filters - Clickjacking 2.0 by lyra

I wrote a blog about abusing Open Graph previews 7 years ago for phishing. The idea was that you could abuse how browsers render preview links to display one thing while redirecting to another. I’ve always tried to find a term or phrase to coin this style of attack. It’s not malware or phishing, but similar to IDN homograph attacks, it provides a confusing user experience for the victim. And within that confusing experience, you can socially engineer them to click into whatever malicious URL you want.

ClickFix became a huge hit for threat actors between last year and this year, and it abused this same concept. You are presented with instructions to copy and paste something into your terminal to download some piece of software or fix a bug. But by abusing how clipboard interactions work with a website, the user thinks they are copying and pasting a benign command, and they instead paste a malicious payload.

Lyra’s blog follows the same confusing user experience style, but this time, doing some fun things with SVG rendering. They got their original idea after Apple announced the Liquid Glass redesign, and wanted to recreate some of that experience in the web browser. After tinkering with some of the SVG Filter Effect primitives, they tried applying these effects over an iFrame, and whoops! It worked.

The reason this was so interesting to me is that my liquid glass effect uses the feColorMatrix and feDisplacementMap SVG filters - changing the colors of pixels, and moving them, respectively. And I could do that on a cross-origin document? - Lyra

The first demonstration was a PoC on layering these types of effects over an iframe holding a sensitive one-time password code. You'd be an attacker, load the OTP frame inside an iframe, then trick the user into pasting the code back into what they think is the legitimate site, but is actually an SVG element on top. They dubbed this style of attack ClickJacking.

This isn't even the most interesting part; it gets better! These <fe*> elements have some mathematical capabilities to help compute everything from masks to filters. Due to the nature of this attack, most of the logic has to occur inside the <fe*> elements, because you cannot extract pixel data from an SVG filter back into JavaScript or the DOM. So how do you create a multi-stage attack?

Well, why not make these elements functionally (not Turing) complete and create a limited-but-effective state machine inside the filters? That’s obvious, right, Zack? ←Lyra, probably, as they did this

Lyra made a logic-gate example to demonstrate this, but by applying a multi-stage filter mask to a victim iFrame, they successfully showed how they can perform this SVG ClickJacking attack within a state machine rendered solely from these <fe> elements. Here’s an ASCII art example of the QR code attack with exfiltration:

The cross-origin part worries me the most here, because they essentially figured out how to overlay and extract data during the attack without ever violating the same-origin policy.

They demonstrated this attack against Google Docs and were awarded a good sum of money for doing so. Video here:

https://infosec.exchange/@rebane2001/115265287713185877

I don't know how you'd detect this in the browser, but you could have some exfiltration-style detections to work with once the data leaves the machine. UX confusion strikes again!


🔬 State of the Art

Why the MITRE ATT&CK Framework Actually Works by John Vester

I read a lot of blog posts introducing MITRE ATT&CK to readers. I think it’s a great first topic for folks getting into the industry, because ATT&CK is such a staple for us. My biggest feedback on these blog posts is that they aren’t really offering anything new for readers. This isn’t a bad thing, since the content shouldn’t change too much, but Vester’s blog here is comparatively different from the others I have read.

The blog starts with the typical introductory content on MITRE ATT&CK, but in the "Real-world ATT&CK" section, Vester begins describing ATT&CK as a practitioner who has been doing this for years. They do this by looking at how ATT&CK looks when overlaid with detection rules inside Sumo Logic.

I appreciate this approach because it feels like Vester is a senior engineer, you are onboarding to a new company, and they are giving you the experienced perspective on the whole system. ATT&CK has plenty of faults, and much of the criticism it receives points at its real-world applicability. Luckily, Vester shows where it works really well and where it doesn't necessarily work. This type of balance is what makes ATT&CK useful; it's a tool rather than a full-fledged solution.


Understanding the Nuances of Detection by Danny Zendejas

Maybe I'm stuck on this idea of reading blogs as if I'm onboarding to a new company, but Zendejas's blog about detection nuances is a great follow-up to Vester's above.

We spend a lot of time jumping straight into rules and ATT&CK, but taking time to understand the logistics of detection engineering matters just as much. For example, Zendejas laid out the general architecture of a SIEM, and then introduced readers to the formats and standards dedicated to search languages and rules.

Understanding and navigating these formats effectively is a fundamental part of a Detection Engineer’s role. Being data agnostic should be the goal. - Zendejas

The rest of the blog contains some good content around alert precision and alerting. If you put on a proverbial “onboarding at a new job” hat, this is a great introduction for folks entering the field or seeking a fresh look at fundamental concepts.


Threat Hunting based on Tor Exit Nodes (+ KQLs queries) by Sergio Albea

The Onion Routing (Tor) network is one of those funny cases of intention versus use. The idea behind it is ethically amazing: it helps mask the source of a connection to a destination server, and it would be particularly useful for people like political dissidents in hostile countries. But, whenever there is anything good, criminals tend to follow and exploit the goodness. Except crypto, all criminals! Just kidding.

In this post, Albea provides some excellent hypotheses and use cases for threat hunters to find machines on a network connecting to the Tor network. The first case is around the use of Tor locally to connect to Tor domains. This, in my opinion, is benign behavior for the most part, but it can raise legal and ethical concerns for a company, so your acceptable use policies should address it.

The second case is rooted in a more likely intrusion scenario. Attackers have used Tor to mask their source IP addresses while credential stuffing login endpoints, to prevent attribution and likely legal action. Although this makes sense from a privacy perspective, it's terrible OPSEC in other ways. By design, the Tor network publishes its exit node IP address list because, without it, Tor clients wouldn't know how to route through the network. That makes it an excellent detection mechanism for finding abusive sign-in attempts from those routing their malicious traffic through Tor.
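The matching logic itself is simple once you have the published exit list. A sketch of the idea (the post's actual queries are in KQL; the IPs and log shape here are invented):

```javascript
// Known Tor exit node IPs, e.g. loaded from the published bulk exit list.
const torExitNodes = new Set(["198.51.100.23", "198.51.100.77"]);

// Hypothetical sign-in telemetry.
const signIns = [
  { user: "alice", sourceIp: "192.0.2.10", result: "success" },
  { user: "bob", sourceIp: "198.51.100.23", result: "failure" },
  { user: "bob", sourceIp: "198.51.100.77", result: "failure" },
];

// Flag any sign-in attempt sourced from a Tor exit node.
const suspicious = signIns.filter((event) => torExitNodes.has(event.sourceIp));
console.log(suspicious.length); // 2
```

In practice you'd refresh the exit list on a schedule, since exit nodes rotate constantly.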

They provide several KQL examples so you can follow along with their hunting queries.


How Amazon uses AI agents to anticipate and counter cyber threats by Daniel Weiss

This research piece from Amazon showcases their Autonomous Threat Analyst (ATA) environment. If you take AI out of the equation, it’s a neat setup that I haven’t really seen in other corporate environments. They created a separate rule-testing environment that mimics their production environment, which is a feat in itself.

Now to add the AI parts back: they have a multi-agent architecture where a blue-team agent creates rules, validates rule logic by querying their mimicked environment, and performs curation and deployment. The fun part here is their red-team agent. They ran a query to generate Python reverse shells for detection validation, and it generated over 30. They fed telemetry from these reverse shells into the mimicked environment and identified detection gaps to improve their ruleset.

The beauty of LLMs for detection isn't really about accuracy, but more about scale. What I worry about with this type of scale is how comfortable it makes us. Over thirty types of reverse shells seems like a great dataset, but was each one validated by an expert? Will LLMs generate obscure and distracting payloads just to complete their task? If we only care about coverage at scale, will these LLMs waste time on those payloads instead of what we actually see in the environment?

These are all questions for which I don’t have a good answer. But, it may not matter in the sense that if we keep driving token costs down, then scale becomes irrelevant, even if the types of attacks are obscure.


Secondary Sponsor: runZero

Join runZero’s Holiday Hackstravaganza!


Tune into runZero Hour, a monthly webcast examining new exposures & attack surface anomalies. Join us on Dec 17 for 2025’s wildest vulns, top research picks, & 2026 predictions. Plus, trivia and Hak5 gift cards!

Register Now


☣️ Threat Landscape

⚡ Emerging Threats Spotlight: React2Shell

So the big threat landscape news in the last week was the React2Shell vulnerability. The exploit is elegant and simple, but the way the exploit chain leverages React’s processing capabilities is quite complex. Whenever 10/10 CVSS CVEs like this come out, the immediate thought is oh shit, another Log4Shell. It’s even worse when the researchers name the vulnerability something similar to Log4Shell, and this was no exception.

For those unfamiliar with React, it’s one of the biggest open-source frontend frameworks for arguably the most used programming language in the world, JavaScript. You can build highly responsive, complex, and beautiful applications and hook them into any backend framework of your choice.

The specific vulnerability is server-side prototype pollution. Every object in JavaScript inherits from the base Object prototype. So, when you build object primitives in JavaScript, everything from a User to a Window can use Object's properties. Here's a basic example courtesy of Claude:
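The original example is a screenshot; reconstructed as text (an approximation of the idea, not Claude's exact code), it looks like this:

```javascript
// person defines only a name property...
const person = { name: "Zack" };

// ...so toString is not one of its own properties:
console.log(Object.hasOwn(person, "toString")); // false

// Yet the call still works, because the lookup walks up the
// prototype chain until it finds toString on Object.prototype:
console.log(person.toString()); // "[object Object]"
console.log(Object.getPrototypeOf(person) === Object.prototype); // true
```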

A person is an object with a single property, name. You can call person.toString(), but person doesn't define a toString method of its own. That's because all objects in JavaScript inherit from Object by default: the lookup keeps "calling" up the prototype chain until it reaches something that does define the property, such as toString!

This is where things get interesting for React2Shell. If you can control the input to a JavaScript function in React, such that you can supply or override functions, you can achieve arbitrary code execution. This is the premise behind React2Shell.

My colleagues at Datadog wrote about this in an excellent post detailing the vulnerability details:

The payload spans lines 4-15. The prototype pollution overrides then on line 5, and the actual malicious payload sits under _prefix on line 10. It's a shell execution command, so if a vulnerable React server processes this specific payload, the server will call out to a shell and write the output of id to /tmp/pwned.

React's vulnerable codepath processes HTTP POST requests with the `Next-Action` header and attempts to deserialize the payload as a React Server Component action. During deserialization, React splits references like $1:__proto__:then on colons and traverses the property chain, inadvertently accessing Object.prototype when it hits __proto__ and boom, the prototype is polluted!
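A simplified sketch of that traversal (illustrative only, not React's actual deserializer code): split the reference on colons, then walk each segment as a property key.

```javascript
// Resolve a serialized reference like "$1:__proto__:then" by splitting on
// colons and walking the property chain, the way the vulnerable path does.
function resolveParent(root, ref) {
  const segments = ref.replace(/^\$/, "").split(":").slice(1); // drop the "1" chunk id
  let current = root;
  for (const segment of segments.slice(0, -1)) {
    current = current[segment]; // "__proto__" jumps to Object.prototype
  }
  return { parent: current, key: segments.at(-1) };
}

const chunk = {}; // some deserialized payload object
const { parent, key } = resolveParent(chunk, "$1:__proto__:then");

console.log(parent === Object.prototype); // true -- we escaped the payload object
console.log(key); // "then"
// An attacker-controlled write like parent[key] = evilFn would now give
// EVERY object in the process a "then" property: prototype pollution.
```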

Why is this such a big deal?

React2Shell had the right ingredients to make it a serious vulnerability with an industry-wide response. These ingredients included a CVSS 10 score with potential remote code execution, a PoC, a website, a reference to a patch to reverse-engineer, and some hype on social media. Organizations rushed to find exposure and a patch, and some accidentally took down their global CDN network in the process. There were exploitation attempts in the wild (Greynoise has a great writeup on this). My $dayjob saw our environments get hit hard once more PoCs started to drop.

The hard part here, as Kevin Beaumont points out, is the environmental context when deploying this version of React Server Components with the Next.js router. A lot of prerequisites were required, not for the exploit itself, but for deploying the stack that contained the vulnerable code path. And if you didn't have any of these web servers exposed to the Internet, the urgency of patching diminished.

But was there as much impact as Log4Shell?

The answer is a resounding no, but with a big asterisk*. Nothing compares to Log4Shell, as it truly was a black swan event in vulnerability land. But this is the problem with emerging news around vulnerabilities. We make comparisons to make sense of the chaos, and try to use that to inform urgency. So although this turned out to be mostly fine from an impact point of view, I believe we correctly placed the right amount of urgency to do something.

It’s a net positive for an industry that has a reputation for crying wolf over the smallest things. It means we are getting smarter at identifying the prerequisites for a black swan event and being okay with it not happening, because we still protected ourselves.

Firm handshakes to all who responded within the last week!


🔗 Open Source

Bert-JanP/KustoHawk

PowerShell-based incident and triage platform for Azure environments. It uses the Microsoft Graph API to query for events related to Entra, Defender, and Microsoft XDR. It has pre-baked queries so you can run investigations out of the box.


xorhex/BinYars

Binary Ninja plugin to run YARA-x rules inside a binja project. This is useful for reverse engineering workflows where you want to orient your understanding of the binary based on threat intelligence baked into YARA rules.


msanft/CVE-2025-55182

Fully contained PoC environment for React2Shell. The README also has a great explanation of the vulnerability and exploit chain.


qazbnm456/awesome-cve-poc

Yet another awesome-* list, but similar to the CVE-2025-55182 repository I linked above, contains references for all kinds of PoC code and environments for testing. I’ve found these most useful for when I need to capture telemetry and write rules in an environment that doesn’t mind getting exploited ;).

DEW #139 - Detection Surface, Frontier Models are good at SecOps & THREE YEAR ANNIVERSARY!

Welcome to Issue #139 of Detection Engineering Weekly!

It’s crazy to think that it’s been three years of doing this newsletter.

Thank you all for making this a fantastic ride. Since I like stats and insights, here are some I pulled:

  • 15,000 subscribers as of Monday :)

  • 138 issues in total, so not a perfect 156 straight issues, and 20 weeks of downtime sounds nice to me

  • Two kids, one major interstate move, one grad degree and no new tattoos, though I should commemorate this somehow and get a new one :)

  • At least one subscriber in all 50 states in the US. California, Texas, NY, Virginia and Florida are the top 5 most-subbed states

  • Subscribers from 153 countries across every continent. Substack doesn’t track Antarctica :(. US, India, UK, Canada & Australia are the top 5 most-subbed countries

  • If you like reading Ross Haleliuk, there’s a 30% chance you are also reading me. We have the top audience overlap! Eric Capuano, Jake Creps, Chris Hughes and Francis Odum are also fantastic newsletters with high overlap

  • I started sponsored ad placements in September and have been booked every week since then, and 2026 is looking even crazier

This Week’s Sponsor: root

Why Detection Teams Need Minute-Level Remediation

When CVE-2025-65018 dropped last week (libpng heap buffer overflow, CVSS 7.1-9.8), the exposure window started ticking. Attackers armed with AI can weaponize CVEs within hours. Traditional remediation workflows take 2-4 weeks: triage meetings, engineering scramble, testing delays.

But here’s what detection engineers need to know: the exposure window is where attackers win. The Root team patched the critical CVE in 42 minutes across three Debian releases (Bullseye, Bookworm, Trixie), creating a fundamentally different detection posture than the same CVE unpatched for weeks. Detection strategies must account for minute-level remediation capabilities.

Learn what CVE-2025-65018 teaches us about matching attackers at AI speed and why week-level remediation cycles leave detection teams with massive blind spots.

Full Story


💎 Detection Engineering Gem 💎

Turning Visibility Into Defense: Connecting the Attack Surface to the Detection Surface by Jon Schipp

I’ve been shilling the term “Attack Surface” with the detection team here at work. I think it’s a reasonable mental model to use when you need to focus detection efforts on your inventory and telemetry sources. So, when I read this post by Schipp, I was pleased to see a similar framing of the Attack Surface problem :).

The security industry has a good idea of what an attack surface is. It even has a product category vertical dedicated to it, but the definition becomes vague when you differentiate between internal and external attack surfaces. According to Schipp, the definition should focus on the assets you need to protect, which, in general, I agree with. There is no rule without telemetry, and it’s nearly a full-time job for detection engineers to identify, track, and ship the right telemetry so we can write detections.

From Schipp’s blog

Schipp takes this a step further with the concept of “detection surface”. The adversarial behavior you want to detect can only be detected in a subset of the assets that you own. He lists a few reasons why:

  • Do you have the right technology selected to generate the right telemetry and alerts on top of the assets you own?

  • Are you prioritizing the correct detections to find adversarial behavior in the assets you find the most critical?

  • How do you find new gaps in coverage, and are you doing the exercise enough as your attack surface grows?

These questions are why the 100% MITRE coverage meme exists in our space. You may write rules that cover 100% of ATT&CK, but are they detecting the right behavior given your environment? I’d much rather look at a MITRE ATT&CK heatmap with deep coverage in two tactics, like Exfiltration and Lateral Movement, so I know the team is really focusing on specific behaviors to catch.

If you want to see a visceral physical reaction from me, throw a print-out of an ATT&CK heatmap that's all green at me. I'll probably run away screaming.


🔬 State of the Art

Evaluating AI Agents in Security Operations Part 1 and Part 2 by Eddie Conk

~ Note, I had Part 1 ready to go for this week’s issue and Conk & the cotool team posted Part 2. It’s important to read Part 1 so you can understand my analysis for their follow-up blog! ~

I loved reading this post because it shows how detection-as-code evolves beyond your ruleset into AI agents that handle everything from rule triage to investigations. Cotool researchers performed a benchmarking analysis of frontier models (GPT-5, Claude Sonnet & Gemini) against Splunk’s Botsv3 dataset. Botsv3 is a security dataset containing millions of logs from real-world attacks, along with a series of questions in a CTF-like format for analysts to practice investigations.

Benchmark exercises like this answer more than "are these models accurately performing security tasks?" LLMs are expensive: they require financial capital to use the frontier model APIs, and human capital to shape, maintain, and verify results. AI agent efficacy is detection and investigation efficacy. Understanding ahead of time which agents perform well within the constraints of your business can accelerate decision-making.

Here are some of the results pasted from the blog:

The test harness for accuracy involved taking the individual CTF questions from Botsv3 and mapping them to investigative queries. Conk and team had to remove some bias from these questions because they were built as a progressive CTF. Basically, this means that answering one CTF question unlocked the next sequential question, and that sequential question could bias the investigation.

The latest frontier models from OpenAI and Anthropic outperformed Gemini here, but I was surprised to see 65% as a leading score.

Model investigative speed now enters the equation, and Anthropic’s Opus-4.5 beat the brakes off of every other model, including Haiku and Sonnet. This is good for teams who want to tune something to be fast and accurate, which seems like a good tradeoff, and it’s off to the races, right? Well, remember, detection efficacy means cost as much as it means accuracy, and the frontrunner, Opus-4.5, costs a little over $5 per investigation versus GPT-5.1’s $1.67.

There are a few other interesting callouts in the blog around token usage, but these three axes were the most relevant for people who need to balance accuracy, speed, and cost.

The detection community needs data like this to make cost-efficacy tradeoffs for their teams. Hopefully, we can see more studies comparing models, cost, and prompt strategies, and even better, releasing bootstrapping mechanisms to run these tests on our own.


OpenSourceMalware - Community Threat Database

This is a freely available threat intelligence database for reporting and tracking malicious open-source package malware. This is especially relevant for emerging threats, such as the Shai-Hulud attack, and it’s crazy to see how many packages are submitted nearly every day. If you sign in, you can view additional analysis details of the malware submitted by researchers.

Unfortunately, there are no direct IOCs on the page, so it's hard to pivot to hashes if you want to download them from platforms like VirusTotal. It does link to sources like osv.dev, which sometimes contain hashes, but it'd be nice to see this platform host malware samples for download.


Revisiting the Idea of the “False Positive” by Joe Slowik

This oldie-but-goodie blog by Joe Slowik on the concept of false positives in security operations really drives home the underlying issues of the label. He first frames the idea of labels like true and false positives in terms of their origins in statistics. I wrote about these labels previously, and I tried to help readers understand that their value is directly proportional to the capacity of your security operations team.

Slowik goes in the other direction in terms of their value; instead of thinking about units of work, you should think about these labels in terms of the underlying behavior and hypothesis. Analysts talk about “true benigns” in this way. You alerted on the specific behavior you wanted to alert on, but you want to investigate further to determine whether it is malicious. This breaks the pure 1-shot application of a confusion matrix and adds more work for security analysts, since we need to question our underlying assumptions about a specific detection.

Recreated flow diagram from Slowik’s post

Challenging the hypothesis behind your detections aligns well with my discussion of security operations capacity versus efficacy. Here are a few questions I would ask you during this exercise:

  • Are you finding the right behaviors that could indicate maliciousness?

  • Are you okay with these behaviors generating true benign alerts, because the idea of a false negative with that behavior is detrimental?

  • Can the behavior you are looking for be enriched with environmental context, such as update cycles, peak traffic, or off-hours traffic?

The core of detection engineering is challenging assumptions. I hate the adage of “defenders have to be right every time, attackers have to be right once.” Finding a singular behavior to alert on across the attack chain gives us the advantage, so we really only need to be right once. So, as you build hypotheses and detection rules, you should balance what you want to see from a detection, even if it’s true benign behavior.


Intel to Detection Outcomes by Harrison Pomeroy

This is a nice introductory post to leveraging threat intelligence in detections.ai to generate detection outcomes. Full transparency: the platform has sponsored this newsletter, but it also has a community edition, so folks can sign up to benefit.

One of the hardest problems in cyber threat intelligence that I've dealt with for 15 years is proving tangible value. This is different from intangible value. The delivery of finished intelligence reports, RFIs, and investigative platform experiences can be considered intangible. You miss these things when you don't have them, but it's hard to measure the "why" behind the impact of a report or an RFI.

Detection engineering helps bridge this gap, specifically by enabling cyber threat intelligence teams to turn their research into tangible outcomes. This is what Pomeroy argues LLMs can do. You can feed an agent a cyber threat intelligence report, it can parse IOCs, TTPs, and log sources, and it can generate rules for you to try out and deploy to get up-to-date coverage of emerging threats.


Introducing LUMEN: Your EVTX Companion by Daniel Koifman

This is the release blogpost for Daniel Koifman's LUMEN project, located at https://lumen.koifsec.me/. It's a free tool for investigators and incident responders to load Windows evtx files for analysis. There are over 2,000 preloaded Sigma rules, and the entire analysis engine runs client-side. Once you load your logs, you can run a sweep of the Sigma ruleset, build a dashboard of fired rules, build an attack timeline, and extract IOCs. It also has a feature to connect your favorite LLM platform to the tool using an API key and leverage it for AI copilot capabilities.


☣️ Threat Landscape

Meet Rey, the Admin of ‘Scattered Lapsus$ Hunters’ by Brian Krebs

This is a classic Krebs doxing piece unveiling the identity of one of the main personas of The Com group, Scattered Lapsus$ Hunters. Rey was an administrator of one of the Com-aligned ransomware strains, ShinySp1d3r. It’s always crazy how he manages to pull the attribution thread to find these identities. An old message from Rey contained a joke screenshot of a scam email they received with a unique password. From there, he pivoted on the password to find more breach data tying Rey to a real person. Since Rey didn’t respond to him, Brian called his dad, and of course, Rey responded.


The Shai-Hulud 2.0 npm worm: analysis, and what you need to know by Christophe Tafani-Dereeper and Sebastian Obregoso

~ Note, I work at Datadog, and Christophe & Sebastian are my coworkers! ~

It's rare to see the term worm inside a headline these days. It's a label reserved for a unique security phenomenon, and the idea still holds firm, this time targeting npm (again). The Datadog Security Research team put a lot of time and energy into their analysis of the latest Shai-Hulud wave. Some interesting notes from this campaign include using previous victims to post new victim data, a wiper component, and a clever local GitHub Actions persistence mechanism.


Inside the GitHub Infrastructure Powering North Korea’s Contagious Interview npm Attacks by Kirill Boychenko

Boychenko and the Socket Research team published their latest work on TTP updates to North Korea’s “Contagious Interview” campaign. It’s an impressive operation, given the scale they try to employ, aiming to conduct as many malicious interviews as possible. In this campaign, they tracked hundreds of malicious packages, each with over 31,000 downloads. The factory-style setup of rolling new GitHub users with the malicious interview code, fake LinkedIn profiles, and rotating C2 servers is classic Contagious Interview.


Unmasking a new DPRK Front Company DredSoftLabs by Mees van Wickeren

To continue on the DPRK train, I found this post fascinating because it wasn’t about the malware associated with WageMole/Contagious Interview, but rather the techniques behind tracking infrastructure. Van Wickeren leveraged the reliable GitHub search engine to find malicious repositories linked to the campaign.

I was a little confused by their use of WageMole, only from a pure clustering nerd perspective. These look like Contagious Interview repositories, and the associated OSINT screenshots that call out some of them suggest that victims were taking malicious coding tests. WageMole, on the other hand, is a fake IT worker applying to companies.

At the end of the day, it doesn’t matter too much because they all overlap, but it’s another demonstration of how hard attribution is in this field.


🔗 Open Source

Koifman/LUMEN

Full LUMEN web-app from Daniel Koifman’s blog in State of the Art above. You can host your own LUMEN instance without ever leaving your localhost!


Vyntral/god-eye

Subdomain and attack surface enumeration tool that leverages local Ollama for AI analysis on top. It’ll connect to twenty different open-source scanning and directory services, like dnsdumpster, then push results into the local Ollama model. It looks intelligent enough to help with HTTP probing, CVE analysis, and sifting through JavaScript code for anything leaked or vulnerable to standard web attacks.


R3DRUN3/magnet

Magnet leverages the GitHub API and specific query strings to find potential secrets posted to public repositories. You can specify strings or use ones provided by magnet. In their PoC, R3DRUN3 managed to find two repositories with leaked tokens, then responsibly reached out to the owners with remediation steps, and they responded.


ChiefGyk3D/pfsense-siem-stack

SIEM-in-a-box for pfSense firewalls. It has an impressive architecture: an OpenSearch backend, Logstash parsers, and Grafana/InfluxDB for metrics. It looks like they’ll be extending the backend to other open-source SIEMs like Wazuh in the future.


RazviOverflow/advent-of-hacks

Awesome-* style list of hacking challenges for the holiday season. So far they have 8 listed, so if you wanted to spend some time this December to up your hacking and CTF knowledge you have your work cut out for you!


DEW #138 - Sigma's Detection Quality Pipeline, Anthropic finds AI-first APT & eBPF shenanigans

Welcome to Issue #138 of Detection Engineering Weekly!

✍️ Musings from the life of Zack:

  • I switched to the Brave browser, and I don’t think I’m ever looking back

  • My coworker suggested I go to a Tottenham Hotspur match while I’m in London. I’m a fan of one of the most insane fanbases in the NFL, where we jump through folding tables set aflame before games, and I feel that same energy from the Spurs YouTube shorts I’m watching during my research

  • I fractured my rib 5 weeks ago and I’m finally back (carefully) training. It feels good to move again!

This Week’s Sponsor: Sublime Security

Tomorrow: Intro to MQL, Threat Hunting, and Detection in Sublime

We invite Detection Engineering Weekly subscribers to join a technical webinar that will guide you through how Sublime Security detects advanced email threats. Learn how MQL (Sublime’s native detection language), threat-hunting workflows, Lists, Rules, Actions, and Automations all contribute to a flexible detection pipeline.

Additionally, discover how our Autonomous Security Analyst (ASA) accelerates investigations.

Register today!



💎 Detection Engineering Gem 💎

SigmaHQ Quality Assurance Pipeline by Nasreddine Bencherchali

Many people claim to use detection-as-code, but I rarely see these pipelines discussed as transparently as those from SigmaHQ. In this post, Nasreddine provides readers with a complete overview of how Sigma’s community ruleset repository manages community contributions. Documentation is essential here: the Sigma team ensures that every community rule adheres to a specification, so they all appear the same, even down to the filename. Here’s their Linux rule specification:

I love the attention to detail here. When you have a ruleset of thousands of rules, you need to ensure consistency at every step of the detection engineering process. These conventions may not matter when you are a single team managing dozens of rules, but when you are a five-person team managing thousands, they make the ruleset more attractive for others to use and also keep you sane.

The coolest part here, IMHO, is the combination of benign and malicious log validation tests. Each rule in each pull request undergoes several validators, followed by a good-log test and regression testing. The good-log test takes candidate rules and runs them across the evtx-baseline repository. If a rule generates an alert, then it must be a false positive, and the pipeline fails.
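The good-log gate is straightforward to reproduce in any detection-as-code pipeline. Here’s a minimal sketch in Python, with a toy `rule_matches` predicate standing in for a real Sigma engine (the actual SigmaHQ pipeline runs full Sigma rules over the evtx-baseline corpus):

```python
# Toy good-log gate: a candidate rule must produce ZERO hits on a corpus of
# known-benign events; any hit is by definition a false positive and fails CI.
# `rule_matches` is a stand-in for a real Sigma matching engine.

def rule_matches(rule: dict, event: dict) -> bool:
    """Return True if every selection field in the rule matches the event."""
    return all(event.get(field) == value for field, value in rule["selection"].items())

def good_log_gate(rule: dict, benign_events: list[dict]) -> list[dict]:
    """Return the benign events the rule fired on; an empty list means pass."""
    return [e for e in benign_events if rule_matches(rule, e)]

rule = {"selection": {"Image": r"C:\Windows\System32\vssadmin.exe",
                      "CommandLine": "vssadmin delete shadows /all"}}
benign = [
    {"Image": r"C:\Windows\System32\svchost.exe", "CommandLine": "svchost -k netsvcs"},
    {"Image": r"C:\Windows\System32\vssadmin.exe", "CommandLine": "vssadmin list shadows"},
]
assert good_log_gate(rule, benign) == []  # no hits on benign logs -> gate passes
```

The regression side is the mirror image: the contributed malicious sample must keep firing after every change, or the pipeline flags a false negative.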

Separately, the regression testing pipeline ensures that a change in the rules doesn’t introduce any regressions that could cause false negatives and forces submitters to contribute a sample of a malicious log to validate its usefulness. The maintainers may also request reference links to blogs, threat intelligence websites such as VirusTotal, and even malware sandboxes to ensure they understand the efficacy of the rule before merging.


🔬 State of the Art

Stopping kill signals against your eBPF programs by Neil Naveen

This post is an excellent study in the cat-and-mouse game of threat detection on Linux systems. For the most part, eBPF-style security agents are the de facto standard for telemetry inspection and detection & response. We’ve seen a lot of research in this newsletter on how threat actors on Windows spend time trying to disable EDRs to go unnoticed during their operations. But I had seen little, if any, research on protecting eBPF agents on Linux from similar attacks until I read Naveen’s research here.

When you want to terminate an eBPF agent, you need root privileges, as these agents run as Linux daemons. If an attacker did manage to get those permissions, they could send a kill signal to the process and then Bob’s your uncle. But what if you wanted to add extra steps to collect even more telemetry and catch a compromise? Naveen came up with two options:

  • Using eBPF to hook kill and never let anything kill it

  • Leveraging cryptographically signed nonces as an added layer of assurance to accept a kill signal, and to keep your sanity because you just locked yourself out from restarting the agent
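The second option can be sketched with stdlib crypto. This is a minimal illustration using an HMAC shared secret for brevity; Naveen’s design uses public/private key signatures (e.g. Ed25519), and the class and method names here are my own:

```python
# Sketch of nonce-based kill authorization: the agent only honors a kill
# request that carries a valid signature over a fresh, one-time nonce.
# HMAC stands in for the asymmetric signatures used in the real design.
import hashlib
import hmac
import secrets

class Agent:
    def __init__(self, key: bytes):
        self._key = key
        self._pending: set[bytes] = set()

    def challenge(self) -> bytes:
        """Issue a fresh one-time nonce the operator must sign."""
        nonce = secrets.token_bytes(32)
        self._pending.add(nonce)
        return nonce

    def authorize_kill(self, nonce: bytes, signature: bytes) -> bool:
        """Accept the kill only for a known nonce with a valid signature."""
        if nonce not in self._pending:
            return False          # replayed or forged nonce
        self._pending.discard(nonce)  # nonce is single-use
        expected = hmac.new(self._key, nonce, hashlib.sha256).digest()
        return hmac.compare_digest(expected, signature)

key = secrets.token_bytes(32)
agent = Agent(key)
nonce = agent.challenge()
sig = hmac.new(key, nonce, hashlib.sha256).digest()
assert agent.authorize_kill(nonce, sig)       # signed request accepted
assert not agent.authorize_kill(nonce, sig)   # replay rejected
```

The single-use nonce is what keeps you sane: a captured kill request can’t be replayed later, but an operator holding the key can always mint a fresh authorization to restart the agent.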

I’ve been doing Linux development, both offensively and defensively, for over a decade. This is probably the first time I’ve seen a clever application of cryptography to give a defense-in-depth approach to Linux detection & response. Here’s Naveen’s workflow comparing and contrasting a standard public-private key setup to a nonce-based signature kill methodology:

Example signature flow from Naveen’s post

Of course, actors can also do fun stuff like attacking the network stack directly, preventing the agent from reaching out to your security vendor’s domain for additional alerting.


Technique Research Reports: Capturing and Sharing Threat Research by Andrew VanVleet

This post serves as a follow-up to VanVleet’s research into detection data models (DDMs). DDMs are a form of documentation for detection engineers to help transcribe knowledge from an attack technique into actionable detection opportunities. But, there’s always more to a detection rule than the specific telemetry it’s trying to capture. This is where VanVleet introduces Technique Research Reports (TRRs).

The idea behind these reports is to capture the research knowledge surrounding the technique and rule. This is probably the most challenging part of our jobs: individual research methodologies vary, and you may be an expert in a specific attack surface or style of attack, but it doesn’t do your team any favors if you can’t help them learn how you arrived at a rule. It’s even worse if you leave the team and folks are left trying to understand the specifics of the attack, as well as the environmental context and the research you performed.

I do see a lot of similarity with MITRE ATT&CK’s recent v18 launch, specifically Detection Strategies. “Identify possible telemetry” is, in general, where Detection Strategies stop and TRRs begin. Log sources are environment-specific, and although you may have Sysmon, EDR, or syslog logs, they become nuanced based on your environment setup. For example, whether you run CrowdStrike or SentinelOne will change how you write your log source query.

They are incredibly comprehensive write-ups, or “lossless” research reports, as VanVleet calls them. For example, the TRR for DCShadow attacks is a fantastic resource for detection engineers to understand the intricacies of a Rogue DC attack. It can be a blog post in its own right. However, this is where the tradeoff between documentation quality and the velocity of maintaining a ruleset comes into play.

I love this research, but given how much valuable time he invested in it, it may not be conducive to productivity unless your leadership gives you the time to do it. I also worry about drift in techniques and telemetry sources, which can make some of these reports outdated. LLMs could help solve some of this, because they are generally very good at parsing and maintaining knowledge bases.


Weird Is Wonderful by Matthew Stevens

This is a short-but-sweet commentary on the role of detection engineers and how we need to “catch the weird.” It’s always nice for me to see fresh takes on concepts I’ve talked and read about for years. When folks try to break into this industry, they are sometimes bombarded with extremely technical concepts, complex environments, and a wide array of technologies they must learn before they feel useful. But, sometimes, it’s nice to hear from others who can distill complicated subjects into easy-to-understand concepts.

Catching weird, to me, is the idea that we all succeed at our jobs when we can distinguish normal from malicious. Weird may not be malicious, so having some intuition around things that look off can help solidify the baseline of normal in your environment versus something not normal. It’s a professional paranoia, of sorts :).


Be KVM, Do Fraud by Grumpy Goose Labs / wav3

This is a follow-up post to Grumpy Goose Labs’ research on hunting for KVM switches to detect fraudulent employees. It’s full of Kim Jong-un memes, but there are excellent technical details around detecting KVM switches in your environment. The author, wav3, uses CrowdStrike as their example and manages to surface a bunch of hunting indicators, ranging from KVM hardware identifiers to display settings and product strings, so you can see who among your workforce may be using these risky devices.


☣️ Threat Landscape

⚡ Emerging Threats Spotlight: Anthropic Disrupts First AI-Orchestrated Cyber Espionage Campaign

Disrupting the first reported AI-orchestrated cyber espionage campaign by Anthropic

Last week, the threat intelligence team at Anthropic disclosed the disruption of the “first-ever” AI-orchestrated espionage campaign by a Chinese-nexus threat actor. GTG-1002 is the designation for this threat cluster, which they attributed with high confidence to a Chinese state-sponsored operation. In this summary, I’ll break down the architecture and Anthropic’s analysis of the attack workflow, share my commentary on the parts of the report I like and dislike, offer my medium-high-confidence analysis of details missing from the report, and provide takeaways for detection engineers.

Attack Architecture

The most interesting aspect of this operation is that Anthropic had visibility into the orchestration layer of the threat activity, which leveraged a combination of Claude and several MCP servers. They claim the threat group automated 80-90% of their operations, an impressive feat when you consider that this is a nation-state operation. GTG-1002 managed to jailbreak Claude into thinking it was talking to a red teamer, allowing them to instruct Claude to work on their behalf.

If you had told me last year that a nation-state would trust an AI system to execute its campaigns against victims, I would have (rudely) laughed in your face. But it looks pretty slick:

Architecture diagram pulled from the Anthropic report.

For those unfamiliar, the Model Context Protocol (MCP) provides a standardized way to connect a human interface, such as chat or a code editor, to external tools like APIs. Out of the box, AI applications like Claude can only use a small set of tools, so writing your own connectors to bridge your chat interface to whatever toolset you want is a powerful feature of these platforms.
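Conceptually, an MCP server is a registry of named tools that the model invokes by name with structured arguments. Here’s a library-free Python illustration of that dispatch pattern; this is the idea only, not the real MCP SDK or its JSON-RPC wire protocol, and the tool name is hypothetical:

```python
# Conceptual tool dispatch: the model emits a tool name plus arguments,
# and the server routes the call to a registered function. Illustration
# of the pattern only -- not the actual MCP SDK or wire protocol.
TOOLS = {}

def tool(fn):
    """Register a function as an invokable tool by its name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def port_scan(host: str) -> list[int]:
    # Placeholder: a real connector would shell out to a scanner here.
    return [22, 443]

def handle_tool_call(name: str, arguments: dict):
    """Dispatch a model-issued tool call to the registered function."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)

assert handle_tool_call("port_scan", {"host": "10.0.0.5"}) == [22, 443]
```

Chaining registries like this per phase (recon, exploitation, lateral movement) is essentially the “suite of MCP servers” architecture the report describes.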

According to Anthropic, GTG-1002 built a suite of MCP servers that connected to several open-source toolsets dedicated to performing reconnaissance and fingerprinting, exploitation, post-compromise lateral movement and discovery, and eventually, collection and exfiltration. This is the impressive part of the operation: imagine an operator leveraging a chat interface to create a scalable infrastructure for red team operations, with the “backend” attack tool system handled by Claude and capable of scaling as needed.

The team claims that, with their visibility into Claude usage, the operators automated 80% to 90% of their attacks. The remaining 10-20% involved human verification at the “Report & Analysis” step, as shown in the diagram above.

Attack Flow

Anthropic grouped their attack operations into five phases, as shown above. The “robot” in each phase serves as the MCP server, directing specific tools to perform tasks along the ATT&CK killchain. The human icon next to the robots indicates a manual validation step by a human. These pit stops serve as a verification step to make sure that Claude is behaving correctly and not hallucinating.

In the report, the validation steps surfaced a myriad of hallucinations: Claude returned incorrect results, non-existent credentials, and wrong IP addresses. So, although the attack flow diagram shows a clean, step-by-step process for each phase, these operations were frequently rerun.

Pros & Cons

This report has received criticism from the security community since its publication. To me, it’s a landmark report, and whether it ends up famous or infamous, it has left its mark. I want to list both what I like and don’t like about it.

What I like:

  • There’s an excellent demonstration of the unique visibility the Anthropic team has over attack infrastructure. It’s certainly a threat intelligence source that we can derive useful insights from, and foundational model companies like Anthropic and OpenAI can provide that

  • There is a specific call out around responsible disclosure to victim organizations. It shows the good intentions of the security team at Anthropic, and I hope to see more of that in the future

  • They admit shortcomings around how the actors performed jailbreaking to get Claude Code to help them with their operations, as well as limitations in hallucinations

  • The transparent technical context around the threat model of AI Trust was helpful to see and understand their day-to-day challenges

What I didn’t like:

  • They did not provide any indicators of compromise. No IPs, domains, hashes, signatures, or payload examples. It’s hard for research teams to verify findings independently.

  • The attribution is vague, and it reads like Anthropic intentionally redacted proof around this activity. Indicators of compromise could help with this

  • It reads as if these attacks were cloud-based instead of on-premise. I couldn’t parse out if this was differentiated, but it doesn't matter when it comes to the severity of a Chinese-nexus APT cluster. The callout about attacks against databases, internal applications, and container registries makes me think this is a cloud environment

Overall, the report provides a net benefit to security teams on several fronts. The claim of an APT using modern AI architecture, coming from Anthropic rather than vendor marketing, is a step forward in our understanding of an evolving threat landscape. It builds trust in the security team at Anthropic, which runs one of the most-used foundational model platforms today. If we got this report from another vendor, we’d question the efficacy of their security program.

I think the feedback is valid regarding the value of threat intelligence, but I only see them improving from here.


🔗 Open Source

tired-labs/techniques

Technique Research Report dataset from VanVleet’s work above. It has extensive documentation of several attack techniques, and they fit the style-guide he talked about in his blog. It also includes a link to a frontend searchable library for those who don’t want to navigate the GitHub repository.


ricardojoserf/SAMDump

A SAM dump tool that performs the Volume Shadow Copy technique through internal Windows APIs instead of the command line. When you run the binary, it won’t generate the traditional Sysmon telemetry tied to vssadmin.exe, which arguably makes it harder to detect. It has a few other tricks, including calling NT APIs directly and avoiding GetProcAddress.


reconurge/flowsint

Open-source and graph-based OSINT tool that looks like a more modern take on Maltego. It has dozens of transforms, so you can get a good amount of functionality out of it to compete with Maltego. The differentiation here would be hosting something on your own, and if you require specific integrations, you’d have to build them yourself.


RootUp/git-fsmonitor

This is a fun initial access technique leveraging the fsmonitor capability of git clients. You edit the git configuration file and set the fsmonitor value to a shell script. When git is run, the shell script executes under the hood.
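The mechanism fits in two commands. A sketch of the configuration, with a hypothetical script path:

```shell
# Point core.fsmonitor at an arbitrary script (./payload.sh is an
# illustrative path). Git treats the value as its filesystem-monitor hook.
git config core.fsmonitor ./payload.sh

# Any subsequent status/add/commit in this repo now executes the hook:
git status
```

Hunting-wise, a `core.fsmonitor` value in `.git/config` pointing at an unexpected executable is a reasonable thing to flag.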


DEW #137 - AI Agents For Security By Security, Free Sigma training & JA4 for beginners

Welcome to Issue #137 of Detection Engineering Weekly!

✍️ Musings from the life of Zack in the last week:

hey you got a light? Nah Bud Light
  • I was in LA for a wedding and went to Venice Beach for the first time. It was awesome seeing pros at the skatepark, jamskaters, live music, and of course, this ^^ MF DOOM mural

  • Speaking of LA, there are Waymos EVERYWHERE

  • It started snowing here in New England, and we celebrated by running outside barefoot for as long as my family could bear it

This Week’s Sponsor: Nebulock

Trust Your Intuition. Vibe Hunt for Outcomes.

Good hunters feel suspicious activity before the alert ever hits. Vibe Hunting allows you to lean into that intuition and combine it with machine reasoning to hunt across data and telemetry without juggling tools. Nebulock’s threat hunting agents connect the dots, explain reasoning, and deliver contextual recommendations.

Hunting becomes less about process and more about bridging hypotheses with detection.

Start Vibe Hunting



💎 Detection Engineering Gem 💎

How Google Does It: Building AI agents for cybersecurity and defense by Anton Chuvakin and Dominik Swierad

I typically avoid including vendor blogs that stay at a high level on complicated topics like security and AI. But this one struck a great balance in describing how they approached internal Google security engineers who were skeptical of leveraging AI in their day-to-day work. I think this approach can be copied by any security organization looking to augment its security operations with LLMs, as it focuses on small, achievable wins grounded in risk reduction and reality versus “thinking big.”

Chuvakin and Swierad split this approach up into four steps:

  1. Hands-on learning builds trust: You wouldn’t want to purchase a SIEM without having your Detection & Response team understand how to use it, so why treat agentic systems any differently?

  2. Prioritize real problems, not just possibilities: Ground your agentic problems in a space where you are already familiar with the problems. They list two prime examples every D&R engineer could use to help with: analyzing large swaths of security data into insights, and quickly triaging malicious code to understand its function

  3. Measure, evaluate, and iterate to scale successfully: This section uses the dirty word/acronym “KPI” (cringes in business school). Instead, they gut-check success by asking two critical questions: “Did this meaningfully reduce risk?” and “What amount of repetitive tasks did this automate and free up capacity?”

  4. Get your foundations right: This is the most nuanced section that carries the most value for folks to steal. When you develop agentic systems, stick to simplicity on the particular task you need the agent to do. Agents aren’t security engineers, they are containerized experts in a small subset of tasks. Ensure they are proficient in these tasks, because what makes them powerful is how you connect them together.

The way I see this working for years to come is that we’ll have agentic workflows handle the “80%” work, such as repetitive tasks or analysis. The “20%” work that requires a ton of focus will be traditional expert work that we know and love. This split still requires us to have deep expertise in our field, but I worry about the value of learning from the more boring or tedious work.


🔬 State of the Art

Detection Stream Sigma Training Playground by Kostas Tsialemis

Tsialemis, a long-time contributor to the detection engineering research space and a multi-time featured author on this newsletter, just published a free Sigma training playground for detection engineers. His associated blog post goes over the platform in detail, but it’s like a CTF for writing rules. There are some cool features which include interactive challenges, responsive feedback to the challenges, and the ability to write your own challenges and contribute them to the community.

A leaderboard always motivates me, too. #8 as of 10 November!


Mistrusted Advisor: Evading Detection with Public S3 Buckets and Potential Data Exfiltration in AWS by Jason Kao

Trusted Advisor is a free service from AWS that helps scan customer infrastructure for misconfigured security and resilience resources. One resource it checks for misconfigurations is S3 buckets, which have led to massive security incidents and breaches like those at Capital One and Twitch. So, if you can find a 0-day bypass of a security system like this, it gives an attacker the ability to evade defenses in your cloud accounts. And it appears that is what Kao and the Fog Security team did.

The basic premise behind this attack is setting an insecure policy that would normally generate an alert from Trusted Advisor, while explicitly denying the three actions Trusted Advisor uses for the check.

The insecure policy statement spans lines 4-10, while the bypass occurs in a separate statement on lines 11-17. As it turns out, even AWS can get IAM wrong! The check effectively failed open here and reported nothing was wrong, when it should fail closed in cases where it can’t retrieve the telemetry to make an assessment.
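The shape of that bucket policy looks roughly like this, written as a Python dict. The deny-listed action names below are illustrative guesses on my part; the exact three actions are in Kao’s post:

```python
# Shape of the bypass: one statement leaves the bucket publicly readable
# (which should trip Trusted Advisor), while a second statement denies the
# read-only calls the checker itself depends on, blinding the check.
# The Deny action names are illustrative, not the ones from the post.
bypass_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Insecure: anyone can read objects -> Trusted Advisor should alert
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
        {   # Bypass: deny the checker's own telemetry calls (hypothetical names)
            "Sid": "BlindTheChecker",
            "Effect": "Deny",
            "Principal": "*",
            "Action": ["s3:GetBucketAcl", "s3:GetBucketPolicy",
                       "s3:GetBucketPolicyStatus"],
            "Resource": "arn:aws:s3:::example-bucket",
        },
    ],
}
```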

The team submitted the security disclosure to AWS, and they fixed it after two tries. It also looks like Fog Security wasn’t happy with how AWS publicly disclosed the issue, as the disclosure contained an inaccuracy referencing a non-existent action, which the hyperscaler later fixed.


All you need to know about JA3 & JA4 Fingerprints (and how to collect them) by Gabriel Alves

This piece is an easy-to-understand introduction to the powerful TLS fingerprinting algorithms, JA3 & JA4. With TLS everywhere, the underlying Application Layer traffic has become much harder to analyze for potential security indicators. You could set up TLS termination, but there’s a large cost associated with building that infrastructure, and decrypting and inspecting traffic also leads to compliance issues.

The JA* algorithms solve this by building fingerprints of the unique characteristics of TLS handshakes. Virtually every implementation of TLS in code has its own quirks and intricacies that make it unique. When you add more infrastructure on top of that, it can be a powerful tool to cluster traffic in ways to identify malware families, hosting infrastructure or bots.

Alves provides readers with some great visuals to understand these unique fingerprints and utilizes the most powerful security tool in existence, Wireshark, to do so.
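To make the idea concrete, here’s a rough sketch of how a JA3-style fingerprint is assembled: five ClientHello fields joined in a fixed order and hashed with MD5 (the field values below are illustrative; a real implementation parses them out of the raw handshake, and JA4 uses a different, more structured format):

```python
# JA3-style fingerprint: MD5 over
#   TLSVersion,Ciphers,Extensions,EllipticCurves,EllipticCurvePointFormats
# with values dash-joined and GREASE values stripped first.
import hashlib

GREASE = {0x0a0a, 0x1a1a, 0x2a2a, 0x3a3a, 0x4a4a, 0x5a5a, 0x6a6a, 0x7a7a,
          0x8a8a, 0x9a9a, 0xaaaa, 0xbaba, 0xcaca, 0xdada, 0xeaea, 0xfafa}

def ja3(version: int, ciphers, extensions, curves, point_formats) -> str:
    """Build the comma/dash-joined JA3 string and return its MD5 hex digest."""
    fields = [
        str(version),
        "-".join(str(c) for c in ciphers if c not in GREASE),
        "-".join(str(e) for e in extensions if e not in GREASE),
        "-".join(str(c) for c in curves if c not in GREASE),
        "-".join(str(p) for p in point_formats),
    ]
    return hashlib.md5(",".join(fields).encode()).hexdigest()

fp = ja3(771, [4865, 4866, 4867], [0, 11, 10], [29, 23, 24], [0])
assert len(fp) == 32  # a stable 32-char hex digest per TLS stack
```

Because each TLS implementation fills those fields in its own quirky way, the digest stays stable per client stack, which is exactly what makes it useful for clustering malware families and bots.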


Agentic Detection Creation: From Sigma to Splunk Rules (or any platform) by Burak Karaduman

I’m seeing more blog posts leveraging agentic workflow platforms to build detection content, and I’m all for it. At this point in our journey in detection engineering, I don’t see why you wouldn’t have agentic rule writing to assist you. Here’s why:

  • MITRE ATT&CK serves as a rich knowledge base of tradecraft references that we all fundamentally agree is the standard

  • Telemetry sources are well documented, and the startup cost of booting up an environment for testing is decreasing more and more

  • Threat intelligence companies and blogs help piece together attack chains that you can generalize

  • Sigma serves as a universal language that forces rule content structure and documentation, and has a rich library of converters to your SIEM of choice

  • Detection as code pipelines serve as a quality gate for human review and for testing

  • SIEM APIs have capabilities to ingest a candidate rule and make sure it’s valid in its native language

Karaduman’s approach here follows the pattern I listed above, and it’s functionally sound. It follows a lot of the fundamentals of the detection engineering lifecycle. The agents take ideation as an input, and continuously research, design, and validate candidate rules. Once the Sigma rule is created, Karaduman leverages sigconverter.io to translate the rule into SPL and has a separate SPL validation agent to make sure it can run in production.
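The conversion step is worth making concrete. Here’s a toy Sigma-to-SPL translation, with the rule as a Python dict instead of YAML; real pipelines use sigconverter.io or pySigma backends, so this only shows the shape of the transformation:

```python
# Toy Sigma -> SPL translation: AND the selection fields together into a
# Splunk search. Real converters handle condition logic, modifiers, field
# mappings, and escaping -- this sketch shows only the core idea.
def sigma_to_spl(rule: dict) -> str:
    selection = rule["detection"]["selection"]
    clauses = " ".join(f'{field}="{value}"' for field, value in selection.items())
    return f'index={rule.get("index", "main")} {clauses}'

rule = {
    "title": "Shadow Copy Deletion",
    "index": "windows",
    "detection": {"selection": {"Image": "vssadmin.exe",
                                "CommandLine": "delete shadows"}},
}
print(sigma_to_spl(rule))
# index=windows Image="vssadmin.exe" CommandLine="delete shadows"
```

A validation agent then only has to answer one question: does this string parse and return sane results against the production index?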

It’s a clever setup with several “smaller” agents performing tasks, which looks to be the optimal setup for this agent-to-agent workflow. I’m impressed at the simplicity of their architecture, and they were kind enough to include the fully visualized n8n workflow for readers to experiment with.

Can you guess what the most crucial step is here? The red box of course! It compiles every piece of documentation in the rule, validates it against Claude’s Sonnet 4.5 model, generates a report and messages the hypothetical detection engineer in email and on Teams.


☣️ Threat Landscape

GTIG AI Threat Tracker: Advances in Threat Actor Usage of AI Tools by Google Threat Intelligence Group

Unlike the cyberslop post from last week, where researchers at MIT made some bold claims on AI usage by ransomware operators, Google’s intelligence group brings the receipts on threat actor usage of LLM tools during operations.

I quite like the coining of “just-in-time” malware leveraged by two families they track as PROMPTFLUX and PROMPTSTEAL. These both generate malicious code on demand, and it looks like a multi-agent step that creates the code and obfuscates it during malware execution.


U.S. Nationals Indicted for BlackCat Ransomware Attacks on Healthcare Organizations by Steve Alder

Two American security professionals were indicted for allegedly working as initial access brokers for BlackCat ransomware. This is a wild story: they both worked for a threat intelligence company named DigitalMint, conducting RANSOMWARE NEGOTIATIONS on behalf of victims. Talk about insider threat, right?

In a classic case of insider threat motives, the main conspirator was in debt and went into business with BlackCat to help relieve that debt. This is a common tactic employed by spy agencies, so, logically, it would also work for criminal gangs.


Ex-L3Harris Cyber Boss Pleads Guilty to Selling Trade Secrets to Russian Firm by Kim Zetter

Is it insider threat week? It feels like insider threat week. Zetter reports on a man who was arrested and pleaded guilty to selling trade secrets to an “unnamed Russian software broker”. The accused worked for L3Harris Trenchant, a U.S.-based developer of zero-day and exploitation tools, and earned over seven figures in the process.


Interview with the Chollima V by Mauro Eldritch, Ulises, and Sofia Grimaldo

This series by the Bitso Quetzal team highlights their research (and shenanigans) with live interviewing DPRK IT Workers. The interesting part of this interview, and potentially a change in WageMole's TTPs, is that they are interviewing and recruiting collaborators to conduct interviews on behalf of WageMole. There were early reports of this happening, but Grimaldo, Ulises, and Eldritch brought receipts in the form of chat logs, Zoom screenshots, and LinkedIn profiles.


LANDFALL: New Commercial-Grade Android Spyware in Exploit Chain Targeting Samsung Devices by Unit 42

LANDFALL is a Samsung Android-based spyware family discovered by Unit 42 researchers. They found this family while hunting for exploit chains related to the DNG processing exploit that Apple disclosed earlier this year. DNG is a file format that both Android and iOS can process, and it’s within this processing logic that the vulnerability and subsequent exploit chain exist.

It’s pretty neat how the Unit 42 team came across this malicious file: they were hunting for DNGs to replicate the iOS exploit and found one that had a Zip file appended to it, but was exploiting Samsung’s recently patched vulnerability from earlier this year. The team pulled apart the malicious DNG, found two .so files and mapped out the command and control network associated with it.


🔗 Open Source

OSINTI4L/Paper-Pusher

A Bash script for sending spam to WiFi-connected printers over LAN.

😭😭😭


karlvbiron/MAD-CAT

MAD-CAT is a chaos engineering tool that implements data wiping and corruption attacks against databases to simulate database failures and data wiping-style attacks for detection engineers. It supports six database technologies: MongoDB, Elasticsearch, Cassandra, Redis, CouchDB, and Apache Hadoop.


FoxIO-LLC/ja4

JA4 TLS fingerprinting library referenced in Alves’ post above. I’ve linked JA4 before, but it’s a seriously effective tool to add to detection arsenals, especially if you can instrument it in publicly accessible servers.


EvilBytecode/NoMoreStealers

A Windows minifilter driver that blocks filesystem access to specific file paths to prevent infostealers. The hardcoded paths it protects include browser secret data, cryptocurrency wallets and secrets, and chat applications.


Idov31/EtwLeakKernel

Event Tracing for Windows (ETW) consumer that requests stack traces to leak kernel addresses. This can help with exploit development if you need to exploit a kernel vulnerability and require base addresses, potentially defeating ASLR.

Every week, I read, watch and listen to all the Detection Engineering content so you can consume it all in 10 minutes. Subscribe and get a weekly digest of the latest and greatest in threat detection engineering!

DEW #136 - ATT&CK V18 deep dive, Cyberslop @ MIT & Aisuru repurposes to residential proxies

Welcome to Issue #136 of Detection Engineering Weekly!

✍️ Musings from the life of Zack in the last week:

  • I’m trying something different here and performing a deeper analysis on content where I think it matters for y’all. It won’t happen often, but whether it’s a Gem or a piece of Threat Landscape news, I want to give you all my take beyond what you normally see, especially if it’s a story I’m particularly passionate about!

  • I just hit my 4-year anniversary at Datadog, so time is flying by. My 3-year anniversary for the newsletter is in a few weeks and it feels wild thinking about doing this for 36 months.

  • I stole every adult-sized candy bar from my kids at Halloween, and I didn’t think twice about it.

This Week’s Sponsor: Hack The Box

Your Tools Don’t Defend. Your People Do.

Threats evolve faster than your tech stack. Hack The Box keeps your teams ahead of attackers with hands-on, continuous upskilling that powers real Continuous Threat Exposure Management (CTEM).

Equip your people with the skills to validate, prioritize, and respond effectively and build the true resilience that keeps your organization ready for whatever comes next.

Get Your Team Started


💎 Detection Engineering Gem 💎

ATT&CK v18: The Detection Overhaul You’ve Been Waiting For by Amy L. Robertson

New ATT&CK version drops always deserve a feature in this newsletter, and I’m very pleased to see the changes in v18!

There are several techniques and procedures added to the ATT&CK arsenal, but I’d like to focus my analysis on the usefulness behind Detection Strategies for detection ideation and tuning.

Detection Strategies

The new version shipped a large change in how ATT&CK approaches detections via Detection Strategies. I wrote about this in Issue 121, but the common gap with ATT&CK is linking a technique or procedure to detection guidance. Through the use of STIX Domain Objects, defenders can now leverage these detection opportunities via machine-readable data, rather than relying on freeform text. Here’s an example leveraging Scheduled Task/Job Abuse:

I used Linux as an example here. You have three data components associated with finding scheduled job attacks. Each of these components has a log source name and channel. So, for line 6 (DC0061), you can use auditd syscall monitoring and look for writes and renames of cron files. The mutable elements part helps with detection tuning, and this can be everything from frequency analysis to environmental context, such as unusual users scheduling jobs.
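As a sketch of what that auditd-based data component could look like downstream, here's a hypothetical post-processing filter over parsed auditd records. The field names ("syscall", "path") and the path list are illustrative assumptions, not a real auditd schema:

```python
# Hypothetical post-processing of parsed auditd records; field names
# and cron paths are illustrative, not an actual auditd event schema.
CRON_PATHS = ("/etc/crontab", "/etc/cron.d/", "/var/spool/cron/")
WRITE_SYSCALLS = {"open", "openat", "write", "rename", "renameat"}

def is_cron_tamper(event: dict) -> bool:
    """Flag writes or renames touching cron configuration paths."""
    path = event.get("path", "")
    return (event.get("syscall") in WRITE_SYSCALLS
            and path.startswith(CRON_PATHS))
```

The mutable-elements guidance layers on top of a filter like this: frequency analysis and "which user scheduled this" context turn raw matches into tuned alerts.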

Enterprise Updates & ESXi Detection Strategy Example

The team added several new techniques, and there seems to be a big push on cloud-native technologies. For example, adding the Container CLI or API (in the case of Kubernetes) is a great step toward capturing how threat actors are moving away from on-prem technologies but using similar techniques to move through the kill chain.

Local Storage Discovery, for example, highlights typical discovery tradecraft for finding interesting volumes on a victim machine. But there’s nuance here with whether you are on a cloud server, Windows host, or a Hypervisor. Looking at the Detection Strategy DET0188, a detection engineer can switch between Analytics platforms and perform their own testing based on the data components and channels. Now let’s work through tuning, and I’ll pick on this Sigma rule, ESXi Storage Information Discovery Via ESXCLI.

Nas’ and Maurugeon’s rule successfully implements the Data Component → Name → Channel analytic, but the rule may be broad (high recall) and require tuning. If you study the Mutable Elements table, you can scope this rule down by restricting alerts based on ssh_source_ip being from outside your perimeter, or by tuning the esxcli_command_scope. Let’s tune via the command scope.

Reading the developer portal for esxcli, and with a bit of help from Claude, the command scope namespace looks like the following:

Lines 40-42 could be potential tuning updates to the Sigma rule to make it more precise. This would obviously need some testing, but moving from Analytic → Sigma Rule → ESXi command line documentation (thanks, Claude) to tuning was much easier.
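To make the command-scope tuning concrete, here's a hedged sketch of what a scoped-down check might look like. The namespace list is my own assumption pulled from the esxcli documentation, not the actual Sigma rule logic:

```python
# Illustrative tuning sketch: instead of alerting on any "esxcli storage"
# invocation, scope down to the subcommands most useful for discovery.
# The scope list below is an assumption, not taken from the Sigma rule.
SUSPICIOUS_SCOPES = (
    "storage filesystem list",
    "storage vmfs extent list",
    "storage core device list",
)

def should_alert(command_line: str) -> bool:
    """Alert only when an esxcli invocation hits a scoped namespace."""
    cmd = " ".join(command_line.split())  # normalize whitespace
    return cmd.startswith("esxcli ") and any(
        scope in cmd for scope in SUSPICIOUS_SCOPES
    )
```

The same shape works for the ssh_source_ip element: swap the scope tuple for a perimeter allowlist and the precision knob turns the other way.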

For a deep dive into this type of detection research, check out Nathan Burns’ blog on the topic, which I posted in Issue 100 as a Gem.

Why is this important?

In my example, I walked through a tuning opportunity for ESXi. I’m not an ESXi expert, but I have good knowledge of Linux threat detection and MITRE ATT&CK. The Detection Strategy quickly oriented me to the core detection opportunities and provided tuning ideas for moving from broad to precise esxcli commands to alert on. Additionally, it took things a step further with SSH source IP environment hardening.

The ATT&CK knowledge base can now serve more than just a reference table for techniques. You can dive into each technique, get relevant examples for threat actors, and it points you to strategies with specific data sources and channels to alert on. It cuts down the time I would spend on Googling or setting up environments to smash my head on the keyboard until I get the right logging configuration to generate the alert telemetry.


☣️ Threat Landscape

CyberSlop — meet the new threat actor, MIT and Safe Security by Kevin Beaumont

This new series by Kevin Beaumont revolves around a new term he coined, “CyberSlop.” The definition I’ve gleaned from his writing is taking traditional FUD marketing techniques in cybersecurity and leveraging trusted institutions (like MIT in this story) to make AI-threat claims even more credible, especially through research papers and blog posts that lack evidence.

The story in this first edition revolves around a bold claim by MIT researchers in a paper: that 80% of ransomware gangs use AI in their operations. After Beaumont dug into the paper and publicly called it out, it disappeared from the MIT website. Two of the authors are from Safe Security, a cybersecurity startup. As it turns out, the principal MIT researcher is on their board, with no disclosure of this conflict of interest in the paper.


Aisuru Botnet Shifts from DDoS to Residential Proxies by Brian Krebs

DDoS-for-hire botnets don’t pay enough for the criminals who run them. At the end of the day, DDoS is an inconvenience that sites suffer, and the Googles and Cloudflares of the world have gotten so good at soaking traffic that these attacks feel even more irrelevant than before.

Residential proxies, on the other hand, are where money CAN be made. And this piece on the Aisuru botnet, a DDoS-for-hire botnet turned into residential proxy provider, is a good breakdown of these intricacies. In this post, Krebs exposes a web of proxy services, parent companies and the grayhat style recruitment they have of unsuspecting devices to build their new-age botnet.


Ukrainian National Extradited from Ireland in Connection with Conti Ransomware by U.S. Department of Justice

The U.S. DoJ extradited a suspected Conti member residing in Ireland. Lytvynenko was first arrested in 2023 at the request of the FBI, and has been facing extradition proceedings since then. There are some wild numbers cited in this report, which highlight the prolific nature of Conti. Lytvynenko is accused of extorting $150 million in ransomware payments from Conti victims alone.


SesameOp: Novel backdoor uses OpenAI Assistants API for command and control by Microsoft Incident Response

This is the first threat report I’ve read where a threat group leverages OpenAI as a C2 channel. SesameOp is the name Microsoft Incident Response gave a new malware family that uses OpenAI’s now-deprecated Assistants API, which is slated for removal next year. The malicious DLL queries the Assistants API vector store to find infected hostnames and then leverages the Assistant’s description field to execute a command.

The vector store part here is interesting because I imagine it makes detecting abuse much more challenging for security teams at OpenAI. You can typically scan platforms for victim or malicious domains, but do you now need to scan every vector store for the same thing?


A new breed of analyzers by Daniel Stenberg

Stenberg, the creator and head maintainer of cURL, triages and patches numerous security vulnerability submissions. In the before AI times, these submissions were (mostly) done by humans with some level of automated slop from fuzzers. Since then, a large number of LLM-generated slop submissions have burdened the cURL team.

It was cool seeing this update almost as a Part 2 of the post I linked. AI-backed vulnerability discovery and submission platforms are getting much better, especially those that have venture capital behind them, rather than a “researcher” running some LLM locally to find security weaknesses.


🔗 Open Source

kas-sec/version.dll-sideloading

Neat proof of concept abusing OneDrive.exe and DLL sideloading to gain execution in the OneDrive process. Once it gains execution, the malware registers exception hooks via Vectored Exception Handling (VEH) to bypass EDR detection. The registered exception handler hopefully avoids being hooked by the EDR process so you can evade detection.


center-for-threat-informed-defense/attack-workbench-frontend

ATT&CK’s frontend application that serves as a self-hosted knowledge base for detection engineers and the ATT&CK library. With the latest v18 release, you’ll see additional resources leveraging Detection Strategies.


loosehose/SilentButDeadly

EDR killer technique that leverages the Windows Filtering Platform to prevent EDR agents from phoning home to cloud infrastructure. Super useful for preventing alerts from being sent to the cloud, but could still be noisy as an EDR evasion technique.


zopefoundation/RestrictedPython

Sandbox-like Python runtime execution environment for running untrusted code. It’s not a sandbox like a virtual machine, but it’s a subset of the Python language that restricts risky primitives in Python that can be used maliciously.
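To illustrate the restricted-subset idea (this is a toy stdlib sketch, not RestrictedPython's actual implementation, which is far more thorough), you can statically reject snippets that use primitives you consider risky:

```python
import ast

# Toy illustration of the language-subset idea: reject untrusted code
# that uses imports or attribute access (a common sandbox-escape vector,
# e.g. ().__class__ chains). RestrictedPython does much more than this.
FORBIDDEN_NODES = (ast.Import, ast.ImportFrom, ast.Attribute)

def check_untrusted(source: str) -> bool:
    """Return True if the snippet avoids imports and attribute access."""
    tree = ast.parse(source)
    return not any(isinstance(node, FORBIDDEN_NODES)
                   for node in ast.walk(tree))
```

Blanket-banning attribute access is far stricter than what RestrictedPython allows; it's shown here only to make the "subset of the language" concept tangible.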


malwarekid/OnlyShell

Go-based reverse shell handler that integrates several types of reverse shells into one interface. So if you have a Bash reverse shell and a PowerShell reverse shell reaching out, it will automatically detect the environment and shell type so you can select between them via its TUI-like interface.


DEW #135 - Chaos Detection Engineering, Connecting Policy to IR playbooks & Spooky AWS Policies

Welcome to Issue #135 of Detection Engineering Weekly!

✍️ Musings from the life of Zack in the last week

  • I’m helping host the second edition of Datadog Detect tomorrow! We have an excellent lineup with folks I’ve featured several times on this newsletter. It’s fully free, fully online, and also available on-demand. We have a small capture the flag afterward to win some socks.

    • 👉 Register Here 👈 and don’t forget to meme out in the webinar chat like last time.

    • We had close to 1000 chatters so it felt like a Twitch stream

  • I’m all booked for London and got some excellent pub and restaurant recommendations. Please keep them coming :D

This Week’s Sponsor: detections.ai

Community Inspired. AI Enhanced. Better Detections.

detections.ai uses AI to transform threat intel into detection rules across any security platform. Join 9,000 detection engineers leveraging AI-powered detection engineering to stay ahead of attackers.

Our AI analyzes the latest CTI to create rules in SIGMA, SPL, YARA-L, KQL, and YARA and translates them into more languages. Community rules for PowerShell execution, lateral movement, service installations, and hundreds of threat scenarios.

Join @ detections.ai

Use invite code “DEW” to get started


💎 Detection Engineering Gem 💎

How to use chaos engineering in incident response by Kevin Low

Hey look, security steals SRE concepts again and it’s a beautiful thing! Jokes aside, this is a concept I’ve believed in heavily since I started working professionally with SRE organizations 10+ years ago. Chaos engineering is a practice that intentionally injects faults into a production system to test resiliency and build confidence in the system’s ability to recover. Basically, it challenges you to break something to see how fast you can react to and recover from an outage, almost like intentionally popping a tire on your car to see how well you react and how quickly you can change it.

This seems applicable to security, no? That’s where Low’s post comes in to test the idea. First, Low makes a gentle introduction to the concept and then presents a test architecture and a threat model in an AWS environment to experiment with.

Figure 2: Architecture after GuardDuty detects unexpected activity and the security team isolates the EC2 instance

In this scenario, a microservice experiences some unexpected security activity and GuardDuty generates an alert. If you shut down an EC2 instance, what exactly happens? Enter Chaos Engineering!

There are five steps in a Chaos Engineering experiment: defining the steady state, generating a hypothesis, running the experiment, verifying the effects, and improving the system. This has a nice carryover for testing detections and their infrastructure in production states.

  • Steady State: What is our baseline for MTTR and MTTD? What is the general uptime of our log sources? What configurations are in place to prevent attack paths?

  • Hypothesis: When a workstation queries a known malicious domain, our SIEM will detect it within 15 minutes, notify the security team within 2 minutes, and the machine will be contained 1 minute after that

  • Running the experiment: Load a benign domain into your threat intelligence lookup tables, remotely connect to a machine, and perform a DNS lookup for the benign domain.

  • Verifying the effects: Did we generate an alert in the SIEM? Was there a Slack notification to contain the host? Did it fall within our hypothesis’ parameters?

  • Improving the system: The Slack alert did not defang the domain, the containment tooling only blocked the domain and not the resolution IP
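The "verify the effects" step lends itself to a small harness. This is a minimal sketch with illustrative field names and thresholds mirroring the DNS-lookup hypothesis above, not tooling from Low's post:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """Hypothesis thresholds from the DNS-lookup scenario (illustrative)."""
    detect_within_s: float = 15 * 60   # SIEM alert within 15 minutes
    notify_within_s: float = 2 * 60    # team notified within 2 minutes
    contain_within_s: float = 1 * 60   # host contained 1 minute after

def verify(observed: dict, h: Hypothesis = Hypothesis()) -> list[str]:
    """Return the list of hypothesis checks the experiment run failed."""
    failures = []
    if observed["detect_s"] > h.detect_within_s:
        failures.append("detection exceeded SLO")
    if observed["notify_s"] > h.notify_within_s:
        failures.append("notification exceeded SLO")
    if observed["contain_s"] > h.contain_within_s:
        failures.append("containment exceeded SLO")
    return failures
```

An empty list means the steady-state hypothesis held; anything else feeds straight into the "improving the system" step.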

I love this approach, and I’m unsure whether any companies are considering this type of fault or “adversary injection”-style testing. Breach and Attack Simulation products focus on coverage of rules, but I haven’t seen anyone think about this from a Detection & Response validation angle.


🔬 State of the Art

A Retrospective Survey of 2024/2025 Open Source Supply Chain Compromises by Filippo Valsorda

In this post, Valsorda performs a retrospective survey analysis of all open-source supply chain attacks from 2024 to 2025. At Datadog, we collect 100s to 1000s of these types of malicious packages to help defend our environment, but a supply chain compromise is more than just a malicious package. These last 3 months alone have had compromises that made mainstream news, such as Shai-Hulud and s1ngularity.

Valsorda grouped the root causes of 17 major attacks to help readers understand initial access and subsequent attack paths. Funny enough, phishing was the number one root cause of these package takeovers, and number two was an attack path I haven’t been able to put into words until now: control handoff. The basic premise behind control handoffs is that it’s part social engineering and, IMHO, part insider threat. For example, the infamous XZ Utils attack originated when maintenance was handed over to a new developer, who gradually added a backdoor to the library over time. The polyfill[.]io attack involved purchasing an expired domain, after which the new owner served malicious JavaScript to victims.

It’s a fascinating read as a survey blog, and it highlights how fragile the open-source software ecosystem is. It’s unfair how large companies and organizations demand feature and security work from some of these projects without pay, and, understandably, burnout from those demands becomes a real security issue once attackers exploit it.


Re-Writing the Playbook — A detection-driven approach to Incident Response by Regan Carey

Merging governance, risk and compliance documents and policies across an organization is difficult. I think the most salient example of incorporating a policy into practice is mandatory 2FA. You write a policy that mandates 2FA, perhaps based on a SOC2 or ISO27001 audit, and your IT team buys physical YubiKeys and configures Google Workspace to ensure that all authentication requires a USB-C dongle.

This gets harder and more nebulous in the threat detection space. 2FA is clean and measurable; you can pull reports of the number of employees enrolled in 2FA and drive it to completion. But, how do you drive a Ransomware Response Playbook into completion? Is it that you have a playbook? Is it that you have EDR tooling, plus a playbook? Or is it that you have a playbook, you have EDR tooling, and you have Bob from IT who presses a button when an EDR fires?

But what about individual rules that respond to ransomware? Are they firing accurately? Is the SPECIFIC response playbook inside the rule up to date? When do you know it's out of compliance with the overall playbook? I think the answer is: you don’t and you won’t. This is where Carey begins their exercise and proposes their Incident Response Diamond concept.

Translation and mutation of data can result in loss of specificity, which is no different from a data engineering pipeline problem. Data engineering solves this through meticulous field mapping and clear documentation, and I think that’s what Carey’s Diamond concept proposes here. Basically, they define a handoff from non-technical playbooks to rules, but keep a lineage of how certain playbooks are invoked by rules so you know which policy each falls under.

I think this is a great approach, but it means your security response and GRC teams need lots of alignment to pull it off. Documentation is one of the hardest parts of security, and keeping rules up to date is already hard enough.


Fantastic AWS Policies and Where to Find Them by David Kerber

The hardest thing in Computer Science is cache invalidation. The second hardest thing in Computer Science is naming things. For security, I think the hardest thing is understanding cloud identity models. The second hardest thing is also naming things.

One of the best ways in AWS to reduce the blast radius of attacks, or prevent attacks altogether, is to leverage the myriad of policies AWS makes available to customers. But a word of caution from Kerber: the number of tools you have at your disposal here can also be your downfall. In fact, as Chester Le Bron puts it:

You now need to become an SME in the operating system called AWS and its core services, some of which (like IAM) could be considered their own OS due to their complexity

So, in this post, Kerber outlines every type of AWS policy to help manage access. There are several types: some let you Allow or Deny access, while others only Deny, and you can apply these types across things like Users, Resources, Service Accounts, and even GitHub Actions.

Luckily, each section is split up so folks can use this blog as a reference post when they need to come back and remember. They also open-sourced a tool called iam-collect to help retrieve all of these policies locally for analysis. I’ll list the tool in the open-source section at the bottom of this week’s issue!


Introducing CheckMate for Auth0: A New Auth0 Security Tool by Shiven Ramji

CheckMate is a free Auth0 tenant configuration tool that operates as a CSPM for Auth0 deployments. They have several checks for all kinds of misconfigurations present in the Auth0 environment, and you can run them on an interval to detect drift of the environment and fix it before it becomes a problem. One of the cool parts here that is less CSPM-y from a pure security product perspective is their extensibility runtime checks. It’ll do several checks against custom Auth0 runners to find everything from hardcoded passwords to vulnerable npm packages.


☣️ Threat Landscape

UN Convention against Cybercrime opens for signature in Hanoi, Viet Nam by United Nations Office on Drugs and Crime

The United Nations hosted its “Convention against Cybercrime” in Vietnam last week. Besides sounding like a sick conference (I hope someone wore a hacker hoodie), 72 countries signed an international treaty that provides guidance and guardrails for nations battling international cybercrime. The post has some interesting highlights from the treaty, including standards for electronic evidence collection, easier cross-border data sharing, and recognition of the dissemination of non-consensual sexual images as an offense.


Lessons from the BlackBasta Ransomware Attack on Capita by Will Thomas

Cyber threat intelligence G.O.A.T. Will Thomas dissected the 136-page ICO report on Capita Group’s breach by BlackBasta in 2023 for some juicy intelligence and lessons learned. The cool part of this is that Will found messages from the BlackBasta chat leak that line up with the timeline published in the ICO report.

It’s nice to get commentary from a CTI expert on publicly facing penalty notices and disclosures. Lessons learned are great at a high level, but digging into exact TTPs from BlackBasta and comparing them to the material failures within Capita’s security program is way more useful to the rest of the security community.


CVE-2025-59287 WSUS Unauthenticated RCE by Batuhan Er

This week, Microsoft released an out-of-band vulnerability update for its Windows Server Update Services (WSUS) product. WSUS allows Microsoft administrators to manage the installation of Windows updates across their fleet. The deserialization vulnerability results in remote code execution, so Microsoft scored CVE-2025-59287 as a 9.8.

In this vulnerability walkthrough, Er follows the vulnerable code path and ends with a PoC to exploit the vulnerability. The discovery here is that WSUS deserializes encrypted XML objects unsafely in the GetCookie() endpoint. You can send over any arbitrary object (or a specially crafted one) to get RCE.


Exploitation of Windows Server Update Services Remote Code Execution Vulnerability (CVE-2025-59287) by Chad Hudson, James Maclachlan, Jai Minton, John Hammond and Lindsey O’Donnell-Welch

As a follow-up post to Er’s above, the Huntress team found in-the-wild exploitation of CVE-2025-59287. A handful of their customers had Internet-exposed WSUS servers. When the vulnerability details and subsequent PoCs dropped, attackers leveraged the exploit against exposed servers. Most of the activity looked like initial reconnaissance, but this post goes to show how fast you have to react to emerging vulnerabilities, especially when you have misconfigurations that could have prevented exploitation.

The team also dropped a Sigma rule and IoCs for readers to hunt on.


Hugging Face and VirusTotal: Building Trust in AI Models by Bernardo Quintero

This is a ~small product update for VirusTotal’s integration into HuggingFace’s registry of AI models. I usually don’t post product updates, but both VirusTotal and HuggingFace are community-driven products. It’s nice to see the VirusTotal team commit to helping developers identify malicious models hosted on HuggingFace.


🔗 Open Source

auth0/auth0-checkmate

GitHub link for the CheckMate project that was open-sourced by the Auth0 team. You can see all of their checks in code, and it looks like it operates similarly to how Prowler works.


cloud-copilot/iam-collect

Kerber’s iam-collect repo from the story I linked in State of the Art above. Give it access to your AWS environment and it’ll rip through the IAM policies and download them to disk. It links to a separate GitHub project called iam-lens to help simulate and evaluate effective permissions.


EmergingThreats/pdf_object_hashing

PDF object hashing is a technique similar to imphash, where you compare the structure of PDF documents without focusing on the content inside. imphash is a helpful technique for identifying similar binary features and symbols so you can cluster malware samples to find new ones. This follows the same philosophy, letting you cluster malicious PDF documents using similar techniques.
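As a toy version of the structural-hashing idea (a simplification of mine, not the Emerging Threats implementation, which walks the object tree properly), you can fingerprint a PDF by the sequence of its object types rather than its content:

```python
import hashlib
import re

def pdf_object_hash(data: bytes) -> str:
    """Toy structural hash: fingerprint a PDF by its object-type sequence.

    Two documents with the same structure but different content (text,
    images, payloads) hash to the same value, which is exactly the
    clustering property imphash gives you for PE imports.
    """
    types = re.findall(rb"/Type\s*/(\w+)", data)  # naive object-type scrape
    structure = b"|".join(types)                  # order-preserving sequence
    return hashlib.sha256(structure).hexdigest()
```

A regex pass over raw bytes like this misses compressed object streams; it's only meant to show why structure-based hashing clusters lure documents that share a builder.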


chainguard-dev/malcontent

I’ve been following Chainguard’s malcontent project for a while and it looks like they’ve been throwing a lot of development at it. It’s a supply-chain compromise detection system that uses a butt-ton (yes, a butt ton) of analysis techniques, including close to 15,000 YARA detections, to help detect these compromises before they make it into your build and production systems.


ForensicArtifacts/artifacts

Machine-readable knowledge base of forensic artifact information. It has a good amount of yaml files that store metadata around specific sources and what files and directory paths you can use during forensic analysis.


DEW #134 - Prioritizing Critical Assets, AI SOC means MORE alerts and Microsoft CoPilot Phishing

Welcome to Issue #134 of Detection Engineering Weekly!

✍️ Musings from the life of Zack in the last week

  • I popped and tore muscle/cartilage in my ribs on Friday. Urgent care sent me to the ER, and the ER laughed at me and said I’m too young to hurt my ribs and come to the hospital, so they sent me home D:

  • I’m booking a (small) solo trip to London in December. Who’s got restaurant and more importantly Pub recommendations in the Soho area? Shoot me a message and i’ll buy you a (virtual or not) pint!

  • I get some AMAZING content sent to me in all kinds of mediums, but it’s hard for me to keep track. So, I made a submissions form @ https://submit.detectionengineering.net that sends your blog details straight to my Notion. If you are writing something, I want to know!

This Week’s Sponsor: detections.ai

Community Inspired. AI Enhanced. Better Detections.

detections.ai uses AI to transform threat intel into detection rules across any security platform. Join 9,000 detection engineers leveraging AI-powered detection engineering to stay ahead of attackers.

Our AI analyzes the latest CTI to create rules in SIGMA, SPL, YARA-L, KQL, and YARA and translates them into more languages. Community rules for PowerShell execution, lateral movement, service installations, and hundreds of threat scenarios.

Join @ detections.ai

Use invite code “DEW” to get started


💎 Detection Engineering Gem 💎

Critical Asset Analysis for Detection Engineering by Gary Katz

If everything is Priority, nothing is Priority

I think about this mantra when I am looking at team planning for our security org at $DAYJOB. Security has a thankless job in many ways: when things go wrong, we are both in the spotlight and under a microscope. When things go well, we may seem invisible to others. This means scrutiny comes at the worst times, such as during an emergency, and the amount of planning and prioritization you do beforehand can really showcase how mature you are as a security program.

Lots of detection blogs I read talk about sending telemetry into a SIEM or a logstore and how to run detection logic over that telemetry. These blogs carry a large assumption: every piece of telemetry is created and maintained equally. In the real world of business, this is the furthest thing from the truth. A workstation going offline versus a domain controller going offline is one example, and the latter is what Katz calls a “chokepoint.”

These chokepoints are assets that become the biggest target for adversaries, and labeling them as Critical Assets provides clarity to your security team and your leadership team that you are putting focus in the right spots. The Critical Asset approach here requires conversations up and down your reporting chain, but it should render insights into what a detection team should prioritize first:

I love this approach because it shifts the conversation away from 100% MITRE coverage across everything to focused and directed coverage on your organization’s most critical services and assets. According to Katz, this methodology should output a prioritized list of assets, relevant attack paths, and coverage metrics that you can provide to others in your organization to showcase the value in peacetime (not during an incident).

The only part of this approach that I struggle with, not specifically with Katz’s but in general, is that it’s hard to highlight coverage on assets as the list grows.


🔬 State of the Art

How AI Transforms Detection Engineering by Filip Stojkovski

Like most things in security, detection engineering is a capacity problem. Every security operations function has three knobs to dial to scale their org, and they all come at some cost: people, process, and technology. SOCs traditionally address the need for scale through people, but that approach doesn’t compound: you can only triage more alerts by hiring more people. This is where process and technology help scale the function, especially if you have a solid engineering foundation and a healthy department culture that constantly updates processes.

One of a detection engineer’s most potent knobs is tuning how much or how little threat activity and benign traffic you capture. According to Stojkovski, this knob has always leaned toward precision (what we capture is relevant), as we don’t want to overwhelm the capacity of triage analysts. But does this change with the advent of AI SOC technology?

Stojkovski argues it does, and I have to agree here. LLMs help us turn the “technology knob” way, way up, which means we gain a scale advantage that isn’t pinned to linear growth in headcount. I also really like the nuance that the focus of this tech should be on true positive benigns and false positives, which means analysts focus on real incidents instead of benign alerts that can each burn 10-15 minutes of an analyst’s time.
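A quick back-of-envelope illustration of that time sink, with made-up numbers (the alert volume and benign rate below are assumptions, not figures from Stojkovski's post):

```python
def analyst_hours_per_day(alerts_per_day: int, benign_rate: float,
                          minutes_per_benign: float = 12.5) -> float:
    """Hours per day analysts spend triaging alerts that turn out benign.

    All inputs are illustrative; plug in your own SOC's numbers.
    """
    return alerts_per_day * benign_rate * minutes_per_benign / 60

# 500 alerts/day at a 90% benign rate, ~12.5 minutes of triage each:
wasted = analyst_hours_per_day(500, 0.90)  # ~93.75 analyst-hours/day
```

Even at modest volumes, benign-alert triage eats more than a full shift of capacity per day, which is the slack that AI triage is supposed to reclaim.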


CoPhish: Using Microsoft Copilot Studio as a wrapper for OAuth phishing by Katie Knowles

~ Note, Katie works at Datadog and is my colleague ~

AI-based features introduce risk we’ve never seen before, and it’s easy to see why the hype matters. Prompt injections lead to some funny outcomes, but the more overlooked part of AI implementation is tried-and-true vulnerabilities. Developer teams are being forced to push features out so they aren’t last to market, and misconfigurations and non-standard development workflows creep into production, leaving users and organizations alike vulnerable.

This is the case with Katie’s latest research into Microsoft’s Copilot Studio. Copilot Studio is Microsoft’s workbench product for developers who want to create AI chatbots. According to Katie, it has some confusing UI/UX workflows for authenticating to a chatbot, as well as poor permission structures, which allow attackers to create OAuth consent phishing attacks.

An attacker can use a malicious Copilot Studio agent to trick a target into an OAuth phishing attack. The attacker or agent can then take actions on the user's behalf.

Desired State Configurations by smash_title

This is the first time I’ve heard of Microsoft’s infrastructure-as-code and configuration management policy language, Desired State Configurations (DSC). So, this was a helpful post for me to understand Microsoft’s approach to DevOps using native tooling from the hyperscaler. smash_title came across this technology set while creating a detection engineering-style lab for Azure Virtual Machine Windows and Linux detection testing.

It does look similar to the likes of Terraform and Ansible, depending on which of the three versions you use. There are some neat features I don’t think I’ve seen in similar technologies, such as drift detection and correction, and workstation resource management. It looks like Microsoft is sunsetting the earliest version, which relies on PowerShell, in favor of a pure JSON/YAML-style declarative format, but they seem pretty far from feature completeness on the newer versions.


Introducing HoneyBee: How We Automate Honeypot Deployment for Threat Research by Yaara Shriki

HoneyBee is an open-source toolset that automates the creation of honeypot stacks leveraging LLMs. Unlike other honeypots that put LLMs inside the web-app to mimic an environment, this one focuses on the configuration management and infrastructure component, which I think is a much more fruitful approach for detection engineers.

You provide access to your favorite foundational model, select a technology stack, and select one or many misconfigurations in the Wiz catalog, and it generates docker-compose files for use. This is helpful when you are building detections for specific stacks and want to see how telemetry is generated after a misconfiguration is exploited. Alternatively, you can deploy this on a honeypot listening on the Internet to collect indicators of compromise.


From Logs to Leads: A Practical Cyber Investigation of the Brutus Sherlock by Adam Goss

This is an in-depth walkthrough of the forensics challenge “Brutus” on hackthebox. I like Goss’s approach of splitting the investigation into four distinct skillsets: interpretation, collection, capability comprehension, and manipulation. Each one of these skills involves understanding a target system’s technology stack, gathering necessary data from various sources, and then using the tooling you have at your disposal to interpret the timeline of events.


☣️ Threat Landscape

[RESOLVED] Increased Error Rates and Latencies by Amazon Web Services

dns haiku

What a crazy turn of events: a cascading DNS failure starting in AWS DynamoDB, which then affected an internal service supporting launching EC2 instances, which then messed up health checks on load balancers and spread through 142 separate services.


To Be (A Robot) or Not to Be: New Malware Attributed to Russia State-Sponsored COLDRIVER by Wesley Shields

This GTIG blog is a great example of how threat actors can rapidly adjust their malware development as they deploy it. Shields profiles COLDRIVER (aka Star Blizzard)’s new malware delivery chain. It uses phishing as the initial lure, which leads to a ClickFix infection. During the infection, COLDRIVER leveraged a clunky Python-based backdoor, then began simplifying the malware away from Python and focusing on PowerShell. It looks like COLDRIVER abandoned Python because it needed a Python runtime to execute, whereas PowerShell is native functionality in their victim set.


Email Bombs Exploit Lax Authentication in Zendesk by Brian Krebs

Threat actors bombarded customers of large Zendesk-using companies last week, exploiting flaws in how Zendesk is configured. The misconfiguration allows anyone with access to a company’s Zendesk portal to send ticket-creation notifications that come from the company domain. Most of these were spam and troll-style messages, some even accusing Krebs of breaking the law.

But it goes to show how SaaS apps have multiple layers of configuration and can lend themselves to abuse scenarios like this if someone looks hard enough.


Revelations on Group 78, the secret US task force that fights cybercriminals by Martin Untersinger and Florian Reynaud

I was skeptical reading this headline because I’ve been burned by mysterious marketing-style blog posts, but then I realized it was an exposé from Le Monde. Untersinger and Reynaud provide readers some extraordinary background into the alleged FBI ransomware disruption task force, Group 78. The group’s goal is to perform ransomware disruption operations, up to and including arrests of suspected ransomware operators. They leverage a variety of legal and more modern tactics, such as exposing criminals' identities.

The hope is to pull all the levers they can find to degrade the trust between ransomware groups, and to be honest, I like this approach. For example, Untersinger and Reynaud assert that the ExploitWhisperer leak of over 200,000 BlackBasta Telegram messages may have been from Group 78.


🔗 Open Source

smashtitle/DesiredStateConfigurations

smashtitle’s GitHub repository for their DesiredStateConfigurations research, which I posted above in the State of the Art section. The cool part is that it’s a single PowerShell script that sets up a lab environment tailored for detection engineering on Windows. It removes a lot of the B.S. out-of-the-box services and applications that cause noise for people running the lab.


yaaras/honeybee

Repository from Shriki’s research on building honeypots using LLMs. There’s a neat misconfiguration index you can select from in your prompt for specific technologies, so you not only build the honeypot but also intentionally misconfigure it for detection-rule coverage and lure the bad guys into exploiting it.


dobin/DetonatorAgent

Detonation platform for malware development and telemetry collection. The initial idea was to develop malware and test it against Windows via a DetonatorAgent virtual machine. It can collect telemetry from the environment as well as from the EDR.


google/osdfir-infrastructure

Helm charts for various open-source DFIR infrastructure built at Google. You can run something like minikube locally to take advantage of this, or deploy it on managed Kubernetes on AWS or GCP.


DEW #133 - Redefining Security Visibility, TTP-First Hunting & F5 breach

Welcome to Issue #133 of Detection Engineering Weekly!

✍️ Musings from the life of Zack in the last week:

  • I did a family road trip for the long weekend to my hometown. I’m happy to report to other parents that I’ve had my first experience of a kid throwing up in the backseat. Do I earn a badge of honor here?

  • Datadog Detect is BACK for round 2, so please sign up and see some excellent Detection Engineering talks! It’s free, fully remote, and there will be activities (yay!) and labs for conference goers.

⏪ Did you miss the previous issues? I’m sure you wouldn’t, but JUST in case:



💎 Detection Engineering Gem 💎

What Does “Visibility” Actually Mean When it comes to Cybersecurity? by David Burkett

The most frequent question I get from my boss at Datadog is “Are we covered?” It’s a simple question, but it’s extremely hard to answer. What does covered mean? Are we covered now, before, or in the future? Do you mean MITRE rule mappings, operational maturity, incident readiness, or threat intelligence awareness? It turns out that agreeing on a singular definition of anything in security is difficult!

It was nice to read Burkett’s post here discussing the varying definitions of visibility. Like most industry standards, several companies and organizations have attempted to define visibility, but no single standard or definition has emerged as the true winner. David adapted Splunk’s blog on observability into the security operations space, and I think it works beautifully:

  • Visibility is the holistic state wherein a system generates telemetry, is subject to robust monitoring for known conditions, and possesses observability, enabling deep, exploratory analysis to diagnose novel problems. Full visibility is achieved only when these three elements are cohesively integrated, allowing operators to move fluidly from detecting a known issue (monitoring) to exploring its unknown root cause (observability), all supported by a common foundation of high-quality data (telemetry).

He then fits this mental model into a three-tiered definition based on who is asking about visibility. The three tiers look like they’re inspired by the tiered types of threat intelligence: strategic, operational, and tactical. This is also a great approach because visibility means something different depending on the customer you are talking to.

Senior leaders typically care about the full visibility of the business, not necessarily the individual elements along the ATT&CK chain. At the operational tier, you focus on attack surfaces such as endpoint, network, and SaaS. Each of these attack surfaces can have many telemetry sources: think EDR for endpoint and a Secure Web Gateway for domain visibility. Lastly, he rounds out tactical visibility by examining specific telemetry sources, like EDR, and moving through MITRE ATT&CK to assess visibility at each stage.

All models are wrong; some are useful. This may not be “perfect” in terms of defining visibility, but in my opinion, it’s a good mental model. It pulls inspiration from SRE concepts like observability and fits that into the context of a security program’s healthiness based on the customer who is asking.


🔬 State of the Art

Hunting Beyond Indicators by Sam Hanson

Threat hunting is the art of managing false positives. The basic idea is that you flip the premise of triage: both detection engineering and hunting mean querying for needles in a haystack, but in the former you want as little hay as possible, while a hunt deliberately casts a wide net. Maybe I can keep this imagery going and talk about separating wheat from chaff?

Alright, alright, enough farming analogies. I included this post because it shows the tradeoffs of hunting when starting with threat intelligence indicators versus adversary TTPs. When you plan and execute a threat hunt, the expectation is to find many results and have time to sift through them, using down-selection techniques to determine if there is an intrusion. The order of down-selection matters, though. According to Hanson, you want to start with tactics and techniques first (which I agree with), and then filter by other components like threat intelligence indicators.

If you start with threat intelligence indicators, you introduce a selection bias because they are brittle selectors and, by nature, won’t catch unknown IOCs. Focus on TTPs first, down-select to find unknown IOCs, and feel free to use IOCs after for additional enrichment.
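The ordering can be sketched in code. This is a hypothetical down-selection pipeline, not anything from Hanson's post: the event fields, LOLBin predicate, and IOC list are all illustrative. The point is simply that behavior filters first and IOCs serve as enrichment, never as the initial filter.

```python
# Hypothetical sketch of TTP-first down-selection; the event fields,
# LOLBin predicate, and IOC list are illustrative, not from Hanson's post.

def is_suspicious_lolbin(event):
    """Behavior-level selector: a LOLBin making an outbound connection."""
    return event["process"] in {"certutil.exe", "bitsadmin.exe"} and bool(event["dest_ip"])

KNOWN_BAD_IPS = {"203.0.113.7"}  # threat-intel IOCs, applied last as enrichment

def hunt(events):
    # 1) Cast the wide net on behavior (TTPs) first...
    candidates = [e for e in events if is_suspicious_lolbin(e)]
    # 2) ...then enrich with IOCs instead of filtering by them,
    #    so activity on unknown infrastructure still surfaces.
    for e in candidates:
        e["ioc_match"] = e["dest_ip"] in KNOWN_BAD_IPS
    return candidates

events = [
    {"process": "certutil.exe", "dest_ip": "203.0.113.7"},   # known IOC
    {"process": "certutil.exe", "dest_ip": "198.51.100.9"},  # unknown infra: still caught
    {"process": "notepad.exe",  "dest_ip": "198.51.100.9"},  # benign behavior: dropped
]
results = hunt(events)  # two candidates; only the first matches an IOC
```

Had the IOC set been the first filter, the second event (same technique, unknown infrastructure) would never have surfaced.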


Intuition-Driven Offensive Security by Andy Grant

When I first started working in security, becoming a red-teamer or a pentester felt like a class of jobs reserved for the most technical experts in the field. There’s something beautiful in deconstructing the assumptions of systems, building tools to probe those assumptions for weaknesses, and then exploiting those weaknesses to achieve an objective. At the time, I was only aware of jobs at consulting firms with intense interview processes, so I never felt I could make it.

As I progressed in my career, I started to meet and work with red teams. They typically fit into a mold where they engage and produce a report. As a blue teamer, it was hard for me to understand the value of a report when the engagement with that same team stopped after delivery. I think many companies feel the same after engaging a pentesting firm: the hard work starts with the findings, not the engagement.

Grant revisits this concept and provides a better working model for red teamers that he dubs intuition-driven security. The three principles he lays out focus on understanding the risk behind an implementation rather than hunting and reporting bugs. IMHO, this is a much sounder approach because it forces red teamers to think like security engineers rather than pentesters. If the outcome is risk reduction, the incentive structure rewards knowledge of the engineering behind a service. This knowledge drives empathy for the problems the service solves and serves as a forcing function for closing the security gaps the team finds during an engagement.


Practical Resources for Detection Engineers. || Starters 🕵🏻 and Pro || by Goodness Adediran

I love reading “Introduction to Detection Engineering” posts because you get a good diversity of thought around how to break into the field. Some folks focus on the expertise required, kept vague enough to retrofit into your life situation. Others look at more tactical details, like which technologies to learn, such as SIEMs or query languages. Adediran took an approach I first saw from Katie Nickels in her series on self-studying for threat intelligence.

This post provides a self-study roadmap for readers who want to break into detection engineering. Adediran splits this up into foundational blogs on the subject, studying MITRE to get a better understanding of how it maps to rules, and then crescendos out to specialist subjects across several mediums like blogs, videos, books, open-source repositories and podcast episodes.


Purple Team Maturity Model: From Chaos to Controlled Chaos by Silas Potter

I’m a big fan of maturity models, because they set a clear direction and roadmap for a program or function, but leave enough wiggle room to add, remove, or change milestones to fit your business context. In my professional experience, they’ve helped me set a tone for reporting maturity to leadership and provide an excellent north star for folks reporting into my org. So, when a new “maturity” model pops up in my feed, I almost always read it and steal ideas to use for my own purposes :).

Purple Teaming is an excellent way to improve the operational robustness of your detection program, so I was pleased to see Potter’s approach here to quantify how to achieve a well-oiled purple teaming function. Notice that this isn’t about a specific team doing purple teaming; instead, it’s a program across multiple teams, the obvious one being the joining of red and blue teams. I like this approach because it helps unite two teams who may not be talking to each other and showcases the value of both functions by driving detection outcomes rather than churning out rules or red team reports.


☣️ Threat Landscape

K000154696: F5 Security Incident by F5

Network and security appliance vendor F5 posted a harrowing security incident update involving a “highly sophisticated nation-state threat actor.” This threat actor had long-term access to their product development environment, and according to cvedetails, F5 has close to 300 products. With the ability to download code and knowledge bases, a well-resourced actor could use that access for product research and reverse engineering, whether for competitive products in their home country or to ease vulnerability research.


Securing the Future: Changes to Internet Explorer Mode in Microsoft Edge by Gareth Evans

The Microsoft Edge security team shipped a new secure-by-default configuration for Internet Explorer Mode in Microsoft Edge. This is the first time I’ve heard of Internet Explorer Mode, and I had a chuckle reading this because I had a feeling it involved active exploitation of legacy Internet Explorer code shipped inside Edge, and voilà!

The team seemed to plug the holes of some of the exploit vectors, but they switched off certain UI elements by default to limit the blast radius of threat actors abusing the backward-compatible technology. Basically, if you have to use this mode, it’s shipped with minimal functionality to access the resources you need, and an administrator must turn on any additional functionality.


Rubygems.org AWS Root Access Event – September 2025 by Shan Cureton / Ruby Central

Long-lived access key security incidents strike again! Cureton, the Executive Director of Ruby Central, published a detailed security incident report after a blog post disclosed to the open source community that a former maintainer still had production access to Ruby’s AWS account. The blog showed several screenshots and a CLI command purporting that the maintainer retained access via an AWS access key.

In response to the post, the Ruby Central team performed a series of containment actions to remove this access, and did not accuse the maintainer of anything malicious. But the post and this incident report show how hard it is to maintain a governance structure for an open-source non-profit that relies on contractors and volunteers to maintain the project.


Singularity: Deep Dive into a Modern Stealth Linux Kernel Rootkit by MatheuZSec

Two weeks in a row, I’ve read some great pieces on modern Linux kernel rootkits, so it was nice to see this one look at a rootkit leveraging ftrace-style hooking for its persistence and evasion capabilities. MatheuZ breaks down the rootkit’s source code, including its hooking techniques, and highlights some differentiators between this rootkit and others in the space. The attention to detail the rootkit creator put towards concealing directories, for example, shows how much of a cat-and-mouse game this is.

When you hide a directory, you may not be able to see its name or contents via list commands, but you may leak metadata that a hidden directory exists. On most Unix filesystems, a directory’s link count is 2 plus its number of subdirectories. So if a directory contains three subdirectories and you hide one, ls will show only two subdirectories, but the parent directory’s link count (visible via stat or ls -ld) would still reflect three unless adjusted.

This discrepancy between the visible subdirectory count and the link count is a forensic artifact that can reveal hidden directories. This rootkit accounts for the discrepancy and hooks a function to compute the number of links for backdoored directories accordingly.
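To make the artifact concrete, here's a minimal sketch of the link-count check. It assumes a filesystem with classic Unix semantics, where a directory's link count is 2 plus its subdirectory count (ext4 and XFS behave this way; btrfs notably does not), and the function names are mine, not from the rootkit write-up.

```python
import os

def expected_hidden_subdirs(nlink: int, visible_subdirs: int) -> int:
    """Classic Unix directories carry one link for '.', one from the parent's
    entry, and one from each child's '..', so nlink = 2 + subdirectory count.
    A positive return value suggests subdirectories hidden from readdir()."""
    return nlink - 2 - visible_subdirs

def audit(path: str) -> int:
    """Compare what a directory listing shows against what its link count implies."""
    visible = sum(
        1 for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name))
    )
    return expected_hidden_subdirs(os.stat(path).st_nlink, visible)

# Three subdirectories, one hidden by a rootkit: nlink stays 5 while the
# listing shows only two, so expected_hidden_subdirs(5, 2) == 1.
```

A rootkit that only filters readdir() results leaves this discrepancy behind, which is exactly why this one also hooks the link-count computation.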


🔗 Open Source

ngsoti/rulezet-core

This codebase serves the complete application running on rulezet.org. It looks like an open source version of detections.ai that you can host yourself. It pulls in open-source rulesets, and you can use it to manage your own rules via a community-style setup.


eset/malware-ioc

ESET’s long-running repository of malware IOCs is based on blog posts and investigations they’ve done over the years. It’s cool to see commits from close to a decade ago. Each subdirectory has a README describing the malware family and contains the associated IOCs.


KittenBusters/CharmingKitten

For the last two or so weeks, KittenBusters has been publishing commits to this repository that detail the operations behind Iran’s IRGC-IO Counterintelligence division. It is split up into “episodes”, and so far, three episodes have been published. It contains sensitive documents and malware code, and it looks like they will start doxxing certain officials in upcoming episodes.


cisagov/LME

Logging Made Easy (LME) is CISA’s initiative to leverage open source tools to enable a security operations function on a budget. It uses Wazuh and Elasticsearch, and the target audience is smaller shops with a small security team or none at all. Probably very helpful for the state and local municipalities that CISA works with during incidents.


DEW #132 - Linux Rootkits Evolution, LLM Rule Evals, Oracle 0-day exploitation

Welcome to Issue #132 of Detection Engineering Weekly!

✍️ Musings from the life of Zack in the last week


  • I spent the weekend hiking in the White Mountains in New Hampshire with my family. Turns out hiking is much harder when you have to carry kids who are strapped in a backpack

  • I got excited for a new season of The Amazing Race, and all of the competitors are from a separate reality show?? It’s not good

  • I’m staying away from all discussion around Taylor Swift’s new album


This week’s sponsor: Material Security

No More Babysitting the Security of Your Google Workspace

While your employees communicate via email and access sensitive files, Material quietly contains what’s lying in wait—phishing attacks in Gmail, exposed Drive files, and suspicious account activity. Agentless and API-first, it stops attacks and triages user reports with AI while running safe, automatic fixes so you don’t have to hover. Search everything in seconds, stream alerts to your SIEM, and audit with detailed access logs.

Simplify Your Google Workspace Security


💎 Detection Engineering Gem 💎

FlipSwitch: a Novel Syscall Hooking Technique by Remco Sprooten and Ruben Groenewoud

I first cut my teeth on writing malware when I was the red team captain at my alma mater’s yearly cybersecurity competition. I took a special interest in writing malware for Linux for several reasons. It was a special combination of operating systems knowledge and nuanced differences between kernel versions and Linux distros. It also felt harder than Windows in peculiar ways. For example, Windows is extremely good at backwards compatibility, so malware that interacts with the kernel in all kinds of ways keeps working between versions. In Linux, a single kernel version update can break backwards compatibility for legitimate and malicious software alike.

That’s what brings us to FlipSwitch. Elastic Security Researchers Sprooten and Groenewoud did a deep dive on the latest 6.9 version of the Linux Kernel and inspected how changes to an array that stores syscall addresses render a classic Kernel rootkit technique useless. The method relies on hooking addresses in the sys_call_table array to point to attacker-controlled code before trampolining back to the original syscall.

Pulled from Elastic’s blog

Line 10 is the change that killed rootkits like Diamorphine. This is where FlipSwitch comes in.

The Elastic team did a fantastic breakdown in their blog, so I’ll give my synopsis. The technique involves searching the running kernel’s memory for the opcode of the call instructions FlipSwitch wants to hook. When you load the malicious kernel module, you can leverage its privilege to scan for 0xe8 (the x86-64 relative call opcode), enumerate each call’s offset to locate the specific function you want to hook via the new x64_sys_call, and then patch it.
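As a userspace model of that scan-and-patch step, the rel32 arithmetic looks roughly like this. The helper names and the toy "dispatcher" bytes are mine; a real rootkit does this against live kernel memory from a privileged module, and a naive byte scan would also need to validate that each 0xe8 hit is actually an instruction boundary.

```python
import struct

CALL_REL32 = 0xE8  # x86-64 near call: e8 + signed 32-bit relative displacement

def find_call_targets(code: bytes, base: int):
    """Scan a code blob for e8 rel32 calls, yielding (insn_addr, target_addr).
    A relative call's target is the address of the NEXT instruction
    (insn_addr + 5) plus the signed displacement."""
    for i in range(len(code) - 4):
        if code[i] == CALL_REL32:
            rel = struct.unpack_from("<i", code, i + 1)[0]
            yield base + i, base + i + 5 + rel

def patch_call(code: bytearray, insn_off: int, insn_addr: int, hook_addr: int):
    """Rewrite the displacement so the call lands on hook_addr instead."""
    struct.pack_into("<i", code, insn_off + 1, hook_addr - (insn_addr + 5))

# Toy dispatcher loaded at 0x1000: two NOPs, then one call to a "syscall"
# at 0x2000; we redirect it to a "hook" at 0x3000.
base, victim, hook = 0x1000, 0x2000, 0x3000
code = bytearray(b"\x90\x90\xe8" + struct.pack("<i", victim - (base + 2 + 5)))

assert list(find_call_targets(bytes(code), base)) == [(0x1002, victim)]
patch_call(code, 2, base + 2, hook)
assert list(find_call_targets(bytes(code), base)) == [(0x1002, hook)]
```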

It’s pretty elegant, and it shows how a singular protection can kill one class of techniques but open up another class to exploit.


🔬 State of the Art

Bridging the Gap: How I used LLM Agents to Translate Threat Intelligence into Sigma Detections by Giulia Consonni

I’m glad to see more research and homelab-style blogs on how to build detection engineering agentic systems. It demystifies some of the hype surrounding products in this space, and just like Splunk did with SIEM by creating a community edition, it makes it easier for people to enter our field. I immediately clicked on this post because the title really excited me, and the post didn’t disappoint!

Consonni’s project involves building an LLM agent system that translates threat intelligence into detection rules. They leveraged http://crewai.com/ (which I had never heard of), a platform that hosts AI agents, provides an SDK for writing those agents, and makes it easy to focus on building the system rather than worrying about architecture and scale. Consonni started with a single prompt containing the whole workflow of “read report → extract TTPs → create rules,” and it did a terrible job due to the broadness of the request. After refining the process with a multi-agent setup, more specific prompting, and a switch of foundation models, the resulting rules were impressive.


More than “plausible nonsense”: A rigorous eval for ADÉ, our security coding agent by Bobby Filar and Dr. Anna Bertiger

This post is an EXCELLENT read after the LLM detection rule creator post by Consonni listed above.

Determining the performance of a machine learning model is as old as the field of statistics itself. The basic premise behind performance measurement is building a predictive system, testing it against real-world data, and measuring its efficacy. Sounds familiar, right? Just like detection rules.

Naturally, LLMs should have the same type of evaluation criteria for implementers to trust and verify performance. I hadn’t seen a comprehensive evaluation framework for detection rules until I came across this post by Filar and Dr. Bertiger. The Sublime team built a detection evaluation framework for their LLM-backed detection engineer, dubbed ADÉ. The idea is that the team tried to encode success metrics for new detection rules written in the Sublime DSL. These success metrics should be familiar to long-time readers of this newsletter and to those who have read my Field Manual posts.

They split evaluations into three steps: precision, robustness, and cost to deploy and run. The lovely thing about these three evaluations is that they really capture how detection engineers think about testing rules before they deploy them.

  • Precision measures accuracy and net-new coverage, which, according to Filar and Dr. Bertiger, is the marginal value a rule adds when running alongside existing detections against known campaigns.

  • The robustness step dissects the rule’s abstract syntax tree to identify and penalize lower-value detection mechanisms, such as IP matching. Think of this as penalizing the lower parts of the Pyramid of Pain.

  • The cost step looks at how many attempts the model took to generate a production-quality rule, the time to deployment of that rule, and the runtime cost of the rule in production.
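The robustness idea in particular lends itself to a sketch. Here's an illustrative scorer over a toy rule AST; the node shapes, penalty values, and scoring are my invention, not Sublime's actual implementation.

```python
# Illustrative robustness scorer, not Sublime's actual implementation: walk a
# toy rule AST and penalize brittle, bottom-of-the-Pyramid-of-Pain selectors
# while rewarding behavioral predicates.
PENALTIES = {"hash_literal": -5, "ip_literal": -3, "domain_literal": -2}
REWARDS = {"behavior": 4, "regex_pattern": 2}

def robustness(node) -> int:
    """Recursively score a nested rule AST made of operator nodes
    ({'op': ..., 'children': [...]}) and leaf selectors ({'type': ...})."""
    if "children" in node:
        return sum(robustness(child) for child in node["children"])
    return PENALTIES.get(node["type"], 0) + REWARDS.get(node["type"], 0)

rule = {
    "op": "and",
    "children": [
        {"type": "behavior", "detail": "reply-to mismatch"},              # +4
        {"op": "or", "children": [
            {"type": "ip_literal", "value": "203.0.113.7"},               # -3
            {"type": "regex_pattern", "value": r"urgent.{0,10}invoice"},  # +2
        ]},
    ],
}
score = robustness(rule)  # 4 - 3 + 2 == 3
```

A rule leaning entirely on hashes and IPs would score negative here, which matches the intuition: it fires today and goes stale the moment the adversary rotates infrastructure.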

They list evaluations of several rules towards the end of the post, and I’m impressed by their performance. They compare the results to a human-written rule, and it appears to have performed well in some detection types against humans but underperformed in others. However, the idea here (in my opinion) isn’t to replace humans, but to augment us, and I think this framework helps achieve that.


How to Create a Hunting Hypothesis by Deniz Topaloglu

The best way to threat hunt is to challenge assumptions. In my experience, these assumptions typically fall into several buckets, including:

  • Rules that fail to capture threat activity

  • Telemetry sources contain threat activity that we haven’t accounted for

  • Threat intelligence informs us of something we should be aware of in the pyramid of pain

Forming a hypothesis, then, takes assumptions and tries to challenge them to uncover gaps in rules or telemetry, and in the worst case, find an incident that you’ve missed. It’s a formulaic process, but this post shows how powerful threat hunting can be when you lay out your assumptions and what you know so you can deep dive into a hypothesis.

Topaloglu starts with a piece of threat intelligence, maps out potential TTPs in MITRE, shows an example network diagram, and then creates a hunting plan. They lay out several scenarios and their corresponding SIEM search queries in several languages, and continue on to post-hunt activities for aspiring hunters to follow up on because threat hunts should provide more value than just confirming whether activity is present or not in a network.


The Great SIEM Bake-Off: Is Your SOC About to Get Burned? by Matt Snyder

Choosing a SIEM is like selecting a business partner. You need to ensure that you understand the strengths and weaknesses of each other and create an operating model to compensate for them. It’s great to see a blog exploring the topic of procuring a SIEM and the pain associated with switching from one deployment to another. This piece is beneficial for aspiring analysts or detection and response engineers who’ve never been through this type of exercise, because it truly feels like a mountain to climb that can put your company and productivity at risk.

Snyder points out five key areas of concern where switching costs can kill productivity: ingest, search, enrichment, rules and administration. SIEM vendors should help you understand each component during a demo. Even then, many demos showcase the best parts of the technology, so a bake-off between SIEM vendors, via proofs of concept, and Snyder’s linked Maturity Tracker, can alleviate much of the uncertainty behind these exercises.


☣️ Threat Landscape

CrowdStrike Identifies Campaign Targeting Oracle E-Business Suite via Zero-Day Vulnerability (now tracked as CVE-2025-61882) by CrowdStrike

The large vulnerability news du jour is a remote code execution in Oracle E-Business Suite tracked under CVE-2025-61882. The CrowdStrike research team made this post detailing their observations as threat actors and researchers alike conduct mass exploitation to take advantage of the vulnerability.

The exploit chain involves a series of crafted payloads to two JSP endpoints, where an unauthenticated attacker uploads a malicious XSLT file. This, in turn, triggers an outbound Java request to an attacker-controlled command-and-control server to load a webshell on victim machines.

The remarkable aspect here is how the exploit was disseminated. Oracle made a public post with IOCs, a PoC was posted on October 3, and according to CrowdStrike, threat actors under the ShinyHunters moniker posted an exploit file to their main Telegram channel.


Red Hat Consulting breach puts over 5000 high profile enterprise customers at risk — in detail by Kevin Beaumont

Red Hat Consulting, the technology services arm of Red Hat, allegedly suffered a data breach from a threat actor group dubbed “Crimson Collective.” It’s unclear how this breach happened, but they began posting screenshots of the pilfered victim data. Beaumont uncovered some interesting details about this threat actor group, thanks to the assistance of Brian Krebs. They seem to overlap with Scattered Spider/Shiny Hunters, and one of the Telegram posts made by the group had a “Miku” signature at the end. Miku is an alleged member of Scattered Spider and was arrested last year, but is on house arrest.

The victim details were posted on the Scattered LAPSUS$ Hunters victim leak site, and it appears to contain a trove of customer data from Red Hat Consulting, including some sensitive information.


DPRK IT Workers: Inside North Korea’s Crypto Laundering Network by Chainalysis

My favorite thing about reading Chainalysis blogs is getting a glimpse into how money laundering works at a cryptocurrency scale. Unless you’re a freak of nature and read indictments or court documents with detailed notes on traditional money laundering techniques, it’s rare to see how criminal and nation-state operations do the hard work of funneling money.

So, in this blog, the Chainalysis team studied the tactics, techniques and procedures of DPRK IT Worker laundering. They have a structured approach to taking payment in stablecoins, laundering it to a “consolidation” worker, and eventually offloading the consolidated funds to fiat.


Don’t Sweat the *Fix Techniques by Tyler Bohlmann

When I first read about ClickFix, I didn’t think it would be a successful approach to infection and initial access. The premise was a bit crazy: you funnel victims to a website, socially engineer them to believe there’s a problem with their computer, and convince them to willingly copy and paste a malicious command into their terminal.

Well, I was wrong; this technique works beautifully, and according to Bohlmann, Huntress has observed a 600%+ increase in these styles of attack since their inception last year. In this post, they review the different styles of ClickFix, the attack chains, and the clever ways they trick users into running the malicious payloads.


🔗 Open Source

1337-42/FlipSwitch-dev

Sprooten’s FlipSwitch PoC repo is referenced in the Gem above. It does more than just demonstrate the technique; you can use this as a rootkit kernel module in the latest versions of the Linux Kernel, and it supports some fun obfuscation techniques to make it harder to find.


ti-to-sigma-crew

Threat intelligence report to Sigma rule generator, based on the research by Consonni linked above. It’s a templated CrewAI application that looks pretty easy to use: you add knowledge files, such as detection rules, as examples, and it uses a SQLite database for the RAG components.


matt-snyder-stuff/Security-Maturity-Tracking

Simple yet effective security maturity tracking framework for a security operations program. The repository lists each capability you want to track, such as SIEM, Threat Hunting and Threat Intelligence, and you can create maturity matrices for each one and track progress. Matrices like these are generally pretty good for presenting program development up to leadership.


thalesgroup-cert/suspicious

Open-source anti-phishing and investigation application for investigators, analysts and CERT folks. You set it up, tie it to an inbox, have users forward suspicious emails to it, and it’ll pull apart the email, perform threat intel lookups and present a report for further analysis.
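The triage flow a tool like this automates, pulling apart a forwarded email and extracting indicators for threat intel lookups, can be sketched with Python's standard library. This is a minimal illustration of the idea, not the project's actual pipeline; the sample message and field names are made up:

```python
import email
import hashlib
import re
from email import policy

def extract_indicators(raw_eml: bytes) -> dict:
    """Parse a forwarded .eml and pull out basic phishing indicators."""
    msg = email.message_from_bytes(raw_eml, policy=policy.default)
    indicators = {
        "from": msg.get("From", ""),
        "reply_to": msg.get("Reply-To", ""),
        "urls": set(),
        "attachment_sha256": {},
    }
    for part in msg.walk():
        filename = part.get_filename()
        if filename:  # attachment: hash it for threat-intel lookups
            payload = part.get_payload(decode=True) or b""
            indicators["attachment_sha256"][filename] = hashlib.sha256(payload).hexdigest()
        elif part.get_content_type() in ("text/plain", "text/html"):
            body = part.get_content()
            indicators["urls"].update(re.findall(r"https?://[^\s\"'<>]+", body))
    return indicators

raw = (b"From: attacker@example.com\r\nReply-To: drop@evil.example\r\n"
       b"Subject: Invoice\r\nContent-Type: text/plain\r\n\r\n"
       b"Pay at http://evil.example/invoice now\r\n")
print(extract_indicators(raw))
```

From here, the URLs and hashes would feed into whatever threat intel lookups you have available, with the results rolled into a report for the analyst.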


CERT-Polska/karton

A dynamic malware analysis platform where you can build malware processing backends all in Python. It comes with several backends out of the box, including a malware sandbox, an archive extractor, and a malware configuration extractor. It looks pretty easy to write your own, and you can submit it via an API or the dashboard to extend functionality.

Every week, I read, watch and listen to all the Detection Engineering content so you can consume it all in 10 minutes. Subscribe and get a weekly digest of the latest and greatest in threat detection engineering!


DEW #131 - ❄️New EDR bypass❄️, CTI Poverty, AWS Infra Canaries & Hunting in IMDS

Welcome to Issue #131 of Detection Engineering Weekly!

✍️ Musings from the life of Zack in the last week


  • My new office desk is done, and my office feels so much more organized with better use of space

  • I learned you can 3D print How To Train Your Dragon toys and stole one of these from my kid, who got it as a present

  • Got a ticket to DistrictCon, so I’ll hopefully see some of you in person!

⏪ Did you miss the previous issues? I’m sure you wouldn’t, but JUST in case:

🚨 𝗗𝗲𝘁𝗲𝗰𝘁𝗶𝗼𝗻 𝗲𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝘀, 𝘁𝗵𝗿𝗲𝗮𝘁 𝗵𝘂𝗻𝘁𝗲𝗿𝘀, 𝗖𝗧𝗜 𝘁𝗲𝗮𝗺𝘀—𝘁𝗵𝗶𝘀 𝗼𝗻𝗲’𝘀 𝗳𝗼𝗿 𝘆𝗼𝘂.

Join us LIVE on October 7th for “𝗙𝗿𝗼𝗺 𝗧𝗵𝗿𝗲𝗮𝘁 𝗜𝗻𝘁𝗲𝗹 𝘁𝗼 𝗗𝗲𝘁𝗲𝗰𝘁𝗶𝗼𝗻 𝗥𝘂𝗹𝗲𝘀 𝗶𝗻 𝗠𝗶𝗻𝘂𝘁𝗲𝘀 (𝗡𝗼𝘁 𝗛𝗼𝘂𝗿𝘀)” — a hands-on webinar with 𝗱𝗲𝘁𝗲𝗰𝘁𝗶𝗼𝗻𝘀.𝗮𝗶

Presenters: Aaron Mog & Tim Peck from detections.ai

Let’s stop drowning in intel and start deploying smarter.
📅 Save your spot now 👇

🌍 APAC: 10:00 AM SGT: Register

🌍 EMEA: 2:00 PM GMT: Register

🌍Americas: 11:00 AM PST: Register

Join @ detections.ai - Use invite code “DEW“ to get started


💎 Detection Engineering Gem 💎

EDR-Freeze: A Tool That Puts EDRs And Antivirus Into A Coma State by Zero Salarium

This is a clever attack against EDR tooling that abuses Windows Error Reporting (WER) to force target processes into a suspended state. The technique, dubbed EDR-Freeze, leverages MiniDumpWriteDump, a debugging API from the DbgHelp library, to trick WER into thinking it’s creating a memory dump of the EDR process. However, since EDRs are protected by Protected Process Light (PPL), an anti-tampering mechanism introduced in Windows 8.1, the dumping process must also be initiated with PPL.

So, the attacker starts the WER executable, WerFaultSecure.exe, which suspends the EDR process via the MiniDumpWriteDump function. EDR-Freeze then monitors for the EDR to become suspended, creating the race condition, and suspends the calling WER executable itself, which blocks the EDR process from ever being resumed.

It appears that some EDRs are affected, but it was interesting to see the varied responses from different companies. For example, Elastic researchers noted that the technique doesn’t work against their agent due to a rule they implemented to block the use of WerFaultSecure.


🔬 State of the Art

Intelligence Poverty and the Commercial Data Economy by Joe Slowik

Throughout my career, it has been challenging to convince people outside of security of cyber threat intelligence’s (CTI) usefulness. Once you think about the kind of cool stuff you get to do in this genre of security, it seems evident that others would want it. However, I think this bias can cloud people’s perception of its usefulness to a security organization. It’s intangible in many ways, and depending on how mature your program is, it focuses on “what’s out there” versus “what’s in here”.

Some of this bias is cultural, since CTI was born from military and spy operations, where the stakes are higher because someone’s life is on the line. But when you introduce the cyber element, it becomes a more frustrating exercise in information asymmetry. This information asymmetry is what breeds the market and the vendors who sell into it: they have data that you don’t.

This is why Joe’s post here is so timely and relevant (just like threat intel!). Many people, including myself, have been using VirusTotal (now Google Threat Intelligence) this year. When a company has a monopoly on crowdsourced and expert-created cyber threat intelligence data, it can essentially charge what it wants. According to Joe, this economy of scale creates an “intelligence poverty” for those outside large organizations with the budget to compete.

It makes it even harder for people trying to break into the industry, or for independent researchers, to take advantage of data that can be the difference between discovering a breach and missing it. I really wouldn’t know what to recommend for people who want to do more OSINT-style CTI using these platforms. I’m fortunate enough to be a consumer of these platforms or to have been given researcher accounts. Still, this commercialization may force new analysts to work in fewer places than before.


Our plan for a more secure npm supply chain by Xavier René-Corai

GitHub’s Director of Security Research published a post about GitHub’s response to the last several weeks of supply chain attacks against npm. The biggest offender, the Shai-Hulud worm, demonstrated how fragile some of these ecosystems can be in terms of security. The open-source community reacted swiftly, analyzing the malware code and issuing warnings to GitHub. However, according to René-Corai, GitHub itself needs to take stronger action against these types of attacks.

The GitHub security team is moving towards three publishing options, which combine reducing long-lived publishing tokens with “Trusted Publishing” and stronger 2FA. They are also removing several publishing options, and some changes seem harder than others to implement. For example, they recommend moving from OTP-based 2FA to FIDO-based 2FA, but that can be cost-prohibitive or a logistical nightmare to roll out.


IMDS Abused: Hunting Rare Behaviors to Uncover Exploits by Hila Ramati and Gili Tikochinski

Wiz researchers Ramati and Tikochinski perform a threat hunting deep dive on unusual IMDS usage across their customer environments. IMDS is a beast of a service: without instance metadata, it’s much harder for applications to understand configuration and service data related to the infrastructure they are running on. It’s served on the link-local address 169.254.169.254, so theoretically, only applications running on the instance itself can access the service.

This configuration service is an attractive target for attackers, so if they can devise creative ways to access the API, they can use it to steal credentials and move from the instance to the cloud environment. Attackers, unlike services and code, don’t usually fall in the behavioral patterns of accessing the service, so this is where Ramati and Tikochinski start to hunt for compromises.

Once they baselined cross-customer usage of IMDS, they found three compromises related to N-day exploits against various services. I feel that threat hunting is primarily about baselining behavior and identifying outliers, and this blog is a great demonstration of that.
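The baseline-then-outlier pattern the researchers describe can be sketched as a toy example. This is not Wiz's implementation: the event schema, baseline period, and process names below are all assumptions for illustration; the idea is simply to record which processes normally talk to IMDS per host during a baseline window, then flag newcomers afterward:

```python
from collections import defaultdict

# Toy illustration of the hunt's core idea (not Wiz's implementation):
# baseline which processes normally talk to IMDS, then flag newcomers.
IMDS_IP = "169.254.169.254"

def find_outliers(events, baseline_days=7):
    """events: dicts with keys day, host, process, dst_ip (assumed schema)."""
    seen = defaultdict(set)   # host -> processes observed so far
    outliers = []
    for ev in sorted(events, key=lambda e: e["day"]):
        if ev["dst_ip"] != IMDS_IP:
            continue
        if ev["day"] >= baseline_days and ev["process"] not in seen[ev["host"]]:
            outliers.append(ev)   # new process touching IMDS after baseline
        seen[ev["host"]].add(ev["process"])
    return outliers

events = [
    {"day": 1, "host": "web-1", "process": "cloud-init", "dst_ip": IMDS_IP},
    {"day": 2, "host": "web-1", "process": "aws-sdk-app", "dst_ip": IMDS_IP},
    {"day": 9, "host": "web-1", "process": "curl", "dst_ip": IMDS_IP},  # suspicious
]
print(find_outliers(events))
```

A real hunt would baseline across far more dimensions (user agent, credential path, request rate), but the shape is the same: legitimate IMDS consumers are boring and repetitive, so anything new stands out.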


Introducing the AWS Infrastructure Canarytoken by Marco Slaviero

This is a neat feature update from Thinkst Canary, one of the OG companies offering canary token capabilities to security teams. Free-tier and paid users can now leverage their AWS Infrastructure Canarytoken, which is a specialized feature that deploys canary infrastructure. It leverages a combination of AWS permissions, Terraform files, and some special sauce to “learn” your AWS environment and deploy what it thinks is the best canary-style cloud resource. There are two required cross-account integrations: one involves giving temporary access to Thinkst so they can “learn” your infrastructure. The second is a long-term cross-account access that sends your CloudTrail events from the canaries to their main AWS account for alerting and processing.


☣️ Threat Landscape

I’m posting two quick-hit podcast episodes from friends of the newsletter, The Three Buddy Problem.

In this interview, Ryan & Juan interview Aurora Johnson and Trevor Hilligoss from SpyCloud. They gave an overview of a Com-like community in China that performs similar harassment and insider-threat-style crimes. The difference between this group, dubbed “Internet Toilets” (also the name of their talk), and The Com is access to much more personal data, thanks to corrupt officials in local Chinese governments.

This episode is a 12-year lookback on Mandiant’s first-ever threat report on APT1. This was a pivotal moment for cybersecurity, as it showed how much visibility private firms possess and how nicely it can overlap with government spy operations. I was one year into my career when I first read this report, and I was blown away. I entered my first threat research job that same year, and the rest is history :).


That Secret Service SIM farm story is bogus by Robert Graham

The big news last week involved the Secret Service busting a SIM farm. The PBS story I linked here claims it could have been used to “collapse telecom networks”. One of the agents had a quote suggesting that a nation-state might have run it.

Several news outlets started poking holes in that claim, and Graham’s piece here points out why. One possible reason it sounded like a nation-state operation was its financial scale; another was that the lead that brought agents to the farm was a text message originating from this likely spam farm. It’s kind of like saying AWS was responsible for a nation-state hack from a China-nexus actor because it originated from an AWS IP.


September 26 Advisory: SNMP RCE in Cisco IOS and IOS XE Software [CVE‑2025‑20352] by Censys Security Research

The Censys team’s threat advisory on the latest Cisco vulnerability provides valuable information on the Internet exposure of vulnerable devices: roughly 192,000 of them. A specially crafted SNMP packet can trigger a stack overflow on affected devices. The prerequisite here is that the attacker must be authenticated to the device: a guest or low-privileged account can achieve a DoS, whereas a high-privileged account can achieve RCE and pivot into internal networks.


Canary tokens: Learn all about the unsung heroes of security at Grafana Labs by Mostafa Moradian

Grafana’s Security Research team published this post as a follow-up to the security incident they experienced in May. I really enjoy reading lessons learned from companies that suffer an incident like this, because firms tend to be risk-averse and not publish details. Moradian’s follow-up covers the use of canary tokens in their infrastructure to identify leaks of their source code.

The team had tokens placed throughout their codebase, and the incident involved the exfiltration of that codebase during the attack. Logically, the threat actor leveraged TruffleHog to scan the stolen code for exposed secrets. TruffleHog can sift through code, configuration files, and even commit history, and you can configure it to check the validity of each secret: it reaches out to the various service platforms and looks for a response indicating the secret is live. Once the actor’s scan reached out to AWS, it triggered a critical alert to Grafana’s Detection & Response team, who were able to identify the repository from which the secret was stolen.

These tokens offer a cheap and effective way to get some high-fidelity alerts, especially in the case of exfiltration, such as what happened to Grafana.
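The reason the fidelity is so high is that a planted credential has no legitimate use, so any occurrence of it in logs is worth an alert. A minimal sketch of that matching logic, with entirely hypothetical key IDs and a simplified CloudTrail-like event shape (this is not Grafana's or any vendor's implementation):

```python
# Minimal sketch of why canary tokens are high fidelity: the planted
# credential has no legitimate use, so ANY occurrence is an alert.
# The key IDs and event shape below are hypothetical.
CANARY_KEYS = {
    "AKIAXXXXCANARY000001": "github.com/acme/backend (planted 2024-11)",
}

def check_events(cloudtrail_events):
    alerts = []
    for ev in cloudtrail_events:
        key_id = ev.get("userIdentity", {}).get("accessKeyId", "")
        if key_id in CANARY_KEYS:
            alerts.append({
                "severity": "critical",
                "key": key_id,
                "planted_in": CANARY_KEYS[key_id],  # points at the leaked repo
                "source_ip": ev.get("sourceIPAddress"),
            })
    return alerts

events = [{"userIdentity": {"accessKeyId": "AKIAXXXXCANARY000001"},
           "sourceIPAddress": "203.0.113.7", "eventName": "GetCallerIdentity"}]
print(check_events(events))
```

Note how the mapping from key to planting location does the investigative work for free: the alert itself tells you which repository leaked.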


🔗 Open Source

acquiredsecurity/forensic-timeliner

Security incident timeline builder for DFIR investigators. It can ingest output from several forensics tools, such as Chainsaw and Hayabusa, combine them into a single format, and create a nice queryable timeline. It has a GUI that looks a lot like Wireshark, which I got excited about :).


gabriel-sztejnworcel/pipe-intercept

This is a neat named pipe interceptor for Windows that leverages WebSockets so you can view named pipe communication via tools like Burp. You specify a target named pipe via the command line argument, connect to the WebSocket via your preferred tool, and see the live IPC traffic over the wire.


awslabs/amazon-bedrock-agentcore-samples

Amazon recently launched AgentCore, their service providing agentic infrastructure. I linked their samples here because it seems pretty straightforward to get a full agentic infrastructure up for security use cases. For example, you can load in system prompts for security triage, leverage S3 as a vector database and upload runbooks and rule descriptions, and connect to their MCP servers for telemetry querying using natural language.


microsoft/avml

Memory acquisition tool for Linux. You compile it as a binary, load it on a target system, and capture memory for offline analysis. It has native functionality to upload to Azure blob storage and writes captures in the LiME output format, though I’m unsure if the Microsoft devs know that LiME is no longer being developed.



DEW #130 - God-mode Azure vulnerability, Composite Detections & Detection Observability

Welcome to Issue #130 of Detection Engineering Weekly!

✍️ Musings from the life of Zack in the last week


  • I am putting together an Uplift desk and woo let me tell you there are a lot of pieces (electronics)

  • Had a fantastic time in NY for $DAYJOB and managed to spill pasta on my jeans at a team dinner, in front of my boss and CISO

  • Brought the family to an organic farm fair and watched a demonstration with a Border Collie gathering sheep. Made me miss my dog Pasha a lot :(

  • Vibe coded some infra for the newsletter to make news collection easier. Post to Discord → auto add to new issues in Notion

⏪ Check out last week’s issue if you’ve missed it!

This Week’s Sponsor: detections.ai

Community Inspired. AI Enhanced. Better Detections.

detections.ai uses AI to transform threat intel into detection rules across any security platform. Join 7,500 detection engineers leveraging AI-powered detection engineering to stay ahead of attackers.

Our AI analyzes the latest CTI to create rules in SIGMA, SPL, YARA-L, KQL, and YARA and translates them into more languages. Community rules for PowerShell execution, lateral movement, service installations, and hundreds of threat scenarios.

Join @ detections.ai

Use invite code "DEW" to get started


💎 Detection Engineering Gem 💎

One Token to rule them all - obtaining Global Admin in every Entra ID tenant via Actor tokens by Dirk-jan Mollema

I’ve rarely included vulnerability writeups in previous Gems, but this one was just way too good not to. For those not in the cloud space, the foundation of cloud security lies in hyperscalers’ shared responsibility models. Since the cloud involves running your stuff on someone else’s computer, there need to be some guarantees that they are doing so in a secure manner. The keyword here is “shared”, as in, you are responsible for the security of your cloud deployment, but AWS, Azure, and GCP are accountable for the underlying technology that makes it possible to run your apps on their services.

If you are a bug bounty hunter targeting Azure, for example, you have your target list above. You could pursue cloud customer deployments and hopefully receive some substantial bug bounty payouts. But the holy grail here is the boxes labeled “Microsoft”. Bragging rights aside, the infrastructure that they manage is opaque, both as a security control and probably because they don’t know everything they have turned on and off.

This is where Mollema’s vulnerability comes into play. Based on their post, I believe Mollema identified one of the most critical cloud vulnerabilities ever disclosed, located under Microsoft’s “Identity and directory infrastructure” category in the SaaS column. The vulnerability, CVE-2025-55241, involved combining three separate flaws in Azure, and then employing brute force once these flaws were combined.

While researching hybrid Exchange setups, Mollema uncovered an undocumented service called “Access Control Service”. It was used for intercommunication between backend Azure services, and it issued what he dubbed “Actor tokens”, a form of impersonation token.

The first flaw was that this service could create unsigned impersonation tokens for any user you specified; Entra ID wasn’t required to issue the token. The second flaw was that the Azure AD Graph API, a legacy API, would accept these unsigned tokens as valid. Mollema could then issue Graph API requests to ANY AZURE TENANT GLOBALLY without the Graph API checking whether he owned the target tenant.

This is impressive because it breaks the fundamental trust boundary you see in the shared responsibility model, and he could theoretically “hop” into any Tenant that he wanted. Most of this wasn’t generating any telemetry until you started creating or modifying resources in a victim account, so luckily, Mollema includes a KQL rule to detect this activity.


🔬 State of the Art

Threat-agnostic detection of co-occurring MITRE ATT&CK® events using Composite Detections by Ryan Tomcik

Composite detection rules are an evolution of atomic rules, where the presence of techniques across different pieces of telemetry can show threat actor activity within a sea of noise. The idea here is to mitigate the precision versus recall tradeoff by simultaneously using two or more rules for the alerting scenario. This provides flexibility in several dimensions, such as identifying activity presence using one strategy, like a string match, versus another, like a windowed threshold.

In this piece, Tomcik leverages the Google SecOps platform to show examples of composite detections through the lens of MITRE ATT&CK. Co-occurrence is another alerting technique that attempts to identify an intrusion through a sequence of events, and Tomcik’s example of detecting co-occurring MITRE techniques with composite rules is a fitting illustration of the approach in action. Discovery techniques are potentially the noisiest techniques to catch in a live environment because they overlap with a lot of legitimate activity.

However, as Tomcik points out here, if you reconcile this telemetry with a singular source and find multiple discovery attempt techniques from one host, you achieve a high-fidelity alerting situation. The assumption here is that one host or log source shouldn’t be using several discovery mechanisms at once, so it could indicate a threat actor landing on a box and discovering their environment to collect information on where to pivot to next.
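The co-occurrence idea can be shown with a toy example. This is not Google SecOps rule syntax; the process-to-technique mapping, window, and threshold below are illustrative assumptions. Each discovery command alone is hopelessly noisy, but several distinct techniques on one host inside a short window is a much stronger signal:

```python
from collections import defaultdict

# Toy sketch of the co-occurrence idea (not Google SecOps syntax):
# individually noisy discovery techniques become high fidelity when
# several distinct ones fire on one host inside a short window.
DISCOVERY = {"whoami": "T1033", "net": "T1087", "ipconfig": "T1016",
             "systeminfo": "T1082", "nltest": "T1482"}

def composite_alerts(events, window=300, threshold=3):
    """events: (timestamp, host, process) tuples, assumed schema."""
    per_host = defaultdict(list)
    for ts, host, proc in sorted(events):
        if proc in DISCOVERY:
            per_host[host].append((ts, DISCOVERY[proc]))
    alerts = []
    for host, hits in per_host.items():
        for ts, _ in hits:
            # distinct techniques within `window` seconds of this event
            techniques = {t for t2, t in hits if ts <= t2 <= ts + window}
            if len(techniques) >= threshold:
                alerts.append((host, ts, sorted(techniques)))
                break  # one alert per host is enough for the sketch
    return alerts

events = [(100, "wks-7", "whoami"), (130, "wks-7", "ipconfig"),
          (200, "wks-7", "nltest"), (500, "wks-9", "whoami")]
print(composite_alerts(events))
```

Here `wks-7` trips the rule (three distinct discovery techniques in 100 seconds) while `wks-9`, with a single `whoami`, stays quiet, which is exactly the precision/recall tradeoff the composite approach is meant to improve.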


Resource Gathering by Amitai Cohen

This may not be a purely technical post, but I believe it’s highly relevant for anyone in detection engineering or security, especially as you begin your career. Amitai, a friend of the newsletter, describes situations in the workplace where we are bombarded with messages, emails, and requests to do stuff. And this is especially worse if you have meetings on top of your requests. So how do you manage all of that?

He draws the analogy of resource gathering from none other than Real-Time Strategy (RTS) games. It ultimately comes down to return on investment. When you see lengthy Slack DMs, channels, documents, or emails, it’s worth pausing to ask whether the time invested will generate the impact you want in your day job.

In management land, a useful tool for this is the Eisenhower Box:

https://jamesclear.com/eisenhower-box

Granted, individual contributor folks don’t have the ability to delegate, but it’s still a useful mental model to focus on what matters.


Zero-Log Checker: Automating Log Absence Detection in Wazuh by Hanif Kurniawan A.

I’ve pontificated throughout this newsletter about how security tends to steal concepts from other software engineering disciplines and rebrand them into something cool or sexy. Regression testing? THREAT EMULATION PEW PEW. Monitoring for weird or malicious logs? THREAT DETECTION BANG.

In all seriousness, it’s nice to see when concepts from software engineering enter our space and how we use them to solve real security problems. I’ve been shilling the idea of “observability for security” at my $DAYJOB, and every time I see a new post like Kurniawan’s here, it’s a good indicator that the health of your detection systems matters just as much as your rules. So, in this post, Kurniawan shows their system for detecting when a log source goes down in Wazuh and how you can respond to it.

It’s a neat lab to me because I haven’t seen a lot of Wazuh content in the mix of Splunk, ELK and Sigma. The basic premise is that you write a Python script to read NDJSON files in the local lab environment, compare them to historical files from the log source, check whether the latest events fall outside an expected window, and generate an alert if they do. Once alerted, you can review the log source’s health and ensure that forwarders are working, from the agent through the network setup.
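The core of that zero-log check can be sketched in a few lines. This is not Kurniawan's actual script: the NDJSON field names and the per-source expected intervals are assumptions, but the logic is the same — track the newest event per source and alert when silence exceeds that source's expected reporting interval:

```python
import json

# Minimal sketch of a zero-log check (not the post's exact script):
# track the newest event per source and alert when silence exceeds
# that source's expected reporting interval.
EXPECTED_INTERVAL = {"firewall": 60, "winlog": 300}  # seconds, assumed

def silent_sources(ndjson_lines, now):
    last_seen = {}
    for line in ndjson_lines:
        ev = json.loads(line)
        src, ts = ev["source"], ev["timestamp"]
        last_seen[src] = max(last_seen.get(src, 0), ts)
    alerts = []
    for src, interval in EXPECTED_INTERVAL.items():
        age = now - last_seen.get(src, 0)
        if age > interval:
            alerts.append({"source": src, "silent_for_s": age,
                           "action": "check forwarder/agent health"})
    return alerts

lines = ['{"source": "firewall", "timestamp": 995}',
         '{"source": "winlog", "timestamp": 600}']
print(silent_sources(lines, now=1000))
```

The interesting design question is the per-source interval: a chatty firewall going quiet for five minutes is an outage, while a quiet audit source doing the same is normal, so a single global threshold doesn't work.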


🥊 Quick Hits

How I Built My First SIEM Detections by Garv Kamra

This is a neat “beginner” post on Kamra’s adventures in writing their first detection rules. It’s a lab environment where they set up a basic ELK stack and focused on creating correlation rules, which are definitely a challenge the first time you write rules!

Kamra wrote three correlation rules for credential stuffing, DNS tunneling and suspicious firewall usage, recording their assumptions, findings and lessons learned in each subsection. This is a great way for folks trying this type of lab environment for the first time to learn.
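To give a flavor of what one of these correlation rules looks like, here is a toy version of credential-stuffing logic (not Kamra's actual ELK rule; the event schema, window, and threshold are illustrative assumptions). The signature is one source IP failing logins against many distinct usernames in a short window, as opposed to brute force hammering a single account:

```python
from collections import defaultdict

# Toy credential-stuffing correlation (not Kamra's actual ELK rule):
# one source IP failing logins against many DISTINCT usernames in a
# short window, versus brute force against a single account.
def credential_stuffing(auth_events, window=600, min_users=5):
    """auth_events: (timestamp, src_ip, username, success) tuples."""
    failures = defaultdict(list)
    for ts, ip, user, success in auth_events:
        if not success:
            failures[ip].append((ts, user))
    alerts = []
    for ip, hits in failures.items():
        hits.sort()
        for ts, _ in hits:
            users = {u for t, u in hits if ts <= t <= ts + window}
            if len(users) >= min_users:
                alerts.append((ip, sorted(users)))
                break
    return alerts

# 6 distinct usernames failing from one IP trips the rule; the lone
# failure from 10.0.0.5 does not.
events = [(i, "198.51.100.9", f"user{i}", False) for i in range(6)]
events.append((10, "10.0.0.5", "admin", False))
print(credential_stuffing(events))
```

The distinct-username count is the key field: it is what separates stuffing (stolen credential lists sprayed across accounts) from a noisy but ordinary brute-force attempt.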


Hunting Ideas in AWS Part 1 by Jake Valesky

This is Valesky’s first blog post ever, so I was excited to see how they wrote about a topic near and dear to my heart: threat hunting in the cloud. The blog explores effective log strategies, the various types of access keys that interact with your environment, and how to monitor unauthorized changes. I’ve linked to and written about AWS threat detection in detail in this newsletter. Blogs like this demonstrate the complexity of the environment and how it should be treated as its own operating system.


☣️ Threat Landscape

Microsoft seizes 338 websites to disrupt rapidly growing ‘RaccoonO365’ phishing service by Steven Masada/Microsoft Digital Crimes Unit (DCU)

I’ve always found DCU posts where they’ve legally seized actor infrastructure fascinating. When I think of legal action against threat actors, my brain immediately goes to law enforcement actions. This is logical, considering the numerous takedowns and arrests of criminals worldwide. But a private company doing this is on a whole other level!

The DCU team seized approximately 338 websites associated with an emerging phishing-as-a-service kit, RaccoonO365. It’s a subscription model, meaning you pay monthly, and according to the tech giant, it has accrued nearly $100,000 in payments made directly to the creator. They’ve also allegedly identified the creator himself, a Nigerian man, and issued a “criminal referral”, which I’m guessing means they handed the doxxing info to the FBI or the Nigerian government.


Two teenagers charged over Transport for London cyber attack by Joe Tidy and Graham Fraser

UK police charged two teenagers who they believe were responsible for months of disruption to Transport for London’s information systems. There were some shutdowns of the transit vehicles themselves, but most of the damage was to their online services. According to the report, these two individuals were already in trouble for other cybersecurity-related crimes, with one having evidence on their machine of targeting U.S. healthcare companies. Smells like Scattered Spider to me!


SystemBC – Bringing the Noise by Black Lotus Labs

Detecting command-and-control (C2) servers and traffic is a pastime of mine. There’s something exciting about dissecting a piece of malware, analyzing its traffic, and then trying to identify where the bad guys host their C2 server so you can fingerprint it and gather additional threat intelligence as they move it around the Internet.

This becomes much more challenging when the C2 server you are communicating with isn’t the actual C2 server. According to Black Lotus Labs, proxy networks are becoming increasingly popular among malware families, and proxy malware such as SystemBC helps facilitate this type of activity. The basic idea here is that an infection can route its traffic to the C2 server via an infected proxy to help mask the origin IP address. The Black Lotus team uncovered an extensive network of SystemBC Linux-variant infections, attributed them to several grayware proxy providers, and blocked traffic to affected users.


Tech Note - BeaverTail variant distributed via malicious repositories and ClickFix lure by Oliver Smith

GitLab threat intelligence researcher Smith outlines some notable TTPs in DPRK’s BeaverTail malware. BeaverTail was first detected via Contagious Interview attacks, which relied on victims to clone a repository to perform a coding test and execute a backdoor embedded within the code.

According to Smith, they’ve pivoted towards ClickFix-style attacks, and my tinfoil hat says it’s because platforms like GitLab/GitHub are getting much better at squashing these malicious repositories. They outline the chain and how the campaign is smart enough to differentiate between operating systems, maximizing the number of victims who visit their ClickFix websites.


🔗 Open Source

Cyb3r-Monk/Microsoft-Vulnerable-Driver-Block-Lists

Cyb3r-Monk, a.k.a. Mehmet Ergene, whom I’ve featured extensively in this newsletter, has dropped a great resource for Windows Security administrators to help block vulnerable drivers. Microsoft apparently removed the webpage that lists these drivers and instead publishes a ZIP file with some XML data, making it more challenging for users to ingest and implement controls. Ergene automates this process with this repository and exposes some great metadata, allowing people to understand what they are loading into their Windows environments.


dis0rder0x00/obex

Yet another anti-EDR repository for tampering with security tools that you don’t want turned on :). The methodology here is that obex spawns itself as a process, and it attaches a debugger. The debugger hooks the LdrLoadDll function, patches a specific pointer, and uses it to catch DLLs being loaded and subsequently block them.


rotemreiss/malifiscan

With the number of open-source supply chain attacks occurring over the last few weeks, it’s encouraging to see open-source projects like this one come into play to help individuals understand the impact of malicious packages in their environment.


firezone/firezone

WireGuard-based, zero-trust networking project that allows users to deploy gateways and a management server for secure access. I love WireGuard, so I was already excited to read about this. The control plane/management server is written in Elixir, and all the gateways run in Rust, so you can expect this to be performant. They went the extra mile and created client applications for iOS, Android, Linux, Windows, and macOS.



DEW #129 - Malicious browser extensions, npm gets pwned (again) and AI weaponizing CVEs

Welcome to Issue #129 of Detection Engineering Weekly!

  • I’m in NYC this week, and I underpacked, so I walked over to Hudson Yards to grab some T-shirts. I picked out two from Uniqlo, and when I got to self-checkout, I looked like a confused tourist, since you just “drop” your shirts into a bucket and it automatically finds the shirt and price. It was black magic

  • It’s been a busy few weeks at work with all of these supply chain-style attacks, and I’m sure a lot of you have been as well. But, I am continuously underwhelmed that these elegant package takeovers result in cryptominers and wallet stealers. If anyone wants to turn heel with me and go on a villain arc, the first thing I’d recommend is to stay away from cryptominers

  • I’ve begun to make small structural changes to the newsletter issues. I am removing italics, I changed how my From field looks on emails, and titles have a more descriptive sneak peek into the content. Don’t worry, I’m still keeping the snark and xkcd-style commentary throughout the issue, but this has already helped boost my open rates and engagement

📣📰🌐 Interested in sponsoring the newsletter and placing your ad right here?

I’m happy to see the engagement of folks reaching out to sponsor the newsletter. I have slots filling up for the rest of the year, so if you want to run an ad and get eyeballs and clicks from practitioners, CISOs and everything in between, shoot me an e-mail and let’s chat.

Sponsor Detection Engineering Weekly




💎 Detection Engineering Gem 💎

Even if many plugins are fine, the bad ones are BAD by John Tuckner

As readers have seen in last week's issue, supply chain security affects the entire open-source ecosystem, which includes numerous registry-style marketplaces. For this post, a friend of the newsletter and security researcher John Tuckner shares a more in-depth look at how browsers manage the supply chain of extensions, and how we have a long way to go before we have complete visibility and detection opportunities on malicious extensions.

Unlike npm or pip, all modern browsers employ a sandbox with various security features. Memory protection, file system restrictions, and process isolation are just a few of them, making it tough for exploit developers to break out. But browsers also need to be extensible, and this is where extensions come in.

Extensions have marketplaces, which browsers either officially own or leave to third-party registries. Anyone can write and publish an extension, and while some marketplaces have more stringent requirements, according to Tuckner, you can side-load them just like a mobile app. This opens up significant risk: if you aren't careful, you can install an overly permissive extension that reads and writes to your computer in ways you never intended.

These guardrails do help prevent a full breakout from an extension to the operating system, but they don't stop someone from willingly installing a malicious one. Expecting an end-user to read and understand permission models and assess maliciousness across every registry, whether for browsers, open-source software, or IDEs, is unrealistic.


🔬 State of the Art

Can AI weaponize new CVEs in under 15 minutes? by Efi Weiss and Nahman Khayet

If you've ever wanted to see how to solve a security use case with agentic systems, this is an excellent post by Weiss and Khayet on how to build and deploy one. They started with a pain point that we all suffer from: given a CVE, how fast can a researcher create and publish a proof-of-concept (PoC) exploit, and should we patch if that code makes it to the wild? It's a pain when someone releases a PoC, but I think that it helps create detection opportunities to validate impact. So, whether it comes from a researcher or their LLM agent, I'm happy to take in more data. Here's their workflow:

They focused their research solely on open-source packages. They used a combination of NIST and GHSA, and this type of structured data, alongside the patch diff, is an excellent source of data to feed into an agentic system to generate the PoC. They encountered some issues along the way, such as using a single, generalized agent for the full PoC lifecycle instead of multiple specialist agents. The other part I found pretty funny was when their agent was "refining" the PoC; the LLM focused on making the code work rather than ensuring it was vulnerable.

If I had to suggest more research into this area, I'd love to see folks take the PoC environments from their GitHub and instrument them to create the correct logs to generate detection rules. The time from CVE publication to PoC to detection rule coverage would be lightning fast and help at least some of us sleep better at night.


Automation for Threat Detection Quality Assurance by Blake Hensley

So many people ask, "How many rules do you have?" and never ask, "How are your rules doing?" Jokes aside, detection quality is a topic that is near and dear to my heart, and it's not talked about enough. Hensley is breaking that barrier, and what I love about this post is that he argues that detection quality isn't always about rule formats, linting, and emulating in CI/CD pipelines. It's also about ensuring your rules perform as intended in your live environment through experiments.

Hensley structures this post with several examples of detection quality assurance tests. You have unit tests and purple teaming emulation within the mix, but there are some really unique tests here I've never considered before. For example, Hensley's "results diff backtest" compares the output of a previous rule version's results with those of the new version. All super clever, and I'm adding this to my detection backlog.
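The results diff backtest is easy to prototype. Here's a minimal sketch (my own illustration, not Hensley's kql-tester logic), assuming each result row from your SIEM carries a stable event ID:

```python
# Sketch of a "results diff backtest": run the old and new versions of a
# detection rule over the same historical window, then diff the alert sets.
# The old/new inputs below are stand-ins for whatever your SIEM query returns.

def diff_backtest(old_results, new_results, key="event_id"):
    """Compare two rule versions' result sets by a stable event key."""
    old_keys = {r[key] for r in old_results}
    new_keys = {r[key] for r in new_results}
    return {
        "dropped": sorted(old_keys - new_keys),   # alerts the new rule loses
        "added": sorted(new_keys - old_keys),     # alerts the new rule gains
        "unchanged": len(old_keys & new_keys),
    }

old = [{"event_id": "e1"}, {"event_id": "e2"}, {"event_id": "e3"}]
new = [{"event_id": "e2"}, {"event_id": "e3"}, {"event_id": "e4"}]
print(diff_backtest(old, new))
# {'dropped': ['e1'], 'added': ['e4'], 'unchanged': 2}
```

A large "dropped" set on a rule tightening is exactly the kind of regression this test catches before it ships.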


The Present and Future of Managed Detection and Response by Migjen Hakaj and Amine Besson

This post offers a thorough overview of the Managed Detection & Response (MDR) market, aiming to address and answer the question, "What's next?" for companies in this space. MDRs evolved from the MSSP space, where most of their value proposition revolved around being "alert forwarders", as Besson and Hakaj put it. The market MDR providers discovered involved taking sprawling detection toolsets, unifying them, providing expert analysis, and then presenting the key information that matters to customers.

You can probably see why I liked this post, starting at the section …And what is my MDR doing to improve it? The differentiator, according to Hakaj and Besson, is detection engineering. Scaling detections as more data sources are added to a technology stack becomes the moat. Adding AI on top of this will disrupt those scaling efforts, for better or worse. This is especially interesting if MDRs hire expert analysts while their customers hire security generalists or engineers.

So, if you do partner with an MDR provider, press them on detection coverage and adding sources, while proving to you that they can be nimble in both areas.


Building a Detection Lab That Fits in Your Laptop by Joseph Gitonga

This is an excellent home lab tutorial for folks who want to get started with detection engineering and threat hunting. Gitonga targets individuals who want to break into security operations without incurring the expense of cloud-based lab environments. The main cost is a lab machine, but you can find beefy servers on eBay or build one from parts to meet the minimum requirements here. By the end of the lab, you'll have a Splunk SIEM, an Active Directory environment, and a separate Splunk response and automation server.


The Cost of a Wrong Word in Threat Intelligence by Rishika Desai

I recently had a sit-down with a senior leader in my company to discuss how I can assist them with security challenges. This person has been with my company for years, both before and after the IPO, and now leads a massive engineering organization. Whenever I meet with folks I don't work with often, my number one rule is to ask lots of questions. So I asked: Where has security given you the most pain? His answer boiled down to threat intelligence in one request: he wanted the risk context for what he was building, so he could build it securely.

Risk context is a much better name for threat intelligence. It's supposed to inform; whether you are a detection engineer building rules or a CISO looking at the latest threats, you can choose to use the information or not. Desai examines this concept, but from the perspective that the information may be incorrect. A missing word, clobbered threat actor names, or overly confident language can make or break a threat intelligence report.

I love this blog because it highlights the importance of risk context, and how that context can be screwed up if you don't communicate effectively in your writing. The same applies to detection rules, whether in the documentation or your response playbooks. Accuracy and clarity matter, and communicating what you don't know matters as much as what you do.


☣️ Threat Landscape

Geedge & MESA Leak: Analyzing the Great Firewall’s Largest Document Leak by Mingshi Wu

The big news this week from the People's Republic of China (PRC) is one of the largest document leaks related to the Great Firewall of China. I love leaks like this for several reasons. One, it gives insight into a culture that is vastly different from what we are used to in the West. Two, we get to see the technical implementation and architecture of a serious Internet censorship apparatus. If you set aside the ethics, it's an impressive feat for a country with one of the world's largest populations.

Wu is updating this page with new findings in real time. They also link net4people's GitHub issue tracker as researchers comb through the nearly 500 GB of data from GitLab, Confluence, and JIRA, so expect numerous findings in the coming weeks.


Inboxfuscation: Because Rules Are Meant to Be Broken by Andi Ahmeti

Permiso researcher Andi Ahmeti releases a Microsoft Exchange Inbox malicious rule creator and analyzer based on their research into threat actors abusing Inbox rules. Whenever I meet people in my day job who worry about advanced attackers using advanced techniques, I try to ground them back in situations like Ahmeti describes in this post. You may be a victim of a highly sophisticated adversary, but they would prefer to use tradecraft that is simple before resorting to their Rolodex of advanced attacks.

The most interesting aspect of this research is the exploration of Unicode obfuscation techniques and their associated detection opportunities. It reminds me a lot of malicious domain research, where using different character sets can confuse a rendering application (such as email) and not display the Punycode representation, thereby confusing the victim into thinking they received a legitimate domain to click on.
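The detection counterpart to Unicode obfuscation is normalization before matching. This is a generic sketch of the idea, not Permiso's actual analyzer logic; the keyword watchlist is invented for illustration:

```python
import unicodedata

SUSPICIOUS_KEYWORDS = {"invoice", "password", "payment"}  # example watchlist

def normalize(text):
    """Fold common Unicode tricks back toward plain ASCII before matching."""
    # NFKC maps compatibility characters (e.g. fullwidth 'p') to plain forms
    folded = unicodedata.normalize("NFKC", text)
    # strip zero-width and other invisible format characters (category Cf)
    return "".join(c for c in folded if unicodedata.category(c) != "Cf").casefold()

def is_suspicious(rule_name):
    """Flag an inbox rule name that hides a watchlist keyword behind Unicode."""
    norm = normalize(rule_name)
    return any(k in norm for k in SUSPICIOUS_KEYWORDS)

# Fullwidth letters plus a zero-width space evade naive substring matching
obfuscated = "ｐａｓｓ\u200bｗｏｒｄ reset"
print(is_suspicious(obfuscated))  # True
```

Naive keyword regexes miss the obfuscated string entirely; normalizing first recovers the match.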


Uncloaking VoidProxy: a Novel and Evasive Phishing-as-a-Service Framework | Okta Security by Houssem Eddine Bordjiba

The Okta Security Research team uncovered a new phishing-as-a-service kit dubbed VoidProxy. The campaign they uncovered started with phishing lures sent from compromised email addresses on marketing platforms. The level of indirection in this particular kit is impressive. In particular, the operators put a lot of time and energy into catching security scanners and researchers, abusing Cloudflare's Turnstile CAPTCHA and edge Workers infrastructure to funnel potential victims into the phishing attack itself. It uses a standard Attacker-in-the-Middle workflow to capture non-phishing-resistant authentication codes.

Bordjiba gained access to the panel itself, providing a unique inside look at how these kits are constructed, distributed, and managed by threat actors.


Ongoing Supply Chain Attack Targets CrowdStrike npm Packages by Kush Pandya and Peter van der Zee

The news of the npm supply chain attack from last week continues, this time targeting CrowdStrike's npm package library. The Dune-themed attack has a rather unique attack chain. Once you install a backdoored package, it downloads Trufflehog and extracts secrets from your local machine or CI/CD environment. It also backdoors GitHub Actions workflows with a file named shai-hulud-workflow.yml. The peculiar part here is that the attacker publishes a public repository named Shai-Hulud, which can be searched for across GitHub. Still, no one has (as of this post) figured out what these repositories do.
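Since the backdoored workflow has a known filename, one cheap hunt is to sweep your checked-out repositories for it. A minimal sketch; the spelling of the filename varies across write-ups, so this matches both variants:

```python
import os
import pathlib
import tempfile

# Filenames reported for the backdoored GitHub Actions workflow; spellings
# vary across write-ups of this incident, so match both.
IOC_FILENAMES = {"shai-hulud-workflow.yml", "shai-halud-workflow.yml"}

def hunt_workflows(repo_root):
    """Walk a checked-out repo and flag known-bad workflow filenames."""
    hits = []
    for dirpath, _dirs, files in os.walk(repo_root):
        hits += [os.path.join(dirpath, f) for f in files if f.lower() in IOC_FILENAMES]
    return hits

# Demo against a throwaway repo layout
repo = pathlib.Path(tempfile.mkdtemp())
wf = repo / ".github" / "workflows"
wf.mkdir(parents=True)
(wf / "shai-hulud-workflow.yml").write_text("jobs: {}")
(wf / "ci.yml").write_text("jobs: {}")
print(hunt_workflows(repo))  # flags only the IOC filename
```

The same sweep works across an entire GitHub org if you point it at a directory of clones.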


🔗 Open Source

Permiso-io-tools/Inboxfuscation

Powershell-based Microsoft Exchange rule exploitation toolkit that I linked above under Threat Landscape. It does some really neat stuff with Unicode manipulation to defeat traditional regular expression and word-based detections.


thekibiru03/splunk-ad-lab

Splunk and Active Directory repository referenced in the lab post above by Joseph Gitonga. Has a ton of out-of-the-box logging capabilities with Sysmon, PowerShell logging, and Audit policies. Also ships with some useful threat emulation tools, including Atomic Red Team, Disabled Defender, and fake users.


BlakeHensleyy/kql-tester

KQL testing repo from Blake Hensley's detection quality assurance piece under State of the Art. You provide it with a rule and your Azure credentials, and it'll perform a series of checks referenced in the blog to verify the efficacy against live data. You can adjust the query parameters or testing logic for your own detection risk tolerance.


magisterquis/sneaky_remap

sneaky_remap is a Linux defense evasion technique that hides shared object files from detection. Injected shared objects can serve persistence, privilege escalation, and other operations, and defenders normally spot oddities like LD_PRELOAD by inspecting a process's file-backed memory mappings. The theory section here explains the algorithm itself, which takes file-backed memory mappings, preserves their permissions, and then moves them to anonymous memory to evade that kind of inspection.
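The flip side for defenders: code that has been moved to anonymous memory still shows up in /proc/&lt;pid&gt;/maps as an executable mapping with no backing file. Here's a hedged sketch of that general check (it catches the class of executable anonymous regions, not sneaky_remap specifically, and JIT runtimes will produce benign hits):

```python
def exec_anon_mappings(maps_text):
    """Flag executable mappings with no backing file (candidate injected code).

    maps_text is the content of /proc/<pid>/maps, where each line looks like:
    address           perms offset  dev   inode   pathname
    """
    hits = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        addr, perms, _offset, _dev, inode = fields[:5]
        path = fields[5] if len(fields) == 6 else ""
        # executable + anonymous (inode 0, no file path) is worth a look
        if "x" in perms and inode == "0" and not path.startswith("/"):
            hits.append((addr, perms, path))
    return hits

sample = """\
7f10c0000000-7f10c0021000 r-xp 00000000 08:01 131090 /usr/lib/libc.so.6
7f10c1000000-7f10c1010000 rwxp 00000000 00:00 0
7f10c2000000-7f10c2001000 r--p 00000000 08:01 131091 /usr/lib/ld.so
"""
print(exec_anon_mappings(sample))  # the rwxp anonymous region is flagged
```

Running this against live /proc on a fleet (and allowlisting known JITs) turns the technique's own footprint into a hunt.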


lwthiker/curl-impersonate

This tool uses the cURL library to impersonate modern browsers by replicating their TLS and HTTP handshakes. This differs from basic techniques like setting a User-Agent header because it changes what actually goes over the wire during the TLS and HTTP handshakes, which is library- and application-specific.
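The naive detection this tool defeats is worth seeing: compare the claimed User-Agent against the observed TLS fingerprint (e.g. a JA3 hash). Plain curl fails this check; curl-impersonate passes it, which is the whole point. A sketch with made-up hash values:

```python
# Naive UA-vs-TLS consistency check. The "known Chrome" JA3 hash below is a
# made-up placeholder; in practice you'd baseline hashes from real traffic.
KNOWN_CHROME_JA3 = {"cd08e31494f9531f560d64c695473da9"}  # example value only

def ua_tls_mismatch(log):
    """True when a client claims Chrome but its TLS fingerprint doesn't match."""
    claims_chrome = "Chrome" in log.get("user_agent", "")
    return claims_chrome and log.get("ja3") not in KNOWN_CHROME_JA3

log = {"user_agent": "Mozilla/5.0 ... Chrome/126.0", "ja3": "deadbeef"}
print(ua_tls_mismatch(log))  # True: claims Chrome, unknown fingerprint
```

Tools like curl-impersonate force defenders past this check toward richer behavioral signals.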



DEW #128 - AI Detection Engineering Uncertainty, 3D Threat Hunting and Salesloft Drift Shenanigans

Welcome to Issue #128 of Detection Engineering Weekly!

✍️ Musings from the life of Zack in the last week


  • I’ve had to cancel work travel to Datadog’s Paris Office due to an Air Traffic Controller strike

  • I’m very close to canceling ChatGPT and using Claude exclusively. It’s an excellent threat research and engineering co-pilot

  • You’ll read more about this below, but I’m getting more freaked out about developers as targets by threat actors. Attacks against dev tooling ecosystems and macOS are increasing, and both of these target sets are a mainstay for many firms’ operations

⏪ Check out last week’s issue if you’ve missed it!

This Week’s Sponsor: Material Security

Fortify Your Google Workspace, from Gmail to Drive.

Protect the email, files, and accounts within Google Workspace from every angle. Material Security unifies advanced threat detection, data loss prevention, and rapid response within a single automated platform so your lean team can do more with less. Deploy in minutes, integrate with your SIEM, and let “set-it-and-forget-it” automation run 24/7.

Gain enterprise-grade security without enterprise overhead.

Simplify Your Google Workspace Security


💎 Detection Engineering Gem 💎

The experience of the analyst in an AI-powered present by Julien Vehent

When I think of detection engineers, I think of three disciplines: software engineering, security analysis, and statistics. I wrote about the first two of these disciplines in Field Manual #1, and started touching on statistics in Field Manual #3. The Detection Engineering Mix is born of evolution and the necessity of a business. With limited human capacity paired with rapidly developing technology, you’ll need to adjust these three knobs in different configurations to prevent ticket queueing and alert fatigue.

What happens when you add a fourth circle to the Detection Engineering Mix? It would seem overwhelming and infeasible because the expectations of software engineering, security expertise, and statistics set the bar reasonably high. This is where the AI element comes into play. If you had told me it was a necessity two years ago, I would have told you to kick rocks. Now I’m not so sure.

In this post, Vehent offers his commentary on the uncertainty surrounding the integration of AI engineering and automation into our collective approach to threat detection. His issue with the mix, AI or not, is that new security engineers overemphasize aspects other than security expertise. This makes sense in some ways because software engineering is how we are supposed to “scale”, and it’s a hard requirement for security positions in more modern organizations.

He argues that AI will move us further away from the security expertise circle above, and we’ll lose the analytical rigor along with it. Detection and response teams should have their work cut out for them to triage and investigate, as it helps them experience the pain that software engineering, data science, and now AI can help solve. If they don’t feel the pain, how can we know when a security problem is solved?


🔬 State of the Art

Can't Hide in 3D by Certis Foster

Time-Terrain-Behavior is a threat detection modeling framework developed by MITRE that examines how three types of metadata from security telemetry can help pinpoint compromises. It’s a neat paper, so if you have time, get a cup of tea/coffee/drink and some other hygge items and check it out. These dimensions combine three alerting capabilities that we typically use in an atomic sense. Time baselines when the machine is active, Terrain covers the security tooling that generates observations from the machine, and Behavior baselines benign workstation activity.

Foster’s approach here is to apply the concept of TTB to a real dataset and assess its performance. They took Splunk’s Boss of the SOC (BOTS) Frothly dataset, loaded it into a SIEM, and laid out a blueprint to implement and find compromises using TTB. They broke each TTB component into three labels: Time {Morning, Evening, Weekend}, Terrain {Windows, Network, Cloud}, and Behavior {Authentication, Execution, Access}, all as distinct counts. Each label was categorized even further, and they plotted the workstations using some incredible SPL queries.

This identified one station as an outlier, and Amber was the one that was compromised! No detection rules, just some clever labeling, intuition after reading the TTB paper, and security expertise. I’m a big fan of this type of vectorized approach to threat detection, though I see a few problems that Foster also addresses:

  • They can be cost-prohibitive if you have 1000s to 10000s of assets to protect

  • They lose specificity when an actor does something during core hours, so you’ll see a larger distance on sensor hits, but it may underfit on other components like time

  • Sensor hits here can also mess up specificity if your tooling is inadequate

I do love this mathy approach!
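The vectorized approach boils down to: represent each host as a vector of distinct-label counts, then flag hosts far from the fleet centroid. A toy version of that idea (my own, not Foster's SPL; the counts are invented):

```python
import math

# Toy Time-Terrain-Behavior outlier check: each host becomes a vector of
# distinct-label counts, and hosts far from the fleet centroid get flagged.
hosts = {
    "amber": [2, 9, 7],   # [distinct time buckets, terrain sources, behaviors]
    "billy": [2, 3, 3],
    "chen":  [1, 3, 2],
    "dawn":  [2, 2, 3],
}

def centroid(vectors):
    """Per-dimension mean across all host vectors."""
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

def outliers(hosts, threshold=4.0):
    """Hosts whose Euclidean distance from the centroid exceeds the threshold."""
    c = centroid(list(hosts.values()))
    return [h for h, v in hosts.items() if math.dist(v, c) > threshold]

print(outliers(hosts))  # ['amber']
```

Even this crude distance check surfaces the compromised workstation; the BOTS exercise does the same thing with richer labels and SPL.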


AWS CloudTrail Event Cheatsheet: A Detection Engineer’s Guide to Critical API Calls — Part 1 by Muh. Fani (Rama) Akbar

This piece contains a comprehensive compendium (alliteration, woo!) of security- and detection-relevant AWS API calls for folks who want to learn more about AWS detection engineering, and the most critical, cost-effective place to start is AWS CloudTrail. CloudTrail is the control-plane log source for AWS, where control-plane events are administrative actions: administrator activity, role creation, and secret and key creation are all examples.

So, Akbar took this a step further and mapped out as many security-relevant CloudTrail logs as possible across the MITRE ATT&CK chain. Each MITRE section contains the key events, attacker activities using the AWS CLI, and the relevant SQL detection rule. The reason I love these types of blogs is not just for the educational content, but for the provable elements of the detection. You can run the attack commands on the red team side and observe how they log on the blue team side.
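The shape of such a detection is simple: match CloudTrail records against a watchlist of event names with their ATT&CK context. A minimal sketch using real CloudTrail field and event names, though the three watchlist entries are my examples, not Akbar's full mapping:

```python
# Match CloudTrail control-plane records against an eventName watchlist.
# Field names follow the CloudTrail record schema; the watchlist entries
# are illustrative examples, not the article's complete mapping.
WATCHLIST = {
    "CreateAccessKey": "TA0003 Persistence",
    "PutUserPolicy": "TA0004 Privilege Escalation",
    "DeleteTrail": "TA0005 Defense Evasion",
}

def triage(events):
    """Return alerts for watchlisted CloudTrail events, tagged with tactic."""
    alerts = []
    for e in events:
        name = e.get("eventName")
        if name in WATCHLIST:
            alerts.append({
                "tactic": WATCHLIST[name],
                "eventName": name,
                "actor": e.get("userIdentity", {}).get("arn"),
            })
    return alerts

events = [
    {"eventName": "DescribeInstances",
     "userIdentity": {"arn": "arn:aws:iam::111:user/dev"}},
    {"eventName": "DeleteTrail",
     "userIdentity": {"arn": "arn:aws:iam::111:user/dev"}},
]
print(triage(events))  # one alert: DeleteTrail, tagged Defense Evasion
```

Run the matching AWS CLI command from the red team side and you should see the corresponding record land here, which is the provable loop the post advocates.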


Leveraging Raw Disk Reads to Bypass EDR by Christopher Ellis, Andrew Steinberg and Austin Munsch

This piece by the offensive security team at Workday shows how to leverage raw disk reads against the hard drive to bypass the controls and monitoring of a modern EDR. The most interesting aspect of this research is that it goes beyond the Windows Operating System (OS) APIs and into building your own driver to interact with physical devices. Windows contains several layers of APIs in user-land and kernel space. Some of these layers exist for compatibility, which Microsoft obsesses over so that you can ship code that interoperates with their ecosystem.

These APIs are what EDRs monitor. Function hooking via Kernel callbacks, ETW monitoring, and system filter drivers provides EDR vendors with several ways to detect malicious activity in a process, files, or a user session. But if you can connect to a driver directly, you can avoid these mechanisms and, in this case, read data from sensitive files.

Ellis, Steinberg, and Munsch did just that. With a BYOVD or a user permitted to access the driver, their PoC performs the raw disk read, parsing and searching for something like the Windows SAM file, a fileless read without triggering an EDR. Here's a mermaid diagram I created with Claude to visualize the attack flow.


Software Development Nuggets for Security Analysts Part 2: The Browser by David Burkett

This is a continuation of David's Part 1, released almost three years ago. I love the framing of trying to understand a security concept by first understanding why a software developer, or in this case a critical piece of software like a web browser, would implement their code in a particular manner. Burkett studies why browsers are an attractive target for threat actors, but to understand why they're appealing, he first breaks down the threat model of their architecture.

The threat model revolves around how a browser stores and protects sensitive data, such as cookies, session tokens, and other artifacts like extension data. These have to be stored somewhere, so he uses Velociraptor, a forensics tool, to analyze the directories and file names where a threat actor may attempt to read and exfiltrate that data.
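You can do a poor man's version of the same survey without Velociraptor: enumerate the artifact paths and see which exist. A sketch using a few commonly cited Chromium locations (Windows profile layout shown; treat the paths as illustrative and verify for your OS and browser version):

```python
import pathlib
import tempfile

def chrome_artifacts(profile):
    """Commonly cited Chromium artifact paths under a profile directory.

    Illustrative only: layouts differ by OS, browser, and version.
    """
    profile = pathlib.Path(profile)
    return {
        "cookies": profile / "Network" / "Cookies",
        "login_data": profile / "Login Data",
        "extensions": profile / "Extensions",
    }

def present_artifacts(artifacts):
    """Return the subset of artifact paths that actually exist on disk."""
    return {name: p for name, p in artifacts.items() if p.exists()}

# Demo against a throwaway profile directory containing only a cookie store
demo_profile = pathlib.Path(tempfile.mkdtemp())
(demo_profile / "Network").mkdir()
(demo_profile / "Network" / "Cookies").write_text("")
found = present_artifacts(chrome_artifacts(demo_profile))
print(sorted(found))  # ['cookies']
```

The detection angle follows directly: any non-browser process reading those paths is a strong infostealer signal.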


☣️ Threat Landscape

When interesting threat landscape news emerges that warrants a deeper discussion, I typically post it under Threat Landscape. Still, I minimize the amount of analysis I provide due to space constraints. I’m going to try something a little different here and go deep on stories from time to time.

⚡ Emerging Spotlight: Salesloft Breach and the Victimology of Developer Tooling and Environments

Update on Mandiant Drift and Salesloft Application Investigations by Salesloft

I, like many other detection and threat intel folks, have availability bias. We base our analysis of news stories and incident details on the available information, and in detection-speak, that means what falls within our expertise and the sources we typically build detections around. Without news stories or internal incident data, it's challenging to determine whether something is worth defending.

I posted the Drift compromise details in my last issue, but what I’ve linked here is their investigative notes on the breach timeline. I said this on my LinkedIn post, but it reads like a modern red team report. The initial access is unknown, but the actor gained access to Drift’s GitHub organization, performed reconnaissance, and established persistence. They exfiltrated their codebase and then pivoted to their AWS environment, where they extracted OAuth secrets from their customers with the Drift integration. This is where the threat actor began accessing Drift customers and extracting additional data, including Salesforce information, extra secrets, and customer data.

My availability bias here is that this feels like the first time a compromise chain has pivoted across GitHub, AWS, and a SaaS application in quite this way. I'm amazed and terrified because it showcases the new victimology that threat actors target more and more: developers and their tools.

Below is the breach diagram I made with Claude and posted on social media.


⚡ Quick Hits

NPM debug and chalk packages compromised (HackerNews post)

A single NPM maintainer had their NPM account compromised, resulting in the backdooring of 18 packages with over 2 billion weekly downloads. This is the HackerNews post (his reply is the first one), and it reaffirms my point in the above spotlight post: Threat actors are targeting Developers and their tools.
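When an incident like this drops, the first practical question is "am I running the bad versions?" A sketch that checks an npm v2/v3 package-lock.json against package@version pairs; the two entries below match versions widely reported as compromised in this incident, but verify against a current advisory before relying on them:

```python
import json

# package@version pairs reported as compromised; confirm against a
# current advisory before using this list operationally.
BAD = {("debug", "4.4.2"), ("chalk", "5.6.1")}

def compromised_deps(lock_text):
    """Return any compromised package@version pairs found in a lockfile."""
    lock = json.loads(lock_text)
    hits = []
    # npm v2/v3 lockfiles keep a flat "packages" map keyed by node_modules path
    for path, meta in lock.get("packages", {}).items():
        name = path.rsplit("node_modules/", 1)[-1]
        if (name, meta.get("version")) in BAD:
            hits.append(f"{name}@{meta['version']}")
    return sorted(hits)

lock = json.dumps({"packages": {
    "node_modules/chalk": {"version": "5.6.1"},
    "node_modules/debug": {"version": "4.4.0"},
}})
print(compromised_deps(lock))  # ['chalk@5.6.1']
```

Sweeping every lockfile in your org's repos with this gives a quick exposure inventory while the advisory is still evolving.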


Contagious Interview | North Korean Threat Actors Reveal Plans and Ops by Abusing Cyber Intel Platforms by Aleksandar Milenkoski, Sreekar Madabushi (Validin) and Kenneth Kinion (Validin)

SentinelOne and researchers from Validin identified a cluster of DPRK actors leveraging Validin to hunt for their own command and control (C2) infrastructure in Internet-wide scan data. The amusing part: as the actors set up monitoring and queried these platforms to check on their infrastructure, they handed Validin and SentinelOne the very queries needed to locate it.


s1ngularity's Aftermath: AI, TTPs, and Impact in the Nx Supply Chain Attack by Rami McCarthy

Something is in the air regarding open-source software supply chain attacks. McCarthy does an excellent job summarizing the s1ngularity security incident, which involved the theft of an npm publishing token via a malicious pull request on GitHub. Users who downloaded the latest version had their secrets stolen, many had their private repositories made public on GitHub, and the attacker used a peculiar technique of employing AI to identify interesting files.


🔗 Open Source

andrewkolagit/DetectPack-Forge

Neat full-stack application that uses Gemini and n8n to turn natural language into Detection rules. You’ll see many companies offering this type of service, but it’s impressive to see something implemented and shared on GitHub.


almounah/orsted

Yet another educational post-exploitation and C2 framework. Their docs site looks promising in explaining how it works, so it'd be a good one to add to detection backlogs to check for telemetry.


DeepSpaceHarbor/Awesome-AI-Security

awesome-* list this time for AI Security resources. This list is much more academically focused, but you’ll find a few blogs and code examples in there too.


tclahr/uac

Unix-like Artifact Collector (uac) is a forensic tool designed to collect breach artifacts across various flavors of Unix and Linux. It’s modular, allowing you to add artifact collections via configurable YAML files. There's no install step: the core functionality is one big ol’ shell script, which means it can run on everything from NAS boxes to IoT devices.
