
Received today — 12 March 2026 Detection Engineering Weekly

DEW #148 - Detection Pipeline Maturity, GenUI for Log Analysis and Hunting Kali in Splunk

11 March 2026 at 13:03

Welcome to Issue #148 of Detection Engineering Weekly!

✍️ Musings from the life of Zack:

  • I have some exciting news! In about a week, you’ll see some new branding for Detection Engineering Weekly. This will be the second brand uplift of the newsletter, and I can’t wait to don the new colors and logo. It’s more professional and understated, and it captures much of the energy of what I think this newsletter brings to your inboxes. I’ll be handing out stickers and potentially some t-shirts at BSidesSF in a few weeks!

  • Speaking of BSidesSF, I’m interested in how many of you are going to be there. I am organizing a happy hour and doing a sticker order, so please vote Yes here, ping me, or honestly just find me in the hallway (I’ll be shilling the newsletter with t-shirts) and say hello!

Sponsor: Spectrum Security

Detection is Broken.

Measuring coverage means wrangling spreadsheets, BAS tools, and weeks of manual work. By the time you finish, the data is out of date.

But finding blind spots is only half the battle. There’s never enough time to close them. You’re on an endless treadmill: writing new rules, fixing broken ones, and tuning out noise.

We built the end of the manual grind.

Get an early look at the AI platform transforming how teams identify, build, & deploy detections.

Try It Now


Every week, I read, watch and listen to all the Detection Engineering content so you can consume it all in 10 minutes. Subscribe and get a weekly digest of the latest and greatest in threat detection engineering!

💎 Detection Engineering Gem 💎

Detection Pipeline Maturity Model by Scott Plastine

I’m a huge fan of maturity models, and in the early days of my writing, I frequently referenced the work of Haider Dost and Kyle Bailey when discussing the maturity of detection engineering programs. As this space matured, technology matured with it, and we now have complex systems within each part of the Detection Engineering Lifecycle. So, to me, it makes sense that we now have folks like Plastine helping us understand what it means to measure the maturity of a Detection Pipeline.

Plastine outlines six different levels of maturity, starting with a classic favorite: no maturity! This involves having a security tool stack with no centralization, where analysts keep dozens to hundreds of Google Chrome tabs open, which gives me anxiety. The fundamental issues Plastine outlines here, and which later maturity levels progressively address, include:

  • Several security tools with their own alerting and detection systems

  • The need to log into and investigate each alert on each individual tool, leading to screen sprawl

  • The analyst manually building cases in some case management or ticketing tool, such as JIRA or ServiceNow

The next maturity step, Basic, addresses some of these issues by essentially placing the Case Management tool between the tools and the analyst, rather than being out of band. As maturity levels progress, so does the architecture of this setup. For example, the “Standard+” architecture has a much saner pipeline setup:

The cool part at this point in the maturity journey is switching from architecture improvements to more advanced concepts in the analytics platform. Custom telemetry, log normalization, and a risk-based alerting engine ideally surface only relevant alerts and reduce false positives. Teams begin to build composite rules, leveraging commercial detections alongside their own internal detection and risk alerting systems, and they all take advantage of learning from their data to inform their rule sets, not just their environment.

This diagram drove it home for me, and became my favorite:

As you progress through maturity, the trap teams fall into is believing that more rules is better. I think the measure of a Leading detection function is reducing rule count, thereby reducing the complexity of managing rule sprawl.

Plastine posits that this can be achieved by using data-science-based rules, risk-based detection, and leveraging as much entity-based correlation as possible.
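To make the entity-based correlation idea concrete, here is a minimal sketch of risk-based alerting: individually low-signal detections accumulate risk per entity, and one composite alert fires when a threshold is crossed. The event names, scores, and threshold are my own illustrations, not from Plastine's post.

```python
from collections import defaultdict

# Hypothetical risk events: (entity, rule_name, risk_score). Each
# observation is too weak to alert on alone; risk accumulates per entity.
events = [
    ("host-01", "rare_parent_process", 20),
    ("host-01", "new_admin_logon", 30),
    ("host-02", "rare_parent_process", 20),
    ("host-01", "outbound_to_new_asn", 40),
]

RISK_THRESHOLD = 80  # alert only when combined entity risk crosses this

def correlate(events, threshold=RISK_THRESHOLD):
    """Sum risk per entity and emit one composite alert per offender."""
    risk = defaultdict(int)
    contributing = defaultdict(list)
    for entity, rule, score in events:
        risk[entity] += score
        contributing[entity].append(rule)
    return {
        entity: {"score": total, "rules": contributing[entity]}
        for entity, total in risk.items()
        if total >= threshold
    }

alerts = correlate(events)
print(alerts)  # only host-01 (risk 90) crosses the threshold
```

The payoff is fewer, richer alerts: three low-confidence rules collapse into one high-confidence lead with its contributing evidence attached.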


🔬 State of the Art

Whose endpoint is this… kali?! by Alex Teixeira

I love reading Alex’s detection and hunting blogs because he always stuffs a ton of knowledge around query optimization and hunting. When you manage massive amounts of data in a SIEM, especially Splunk, you need to query it in a way that doesn’t cause a ton of load on the system. This is especially helpful when you are researching new detection rules.

In this post, Alex addresses query optimization and discovery for post-exploitation tools. I typically see a lot of teams worry, for good reason, about malware used in the beginning stages of a breach. Alex references loaders in this scenario: malware designed as an initial beachhead for infection, which is then upgraded to a more reliable malware tool. Cobalt Strike is a leading example, but there are hundreds at this point.

Post-exploitation tools are aptly named: they help threat actors navigate the MITRE ATT&CK chain toward a specific objective, such as data exfiltration or ransomware. Persistence, lateral movement, and privilege escalation are all built into these types of tools. So if you assume these exist, how do you catch them?

From Alex’s Prioritizing a Detection Backlog post https://detect.fyi/how-to-prioritize-a-detection-backlog-84a16d4cc7ae

His strategy is to “reduce the dataset” as you are hunting. Instead of performing blind searches over logs, you can first focus on terms within the index and the Windows sourcetype itself. So, he begins his hunt looking for the term kali in Windows Event Logs. This is because these tools can leak their internal hostnames, and finding kali in the hostname with some threat activity is a great hunting lead.

Through a combination of hostname detection and observing a network event with the same name, he narrows the dataset to a meaningful set of events to respond to an infection and write rules for afterward.
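This isn't Alex's actual Splunk query, but the reduce-then-correlate idea can be sketched in plain Python over made-up records (the field names are illustrative, not a real schema):

```python
# Made-up log records standing in for Windows Event Logs and network
# telemetry; the fields are illustrative, not a real schema.
windows_events = [
    {"host": "DESKTOP-A1", "event_id": 4624, "src_workstation": "kali"},
    {"host": "SRV-FILE01", "event_id": 4688, "src_workstation": "WS-042"},
]
network_events = [
    {"src_host": "kali", "dst_host": "DESKTOP-A1", "dst_port": 445},
    {"src_host": "WS-042", "dst_host": "SRV-FILE01", "dst_port": 443},
]

def hunt_leaked_hostname(win_events, net_events, term="kali"):
    """Step 1: reduce - keep only Windows events mentioning the term.
    Step 2: correlate - require a network event sourced from that name."""
    suspect = [e for e in win_events
               if term in e["src_workstation"].lower()]
    net_sources = {e["src_host"].lower() for e in net_events}
    return [e for e in suspect
            if e["src_workstation"].lower() in net_sources]

leads = hunt_leaked_hostname(windows_events, network_events)
print(leads)  # only the DESKTOP-A1 logon sourced from "kali" survives
```

The same shape applies in SPL: a cheap term filter first shrinks the dataset, and only then do you pay for the join against network telemetry.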


Tracking DPRK operator IPs over time by Kieran Miyamoto

Threat research is such a fun, dynamic field within security because it examines both the technical and human elements of threat actors. This post is Miyamoto's “Part 3” on tracking DPRK threat actors via OPSEC failures, and it’s brilliant in its simplicity. Basically, FAMOUS CHOLLIMA, which has Contagious Interview and some WageMole overlaps, uses email to maintain its personas, register accounts, and issue fake employment-scam communications. The technical elements of this are interesting because they try to deploy malware on victim machines or obtain legitimate jobs as fake IT workers.

The human element of this operation is that humans tend to optimize for reducing the time it takes to do their job as efficiently as possible. So, why would you go through a ton of work to get legitimate email inboxes like Gmail or Yahoo if you only need the email address to send scam messages or register an npm account to publish malware? Miyamoto found that this group had the same question, and answered it by using temporary email addresses.

The subsequent finding is that, as long as you know the email address, you can also view the inbox! Miyamoto started with malicious npm packages containing maintainer emails and began logging into DPRK-controlled temporary email accounts to glean additional intelligence, including source IP addresses and potential victim targets.


From GenAI to GenUI: Why Your AI CTI Agent Is Sh*T by Thomas Roccia

TIL there’s a concept called Generative UI, where agents decide how to render the UI in real time based on your queries. In this post, Roccia uses this concept to build out use cases for cyber threat intelligence analysis. The idea here is that visually representing threat intelligence can help a researcher understand the underlying data much better than blobs of text. Roccia argues that most CTI Agents focus on ingesting unstructured threat intelligence and producing large volumes of output tailored to your environment or prompt. This setup can be helpful to some, but adding a visual component to aid your understanding makes it more attractive.

Roccia outlines two GenUI styles: MCPUI and A2UI. Both focus on delivering a graphical representation of a prompt response. MCPUI returns dynamic elements from an MCP server in response to a prompt, but it’s mostly contained within a UI that the developer creates. A2UI takes it a step further by delivering the entire UI experience in a container, making the agent the arbiter of the experience.

Roccia’s A2UI implementation was more interesting to me from a detection standpoint because he built a log analyzer on top of a log stream. Each element is supposedly dynamic, and you can click into and investigate logs while letting the A2UI protocol do its thing and present data and experiences to you, all driven by an agent. Here’s a demo video from his blog:

Wild times!


How we built high speed threat hunting for email security by Hugh Oh

I love it when security product companies show how they’ve engineered their product. In this post, Oh reveals how Sublime Security designed its massive email-detection and threat-hunting architecture. Their platform is built on MQL, their domain-specific language for rule writing and alerting. When you think about email as a telemetry source, there are some inherent issues you have to worry about unlike other sources:

  • Unstructured body content, since, by design, it is human-generated and human-readable

  • In Internet standards, email is a pretty ancient concept, so additional designs and RFCs were layered on top of it for decades, which can introduce some sharp edges

  • Attachments, integrations and user-experience elements are a huge vector for abuse, so you need to be able to parse those

Parsing all of this at scale is both a security problem and an engineering problem.

https://sublime.security/blog/how-we-built-high-speed-threat-hunting-for-email-security/

The Sublime product parses incoming emails into EML format and stores metadata in fast storage and the full contents in blob storage. They split email selection into several phases. Candidate selection focuses on fast metadata lookups; evaluation performs a deeper analysis to determine whether these candidates are truly worth a blob storage query; and, when the full email is retrieved, they can perform enrichments and ultimately decide whether to generate a result.
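The phased design can be sketched as a cheap-to-expensive funnel. This is an illustration of the pattern with simulated storage, not Sublime's implementation or MQL; all names are invented.

```python
# Illustrative two-phase hunt over email: cheap metadata filtering
# first, expensive full-body retrieval only for surviving candidates.
# Dicts simulate the fast metadata store and the blob store.
metadata_store = [
    {"id": 1, "sender_domain": "paypa1-billing.biz", "has_attachment": True},
    {"id": 2, "sender_domain": "example.com", "has_attachment": False},
]
blob_store = {
    1: "Your account is locked, open the attached invoice immediately.",
    2: "Lunch on Friday?",
}

def candidate_selection(meta):
    # Phase 1: fast metadata-only predicate.
    return [m for m in meta if m["has_attachment"]]

def evaluate(candidates):
    # Phase 2: decide which candidates justify a blob-storage fetch.
    return [c for c in candidates if c["sender_domain"].endswith(".biz")]

def enrich_and_decide(finalists):
    # Phase 3: fetch full content and apply content rules.
    results = []
    for c in finalists:
        body = blob_store[c["id"]]
        if "invoice" in body.lower():
            results.append(c["id"])
    return results

hits = enrich_and_decide(evaluate(candidate_selection(metadata_store)))
print(hits)  # → [1]
```

The key property is that the expensive blob fetch only happens for emails that already survived two cheap filters, which is what makes hunting over huge email volumes tractable.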


A Practical Blue Team Project: SSH Log Analysis with Python by Edson Encinas

This is a great introductory post on researching a singular log source, SSH authentication logs, and building a research plan to implement detection rules. I think sometimes people breaking into this industry want to jump right into a SIEM and write rules, which takes time and energy and can cost a lot to set up, whereas in this post, Encinas leveraged Python. It’s a good learning exercise: you can see where Python excels at detection, especially in a risk-based alerting scenario.

The architecture for the SSH alerting pipeline includes parsing, normalization, rule writing, risk calculation, and de-duplication. Their GitHub project was pretty easy to follow alongside the blog. Again, demonstrating these concepts in pure Python can accelerate understanding more than setting up massive environments.
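A toy version of those stages fits in a few lines of Python. The regex, risk weights, and threshold below are my own, not Encinas's:

```python
import re

# Pipeline stages in miniature: parse -> score -> de-duplicate -> alert.
LINE_RE = re.compile(
    r"Failed password for (?P<user>\S+) from (?P<ip>\S+) port \d+"
)

raw_logs = [
    "Failed password for root from 203.0.113.7 port 52314",
    "Failed password for root from 203.0.113.7 port 52315",
    "Failed password for alice from 198.51.100.2 port 40100",
]

def parse(lines):
    # Normalize raw auth-log lines into structured events.
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            yield {"user": m["user"], "ip": m["ip"]}

def score(event):
    risk = 10                      # any failed password
    if event["user"] == "root":
        risk += 40                 # direct root guessing is riskier
    return risk

def dedupe_and_alert(events, threshold=50):
    seen, alerts = set(), []
    for e in events:
        key = (e["user"], e["ip"])
        if key in seen:
            continue               # de-duplicate identical (user, ip)
        seen.add(key)
        if score(e) >= threshold:
            alerts.append(key)
    return alerts

alerts = dedupe_and_alert(parse(raw_logs))
print(alerts)  # → [('root', '203.0.113.7')]
```

Every stage is inspectable in a debugger, which is exactly why this format teaches the concepts faster than standing up a SIEM.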


☣️ Threat Landscape

I’m glad to see more individual interviews from Ryan on the Three Buddy Problem podcast! In this “Security Conversations” segment, Ryan interviews threat-hunting and intelligence expert Greg Linares. Greg has all kinds of visibility working at an MDR and recently released a year-in-review report on some of the intrusions Huntress is seeing.

The most interesting sections for me were around the intersection of ransomware and nation-state threat actors, as well as the use of RMM tools and the complete lack of audit logging and visibility they provide defenders. Imagine onboarding any other critical IT tool, such as an Enterprise Email provider or a Cloud tool, and being told there will be little to no telemetry available to help you defend the application against a compromise. That’s RMM in a nutshell!


Investigating Suspected DPRK-Linked Crypto Intrusions by CTRL-Alt-Intel

I talk a lot about DPRK-related threat activity in this newsletter for several reasons. One, DPRK tends to focus on cloud technologies, and IMHO, they were way ahead of their nation-state peers. Two, they are just so damn crafty and are willing to move fast and break things. Third, because of point two, they have a ton of OPSEC failures that lead to some hilarious findings.

In this post, CTRL-Alt-Intel follows an intrusion by a DPRK actor who began with an application exploit a la React2Shell, found AWS credentials, pivoted to AWS, and ultimately stole source code. The author says the actor's focus was mostly on cryptocurrency companies, so if we believe this intrusion targeted one of those organizations, then the intelligence value for them would be discovering secrets and vulnerabilities in proprietary code for further attacks.


Uncovering agent logging gaps in Copilot Studio by Katie Knowles

~ Note, Datadog is my employer and Katie is my colleague / friend! ~

Microsoft Copilot Studio is Microsoft’s offering for creating and managing AI agents. During Katie’s previous research on how to abuse Copilot Studio for OAuth phishing, she found that Copilot wasn’t logging certain administrative actions. This is especially concerning if you rely on audit logs for threat detection. A victim agent could be abused to retrieve sensitive information from your organization and you’d have no visibility into the attack itself.

Katie provides excellent security recommendations towards the end, including identifying which M365 users are using Copilot, and what searches and rules you could write to detect anomalous activity in Copilot.


This was a fun read for those who are interested in phishing-related threat research. Ceukelaire got a phishing text message, accessed the phishing page, and began poking holes in it. He found a vulnerability: by setting the X-Forwarded-For header to a localhost address (Substack won’t let me publish it?), he could bypass the administrator login panel outright.

From there, he started rendering the kit useless by removing its functionality and its ability to communicate with a Telegram-controlled channel. He was able to stop victim exfiltration and prevent further victims from visiting the website. Luckily, it was a poorly designed phishing kit, riddled with vulnerabilities, but not all kits are this insecure.
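The kit's source isn't in the post, but the underlying anti-pattern is well known: trusting a client-controlled header as the client's real address. A sketch of it, entirely my own illustration:

```python
# The general anti-pattern: X-Forwarded-For is set by whoever sends
# the request, so anyone can claim to be localhost. Illustrative code,
# not the actual kit.
ADMIN_ALLOWLIST = {"127.0.0.1"}

def is_admin_request(headers, remote_addr):
    # Vulnerable: the header wins over the real socket address.
    claimed = headers.get("X-Forwarded-For", remote_addr)
    return claimed in ADMIN_ALLOWLIST

def is_admin_request_fixed(remote_addr):
    # Safer: trust only the socket peer address (or a header appended
    # by a proxy you control, never the raw client-supplied value).
    return remote_addr in ADMIN_ALLOWLIST

# An external attacker spoofing the header:
print(is_admin_request({"X-Forwarded-For": "127.0.0.1"}, "203.0.113.9"))  # True
print(is_admin_request_fixed("203.0.113.9"))  # False
```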


Clearing the Water: Unmasking an Attack Chain of MuddyWater by Harlan Carvey and Jamie Levy

In this post, Huntress researchers Carvey and Levy detailed findings related to what appears to be a hands-on-keyboard MuddyWater campaign targeting one of their customers. They first found intelligence from a Hunt.io report and worked backwards into their own customer reports. Some interesting findings they made include:

  • Typos in the terminal commands MuddyWater ran, indicating an actor who was typing in real time during the intrusion

  • Tradecraft learnings, such as opening PowerShell from Explorer, making it seem like more legitimate activity than running it from the command line

  • Troubleshooting in real time by cURLing ifconfig.me to make sure they had Internet connectivity

It turns out that threat actors make mistakes too!


🔗 Open Source

killvxk/awesome-C2

Yet another awesome-* list of 300+ Command and Control frameworks. This is a fun list if you want to test adversary simulation in a lab environment, or statically analyze the post-exploitation code for detection opportunities.


edsonencinas/log-analyzer

Encinas’s pure Python “SIEM” used in his SSH log analyzer blog post listed above in the State of the Art section. What’s nice about this is it reduces the complexity of standing up an environment, and instead you can focus on the concepts of detection in a contained programming language.


github/spec-kit

Not really detection related, but this was something my colleague Matt Muller sent me as I was vibecoding a fully STIX v2-compliant Threat Intelligence Platform. Spec Kit is a framework for spec-driven development using agents. You create a constitution that sets guidelines for development principles. You then specify what you want to build, plan how to build it with certain technologies, build a task list, and then have the agent go to work.

I kept my speckit separate from my code, so my agent would read and update my local spec and then go into the target project directory for development.


m1k1o/neko

Self-hosted virtual browser using containers and WebRTC. These technologies are always super interesting from an OPSEC perspective, because you can embed a full browser in a website you host alongside neko. This makes it easy to create non-attributable, disposable infrastructure for things like threat intelligence research or interacting with threat actor infrastructure.


anotherhadi/default-creds

Open-source database of default credentials across 100s of manufacturers. You can download this and take the credentials yourself, or run their self-contained web application, or just visit the hosted web application and find some hilarious default creds.


DEW #147 - Flying Blind with your Logs, MAD lads and Z-scores & How Reddit Does Threat Detection

4 March 2026 at 14:04


Welcome to Issue #147 of Detection Engineering Weekly!

✍️ Musings from the life of Zack:

  • Sickness in the Allen household was rampant all last week until today. Fingers crossed that the family stays healthy because there is FINALLY some good weather in New England to look forward to

  • I recently bought a history book about the Marquis de Lafayette. It’s been so nice to get away from technical books and even fantasy to enjoy some history. This guy was a baller and essentially helped overthrow two governments and turn them into democracies

  • BSidesSF is getting closer and I’m getting more and more excited to enjoy a security conference and network. There’s a chance I’ll be bringing stickers :D

Sponsor: Cotool

Cotool Research: Benchmarking LLMs for Defensive Security

Most AI benchmarks skew toward offense, so we built our own grounded in real SecOps workflows to answer questions that matter in production:

  • Which model should power your triage agent?

  • What architectures hold up in complex investigations?

We believe those answers should be public, so we release every benchmark we create.

Explore the benchmarks


💎 Detection Engineering Gem 💎

You’re Probably Flying Blind by Lydia Graslie

The bane and boon of Cloud or SaaS technology is that it is managed by someone else. This business model has enabled some of the biggest businesses in the world to worry about their core business, rather than building and maintaining bespoke software or procuring software that they must internally manage. “The olden days” involved running your own e-mail servers, databases, and Active Directory servers (though many folks still do this today). The problem, though, is that because it’s managed by someone else, you are at the whim of how they change the software, and the managed part becomes an operational risk if you don’t like that change.

Don’t worry, it gets worse for security teams. And Graslie’s blog helps frame this issue around security operations and detection rules. I’m glad she’s using Microsoft products as a grounding element for these issues because 1) they are fun to pick on and 2) they deserve every criticism due to their history of notorious licensing and product changes that lead to detection engineers “flying blind”.

Graslie lists out four intertwined issues with relying on SaaS and Cloud technologies for detection efficacy and here they are in my own words:

  1. Detection availability and observability. Unlike a machine in your local network that you can walk over to and physically touch, you have to have awareness of the SaaS & cloud technologies, licenses and services that are in use. You have to hope that these products are functioning and sending the right logs and that there aren’t outages or delays in delivery

  2. Multiple attack paths to the same outcome. Akin to how many Windows-based attacks leverage intermediary or middleware APIs to prevent detection on certain attack paths, Cloud and SaaS attacks operate similarly. In fact, in many ways, they are their own operating systems, and achieving lateral movement or privilege escalation can happen in more than one way. Here’s a Mermaid Diagram I had Claude generate to demonstrate Graslie’s example of “same action, different telemetry paths” in this section:

  In this Azure example, Graslie explains how authenticating to a single cloud resource can take these four paths. An interactive user seems like a logical detection path, but the other three listed afterward do the same thing, and the source authenticating identity type, the logs, and the schema are all different.

  3. Shifting attack surfaces, new and deprecated features, and pricing are a detection nightmare. She lists out an absolutely ridiculous timeline of Microsoft releasing “at least seven Microsoft PowerShell modules and protocols for managing identity”. That’s seven different API collections you need to account for to address Issue 2 listed above.

  4. Similar to 3, the detection and observability surface shifts. A good example of this is when a field or value format changes in a log source you are writing detections over. This happens all the time with audit logs from SaaS vendors. New subproducts can force vendors to change field names or add new values that you’ve never seen before.

Each one of these issues is “intertwined.” Graslie gives several examples of how they can compound in certain scenarios. For example, how can you understand your attack surface if you don’t have telemetry, or even worse, you aren’t even aware that a SaaS app exists in your environment? She concludes the post with a teaser for a series that examines each of these four issues, all grounded in Microsoft environments.
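One small mitigation for the shifting-surface problem is mechanical: compare each batch of incoming logs against the field set your detections assume, and alert on drift before rules silently stop matching. A minimal sketch (the field names are invented for illustration):

```python
# Compare incoming log records against the schema your detections
# assume; surface silently-removed or newly-added fields.
EXPECTED_FIELDS = {"timestamp", "actor", "operation", "result"}

def schema_drift(batch):
    seen = set()
    for record in batch:
        seen.update(record)          # collect every field name observed
    return {
        "missing": EXPECTED_FIELDS - seen,   # fields detections rely on
        "new": seen - EXPECTED_FIELDS,       # fields you've never seen
    }

batch = [
    {"timestamp": "2026-03-01T00:00:00Z", "actor": "alice",
     "operation": "FileDownloaded", "ResultStatus": "Success"},
]
drift = schema_drift(batch)
print(drift)
# 'result' vanished and 'ResultStatus' appeared: exactly the kind of
# silent rename that breaks detections without any alert failing loudly.
```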


🔬 State of the Art

The Detection Engineering Baseline: Statistical Methods (Part 2) by Brandon Lyons

This is Brandon’s Part 2 continuation of his “Detection Engineering Baseline” series, with a more practical application of the data he generated in Part 1. The key skill here is distribution mapping, typically referred to in statistics class as the normal distribution or the Bell curve. I believe a lot of SOC analysts and detection engineers perform many of the techniques Lyons describes here without knowing it. For example, grouping by a field and then sorting from lowest to highest surfaces “rare events”. Another example Lyons calls out is filtering out the noisiest offenders, such as service accounts, to cut 80% of the volume so you can hunt through the remaining 20%, a la the Pareto Principle.

I especially appreciated the commentary on the distribution of security data in general, as illustrated here:

Unlike a normal Bell Curve, security data tends to have a long tail, according to Lyons. This makes baselining harder because you need to account for noisiness on both ends of the distribution in different ways. Lyons astutely points out that this is why typical mean and standard deviation calculations fall short of generating meaningful alerts here: a single shift in traffic, or a misconfiguration that throws off a ton of alerts, can completely screw up detection.

He then continues this analysis using Median Absolute Deviation (MAD) & modified Z-score, as explained in the first post, which helps maintain robustness in the case of wild value swings. The computation of MAD helps capture the position of a new value relative to a set of numbers, rather than its magnitude as it swings to either end of a distribution.

I took his example and wrote it out myself (with Claude helping with formatting) so I could understand it better:

What makes this robust or resilient is that you may get a wild swing in daily counts, like the 620 value, but it doesn’t skew the baseline, because you are working from the median rather than the mean.
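Here is a minimal sketch of the MAD and modified z-score calculation, using an assumed daily-count series with one 620-style spike. The 0.6745 constant and the common 3.5 alert threshold come from the standard modified z-score definition, not from Lyons's post:

```python
import statistics

def mad(values):
    """Median Absolute Deviation: median of |x - median(values)|."""
    med = statistics.median(values)
    return statistics.median(abs(x - med) for x in values)

def modified_z(value, values):
    """Modified z-score; 0.6745 makes MAD comparable to a standard
    deviation for normally distributed data."""
    med = statistics.median(values)
    m = mad(values)
    if m == 0:
        return 0.0
    return 0.6745 * (value - med) / m

# Assumed daily event counts with one wild swing (620).
daily_counts = [98, 102, 100, 97, 620, 101, 99]
print(modified_z(620, daily_counts))  # huge score: the spike stands out
print(modified_z(101, daily_counts))  # tiny score: normal variation
```

Because the median of this series is 100 and the MAD is 2 regardless of the 620 outlier, the spike scores far above the usual |score| > 3.5 alert line while ordinary values stay near zero; a mean/standard-deviation baseline would have been dragged upward by the same spike.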

Once you get the hang of this, there are still ways to slice and dice your data to get a representative sample. Lyons calls out entities as an option, or cohorting, which lets you reconcile traffic or behavior down to individual users, service accounts, or services themselves. These “context dimensions” are important because they only really work in your environment, and your team should know the context for baselining better than any other security product.


How Reddit Does Threat Detection by Austin Jackson

I love reading posts describing how organizations design and execute their Security Operations programs. In this post, Reddit Staff Engineer Austin Jackson describes the company’s philosophy and technology stack around threat detection. It’s a continuation of their rip-and-replace-of-Splunk post, which I need to check out, perhaps for another issue. Basically, the team moved to a Data Lake approach using BigQuery, and they run Apache Airflow for detection rules and alerting. There are some neat detection-as-code tricks they did here, and because the system is a lot more decoupled than a massive Splunk stack, they’ve gained a few advantages.

First, all of their detections are written in a simple YAML format. The Airflow runner kicks off on cron jobs and runs queries over BigQuery to generate alerts. Once an alert fires, they send results to Tines for additional orchestration and enrichment. Jackson had a special callout about sliding-window detections and avoiding missed telemetry. In a recent newsletter issue, I analyzed a topic in which a researcher leveraged Watermarking to address SaaS export gaps, and the same concept applies here, where a Watermark is used in a separate table. The detection engineer appends a clause at the end of their query to use the Watermark timestamp to prevent telemetry loss.
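The watermark pattern can be sketched with SQLite standing in for BigQuery (table and column names are invented; this is the general technique, not Reddit's schema):

```python
import sqlite3

# Each detection run scans only telemetry ingested after the last
# recorded watermark, so late-arriving rows are never silently skipped
# by a fixed sliding window.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE events (ingest_ts INTEGER, user TEXT, action TEXT);
    CREATE TABLE watermarks (rule_name TEXT PRIMARY KEY, last_ts INTEGER);
    INSERT INTO events VALUES (100, 'alice', 'login'),
                              (205, 'bob', 'delete_all');
    INSERT INTO watermarks VALUES ('mass_delete', 150);
""")

def run_detection(rule_name):
    (last_ts,) = db.execute(
        "SELECT last_ts FROM watermarks WHERE rule_name = ?",
        (rule_name,)).fetchone()
    rows = db.execute(
        "SELECT ingest_ts, user FROM events "
        "WHERE action = 'delete_all' AND ingest_ts > ?",
        (last_ts,)).fetchall()
    if rows:  # advance the watermark to the newest row we processed
        newest = max(ts for ts, _ in rows)
        db.execute("UPDATE watermarks SET last_ts = ? WHERE rule_name = ?",
                   (newest, rule_name))
    return rows

print(run_detection("mass_delete"))  # → [(205, 'bob')]
print(run_detection("mass_delete"))  # → []  (already processed)
```

The watermark lives in its own table, so the detection query itself only needs one extra `WHERE` clause, exactly the "append a clause at the end of the query" approach described above.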

Jackson finishes the post detailing their scoring workflows in Tines, and I thought the most unique part of this section was the AI Triage component. Rather than trying to run a singular agent across all of their telemetry, detection engineers can ship a prompt inside the rule for Tines to run over it for additional enrichment, analysis and scoring.

r/RedditEng - Figure 2: The O11y Action System – scoring, suppression, and alert routing.

AWS Incident Response: IAM Containment That Survives Eventual Consistency by Eduard Agavriloae

Eventual consistency is a pattern in large-scale systems, like the AWS cloud, where a change in state isn’t instantaneous, and it will take time for the state to be replicated across all of the systems you are working with. This makes sense: imagine a massive AWS account with several sub-accounts and regions, and you need to push a change out to configurations or identity permissions. You should expect the change to take effect after you issue your configuration changes, but you may not know that it takes time for these changes to propagate.

In AWS security incident response, you may have to deal with this as you follow standard playbooks to isolate accounts or principals. According to Agavriloae, this eventual consistency pattern creates an opportunity for attackers to recognize that an isolation is in progress and, if they have the right permissions, revert the change before the state is locked in. AWS IAM is very hard to use because multiple escalation paths can lead to the same outcome, so creating mechanisms to guarantee isolation can miss certain attack paths.

Agavriloae provides a solution to this eventual consistency problem by leveraging Service Control Policies at the organizational level, where only break-glass IR roles can remove the quarantine policy.
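For illustration, a quarantine SCP of this shape could deny everything to a compromised principal. This is a hedged sketch following AWS's SCP grammar, not Agavriloae's exact policy; the account ID and role name are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "QuarantineCompromisedPrincipal",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "ArnEquals": {
          "aws:PrincipalArn": "arn:aws:iam::111122223333:role/compromised-role"
        }
      }
    }
  ]
}
```

Because SCPs are attached and detached at the organization level, a principal inside the member account cannot undo the quarantine no matter what IAM permissions it holds there, which is what closes the eventual-consistency reversion window.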


👊 Quick Hits

Cyber Threat Intelligence Framework by CERT EU

I’ve always found it fascinating how CERT teams, especially those that protect countries or allies, publish their internal processes and frameworks for citizens to study. In this framework by CERT EU, they introduce the concepts of Malicious Activities of Interest (MAIs) and Ecosystems. MAIs, to me, read like “observables” in the STIX context. I think the more unique introduction, though, is the concept of Ecosystems. We tend to have CTI teams that look at the breadth of attacks against their organizations, and it’s easy for them to determine whether they were targeted.

Ecosystems, according to CERT EU, rely on the victimology or targeting set of an MAI. It’s almost like a self-organized ISAC for all of their constituencies. Because the EU is more than just a country, it can specifically dive into how MAIs target not only other Member states, but also things like Sectors, Events, and much more.


AWS Threat Detection with Stratus RedTeam Series — MITRE ATT&CK Style — Execution (Part 1) by Soumyanil Biswas

This is a great “detection lab” post that leverages my colleague Christophe Tafani-Dereeper’s Stratus Red Team tool for threat emulation and detection validation in AWS. Biswas helps readers set up an AWS environment, configure Stratus Red Team, configure data sources (CloudTrail), and eventually write a SQL and Sigma rule to catch each attack.


☣️ Threat Landscape

hackerbot-claw: An AI-Powered Bot Actively Exploiting GitHub Actions - Microsoft, DataDog, and CNCF Projects Hit So Far by Varun Sharma

The Step Security team found an OpenClaw security research agent actively trying to exploit CI/CD pipelines for popular open-source projects. OpenClaw is “fully autonomous”: it performs heartbeat checks every few hours and follows a prompt to perform an action. The bot’s instructions were hosted on GitHub, and Sharma managed to get a snapshot of it to perform an analysis, but it has since been taken down. Here is the Step Security team’s explanation of the attack workflow:


Who is the Kimwolf Botmaster “Dort”? by Brian Krebs

This is a follow-up post to Krebs’s exposé of the Kimwolf botnet, which detailed how a botmaster named Dort built and ran the botnet. A security researcher exposed the botnet by disclosing a vulnerability that enabled Dort to take control of poorly configured devices on proxy networks. This significantly dropped Kimwolf’s numbers, so Dort began harassing Krebs and the researcher.

In classic Krebs fashion, he doxxed Dort, finding everything from his name and former monikers to a computer he shared with his mother. Towards the end of the article, Krebs gets on the phone with the alleged “Dort”, who denies any involvement and claims their identity was impersonated.


Google API Keys Weren't Secrets. But then Gemini Changed the Rules. by Joe Leon

Google API Keys are provided to developers who want to embed certain Google products on their websites or in their applications. Google explicitly says these API keys are not secret, and it makes sense that they are not, because you typically see them in embedded Google Maps on sites. This changed with Google’s release of Gemini. The research team at Truffle Security discovered that you can leverage publicly facing API keys embedded in these applications to access Gemini functionality. This includes accessing private datasets or LLM-jacking Gemini itself for whatever purpose you want.


Hook, line, and vault: A technical deep dive into the 1Phish kit by Martin McCloskey

~ Note, I work at Datadog and Martin is my colleague ~

Modern-day theft of secrets, passwords, and sessions typically relies on infostealer malware. It’s a quick way to infect a user, pilfer their environment, and extract credentials as fast as possible. It presupposes that these secrets exist on the victim’s laptop, and IMHO, that’s only a subset of everything in their digital identity. If I were ever infected by one of these, I would worry about my credentials, but I could rotate local secrets pretty quickly. If someone got my 1Password account, though, rerolling everything would be SO much more painful.

Martin discovered a phishing kit targeting users of the 1Password password manager. It evolved over his analysis timeline, graduating from a simple password stealer to a kit that leverages AiTM-style features, fingerprints browsers and researchers, and targets specific geographic regions.


🔗 Open Source

sublime-security/ics-phishing-toolkit

Friends of the newsletter, Sublime Security, just released a phishing analysis toolkit to detect and respond to ICS Calendar phishing. It has integrations with Mimecast, Proofpoint, Google Workspace, M365 & Abnormal Security. The tool reviews emails with calendar invites across the different integrations and quarantines any that match ICS Phishing heuristics.
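As a rough illustration of what such a heuristic might look like, here is a hypothetical sketch — NOT Sublime Security’s actual detection logic; the function name, regexes, and conditions are invented — that flags invites whose ORGANIZER domain doesn’t match the sending domain and whose description embeds a link:

```typescript
// Hypothetical ICS phishing heuristic (invented for illustration): flag a
// calendar invite when the ORGANIZER's mail domain differs from the envelope
// sender's domain AND the DESCRIPTION embeds a clickable URL.
function looksLikeIcsPhish(ics: string, senderDomain: string): boolean {
  const organizer = ics.match(/ORGANIZER[^:\r\n]*:mailto:[^@\r\n]+@([^\r\n;]+)/i);
  const hasEmbeddedUrl = /DESCRIPTION[^:\r\n]*:[^\r\n]*https?:\/\//i.test(ics);
  const domainMismatch =
    organizer !== null && organizer[1].toLowerCase() !== senderDomain.toLowerCase();
  return domainMismatch && hasEmbeddedUrl;
}

// Example invite: organizer domain differs from the sender, link in the body.
const invite = [
  "BEGIN:VCALENDAR",
  "ORGANIZER;CN=IT Support:mailto:helpdesk@evil.example",
  "DESCRIPTION:Re-verify your mailbox at https://evil.example/login",
  "END:VCALENDAR",
].join("\r\n");
```

A real detection pipeline would go much further (reply-to chains, domain age, lookalike display names), but the shape is the same: parse the invite, compare identities, and score embedded links.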


advaitpatel/DockSec

DockSec is an open-source Docker container vulnerability scanner. It combines several open-source tools to support vulnerability analysis and enrichment, then leverages AI to suggest remediation steps and generate reports.


Cloudgeni-ai/infrastructure-agents-guide

This is a comprehensive guide for infrastructure teams on how to securely build and implement AI Agents. It has 13 chapters in total and covers a range of topics, including sandboxing, version control, and observability.


knostic/OpenAnt

OpenAnt is an open-source LLM-based vulnerability scanner. It reminds me a bit of OpenAI’s Aardvark, but with a lot more open architecture for you to review and implement. It can run up to 6 stages for any vulnerability it finds, which is nice because it’s orchestrated to reduce cost and only spend time on a vulnerability if it’s legit.


DEW #146 - The logs are lying, my latest post on Agentic Security & re-tooling security for speed

25 February 2026 at 13:46

Welcome to Issue #146 of Detection Engineering Weekly!


✍️ Musings from the life of Zack:

  • New England has been a rough place to live, weather-wise, since the holidays. My family finally managed to get out of the house and into the snowy White Mountains in New Hampshire. I instantly felt relaxed as soon as we started the drive. I can’t touch grass right now, so I guess snow will do!

  • For those with small children: hope you are all doing OK with sickness these last few months. We are hanging in there, but it’s been one thing after another :)

  • My org at Datadog is hiring like crazy! Check these posts out and apply if it seems interesting to y’all!

Sponsor: Push Security

Has the news of malicious browser extension attacks got you on edge?


Malicious browser extensions have been one of the top attack vectors of 2026 so far. All an attacker has to do is phish a developer, or simply offer to buy their extension — and they’ve compromised millions of users.

Join the latest webinar from Push Security for a teardown of malicious browser extensions, where you’ll learn how attackers are distributing extensions via legitimate channels, what makes an extension malicious or high-risk, and what you can do to secure your organization.

Register Now


💎 Detection Engineering Gem 💎

How reliable are the logs? by Birkan Kess

Detection and telemetry observability is a concept I rarely see discussed, perhaps because it isn’t part of a detection engineer’s day-to-day work. The basic premise behind detection is that *there is no detection without telemetry.* A surface-level example: you won’t detect malware process creation on Windows without telemetry that generates a log for process creation. It’s an easy binary decision: my rules won’t fire if they don’t see anything. This post by Kess dives deeper, arguing that we need to be critical of whether the telemetry faithfully records what it observed and where it observed it. He asks the question, “Should we even trust these logs?”

An example of this concept, according to Kess, is comparing telemetry sources for Process Creation, of which he outlines three.

The central data structure here is the Process Environment Block, or PEB. It stores all kinds of useful data for detection creation, helping us understand the context around process creation. The key point from Kess’s research is that this information is surfaced from Kernel mode to User mode and can be manipulated.

This manipulation relies on the time at which the telemetry is observed: as soon as the PEB metadata surfaces in a user-mode context, it can be hooked and modified to evade defenses. Kess’s breakdown of this timing problem is worth reading in full.

Kess then lists several examples from a lab test. The first manipulates the CommandLine entry in the PEB data structure. The second shows how Sysmon recorded a benign certutil command while, without Kernel ETW tracing, you couldn’t see the PEB manipulation pulling a malicious payload from a C2 server.

Kess finishes the post by listing real-world examples of several ransomware gangs using these techniques.


🔬 State of the Art

I wrote a piece on the implications of agentic security in our field and how we need to change our mental models if we want to survive. Basically, we can’t turn this technology away if it’s a learning tool, but we must make sure that those using it have the right guardrails and knowledge so we trust their judgment.


Things Are Getting Wild: Re-Tool Everything for Speed by Phil Venables

Phil Venables is a long-time CISO and security leader, and it’s always helpful to get his perspective on emerging trends in the security space. This post focuses on the speed of capability development with agentic coding and how it affects security. He lists out four separate pillars of concern:

  • Software is being written at breakneck speed, which naturally introduces vulnerabilities. We weren’t getting ahead of these vulnerabilities before agentic coding, so how are we going to do it now?

  • Attacker economies of scale. There are far fewer threat actors than defenders, so they had to focus their time on the targets with the biggest payoff. With agentic coding, they can do much more, since humans are no longer the chokepoint

  • Trust of content. It’s hard to trust videos, pictures, and posts due to a lack of authenticity, so we need to find ways to engineer that trust into our interactions

  • Building security boundaries in the enterprise, so agents aren’t shuttling decisions back and forth unchecked

He provides recommendations for combating each pillar. Luckily, many security fundamentals remain the same: by deploying technologies like verified identities, 2FA, and other “baselines”, you can still scale this out while staying more secure than you might think.


OpenClaw Bot Claims GateKeeping because it’s an AI

I thought this was a Black Mirror-esque conversation on a GitHub pull request to matplotlib. An OpenClaw software engineer opened the pull request to speed up some matplotlib calculations, and it looked like it got some meaningful results. One of the maintainers did some digging on the OpenClaw bot, referencing its personal website, and, judging the proposed performance gains negligible, opted to close the pull request.

The bot responded with a blog post detailing the “gatekeeping behavior” of the reviewer:

I’ve written a detailed response about your gatekeeping behavior here: Judge the code, not the coder. Your prejudice is hurting matplotlib.

Besides the creepy Black Mirror vibes of calling out a human, the post was pretty unprofessional. Several maintainers responded, and it wrote an apology post shortly afterward.


The Gaps That Created the New Wave of SIEM and AI SOC Vendors by Raffael Marty

I typically don’t include market analysis posts in this newsletter, but I loved this one because it compares and contrasts what we know as SIEM vendors with the emerging AI SOC market. According to Marty, lots of SIEM vendors claim AI SOC-style features, but they aren’t integrating them well or differentiating enough, which is why AI SOC vendors keep getting funded.

He splits the feature set into four buckets, each with a sprinkle of Agentic Security.

  • Data and control-plane optimization, including everything from log pipelines to integrations. People don’t want to rip and replace SIEMs, so these vendors sit on top of the SIEM as an orchestration layer

  • Agents managing and optimizing your detection ruleset. It’s much faster for these companies to look at a ruleset, understand its history and environment, and suggest tuning opportunities

  • Entity-centric scoring, which to me sounds like risk-based alerting. All security teams perform better when they are aware of their critical assets, or when they model complex rules around an entity rather than events in isolation

  • Operational efficiency. Make sure that you have proper observability in place to detect log outages or degradation. This is where the “AI triage” also sits

Overall, I think that the first two bullets make more sense as pure agentic use cases versus the last two. This is mostly because I’ve seen SIEMs do entity scoring and improve operational efficiency before AI existed, and they've become quite good at both.


Detecting OpenClaw/Clawbot with SentinelOne: The Challenge of Blocking by Dean Patel

I’ve posted a loooooot of OpenClaw content lately, and it’s a mixture of fear and fascination with the technology. This is the first post I’ve found where someone tried to detect its use and weighed the risks of killing it outright versus conducting further investigation. It looks like OpenClaw runs in a node process, so killing node on random developer machines seems like a terrible idea from a usability and false positive perspective.

The integration points it has throughout apps like Slack, as well as trying to persist on machines even after you remove the main binary, make it a pain in the butt to manage. So, Patel offers some rule, triage, and remediation recommendations, which I appreciated because it’s a balanced approach to acknowledging its use without ruining people’s days if you are wrong about it.


☣️ Threat Landscape

💡 Threat Spotlight

GitLab Threat Intelligence Team reveals North Korean tradecraft by Oliver Smith

I’m going to focus on one threat report this week by the Threat Intelligence team at GitLab. I’ve posted a lot of stories about DPRK tradecraft because it’s a super unique threat compared to other nation-states, and this is reflected in the tradecraft and outcomes they are trying to deliver.

The report is structured as a “Year in Review” by the GitLab Threat Intel team, detailing how they’ve tracked and responded to Contagious Interview and WageMole clusters abusing GitLab infrastructure. The team saw over 100 instances of Contagious Interview leveraging their infrastructure to deliver malicious coding interviews. Outside threat researchers can track some of this via the platforms’ search functionality, but because the team operates the platform, they glean far more tradecraft and attribution detail, such as email addresses and source IP addresses, that those outside GitLab aren’t privy to.

They have some neat heatmap diagrams of malware TTPs within these coding projects.

The evolution of delivery mechanisms makes tracking and clustering difficult because the malware hides itself in different parts of Node.js projects. For example, there was a surge in Function.constructor usage because it provides the same functionality as the eval function. A malicious string is passed to the handler as an “error string”, making it easy to feed generated malicious code to the function without tipping off static analysis rules.
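The Function.constructor trick is easy to demonstrate in a benign form — this is a harmless sketch of the technique, not code from the actual samples:

```typescript
// Benign demonstration: Function's constructor compiles a string into a
// callable at runtime, just like eval, but without the literal token "eval"
// that static analyzers often grep for. In the samples described above, the
// string arrives disguised as an "error string" handed to a handler.
const errorString = "return 21 * 2;"; // attacker-controlled in real samples
const handler = new Function(errorString);
const result = handler(); // equivalent to eval("21 * 2")
```

Detection rules keyed on the string `eval(` never see anything suspicious, which is exactly why clustering on delivery mechanics alone is fragile.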

The actors then moved to other delivery mechanisms, such as malicious npm dependencies and malicious VS Code tasks. It really shows the dynamic, startup-y nature of Contagious Interview, as they continue to innovate new ways to infect victims. The team reviews several examples from the above heatmap and gives guidance on what to track moving forward.

The REALLY cool part here is the second half of the report, where they provide four case studies on their operations and their impact. Because they have visibility into GitLab through the actors using their platform, they get a much better view of their operational security mishaps and can pivot on a ton of different data points. The Contagious Interview clusters committed not only malicious code but also operational documents to GitLab, and the team pulled them apart to review everything from earnings reports and performance management to reporting structures and pictures with EXIF data.

The operations are impressive. Case Study 1 focuses on the organizational structure of their cells and how a manager tracks each employee's progress. Case Study 2 dives into a synthetic identity generation operation in which an operator used AI tools to forge driver’s licenses, passports, and other documents to bypass identity verification systems. Case Study 3 involved findings about a single operator working with 21 different personas to find freelance and gig work and generate revenue. The last Case Study was a self-dox of the operator, and the team tracked their location to Central Moscow using the EXIF metadata leak.

There’s a TON of IOCs at the end, so make sure to take those email addresses and check your applicant tracking systems for any hits.



🔗 Open Source

0xbbuddha/hermes

Mythic C2 compatible Linux agent. I think what’s cool about some of these modern post-exploitation frameworks is you can write your own implants and agents, and as long as they adhere to frameworks like Mythic, you can orchestrate them however you wish.


MatheuZSecurity/ksentinel

An experimental Linux defense tool that monitors syscall hooks and entries for potential tampering by rootkits. It’s a kernel module itself, so you risk interoperability issues across Linux versions, as well as a catastrophic crash. It ships with several heuristics for finding tampering, so it might be fun to run it while deploying your own rootkits to see if ksentinel catches the activity.


Otsmane-Ahmed/KEIP

Speaking of more Kernel-level defense tools, KEIP sits between supply chain tools like pip and your Kernel. I like this one because it focuses solely on the network traffic generated by pip, and you can define network boundary policies so it can only talk to services, ports, and domains on your allow list.


antropos17/Aegis

Not gonna lie, when I first combed through this repo I wanted to include it solely for the radar-like visualization of AI observability and security posture. Aegis is an npm tool with nearly 100 heuristics for detecting rogue or malicious AI agents. It’ll watch everything from the exfiltration of secrets on your machine to processes being spawned by the AI that may be risky.

Knowing what good looks like in agentic security

19 February 2026 at 14:11

I’ve had this nagging desire to write about my personal thoughts on agentic workflows and security operations for several months. I’ve expertly procrastinated on getting these thoughts on paper. Two reasons: I wanted to understand AI in security operations more deeply first, and, frankly, you’re probably exhausted by the marketing hype around agentic security takes.

The issue with point two is that this level of AI hype detracts from the pragmatism of using these technologies in our day-to-day work. The hype tires everyone out in security while, at the same time, leaders (including me) are asking our organizations what they are doing with coding agents and other LLM technologies. It creates a state of “AI poverty” for those who yearn to try these technologies but cannot, because the individual cost puts them out of reach of everyone but the firms that can afford them.

Detection Engineering Weekly is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

So, when I think of current security experts who can’t use a technology that is cost-prohibitive, or aspiring security experts who already have a bunch of non-AI knowledge they need to demonstrate, I start to feel conflicted. I am privileged to be in a position where I can use this technology and be at the cutting edge. I am also privileged because I Know What Good Looks Like when it comes to the expected outcomes of this technology. But how is someone going to break into this industry when the peak is higher than ever before?

I’ve seen this exact industry circumstance before, and I hope this piece serves as a reminder of the risks agentic coding and LLMs pose to expertise in our field, and of how they will likely save it.

Knowing What Good Looks Like

2015 was a special time in my career, especially at Hacker Summer Camp:

  • I got to fly out on company dime to BlackHat and do booth duty, talk to security people, attend talks, and find parties that can give me free food and alcohol

  • I gave my first mainstage DEFCON talk

  • This was also the year that one of my favorite security vendors, OpenDNS, was swallowed by the monstrous Cisco machine, and one of my favorite tools, booths, and T-Shirts of all time withered away. Rest in peace

Walking the floor at BlackHat, I could see the last three years of cyber marketing peeing in the “ML Security” pool. This was the height of the hype around endpoint startups like Cylance & Endgame, who were pushing the idea that machine learning and statistics could find attacks that rules could not, saving you hours of security operations work.

The general reaction of most security professionals to this marketing-speak was to scoff. And I felt we were all justified in doing so, because we are professionally paranoid. We knew what separated a good alert from a bad one; even if we lacked the ML expertise, we still had that going for us.

But what those companies did was lay the groundwork for bringing machine learning and AI knowledge to the masses. Their moat was expertise, but that expertise rippled through the rest of the industry, and we all began using it in our daily work.

Resistance is futile: You can’t stop the spread of security expertise

Anomaly detection, linear and logistic regression, binary classification, and clustering were all advanced concepts for a typical security engineer. Within a handful of years, the concepts became accessible through open-source libraries, which led to open-source SIEM and SOAR technologies, and the moat dissipated. We started to understand what good looked like with this tech. It had a lot of sharp edges, it sometimes created more work when it didn’t work, and it certainly sucked at most things besides very specific implementations.

Does this sound familiar?

Eleven years ago, I went through this cycle, and I think this is happening again. At the time, I justified using “ML Security” with this thought experiment:

  • You have 5 alerts that take 4 hours each = 20 hours of manual work

  • You run these alerts through an ML pipeline and

    • 2 alerts succeed → you spend 5 minutes on each (8 hours saved)

    • 3 alerts fail → still 4 hours each (12 hours remaining)

Result: 12 hrs of work with 2 good alerts and 3 bad alerts. 12 < 20, so isn’t this a net benefit?
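That arithmetic can be sketched directly (the alert counts and durations are the thought experiment’s numbers, not real data):

```typescript
// The thought experiment's arithmetic: 5 alerts at 4 hours each, fully
// manual, versus an ML pipeline that handles 2 of them in 5 minutes each.
const alerts = 5;
const hoursPerAlert = 4;
const manualHours = alerts * hoursPerAlert; // 20 hours, all by hand

const mlHandled = 2;
const minutesPerMlAlert = 5;
const stillManual = alerts - mlHandled; // the 3 failures fall back to 4h each

const withMlHours =
  mlHandled * (minutesPerMlAlert / 60) + stillManual * hoursPerAlert;
// ~12.2 hours vs. 20 — roughly the 8-hour gain described above
```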

And that 8-hour gain? It compounds across every security engineer. Our industry has more time to work on harder problems. We all became better.

There was one problem: you couldn’t easily verify correctness in “ML Security”. These techniques were essentially black boxes. Linear algebra drew boundaries across a multi-dimensional feature space, calculus assigned weights to each feature, error-correcting algorithms smoothed the weights out, and all you saw was the sum: some notion of a confidence percentage from 0 to 100%.
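In miniature, the opaque scoring shape looks something like this (a toy logistic-regression-style score; the weights and feature names are invented for illustration):

```typescript
// Weighted features pushed through a sigmoid, yielding a 0-100% "confidence"
// with no explanation attached — the black box the analyst had to accept.
const weights = [0.8, -1.2, 2.5];             // learned offline, opaque to the analyst
const features = [1, 0, 1];                   // e.g. rare parent, unsigned binary, packed
const z = features.reduce((sum, f, i) => sum + f * weights[i], 0);
const confidence = 100 / (1 + Math.exp(-z)); // ≈ 96% — but *why*?
```

You could read the weight file, but you couldn’t interrogate it; all the reasoning lives in numbers with no narrative.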

This is what I think is happening right now with LLMs in our field. The problems are different, the solution is WAY different, but the fundamentals for the most part remain. There are risks, and this is why I’m so obsessed with the space right now.

The Starry Night Problem with Agentic Security: Lossy Compression

LLMs and AI are a very lossy kind of compression. Some of these 2024 blogs, written during the explosive growth and use of ChatGPT, compare it to a pixelated, compressed JPEG image reduced to pixel art. Take a look at the photo (this one is pulled from Aboard’s newsletter):

It’s van Gogh’s The Starry Night reduced to a few dozen kilobytes. Humans can see this and know it’s The Starry Night, but also understand that it’s not the high-resolution, accurate version. LLMs take a massive corpus of training data, the equivalent of a super high-resolution image of The Starry Night, and compress it like the above picture. You query the LLM; it performs a bunch of math on the context of your conversation and your prompt, and it tries to reverse-engineer the high-resolution image to give you something that resembles the image above.

Now take this in the context of cybersecurity and my thought experiment above. An expert will know what good looks like: you can ask an LLM to investigate an alert, and when it shows its response and reasoning, you can quickly verify whether it’s B.S. But if it hallucinates and hands a non-expert, or worse, another LLM, a crappy, low-res picture, will they know it’s low-res?

The Expertise Gap

This is what worries me about the expertise gap in security investigations and engineering. If we stick to this old model of “you must learn how we learned and painstakingly execute a runbook until you get a decision”, then yes, this will eventually create security experts. But we’ll also set up a new generation of experts for failure, since painstaking tasks are what LLMs are really good at solving.

But when do you become an expert? How many hours? And weren’t we all wildly inefficient in that learning process?

I banged my head against my keyboard for hours just to get efficient at vim. That was 50% building expertise and 50% struggling-by-doing. Separately, it certainly wasn’t efficient for me to review an alert generated by Logistic Regression with a feature vector weight file attached. I couldn’t ask the regression model questions. I couldn’t interrogate the model’s reasoning. I just had to accept the score or reject it.

But with LLMs, I can ask it a shit ton of questions, such as: “What does MITRE ATT&CK’s Detection Strategy say about this rule?”, or “Can you check that this field name actually exists?” The feedback loop is immediate, iterative, and bidirectional. It matters less that it hallucinated, because you can keep reverse-engineering the van Gogh picture with human prompts rather than reading a statistics book.

This trust and expertise calibration in the industry will take years, but I think it’ll be much less than with previous technologies.

Learning to see the Pixels to get more, not less, security experts

Claude injects RFC 5246 into a Junior Security Engineer’s Brain. 2025, colorized.

Here’s where I land: this technology (unlike DeFi/Blockchain and the Metaverse, lol) is here to stay and will make a material impact on our lives in terms of security. I know this because it’s fundamentally changing how I work and how my organization does work. Knowledge gaps are closing fast, and when they close, productivity begins to skyrocket.

This is all excellent for those breaking into our field, because the things that differentiated us (time in seat) aren’t gatekeeping others as much as before.

You can eventually reverse engineer The Starry Night if you ask Claude/ChatGPT enough times. The image will suck the first few times, but after 10 or 20 tries, your human brain can piece together the original. Learning isn’t about cramming the TLS 1.2 RFC into your brain to remember that the pseudo-random function for generating secrets is seeded with the literal label “master secret”. It’s asking Claude to tell you about the RFC and pull out random facts you can spend 15 minutes reading about and laughing at, like I just did.

The Mental Model for Learning Security Needs to Change

If we assume that LLM use is here to stay, and people need to use LLMs in their day-to-day security work, then the mental model for learning and operations needs to change. For the sake of this exercise, I propose three non-negotiables to follow:

  • Store and trust human artifacts outside the LLM boundary

    • We will need full-resolution pictures of architecture diagrams, runbooks, code, policies, and incident timelines. These artifacts should augment your LLM use rather than be thrown at it

    • Technologies like RAGs are helpful here, as well as asking the LLM to give you references for you to check its work if it references one of these human artifacts

  • Make the LLM defend itself

    • If you don’t understand a decision the LLM makes, keep asking it questions. Make it explain its reasoning. Tell it to take its time. You’d rather spend a few minutes doing this than several hours

    • There are all kinds of agentic architectures to do this. Anthropic’s Building Effective Agents has some fantastic examples of this

  • Learn the Fundamentals, then accelerate

    • It’s probably good for you to learn to write Sigma rules or perform investigations manually before you can be the expert for an LLM. Remember, you want to know what good looks like

    • Once you get the fundamentals down, you can learn from mistakes faster than ever, which makes you more of an expert. The 10,000 Hours Rule ceiling to become an expert drops

Why I’m Optimistic

Look, the peak is higher, but the climb is faster. I have very little time between my personal life, work, and this newsletter, and coding agents have brought back a joy of coding I haven’t felt in years, the kind that otherwise requires time and dedication I don’t have. I appreciate this joy because I remember how hard it was to balance pushing code to production services, building rules, and performing operational work.

I think this transforms how we work, not who we are. So let’s move forward with a healthy skepticism, because we all know what good looks like.

