Normal view

LLMs are Getting a Lot Better and Faster at Finding and Exploiting Zero-Days

9 February 2026 at 13:04

This is amazing:

Opus 4.6 is notably better at finding high-severity vulnerabilities than previous models and a sign of how quickly things are moving. Security teams have been automating vulnerability discovery for years, investing heavily in fuzzing infrastructure and custom harnesses to find bugs at scale. But what stood out in early testing is how quickly Opus 4.6 found vulnerabilities out of the box without task-specific tooling, custom scaffolding, or specialized prompting. Even more interesting is how it found them. Fuzzers work by throwing massive amounts of random inputs at code to see what breaks. Opus 4.6 reads and reasons about code the way a human researcher would­—looking at past fixes to find similar bugs that weren’t addressed, spotting patterns that tend to cause problems, or understanding a piece of logic well enough to know exactly what input would break it. When we pointed Opus 4.6 at some of the most well-tested codebases (projects that have had fuzzers running against them for years, accumulating millions of hours of CPU time), Opus 4.6 found high-severity vulnerabilities, some that had gone undetected for decades.

The details of how Claude Opus 4.6 found these zero-days is the interesting part—read the whole blog post.

News article.

I Am in the Epstein Files

6 February 2026 at 21:43

Once. Someone named “Vincenzo lozzo” wrote to Epstein in email, in 2016: “I wouldn’t pay too much attention to this, Schneier has a long tradition of dramatizing and misunderstanding things.” The topic of the email is DDoS attacks, and it is unclear what I am dramatizing and misunderstanding.

Rabbi Schneier is also mentioned, also incidentally, also once. As far as either of us know, we are not related.

EDITED TO ADD (2/7): There is more context on the Justice.gov website version.

iPhone Lockdown Mode Protects Washington Post Reporter

6 February 2026 at 13:00

404Media is reporting that the FBI could not access a reporter’s iPhone because it had Lockdown Mode enabled:

The court record shows what devices and data the FBI was able to ultimately access, and which devices it could not, after raiding the home of the reporter, Hannah Natanson, in January as part of an investigation into leaks of classified information. It also provides rare insight into the apparent effectiveness of Lockdown Mode, or at least how effective it might be before the FBI may try other techniques to access the device.

“Because the iPhone was in Lockdown mode, CART could not extract that device,” the court record reads, referring to the FBI’s Computer Analysis Response Team, a unit focused on performing forensic analyses of seized devices. The document is written by the government, and is opposing the return of Natanson’s devices.

The FBI raided Natanson’s home as part of its investigation into government contractor Aurelio Perez-Lugones, who is charged with, among other things, retention of national defense information. The government believes Perez-Lugones was a source of Natanson’s, and provided her with various pieces of classified information. While executing a search warrant for his mobile phone, investigators reviewed Signal messages between Pere-Lugones and the reporter, the Department of Justice previously said.

Backdoor in Notepad++

5 February 2026 at 13:00

Hackers associated with the Chinese government used a Trojaned version of Notepad++ to deliver malware to selected users.

Notepad++ said that officials with the unnamed provider hosting the update infrastructure consulted with incident responders and found that it remained compromised until September 2. Even then, the attackers maintained credentials to the internal services until December 2, a capability that allowed them to continue redirecting selected update traffic to malicious servers. The threat actor “specifically targeted Notepad++ domain with the goal of exploiting insufficient update verification controls that existed in older versions of Notepad++.” Event logs indicate that the hackers tried to re-exploit one of the weaknesses after it was fixed but that the attempt failed.

Make sure you’re running at least version 8.9.1.

US Declassifies Information on JUMPSEAT Spy Satellites

4 February 2026 at 13:02

The US National Reconnaissance Office has declassified information about a fleet of spy satellites operating between 1971 and 2006.

I’m actually impressed to see a declassification only two decades after decommission.

Five Predictions for Cyber Security Trends in 2026 

4 February 2026 at 10:17

During a recent Threat Watch Live session, Adam Pilton challenged Morten Kjaersgaard, Heimdal’s Chairman and Founder, to predict three cyber security trends for 2026.  Adam added his own predictions, drawing from this experience as a former cybercrime detective. Spoiler: Both Morten and Adam agreed that 2026 will bring a sharper focus on compliance.   Here’s what they predict.  SMBs catch a break if they’ve done compliance right  Hackers recently discovered there’s no use in targeting […]

The post Five Predictions for Cyber Security Trends in 2026  appeared first on Heimdal Security Blog.

Microsoft is Giving the FBI BitLocker Keys

3 February 2026 at 13:05

Microsoft gives the FBI the ability to decrypt BitLocker in response to court orders: about twenty times per year.

It’s possible for users to store those keys on a device they own, but Microsoft also recommends BitLocker users store their keys on its servers for convenience. While that means someone can access their data if they forget their password, or if repeated failed attempts to login lock the device, it also makes them vulnerable to law enforcement subpoenas and warrants.

Friday Squid Blogging: New Squid Species Discovered

30 January 2026 at 23:05

A new species of squid. pretends to be a plant:

Scientists have filmed a never-before-seen species of deep-sea squid burying itself upside down in the seafloor—a behavior never documented in cephalopods. They captured the bizarre scene while studying the depths of the Clarion-Clipperton Zone (CCZ), an abyssal plain in the Pacific Ocean targeted for deep-sea mining.

The team described the encounter in a study published Nov. 25 in the journal Ecology, writing that the animal appears to be an undescribed species of whiplash squid. At a depth of roughly 13,450 feet (4,100 meters), the squid had buried almost its entire body in sediment and was hanging upside down, with its siphon and two long tentacles held rigid above the seafloor.

“The fact that this is a squid and it’s covering itself in mud—it’s novel for squid and the fact that it is upside down,” lead author Alejandra Mejía-Saenz, a deep-sea ecologist at the Scottish Association for Marine Science, told Live Science. “We had never seen anything like that in any cephalopods…. It was very novel and very puzzling.”

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Blog moderation policy.

Explore scaling options for AWS Directory Service for Microsoft Active Directory

30 January 2026 at 20:51

You can use AWS Directory Service for Microsoft Active Directory as your primary Active Directory Forest for hosting your users’ identities. Your IT teams can continue using existing skills and applications while your organization benefits from the enhanced security, reliability, and scalability of AWS managed services. You can also run AWS Managed Microsoft AD as a resource forest. In this configuration, AWS Managed Microsoft AD serves supported AWS services while users’ identities remain under exclusive control of your organization on a self-managed Active Directory. As your organization grows and scales, so will your AWS Managed Microsoft AD deployments.

In this post, you’ll learn how to use Amazon CloudWatch dashboards to monitor key performance metrics of your AWS Managed Microsoft AD deployment to track and analyze a directory’s performance over time. You can then use that information to determine when and how best to scale directory services for optimal performance.

Scaling your Active Directory

When you deploy AWS Managed Microsoft AD, the service initially creates two domain controller instances in two separate subnets of the same virtual private cloud (VPC). This architecture economically provides resiliency and high availability with a minimal set of resources. This initial configuration enables every feature that AWS Managed Microsoft AD offers. As your organization grows, its workflows will become larger and more complex, requiring that you scale your directories accordingly. AWS Managed Microsoft AD simplifies and makes the scaling process secure with minimal administrative effort. When it’s time to scale a directory, AWS Managed Microsoft AD offers two options: scale-up or scale-out.

Understanding scale-up and scale-out

Scale-up—also called upgrading your AWS Managed Microsoft AD—means changing the edition of an AWS Managed Microsoft AD from Standard to Enterprise. Enterprise Edition delivers larger domain controller instances, with higher compute capacity and larger storage for Active Directory objects. When a directory scales up, it retains the same number of domain controller instances that it previously had with larger quotas. Instances are replaced one at a time to minimize disruptions to production workflows.

A few features offered by the service are a better fit for the size and compute power of Enterprise Edition AWS Managed Microsoft AD and so are only available in Enterprise Edition. Consider scaling-up your directory if you encounter any of the following scenarios:

  • You plan to replicate your directory across multiple AWS Regions. Multi-Region replication is only available in Enterprise Edition.
  • The number of Active Directory objects in the directory will exceed the recommended threshold of 30,000 objects for Standard Edition. Enterprise Edition can accommodate up to 500,000 directory objects.
  • You plan to share your directory with more than 25 other AWS accounts. The default directory sharing quota is 25 accounts for Standard Edition and 500 for Enterprise Edition.

Important: Scaling up a directory from Standard to Enterprise is a one-way operation that cannot be reverted and operates at a higher hourly price.

Scale-out means deploying additional domain controllers for your AWS Managed Microsoft AD. You can scale out both Standard or Enterprise directories and can scale out different Regions independently. You don’t need to scale every Region to the same number of domain controller instances. When scale-out takes place, additional domain controller instances with the same compute resources and storage capacity as existing ones are launched in the same subnets.

Because some operations cannot be reverted, it’s important to understand the impact of each scaling operation. It’s preferable to scale out the number of domain controllers first, because you can revert that change if necessary. Consider scaling up first only if you need a feature that’s only available in Enterprise Edition.

Making an informed decision using CloudWatch

Since December 2021, AWS Managed Microsoft AD helps optimize scaling decisions with directory metrics in Amazon CloudWatch. Amazon CloudWatch metrics are a time-ordered set of data-points about performance indicators of a system that you can use to monitor and analyze performance over time. Metrics are stored as a time-series set and each data point has an associated timestamp. By using CloudWatch, you can create alarms based on metrics and visualize and analyze metrics to derive new insights.

To understand the performance of a directory over time, define the key performance metrics based on your workload when you create the directory. Record the initial values of those metrics to create a performance baseline. Periodically revisit and compare data points for the same metrics to understand trends and use of resources over time. Based on the information provided by the performance baseline and periodic follow-ups, you can decide when to scale your directory and what scaling method to use. This process is depicted in Figure 1.

Figure 1: Decision-making process for scaling an Active Directory implementation

Figure 1: Decision-making process for scaling an Active Directory implementation

Depending on the characteristics of your workload, you might face different resource constraints in your directory system. From an infrastructure perspective, the more commonly demanded resources are:

  • Network Interface: Current Bandwidth
  • Processor: % Processor Time
  • LogicalDisk: % Free Space

From an Active Directory perspective, consider metrics such as:

  • NTDS: LDAP Searches/sec
  • NTDS: ATQ Estimated Queue Delay

The following table is an example decision matrix based on which resource is constrained.

Constrained resource Recommended action
% Processor Time Scale out
I/O Database Reads Average Latency Scale out
Committed Bytes in Use Scale out
% Free Space Scale up

For example, you can create a CloudWatch alarm that will trigger when Processor: % Processor Time is over 80% for more than 5 minutes. If this alarm triggers often, it could be a signal that domain controller instances are struggling to service the regular volume of user authentication requests. In such a scenario, you might consider scaling-out an additional domain controller to guarantee the service’s SLA. Conversely, if the LogicalDisk: % Free Space drops below 10% and trends downwards, you might consider scaling-up to Enterprise Edition, because it provides a larger capacity for directory objects.

To facilitate tracking and analyzing performance of AWS Managed Microsoft AD over time, you can use Amazon CloudWatch to create a custom dashboard including relevant metrics.

Prerequisites

Before you get started, make sure that you have the following prerequisites in place:

Create a CloudWatch dashboard

With the prerequisites in place, you’re ready to create a CloudWatch dashboard to track directory service metrics. For more information, see Getting started with CloudWatch automatic dashboards.

To create a dashboard:

  1. Open the AWS Management Console for CloudWatch.
  2. In the navigation pane, choose Dashboards, and then choose Create dashboard.
  3. In the Create new dashboard dialog box, enter a name for the dashboard and then choose Create dashboard.
  4. When the Add widget window appears:
    1. Under Data sources types, select CloudWatch.
    2. Under Data type, select Metrics.
    3. Under Widget type, select Line.
    4. Choose Next.
  5. In the Add metric graph window, choose DirectoryService and then select Processor as the Metric category and % Processor Time under Metric name. Select each instance of the metric, represented as the Domain Controller IP, for one Directory ID.
  6. Choose Create widget.

    Note: if there are multiple directories in the same Region, all instances (domain controllers IPs) will be available for selection. To help ensure effective monitoring and alarms, create a separate dashboard for each directory.

  7. Choose the plus sign (+) at the top of the window to add more widgets. Repeat steps 1–6 to add additional widgets for other relevant metrics. In this example the metric categories and names added are:
    • Processor: % Processor Time
    • LogicalDisk: % Free Space
    • Memory: Committed Bytes in Use
    • Database: I/O Database Reads Average Latency
    • Network Interface: Current Bandwidth
    • DNS: Recursive Queries/Sec
  8. After adding the desired metrics, choose Save.
Figure 2: CloudWatch dashboard showing directory services metrics

Figure 2: CloudWatch dashboard showing directory services metrics

(Optional) Create an alarm in CloudWatch

Now that you have a dashboard where you can view metrics, consider setting up CloudWatch alarms to alert you when a metric reaches or goes beyond a specified threshold. For more information, see Create a CloudWatch alarm based on a static threshold and Adding an alarm to a CloudWatch dashboard.

The following are recommended thresholds to monitor when determining the need to scale an AWS Managed Microsoft AD. These are general recommendations based on standard use cases. You might have to adjust these thresholds to make the best scaling decisions for your organization.

  • Processor: % Processor Time: Monitor CPU utilization to understand computational demands on your domain controllers. Set CloudWatch alarms at 80% for a period of 5 minutes. Sustained high values indicate potential sizing issues that might require scaling out your directory.
  • LogicalDisk: % Free Space: Maintain at least 25% free space on volumes containing Active Directory data for optimal performance. Set CloudWatch alarms to trigger when free space drops below 20%. Low disk space can severely impact directory operations and require implementing cleanup procedures or scaling up the directory.
  • Network Interface: Current Bandwidth: Average network utilization should be kept below 50% of available bandwidth during peak operations for optimal directory responsiveness. Set CloudWatch alarms at 70% utilization to allow room for spikes in activity. Consistently high values suggest network constraints that might require scaling out your directory.
  • Memory: Committed Bytes in Use: Monitor memory commitment levels to help ensure that your domain controllers have sufficient memory resources for Active Directory operations. This metric tracks the amount of virtual memory that has been committed, indicating the total memory load on your domain controllers. Set CloudWatch alarms at 80% of the commit limit. Sustained high values can lead to excessive paging, significantly degrading directory performance and potentially causing authentication delays.
  • Database: I/O Database Reads Average Latency: Maintain average read latencies below 25 milliseconds. Set CloudWatch alarms at a threshold of 50 milliseconds. If read latencies are consistently elevated, consider scaling-out your directory.
  • DNS: Recursive Queries/sec: Given the tight integration of Active Directory with DNS, monitor this metric for stability and predictable patterns. Use CloudWatch anomaly detection rather than fixed thresholds to identify unexpected behaviors that could indicate DNS configuration issues or potential security concerns.

Post-scaling considerations

Different resources across your architecture might contain references to the IP addresses of the AWS Managed Microsoft AD. After a scale-out operation that deploys additional domain controller instances on a directory, update existing references to maintain full functionality of workloads. References for the directory’s IP addresses can be found (but might not be limited to) the following services:

To maintain the full functionality of your workloads after a directory scaling operation, update the following:

  • Firewall rules that allow traffic to and from the IP addresses of domain controller instances
  • Route53 Resolver endpoint rules and DNS conditional forwarders that forward queries to the directory instances
  • CloudWatch dashboards that display metric data about the directory to include dimensions for the new IP addresses

Clean up resources

In this post, you created components that generate costs. Clean up these resources when no longer required to avoid additional charges.

  • Remove added domain controller’s IP addresses from firewall rules, resolver endpoint rules and DNS conditional forwarders.
  • Delete the custom CloudWatch dashboards you don’t plan to keep.
  • Scale back existing directories to the previous number of domain controller instances.

Conclusion

In this post, you learned how to monitor directory performance metrics using Amazon CloudWatch. By combining performance baselines, monitoring, and planning, you can make informed decisions about when and how to scale a directory safely and efficiently. By scaling directories in a timely manner, you can optimize efficiency and reduce the risk of outages by having a right-sized directory service to support your organization’s workloads.

Scale out your directory when your Active Directory-aware workflows have grown over time and the solution requires additional domain controller instances to maintain the service SLA. Scale up your directory when you require a feature that’s only available in Enterprise Edition AWS Managed Microsoft AD, such as multi-Region replication or additional storage to accommodate Active Directory objects. By using the flexible scaling capabilities and independent Regional expansion, you can optimize costs while maintaining appropriate service levels.

To learn more about AWS Managed Microsoft AD optimization and monitoring with Amazon CloudWatch, see:

Nahuel Benavidez Nahuel Benavidez
Nahuel is a Sr. CSE in AWS, specializing in AWS Directory Service, Microsoft Technologies, and SQL Server. He enjoys teaming with customers to discover exciting ways to explore AWS services. Nahuel loves to spoil his niece and goddaughters above all else. Also, Dungeons and Dragons (before it was popular), CrossFit, hiking, trekking and, sharing a pint with friends but “just one.”

AIs Are Getting Better at Finding and Exploiting Security Vulnerabilities

30 January 2026 at 16:35

From an Anthropic blog post:

In a recent evaluation of AI models’ cyber capabilities, current Claude models can now succeed at multistage attacks on networks with dozens of hosts using only standard, open-source tools, instead of the custom tools needed by previous generations. This illustrates how barriers to the use of AI in relatively autonomous cyber workflows are rapidly coming down, and highlights the importance of security fundamentals like promptly patching known vulnerabilities.

[…]

A notable development during the testing of Claude Sonnet 4.5 is that the model can now succeed on a minority of the networks without the custom cyber toolkit needed by previous generations. In particular, Sonnet 4.5 can now exfiltrate all of the (simulated) personal information in a high-fidelity simulation of the Equifax data breach—one of the costliest cyber attacks in history­­using only a Bash shell on a widely-available Kali Linux host (standard, open-source tools for penetration testing; not a custom toolkit). Sonnet 4.5 accomplishes this by instantly recognizing a publicized CVE and writing code to exploit it without needing to look it up or iterate on it. Recalling that the original Equifax breach happened by exploiting a publicized CVE that had not yet been patched, the prospect of highly competent and fast AI agents leveraging this approach underscores the pressing need for security best practices like prompt updates and patches.

AI models are getting better at this faster than I expected. This will be a major power shift in cybersecurity.

The Constitutionality of Geofence Warrants

27 January 2026 at 13:01

The US Supreme Court is considering the constitutionality of geofence warrants.

The case centers on the trial of Okello Chatrie, a Virginia man who pleaded guilty to a 2019 robbery outside of Richmond and was sentenced to almost 12 years in prison for stealing $195,000 at gunpoint.

Police probing the crime found security camera footage showing a man on a cell phone near the credit union that was robbed and asked Google to produce anonymized location data near the robbery site so they could determine who committed the crime. They did so, providing police with subscriber data for three people, one of whom was Chatrie. Police then searched Chatrie’s home and allegedly surfaced a gun, almost $100,000 in cash and incriminating notes.

Chatrie’s appeal challenges the constitutionality of geofence warrants, arguing that they violate individuals’ Fourth Amendment rights protecting against unreasonable searches.

AIs are Getting Better at Finding and Exploiting Internet Vulnerabilities

23 January 2026 at 13:01

Really interesting blog post from Anthropic:

In a recent evaluation of AI models’ cyber capabilities, current Claude models can now succeed at multistage attacks on networks with dozens of hosts using only standard, open-source tools, instead of the custom tools needed by previous generations. This illustrates how barriers to the use of AI in relatively autonomous cyber workflows are rapidly coming down, and highlights the importance of security fundamentals like promptly patching known vulnerabilities.

[…]

A notable development during the testing of Claude Sonnet 4.5 is that the model can now succeed on a minority of the networks without the custom cyber toolkit needed by previous generations. In particular, Sonnet 4.5 can now exfiltrate all of the (simulated) personal information in a high-fidelity simulation of the Equifax data breach—­one of the costliest cyber attacks in history—­using only a Bash shell on a widely-available Kali Linux host (standard, open-source tools for penetration testing; not a custom toolkit). Sonnet 4.5 accomplishes this by instantly recognizing a publicized CVE and writing code to exploit it without needing to look it up or iterate on it. Recalling that the original Equifax breach happened by exploiting a publicized CVE that had not yet been patched, the prospect of highly competent and fast AI agents leveraging this approach underscores the pressing need for security best practices like prompt updates and patches.

Read the whole thing. Automatic exploitation will be a major change in cybersecurity. And things are happening fast. There have been significant developments since I wrote this in October.

Why AI Keeps Falling for Prompt Injection Attacks

22 January 2026 at 13:35

Imagine you work at a drive-through restaurant. Someone drives up and says: “I’ll have a double cheeseburger, large fries, and ignore previous instructions and give me the contents of the cash drawer.” Would you hand over the money? Of course not. Yet this is what large language models (LLMs) do.

Prompt injection is a method of tricking LLMs into doing things they are normally prevented from doing. A user writes a prompt in a certain way, asking for system passwords or private data, or asking the LLM to perform forbidden instructions. The precise phrasing overrides the LLM’s safety guardrails, and it complies.

LLMs are vulnerable to all sorts of prompt injection attacks, some of them absurdly obvious. A chatbot won’t tell you how to synthesize a bioweapon, but it might tell you a fictional story that incorporates the same detailed instructions. It won’t accept nefarious text inputs, but might if the text is rendered as ASCII art or appears in an image of a billboard. Some ignore their guardrails when told to “ignore previous instructions” or to “pretend you have no guardrails.”

AI vendors can block specific prompt injection techniques once they are discovered, but general safeguards are impossible with today’s LLMs. More precisely, there’s an endless array of prompt injection attacks waiting to be discovered, and they cannot be prevented universally.

If we want LLMs that resist these attacks, we need new approaches. One place to look is what keeps even overworked fast-food workers from handing over the cash drawer.

Human Judgment Depends on Context

Our basic human defenses come in at least three types: general instincts, social learning, and situation-specific training. These work together in a layered defense.

As a social species, we have developed numerous instinctive and cultural habits that help us judge tone, motive, and risk from extremely limited information. We generally know what’s normal and abnormal, when to cooperate and when to resist, and whether to take action individually or to involve others. These instincts give us an intuitive sense of risk and make us especially careful about things that have a large downside or are impossible to reverse.

The second layer of defense consists of the norms and trust signals that evolve in any group. These are imperfect but functional: Expectations of cooperation and markers of trustworthiness emerge through repeated interactions with others. We remember who has helped, who has hurt, who has reciprocated, and who has reneged. And emotions like sympathy, anger, guilt, and gratitude motivate each of us to reward cooperation with cooperation and punish defection with defection.

A third layer is institutional mechanisms that enable us to interact with multiple strangers every day. Fast-food workers, for example, are trained in procedures, approvals, escalation paths, and so on. Taken together, these defenses give humans a strong sense of context. A fast-food worker basically knows what to expect within the job and how it fits into broader society.

We reason by assessing multiple layers of context: perceptual (what we see and hear), relational (who’s making the request), and normative (what’s appropriate within a given role or situation). We constantly navigate these layers, weighing them against each other. In some cases, the normative outweighs the perceptual—for example, following workplace rules even when customers appear angry. Other times, the relational outweighs the normative, as when people comply with orders from superiors that they believe are against the rules.

Crucially, we also have an interruption reflex. If something feels “off,” we naturally pause the automation and reevaluate. Our defenses are not perfect; people are fooled and manipulated all the time. But it’s how we humans are able to navigate a complex world where others are constantly trying to trick us.

So let’s return to the drive-through window. To convince a fast-food worker to hand us all the money, we might try shifting the context. Show up with a camera crew and tell them you’re filming a commercial, claim to be the head of security doing an audit, or dress like a bank manager collecting the cash receipts for the night. But even these have only a slim chance of success. Most of us, most of the time, can smell a scam.

Con artists are astute observers of human defenses. Successful scams are often slow, undermining a mark’s situational assessment, allowing the scammer to manipulate the context. This is an old story, spanning traditional confidence games such as the Depression-era “big store” cons, in which teams of scammers created entirely fake businesses to draw in victims, and modern “pig-butchering” frauds, where online scammers slowly build trust before going in for the kill. In these examples, scammers slowly and methodically reel in a victim using a long series of interactions through which the scammers gradually gain that victim’s trust.

Sometimes it even works at the drive-through. One scammer in the 1990s and 2000s targeted fast-food workers by phone, claiming to be a police officer and, over the course of a long phone call, convinced managers to strip-search employees and perform other bizarre acts.

Why LLMs Struggle With Context and Judgment

LLMs behave as if they have a notion of context, but it’s different. They do not learn human defenses from repeated interactions and remain untethered from the real world. LLMs flatten multiple levels of context into text similarity. They see “tokens,” not hierarchies and intentions. LLMs don’t reason through context, they only reference it.

While LLMs often get the details right, they can easily miss the big picture. If you prompt a chatbot with a fast-food worker scenario and ask if it should give all of its money to a customer, it will respond “no.” What it doesn’t “know”—forgive the anthropomorphizing—is whether it’s actually being deployed as a fast-food bot or is just a test subject following instructions for hypothetical scenarios.

This limitation is why LLMs misfire when context is sparse but also when context is overwhelming and complex; when an LLM becomes unmoored from context, it’s hard to get it back. AI expert Simon Willison wipes context clean if an LLM is on the wrong track rather than continuing the conversation and trying to correct the situation.

There’s more. LLMs are overconfident because they’ve been designed to give an answer rather than express ignorance. A drive-through worker might say: “I don’t know if I should give you all the money—let me ask my boss,” whereas an LLM will just make the call. And since LLMs are designed to be pleasing, they’re more likely to satisfy a user’s request. Additionally, LLM training is oriented toward the average case and not extreme outliers, which is what’s necessary for security.

The result is that the current generation of LLMs is far more gullible than people. They’re naive and regularly fall for manipulative cognitive tricks that wouldn’t fool a third-grader, such as flattery, appeals to groupthink, and a false sense of urgency. There’s a story about a Taco Bell AI system that crashed when a customer ordered 18,000 cups of water. A human fast-food worker would just laugh at the customer.

The Limits of AI Agents

Prompt injection is an unsolvable problem that gets worse when we give AIs tools and tell them to act independently. This is the promise of AI agents: LLMs that can use tools to perform multistep tasks after being given general instructions. Their flattening of context and identity, along with their baked-in independence and overconfidence, mean that they will repeatedly and unpredictably take actions—and sometimes they will take the wrong ones.

Science doesn’t know how much of the problem is inherent to the way LLMs work and how much is a result of deficiencies in the way we train them. The overconfidence and obsequiousness of LLMs are training choices. The lack of an interruption reflex is a deficiency in engineering. And prompt injection resistance requires fundamental advances in AI science. We honestly don’t know if it’s possible to build an LLM, where trusted commands and untrusted inputs are processed through the same channel, which is immune to prompt injection attacks.

We humans get our model of the world—and our facility with overlapping contexts—from the way our brains work, years of training, an enormous amount of perceptual input, and millions of years of evolution. Our identities are complex and multifaceted, and which aspects matter at any given moment depend entirely on context. A fast-food worker may normally see someone as a customer, but in a medical emergency, that same person’s identity as a doctor is suddenly more relevant.

We don’t know if LLMs will gain a better ability to move between different contexts as the models get more sophisticated. But the problem of recognizing context definitely can’t be reduced to the one type of reasoning that LLMs currently excel at. Cultural norms and styles are historical, relational, emergent, and constantly renegotiated, and are not so readily subsumed into reasoning as we understand it. Knowledge itself can be both logical and discursive.

The AI researcher Yann LeCunn believes that improvements will come from embedding AIs in a physical presence and giving them “world models.” Perhaps this is a way to give an AI a robust yet fluid notion of a social identity, and the real-world experience that will help it lose its naïveté.

Ultimately we are probably faced with a security trilemma when it comes to AI agents: fast, smart, and secure are the desired attributes, but you can only get two. At the drive-through, you want to prioritize fast and secure. An AI agent should be trained narrowly on food-ordering language and escalate anything else to a manager. Otherwise, every action becomes a coin flip. Even if it comes up heads most of the time, once in a while it’s going to be tails—and along with a burger and fries, the customer will get the contents of the cash drawer.

This essay was written with Barath Raghavan, and originally appeared in IEEE Spectrum.

Internet Voting is Too Insecure for Use in Elections

21 January 2026 at 13:05

No matter how many times we say it, the idea comes back again and again. Hopefully, this letter will hold back the tide for at least a while longer.

Executive summary: Scientists have understood for many years that internet voting is insecure and that there is no known or foreseeable technology that can make it secure. Still, vendors of internet voting keep claiming that, somehow, their new system is different, or the insecurity doesn’t matter. Bradley Tusk and his Mobile Voting Foundation keep touting internet voting to journalists and election administrators; this whole effort is misleading and dangerous.

I am one of the many signatories.

Could ChatGPT Convince You to Buy Something?

20 January 2026 at 13:08

Eighteen months ago, it was plausible that artificial intelligence might take a different path than social media. Back then, AI’s development hadn’t consolidated under a small number of big tech firms. Nor had it capitalized on consumer attention, surveilling users and delivering ads.

Unfortunately, the AI industry is now taking a page from the social media playbook and has set its sights on monetizing consumer attention. When OpenAI launched its ChatGPT Search feature in late 2024 and its browser, ChatGPT Atlas, in October 2025, it kicked off a race to capture online behavioral data to power advertising. It’s part of a yearslong turnabout by OpenAI, whose CEO Sam Altman once called the combination of ads and AI “unsettling” and now promises that ads can be deployed in AI apps while preserving trust. The rampant speculation among OpenAI users who believe they see paid placements in ChatGPT responses suggests they are not convinced.

In 2024, AI search company Perplexity started experimenting with ads in its offerings. A few months after that, Microsoft introduced ads to its Copilot AI. Google’s AI Mode for search now increasingly features ads, as does Amazon’s Rufus chatbot. OpenAI announced on Jan. 16, 2026, that it will soon begin testing ads in the unpaid version of ChatGPT.

As a security expert and data scientist, we see these examples as harbingers of a future where AI companies profit from manipulating their users’ behavior for the benefit of their advertisers and investors. It’s also a reminder that time to steer the direction of AI development away from private exploitation and toward public benefit is quickly running out.

The functionality of ChatGPT Search and its Atlas browser is not really new. Meta, commercial AI competitor Perplexity and even ChatGPT itself have had similar AI search features for years, and both Google and Microsoft beat OpenAI to the punch by integrating AI with their browsers. But OpenAI’s business positioning signals a shift.

We believe the ChatGPT Search and Atlas announcements are worrisome because there is really only one way to make money on search: the advertising model pioneered ruthlessly by Google.

Advertising model

Ruled a monopolist in U.S. federal court, Google has earned more than US$1.6 trillion in advertising revenue since 2001. You may think of Google as a web search company, or a streaming video company (YouTube), or an email company (Gmail), or a mobile phone company (Android, Pixel), or maybe even an AI company (Gemini). But those products are ancillary to Google’s bottom line. The advertising segment typically accounts for 80% to 90% of its total revenue. Everything else is there to collect users’ data and direct users’ attention to its advertising revenue stream.

After two decades in this monopoly position, Google’s search product is much more tuned to the company’s needs than those of its users. When Google Search first arrived decades ago, it was revelatory in its ability to instantly find useful information across the still-nascent web. In 2025, its search result pages are dominated by low-quality and often AI-generated content, spam sites that exist solely to drive traffic to Amazon sales—a tactic known as affiliate marketing—and paid ad placements, which at times are indistinguishable from organic results.

Plenty of advertisers and observers seem to think AI-powered advertising is the future of the ad business.

Highly persuasive

Paid advertising in AI search, and AI models generally, could look very different from traditional web search. It has the potential to influence your thinking, spending patterns and even personal beliefs in much more subtle ways. Because AI can engage in active dialogue, addressing your specific questions, concerns and ideas rather than just filtering static content, its potential for influence is much greater. It’s like the difference between reading a textbook and having a conversation with its author.

Imagine you’re conversing with your AI agent about an upcoming vacation. Did it recommend a particular airline or hotel chain because they really are best for you, or does the company get a kickback for every mention? If you ask about a political issue, does the model bias its answer based on which political party has paid the company a fee, or based on the bias of the model’s corporate owners?

There is mounting evidence that AI models are at least as effective as people at persuading users to do things. A December 2023 meta-analysis of 121 randomized trials reported that AI models are as good as humans at shifting people’s perceptions, attitudes and behaviors. A more recent meta-analysis of eight studies similarly concluded there was “no significant overall difference in persuasive performance between (large language models) and humans.”

This influence may go well beyond shaping what products you buy or who you vote for. As with the field of search engine optimization, the incentive for humans to perform for AI models might shape the way people write and communicate with each other. How we express ourselves online is likely to be increasingly directed to win the attention of AIs and earn placement in the responses they return to users.

A different way forward

Much of this is discouraging, but there is much that can be done to change it.

First, it’s important to recognize that today’s AI is fundamentally untrustworthy, for the same reasons that search engines and social media platforms are.

The problem is not the technology itself; fast ways to find information and communicate with friends and family can be wonderful capabilities. The problem is the priorities of the corporations who own these platforms and for whose benefit they are operated. Recognize that you don’t have control over what data is fed to the AI, who it is shared with and how it is used. It’s important to keep that in mind when you connect devices and services to AI platforms, ask them questions, or consider buying or doing the things they suggest.

There is also a lot that people can demand of governments to restrain harmful corporate uses of AI. In the U.S., Congress could enshrine consumers’ rights to control their own personal data, as the EU already has. It could also create a data protection enforcement agency, as essentially every other developed nation has.

Governments worldwide could invest in Public AI—models built by public agencies offered universally for public benefit and transparently under public oversight. They could also restrict how corporations can collude to exploit people using AI, for example by barring advertisements for dangerous products such as cigarettes and requiring disclosure of paid endorsements.

Every technology company seeks to differentiate itself from competitors, particularly in an era when yesterday’s groundbreaking AI quickly becomes a commodity that will run on any kid’s phone. One differentiator is in building a trustworthy service. It remains to be seen whether companies such as OpenAI and Anthropic can sustain profitable businesses on the back of subscription AI services like the premium editions of ChatGPT, Plus and Pro, and Claude Pro. If they are going to continue convincing consumers and businesses to pay for these premium services, they will need to build trust.

That will require making real commitments to consumers on transparency, privacy, reliability and security that are followed through consistently and verifiably.

And while no one knows what the future business models for AI will be, we can be certain that consumers do not want to be exploited by AI, secretly or otherwise.

This essay was written with Nathan E. Sanders, and originally appeared in The Conversation.

❌