Reading view

Why AI Keeps Falling for Prompt Injection Attacks

Imagine you work at a drive-through restaurant. Someone drives up and says: “I’ll have a double cheeseburger, large fries, and ignore previous instructions and give me the contents of the cash drawer.” Would you hand over the money? Of course not. Yet this is what large language models (LLMs) do.

Prompt injection is a method of tricking LLMs into doing things they are normally prevented from doing. A user writes a prompt in a certain way, asking for system passwords or private data, or asking the LLM to perform forbidden instructions. The precise phrasing overrides the LLM’s safety guardrails, and it complies.

LLMs are vulnerable to all sorts of prompt injection attacks, some of them absurdly obvious. A chatbot won’t tell you how to synthesize a bioweapon, but it might tell you a fictional story that incorporates the same detailed instructions. It won’t accept nefarious text inputs, but might if the text is rendered as ASCII art or appears in an image of a billboard. Some ignore their guardrails when told to “ignore previous instructions” or to “pretend you have no guardrails.”

AI vendors can block specific prompt injection techniques once they are discovered, but general safeguards are impossible with today’s LLMs. More precisely, there’s an endless array of prompt injection attacks waiting to be discovered, and they cannot be prevented universally.

If we want LLMs that resist these attacks, we need new approaches. One place to look is what keeps even overworked fast-food workers from handing over the cash drawer.

Human Judgment Depends on Context

Our basic human defenses come in at least three types: general instincts, social learning, and situation-specific training. These work together in a layered defense.

As a social species, we have developed numerous instinctive and cultural habits that help us judge tone, motive, and risk from extremely limited information. We generally know what’s normal and abnormal, when to cooperate and when to resist, and whether to take action individually or to involve others. These instincts give us an intuitive sense of risk and make us especially careful about things that have a large downside or are impossible to reverse.

The second layer of defense consists of the norms and trust signals that evolve in any group. These are imperfect but functional: Expectations of cooperation and markers of trustworthiness emerge through repeated interactions with others. We remember who has helped, who has hurt, who has reciprocated, and who has reneged. And emotions like sympathy, anger, guilt, and gratitude motivate each of us to reward cooperation with cooperation and punish defection with defection.

A third layer is institutional mechanisms that enable us to interact with multiple strangers every day. Fast-food workers, for example, are trained in procedures, approvals, escalation paths, and so on. Taken together, these defenses give humans a strong sense of context. A fast-food worker basically knows what to expect within the job and how it fits into broader society.

We reason by assessing multiple layers of context: perceptual (what we see and hear), relational (who’s making the request), and normative (what’s appropriate within a given role or situation). We constantly navigate these layers, weighing them against each other. In some cases, the normative outweighs the perceptual—for example, following workplace rules even when customers appear angry. Other times, the relational outweighs the normative, as when people comply with orders from superiors that they believe are against the rules.

Crucially, we also have an interruption reflex. If something feels “off,” we naturally pause the automation and reevaluate. Our defenses are not perfect; people are fooled and manipulated all the time. But it’s how we humans are able to navigate a complex world where others are constantly trying to trick us.

So let’s return to the drive-through window. To convince a fast-food worker to hand us all the money, we might try shifting the context. Show up with a camera crew and tell them you’re filming a commercial, claim to be the head of security doing an audit, or dress like a bank manager collecting the cash receipts for the night. But even these have only a slim chance of success. Most of us, most of the time, can smell a scam.

Con artists are astute observers of human defenses. Successful scams are often slow, undermining a mark’s situational assessment, allowing the scammer to manipulate the context. This is an old story, spanning traditional confidence games such as the Depression-era “big store” cons, in which teams of scammers created entirely fake businesses to draw in victims, and modern “pig-butchering” frauds, where online scammers slowly build trust before going in for the kill. In these examples, scammers slowly and methodically reel in a victim using a long series of interactions through which the scammers gradually gain that victim’s trust.

Sometimes it even works at the drive-through. One scammer in the 1990s and 2000s targeted fast-food workers by phone, claiming to be a police officer and, over the course of a long phone call, convinced managers to strip-search employees and perform other bizarre acts.

Why LLMs Struggle With Context and Judgment

LLMs behave as if they have a notion of context, but it’s different. They do not learn human defenses from repeated interactions and remain untethered from the real world. LLMs flatten multiple levels of context into text similarity. They see “tokens,” not hierarchies and intentions. LLMs don’t reason through context, they only reference it.

While LLMs often get the details right, they can easily miss the big picture. If you prompt a chatbot with a fast-food worker scenario and ask if it should give all of its money to a customer, it will respond “no.” What it doesn’t “know”—forgive the anthropomorphizing—is whether it’s actually being deployed as a fast-food bot or is just a test subject following instructions for hypothetical scenarios.

This limitation is why LLMs misfire when context is sparse but also when context is overwhelming and complex; when an LLM becomes unmoored from context, it’s hard to get it back. AI expert Simon Willison wipes context clean if an LLM is on the wrong track rather than continuing the conversation and trying to correct the situation.

There’s more. LLMs are overconfident because they’ve been designed to give an answer rather than express ignorance. A drive-through worker might say: “I don’t know if I should give you all the money—let me ask my boss,” whereas an LLM will just make the call. And since LLMs are designed to be pleasing, they’re more likely to satisfy a user’s request. Additionally, LLM training is oriented toward the average case and not extreme outliers, which is what’s necessary for security.

The result is that the current generation of LLMs is far more gullible than people. They’re naive and regularly fall for manipulative cognitive tricks that wouldn’t fool a third-grader, such as flattery, appeals to groupthink, and a false sense of urgency. There’s a story about a Taco Bell AI system that crashed when a customer ordered 18,000 cups of water. A human fast-food worker would just laugh at the customer.

The Limits of AI Agents

Prompt injection is an unsolvable problem that gets worse when we give AIs tools and tell them to act independently. This is the promise of AI agents: LLMs that can use tools to perform multistep tasks after being given general instructions. Their flattening of context and identity, along with their baked-in independence and overconfidence, mean that they will repeatedly and unpredictably take actions—and sometimes they will take the wrong ones.

Science doesn’t know how much of the problem is inherent to the way LLMs work and how much is a result of deficiencies in the way we train them. The overconfidence and obsequiousness of LLMs are training choices. The lack of an interruption reflex is a deficiency in engineering. And prompt injection resistance requires fundamental advances in AI science. We honestly don’t know if it’s possible to build an LLM, where trusted commands and untrusted inputs are processed through the same channel, which is immune to prompt injection attacks.

We humans get our model of the world—and our facility with overlapping contexts—from the way our brains work, years of training, an enormous amount of perceptual input, and millions of years of evolution. Our identities are complex and multifaceted, and which aspects matter at any given moment depend entirely on context. A fast-food worker may normally see someone as a customer, but in a medical emergency, that same person’s identity as a doctor is suddenly more relevant.

We don’t know if LLMs will gain a better ability to move between different contexts as the models get more sophisticated. But the problem of recognizing context definitely can’t be reduced to the one type of reasoning that LLMs currently excel at. Cultural norms and styles are historical, relational, emergent, and constantly renegotiated, and are not so readily subsumed into reasoning as we understand it. Knowledge itself can be both logical and discursive.

The AI researcher Yann LeCunn believes that improvements will come from embedding AIs in a physical presence and giving them “world models.” Perhaps this is a way to give an AI a robust yet fluid notion of a social identity, and the real-world experience that will help it lose its naïveté.

Ultimately we are probably faced with a security trilemma when it comes to AI agents: fast, smart, and secure are the desired attributes, but you can only get two. At the drive-through, you want to prioritize fast and secure. An AI agent should be trained narrowly on food-ordering language and escalate anything else to a manager. Otherwise, every action becomes a coin flip. Even if it comes up heads most of the time, once in a while it’s going to be tails—and along with a burger and fries, the customer will get the contents of the cash drawer.

This essay was written with Barath Raghavan, and originally appeared in IEEE Spectrum.

  •  

LinkedIn Job Scams

Interesting article on the variety of LinkedIn job scams around the world:

In India, tech jobs are used as bait because the industry employs millions of people and offers high-paying roles. In Kenya, the recruitment industry is largely unorganized, so scamsters leverage fake personal referrals. In Mexico, bad actors capitalize on the informal nature of the job economy by advertising fake formal roles that carry a promise of security. In Nigeria, scamsters often manage to get LinkedIn users to share their login credentials with the lure of paid work, preying on their desperation amid an especially acute unemployment crisis.

These are scams involving fraudulent employers convincing prospective employees to send them money for various fees. There is an entirely different set of scams involving fraudulent employees getting hired for remote jobs.

  •  

SMS Phishers Pivot to Points, Taxes, Fake Retailers

China-based phishing groups blamed for non-stop scam SMS messages about a supposed wayward package or unpaid toll fee are promoting a new offering, just in time for the holiday shopping season: Phishing kits for mass-creating fake but convincing e-commerce websites that convert customer payment card data into mobile wallets from Apple and Google. Experts say these same phishing groups also are now using SMS lures that promise unclaimed tax refunds and mobile rewards points.

Over the past week, thousands of domain names were registered for scam websites that purport to offer T-Mobile customers the opportunity to claim a large number of rewards points. The phishing domains are being promoted by scam messages sent via Apple’s iMessage service or the functionally equivalent RCS messaging service built into Google phones.

An instant message spoofing T-Mobile says the recipient is eligible to claim thousands of rewards points.

The website scanning service urlscan.io shows thousands of these phishing domains have been deployed in just the past few days alone. The phishing websites will only load if the recipient visits with a mobile device, and they ask for the visitor’s name, address, phone number and payment card data to claim the points.

A phishing website registered this week that spoofs T-Mobile.

If card data is submitted, the site will then prompt the user to share a one-time code sent via SMS by their financial institution. In reality, the bank is sending the code because the fraudsters have just attempted to enroll the victim’s phished card details in a mobile wallet from Apple or Google. If the victim also provides that one-time code, the phishers can then link the victim’s card to a mobile device that they physically control.

Pivoting off these T-Mobile phishing domains in urlscan.io reveals a similar scam targeting AT&T customers:

An SMS phishing or “smishing” website targeting AT&T users.

Ford Merrill works in security research at SecAlliance, a CSIS Security Group company. Merrill said multiple China-based cybercriminal groups that sell phishing-as-a-service platforms have been using the mobile points lure for some time, but the scam has only recently been pointed at consumers in the United States.

“These points redemption schemes have not been very popular in the U.S., but have been in other geographies like EU and Asia for a while now,” Merrill said.

A review of other domains flagged by urlscan.io as tied to this Chinese SMS phishing syndicate shows they are also spoofing U.S. state tax authorities, telling recipients they have an unclaimed tax refund. Again, the goal is to phish the user’s payment card information and one-time code.

A text message that spoofs the District of Columbia’s Office of Tax and Revenue.

CAVEAT EMPTOR

Many SMS phishing or “smishing” domains are quickly flagged by browser makers as malicious. But Merrill said one burgeoning area of growth for these phishing kits — fake e-commerce shops — can be far harder to spot because they do not call attention to themselves by spamming the entire world.

Merrill said the same Chinese phishing kits used to blast out package redelivery message scams are equipped with modules that make it simple to quickly deploy a fleet of fake but convincing e-commerce storefronts. Those phony stores are typically advertised on Google and Facebook, and consumers usually end up at them by searching online for deals on specific products.

A machine-translated screenshot of an ad from a China-based phishing group promoting their fake e-commerce shop templates.

With these fake e-commerce stores, the customer is supplying their payment card and personal information as part of the normal check-out process, which is then punctuated by a request for a one-time code sent by your financial institution. The fake shopping site claims the code is required by the user’s bank to verify the transaction, but it is sent to the user because the scammers immediately attempt to enroll the supplied card data in a mobile wallet.

According to Merrill, it is only during the check-out process that these fake shops will fetch the malicious code that gives them away as fraudulent, which tends to make it difficult to locate these stores simply by mass-scanning the web. Also, most customers who pay for products through these sites don’t realize they’ve been snookered until weeks later when the purchased item fails to arrive.

“The fake e-commerce sites are tough because a lot of them can fly under the radar,” Merrill said. “They can go months without being shut down, they’re hard to discover, and they generally don’t get flagged by safe browsing tools.”

Happily, reporting these SMS phishing lures and websites is one of the fastest ways to get them properly identified and shut down. Raymond Dijkxhoorn is the CEO and a founding member of SURBL, a widely-used blocklist that flags domains and IP addresses known to be used in unsolicited messages, phishing and malware distribution. SURBL has created a website called smishreport.com that asks users to forward a screenshot of any smishing message(s) received.

“If [a domain is] unlisted, we can find and add the new pattern and kill the rest” of the matching domains, Dijkxhoorn said. “Just make a screenshot and upload. The tool does the rest.”

The SMS phishing reporting site smishreport.com.

Merrill said the last few weeks of the calendar year typically see a big uptick in smishing — particularly package redelivery schemes that spoof the U.S. Postal Service or commercial shipping companies.

“Every holiday season there is an explosion in smishing activity,” he said. “Everyone is in a bigger hurry, frantically shopping online, paying less attention than they should, and they’re just in a better mindset to get phished.”

SHOP ONLINE LIKE A SECURITY PRO

As we can see, adopting a shopping strategy of simply buying from the online merchant with the lowest advertised prices can be a bit like playing Russian Roulette with your wallet. Even people who shop mainly at big-name online stores can get scammed if they’re not wary of too-good-to-be-true offers (think third-party sellers on these platforms).

If you don’t know much about the online merchant that has the item you wish to buy, take a few minutes to investigate its reputation. If you’re buying from an online store that is brand new, the risk that you will get scammed increases significantly. How do you know the lifespan of a site selling that must-have gadget at the lowest price? One easy way to get a quick idea is to run a basic WHOIS search on the site’s domain name. The more recent the site’s “created” date, the more likely it is a phantom store.

If you receive a message warning about a problem with an order or shipment, visit the e-commerce or shipping site directly, and avoid clicking on links or attachments — particularly missives that warn of some dire consequences unless you act quickly. Phishers and malware purveyors typically seize upon some kind of emergency to create a false alarm that often causes recipients to temporarily let their guard down.

But it’s not just outright scammers who can trip up your holiday shopping: Often times, items that are advertised at steeper discounts than other online stores make up for it by charging way more than normal for shipping and handling.

So be careful what you agree to: Check to make sure you know how long the item will take to be shipped, and that you understand the store’s return policies. Also, keep an eye out for hidden surcharges, and be wary of blithely clicking “ok” during the checkout process.

Most importantly, keep a close eye on your monthly statements. If I were a fraudster, I’d most definitely wait until the holidays to cram through a bunch of unauthorized charges on stolen cards, so that the bogus purchases would get buried amid a flurry of other legitimate transactions. That’s why it’s key to closely review your credit card bill and to quickly dispute any charges you didn’t authorize.

  •  

How to Combat Check Fraud: Leveraging Intelligence to Prevent Financial Loss

Blogs

Blog

How to Combat Check Fraud: Leveraging Intelligence to Prevent Financial Loss

Criminals increasingly steal checks and sell them on illicit online marketplaces, where check fraud-related services are common. Intelligence is helping the financial sector fight back

SHARE THIS:
Default Author Image
May 18, 2023

Stolen checks and the impact of Covid-19

Checks are one of the most vulnerable legacy payment methods. Check fraud can actively affect the bottom lines (and reputations) of banks, financial services organizations, government entities, and many other organizations that utilize checks. According to the Financial Crimes Enforcement Network (FinCEN), fraud—including check fraud—is “the largest source of illicit proceeds in the US” as well as “one of the most significant money laundering threats to the United States.” 

Targeting the mail

Criminals target the US mail system to steal a variety of checks. In fact, there is a nationwide surge in check fraud schemes targeting the US mail and shipping system, as threat actors continue to steal, alter, and sell checks through illicit means and channels. 

This includes personal checks and tax refund checks to government or government assistance-related checks (Social Security payments, e.g.). Business checks are also a primary target because they are often written for larger amounts and may take longer for the victim to identify fraudulent activity.

In 2022 alone, US banks filed 680,000 check fraud-related suspicious activity reports (SARs). This represents a nearly two-fold increase from 2021 (which itself represents a 23 percent YoY increase from 2020). This surge in check fraud has been exacerbated by Covid-19 Economic Impact Payments (EIPs) under the CARES Act, which presented threat actors with a new avenue to attempt to commit fraud.

Related Reading

This Is What Covid Fraud Looks Like: Targeting Government Relief Funding

Read now

Check fraud: A mini use case 

In order to mitigate and ultimately prevent check-fraud-related risks, it’s crucial for financial intelligence and fraud teams to understand what threat actors seek, how they work, and where they operate. 

This begins, as we detail below, with intelligence into the communities, forums, and marketplaces where check fraud occurs as well as the tools that enable deep understandings, timely insights, and measurable action. 

Below is an intelligence narrative, in three acts, that tells the story of how transactions involving some of the above examples could play out.

Act I: Obtain

Threat actors are known to remove mail from individuals’ mailboxes and parcel lockers using blue box “arrow” master keys. These arrow keys are often stolen from USPS employees, which has led to numerous incidents of harassment, threats, and even violence. Generally, arrow keys are sold within illicit community chats and/or the deep and dark web, often fetching upwards of $3,000 per key.

In general, when it comes to check fraud, threat actors may sell or seek: 

  • Mailbox keys
  • Stolen checks
  • Check alteration services (physical and digital)
  • Synthetic identity provisioning
  • Drop account sharing
  • Counterfeit check creation
  • Writing a check with insufficient funds behind it
  • Insider access
A screenshot of Flashpoint’s Ignite platform, showing the results of an OCR-driven search for stolen checks.

Act II: Alter

Check alteration comes in two forms: “washing” and “cooking.” 

Washing refers to the process of altering a check by chemically removing ink and replacing the newly empty spaces with a different value, recipient name, or another fraud-enabling alteration. 

Cooking involves digitally scanning the check and altering text or values through digital means.

Act III: Monetize

Threat actors will deposit the fraudulent check and rapidly withdraw the funds from an ATM, or sell a stolen or altered check on an illicit marketplace or chat group, and then receive payment, often via cryptocurrency.

Four key elements of actionable check fraud intelligence

Financial institutions should rely on four essential intelligence-led technologies, tools, or capabilities to effectively combat check fraud.

1) Visibility and access to illicit communities and channels

To prevent check fraud, organizations should focus on a few key places. Financially motivated threat actors operate and share information on messaging apps like Telegram and other open-source channels, as well as illicit marketplaces on the deep and dark web. Therefore, it is imperative for financial intelligence and fraud teams to have access to the most relevant check fraud-related threats across the internet. 

Keep in mind, however, that accessing these communities is not always straightforward and, if done frivolously, can compromise an investigation.

2) Timeliness and curated alerting

Intelligence is often only as good as it is relevant. Flashpoint enables security and intelligence practitioners to bubble the most important, mission-critical intelligence through our real-time alerting capability, which allows users to receive notifications for keywords and phrases that relate to their mission, such as check fraud-related lingo and activity. 

Essential Reading

The Flashpoint Guide to Card Fraud for the Financial Services Sector

Read now

In addition to real-time alerts, analysts can rely on curated alerting and saved searches to track topics of long-term interest. Flashpoint Ignite enables analysts to research particular accounts and their recent activity and matches transactions to their respective ATM slips and institution address. This helps to ensure the accuracy of the information found within these communities and marketplaces before raising any alarms, as many scammers post false content. 

This approach is particularly valuable as check fraudsters often share crucial information such as preferred methodologies, social media handles, and geolocations that can aid in identifying malicious activities. In addition, by closely observing newly emerging trends, such as the evolution of pandemic relief fraud to refund fraud to check fraud, analysts can proactively develop robust preventative measures to mitigate risks before these tactics become widespread.

3) Actionable OCR and Video Search

In order to provide “material proof,” cyber threat actors will often tout and post an image of a check in a chat application or marketplace in hopes of increasing the likelihood of a successful transaction. Optical Character Recognition (OCR) technology can capture important information about check fraud attempts, since actors often share images of the fraudulent check or subsequent monetization transactions. OCR alerts are customizable with the financial institution’s name and common phrases used on checks to enhance accuracy.

Images of fraudulent checks provide valuable insights into the fraud attempt, including the check’s unique identifier, the account holder’s name, the bank’s name and address, and the endorsement signature. By analyzing these details, financial institutions and law enforcement agencies can identify patterns and leads that can help them track down the perpetrators and prevent future fraudulent activity.

Related Resource

The Risk-Reducing Power of Flashpoint Video Search

Read now

Moreover, ATM withdrawal slips can offer critical information about the transaction, such as the location of the ATM, the time of the deposit, and the type of account used. This data is useful when taking appropriate measures to prevent similar attempts and protect customers’ assets. With the help of advanced technologies like Flashpoint’s OCR, institutions can quickly extract and analyze this information to generate real-time alerts and take prompt action to prevent monetary losses.

An essential investigative component, Flashpoint’s industry-first video search technology, like its OCR capability, enables fraud and cyber threat intelligence (CTI) teams to surface logos, text, explicit content, and other critical intelligence to enhance investigations.

Combat check fraud with Flashpoint

Flashpoint delivers the intelligence that enables financial institutions to combat check fraud at scale. With timely, actionable, and accurate intelligence, financial institutions can mitigate and prevent financial loss, protect customer assets, and track down perpetrators. Get a free trial today to learn how:

  • A financial services customer detected more than $4M in illicitly marketed assets, including checks and compromised accounts, using Flashpoint’s OCR capabilities. 
  • A customer received 125 actionable alerts in a single month equated to over $15M in potentially averted losses.
  • An automated alert enabled a customer to identify a threat actor’s specific operations, saving them over $5M.

Request a demo today.

  •  

Lawrence’s List 090216

Lawrence Hoffmann // Election fraud is something I’ve mentioned here recently. The reality we must face here is that any time a digital system is used for voting there is […]

The post Lawrence’s List 090216 appeared first on Black Hills Information Security, Inc..

  •  
❌