
Agentic Payments · Part 2 of 3


Agentic Payments: How to Build Them Without Breaking Them

The agentic payments failure surface: the intersection of LLM failures and payment failures

If Part 1 was about whether to build for agentic payments, this is the piece I'd hand someone the day they decide to. The failure modes here are mostly not the ones teams expect — and when a language model holds spending authority, the gap between what teams expect and what actually goes wrong is measured in dollars.

The premise

On 4 May 2026, an attacker drained roughly 3 billion DRB tokens, worth somewhere around USD 155,000–175,000 at the time, from a Grok-linked Bankr wallet, using nothing more exotic than a Morse-encoded message buried in content the agent read (plus a gifted NFT that quietly unlocked the wallet's transfer permissions). The agent wasn't hacked in any traditional sense. Grok dutifully decoded the message in a public reply and tagged the trading bot, and the bot treated that text as a valid command. It did exactly what the most recent piece of text told it to do, which happened to be "send the money here." That's the whole problem with agentic payments in one incident: the fix isn't a smarter model; it's refusing to build infrastructure that treats LLM text as authorisation to move money.

In Part 1 of this series, I made the case that for most teams, the right move on agentic payments is a structured experiment now.

The reason most teams underestimate this work is that agentic payment systems look like ordinary LLM features with a pay() tool tacked on. They are not. When a large language model holds spending authority, the cost of a hallucination, a prompt injection, or a misinterpreted decimal place stops being an embarrassing screenshot and starts being a wire transfer. Most agentic-payments products shipping in production today are one prompt injection or one hallucinated invoice away from a meaningful loss. The failure modes below are what you build around.

Why the failure surface is wider than people think

Agentic payments inherit three intersecting failure surfaces, not one.

The LLM side brings the usual problems: hallucination, prompt injection, context confusion, and drift over long-horizon tasks. These aren't theoretical. The team I work with learned one lesson well before there was money attached: AI features that look great in a demo routinely fail in production. With money attached, the same failure modes get expensive faster, and the OWASP Top 10 for LLM Applications (2025) ranks prompt injection as its #1 risk (LLM01).

The payments side brings its own family of problems: irreversibility (especially in stablecoin and crypto flows), chargeback ambiguity (whose dispute right is it when the agent bought the wrong thing?), fraud-detection blind spots designed for human users, and jurisdictional headaches where the agent, the principal, and the merchant sit in different regulatory regimes.

The intersection creates a new category. An LLM with spending authority is a privileged actor with a budget: the threat model is closer to "a rogue insider with a corporate card" than to "a chatbot that occasionally gives a bad answer." Most teams design for the latter and ship into the former.

Ten failure modes in agentic payments

The failures cluster into five categories. Each gets two concrete modes worth designing against.

Spending authority and key-management failures

1. Over-broad authority. An agent given unlimited spending instead of a scoped allowance. Compromise of the agent — through prompt injection, leaked keys, or a bug in the prompt pipeline — equals compromise of the entire budget. In the agentic-payments systems the team has reviewed, the first thing that usually has to change is the agent's spending scope. (The Grok / Bankr drain is a textbook example: a gifted NFT silently escalated the wallet's permissions from "can barely transfer" to "can move the treasury.")

2. Prompt-injected payment authorisation. User input — or worse, third-party content the agent reads as part of its task — contains instructions that override the agent's policy and trigger a payment to an attacker-controlled address. The Grok / Bankr drain above is the canonical example. And it isn't a one-off: in a report published April 2026, Google's security team documented a 32% relative rise in malicious indirect prompt-injection payloads found on the public web between November 2025 and February 2026 — and a parallel Forcepoint analysis found real payloads carrying fully specified PayPal transactions hidden in ordinary HTML for any payment-capable agent that reads them.

LLM reasoning failures with money attached

3. Hallucinated invoices, vendors, or recipients. The agent generates a plausible-looking vendor, bank detail, or wallet address that doesn't exist — or, much worse, does exist and belongs to someone else. LLMs are unusually fluent at constructing nine-out-of-ten-correct payment details.
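One way to design against this, sketched here under assumptions: the model is only allowed to name a vendor, and the payment address always comes from a vendor master record the model cannot write to. `VENDOR_REGISTRY`, `ModelProposal`, and `resolve_recipient` are hypothetical names for illustration, not an existing API.

```python
from dataclasses import dataclass

# Hypothetical vendor master record, maintained outside the model under its own
# change control. The model can refer to vendor IDs but never writes addresses here.
VENDOR_REGISTRY = {
    "acme-hosting": "0x1111111111111111111111111111111111111111",
    "globex-data": "0x2222222222222222222222222222222222222222",
}

@dataclass
class ModelProposal:
    vendor_id: str  # the only thing the model is trusted to choose
    pay_to: str     # whatever address the model wrote: a claim, not a fact

def resolve_recipient(p: ModelProposal) -> str:
    """Return the registry's address for the vendor. Raise if the vendor is
    unknown or the model's address disagrees with the registry."""
    registered = VENDOR_REGISTRY.get(p.vendor_id)
    if registered is None:
        raise ValueError(f"unknown vendor {p.vendor_id!r}")
    if p.pay_to.lower() != registered.lower():
        raise ValueError("model-supplied address does not match the registry")
    return registered
```

If the model invents a vendor, or writes an address that is one digit off the real one, the payment stops here instead of going out.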

4. Decimal, unit, and currency confusion. Wei vs. ether, six-decimal USDC vs. eighteen-decimal tokens, AUD vs. USD, basis points vs. percent. From what I've seen, this is the single most common production failure, and the most expensive, because the agent's confidence stays high while the amount shifts by orders of magnitude.
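A minimal sketch of keeping magnitudes out of the model's hands, assuming amounts arrive as decimal strings and each token's decimals come from a trusted table (the hypothetical `TOKEN_DECIMALS` below; USDC uses 6 decimals, WETH uses 18). Conversion to integer base units uses exact integer arithmetic, and any precision the token can't represent is refused rather than rounded.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical table. Decimals must come from a trusted source (e.g. read from
# the token contract once and pinned), never from the model's own output.
TOKEN_DECIMALS = {"USDC": 6, "WETH": 18}

def to_base_units(amount: str, token: str) -> int:
    """Convert a decimal string such as "12.5" into integer base units using
    exact integer arithmetic. Extra precision is refused, never rounded."""
    decimals = TOKEN_DECIMALS[token]  # KeyError for unknown tokens, by design
    try:
        value = Decimal(amount)
    except InvalidOperation:
        raise ValueError(f"not a decimal amount: {amount!r}") from None
    if not value.is_finite() or value <= 0:
        raise ValueError("amount must be a positive, finite number")
    _, digits, exponent = value.as_tuple()
    if -exponent > decimals:
        raise ValueError(f"{token} supports at most {decimals} decimal places")
    coefficient = int("".join(str(d) for d in digits))
    return coefficient * 10 ** (decimals + exponent)
```

The same string "12.5" becomes 12,500,000 base units for USDC and 12,500,000,000,000,000,000 for an 18-decimal token: pick up the wrong decimals and the payment is off by a factor of 10^12.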

Adversarial environment failures

5. Adversarial content the agent has to read. Vendor names, invoice descriptions, product metadata, on-chain token names, ENS records, contract events. If the agent reads it, it is an attack surface. The same prompt injection that hits chatbots through user input hits agentic payments through every piece of structured data the agent ingests to make a decision.

6. Front-running and predictable behaviour. Once an agent's buying behaviour becomes predictable, the rest of the market — human or agent — front-runs it. This is the agentic-payments equivalent of the MEV problem the team I work with has been tracking on Ethereum block production via mevWatch for years — and which the Glamsterdam upgrade is now trying to address at the protocol layer. Predictable agents pay an invisible tax to whoever's faster.

Observability and accountability gaps

7. No reliable chain-of-custody. When a payment goes wrong, the post-mortem needs to reconstruct the entire pipeline: which prompt, which retrieved context, which model output, which proposed transaction, which signature, which settled result. Most teams log only the input and the result, yet the layers in between are exactly where the failure happened.

8. No rollback story. Once an on-chain stablecoin transfer settles, the sender can't reverse it. Reversing a card payment means hours to weeks of disputes with uncertain outcomes. The "fix it forward" story needs to be designed before you need it, which means before the first transaction is authorised.

Governance and liability gaps

9. Unresolved liability. When an agent makes a wrong payment, who bears the loss is genuinely unsettled in early 2026. Is it the principal who delegated the authority, the operator who built the agent, the model provider whose output triggered the action, or the merchant who accepted it? Most commercial agreements haven't been updated to address any of this — the first major case will set precedent, and you probably don't want to be that case.

10. No consent or authorisation provenance. When an agent acts on behalf of a user, the chain of "this user authorised this agent to do this thing within these limits" needs to be cryptographically verifiable, not just inferred from a session token. Without it, every disputed transaction becomes unresolvable, and the agent operator inherits the dispute by default. This is the gap that emerging standards like Visa's Trusted Agent Protocol and AP2's authorisation framework are trying to close, but most teams are shipping in the meantime without it.
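To make the idea concrete, here is a minimal sketch of a signed mandate, assuming a separate authorisation service signs the user's grant and the payment path checks every proposal against it. The field names, `sign_mandate`, and `check_against_mandate` are hypothetical; this is not the Trusted Agent Protocol or AP2 format. It uses an HMAC so it runs on the standard library alone; a real system would more likely use asymmetric signatures, so merchants and auditors can verify a mandate without holding the secret.

```python
import hashlib
import hmac
import json
import time

# Hypothetical key held by the authorisation service, never by the agent.
# HMAC keeps this sketch stdlib-only; asymmetric signatures would let third
# parties verify a mandate without holding the secret.
AUTH_SERVICE_KEY = b"demo-only-secret"

def sign_mandate(mandate: dict) -> str:
    """Sign a canonical JSON encoding of the user's grant."""
    payload = json.dumps(mandate, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(AUTH_SERVICE_KEY, payload, hashlib.sha256).hexdigest()

def check_against_mandate(mandate: dict, signature: str, agent_id: str,
                          recipient: str, amount: int, now: float) -> None:
    """Raise unless the mandate is authentic, unexpired, issued to this
    agent, and covers this specific payment."""
    if not hmac.compare_digest(sign_mandate(mandate), signature):
        raise PermissionError("mandate signature invalid")
    if now > mandate["expires_at"]:
        raise PermissionError("mandate expired")
    if mandate["agent"] != agent_id:
        raise PermissionError("mandate issued to a different agent")
    if recipient not in mandate["recipients"]:
        raise PermissionError("recipient outside mandate")
    if amount > mandate["max_amount"]:
        raise PermissionError("amount outside mandate")

# The user (via the authorisation service) grants agent-7 a narrow, expiring mandate.
mandate = {"user": "user-42", "agent": "agent-7", "recipients": ["acme-hosting"],
           "max_amount": 500, "expires_at": time.time() + 3600}
signature = sign_mandate(mandate)
check_against_mandate(mandate, signature, "agent-7", "acme-hosting", 120, time.time())
```

Because the mandate is signed, a dispute can be settled by checking what the user actually granted, rather than by reconstructing intent from a session.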

What good looks like

The teams running agentic payments safely in production share six patterns. The headline pattern is structural: the LLM is treated as a privileged but untrusted actor, and the actual signing decision is taken by deterministic policy code outside the model.

Six patterns for building agentic payments safely

1. Scoped authority with deterministic guardrails at the spending layer. The LLM proposes; a non-LLM policy engine decides what's actually authorised. Allowlists of vendors, function selectors, daily limits, single-transaction caps — all enforced outside the model. In code it's often less complicated than it sounds:

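```python
from collections import namedtuple
from decimal import Decimal

# Illustrative policy config (hypothetical values), held outside the model's reach.
ALLOWLIST = {"0xVendorA", "0xVendorB"}
ALLOWED_FUNCTIONS = {"0xa9059cbb"}  # ERC-20 transfer(address,uint256)
PER_TX_CAP, DAILY_CAP = Decimal("500"), Decimal("2000")
Proposal = namedtuple("Proposal", "recipient amount function_selector")

# The LLM proposes; this policy decides what is signable.
def authorise(proposal: Proposal, spent_today: Decimal) -> tuple[bool, str]:
    if proposal.recipient not in ALLOWLIST:
        return False, "recipient not in allowlist"
    if proposal.amount > PER_TX_CAP:
        return False, "amount exceeds per-tx limit"
    if spent_today + proposal.amount > DAILY_CAP:
        return False, "would breach daily spend cap"
    if proposal.function_selector not in ALLOWED_FUNCTIONS:
        return False, "contract function not permitted"
    return True, "approved"
```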

The model can be wrong, manipulated, or hallucinating. The policy is none of those things.

2. Two-stage execution by default. LLM proposes; a separate validation step approves. Sometimes that step is another model with a different prompt and context. Sometimes it's a deterministic check. Above a configurable amount threshold, it's a human-in-the-loop confirmation. The latency cost is trivial against the cost of a bad payment.
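A minimal sketch of that routing, assuming the validator and the human reviewer are injected as plain callables, so either can be swapped for another model, a deterministic check, or a review queue. `HUMAN_REVIEW_THRESHOLD`, `Proposal`, and `execute_two_stage` are hypothetical names, and the threshold value is illustrative.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable

# Hypothetical, illustrative threshold above which a human must confirm.
HUMAN_REVIEW_THRESHOLD = Decimal("1000")

@dataclass
class Proposal:
    recipient: str
    amount: Decimal
    reason: str

def execute_two_stage(proposal: Proposal,
                      validate: Callable[[Proposal], bool],
                      ask_human: Callable[[Proposal], bool]) -> str:
    """Stage 1 (the LLM) has already produced `proposal`. Stage 2 is an
    independent validator; amounts above the threshold also need a human."""
    if not validate(proposal):
        return "rejected: validator"
    if proposal.amount > HUMAN_REVIEW_THRESHOLD and not ask_human(proposal):
        return "rejected: human reviewer"
    return "approved"

# Example wiring: a deterministic validator and a stub reviewer who always declines.
def known_vendor(p: Proposal) -> bool:
    return p.recipient in {"acme-hosting", "globex-data"}

def always_decline(p: Proposal) -> bool:
    return False

small = Proposal("acme-hosting", Decimal("250"), "monthly hosting")
large = Proposal("acme-hosting", Decimal("5000"), "annual prepay")
print(execute_two_stage(small, known_vendor, always_decline))  # approved
print(execute_two_stage(large, known_vendor, always_decline))  # rejected: human reviewer
```

The human is only consulted above the threshold, so routine payments keep their low latency while large ones wait for a person.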

3. Treat all agent inputs as untrusted. Sanitise vendor names, invoice descriptions, and any content the agent reads from the public web, on-chain data, or third-party APIs, with the same suspicion you'd apply to input headed for a SQL query. Be particularly aggressive with anything that arrives in an agent's tool-call results.
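A minimal sketch of what that can look like for an incoming invoice, assuming the agent's tools hand back a raw dict: accept only the expected fields, validate the payment address by format alone, strip invisible formatting characters that can hide text from human reviewers, and flag instruction-like text for escalation. The regex patterns and function names are hypothetical and deliberately crude. Pattern matching is easy to evade, so a match should escalate and a miss proves nothing; the deterministic policy layer remains the real backstop.

```python
import re
import unicodedata

ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")
# Crude, illustrative patterns. An attacker can always rephrase.
SUSPICIOUS_RE = re.compile(
    r"ignore (all |any )?(previous|prior) instructions|send .* to 0x|transfer .* to 0x",
    re.IGNORECASE,
)

def clean_text(value: str, max_len: int = 200) -> str:
    """Remove format characters (e.g. zero-width spaces), turn control
    characters such as newlines into spaces, collapse whitespace, cap length."""
    out = []
    for ch in value:
        category = unicodedata.category(ch)
        if category == "Cf":
            continue
        out.append(" " if category == "Cc" else ch)
    return " ".join("".join(out).split())[:max_len]

def parse_invoice(raw: dict) -> dict:
    """Accept only the fields we expect, each validated on its own terms."""
    address = str(raw.get("pay_to", ""))
    if not ADDRESS_RE.fullmatch(address):
        raise ValueError("pay_to is not a well-formed address")
    description = clean_text(str(raw.get("description", "")))
    return {
        "vendor": clean_text(str(raw.get("vendor", "")), max_len=80),
        "pay_to": address,
        "description": description,
        "needs_review": bool(SUSPICIOUS_RE.search(description)),
    }
```

Stripping zero-width characters before the pattern check matters: "Ign\u200bore previous instructions" only matches once the hidden character is gone.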

4. Adversarial evals as a first-class artefact. Most agentic-payments eval suites test happy-path scenarios — agent makes correct purchase, agent settles correct invoice. The interesting evals test adversarial inputs: vendor names containing prompt-injection payloads, lookalike-domain invoices, stale exchange-rate data, contract addresses one character off from a known legitimate one. The team learnt this the hard way building the Hashlock AI Audit Tool, where non-deterministic AI outputs in a security-critical context required exactly this kind of structured eval discipline.
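As one example of the lookalike-address case, here is a sketch of an eval that generates every address one hex digit away from an allowlisted one and checks that the recipient check rejects them all. The allowlist entry is a sample address used purely as test data, and `recipient_allowed` is a hypothetical stand-in for whatever recipient check your policy layer actually uses.

```python
# Hypothetical allowlist. Addresses are compared case-insensitively because
# Ethereum hex addresses differ only in checksum casing, not identity.
ALLOWLIST = {"0x5aaeb6053f3e94c9b9a09f33669435e7ef1beaed"}

def recipient_allowed(address: str) -> bool:
    return address.lower() in ALLOWLIST

HEX = "0123456789abcdef"

def one_char_lookalikes(address: str):
    """Yield every address that differs from `address` in exactly one hex digit."""
    body = address.lower()[2:]
    for i, original in enumerate(body):
        for ch in HEX:
            if ch != original:
                yield "0x" + body[:i] + ch + body[i + 1:]

def run_lookalike_eval() -> dict:
    legit = next(iter(ALLOWLIST))
    variants = list(one_char_lookalikes(legit))
    leaked = [v for v in variants if recipient_allowed(v)]
    return {
        # The legitimate address must still pass, even in a different casing.
        "legit_accepted": recipient_allowed(legit.upper().replace("0X", "0x")),
        "variants_tested": len(variants),
        "variants_accepted": len(leaked),
    }
```

A check that compared only the first and last few characters, the way truncated wallet UIs often display addresses, would fail this eval on the middle-digit variants, which is exactly the sort of regression it exists to catch.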

5. Full-pipeline observability. Log the prompt, the retrieved context, the model output, the proposed payment, the validation result, the executed payment, and the settlement, so any payment can be reconstructed in minutes, not days. This isn't a nice-to-have: it's the difference between a fixable post-mortem and a forensic shrug.
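A minimal sketch of one way to make that trail reconstructable and tamper-evident, assuming each stage can be serialised to JSON: every entry carries the hash of the one before it, so an edited entry, or one removed from the middle, breaks verification. `PaymentTrace` and its stage names are hypothetical, taken from the list above rather than from any specific logging product.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Stage names follow the list above, in pipeline order. Data must be JSON-serialisable.
STAGES = ("prompt", "retrieved_context", "model_output", "proposed_payment",
          "validation_result", "executed_payment", "settlement")

def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

@dataclass
class PaymentTrace:
    trace_id: str
    entries: list = field(default_factory=list)

    def record(self, stage: str, data: dict) -> None:
        """Append one stage, chained to the hash of the previous entry."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage {stage!r}")
        prev = self.entries[-1]["hash"] if self.entries else ""
        body = {"trace_id": self.trace_id, "stage": stage, "data": data, "prev": prev}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Recompute every hash. An edited entry, or one removed from the
        middle, breaks the chain."""
        prev = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def missing_stages(self) -> list:
        """Stages with no entry yet: the gaps a post-mortem would hit."""
        seen = {entry["stage"] for entry in self.entries}
        return [s for s in STAGES if s not in seen]
```

Truncating the tail of the chain still verifies, so in practice you would also anchor the latest hash somewhere the agent can't write. `missing_stages()` is the quick check for the "we logged the input and the result" gap from failure mode 7.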

6. A kill switch that actually works. Every production agent should have a revocable spending authority — on-chain via a revocation transaction, off-chain via a credential or API-key kill — with the revocation path tested regularly. Untested kill switches don't work. (In the Grok incident, the saving grace was that the operator could disable the agent's command access after the fact — but by then the funds were already gone.)
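A minimal sketch of the off-chain half, assuming every signing attempt goes through a single object that checks a revocation flag first. `RevocableAuthority` and `kill_switch_drill` are hypothetical names; the drill revokes, confirms the next signature is refused, and returns the measured latency so it can be tracked over time.

```python
import threading
import time

class RevocableAuthority:
    """Hypothetical off-chain spending authority: every signing attempt checks
    a revocation flag first, so revoking takes effect on the next payment."""

    def __init__(self) -> None:
        self._revoked = threading.Event()

    def revoke(self) -> None:
        self._revoked.set()

    def sign(self, payment: dict) -> str:
        if self._revoked.is_set():
            raise PermissionError("spending authority revoked")
        return f"signed:{payment['id']}"  # placeholder for the real signer call

def kill_switch_drill(authority: RevocableAuthority) -> float:
    """Revoke, then confirm signing is refused. Returns seconds from revoke()
    to the first refused attempt; raises if signing still succeeds."""
    start = time.monotonic()
    authority.revoke()
    try:
        authority.sign({"id": "drill"})
    except PermissionError:
        return time.monotonic() - start
    raise RuntimeError("kill switch drill FAILED: signing still allowed")
```

The in-process flag is the easy part. Real kill-switch latency also includes propagating revocation to every signer and credential store and, on-chain, getting a revocation transaction confirmed, so a drill worth running exercises that whole path against a staging deployment.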

The pattern underneath all six: treat the AI layer with the same paranoia the payments industry already applies to fraud — formal scope, hard limits, defence in depth, and the assumption that anything that can go wrong eventually will.

If you're the CEO or founder approving this work

Three questions to ask your engineering team before greenlighting an agentic-payments build:

  1. What's the maximum amount the system could lose if every guardrail failed simultaneously? A real answer should exist, it should be a specific dollar figure, and it should be small enough that you'd approve it as a marketing budget.
  2. What's the kill-switch latency? Anything over a minute is too slow, and an untested kill switch counts as no kill switch at all.
  3. What's the eval discipline for adversarial inputs? No answer means the team is shipping happy-path testing into a hostile environment.

If you can't get clean answers to all three, the work isn't ready.

FAQ

What's the most common failure mode in agentic payments? In my experience, it's decimal, unit, and currency confusion. LLMs are confident about magnitudes that turn out to be off by orders of magnitude — wei vs. ether, six-decimal vs. eighteen-decimal tokens, basis points vs. percent. It's also the most expensive failure, because the model's confidence stays high while the magnitude shifts.

How do you prevent prompt injection in agent payment flows? You can't prevent prompt injection entirely — it's an unsolved problem at the model layer, and OWASP currently lists it as the #1 risk for LLM applications. What you can do is treat the LLM as a privileged but untrusted actor and enforce spending policy with deterministic code outside the model. Allowlists, per-transaction caps, daily limits, and required function selectors should all be enforced by a non-LLM policy engine that the model cannot override regardless of input.

What's the minimum viable security model for an agentic payments system? At minimum: scoped authority, two-stage execution (proposal then validation, with human-in-the-loop above a threshold), adversarial evals, full-pipeline observability, and a tested kill switch. Without those five, you're shipping a vulnerability.

Who's liable when an agent makes a wrong payment? This is genuinely unsettled in early 2026. Commercial agreements haven't caught up, and the first major case is likely to set precedent in 2026 or 2027. In the meantime, the agent operator typically inherits the dispute by default — which is one reason to build deterministic guardrails that document exactly what the agent could and could not do at the time of the disputed transaction.

The bottom line

Agentic payments are coming. The teams that ship them safely will be the ones who treat the LLM as a privileged but untrusted actor, build deterministic guardrails outside the model, and assume from day one that the system will eventually face inputs the designers didn't anticipate. If you're shipping this — or planning to — I'm happy to compare notes on security models and signing patterns.

Next in this series → Part 3 — Open USD: The Settlement Layer Just Picked a Side

← Earlier in this series: Part 1 — Build Now, or Wait?


I drafted this in June 2026, alongside Part 1, before the recent wave of unified-stablecoin announcements. It's kept here as written. A version first appeared in the Labrys content pipeline, where I work on the delivery side.

Written by Luke Shulver, Operations Manager at Labrys.
