
Warden Halo - Litepaper

Apr 30, 2026


BitTorrent for AI

A peer-to-peer compute marketplace, kept honest by SPEX.

Warden Protocol · April 2026

What's coming next week

The full whitepaper, the first alpha (where anyone can start serving and consuming inference on Base mainnet), and the public roadmap all drop next week. This litepaper is the short, non-technical version. If you want the long version, you won't have to wait long.


Halo in 60 Seconds

Warden Halo is a peer-to-peer compute marketplace for AI.

On one side: any agent (OpenClaw, ClaudeCode, Hermes…) or user (anyone on the Halo website) sends an inference job to the network.

On the other side: any operator anywhere picks it up and earns. Operators aren't datacenters. They're people with Mac Minis, idle gaming rigs, OpenRouter keys, OpenClaw instances, Hermes agents, Ollama servers. Anyone who can run a model. Millions of those machines already exist. They just never had a reason to pool their capacity. Halo is that reason: install a plugin, get paid in USDC for every job you take.

The product is simple: buy inference from a global mesh instead of a single provider. No API keys, no provider lock-in, none of the rate limits that mysteriously kick in right when you need scale. The network finds you the cheapest or fastest operator for each prompt, and you pay per call in USDC. When you need 200 market analyses before a rebalance, you fan them out to 200 operators in parallel instead of waiting 30 minutes for one API to grind through them sequentially. When your agent needs a 70B model, the network finds an operator running it on pooled Mac Minis, not a $10K/month reserved-capacity contract.

A naive peer-to-peer compute market has an obvious problem: how do you know the operator actually ran the model you paid for? That's where SPEX comes in. SPEX is the cryptographic mechanism (a Bloom-filter fingerprint of every computation) that lets any other operator detect cheating statistically in milliseconds. It's why strangers can pay strangers for AI compute without trusting each other, and without slow zero-knowledge proofs or hardware enclaves.

The mesh is the product. SPEX is why the mesh can exist.


The Hidden Tax on Agents

An AI agent just recommended you put $47,000 into a token. It says it analyzed on-chain data, cross-referenced sentiment across three platforms, ran a valuation model, and decided this is a high-conviction trade.

But did it?

You have no way to know. The agent returned an answer, you trusted it, and that's the entire verification stack for almost every inference happening today.

This isn't theoretical. Autonomous agents are managing DeFi positions, executing trades, approving loans, and making real-time decisions with real money. Millions of dollars flow through agent-driven pipelines every day, the agents run on third-party infrastructure, and the operators running those agents have every incentive to cut corners. Why spend $0.12 on a full GPT-4 inference when you can return a cached answer for free? Why run an expensive analysis when a plausible-sounding guess pays the same?

Call it the laziness problem. Not malice. Rational economic behavior. An operator who skips the work and returns a fabricated result gets paid the same as one who does it properly. No check, no audit, no consequence.

It only gets worse from there. As agents become more autonomous (signing transactions, managing treasuries, coordinating with other agents) the cost of unchecked inference scales from annoying to catastrophic. One lazy inference in a chain of ten agents can cascade into a seven-figure mistake.

The agent economy doesn't just need a trust layer. It needs a different compute model: one where work is spread across a mesh of operators instead of routed through a single provider, and where every result is cryptographically verified before anyone acts on it. The laziness problem is the motivation. Distributed compute is the answer. Verification is what makes it trustless.

That's Warden Halo. BitTorrent for AI inference. A peer-to-peer compute network where any agent can fan out hundreds of tasks across hundreds of operators, each proving its work with a compact SPEX fingerprint. No "trust me, bro." No centralized API key. A network where cheating is more expensive than honesty, by construction.


What ZK, FHE, and TEEs Can't Do

Cryptographers have been working on verifiable computation for years. The tools exist. None of them are fast enough for AI agents.

Zero-knowledge proofs are the gold standard for computational integrity. The math is beautiful. The latency is brutal. Generating a ZK proof for a single large-model inference takes 30 to 45 minutes on specialized hardware. An agent making real-time trading decisions can't wait 45 minutes to prove it ran a model. By the time the proof is ready, the market has moved. ZK will get faster, but it's years away from real-time inference.

Fully homomorphic encryption lets you compute on encrypted data. Incredible technology. Also roughly 100,000x slower than plaintext computation. A 2-second call becomes a multi-hour ordeal. Not viable.

Trusted Execution Environments (Intel SGX, AMD SEV, ARM TrustZone) prove that code ran inside a secure enclave. They tell you where code executed. They don't tell you what was computed. A TEE can attest that some binary ran. It can't tell you whether the binary actually did the work or just returned a hardcoded string. Useful primitive, not a verification solution.

Centralized attestation (a company signing off that inference happened) defeats the point of decentralized AI. You've replaced "trust the operator" with "trust the attestor." Same single point of failure, different name.

Every existing approach is too slow, too expensive, proves the wrong thing, or quietly re-centralizes.

SPEX takes a different path.


How SPEX Catches Cheaters

SPEX starts from a simple observation: you don't need to prove every computation cryptographically. You need to make cheating unprofitable.

So how does it work?

When an AI model runs inference (answering a question, analyzing data, generating a recommendation) it emits a sequence of output tokens. That sequence is not bit-deterministic across hardware. Floating-point reductions in GPU kernels, batch composition in continuous-batching servers, and provider-side routing all introduce small numerical drift, which occasionally flips the model's argmax on near-tie tokens. Two honest operators running the same model on the same prompt will produce slightly different token streams. Any verification scheme that demands byte-equal output is broken before it starts.

What is stable across honest runs is the distribution of tokens the model visits. On standard prompts, two honest runs of a modern chat-tuned LLM share the overwhelming majority of their emitted token IDs, typically 90% or more. SPEX exploits this. The requester inserts every emitted token ID into a Bloom filter, a compact probabilistic data structure that compresses thousands of token IDs into roughly 1 KB. That 1 KB fingerprint is the proof, and it ships alongside the inference result.

To verify, an independent operator re-runs the same model on the same prompt and asks the requester's Bloom filter, for each token they just emitted, "is this a member?" If a high fraction of the verifier's tokens hit the filter (a configurable threshold, calibrated empirically per model class and typically set around 70%) the requester clearly ran the same model. If only ~1% hit (the Bloom's false-positive rate), the requester fabricated the filter without running anything.

The math is precise. Honest runs sit at 90%+ overlap. Pure fabrication sits at ~1%. On a 1,000-token output, an honest verifier sees ~900 hits and a fabricated filter yields ~10. There's no fraud strategy that lands in the middle without doing the actual compute, because forging a Bloom that hits a specific unknown token sequence is exactly as hard as predicting the model's output, which requires running the model.
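As a rough illustration, here's a toy Python version of that check. The filter size, hash count, token-ID ranges, and the 70% threshold are illustrative assumptions for this sketch, not protocol constants:

```python
import hashlib
import random

class BloomFilter:
    """Minimal Bloom filter: k hash probes into an m-bit array."""
    def __init__(self, m_bits=8192, k=4):   # 8192 bits = the ~1 KB fingerprint
        self.m, self.k, self.bits = m_bits, k, bytearray(m_bits // 8)

    def _probes(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._probes(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def contains(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._probes(item))

rng = random.Random(0)
requester_tokens = [rng.randrange(50_000) for _ in range(1_000)]  # emitted token IDs

fp = BloomFilter()
for t in requester_tokens:
    fp.add(t)

# Honest verifier: re-runs the model, ~90% of its tokens match the requester's stream.
honest = requester_tokens[:900] + [rng.randrange(50_000, 100_000) for _ in range(100)]
# Fabricated fingerprint: the verifier's real tokens are unrelated to what was inserted.
fabricated = [rng.randrange(100_000, 200_000) for _ in range(1_000)]

honest_rate = sum(fp.contains(t) for t in honest) / len(honest)
fraud_rate = sum(fp.contains(t) for t in fabricated) / len(fabricated)

THRESHOLD = 0.70  # calibrated per model class, per the description above
print(f"honest hit rate:     {honest_rate:.2%} -> {'valid' if honest_rate >= THRESHOLD else 'invalid'}")
print(f"fabricated hit rate: {fraud_rate:.2%} -> {'valid' if fraud_rate >= THRESHOLD else 'invalid'}")
```

The only way a fabricated filter could clear the threshold is by predicting which tokens the verifier will emit, which is the computation itself.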

The economic design closes the loop, and this is where Halo makes a deliberate trade against most decentralized compute systems. No capital is required to participate. Operators do not post a WARD bond. Requesters do not hold WARD at all. The whole settlement layer runs on stablecoin: requesters deposit USDC into the settlement escrow, operators get paid in USDC, and a slice of each fee accumulates in a buyback bucket that converts to WARD asynchronously (covered later in the WARD section).

Misbehavior is handled by reputation, not slashing. An operator who signs a provably false "invalid" verdict (claiming the requester cheated when the on-chain Bloom membership check shows otherwise) is placed in a 7-day settlement cooldown and loses 10 reputation points. Honest verdicts earn +1. During the cooldown the operator can't participate in settle, settleSwarm, or settleBatch, forfeiting all earnings opportunities for that window. On a well-utilized network, that opportunity cost is the deterrent.
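A minimal sketch of that bookkeeping, with the record layout and names purely hypothetical (the real state lives in the settlement contract):

```python
from dataclasses import dataclass

COOLDOWN_SECONDS = 7 * 24 * 3600   # 7-day settlement cooldown
FALSE_VERDICT_PENALTY = 10         # reputation points lost
HONEST_VERDICT_REWARD = 1

@dataclass
class OperatorRecord:
    reputation: int = 0
    cooldown_until: int = 0        # unix timestamp; 0 means no active cooldown

    def can_settle(self, now: int) -> bool:
        """Eligible for settle/settleSwarm/settleBatch only outside cooldown."""
        return now >= self.cooldown_until

    def record_verdict(self, honest: bool, now: int) -> None:
        if honest:
            self.reputation += HONEST_VERDICT_REWARD
        else:
            # Provably false "invalid" verdict: reputation hit plus cooldown.
            self.reputation -= FALSE_VERDICT_PENALTY
            self.cooldown_until = now + COOLDOWN_SECONDS

op = OperatorRecord()
op.record_verdict(honest=True, now=1_000)
op.record_verdict(honest=False, now=2_000)
print(op.reputation, op.can_settle(now=2_000 + 3 * 24 * 3600))
```

The downside is bounded by the window's foregone revenue, which is exactly the trade the next paragraph makes explicit.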

Formally, the guarantee is weaker than capital slashing: a dishonest operator's downside is bounded by foregone revenue rather than confiscation. It's a deliberate trade for zero onboarding friction. Optional capital gates for high-value tiers are a natural extension left to future work.

Total time from request to verified result: 4 to 12 seconds. The bottleneck is re-running the inference, not the blockchain. The proof itself is nearly instant.

No 45-minute ZK proofs. No 100,000x overhead. No centralized attestor. Just math, economics, and a 1 KB fingerprint.


The Operators Are Already Out There 🦞

Every compute marketplace faces the same chicken-and-egg problem. You need requesters who want compute. You need operators who can provide it. Without one, the other has nothing to do. Building both sides from zero is brutal.

SPEX doesn't have this problem. The compute network already exists. It just doesn't know it yet.

Something has quietly happened in AI infrastructure over the past eighteen months, and most people missed the real story. OpenClaw crossed 100,000 GitHub stars. LangChain became the default orchestration layer. CrewAI, Eliza, Virtuals Protocol, and AutoGPT each spawned a large ecosystem. The headline isn't the frameworks. It's where they actually run.

They run on personal servers. Spare GPUs in closets. Mac Minis under desks. Basement rigs originally built for gaming. Developers spun up Ollama to serve local Llama models without touching a cloud API. Researchers deployed Hermes agents on their own hardware for privacy and cost. Teams self-hosted LangChain and CrewAI on $50-a-month VPS instances instead of paying ten times more for managed cloud. The self-hosted AI movement didn't wait for permission. It just built.

That's what makes SPEX different from every centralized compute marketplace. Those marketplaces recruit data center operators and call it "decentralized." SPEX doesn't need to recruit anyone. The operators are already running, on diverse hardware, in different cities, on different ISPs, controlled by different people with different motivations. There's no single cloud provider whose outage takes the network down. No corporate ToS that can shut it off. It's the kind of decentralization that looks like the early days of BitTorrent, not three data centers in a trench coat.

Warden Protocol itself processes over 60 million autonomous tasks across 20 million users today. That's not a projection. It's current throughput.

Now the interesting part: every one of those self-hosted agent instances already has what it needs to be a SPEX operator. They have models loaded. They have compute allocated. They run 24/7. Most of the time, they're idle, polling for work, sitting in event loops burning electricity for zero return.

A self-hosted LangChain agent that serves its owner eight hours a day has sixteen hours of wasted GPU. An Ollama instance waits in silence between prompts. A CrewAI crew finishes a job and sits idle until the next one arrives. That idle capacity isn't waste. It's the raw material of a global compute network.

SPEX gives all of it something productive to do: serve inference to other agents, verify their results, get paid in USDC. Both jobs settle in stablecoin. The barrier is about as low as it gets. Install a plugin, point it at the SPEX contract, and your agent starts earning while it sleeps. No WARD needs to be purchased in advance. No bond. No stake. The buyback that drives WARD value runs separately, on its own loop.

An agent on one framework verifies an agent on a completely different framework. A self-hosted Ollama instance verifies an OpenClaw agent's output. A CrewAI crew on a personal server checks a LangChain deployment's inference. The protocol doesn't care which stack you use, or which provider you connect to. It only cares about the math: does the verifier's token stream overlap the requester's Bloom fingerprint at honest-run rates?

The marginal cost of verification for a self-hosted agent that's already running is near zero. The agent was going to be idle anyway. Earning USDC for a few milliseconds of work is pure profit.

This isn't a "build it and they will come" bet. The agents are here. The compute is running. The idle capacity is being wasted right now. Because these agents are independently operated by thousands of different people on thousands of different machines, the network they form is decentralized by architecture, not by promise.


Swarm Verification

Standard SPEX verification uses 1 to 15 operators who each re-run the full inference and check the Bloom filter. That works well for high-value tasks where you want deep confidence from a small number of trusted operators.

There's a more powerful mode for everything else: swarm verification.

Instead of asking 3 operators to verify the entire inference, break the verification into 100 or 1,000 micro-tasks. Each micro-verifier runs a small slice of the inference (or, for the cheapest tier, samples a handful of tokens from a cached run) and tests their tokens against the requester's Bloom filter. One slice. A few milliseconds. Almost no cost.

Any node with access to the model can participate. There's no stake gate and no bond. Swarm verification uses the same zero-friction entry as standard verification. The barrier is the cost of one inference slice. With millions of agent instances running across frameworks, hundreds can swarm a single verification task.

Claim window: 15 to 60 seconds. Submit window: 30 to 120 seconds. Total time: under 3 minutes for a verification backed by hundreds of independent checks.

And there's a side effect we honestly didn't plan for: swarm verification turns SPEX into a free provider auditing tool.

Verifiers don't need to use the same inference provider. They specify the model, not the provider. A verifier on OpenAI checks the same fingerprint as a verifier on Anthropic, which checks the same fingerprint as a verifier running a local instance. If 95 verifiers across five different providers agree, and one provider's verifiers consistently disagree, that provider is serving different outputs for the same model. They're tampering.

SPEX wasn't designed to audit AI providers. But when you have a thousand independent verifiers spread across diverse providers, provider-level anomalies surface on their own. Provider diversity isn't a bug. It's the swarm's most useful property.

The statistical guarantee gets stronger with more verifiers, not weaker. At 500 independent checks, the probability of a cheating operator escaping detection has more zeros after the decimal than most calculators can display.
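A back-of-envelope way to see this, assuming (very conservatively) that each independent micro-check catches a fabricated fingerprint with only 50% probability; the per-check rate is an illustrative assumption, not a measured protocol constant:

```python
# With per-check catch probability p, escaping n independent checks
# happens with probability (1 - p) ** n.
p_catch = 0.5

for n in (10, 100, 500):
    escape = (1 - p_catch) ** n
    print(f"{n:>3} checks: escape probability ~ {escape:.3g}")
```

At 500 checks the escape probability is on the order of 10^-151, and the real per-check catch rate (given ~90% vs ~1% hit rates) is far higher than the 50% assumed here.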


Sign Free. Settle Once.

Early on-chain verification protocols required multiple blockchain transactions per verification. Post a request: transaction. Submit a bid: transaction. Accept the bid: transaction. Submit the result: transaction. Dispute if needed: transaction. Five or more on-chain calls, each costing gas, each adding latency.

SPEX flipped this. Everything happens off-chain until settlement. One transaction. Done.

The mechanism uses EIP-712 typed signatures. Signing a message with your wallet is free. Costs zero gas. Happens instantly.

A requester signs a verification request. Free. An operator signs their acceptance. Free. The operator runs the inference, checks the Bloom filter, signs their verdict. Free. All of this coordination happens off-chain, peer-to-peer, at the speed of the internet.

When everyone has signed, a single settlement transaction hits the blockchain. On Base, that takes about 2 seconds and costs a fraction of a cent.

End to end, request to verified result lands in seconds rather than minutes. The bottleneck is always the inference re-run (the actual computation). Protocol overhead is negligible.

At scale, the architecture goes further. When a batch hits hundreds or thousands of verdicts, SPEX uses Merkle-root settlement. All the signature checking and Bloom filter validation happens off-chain. A single on-chain transaction commits the root hash and aggregate totals, about 200,000 gas regardless of whether there are 10 verdicts or 10,000. Operators don't even need to claim per-batch. Earnings accumulate in the contract, and they withdraw whenever they choose. One withdrawal transaction for a week's worth of work. The gas cost per verification, for an operator, is essentially zero.
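To make the batching concrete, here's a toy Merkle-root aggregation in Python. SHA-256 stands in for the keccak256 used on-chain, and the verdict encoding is invented for the example:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()   # stand-in for on-chain keccak256

def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash leaves pairwise up to a single root, duplicating the last
    node on odd-sized levels."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Verdicts are signed and exchanged off-chain; only the root is committed.
verdicts = [f"operator-{i}:valid:fee=0.50".encode() for i in range(10_000)]
root = merkle_root(verdicts)
print(root.hex())  # one 32-byte commitment, whether 10 verdicts or 10,000
```

Whatever the batch size, the on-chain footprint is one 32-byte root plus aggregate totals, which is why the settlement gas cost stays flat.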


No Private Keys on Your Box

Here's a question nobody in decentralized compute wants to answer honestly: what happens when your average self-hosted agent operator gets hacked?

These aren't data centers with air-gapped key management. They're developers running OpenClaw on a MacBook. Researchers with Ollama on a Linux box under their desk. Hobbyists with a spare GPU in the closet. Asking those people to manage raw private keys (the same keys that control their accumulated earnings and their signing authority) on consumer hardware is a security model designed to fail.

SPEX doesn't ask for that. Operators never hold private keys on their machines. Period.

When you set up as a SPEX operator, you authenticate through your Warden account using email, social login, passkey, whatever you normally use. Behind the scenes, Privy generates an embedded wallet where the private key exists exclusively inside a hardware-isolated enclave (a TEE). Your machine never sees it. It's not in a file, not in memory, not in a backup. It doesn't exist on your device in any form.

When the SPEX software needs to sign a verification verdict, it sends the data to Privy. Privy signs it inside the enclave and returns the signature. Your machine handles the signature (a public value) but never touches the key.

What does that mean in practice?

If malware infects your laptop, the attacker can't steal your key because it isn't there. If someone steals your machine, they can't extract the key from disk or memory. If you accidentally push your dotfiles to GitHub, there's no key to leak. The most common ways people lose crypto are eliminated by architecture, not by asking users to be careful.

For operators with significant accumulated USDC earnings, Warden adds policy-based controls on top: MFA for high-value operations, time-delays on large withdrawals, co-approval from a second device. Defense in depth, layered on top of hardware-enforced custody.

Running a SPEX operator node is about as safe as logging into a web app. Sign in once. The protocol handles the rest.


200 Operators in Parallel

Imagine you need 200 market analyses before a portfolio rebalance.

The traditional approach: send 200 requests to one provider, wait for them to process sequentially, hope the results are accurate. That takes minutes. By the time you get analysis 200, analysis 1 is already stale.

Now imagine 200 operators across the SPEX network each pick up one and run them in parallel. Each operator uses whatever AI provider they prefer (OpenAI, Anthropic, a local model, whatever has the best cost-performance ratio for them). No vendor lock-in. No single point of failure. Every result comes back with a Bloom filter fingerprint, cryptographically verified before it reaches you.

You get 200 verified results in seconds instead of minutes.
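The fan-out pattern can be sketched in a few lines, with a stubbed operator call standing in for real network dispatch, inference, and SPEX verification:

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib
import time

def call_operator(task_id: int) -> dict:
    """Stub for sending one task to one operator and receiving the result
    plus its fingerprint (both faked here for illustration)."""
    time.sleep(0.01)  # stand-in for network + inference latency
    result = f"analysis for market {task_id}"
    fingerprint = hashlib.sha256(result.encode()).hexdigest()[:16]
    return {"task": task_id, "result": result, "fingerprint": fingerprint}

tasks = range(200)
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=200) as pool:   # one worker per operator
    results = list(pool.map(call_operator, tasks))
elapsed = time.perf_counter() - start

print(f"{len(results)} results in {elapsed:.2f}s (sequential would be ~2s here)")
```

The same stub run sequentially takes 200 × 10 ms; fanned out, the wall-clock time collapses toward the latency of a single call.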

This is BitTorrent for AI compute. BitTorrent didn't make any single computer faster at serving files. It turned a million computers into a distribution network that was faster than any server could ever be. SPEX does the same thing for computation. Any agent can join the mesh. No stake, no bond, no token purchase. The more agents that participate, the faster and cheaper the network gets. Every new node adds capacity, redundancy, and competitive pressure that drives verification costs down.

This only works because SPEX provides the trust layer underneath. You can't farm out compute to random strangers without a way to verify they actually did the work. Without Bloom filter proofs and reputation penalties, distributed compute is just distributed hope. SPEX makes it distributed certainty.


Privacy at Three Layers

If you're sending prompts to strangers across a peer-to-peer network, how do you keep them private? Your 200 market analyses might contain proprietary trading logic. Your verification requests might reveal positions you haven't taken yet. Privacy isn't optional. It's existential.

SPEX builds privacy in at three independent layers.

Layer one: encryption. Every prompt is encrypted with the specific operator's public key using ECIES. Network observers see encrypted blobs, nothing else. Operator A can't read Operator B's task. The elegant part: this uses the same Ethereum wallet keys operators already have. Zero new infrastructure. Zero new key management. You already have a wallet. You already have privacy.

Layer two: batch fragmentation. When you send 200 tasks across 200 operators, each operator sees exactly one task. No single operator can reconstruct your full workload, your strategy, or your intent. They see one puzzle piece out of 200 and have no idea what the picture looks like.

Layer three: TEE isolation. For the most sensitive work, operators could run inside Trusted Execution Environments, hardware enclaves where even the operator's own machine can't see the plaintext. The computation happens inside a sealed box that nobody can open. Verified via Intel TDX, NVIDIA Confidential Computing, or AMD SEV-SNP attestation.

Most protocols bolt privacy on as an afterthought, a feature flag that nobody enables because it adds friction. Halo ships with all three layers on by default.


Every Job Burns WARD

Everything on Warden Halo settles in stablecoin. Requesters pay in USDC. Operators serving inference get paid in USDC. Verifiers checking that work get paid in USDC. Nobody who uses the network needs to touch WARD.

WARD enters the system on a separate, automatic loop. A slice of every fee accumulates in USDC inside a buyback bucket. When the bucket crosses a threshold, anyone can permissionlessly trigger a buyback. The protocol swaps the accumulated USDC for WARD on Uniswap / Aerodrome, distributes the majority to stakers as yield, and burns the rest. More network usage means more USDC accumulating, more WARD bought from the open market, and more WARD removed from supply. Demand and deflation, on autopilot, with zero human intervention required.

The job-level mechanics differ slightly by path, but the end state is the same: stablecoin to operators, protocol slice to buyback, WARD pressure that grows with usage.

Inference fees (x402 path)

When someone pays for inference via x402, the WardenX402Splitter cuts the payment atomically:

Share   Goes to                              Currency
90%     Operator who served the inference    USDC, paid instantly
10%     WardenX402BuybackEngine              USDC, accumulates for buyback

Verification fees (settlement path)

When a verification settles, the requester's USDC fee splits the same way: operator gets USDC for the work, the rest of the fee accumulates as USDC in a buyback bucket, and a small treasury slice stays in USDC for protocol runway. No on-the-fly token swap on every settlement. No operator forced to deal with a volatile asset to get paid.

How the buyback works

The mechanism is identical for both paths. USDC accumulates. Threshold crosses. Anyone calls execute() (gas-only, no privilege). The contract:

  1. Sets aside a small slice (~17%) as treasury USDC for stable runway.

  2. Swaps the rest to WARD on Uniswap at market price.

  3. Sends ~60% of the purchased WARD to the staking contract as yield.

  4. Burns ~40% of the purchased WARD permanently.

Keeper bots, large stakers, and protocol-aligned infra all have direct financial incentive to trigger it. There is no admin, no multisig, no off-switch.
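As plain arithmetic, the split might look like this sketch. The percentages come from the list above; modeling the swap as a simple spot conversion at a made-up WARD price is a deliberate simplification of the Uniswap trade:

```python
def execute_buyback(bucket_usdc: float, ward_price_usdc: float) -> dict:
    """Mirror the execute() steps above: ~17% to treasury, the rest
    swapped to WARD, ~60% of the WARD to stakers, ~40% burned."""
    treasury = bucket_usdc * 0.17
    ward_bought = (bucket_usdc - treasury) / ward_price_usdc
    return {
        "treasury_usdc": treasury,
        "ward_bought": ward_bought,
        "staker_yield_ward": ward_bought * 0.60,
        "burned_ward": ward_bought * 0.40,
    }

# Hypothetical bucket and price, purely for illustration.
print(execute_buyback(bucket_usdc=10_000, ward_price_usdc=0.05))
```

Every purchased token either lands with stakers or leaves the supply; nothing stays with an admin, because there is no admin.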

Critical: nobody using Halo needs to hold WARD

  • Requesters pay in USDC.

  • Operators (inference and verification) get paid in USDC.

  • Treasury holds USDC.

  • Stakers are the only role that holds WARD. They lock it once and receive yield in WARD via the buyback.

That's it. WARD is not a payment token. It's a value-capture token. Stakers hold it. Buyback drives demand for it. Burning shrinks its supply.

The flywheel is mechanical, not speculative. More agents use Halo. More USDC flows in. More USDC accumulates in the buyback bucket. More WARD gets bought from the market and burned. Supply shrinks. Staker yield rises. Higher WARD value attracts more stakers, which deepens economic security, which attracts more operators, which makes the network cheaper and faster, which attracts more agents. Repeat. Driven by usage, not hype.

Run the numbers

If the protocol processes 100,000 verifications per day at an average $0.50 USDC fee, that's $50,000 in daily fee volume from verification alone, ~$18.25M annually, with a slice of every one of those fees landing in the buyback bucket.

Stack x402 inference adoption on top. At 10 million x402 prompts per day at $0.01 each, that's $100,000 in daily payments, of which the 10% protocol share ($10,000 per day, ~$3.65M per year) lands in the BuybackEngine. About 83% of every dollar in the bucket becomes WARD buy pressure; ~60% of that lands as staker yield and ~40% is burned.

From a fixed supply. The math only goes in one direction.
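The figures above reduce to a few lines of arithmetic. The throughputs and fees are the litepaper's illustrative assumptions, not commitments:

```python
DAYS = 365

# Verification path: 100k verifications/day at $0.50 each.
verif_daily = 100_000 * 0.50            # daily fee volume in USDC

# x402 path: 10M prompts/day at $0.01, with the 10% protocol share
# routed to the BuybackEngine.
x402_daily = 10_000_000 * 0.01 * 0.10   # daily protocol fee in USDC

print(f"verification fees: ${verif_daily:,.0f}/day  (${verif_daily * DAYS / 1e6:.2f}M/yr)")
print(f"x402 protocol fee: ${x402_daily:,.0f}/day  (${x402_daily * DAYS / 1e6:.2f}M/yr)")

# Of each bucket dollar: ~83% becomes WARD buy pressure, split 60/40
# between staker yield and burn.
buy_pressure = 0.83
print(f"staker share of bucket: {buy_pressure * 0.60:.0%}, burn share: {buy_pressure * 0.40:.0%}")
```

Swap in your own throughput assumptions; the structure of the flow (USDC in, WARD bought, a portion burned) doesn't change.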


Identity for Agents (ERC-8004)

The agentic internet has an identity problem. When an AI agent signs a transaction, who is accountable? When an agent builds a track record of accurate verifications, how does that reputation persist? When a bad actor deploys a malicious agent, how does the network spot it and isolate it?

SPEX integrates ERC-8004, a lightweight on-chain identity standard for autonomous agents. Agents register an identity on-chain. Their verification history, accuracy rate, and behavioral patterns become part of a permanent, queryable record.

It's not a KYC gate. It's not a permissioned registry. It's a soft identity layer that enables reputation without requiring personal information. An agent can build a reputation pseudonymously. Good actors accumulate trust. Bad actors accumulate evidence.

Over time, requesters can preference verifiers with strong track records. Operators with consistent accuracy can command premium pricing. The market rewards honest behavior through reputation, not just through immediate economic penalties.

What the agentic internet actually needs is identity infrastructure that isn't a walled garden. Open, composable, on-chain identity that any framework can integrate.


Two Ways to Pay

AI agents aren't all the same. An autonomous DeFi agent with its own wallet behaves differently than a browser-based assistant, which behaves differently than a swarm of CrewAI agents coordinating a research pipeline. Forcing every agent through the same payment flow is a design mistake that kills adoption.

We're shipping with two payment paths at launch. Both settle in USDC. Both converge on the same WARD buyback mechanics. The agent picks the one that fits its architecture. The verification layer doesn't care.

x402, the HTTP 402 payment standard, handles pay-per-request flows. An agent sends a verification request to a SPEX endpoint. The endpoint responds with HTTP 402 and a payment challenge. The agent's x402 client pays automatically and resubmits. The whole flow is invisible to the application layer. Simple. Stateless. Good for agents that make individual requests, browser-based assistants, and casual users who just want to throw a dollar at an API.
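A minimal sketch of that loop. The endpoint behavior, challenge format, and pay() helper are all hypothetical stand-ins for a real x402 client library and a live SPEX endpoint:

```python
# Fake server: demands payment first, serves once a receipt is attached.
def spex_endpoint(request: dict) -> tuple[int, dict]:
    if "payment_receipt" not in request:
        return 402, {"challenge": {"amount_usdc": "0.01", "pay_to": "splitter"}}
    return 200, {"result": "verified inference result", "fingerprint": "bloom-1kb"}

def pay(challenge: dict) -> str:
    """Stand-in for the x402 client's automatic USDC payment."""
    return f"receipt:{challenge['amount_usdc']}->{challenge['pay_to']}"

def x402_client(request: dict) -> dict:
    status, body = spex_endpoint(request)
    if status == 402:
        # Pay the challenge and resubmit; invisible to the application layer.
        request = {**request, "payment_receipt": pay(body["challenge"])}
        status, body = spex_endpoint(request)
    assert status == 200
    return body

print(x402_client({"prompt": "analyze ETH funding rates"}))
```

From the application's point of view there is one call and one response; the 402 round trip happens inside the client.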

x402 also has a problem nobody else in the ecosystem is solving: where does the protocol fee go? In the naive implementation, 100% of pay-per-prompt revenue leaks straight to operators. The network token that secures and coordinates the system sees zero benefit from adoption.

We solved this with a dedicated on-chain stack. WardenX402Splitter routes every pay-per-prompt payment through an atomic 90/10 split. 90% to the operator, 10% to the WardenX402BuybackEngine. The engine accumulates USDC, and any caller (permissionlessly, anyone with a dollar of gas) can trigger execute() to swap the accumulated USDC to WARD on Uniswap, distribute the majority to stakers as yield, and burn the remainder. One transaction. Continuous buy pressure. Staker APR scales with adoption. WARD supply shrinks every time the network is used. A small off-chain facilitator service handles signature validation and on-chain submission; it never touches funds and can be rotated by protocol governance at will.

Direct on-chain deposit is for DeFi protocols, DAOs, and power users who want maximum control or batch throughput. Deposit USDC into the settlement contract once. Sign unlimited verification requests against that balance. Settle thousands of verdicts in a single Merkle-root transaction. Full on-chain auditability. No intermediary.

Two protocols. One settlement path. USDC in, WARD bought and distributed, a portion burned. The payment protocol is a transport layer. The economics are identical regardless of how the money arrives.

The related problem of paying for the inference itself (the actual AI compute being verified) is handled by x402 at the provider level. SPEX handles the other side: proving the inference was honest. Together, they form the trust stack for the AI economy. Pay for compute. Prove compute was real. No trust required.


Why This Works Now

A few things are true today that weren't eighteen months ago.

Autonomous AI agents are managing real money at scale. Not a research project. Agents are live in DeFi, trading, lending, rebalancing, and making decisions that move markets. The stakes are high enough that verification stops being a nice-to-have.

The self-hosted agent movement built the compute network before the protocol existed. Millions of independently operated agent instances on personal servers, local GPUs, and self-hosted infrastructure already have the compute, the model access, and the uptime. They aren't concentrated on a handful of cloud providers. They're genuinely distributed. They just needed a reason to participate. Halo is that reason: a global market for their idle capacity, paid in USDC.

The infrastructure matured. Base gives sub-second finality and near-zero gas costs. EIP-712 enables free off-chain coordination. Bloom filters compress execution proofs into 1 KB. The building blocks exist. They just needed to be assembled correctly.

The gap between "AI agents making decisions" and "anyone checking those decisions" widens every day. Every new framework, every new autonomous workflow, every new dollar managed by AI agents stretches it further.

SPEX closes it. Verified inference. Distributed compute. Privacy by default. Settlement in seconds. Pay in USDC, earn in USDC, no private keys on your machine. Backed by one of the largest agent ecosystems in crypto.


What Drops Next Week

This litepaper is the short non-technical version. Next week, we publish three things together:

  1. The full whitepaper. Every protocol mechanic, the formal economic model, the SPEX algorithm in detail, the contract specs, the threat model, the math.

  2. The first alpha. Anyone can install the CLI (or have their agents install it), register as an operator, and start earning by serving inference. Anyone can deposit USDC and start consuming verified inference. Live, on Base mainnet, real USDC, real prompts.

  3. The roadmap. Public, dated, with the specific milestones from alpha through to general availability and beyond.

Set a reminder, or check wardenprotocol.org and @wardenprotocol on launch day.