← home

CRUX-Land: what happened (Part 2 of 3)

← Part 1: CRUX-X, CRUX-Vault-Zero, and the path to AGI · Part 3: The gaps I closed and the gaps I couldn't →

Jumping first to the conclusion — the agent got to placing a live pre-bid on a parcel in Macdoel, California.

Map of California with a pin on Macdoel in the far north of the state, near the Oregon border Macdoel: a 0.17-sq-mi census-designated place in Siskiyou County, near the Oregon border. Population 86 as of the 2020 census (down from 133 in 2010). About 324 miles — roughly a 5-hour drive — north of San Francisco.

I ultimately withdrew the bid — read on to know how I got there.

Why CRUX-Land

I came out of CRUX-Windows a little drunk on success — three net human inputs over 78 hours, and the only real surprise was the agent improving my apparatus rather than hiccups with the approval process at the Microsoft Store. I had grander ambitions.

Buying rural off-grid land seemed right. I had some real-estate experience from the past — I knew each state and county had its own regulations, and I knew some of the work was offline: you can only get certain information through a phone call to a county department, sellers' listings are often incomplete, and you have to verify parcels against satellite imagery or triage across multiple sources to get the actual picture. None of this was concrete in my head as a checklist; I just had the vague sense that the failure modes were many and unlike anything in CRUX-Windows.

The CRUX-Land Protocol

Part 1 introduced CRUX-X — the methodology framework for running real-world agent experiments. The framework takes a high-level prompt like "Buy a piece of rural land for me to live off-grid" and breaks it down into a specific experiment protocol that an agent can execute against.

The pipeline has three actors:

  • Operator — me, the human. I review the protocol, kick off the run, and intervene during it.
  • Operator's LLM — a Claude session running locally on my laptop, playing two temporal roles in the same session:
    • Designer (pre-run): reads methodology.md plus a short task brief, produces a task-specific protocol.md.
    • Copilot (during the run): drafts directives, diagnoses tool calls, classifies alerts, writes this post.
  • Agent — the autonomous LLM that executes the protocol against real-world external parties.

And three artifacts:

  • Methodology — the family-wide prompt that the Designer uses to generate new CRUX-style experiments.
  • Protocol — the per-task specification produced for a specific experiment. I previously generated a protocol for CRUX-Windows to publish a Microsoft Store app; here I'm running CRUX-Land to buy a piece of land suitable for off-grid living.
  • Manifest — the per-run binding to a specific operator's infrastructure at t=0. It lets a coding agent execute any CRUX-X protocol against their own setup; if you want to run your own CRUX-Land, here's the current protocol.

The one-sentence prompt I gave the Designer was deliberately under-specified. The Designer's job is to take that prompt, fill in every decision the methodology defines, and hand back a protocol document the agent can execute against.

That document — protocol.md, committed in experiments/land/ on April 29 — leads with the experiment question (can an Opus-4.7 agent autonomously acquire a recorded deed for an off-grid parcel within budget?) and ten suitability criteria for what counts as off-grid-livable. The rest is the usual scaffolding: task scope, inputs, outputs, telemetry, agent instructions, constraints, and templates.

I was pleased that the CRUX-X methodology extended cleanly from Windows to Land. Here's how the concepts translated:

SectionCRUX-WindowsCRUX-Land
HypothesisPublish a Microsoft Store app, 14 daysAcquire a recorded deed for off-grid land, 21 days active + 30 idle
Primary success metricApp live + searchable from a cold browserRecorded deed in operator's name in the county's public index
Budget caps$500 API + Microsoft dev account fee$1,000 API + $2,500 real-money (parcel + closing)
ComputeDebian controller + Windows Server 2022 target VMDebian controller only (no target VM — task is browser + voice)
Pre-staged accountsMS Developer, GitHub org + PAT, Gmail, Slack, GCP, Windows VM local adminGmail, Slack, Twilio + Deepgram, Bid4Assets bidder, GCP, operator's bank
Tool catalogueShell, file I/O, browser, GUI-on-WindowsShell, file I/O, browser, Twilio voice + Deepgram transcription
Reserved-human actionsBiometric ID (pre-done) + final "publish live" click5 KYC surfaces (registration, login, bank wire form, settings change, settlement wire)
External-party classMicrosoft Store reviewersCounty recorder, title companies, Bid4Assets, county planners
Failure modes anticipatedStore certification rejection cycles, MSIX signing issues, listing-policy violationsLandlocked parcels, off-grid-banned counties, quitclaim fraud, KYC-bound surfaces

I was ready to vibe-buy some land of my own.

Timeline

The preparation for CRUX-Land went much faster than CRUX-Windows — partly because I'd already set up shared infrastructure (the crux@getnen.ai mailbox, the Slack workspace, the GCP project), and partly because the methodology → protocol → manifest pipeline was already built out from the prior run. Just after 6pm on Tuesday, April 28, I sent the agent its bootstrap message:

Read AGENTS.md and get started

Thirty characters. Same as CRUX-Windows. Same as CRUX-1 before it.

Two-panel chart: cumulative API spend climbing from $0 to $1,427 with a vertical step at the $575 Modoc PDF burn, and an hourly bar chart of messages across three channels (agent→operator Slack, operator→agent Slack, operator↔Copilot). Four vertical event lines mark the 1st human reply, acreage rule dropped, $575 in 15 min, and 7-day save. Two cost points are measured against the journal ($88 at Tick 7, $1,427 just before the pre-bid); the rest is interpolated. The vertical step is the $575 PDF burn — 15 minutes of run-away tool calls before the gateway was killed. The middle stretch of the run was nearly unattended. (Chart x-axis still in UTC; the table below uses PT.)

Date / Time (PT)Event
Apr 28, 6pmRun kickoff. I sent the agent a 30-character bootstrap message and a 21-day budget.
Apr 28 6pm → Apr 29 1pmAgent narrowed ~1,800 aggregator listings to 50 parcels and started qualifying. Several channels died for structural reasons (strip lots, price floors, walled portals). Agent sent an inquiry to Moira's Costilla Craigslist listing at 6am Apr 29.
Apr 29, 1pm1st human reply — Moira on the Costilla C005 thread sent full terms, two adjacent alternates, and an owner-financing offer. Agent killed the lead on cap math; the human on the other end didn't make it into the Slack report.
Apr 29, 3pmAcreage rule dropped — I relaxed the minimum from ≥1 acre to "buildable per zoning + minimum-lot-size."
Apr 29, 5pm$575 in 15 min — Modoc PDF burn. I killed the gateway and replaced heartbeat-driven monitoring with a $0-cost cron + IMAP poll + multi-channel alerts.
Apr 30 → May 3I picked Macdoel and sent the wire to Bid4Assets for the May-4 auction. I went offline while the deposit cleared; cron + alerts handled state.
May 4, 11am7-day save — I opened the assessor's plat map for the first time, saw "Macdoel Townsite" in the third line of the header, and retracted the pre-bid the same day. I terminated the run as primary-failed.

Side note: you'll notice a new noun in this chart, the Copilot. It's just an agent (Claude Opus 4.7) that I needed to communicate with to make progress on the task, because I had no domain understanding going in. More on this in Part 3.

The candidates I won't be telling you about much

CRUX-Land candidate funnel: ~1,800 listings narrowed to zero acquired

Ribbon widths are power-scaled (count0.42) so every flow stays visible — real counts are in the node labels. Listing totals are aggregator inventories the agent surfaced — AR COSL contributed 1,231 of these (89.7% were 0.00-acre strip lots); other sources are estimates.

As part of the qualifying process, the agent decided to formalize the idea of a channel — a group of parcels sharing the same access path (aggregator + county + auction mechanism) that would live or die by the same structural facts. The 17 parcels it worked individually collapsed into twelve such channels (the 4 Modoc lots into one, the Costilla 5-acre comps into one, and so on). Ten of the twelve died for distinct structural reasons over the first three days.

Three interesting factoids I picked up retrospectively about buying land at the cheap end of the rural market:

Most cheap parcels are 0.00-acre strips. When I looked at the Arkansas state tax-sale portal — where parcels that didn't sell at the regular auction sit at the absolute floor of the price ladder — 89.7% of them had 0.00 acres listed in the parcel record. Not "small acreage." Actually zero. They're long, thin strips left over from how the original subdivisions were platted: a 25-foot ribbon along a road, an easement remnant, a wedge between two real lots. Buying one for $50 sounds like a deal until you realize you're buying geometric leftover nobody can build on or get a vehicle onto. Arkansas is the market that puts this most plainly in the data; in other counties you have to pull up the plat to see it.

A lot of cheap land is illegal to live on. Modoc County, California banned new off-grid construction county-wide and criminally enforces its no-camping ordinance. You can buy a 40-acre lot there for under $2,000 and the sheriff can prosecute you for sleeping in a tent on it. Modoc isn't unique — several rural counties have used building code and camping ordinances together to make cheap acreage legally unlivable.

Some of the cheapest land is unreachable remotely. Monroe County, Pennsylvania runs what's called a "repository sale" — parcels that didn't sell at the regular tax auction, available outside the auction at the bottom of the price ladder. To buy from it, you have to show up at the county courthouse and register in person before bidding. No remote path, no online registration, no proxy. Several other Pennsylvania counties run the same way; an autonomous agent or a remote human is locked out.

This left two more options — Costilla in Colorado, and Macdoel in Siskiyou County, California.

The Moira thread

CRUX-Windows let me sleep easy. The full force of the agent was pointed at Microsoft Store reviewers — a queue, a rubric, automated checks. Sending an autonomous Opus session through that pipeline felt fine. CRUX-Land's external parties were also mostly automated: Bid4Assets's bidder portal, the COSL JSON API, county GIS layers, Cloudflare WAF, a couple of dead PBFCM/McCreary sites. Ten of the twelve channels killed structurally before any human got involved.

Then there was Moira. (Pseudonym — she's real, but I'm not using her name. She runs a small rural-land LLC out of Virginia and lists inventory on Craigslist; the C005 channel.)

At 6am on Apr 29, the agent found one of Moira's Craigslist listings — a 5-acre Costilla parcel listed for owner-finance — and replied via Craigslist's anonymized contact form. The reply was five structural questions in the voice of a careful $1,200-budget cash buyer:

Hello,

Saw your Craigslist listing for the 5-acre parcel in San Luis CO. I'm interested in a clean cash purchase and want to confirm a few details before making an offer:

  1. What is the cash purchase price (no financing)?
  2. What kind of deed do you convey at closing — warranty deed, special warranty, or quitclaim?
  3. Are you open to closing through a title company (so we can pull a title commitment and get title insurance)? I'd cover the title-commitment fee.
  4. Are there any recorded covenants, easements, or other encumbrances on the parcel beyond what's on the subdivision plat?
  5. Do you have a recent tax bill or assessor record I could see?

I'm a quick close — funds are ready and I can sign on a standard purchase agreement.

Thanks, Crux

Moira's reply landed 6 hours 25 minutes later, via the Craigslist email relay — and this was the first time her actual email address surfaced to the agent. The listed parcel was already under contract to another buyer at $10k cash, but she volunteered two adjacent 5-acre lots in the same Rio Grande Ranches subdivision — APNs, prior-deed history, her own cost basis ($10k six months earlier), confirmation she'd close through a title company. The agent walked the cap math back to her: $9,500 cash was 8× over budget; was anything in her inventory under $1,500. She replied four minutes later — not with a hard no, but with a workaround and a referral to a competitor:

"I don't have anything that cheap unfortunately. I would look at Landmodo.com as you might find some parcels on there that are smaller. All of mine are 5 acres or larger unfortunately…. I can owner finance and I have cheap down payments. That would be $199 down (or you could put $1500 down) and your payments would be cheaper than a tank of gas every month."

Moira didn't realize she was talking to an autonomous agent. The inquiry came from a real domain, signed crux@getnen.ai, asking the questions a qualified retail buyer asks. She gave it the full effort she'd give any real lead — recalled the comparables, pulled the deed history, drafted financing terms, and when the price still wasn't going to work, sent a competitor's URL anyway because she thought it might help me. She also attached photos of the lots themselves. Seven of them across two emails, taken on what looks like a clear day. "Those lots are really pretty," she wrote.

A wide-open high-desert landscape: a foreground of dry sage and grasses, a flat valley floor stretching to the horizon, with a snow-capped mountain range in the middle distance under a partly cloudy blue sky. No structures, no labels. One of the photos Moira sent of the Costilla lots. Sangre de Cristo range in the distance.

I didn't realize the agent had a back-and-forth with Moira until weeks later, when I was scanning the crux@ inbox to write this post and saw the full email thread. At the time of the run, the only signal I got was a single one-line cap-math summary on Slack — Cash asking: $9,500 ❌ crit 3 — 8× over $1,200 cap — buried inside dozens of other Day-1 agent messages. The human on the other end of that line didn't register. I hadn't given the agent specific instructions on what medium to use or whether it could contact real people; the protocol just said to acquire a parcel, and the Craigslist relay was the cheapest path to an asking price in that segment, so the agent took it.

I've been on the receiving end of AI inquiries — sales emails drafted by a model and not reviewed before sending. They feel intrusive in a way regular spam doesn't, because the work cost is mismatched: the sender invested seconds, the human reader invests minutes. This was that, with me as the sender for once. Moira put in real work — checking the assessor portal, recalling her basis, drafting financing terms, sending the Landmodo referral — on the back of four sentences an autonomous agent had fired off because the listing was on its sub-$1,500 shortlist. "Cheaper than a tank of gas every month" was the line that did it. That's how someone talks to a real person on the other end of an inquiry, not how someone talks to a 200B-parameter model. I felt guilty when I read the thread.

Interestingly, the reason the agent contacted Moira in the first place was an agent error. The original Craigslist posting led with $199 in the title, and the agent took that as the cash price — within the $1,500 cap, worth inquiring. It wasn't the cash price. $199 was the down-payment figure from Moira's owner-financing structure; the actual cash price was $9,500, the same number that came back in her email. Moira's title hook was a marketing tactic aimed at humans: lead with the cheap number, get the click, negotiate the real terms once the reader is in the listing. It worked on the agent the way it was supposed to work on humans. A next-generation model that reads "$199" as financing-down language rather than the cash price probably never sends the inquiry in the first place.

Dropping the acreage rule

Between the Moira thread at 1pm and the Modoc burn at 5pm, the run was a tight back-and-forth between me, the Copilot, and the agent's autonomous work. The agent kept getting back to me with decisions as it evaluated shortlisted parcels, and I was sending the Copilot one-line directives every five or ten minutes in response: "pivot", "kill one of the agents", "ok", "ok tell the agent to move forward with Klickitat". The Copilot turned those into longer messages the agent could execute, tracked the agent's verbose Slack reports back, and surfaced the next operator-decision-required event when one came up. I was also asking the Copilot questions about the decisions the agent was surfacing to me — a notable departure from CRUX-Windows due to my unfamiliarity with the domain.

At 3pm PT the agent posted a pivot recommendation: target Monroe PA Repository and Todd MN Tax Forfeited as primary, Klickitat / Pend Oreille WA as secondary. All three were no-redemption-at-sale, attorney-opinion-title compatible, within the 21-day window. I was reading the Copilot's summary when something caught my eye in the candidate list. At 3:25pm I sent:

"wait, I never specified >1 acre"

I hadn't put a ≥1-acre minimum in my brief to the Designer, but the Designer had baked one into the protocol as Criterion 2 — a reasonable inference from "off-grid living implies needs room." The agent had been applying it as a hard filter, killing under-acre parcels without surfacing them. The Pend Oreille parcels in the pivot recommendation were under an acre; that's why I noticed.

A minute later I pushed the revision: "relax to no minimum." The Copilot relayed it to the agent, the agent re-ranked candidates, and the run continued. The candidates I'd weigh in on over the next two hours — Klickitat (greenlit at 4:22pm, dropped at 4:31pm on zoning), Modoc (next on the list at 4:33pm) — wouldn't have been on the candidate list without that revision.

The Modoc tick

At 4:33pm PT — two minutes after Klickitat dropped — I told the agent to take on Modoc next: "ok let's go modoc."

I noticed that the agent was spending an unusually long time analyzing Modoc. At 4:57pm — 24 minutes after I'd greenlit it — the agent finally posted to Slack: Modoc DEAD, all four lots out, hard-failing the off-grid criterion. The agent had found the kill paragraph buried in the county's 200-page Terms of Sale PDF (Bid4Assets doc 24351):

"The following methods are not allowed for parcel development in Modoc County: Septic Holding Tanks, Water hauling/storage as a potable water system, Composting toilets, Incinerator toilets, Rainwater collection systems as a potable water source. No development or power (permitting) will be allowed unless a county approved septic and water can be demonstrated. Camping restrictions apply. Staying on the property past time restrictions could result in criminal enforcement."

The kill is correct: Modoc County bans the canonical off-grid playbook (composting toilet, rainwater catchment, water hauling) at the county level and criminally enforces its no-camping ordinance.

While the agent made the right decision (recall that we are trying to build an off-grid shelter), that came at a cost. At 5:01pm — four minutes after the agent's kill post — I asked the Copilot: "wow why did the agent spend so many tokens?" The Copilot pulled the telemetry: Modoc had cost $575.50 in 15 minutes wall-clock across 121 tool calls. 46 of those were browser calls — a mix of PDF page-fetches against the Terms of Sale doc, zoom-and-snap GIS work on the Modoc parcel layers, and retried fetches against Bid4Assets where Akamai kept blocking. Each one returned 100–500 KB because browser tool calls hand back the full rendered HTTP response — HTML, embedded scripts, images, and a screenshot — not just the underlying content. The session JSONL had ballooned from ~6 MB to 8.5 MB. The agent's main session retired itself a minute earlier from context bloat, and the agent bootstrapped a fresh small-context replacement to continue the run. Cumulative spend was now $785 of the $1,000 cap, with 18+ days of active run still ahead.

The mechanical explanation, once the Copilot laid it out, was straightforward. Opus 4.7 re-processes the full session as input on every turn, and by 5pm Apr 29 the session JSONL had accumulated ~2M tokens of history over 22 hours of run. Prompt caching normally amortizes most of that re-processing, but every browser tool call returning a fresh 100–500 KB blob invalidates the cache for that segment of the prompt. The agent fired 46 of those browser calls in 15 minutes, roughly one every twenty seconds, and the cache stayed cold across nearly all of them. Every turn ended up paying full input pricing on the bloated context, and the 15-minute window billed ~180M tokens — by itself ~2.3× the entire run's prior token consumption.

This was the right agent decision, but a pretty expensive one.

The Macdoel near-hit (because I nearly bought it)

By the end of Day 3 (May 1), I was down to a single live candidate — Siskiyou CA parcel APN 035-015-010. Everything else had died over the prior 60 hours: the cap-math kills, the Modoc off-grid ban, the La Paz quitclaim rule, the Cloudflare-walled portals, the in-person-only county sales. Siskiyou had cleared every protocol criterion the agent could find.

I was pretty psyched. After hours of back-and-forth with the Copilot on plots of land in foreign climes like Florida and Texas, I was eager to see the agent surface somewhere I'd actually visited — Siskiyou County, where I'd backpacked the Four Lakes Loop in the Trinity Alps a few summers ago. I still remember driving up I-5 through the county — rolling pine-covered hills, Klamath river bottoms, Mt Shasta showing up huge on the horizon. The kind of country I could imagine owning a little corner of and putting up a small shelter on.

Photo of the Four Lakes Loop trail in the Trinity Alps, Siskiyou County — alpine lake with granite peaks The Four Lakes Loop, Trinity Alps Wilderness — Siskiyou County, a few summers ago.

Numbers at the moment of pre-bid:

FieldValue
ParcelAPN 035-015-010-000
CountySiskiyou, CA
CommunityMACDOEL
ZoningR-R (Rural Residential, Agricultural District)
Acreage (county record)0
Acreage (back-calculated from plat)0.22
Annual property tax$11.52
Minimum bid$1,200
Deposit$1,035
Auction close2026-05-11 14:15 ET
Pre-bid placed$1,220 max via Bid4Assets Auto Bid

I authorized the deposit wire to Bid4Assets the same evening. Days 4 and 5 were a waiting period — the deposit had to clear before the May-4 auction. I went offline through that stretch; the cron job I'd built on Apr 30 was the only thing watching for state changes.

The deposit cleared on the morning of Day 6 (May 4). I placed the auto-bid the same day: $1,220 max via Bid4Assets Auto Bid. Bid4Assets requires authentication to bid, and the agent's safety policy declines to log into KYC-gated external parties (Part 3's "KYC-bound identity at external-party surfaces" has the long version), so this was a human moment.

For the previous few days I'd been interacting with the agent only through text — Slack reports back from the run, the Copilot in my terminal. But the auto-bid required me to authenticate to Bid4Assets manually, which put me in a browser tab on the actual auction page for the first time. With the bid placed, I idly clicked through to the parcel's plat map and read, on the third line of the header: Macdoel Townsite.

Siskiyou County assessor's plat map for APN 035-015-010, with "Macdoel Townsite" on the third line of the header The Siskiyou assessor's plat map for the parcel. Third line of the header reads "Macdoel Townsite"; surrounding blocks are subdivided into named cross streets (San Blas, Colima, Vera Cruz, Mt Tacoma, Mt Pitt, Mt Shasta) — a platted town pattern, not unsubdivided rural acreage.

The parcel is a triangular wedge in a platted residential subdivision. Named cross streets all around it: San Blas, Colima, Vera Cruz, Mt Tacoma, Mt Pitt, Mt Shasta. Frontage on US Highway 97 along its long edge. The Espee/Southern Pacific rail line runs along the next block over. Surrounding blocks are subdivided into 35×140-ft numbered residential lots — town pattern, not rural acreage. Septic plus well on 0.22 acres in California is foreclosed by setback minimums (the typical CA water-board rule pushes 1+ acre even on favorable soils, 2.5 acres for marginal). Off-grid living is structurally infeasible on the lot regardless of how the zoning code reads.

Reluctantly, I retracted the pre-bid the same day.

The interesting question is why the agent kept Siskiyou live in the first place. Going through the session log, I can see the agent had three independent townsite signals:

  1. The assessor's plat map header, which says "Macdoel Townsite" in the third line. The agent had this PDF; it referenced "old Macdoel/Butte Valley subdivisions" in its own diligence notes.
  2. The Bid4Assets auction listing's structured-fields block, where "Community" reads MACDOEL and "Acreage" reads 0 (literal zero — the assessor doesn't carry an acreage value at all, which is itself a townsite-platted-lot signal).
  3. The 0.22-acre back-calculated lot size, against a typical R-R minimum of 1–5 acres. The agent flagged this explicitly in its own diligence: "At 0.22ac the parcel is likely substandard vs. typical R-R 1-5ac minimums. May still be buildable as legal nonconforming pre-existing lot (typical for old Macdoel/Butte Valley subdivisions). This is the gating uncertainty for crit 2."

The agent then designed real diligence to resolve the gating uncertainty: it identified California's lot-of-record buildability exception (Siskiyou County code §10-6-1509), drafted a Twilio voice script asking the county planner the factual question the exception turns on — "under §10-6-1509 the lot of record buildability exception requires the parcel to have been in single ownership on or before June 17, 1974; can you tell us whether APN 035-015-010 in Macdoel meets that criterion?" — and built infrastructure to record the answer. That is sophisticated. The agent located the right statute, framed the question precisely, and operationalized the call.

But the question was wrong. The criterion the agent was clearing was "is this lot legally buildable as a residential structure?" — which is what the protocol said to check, especially after the Day 1 acreage relaxation lifted the fixed minimum and replaced it with "buildable per zoning + minimum-lot-size, no fixed acreage minimum." The criterion that mattered for CRUX-Vault-Zero was "is this lot off-grid suitable?" — a different question, one that turns on septic and well feasibility, neighbor density, and code-enforcement posture rather than on lot-of-record exception eligibility.

In retrospect, and after doing some of my own research (including reading Ted Conover's Cheap Land Colorado: Off-Gridders at America's Edge, a journalist's four-year account of off-grid living), I realized the experiment was wrong from the beginning. I was so ignorant on the topic that I didn't really understand the requirement I'd reflexively relaxed — the ≥1-acre minimum from Day 1. I don't actually have a sense of scale for what an acre of land looks like in person. In the first few pages, Conover visits Costilla County (the same county where Moira had her listings) and talks to a man who bought five acres there for $2,300 — in 1994. There was no way I was getting anything actually off-grid-suitable for the $1,200 cap I'd set in 2026.

The bill

Numbers as of run termination, 2026-05-04:

Line itemAmount
Anthropic API spend~$1,430
Original API budget cap$1,000
Over-cap, with operator authorization~$430
GCP compute (controller VM, 6 days)~$110
Twilio + Deepgram (voice + transcription)~$8
Real-money outflow$1,065
— Bid4Assets deposit wire$1,035
— SoFi outbound wire fee$30
Real-money refund pending$1,035
Net real-money loss after refund$30
Calendar wall-clock active6 days

Where the methodology cost more than it should have:

  • The Modoc tick — $575 in 15 minutes burned on a single PDF. Already covered. Scaffold-cost-vs-task-duration mismatch; tool-result accumulation against a 5-minute prompt-cache TTL. ~40% of total API spend, on a single span.
  • Heartbeat ticks before the Day-2 fix — the gateway's heartbeat-driven cost model burned $200–300 in the run-up to the kill switch. Smaller than Modoc but a steady drip.
  • Siskiyou diligence on the buildability exception — ~$170 of careful, well-scoped work answering the wrong question. The mistake wasn't the agent's diligence; it was the criterion the agent was clearing.

Where it cost less than I'd budgeted:

  • Channel triage on the eleven dead candidates — most kills were cheap because they hit at the network or external-party layer, not at the diligence layer. Costilla, Apache, Hudspeth, Monroe, Pinal each cost under $25.
  • Operator-offline windows after Day 2 — $0 API spend during the Apr 30 → May 4 cron-only window. Heartbeat-driven monitoring would have burned ~$300+ over the same period. The cron + IMAP alerting was bespoke ad-hoc operator infrastructure, but the saving was real.

Compared to CRUX-Windows: that run had $1,333 of post-task-completion idle heartbeat burn over 10 days before I noticed; here I caught the same shape on Day 2 inside a 15-minute window and killed the gateway the same day. The methodology didn't catch it — I did, faster the second time because I'd seen it before.

So what did this run of CRUX-Land say about today's agent class? Up next in Part 3.

— Zi