{"type":"rich","version":"1.0","provider_name":"Transistor","provider_url":"https://transistor.fm","author_name":"NC Tweener Talks","title":"[REDACTED] Episode 4: We Stopped Using Claude Code Mid-Build. Here's What We Built Instead","html":"<iframe width=\"100%\" height=\"180\" frameborder=\"no\" scrolling=\"no\" seamless src=\"https://share.transistor.fm/e/6581314c\"></iframe>","width":"100%","height":180,"duration":2496,"description":"Redacted is the show that doesn't clean things up before hitting record. Episode 4 is a double build session: Taylor Cotner walks through the multi-agent HubSpot cleanup pipeline he's been iterating on for weeks, now running on the Anthropic SDK with Claude Code out of the loop, and David Shaner demos how he used Claude Design and Claude Code to rebuild Offline's partner landing page from scratch. Most of the episode is screen sharing, so pull it up on YouTube. What We Cover: No Claude Code in the loop: Taylor stopped using Claude Code as an agent orchestrator in his HubSpot pipeline. He still builds the app with it as his coding tool, but it no longer acts as a decision-maker in the middle of a workflow. Removing it gave him full control over inputs and outputs at every step. Custom eval system built from scratch: Taylor built an eval page that looks like an Excel grid, with models as columns and test cases as rows, to measure Haiku, Sonnet, GPT-5, and GPT-5 Mini against real messy HubSpot data. Each cell shows pass, fail, and cost. GPT-5 Mini at 10–20× less cost: For the lead qualifier agent, Sonnet evals cost $1.00 per run. GPT-5 Mini costs $0.05. “I can live with that. 10X less the cost.” For the core cleanup evals: $1.50 for Sonnet versus $0.14 on GPT-5 Mini. $20 for 133 million tokens overnight: Using the Vercel AI Gateway, which lets you swap any model without changing your code, Taylor ran 200 HubSpot restaurant cleanups in a single night for $20 total. Self-grading pipeline: The pipeline grades its own output after every cleanup run. 
If a job comes back below an A, it automatically spawns a new run with Sonnet, no human catch required. A B grade on 101 Craft Kitchen auto-escalated and came back with an A. Real mess-ups make the best evals: Almost every eval case came from a real HubSpot error. The system once tried to create a “Kim company” to link a group of unrelated restaurants, so Taylor added an eval to teach it that being linked by an owner contact is not the...","thumbnail_url":"https://img.transistorcdn.com/UEzoK7N1siD9YRaSVBWfZ8B3suSS2aonEIZ5NwBH1Gs/rs:fill:0:0:1/w:400/h:400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS80NjBh/YWVhZTA4ZTUyY2Fl/MDdhNzQwZWFhNDI4/MDc3Zi5wbmc.webp","thumbnail_width":300,"thumbnail_height":300}