Jan 26, 2026

LLMs Eat Scaffolding for Breakfast

Nicolas Bustamante (@nicbstme)


It's Monday morning, 6:30 AM in San Francisco. I already know what's gonna happen this week. We're gonna delete our code. A lot of it. Because models are eating our code.

I will show up to our Mission office and delete thousands of lines of code. Again.

Each time a new model comes out, it’s the same story.

LLMs have limitations, so we build scaffolding around them to make them work. Each model introduces new capabilities, so old scaffolding must be deleted and new scaffolding added.

But as we move closer to superintelligence, less scaffolding is needed. This post is about what it takes to build successfully in AI today.

Why Do We Build Scaffolding?

Every line of scaffolding is a confession: the model wasn’t good enough.

LLMs can’t read PDFs? Let’s build a complex system to convert PDFs to markdown.

LLMs can’t do math? Let’s build a compute engine to return accurate numbers.

LLMs can’t handle structured output? Let’s build complex JSON validators and regex parsers

LLMs can’t read images? Let’s use a specialized image-to-text model to describe the image to the LLM.

LLMs can’t read more than 3 pages? Let’s build a complex retrieval pipeline with a search engine to feed the best content to the LLM.

LLMs can’t reason? Let’s build chain-of-thought logic with forced step-by-step breakdowns, verification loops, and self-consistency checks.

etc, etc... millions of lines of code to add external capabilities to the model.

But look at models today: GPT-5.2 is solving frontier mathematics, Grok-4 Fast can read 3000+ pages with its 2M context window, Claude 4.5 can ingest images or PDFs, all models have native reasoning capabilities and support structured outputs.

The once-essential scaffolding is now obsolete. Those tools are baked into the model’s capabilities.
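
To make the deletion concrete, here is a minimal sketch in Python of what one of these migrations looks like. The top function is the old scaffolding (beg for JSON, fish it out with a regex, retry); the bottom is the same task using the structured output support now baked into the API. The model names and the invoice schema are illustrative, not from any particular codebase.

```python
import json
import re
from openai import OpenAI

client = OpenAI()

# Yesterday's scaffolding: ask nicely for JSON, regex it out of prose, retry.
def extract_invoice_legacy(text: str) -> dict:
    for _ in range(3):  # retry loop, because the model often broke the format
        raw = client.chat.completions.create(
            model="gpt-4",  # illustrative older model
            messages=[{"role": "user",
                       "content": f"Return JSON with 'vendor' and 'total':\n{text}"}],
        ).choices[0].message.content
        match = re.search(r"\{.*\}", raw, re.DOTALL)  # fish the JSON out
        if match:
            try:
                return json.loads(match.group())
            except json.JSONDecodeError:
                continue
    raise ValueError("model never produced valid JSON")

# Today: structured output is a first-class API feature. No regex, no retries.
def extract_invoice(text: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; any model with structured output support
        messages=[{"role": "user", "content": text}],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "invoice",
                "schema": {
                    "type": "object",
                    "properties": {"vendor": {"type": "string"},
                                   "total": {"type": "number"}},
                    "required": ["vendor", "total"],
                    "additionalProperties": False,
                },
                "strict": True,
            },
        },
    )
    return json.loads(response.choices[0].message.content)
```

The second version isn’t just shorter: the failure mode the first one defends against no longer exists.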

Why Is It So Hard to Build in AI?

It’s nearly impossible to predict what scaffolding will become obsolete and when. What appears to be essential infrastructure and industry best practice today can transform into legacy technical debt within months.

The best way to grasp how fast LLMs are eating scaffolding is to look at their system prompt (the top-level instruction that tells the AI how to behave).

Comparing the prompt used in Codex, OpenAI’s coding agent, across the o3 and GPT-5 models is mind-blowing.

o3 prompt: 310 lines
GPT-5 prompt: 104 lines

The new prompt removed 206 lines. A 66% reduction.

GPT-5 needs way less handholding. The old prompt had complex instructions on how to behave as a coding agent (personality, preambles, when to plan, how to validate). The new prompt assumes GPT-5 already knows this and only specifies the Codex-specific technical requirements (sandboxing, tool usage, output formatting).

The new prompt removed all the detailed guidance about autonomously resolving queries, coding guidelines, git usage. It’s also less prescriptive. Instead of “do this and this” it says “here are the tools at your disposal.”

As we move closer to superintelligence, the models require more freedom and leeway (scary, lol!).

Advanced models require simple instructions and tooling. Claude Code, the most sophisticated agent today, relies on the plain filesystem instead of a complex index, and uses bash commands (find, read, grep, glob) instead of complex custom tools.
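
Here is a rough sketch of what those primitives look like in practice: thin wrappers over the filesystem, not a search index. The function names mirror the agent’s tool names; the implementations are illustrative, not Claude Code’s actual source.

```python
from pathlib import Path
import re

# "Simple tools": each one maps directly onto a Unix idea. No embeddings,
# no vector index, no sync pipeline. The filesystem is the database.

def read(path: str, max_bytes: int = 100_000) -> str:
    """read: return a file's contents (like cat), truncated for the context."""
    return Path(path).read_text(errors="ignore")[:max_bytes]

def glob(pattern: str, root: str = ".") -> list[str]:
    """glob: find files by name pattern (like find)."""
    return [str(p) for p in Path(root).rglob(pattern)]

def grep(pattern: str, root: str = ".") -> list[str]:
    """grep: return 'path:line' matches for a regex across the tree."""
    hits = []
    for p in Path(root).rglob("*.py"):
        for i, line in enumerate(p.read_text(errors="ignore").splitlines(), 1):
            if re.search(pattern, line):
                hits.append(f"{p}:{i}: {line.strip()}")
    return hits
```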

It moves so fast. Each model introduces a new paradigm shift. If you miss a paradigm shift, you’re dead.

Having an edge in building AI applications requires deep technical understanding, insatiable curiosity, and low ego.

But because everything changes, it’s good to focus on what won’t change.

The Four LLM Constants

1. Context Windows: The Great Expansion

Context window is how much text you can feed the model in a single conversation. Early models could only handle a couple of pages; now it’s thousands of pages, and it’s growing fast.

Dario Amodei, the co-founder and CEO of Anthropic, expects 100M+ token context windows, while Sam Altman has hinted at billions of context tokens. The more context an LLM can see, the less scaffolding like retrieval-augmented generation (RAG) you need; the sketch after the timeline below shows a quick way to check whether your corpus even needs it.

November 2022: GPT-3.5 could handle 4K context

November 2023: GPT-4 Turbo with 128K context

June 2024: Claude 3.5 Sonnet with 200K context

June 2025: Gemini 2.5 Pro with 1M context

September 2025: Grok-4 Fast with 2M context
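
A practical corollary of this timeline: before reaching for retrieval, check whether your corpus simply fits. A quick sketch using tiktoken; the tokenizer is an approximation, and the window size, docs path, and headroom factor are assumptions for illustration.

```python
from pathlib import Path
import tiktoken

# Rough check: does the whole document set fit in the context window?
# If yes, retrieval scaffolding may be unnecessary.
enc = tiktoken.get_encoding("cl100k_base")  # approximate; tokenizers vary by model
CONTEXT_WINDOW = 2_000_000  # e.g., Grok-4 Fast per the timeline above

docs = list(Path("docs/").glob("**/*.md"))
total_tokens = sum(len(enc.encode(p.read_text())) for p in docs)

print(f"{len(docs)} files, ~{total_tokens:,} tokens")
if total_tokens < CONTEXT_WINDOW * 0.8:  # leave headroom for instructions + output
    print("Fits in context: just put the docs in the prompt.")
else:
    print("Too big (for now): some retrieval or chunking is still needed.")
```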

2. Generation Speed: From Seconds to Instant

Models used to stream at 30-40 tokens per second. Today’s fastest models like Gemini 2.5 Flash and Grok-4 Fast hit 300+ tokens per second. Nearly a 10x improvement.

On specialized inference chips, providers like Cerebras push open-source models past 2,000 tokens per second. It is rumored that Codex will use Cerebras for inference.

We’re approaching real-time LLMs: full responses to complex tasks in under a second.

3. Intelligence: There is no Wall

LLMs are becoming exponentially smarter. With every new model, benchmarks get saturated. On the path to AGI, every benchmark will get saturated. Every job can be done and will be done by AI.

As with humans, a key factor in intelligence is the ability to use tools to accomplish an objective. That is the current frontier: how well a model can use tools such as reading, writing, and searching to accomplish a task over a long period of time.

This is important to grasp. Models will not improve their language translation skills (those are essentially saturated), but they will improve how they chain translation tasks over time to accomplish a goal. For example, you can say, “Translate this blog post into every language on Earth,” and the model will work for an hour or more on its own to make it happen.

Tool use and long-horizon tasks are the new frontier.
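
Mechanically, a long-horizon task is just a loop: call the model, execute whatever tool it requests, feed the result back, and repeat until it answers in plain text. Below is a minimal sketch in the generic chat-completions tool-calling style; the model name, step limit, and single-bash-tool setup are illustrative assumptions, not a specific product’s agent.

```python
import json
import subprocess
from openai import OpenAI

client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "bash",
        "description": "Run a shell command to read, search, or edit files.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

def run_tool(name: str, args: dict) -> str:
    # A single bash tool covers reading, searching, and writing.
    if name == "bash":
        result = subprocess.run(args["command"], shell=True,
                                capture_output=True, text=True, timeout=60)
        return result.stdout or result.stderr
    return f"unknown tool: {name}"

def run_until_done(task: str, max_steps: int = 200) -> str:
    """The long-horizon loop: keep executing the model's tool calls
    until it stops asking for tools and answers in plain text."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        msg = client.chat.completions.create(
            model="gpt-5",  # illustrative
            messages=messages,
            tools=TOOLS,
        ).choices[0].message
        if not msg.tool_calls:  # no tool requested: the task is done
            return msg.content
        messages.append(msg)
        for call in msg.tool_calls:
            output = run_tool(call.function.name,
                              json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": output})
    raise RuntimeError("hit the step limit before the task finished")
```

Notice there is no planner, no task graph, no orchestration framework. The loop is dumb on purpose; the intelligence lives in the model.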

4. Cost: The Cost of Intelligence is Going to Zero

The cost of intelligence is going to zero. As Sam Altman noted:

“The cost to use a given level of AI falls about 10× every 12 months, and lower prices lead to much more use.”

Today, a given level of intelligence costs a tiny fraction of what it did three years ago. This exponential decline in pricing means AI capabilities that were once prohibitively expensive are becoming accessible to everyone.
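
The compounding is easy to underestimate: a 10x annual drop is three orders of magnitude in three years. The prices below are illustrative, not actual rate cards.

```python
# Altman's observation, compounded: 10x cheaper every 12 months.
price_today = 1.00  # illustrative: $1 per million tokens for a given capability
for years_ago in range(1, 4):
    past_price = price_today * 10 ** years_ago
    print(f"{years_ago} year(s) ago: ${past_price:,.0f} per million tokens")
# 1 year(s) ago: $10 per million tokens
# 2 year(s) ago: $100 per million tokens
# 3 year(s) ago: $1,000 per million tokens
```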

What Nobody Tells You About Scaffolding

The uncomfortable truth: most engineers are maintaining infrastructure that shouldn’t exist, and they don’t know it yet.

Models will make it obsolete, and the survival of AI apps depends on how fast you can adapt to the new paradigm. That’s why startups have an edge over big companies: big corps are late by at least two paradigms.

Some examples of scaffolding that are on the decline:

Vector databases: Companies are paying thousands per month when they could now just put docs in the prompt or use agentic search instead of RAG (my article on the topic; see the sketch after this list)

LLM frameworks: These frameworks solved real problems in 2023. In 2025? They’re abstraction layers that slow you down. The best practice is now to use the model API directly.

Prompt engineering teams: Companies are hiring “prompt engineers” to craft perfect prompts when current models just need clear instructions and open-ended tools

Model fine-tuning: Teams spend months fine-tuning models only for the next generation of out-of-the-box models to outperform their fine-tune (cf. my 2024 article on that)

Custom caching layers: Building Redis-backed semantic caches that add latency and complexity when prompt caching is built into the API.
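
And the vector-database replacement mentioned above is almost embarrassingly small. A sketch, assuming the corpus passed the fit check from earlier; the model name is illustrative and assumes an OpenAI-compatible endpoint.

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()

# The entire "RAG replacement": concatenate the corpus, ask the question.
corpus = "\n\n".join(
    f"## {p.name}\n{p.read_text()}" for p in sorted(Path("docs/").glob("*.md"))
)

answer = client.chat.completions.create(
    model="grok-4-fast",  # illustrative long-context model; any 1M+ window works
    messages=[
        {"role": "system", "content": "Answer using only the provided documents."},
        {"role": "user", "content": f"{corpus}\n\nQuestion: What is our refund policy?"},
    ],
).choices[0].message.content

print(answer)
```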

This cycle accelerates with every model release. The best AI teams have mastered four critical skills:

Deep model awareness: They understand exactly what today’s models can and cannot do, building only the minimal scaffolding needed to bridge capability gaps.

Strategic foresight: They distinguish between infrastructure that solves today’s problems versus infrastructure that will survive the next model generation.

Frontier vigilance: They treat model releases like breaking news. Missing a single capability announcement from OpenAI, Anthropic, or Google can render months of work obsolete.

Ruthless iteration: They celebrate deleting code. When a new model makes their infrastructure redundant, they pivot in days, not months.

It’s not easy. Teams are fighting powerful forces:

Lack of awareness: Teams don’t realize models have improved enough to eliminate scaffolding (this is massive btw)

Sunk cost fallacy: “We spent 3 years building this RAG pipeline!” (lol)

Fear of regression: “What if the new approach is simple but doesn’t work as well on certain edge cases?”

Organizational inertia: Getting approval to delete infrastructure is harder than building it

Resume-driven development: “RAG pipeline with vector DB and reranking” looks better on a resume than “put files in prompt” right?

In AI, the best teams build for fast obsolescence and stay at the edge.

The Counterintuitive Truth

Software engineering sits on top of a complex stack. More layers, more abstractions, more frameworks. Complexity was a sign of sophistication. A simple web form in 2024? React for UI, Redux for state, TypeScript for types, Webpack for bundling, Jest for testing, ESLint for linting, Prettier for formatting, Docker for deployment...

AI is inverting this. The best AI code is simple and close to the model.

Experienced engineers look at modern AI codebases and think: “This can’t be right. Where’s the architecture? Where’s the abstraction? Where’s the framework?”

The answer: The model ate it bro, get over it. The worst AI codebases are the ones that were best practices 12 months ago.

As models improve, the scaffolding becomes technical debt. The sophisticated architecture becomes the liability. The framework becomes the bottleneck.

LLMs eat scaffolding for breakfast and the trend is accelerating.

By Nicolas Bustamante