I've been sitting closely with this topic lately, and it brought back some experiences from working on a couple of projects.

Take this scenario: you ship an LLM-powered feature, the demo is clean, and all stakeholders are happy. Then three weeks into production, something breaks in a way nobody can explain.

You spend an afternoon staring at logs that tell you what happened but not why.

Then it turns out that the framework swallowed the context somewhere between step three and step four, and now you’re reading source code you didn’t write.

That is not a bug report; it is a wake-up call about the architecture.

Frameworks like LangChain let engineers build LLM-powered systems without first understanding how those systems work under pressure. At first, that sounds like the cavalry has arrived.

But trust me, the cost doesn't show up until you're deep in a production incident, stuck wondering why your agent skipped the verification step it was supposed to run.

This post is about that cost and why more engineers, after discovering it, are now building the orchestration layer themselves.

Give LangChain Its Credit

I remember watching a colleague build a working RAG pipeline in about forty minutes sometime in early 2023.

He went from the vector store through the retrieval chain, prompt templates, and the LLM call, all connected by lunchtime.

Six months prior, that would have been at least a two-week project.

Come to think of it, that’s actually how and why LangChain spread so fast.

Most engineers hadn't built LLM applications before. Nobody had strong opinions about the right way to structure a retrieval chain or manage conversation memory.

LangChain showed up with answers that were modular, composable, and documented, and of course, teams grabbed them immediately, including mine.

So when I say it creates problems in production, I’m not being dismissive. It was optimized for the phase most teams were in when they adopted it. The problems came later, when the phase changed.

Where the Abstraction Breaks

When I was learning object-oriented programming in my sophomore year, one of the first concepts that clicked was abstraction: hiding the internal details of how something works and only exposing what the user needs.

LangChain applies that same idea to LLM orchestration. It hides a lot of what’s happening inside your system so you can move faster.

But production AI systems demand something that cuts against that: clarity.

You need to know exactly what your system did, in what order, with what inputs, and why. Not roughly. Exactly.

Abstractions trade that visibility for speed. That’s a fair trade at first, until the hidden complexity becomes the very thing you need to understand.

And it shows up in more ways than one.

Debugging is worse than it sounds: When a multi-step chain gives the wrong output, you’re not just debugging your own code. You’re also trying to understand the framework’s execution flow and what the callback layer was doing behind the scenes.

I once spent half a day tracking down a failure that turned out to be a memory module silently dropping context. The fix itself took four minutes; everything before it went to finding the cause, because the abstraction made the actual behavior invisible.
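For contrast, here's roughly what owning that behavior looks like: a minimal sketch of explicit context trimming, where the token budget and function names are made up, and every drop gets logged instead of happening silently.

```python
import logging

logger = logging.getLogger("memory")

MAX_CONTEXT_TOKENS = 3000  # assumed budget; tune for your model


def count_tokens(message: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(message) // 4)


def trim_history(history: list[str], budget: int = MAX_CONTEXT_TOKENS) -> list[str]:
    """Keep the most recent messages that fit the budget, and log what gets dropped."""
    kept: list[str] = []
    used = 0
    for message in reversed(history):
        cost = count_tokens(message)
        if used + cost > budget:
            # The key difference from a silent framework module: the drop is visible.
            logger.warning(
                "Dropping %d older message(s) to stay under %d tokens",
                len(history) - len(kept), budget,
            )
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))
```

When a drop like that lands in your logs next to the request ID, the half-day hunt becomes a one-line grep.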

Observability hits a ceiling: You can integrate LangSmith and get useful traces, but you’re still seeing things through the framework’s lens, limited to the spans it chooses to expose. When you need visibility into something specific to your business logic, you end up working around the framework’s data model instead of just measuring what actually matters.
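When the calls are your own functions, measuring what matters is a decorator away. This is a sketch, not any framework's API; the `traced` helper and step names are hypothetical, and the log call stands in for whatever metrics client you already run.

```python
import functools
import logging
import time

logger = logging.getLogger("llm.metrics")


def traced(step_name: str):
    """Record latency for a single step under a name you choose."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed_ms = (time.perf_counter() - start) * 1000
            # Swap this for StatsD, Prometheus, or OpenTelemetry as needed.
            logger.info("step=%s latency_ms=%.1f", step_name, elapsed_ms)
            return result
        return wrapper
    return decorator


@traced("retrieval")
def retrieve(query: str) -> list[str]:
    return ["doc-1", "doc-2"]  # placeholder for your retrieval logic
```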

Multi-agent state is where things really fall apart: The moment you have agents coordinating, one planning, others executing, and another verifying, shared state becomes the real problem.

Who created this information, when, and is it still valid?

One agent updates memory, another reads a stale version, and the coordinator makes a decision based on context that no longer matches reality.

Framework-managed state tends to work just fine for the happy path and quietly breaks down at the edge cases. Production systems live in those edge cases.
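Those questions become answerable when provenance is attached to every write. Here's a minimal sketch of explicit, versioned shared state; the `Fact` and `SharedState` names are illustrative, not from any library.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Fact:
    """One piece of shared state, with provenance attached."""
    value: object
    written_by: str        # which agent produced it
    written_at: datetime   # when it was produced
    version: int


@dataclass
class SharedState:
    """Every write bumps a version; every read can check for staleness."""
    facts: dict[str, Fact] = field(default_factory=dict)

    def write(self, key: str, value: object, agent: str) -> None:
        prev = self.facts.get(key)
        self.facts[key] = Fact(
            value=value,
            written_by=agent,
            written_at=datetime.now(timezone.utc),
            version=(prev.version + 1) if prev else 1,
        )

    def read(self, key: str, expected_version: int | None = None) -> Fact:
        fact = self.facts[key]
        if expected_version is not None and fact.version != expected_version:
            # The stale-read case fails loudly instead of silently feeding
            # the coordinator outdated context.
            raise RuntimeError(
                f"{key} is stale: expected v{expected_version}, "
                f"found v{fact.version} written by {fact.written_by}"
            )
        return fact
```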

Latency accumulates: Every abstraction layer adds overhead through serialization, validation, callback firing, and internal routing that runs whether you need it or not.

In a prototype, that overhead is invisible. Under real traffic, it shows up in percentile latency, specifically at p95 and p99, where users actually feel it.

The cost per call might be small, but in an agentic system making four, five, or even six model calls per user request, those small costs compound quickly.

At some point, you have to ask whether that overhead is still worth what it buys you.
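The arithmetic is worth making concrete. A back-of-the-envelope sketch with made-up numbers:

```python
import statistics

# Hypothetical figures: fixed per-call framework overhead on top of the model call.
OVERHEAD_MS = 40           # assumed abstraction overhead per call
CALLS_PER_REQUEST = 5      # agentic systems often make several calls per request

overhead_per_request = OVERHEAD_MS * CALLS_PER_REQUEST
print(f"Added latency per request: {overhead_per_request} ms")  # 200 ms

# Percentiles are what users feel; compute them from real measurements.
latencies_ms = [812, 790, 1430, 845, 2210, 801, 980, 1120, 795, 3050]
p95 = statistics.quantiles(latencies_ms, n=100)[94]
print(f"p95 latency: {p95:.0f} ms")
```

Forty milliseconds sounds free; two hundred per request, sitting on top of an already heavy p99, does not.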

None of this is impossible to solve inside a framework. But the fixes start to look like working around the framework instead of working with it. And once you get there, it becomes harder to tell what the framework is still giving you.

So What Does “Building It Yourself” Actually Look Like?

“Native agent architecture” sounds more complex than it really is. It just means writing the orchestration logic yourself as code you own, instead of relying on a framework’s abstraction of it.

State is something you define and update explicitly. Tools are clear functions you can test on their own. Memory is code you wrote, so it’s easier to debug, control, and understand what gets stored and how it gets retrieved.

The model call is your code, which means you can instrument it directly and trace what matters.
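Put together, the loop can be surprisingly small. Here's a minimal sketch of what I mean, with illustrative names and the model call stubbed out rather than wired to a real provider:

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """State you define and update explicitly: no hidden memory module."""
    question: str
    documents: list[str] = field(default_factory=list)
    answer: str | None = None
    steps: list[str] = field(default_factory=list)  # audit trail: what ran, in order


def search_docs(query: str) -> list[str]:
    """A tool is just a function: testable on its own, no framework wrapper."""
    return ["..."]  # placeholder for your retrieval logic


def call_model(prompt: str) -> str:
    """The one place a model gets called; instrument it however you like."""
    return "..."  # placeholder for your provider's SDK call


def answer_question(question: str) -> AgentState:
    state = AgentState(question=question)

    state.documents = search_docs(question)
    state.steps.append(f"retrieved {len(state.documents)} documents")

    context = "\n".join(state.documents)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    state.answer = call_model(prompt)
    state.steps.append("model call completed")

    return state  # every input, output, and step is inspectable
```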

Let's not forget, complex workflows map more naturally here. Parallel execution, conditional branching, and long-running async tasks fit event-driven patterns in ways that synchronous chain execution doesn't handle cleanly.
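Fanning out subtasks concurrently, for instance, is a few lines of plain asyncio. A sketch with stubbed-out calls, where `plan` and `execute` stand in for real model or tool calls:

```python
import asyncio


async def plan(task: str) -> list[str]:
    # Stand-in for a model call that breaks the task into subtasks.
    return [f"{task}: part 1", f"{task}: part 2", f"{task}: part 3"]


async def execute(subtask: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for a model or tool call
    return f"done: {subtask}"


async def run(task: str) -> list[str]:
    subtasks = await plan(task)
    # Fan out: subtasks run concurrently instead of one after another,
    # which a synchronous chain can't express.
    results = await asyncio.gather(*(execute(s) for s in subtasks))
    return list(results)


if __name__ == "__main__":
    print(asyncio.run(run("summarize the report")))
```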

More design work upfront means less firefighting later.


I’ve seen teams rebuild a perfectly good LangChain prototype into a custom orchestration layer just because native architectures felt more “serious.” They spent three extra weeks on it and shipped the same system with more code to maintain.

To me, that’s not progress.

If you’re checking whether a feature is worth building, then a framework gets you there faster. If three people use the system internally and nobody’s pager is attached to it, the abstraction overhead is fine.

The question isn’t “framework or native?” It’s what you need to optimize for right now. Fast iteration on uncertain requirements means the framework makes sense. Real users, real SLAs, agent coordination, and operational monitoring mean the native architecture earns its upfront cost.

Most teams hit that turning point sooner than they expect: usually at the first serious debugging session, or the first time someone asks for detailed metrics and the honest answer is "not without a lot of extra work."

That’s the moment to rethink the architecture, not after six months of piling on workarounds.

Frameworks are how knowledge transfers in a new field. LangChain made LLM application development accessible for a generation of engineers. That contribution is real.

But maturity in a domain looks like moving from “I configure the framework to do the thing” to “I understand what the framework was doing, and I make those decisions myself.”

Not because frameworks are bad, but because owning your architecture means you know what’s happening under the hood.

The engineers building the most reliable production AI systems aren’t the ones with the most sophisticated tooling.

They’re the ones who can explain exactly what their system does at any point. What prompt is constructed, from what context, under what conditions, and with what fallback.
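That level of explicitness doesn't require much code. An illustrative sketch:

```python
def build_prompt(question: str, documents: list[str], max_docs: int = 5) -> str:
    """Explicit prompt construction: context, conditions, and fallback are all visible."""
    if not documents:
        # Fallback path: no retrieved context, so say so in the prompt
        # rather than silently sending an empty context block.
        return (
            "Answer from general knowledge and say you had no sources.\n\n"
            f"Question: {question}"
        )

    context = "\n---\n".join(documents[:max_docs])  # condition: cap the context size
    return f"Use only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```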

That clarity is hard to maintain through thick layers of abstraction.


Final thoughts

Abstraction debt is quiet until it’s loud. You won’t notice it during the build. You’ll notice it when something fails in a way the framework’s error message can’t explain.

That moment comes earlier than you expect, usually triggered by a debugging session or a monitoring ask rather than a planning meeting.

State and observability are not optional. If you can’t trace what your agent did and why, you’re not really improving the system. You’re just hoping for the best every time you redeploy.

Treat orchestration as a real architectural decision. Pick it on purpose, with the tradeoffs visible.

The engineers building durable AI systems aren’t the ones who avoided frameworks. They’re the ones who knew when to stop letting the framework decide for them.


Before you go!

I write more about the real engineering decisions behind AI systems, where abstractions help, where they hurt, and what it takes to build reliably.
