
Multi-Agent Systems and the Threshold of Complexity

Written by Magnus Revang | May 15, 2026

Every complex technology eventually reaches a point where no single person can hold it entirely in their head. In engineering, we call this the threshold of complexity — the line where a system transitions from something one mind can reason about fully, to something that requires structures, disciplines, and teams to manage. We crossed that line long ago with traditional software, and the entire discipline of software engineering evolved to cope with it. Object-oriented programming, microservices, event-driven architecture: these aren't just technical fashions. They're hard-won answers to a single enduring problem: how do you build and operate something larger than any one mind can fully grasp?

But every one of those answers rests on a bedrock assumption — one so foundational that most engineers have never had reason to question it. Traditional IT systems are deterministic. Feed a system the same input, get the same output. Every time, without exception. This axiom is what makes large-scale software engineering possible. Every practice, tool, and methodology we've built to manage complexity above the threshold — testing frameworks, modular design, continuous integration, formal verification — is quietly dependent on it. Multi-agent AI systems break that axiom entirely.

When you build systems where AI agents — LLMs (large language models that reason and generate) and LRMs (large reasoning models that plan and problem-solve) — collaborate, delegate, and hand work to one another, augmented by RAG pipelines (dynamic knowledge retrieval that grounds agents in current, relevant information) and context engineering (precisely shaping what each agent knows and when), you introduce genuine non-determinism into the fabric of the system. The same input won't always produce the same output. That's not a flaw — it's precisely what makes these systems capable of remarkable things.

But the consequence is serious and underappreciated: the practices and tools we've relied on to manage complexity above the threshold no longer apply. It's not that they need updating. The axiom they were built on no longer holds. To build multi-agent systems at scale, we don't need better versions of what we have — we need to rethink the foundations entirely.
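To make that concrete, here is a minimal, hedged sketch in plain Python. The agent_like_component below is a purely hypothetical stand-in for an agent call, not any real framework or API; the point is only to contrast a determinism-era exact-match check with the statistical check a non-deterministic component actually requires.

    import random

    def deterministic_component(x: int) -> int:
        """Classic code path: the same input yields the same output, every time."""
        return x * 2

    def agent_like_component(x: int) -> int:
        """Hypothetical stand-in for an agent: same input, occasionally a different output."""
        return x * 2 if random.random() < 0.95 else x * 2 + 1

    # Determinism-era check: a single exact-match assertion settles the question.
    assert deterministic_component(21) == 42

    # The same style of check against a non-deterministic component is flaky:
    # assert agent_like_component(21) == 42   # passes on ~95% of runs, fails on the rest

    # The best available substitute is statistical: measure a pass rate over many
    # runs and compare it against a threshold you are willing to accept.
    runs = 1_000
    pass_rate = sum(agent_like_component(21) == 42 for _ in range(runs)) / runs
    assert pass_rate >= 0.90, f"quality drifted: pass rate {pass_rate:.1%}"

Even that threshold check only certifies today's pass rate; it says nothing about how the rate drifts as prompts, agents, and retrieved knowledge change around it, which is exactly where the three problems below begin.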

The industry is already responding. Evaluations, monitoring frameworks, trust protocols, validation layers — these are meaningful advances. But they mostly help organizations go further before hitting the threshold. They don't solve what happens when you're building above it: when you have dozens of developers, hundreds of agents, and no single person who can trace the full consequences of any given change. That's the frontier most enterprises are quietly approaching, and few are genuinely prepared for it.

Three problems become acute at that scale — each familiar in shape from traditional software, but fundamentally different in character once determinism can no longer be relied on.

  • Resilience: In conventional systems, a well-designed change in one place doesn't unexpectedly break something elsewhere — determinism makes that containment possible. In multi-agent systems, agents pass natural language to one another, share memory, and pull from dynamic knowledge pipelines. The connections are looser and far less predictable. A change made by one developer to one agent can ripple in ways nobody anticipated — and in a large system, nobody may even notice until the damage is done.

  • Quality: Traditional software either works or it doesn't — a binary you can test for. Multi-agent systems operate on a spectrum. An agent responsible for classifying information might be right 91% of the time, or 97% — and that difference, invisible in isolation, compounds through every downstream agent that depends on it (the sketch after this list puts rough numbers on the effect). A quiet slip in one corner of the system can silently degrade outcomes across the whole. There is no green light that tells you the system is working. There is only the slow drift of quality.

  • Adaptability: Real-world inputs are messy and unpredictable. Individual agents can be made quite resilient to this. But in a network of collaborating agents, a missing piece of context — an unusual input, an edge case nobody designed for — can stall or corrupt the work of every agent downstream. In deterministic systems, unexpected input produces a predictable failure. In non-deterministic ones, it can produce something far more dangerous: a plausible-looking wrong answer that propagates before anyone catches it.
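To put rough numbers on the quality point above, here is a minimal sketch under stated assumptions: a hypothetical chain of five agents, each step useful only if the previous step was correct, with independent errors. None of this reflects a specific Openstream.ai pipeline; it only shows how small per-agent differences multiply.

    def end_to_end_accuracy(per_agent_accuracy: float, num_agents: int) -> float:
        """Probability the whole chain is right, assuming independent per-agent errors."""
        return per_agent_accuracy ** num_agents

    for acc in (0.97, 0.91):
        print(f"{acc:.0%} per agent over 5 agents -> {end_to_end_accuracy(acc, 5):.0%} end to end")

    # Approximate output:
    #   97% per agent over 5 agents -> 86% end to end
    #   91% per agent over 5 agents -> 62% end to end

The exact figures matter less than the shape: a six-point gap at the agent level becomes roughly a twenty-four-point gap end to end, and no single green-or-red test result would ever surface it.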

None of these problems is unsolvable. But they require a fundamentally different approach to system design — one where resilience, quality assurance, and adaptability are treated as architectural concerns from day one, not retrofitted after the fact. The discipline that traditional IT built to operate above the threshold of complexity needs to be rebuilt on new foundations for a world where determinism is no longer a given.

Openstream.ai was recently named a Sample Vendor in the inaugural Gartner Hype Cycle for Agentic AI, in part due to our approach to multi-agent systems and our ability to address the Threshold of Complexity for our clients: clients that operate in highly regulated, risk-averse environments and markets, and that demand operational AI, not a proof of concept with little chance of scaling. How can we help you?