What the Autonomous Agent Revolution May Mean for Typical Banking Infrastructure

What you’ll learn

The roughly $180M price tag of core banking modernization is coming under pressure: autonomous agents can mediate legacy systems without replacing them, which could cut costs to about $26M and timelines to about 9 months.
The key insight is that agents make COBOL code legible, turn green-screen terminals into APIs, and make pre-built integration catalogs far less valuable, all while the system of record stays untouched and a deterministic tool layer handles every transaction.
Elements of the technology are already in production, and the question is not whether this will pressure the modernization industry, but how much and how soon.

For fifty years, banks have paid billions to replace the systems that hold their money. Over the next eighteen to twenty-four months, autonomous agents are likely to make that math look very different.

Fig 1 — A typical $180M, 4-year core modernization could plausibly be replaced by a roughly $26M, 9-month agent-mediated deployment, if current trends hold.

There is a shift gathering momentum inside the world's banks, and relatively few people are talking about it at the board level. It's not a new core system. It's not a new vendor pitch. It's a growing sense — spreading from chief AI officers to CIOs to CFOs — that the economics of core banking modernization may be approaching an inflection point.

The thesis is straightforward: everything above the system of record could be flattened by autonomous agents within the next 18 to 24 months. The $180M migration. The four-year implementation. The large external consultancy team. The risk-laden data cutover. All of it, potentially replaced by a few million dollars of agent platform and nine months of focused engineering.

This is not a prediction of what every bank will do. It's a read on where the technology is heading, and what becomes possible when the cost curve bends sharply enough. The question is not whether the modernization industry faces pressure. The question is how much, how soon, and how the smartest participants position themselves.

The stack, then and next

To understand the shift, look at the stack. The picture below shows a typical large bank's technology stack on the left — the stack that has been the target of modernization programs for two decades — and, on the right, the architecture that a growing number of engineering teams believe will replace it.

Fig 2 — The direction of travel: five layers could collapse to three. Channels stay. The system of record stays. Everything in between may be absorbed by an agent layer and a thin deterministic tool layer.

In this emerging architecture, five layers become three. The customer channels at the top may not need to change. The system of record at the bottom may not need to change. The three layers in the middle — the connectors, the middleware, the business logic — could be absorbed into a single agent layer sitting on top of a thin, deterministic tool layer that is itself a typed interface to the core. Whether this happens in 18 months or 36 is a function of engineering velocity and regulatory comfort, not technical feasibility. The technical feasibility is already here.

The system of record may no longer be the product. It may become the substrate. And the agent could become the product.

Why the old math is coming under pressure

The reason core modernization costs $180M was never the core itself. It was everything around it. Specifically, it was the impedance mismatch problem — the cost of getting modern systems to talk to systems written in COBOL in 1987.

For decades, the industry's answer fell into two camps: replace the core (greenfield migration, 4 years, $200M, high cutover risk) or build a translation layer around it (the "thin core" or banking-as-a-service approach, which still required significant middleware investment). Both approaches assumed that legacy code was fundamentally unserviceable — a black box that had to be either replaced or wrapped in custom code maintained by teams of engineers.

Autonomous agents are beginning to challenge both assumptions. Three shifts are underway:

1. The code is becoming legible. Modern LLMs can already read COBOL, JCL, CICS transaction maps, and VSAM files with enough fidelity to be useful. The decades of undocumented business logic that lived inside the core — the kind that took an analyst six months to reverse-engineer — could potentially be extracted in weeks. The agent doesn't just read the code; it can build a structured model of the business rules, the data dependencies, and the screen flows that would otherwise take years to assemble.

2. The 3270 screen could become an API. For fifty years, the primary ways to interact programmatically with a mainframe core were to (a) reverse-engineer the database, (b) build a screen scraper that broke whenever the vendor pushed an update, or (c) wait for the vendor to publish a modern interface. Multimodal agents are now capable of navigating green-screen terminals natively: reading the screen, filling fields, parsing the response, and returning structured data. The screen effectively becomes the API. Every transaction that a human teller could perform becomes a typed, callable tool.

3. Integration may become dynamically composable. The pre-built connector catalog — the 200+ vendor integrations that core banking vendors have long sold as a competitive moat — begins to look less essential when an agent can read an API spec, generate the integration code, test it against sandbox endpoints, and deploy it in a fraction of the time. The pitch "we already integrate with 200 systems" starts to feel like a map shop selling pre-printed atlases as GPS becomes available.
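
The second shift, the screen becoming an API, can be illustrated with a small sketch. It assumes a hypothetical account-inquiry screen with labeled fields; the layout, the labels, and the name `parse_account_screen` are invented for illustration. A real deployment would drive a live terminal session and might rely on a multimodal model rather than pattern matching, so this shows only the target shape: raw screen text in, a typed record out.

```python
import re
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical green-screen contents; real layouts vary by core and by transaction.
SAMPLE_SCREEN = """\
ACCT INQUIRY                      TXN: AI01
ACCOUNT NO : 0012345678
NAME       : JANE DOE
AVAIL BAL  : 1,523.75 USD
"""

@dataclass(frozen=True)
class AccountInquiry:
    account_id: str
    name: str
    available_balance: Decimal
    currency: str

def _field(screen: str, label: str) -> str:
    """Return the text after 'LABEL :' on the line that starts with that label."""
    match = re.search(rf"^{re.escape(label)}\s*:\s*(.+?)\s*$", screen, re.MULTILINE)
    if match is None:
        raise ValueError(f"field {label!r} not found on screen")
    return match.group(1)

def parse_account_screen(screen: str) -> AccountInquiry:
    """Turn raw screen text into a typed record, failing loudly on surprises."""
    amount, currency = _field(screen, "AVAIL BAL").split()
    return AccountInquiry(
        account_id=_field(screen, "ACCOUNT NO"),
        name=_field(screen, "NAME"),
        available_balance=Decimal(amount.replace(",", "")),
        currency=currency,
    )

print(parse_account_screen(SAMPLE_SCREEN))
```

The useful property is that anything unexpected on the screen raises an error instead of producing a guessed value, which is what lets downstream callers treat the result as a typed tool output.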

The emerging architecture

Look at the picture again. The architecture on the right has three layers, and the boundary between the top two is where most of the design work happens. This boundary — the neuro-symbolic boundary — may turn out to be the most important architectural decision in agentic banking.

The top layer is the LLM. It does what LLMs are good at: parsing natural language, planning multi-step workflows, recovering from errors, holding a conversation. Under this model, it would never hold state. It would never write directly to a database. It would never move money. It would only decide which tool to call and with what parameters.
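
One way to picture that constraint is a dispatcher that treats the model's output as a proposal and refuses anything that does not match a registered tool signature. The sketch below is an assumption-laden illustration: `ToolSpec`, `REGISTRY`, `dispatch`, and `get_wire_status` are hypothetical names, and the tool body is a stand-in rather than a real core lookup.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class ToolSpec:
    handler: Callable[..., Any]
    params: dict[str, type]  # parameter name -> required Python type

def get_wire_status(wire_id: str) -> dict:
    # Stand-in for a deterministic lookup against the core (hypothetical).
    return {"wire_id": wire_id, "status": "PENDING_REVIEW"}

REGISTRY = {"get_wire_status": ToolSpec(get_wire_status, {"wire_id": str})}

def dispatch(tool_name: str, arguments: dict[str, Any]) -> Any:
    """Execute a model-proposed call only if it matches a registered signature."""
    spec = REGISTRY.get(tool_name)
    if spec is None:
        raise PermissionError(f"unknown tool: {tool_name}")
    if set(arguments) != set(spec.params):
        raise ValueError(f"expected parameters {sorted(spec.params)}")
    for name, expected in spec.params.items():
        if not isinstance(arguments[name], expected):
            raise TypeError(f"{name} must be {expected.__name__}")
    return spec.handler(**arguments)

# The model's entire output is a proposal like this; it never touches the core itself.
proposal = {"tool": "get_wire_status", "arguments": {"wire_id": "W-1001"}}
print(dispatch(proposal["tool"], proposal["arguments"]))
```

Under this design the model can only choose among tools that already exist in the registry; a hallucinated tool name or a malformed argument is rejected before anything reaches the system of record.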

The middle layer — the deterministic tool layer — is where everything important actually happens. Every tool is a typed, versioned, idempotent function. Same inputs, same outputs, every time. When the agent calls debit_account(account_id, amount), that is not an LLM deciding how to move money. That is a transactional primitive executing against the system of record with full ACID guarantees.
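
As a rough sketch of what "idempotent" means for a tool like `debit_account`, the example below replays a request and shows that the money moves only once. It rests on several assumptions: the ledger is a toy in-memory stand-in for the system of record, and the `idempotency_key` parameter, `InMemoryLedger`, and `InsufficientFunds` are illustrative additions rather than part of any real core's interface.

```python
from dataclasses import dataclass, field
from decimal import Decimal

class InsufficientFunds(Exception):
    pass

@dataclass
class InMemoryLedger:
    """Toy stand-in for the system of record; a real core provides the ACID guarantees."""
    balances: dict[str, Decimal]
    _results: dict[str, Decimal] = field(default_factory=dict)

    def debit_account(self, account_id: str, amount: Decimal, idempotency_key: str) -> Decimal:
        """Debit once per idempotency key and return the resulting balance."""
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # replay: no second debit
        if amount <= 0:
            raise ValueError("amount must be positive")
        balance = self.balances[account_id]
        if amount > balance:
            raise InsufficientFunds(account_id)
        self.balances[account_id] = balance - amount
        self._results[idempotency_key] = self.balances[account_id]
        return self.balances[account_id]

ledger = InMemoryLedger({"ACC-1": Decimal("100.00")})
first = ledger.debit_account("ACC-1", Decimal("30.00"), idempotency_key="req-42")
retry = ledger.debit_account("ACC-1", Decimal("30.00"), idempotency_key="req-42")
print(first, retry, ledger.balances["ACC-1"])
```

The retry returns the stored result instead of debiting again, so an agent that repeats a call after a timeout cannot double-charge the account.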

Take a payment exception. Today, a human ops agent opens four systems, copy-pastes data, makes a judgment call, and takes twelve minutes. In the agent-mediated future, the same workflow could look like this:

Fig 3 — A projected workflow for payment-exception resolution, with deterministic tools at each step and a human in the loop for the approval. Twelve minutes could compress to under a minute.

Every step in that diagram is a deterministic tool call. The LLM would never invent a wire detail. It would never guess an account balance. It would only orchestrate the calls and present the result in natural language. The compliance audit trail would be trivially reconstructable: here is every tool that was called, with its inputs, its outputs, the model version that invoked it, and the human who approved the override. That is a standard many banks would welcome.
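
The human-in-the-loop approval could be enforced in code rather than by convention. The following is a minimal sketch under stated assumptions: `PendingAction`, `execute`, and `release_payment` are hypothetical names, and the tool body is a placeholder for a real call into the core.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class PendingAction:
    """A tool call the agent has planned but is not allowed to execute on its own."""
    tool_name: str
    arguments: dict
    approved_by: Optional[str] = None

    def approve(self, reviewer_id: str) -> None:
        self.approved_by = reviewer_id

def execute(action: PendingAction, tools: dict[str, Callable[..., dict]]) -> dict:
    """Run the tool only after a named human has approved it."""
    if action.approved_by is None:
        raise PermissionError(f"{action.tool_name} requires human approval")
    result = tools[action.tool_name](**action.arguments)
    return {**result, "approved_by": action.approved_by}

def release_payment(payment_id: str) -> dict:
    # Hypothetical deterministic tool; a real one would call the core.
    return {"payment_id": payment_id, "status": "RELEASED"}

action = PendingAction("release_payment", {"payment_id": "P-77"})
action.approve("ops-analyst-12")
print(execute(action, {"release_payment": release_payment}))
```

Because the approver's identity travels with the result, the record of who approved the override comes from the execution path itself.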

Multiply that across the operations estate — new account opening, KYC refresh, wire reconciliation, lending decisioning, regulatory reporting — and the time compression could be material. The chart below shows projected ranges based on what early-stage deployments have demonstrated.

Fig 4 — Projected time to complete, today vs. agent-mediated, across five common banking operations. Handle time could compress 80–98%, though real-world outcomes will vary by transaction type, regulatory jurisdiction, and data-layer quality.
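
To make the range tangible, the arithmetic can be applied to the twelve-minute payment exception described earlier. This sketch uses only the projected 80–98% range and the twelve-minute baseline from the text; the function name is illustrative.

```python
def remaining_seconds(baseline_minutes: float, compression: float) -> float:
    """Handle time left after a given fractional compression of the baseline."""
    return baseline_minutes * 60 * (1 - compression)

baseline_minutes = 12.0  # the payment-exception example

# The 80-98% range is a projection, not a measurement.
for compression in (0.80, 0.98):
    seconds = remaining_seconds(baseline_minutes, compression)
    print(f"{compression:.0%} compression leaves about {seconds:.0f} seconds")

# Getting a 12-minute task "under a minute" needs at least this much compression.
implied = 1 - 1.0 / baseline_minutes
print(f"12 min to 1 min is {implied:.1%} compression")
```

At the low end of the range a twelve-minute task would still take a little over two minutes, so "under a minute" corresponds to the upper part of the projected range, roughly 92% compression or more.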

The objection — and the emerging answer

The standard concern is well-founded: the LLM is non-deterministic. It would be reckless to put a probabilistic system between the regulator and the money.

The concern is right about the LLM. It is largely misplaced about the architecture.

Under the neuro-symbolic pattern, the LLM would not sit between the regulator and the money. The LLM would sit at the conversational boundary — parsing intent, planning the workflow, presenting the result. The layer between the regulator and the money would be the deterministic tool layer, which is designed to be exactly as auditable and exactly as deterministic as the core it wraps. Every transaction is a typed function call with a signature, an idempotency key, an input validation step, and a logged result. The fact that an LLM decided to invoke it is logged too. The audit trail is not "the model said so." It is "tool X was called with parameters Y, returned result Z, model version A, timestamp T."
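
A minimal sketch of such an audit record, assuming an in-memory list stands in for an append-only audit store. The `audited` decorator, the `AUDIT_LOG` name, and the `get_balance` tool are hypothetical, and a production version would presumably also record failed calls and validation errors.

```python
import functools
import json
from datetime import datetime, timezone
from typing import Any, Callable

AUDIT_LOG: list[dict[str, Any]] = []  # stand-in for an append-only audit store

def audited(tool: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so every successful invocation leaves a structured record."""
    @functools.wraps(tool)
    def wrapper(*, model_version: str, idempotency_key: str, **params: Any) -> Any:
        result = tool(**params)
        AUDIT_LOG.append({
            "tool": tool.__name__,
            "parameters": params,
            "result": result,
            "model_version": model_version,
            "idempotency_key": idempotency_key,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        return result
    return wrapper

@audited
def get_balance(account_id: str) -> str:
    # Hypothetical read-only tool with a fixed answer for illustration.
    return "100.00"

get_balance(account_id="ACC-1", model_version="model-2025-01", idempotency_key="req-7")
print(json.dumps(AUDIT_LOG[-1], indent=2))
```

Each entry carries the tool, the parameters, the result, the model version, and a timestamp, so the record answers what was executed without relying on the model's own account of what it did.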

This is not a hypothetical pattern. Elements of it are already in production at card networks and at large acquirers, and it is the default architecture for newer banks that never had a legacy core. The LLM is the user interface. The tool layer is the system. The core is the ledger. Three different jobs, three different guarantees, three different audit and compliance stories.

The other concern, that the data is too dirty, is real, and it would be a mistake to dismiss it. Data quality has always been a prerequisite for any banking system, modern or legacy. What may change is that agent-mediated data quality could become continuous and self-correcting rather than a one-time migration project that ships once and slowly drifts out of alignment. Agents can find duplicate records, reconcile mismatched customer IDs, and flag orphaned accounts in real time. The first 60% of the gain is likely to be fast. The remainder, and especially the last 20%, will take longer. But that work is parallelizable, and it does not necessarily need to block the deployment.
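
Two of those checks, duplicate detection and orphaned accounts, can be sketched as plain deterministic functions that an agent could run continuously. The records and field names below are invented for illustration, and matching on normalized name plus date of birth is a deliberately naive heuristic, not a recommended matching strategy.

```python
from collections import defaultdict

# Hypothetical extracts; field names are illustrative only.
customers = [
    {"customer_id": "C1", "name": "Jane Doe", "dob": "1980-02-01"},
    {"customer_id": "C2", "name": "JANE  DOE", "dob": "1980-02-01"},
    {"customer_id": "C3", "name": "Raj Patel", "dob": "1975-11-30"},
]
accounts = [
    {"account_id": "A1", "customer_id": "C1"},
    {"account_id": "A2", "customer_id": "C9"},  # no such customer
]

def find_duplicates(rows):
    """Group customer IDs that share a normalized name and date of birth."""
    groups = defaultdict(list)
    for row in rows:
        key = (" ".join(row["name"].lower().split()), row["dob"])
        groups[key].append(row["customer_id"])
    return [ids for ids in groups.values() if len(ids) > 1]

def find_orphans(account_rows, customer_rows):
    """Return account IDs whose customer_id matches no customer record."""
    known = {c["customer_id"] for c in customer_rows}
    return [a["account_id"] for a in account_rows if a["customer_id"] not in known]

print(find_duplicates(customers))         # [['C1', 'C2']]
print(find_orphans(accounts, customers))  # ['A2']
```

Checks like these only flag candidates; deciding whether two records really are the same customer is the kind of step that would likely stay behind a review or a stricter rule.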

What may contract, what may endure, what may be built

The potential pressure on the $180M modernization program is not the only story. The emerging architecture could affect several adjacent categories.

Under pressure. The "thin core" as an independent product category. The pitch was "modern API layer over modern core." If agents provide the API layer and the legacy core still functions, the modern core becomes an additional middle layer whose value proposition is harder to defend. Banking-as-a-service platforms that aggregate other banks' cores could face a similar question: if an agent can call the underlying bank API directly, what does the aggregation layer add beyond latency and cost? Large-scale system integration practices, the kind that staff hundreds of people on multi-year migrations, may find their cost structure harder to justify against a deployment measured in months and millions.

Likely to endure but evolve. The legacy core vendors themselves. The business case for their software is unlikely to collapse — the system of record still needs to exist, and the switching costs remain enormous — but the growth narrative could shift. Their transition may be from "sell new cores" to "sell agent-grade tool interfaces to the installed base." That implies a different deal size and a different sales motion, but it is defensible, because they own the relationship. The institutions that move early are already exploring 3270-to-tool SDKs as a retention and modernization play.

Likely to be created. At least three new categories of infrastructure. First: legacy-to-agent extraction — tooling that ingests COBOL, JCL, and CICS and produces structured knowledge graphs of business rules and data models. This is what modernization could become. Second: agent governance platforms — tool registries, audit logging, model versioning, red-teaming, and regulatory reporting. Third: deterministic rule engines purpose-built for financial services — the symbolic business-rule layer that agents invoke, with explainability and version control built in. Between them, these could represent a material new infrastructure market, because every bank's rules are jurisdiction-specific and must be explainable.
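
The third category, a deterministic rule layer with explainability and version control, might look something like the sketch below. Everything specific in it is an assumption: the `Rule` structure, the rule IDs, the version strings, and the thresholds are invented, and real rules would be jurisdiction-specific.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable

@dataclass(frozen=True)
class Rule:
    rule_id: str
    version: str
    description: str
    passes: Callable[[dict], bool]

# Hypothetical rules and thresholds; real ones are jurisdiction-specific.
RULES = [
    Rule("WIRE-LIMIT", "1.2.0", "amount is at most 10,000",
         lambda tx: tx["amount"] <= Decimal("10000")),
    Rule("KYC-CURRENT", "3.0.1", "customer KYC is current",
         lambda tx: tx["kyc_current"]),
]

def evaluate(tx: dict) -> dict:
    """Apply every rule and report which rule versions failed, and why."""
    failures = [
        {"rule_id": r.rule_id, "version": r.version, "failed": r.description}
        for r in RULES
        if not r.passes(tx)
    ]
    return {"approved": not failures, "failures": failures}

print(evaluate({"amount": Decimal("25000"), "kyc_current": True}))
```

The decision comes back with the identifier and version of each rule that failed, which is the property that makes the outcome explainable after the fact.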

What a bank might do in the next 12 months

The strategic question at the board level is likely to shift from "which core do we buy?" to "do we want to be a bank that uses agents, or a bank that is designed around them?"

The first is a cost-efficiency play over roughly two years: deploy agents over the existing core, reduce ops handle time, accelerate product changes, and reduce dependency on large-scale system integration engagements. The second is a longer strategic repositioning: rebuild the data layer for agent consumption, require every internal system to expose itself as a typed and versioned tool, stand up a dedicated agent platform team, and engage regulators early so that when agent-mediated products launch, the supervisory framework is already in place.

Both paths could start with the same three steps. First, audit the core — document every 3270 screen, every transaction code, every business rule. That inventory is the raw material. Second, run a proof of concept on a handful of high-volume, low-risk transactions: address change, balance inquiry, stop payment, account inquiry, wire status. Wrap them with an agent. Measure handle time, error rate, and ops impact. Third, expand to tier-one customer service and back-office ops. The risk envelope is similar to existing IVR plus human handoff. At that stage, no regulator necessarily needs to be in the room.
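
For the measurement step in the proof of concept, a simple comparison of handle time and error rate may be enough to start. The observations below are made-up placeholders, not results from any deployment, and the `summarize` helper is illustrative.

```python
from statistics import median

# Made-up pilot observations: (handle time in seconds, completed without error?)
baseline = [(720, True), (650, True), (800, False), (700, True)]
agent = [(45, True), (60, True), (52, True), (300, False)]

def summarize(samples):
    """Return the median handle time and error rate for a set of observations."""
    times = [seconds for seconds, _ in samples]
    errors = sum(1 for _, ok in samples if not ok)
    return {"median_seconds": median(times), "error_rate": errors / len(samples)}

print("baseline:", summarize(baseline))
print("agent:   ", summarize(agent))
```

Using the median rather than the mean keeps a single slow escalation, like the 300-second case above, from dominating the comparison, while the error rate is tracked separately so speed gains are not bought with more mistakes.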

What may be worth reconsidering: signing the next multi-year, multi-hundred-million-dollar core replacement deal without first stress-testing whether a lighter agent-mediated deployment could achieve a meaningful share of the same outcomes. The cost of waiting may no longer be "we still have a legacy core" — that has been true for decades. The cost of waiting may become "our competitors have an 18-to-24-month head start on an architecture that changes the unit economics of change itself."

The bet underneath the bet

For decades, the implicit bargain of core banking was: we will pay billions to keep the ledger honest, and the rest of the system will be whatever the integrator built around it. The agent movement challenges that bargain — not by replacing the ledger, but by making everything above the ledger cheaper, faster, and more intelligent. The thing the bank was actually buying — agility, customer experience, integration speed, time-to-market — may turn out to be the thing agents are best at.

The institutions that internalize this in the next twelve to eighteen months could open a meaningful gap. The ones that dismiss it may find themselves watching a competitor ship in weeks what used to take them years. And the system integrators that built their practices on the old math may spend the same period trying to rebrand "modernization" as "agent deployment" while the economics of the engagement fundamentally shift beneath them.

The core is unlikely to go away. The core may simply become boring. Which, in an odd way, is what it probably should have been all along.

The architecture diagrams in this post describe patterns in general terms — vendor names intentionally omitted. The payment-exception flow is a composite of patterns observed across multiple institutions. The time-compression estimates are projections based on early-stage evidence; real-world outcomes will vary substantially by transaction type, regulatory jurisdiction, and the quality of the existing data layer. This post is a thesis about where the technology is heading, not a claim about what every bank is already doing.

Image credits

Cover illustration

Generated for this article

AI-generated


Siddharth

Thoughts and essays, published with Yokush.
