How we engineered a scalable and performant enterprise AI platform

In the AI era, some long-standing engineering tradeoffs must be revisited.

For years, multi-tenant architecture was the engineering default, and for good reason: it was simpler, cheaper and easier to scale with little extra engineering effort. That logic makes sense for deterministic applications, where logical data-segregation rules prevent accidental cross-contamination. This settled law breaks as soon as AI enters the arena.

When an AI model starts to learn from data, especially business-sensitive client data, the cost of cross-tenant contamination is no longer a theoretical risk; it is a complete failure of trust and compliance, and an existential threat.

We made an uncomfortable call: we gave each client their own database, hosted on separate cloud infrastructure. To us, it meant total isolation, with AI models able to train only on the data they are supposed to see. To everyone else, it looked like operational suicide.

This article is the story of why we went single-tenant: how the architecture kept us operationally sane, the advantages it offers that no multi-tenant architecture can match, and how that strategic decision kept compounding in our favor.

Why AI in insurance demands true isolation

Commercial insurance runs on some of the most sensitive data there is: premiums, quotes, claim histories, underwriting decisions and pricing models that represent decades of actuarial refinement. When a client hands us their data, they are trusting us with the competitive intelligence of their business. A leak here could end their competitive existence.

The regulatory landscape alone shows how high the stakes are. Controls required by SOC 2, HIPAA, ISO 27001 and GDPR make it business-critical to demonstrate where a client’s data lives, what guardrails protect it and who has access to it. In a multi-tenant architecture, that control means logical separation of data using tenant-ID filters and layers of access control. In single-tenant, we simply point to a completely isolated environment and say: “That’s theirs. Nothing else touches it.”

But the AI dimension adds a risk that most compliance frameworks are not yet equipped to address. Consider a machine learning model trained on one client’s data that accidentally leaks sensitive patterns into predictions served to another client. Vector embeddings are another example: widely used by AI models to capture semantic relationships, they make cross-tenant leaks harder to audit. Emerging research on AI data contamination suggests these risks are real and growing, because once your model sees patterns from hundreds of clients, proving that you are not leaking competitive intelligence becomes nearly impossible.

I use a simple litmus test when evaluating isolation: could this client log in directly to their environment and see only their world?

In multi-tenant, the answer is always “no, but…” followed by an explanation of abstraction layers. In single-tenant, the answer is always yes, because each client’s database, including its compute resources, is truly theirs. That clarity alone changes the conversation with enterprise clients.

Eliminating the middleware layer: a performance and simplicity win

The conventional enterprise tech stack looks something like this:

  1. Application/UX layer talks to middleware
  2. Middleware talks to an object-relational mapping (ORM) layer
  3. ORM talks to the database

Every hop adds latency, and every abstraction layer becomes a place where even sophisticated engineers end up reinventing wheels. Whether it’s custom caching, custom connection management or custom retry logic, the result is code that exists primarily to move data back and forth, consuming I/O and adding milliseconds to every operation.

We made a completely different choice: zero middleware. Business logic lives next to the data as database functions, exposed directly via REST endpoints. When an application needs data, it makes one REST call, and the database does the heavy computation and transformation work using compute resources allocated to the client’s needs. This eliminates network hops, roundtrips and serialization overhead, cutting latency.

Research on database architecture consistently shows that network roundtrips account for 30-70% of query response time in traditional multi-tier architectures. Each middleware hop typically adds 5-20 milliseconds. When you’re executing complex AI workflows that might involve dozens of data operations, those milliseconds compound into seconds.
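Taking those figures at face value, the compounding is easy to see in a back-of-envelope model. The hop counts and per-hop milliseconds below are illustrative assumptions, not measurements:

```python
def workflow_latency_ms(operations: int, hops_per_op: int, ms_per_hop: float) -> float:
    """Network overhead for a workflow, ignoring query execution time itself."""
    return operations * hops_per_op * ms_per_hop

# A three-tier stack (app -> middleware -> ORM -> database) adds roughly
# 3 hops per operation; take 10 ms per hop, a midpoint of the 5-20 ms range.
traditional = workflow_latency_ms(operations=36, hops_per_op=3, ms_per_hop=10)

# Zero middleware: one REST call straight to a database function.
direct = workflow_latency_ms(operations=36, hops_per_op=1, ms_per_hop=10)

print(f"traditional: {traditional / 1000:.2f}s, direct: {direct / 1000:.2f}s")
```

At a few dozen operations per workflow, the network overhead alone crosses the one-second mark in the middleware version.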

When the data lives next to the business logic, retries become nearly instantaneous. A failed operation in a middleware architecture means re-establishing connections, re-serializing payloads and re-transmitting across the network, which can easily consume 50-100 milliseconds per retry. For systems that process thousands of decisions per minute, that difference amplifies.

This approach felt counterintuitive at first. Aren’t we supposed to keep business logic out of the database? That guidance made sense when databases were expensive, hard to scale and difficult to version-control. Modern databases change the equation: functions are now code. They can be tested, versioned and deployed like any other artifact, and they execute where the data lives, eliminating the network penalty of distributed systems.

We extended this philosophy to document storage: each tenant’s database includes S3-compatible object storage, unifying structured and unstructured data in one environment. Policy documents, loss-run PDFs and underwriting photos live alongside the relational data, so when an AI model needs to process a document, it does not fetch from a separate storage service. Everything is already there, secured by the same access controls under the same tenant.

Authentication, too, can happen at the database layer. Rather than building a separate identity service that gates access to a middleware layer and then gates access to data, we collapsed the security perimeter to one defensible point, which is the client’s database. When auditors ask how we control access, the answer is simple and verifiable: show them the database configuration.

How we made single-tenant operationally boring

The honest objection to single-tenant architecture is operational complexity due to the sheer number of databases a team must manage.

The answer is infrastructure-as-code (IaC), all the way down. Every tenant’s environment is provisioned from identical templates, differing only in resource allocation, geographic placement and network policies, all controlled via parameters rather than custom implementations. When we spin up a new tenant, we are not building infrastructure from scratch; we are instantiating a well-tested template with tenant-specific values. The process takes minutes, not days.
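A minimal sketch of what parameter-driven provisioning looks like, assuming the template is rendered in code. Every field name and value below is hypothetical:

```python
# Shared template: identical for every tenant (values are illustrative).
BASE_TEMPLATE = {
    "db_version": "15.4",
    "backup_retention_days": 30,
    "network_policy": "deny-all-ingress-except-vpn",
}

def provision_tenant(tenant_id: str, region: str, cpu: int, ram_gb: int,
                     storage_gb: int) -> dict:
    """Instantiate the shared template with tenant-specific parameters."""
    env = dict(BASE_TEMPLATE)
    env.update({
        "tenant_id": tenant_id,
        "region": region,  # satisfies data-residency requirements
        "cpu": cpu,
        "ram_gb": ram_gb,
        "storage_gb": storage_gb,
    })
    return env

# Two tenants differ only in parameter values, never in structure.
broker = provision_tenant("broker-ca", region="ca-central-1",
                          cpu=4, ram_gb=16, storage_gb=200)
carrier = provision_tenant("carrier-eu", region="eu-west-1",
                           cpu=16, ram_gb=64, storage_gb=2000)
assert set(broker) == set(carrier)  # identical shape, different values
```

The point of the sketch is the invariant in the last line: every environment has the same shape, so the fleet stays testable as one artifact.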

Our deployments follow a canary pattern: an update to business logic redeploys new functions, schema migrations and performance optimizations to a subset of tenants first, and when something breaks, we catch it before it affects the broader fleet. Only when all validation passes with high confidence do we enter blast mode and roll out to the remaining tenants. This canary approach is well established in large-scale systems, but single-tenancy makes it even more powerful because each canary is a completely isolated environment.
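The rollout logic can be sketched in a few lines. This is an illustrative skeleton, not our actual pipeline; `deploy` and `validate` stand in for real CI/CD steps:

```python
def canary_rollout(fleet, deploy, validate, canary_fraction=0.1):
    """Deploy to a small canary subset first; go fleet-wide only if it passes."""
    n = max(1, int(len(fleet) * canary_fraction))
    canary, rest = fleet[:n], fleet[n:]

    for tenant in canary:
        deploy(tenant)
    if not all(validate(t) for t in canary):
        # A failure here is contained to the canary subset.
        return {"deployed": canary, "status": "halted-at-canary"}

    for tenant in rest:  # "blast mode": roll out to the remaining tenants
        deploy(tenant)
    return {"deployed": canary + rest, "status": "complete"}
```

Because each tenant is a fully isolated environment, a halted canary affects exactly the tenants in the canary subset and nothing else.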

Every change to the database must be idempotent and atomic by design, so changes can be applied repeatedly without side effects and run under one transaction. This means we don’t worry about partial deployment states: if a migration fails halfway, the entire transaction rolls back, making it easy to fix the issue and rerun without impacting users.
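A toy model of that idempotent, all-or-nothing discipline, using an in-memory dict as a stand-in for a real database transaction (names and mechanics are illustrative):

```python
def apply_migrations(state: dict, migrations: list) -> dict:
    """Apply each (name, fn) migration at most once; commit all or nothing."""
    working = dict(state)  # begin "transaction" on a copy (shallow, toy-only)
    applied = set(working.get("_applied", ()))
    try:
        for name, fn in migrations:
            if name in applied:  # idempotent: re-runs are skipped
                continue
            fn(working)
            applied.add(name)
    except Exception:
        return state             # rollback: the original state is untouched
    working["_applied"] = applied
    return working               # commit
```

Re-running the same batch is a no-op, and a failure halfway leaves the previous state exactly as it was.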

Our CI/CD pipelines are tenant-aware: they know the entire fleet, orchestrate deployments across it and can roll back individual tenants if needed. The operational complexity that seemed unmanageable becomes manageable when you treat the fleet as a programmable system rather than a collection of snowflakes.

This is where single-tenant starts showing its advantages over multi-tenant. In multi-tenant, a bad deployment affects everyone simultaneously, making any incident everyone’s incident. In single-tenant, the blast radius is contained: one tenant’s issue stays isolated from the others, so investigation and remediation can happen in isolation without taking down the entire platform. We can even keep a problematic tenant on a previous version while we debug, without blocking updates for everyone else.

Why per-tenant resource isolation beats any noisy-neighbor fix

Multi-tenant architecture has a dirty secret under the hood: everyone draws from the same resources. A small agency and a large carrier share the same compute pool, so when the carrier runs a heavy AI workload, the agency’s performance can degrade. This noisy-neighbor problem is well documented in cloud architecture, and multi-tenant systems spend enormous engineering effort trying to mitigate it through resource quotas, request throttling and priority queues. These mitigations add complexity and rarely eliminate the problem.

Single-tenant architecture eliminates noisy neighbors because each tenant’s resources are dedicated to their needs and, more importantly, independently tunable. That flexibility becomes a feature, not a limitation.

CPU and RAM scale vertically with workload, so a tenant running compute-intensive AI inference gets more horsepower, while a tenant with lighter needs pays for lighter infrastructure. Disks scale independently too: tenants with large document repositories get more storage without affecting their compute allocation. Connection pools can likewise be tuned to each tenant’s concurrency patterns.
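As an illustration of that independent tuning, per-tenant sizing can be derived from a workload profile. The thresholds and field names below are made up for the sketch:

```python
def size_tenant(profile: dict) -> dict:
    """Derive independent resource allocations from a tenant's workload profile."""
    inference_heavy = profile.get("ai_inferences_per_min", 0) > 1000
    return {
        # Compute scales vertically with AI workload...
        "cpu": 16 if inference_heavy else 4,
        "ram_gb": 64 if inference_heavy else 16,
        # ...while storage tracks the document repository, with 2x headroom...
        "storage_gb": max(100, profile.get("document_gb", 0) * 2),
        # ...and the connection pool tracks concurrency, capped at 200.
        "pool_size": min(200, profile.get("peak_concurrency", 10) * 2),
    }

print(size_tenant({"ai_inferences_per_min": 5000, "document_gb": 40}))
```

Each dimension moves on its own axis; changing one never forces a change in another.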

Geographic flexibility is another important dimension. A Canadian broker may require all of their data, at rest and in flight, to remain within Canadian jurisdiction; an Irish carrier needs EU data residency; a US-based MGA might require their instance to run within their own network perimeter for security reasons. In multi-tenant, satisfying these requirements means complex data routing and careful tenant isolation within shared infrastructure. In single-tenant, we simply deploy the tenant’s environment in the required region or cloud account, making the architecture region- and resource-agnostic by design.

Cost is another beauty of this design. Attribution becomes transparent: every dollar spent on infrastructure maps to a specific tenant. Pricing, margin analysis and capacity planning all simplify when resource consumption is directly observable per customer.

Zero-trust inside the tenant: RLS and CLS in practice

Single-tenant architecture does not mean everyone inside the organization gets access to everything. Commercial insurance operations typically span multiple teams with distinct data access requirements:

  1. Employee Benefits team shouldn’t see Property & Casualty (P&C) data.
  2. Professional Lines stays separate from Surety & Bonds.
  3. Risk Management needs broad visibility, while individual underwriters need focused access to their book of business.

We implemented these boundaries with row-level security (RLS) at the database layer itself, attaching access policies based on user identity and role. When a Benefits analyst queries the system, they see only Benefits data; likewise, the P&C team sees only records accessible to their team. This eliminates application-level filtering and gives control back to the database, where it belongs.

Column-level security (CLS) adds another dimension: compensation data, personally identifiable information and proprietary pricing factors fall under much tighter controls. A claims analyst might see claim amounts but not claimant Social Security numbers; an underwriter might see policy limits but not commission rates.
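To make this concrete, here is a sketch that renders the kind of RLS and CLS statements involved, in PostgreSQL syntax. PostgreSQL itself is an assumption here, and all table, column and role names are hypothetical:

```python
def rls_policy(table: str, team_column: str) -> str:
    """Row-level security: restrict rows to the current user's team."""
    return (
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;\n"
        f"CREATE POLICY {table}_team_isolation ON {table}\n"
        f"  USING ({team_column} = current_setting('app.team'));"
    )

def cls_mask(table: str, role: str, visible_columns: list) -> str:
    """Column-level security: grant a role only the columns it may see."""
    cols = ", ".join(visible_columns)
    return f"GRANT SELECT ({cols}) ON {table} TO {role};"

# A claims analyst sees claim amounts, but no claimant SSN column is granted.
print(rls_policy("claims", "team"))
print(cls_mask("claims", "claims_analyst", ["claim_id", "claim_amount"]))
```

Because both policies live in the database, an auditor can verify them by reading the configuration rather than trusting application code.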

This layered approach makes data security not just a promise on paper, but something observable in the database configuration and testable with queries.

Lessons from betting big on single-tenant

The single-tenant vs. multi-tenant debate often gets framed as a tradeoff between isolation and scalability. My experience suggests that framing is now outdated: with modern practices like container orchestration, single-tenant architectures can scale to hundreds or thousands of tenants without the operational burden that made them impractical a decade ago.

What we gain is control over:

  • Resources
  • Geography
  • Upgrade paths
  • Blast radius

These controls matter more as AI systems become central to enterprise operations, because the consequences of getting isolation wrong compound with every model upgrade. Single-tenant architecture deserves serious reconsideration.

Scaling is not only about handling more tenants but also about maintaining the isolation guarantees that make enterprise AI trustworthy. That’s what single-tenant architecture delivers and why we bet big on it.

This article is published as part of the Foundry Expert Contributor Network.