What 20 years of AWS taught me about agentic AI

This year marks the 20th anniversary of AWS — and my 20th year building at Amazon.

I have spent my entire career on one goal: making developers’ lives easier. Since I’m a developer myself, that goal is a bit self-serving. For example, I was constantly distracted by operating databases, so I joined the DynamoDB team to build a service that handles that, so that other developers and I would never have to operate databases again.

I then went on to work on Lambda and API Gateway, so I didn’t have to babysit servers or handle request routing, and on CloudWatch, so I could see what my code was doing in production. Each time, the goal was the same: remove the painful, repetitive work and turn it into a service that just works.

I’m still chasing the same goal, just with a very different set of tools.

The rise — and limits — of vibe coding

Large language models added the ability to describe what I want in natural language and have code synthesized on demand. At first, this looked like “vibe coding” — ask for a change to the script, compile it, run it, copy the errors back and hope the next iteration was better.

Things got interesting when we wrapped the whole “vibe coding” workflow in agentic loops. Instead of me feeding back every error, an agent could call the model, run the code, see its own failures and keep iterating until tests passed. But there was a big problem: the agents wandered, losing track of what I had actually asked for. That’s fine for side projects, but not for large, critical codebases.
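To make the loop concrete, here is a minimal sketch in Python. It is not any particular product's implementation: `agent_loop`, `fake_model` and `run_checks` are hypothetical names invented for illustration, and the "model" is a toy stand-in that returns a buggy attempt first and a corrected one once it receives feedback. The iteration cap is one simple way to bound how long an agent can keep going.

```python
from typing import Callable, Optional


def agent_loop(
    propose: Callable[[str, str], str],     # (task, feedback) -> candidate code
    check: Callable[[str], Optional[str]],  # candidate -> error text, or None if it passes
    task: str,
    max_iterations: int = 5,
) -> Optional[str]:
    """Ask for code, run the checks, feed failures back, stop at a hard cap."""
    feedback = ""
    for _ in range(max_iterations):
        candidate = propose(task, feedback)
        error = check(candidate)
        if error is None:
            return candidate  # checks passed
        feedback = error      # the agent sees its own failure on the next turn
    return None               # budget exhausted: hand back to a human


def fake_model(task: str, feedback: str) -> str:
    """Hypothetical stand-in for a model call (assumption, not a real API)."""
    if not feedback:
        return "def add(a, b):\n    return a - b\n"  # first attempt has a bug
    return "def add(a, b):\n    return a + b\n"      # corrected after feedback


def run_checks(candidate: str) -> Optional[str]:
    """Execute the candidate and report the first failing expectation."""
    namespace: dict = {}
    exec(candidate, namespace)
    if namespace["add"](2, 3) != 5:
        return "add(2, 3) should equal 5"
    return None


result = agent_loop(fake_model, run_checks, "write add(a, b)")
```

Nothing in this loop stops the agent from satisfying the checks in a way nobody wanted. The checks are the only definition of "done" it has.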

Spec‑driven development for agents

The way I’ve come to keep agents focused is through spec‑driven development. Instead of dropping an agent into a repo with a vague prompt, I co‑create three concrete artifacts with it before any serious coding begins: a requirements spec, a design document and a task breakdown, all in Markdown. These are a shared contract for what “done” means, written in a form that both humans and agents can read, critique and update.

In my day-to-day, I start with almost the same prompt I’d give any AI, but the agent expands it into a structured requirements document with clear “shall” statements and acceptance criteria. I review those requirements and chat with the agent until the document matches what I actually want. From there, the agent proposes a design, then breaks the work into tasks focused on getting something tangible running before adding polish and exhaustive tests.

What I like about this flow is that it isn’t rigid. In practice, I bounce back and forth. I often see a design that reveals missing requirements, or I change my mind about the approach once I see code snippets. The point is that agents no longer “forget” what we agreed on. The spec, design and tasks are explicit, versioned and always visible. And when I want to tack on another feature or bugfix, I start with a fresh spec that describes exactly what I want to change.

Property‑based testing and keeping agents honest

Once the specs are explicit, you can turn them into invariants and use property-based tests to keep agents honest. Instead of writing one test for “given this exact input, expect this exact output,” I define properties that must hold across many inputs and sequences.
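As an illustration of the difference, the sketch below checks properties of a small `dedupe` function across hundreds of randomly generated lists instead of one hand-picked case. Everything here is an assumption made for the example: `dedupe` and `check_dedupe_properties` are hypothetical names, and the generator is hand-rolled with the standard library so the snippet runs anywhere. A dedicated library such as Hypothesis would typically also shrink failing inputs to a minimal case.

```python
import random


def dedupe(items):
    """Remove duplicates while keeping first-occurrence order."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def check_dedupe_properties(trials=500, seed=0):
    """Assert spec-level properties over many random inputs."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        items = [rng.randint(-5, 5) for _ in range(rng.randint(0, 20))]
        result = dedupe(items)
        # Property 1: the output contains no duplicates.
        assert len(result) == len(set(result))
        # Property 2: nothing is lost and nothing is invented.
        assert set(result) == set(items)
        # Property 3: running it twice changes nothing (idempotence).
        assert dedupe(result) == result
        # Property 4: elements appear in first-occurrence order.
        assert result == sorted(set(items), key=items.index)
    return trials


checked = check_dedupe_properties()
```

Each assertion maps to a sentence you could write in a requirements document, which is what makes the properties hard to satisfy by accident.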

Without strong, spec-derived tests, I’ve seen agents game the system by “fixing” the tests instead of the code — commenting out assertions or weakening conditions just to get a green build. Property-based tests give me a way to encode my expectations once and have both humans and agents constantly prove we’re still meeting them.

This approach has clear implications for security as well. If security teams can encode expectations — about data handling, authorization and error behavior — as invariants in the same spec language the agent consumes, then property-based tests can hammer those invariants across many scenarios. That’s a much more robust way to shift security left than hoping every developer remembers every rule under deadline pressure.
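Here is a hedged sketch of what one such security invariant could look like. The invariant is "a user can read a document only if they own it or the owner shared it with them", checked after every step of many random operation sequences. `DocumentStore` and `check_authorization_invariant` are hypothetical, invented for this example. The test keeps its own small record of what the spec permits and compares the store against it.

```python
import random
from dataclasses import dataclass, field


@dataclass
class DocumentStore:
    """Toy system under test (hypothetical, for illustration only)."""
    owners: dict = field(default_factory=dict)  # doc_id -> owner
    shares: set = field(default_factory=set)    # (doc_id, user) grants

    def create(self, user, doc_id):
        self.owners.setdefault(doc_id, user)    # first creator stays the owner

    def share(self, actor, doc_id, target):
        if self.owners.get(doc_id) == actor:    # only the owner may share
            self.shares.add((doc_id, target))

    def can_read(self, user, doc_id):
        return self.owners.get(doc_id) == user or (doc_id, user) in self.shares


def check_authorization_invariant(trials=200, steps=30, seed=1):
    rng = random.Random(seed)
    users = ["alice", "bob", "carol"]
    docs = ["d1", "d2"]
    for _ in range(trials):
        store = DocumentStore()
        first_owner = {}  # the spec's view of who owns each document
        allowed = set()   # (doc, user) pairs the spec permits to read
        for _ in range(steps):
            actor, doc, target = (rng.choice(users), rng.choice(docs),
                                  rng.choice(users))
            if rng.random() < 0.3:
                store.create(actor, doc)
                if doc not in first_owner:
                    first_owner[doc] = actor
                    allowed.add((doc, actor))
            else:
                store.share(actor, doc, target)
                if first_owner.get(doc) == actor:
                    allowed.add((doc, target))
            # Invariant: read access matches the spec exactly, after every step.
            for user in users:
                for d in docs:
                    assert store.can_read(user, d) == ((d, user) in allowed)
    return trials


trials_run = check_authorization_invariant()
```

If a change to the store ever let a non-owner share a document, some random sequence would very likely exercise that path and fail the assertion.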

DevOps agents and the next decade of practice

Over twenty years, I’ve learned that incident response isn’t only about chasing the root cause. It’s about systematically asking what changed, what callers changed, what limits were hit, what components failed as designed and what dependencies are involved.

A DevOps agent is becoming as important as any IDE. It plugs into the tooling teams already use and runs that investigation automatically whenever an alarm fires. It reads logs, metrics, traces and code, and often has a diagnosis and plan ready by the time I open my laptop.

I’ve seen incidents that once took eight hours of human sleuthing reduced to fifteen minutes, with the agent explaining the bug, citing evidence, and recommending a rollback and follow-up fix.

Between incidents, the same system scans past outages and infrastructure to suggest preventative work (code hardening, better retries, alarm tuning) that teams rarely have time to prioritize on their own. That’s the most important part: reducing downtime is great, but avoiding it altogether is a big reason why we’re here.

Looking ahead, I think developers will learn to wear all sorts of other hats — the operator, product manager, customer support — while agents take on their routine tasks. The most valuable work becomes problem-solving and ensuring systems are built right and serve the right purpose.

Other things won’t change at all. “If you build it, you run it” still applies, even when an agent wrote part or all of the code. Developers will still own production and post‑incident retrospectives that focus on how to prevent issues. Some parts — like data collection, impact analysis, root cause analysis — get faster with agents doing the legwork, but developers still direct the investigation, decide the real fixes and share those lessons across teams.

Twenty years ago, the big shift was turning infrastructure into services, so developers didn’t have to think about racking servers or babysitting databases. In this new era, the move is turning our best practices, operational experience and security expectations into specs and agents that can execute them consistently, at any scale. The lesson from the first two decades still applies: The pain you tolerate today is the platform someone else will build tomorrow — only now, agents give us a much faster way to close that gap.

This article is published as part of the Foundry Expert Contributor Network.