[D] Are there REAL success stories of autonomous AI dev agents working reliably in production?

I’m having a serious debate with a colleague, and I want to settle this with actual evidence instead of opinions.

The claim:

That it’s possible today to run orchestrated AI developer agents (multiple agents, coordinated workflows) that can autonomously build and maintain software, under the supervision of a senior AI/dev engineer, without running into unfixable errors or constant breakdowns.

I’m skeptical. He believes it’s already happening.

So I’m looking for real-world examples, not theory:

- Have you actually used autonomous dev agents in production?

- What was the setup? (tools, stack, orchestration method)

- What level of autonomy are we talking about?

- What still breaks?

- Did it scale beyond small experiments or toy projects?

Especially interested in:

- Multi-agent setups (not just Copilot-style assistance)

- Systems that run for extended periods (not one-off demos)

- Cases where human input is minimal but still controlled

If you’ve seen this work (or fail), I’d really appreciate detailed insights.

Trying to separate hype from reality here.

submitted by /u/MegaMillyMansion
