
Why most AI chatbots fail as UX, not as AI

Here's something I've noticed after designing AI experiences across enterprise compliance, contract management, and financial services: when an AI feature fails, it's almost never the model's fault.

The model works fine. The embeddings work fine. The accuracy benchmarks look good in the demo. And then real users get their hands on it, and nothing quite works the way it should.

In my experience, the failure is almost always the same: the experience around the AI is broken, even when the AI itself isn't.

The model is usually fine. The experience around it is where AI chatbots break.

Here's what "the experience around it" actually means:

 

  • Disambiguation flows. When the user's intent is ambiguous, does the AI ask a clarifying question, or does it confidently give the wrong answer? I've seen many enterprise AI assistants choose confident wrongness over honest clarification. That's a design decision, not a model limitation.

  • Fallback states. When the AI doesn't know something, what does it say? "I don't have information on that" is a design failure. A well-designed fallback is specific, helpful, and points the user somewhere useful. That requires someone to think through every failure mode before launch, which most teams don't do.

  • Expectation-setting. Users form mental models of what an AI can do from the first sentence they read. If that sentence overpromises ("I can help you with anything!"), every limitation that follows feels like a broken promise. The onboarding copy, the placeholder text, the capability framing: all of it is UX.

  • Escalation paths. When the AI can't help, can the user get to a human, or to the right resource, in one step? Most AI chatbots treat escalation as failure. The best ones treat it as part of the designed experience.

  • Tone calibration. In high-stakes domains such as compliance, legal, finance, and healthcare, a chatty, casual AI tone creates anxiety, not delight. The tone of an AI assistant is a design decision. It should match the domain and the user's emotional state, not the product team's personality preferences.
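
For readers who think in code, here is one rough way the first, second, and fourth items could be written down as explicit decisions rather than left to chance. It is a minimal sketch, not how any particular product works: the `IntentGuess` shape, the `choose_action` function, and every threshold value are hypothetical names and numbers chosen for illustration, and it assumes the system exposes some confidence score per candidate intent.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ANSWER = "answer"      # respond directly
    CLARIFY = "clarify"    # ask a disambiguating question
    FALLBACK = "fallback"  # admit the gap and point somewhere useful
    ESCALATE = "escalate"  # hand off to a human or the right resource


@dataclass
class IntentGuess:
    """One candidate reading of the user's message (hypothetical shape)."""
    label: str
    confidence: float  # assumed to be a score between 0 and 1


# Illustrative values only; real thresholds would come from testing with users.
ANSWER_THRESHOLD = 0.75   # below this, do not answer outright
FALLBACK_THRESHOLD = 0.40  # below this, no reading is plausible
AMBIGUITY_MARGIN = 0.15   # top two readings closer than this count as a tie
MAX_FAILED_TURNS = 2      # consecutive misses before offering a human


def choose_action(guesses: list[IntentGuess], failed_turns: int) -> Action:
    """Pick the designed response path for a single user turn."""
    # Escalation is a planned path, not an error: stop looping after repeated misses.
    if failed_turns >= MAX_FAILED_TURNS:
        return Action.ESCALATE

    ranked = sorted(guesses, key=lambda g: g.confidence, reverse=True)

    # Fallback: nothing plausible, so say so and redirect instead of guessing.
    if not ranked or ranked[0].confidence < FALLBACK_THRESHOLD:
        return Action.FALLBACK

    # Disambiguation: two readings are too close to call, so ask.
    if len(ranked) > 1 and ranked[0].confidence - ranked[1].confidence < AMBIGUITY_MARGIN:
        return Action.CLARIFY

    # A single plausible reading that is still shaky also deserves a question.
    if ranked[0].confidence < ANSWER_THRESHOLD:
        return Action.CLARIFY

    return Action.ANSWER


if __name__ == "__main__":
    close_call = [IntentGuess("renew_contract", 0.62), IntentGuess("cancel_contract", 0.58)]
    print(choose_action(close_call, failed_turns=0))
```

The point of the sketch is that "ask instead of guess" and "hand off after two misses" are lines someone has to write on purpose. The copy shown on each path (the clarifying question, the fallback message, the handoff wording) is still design work that no threshold can do.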

I spent over a year designing "Registro" — an embedded AI assistant for a compliance platform used by 30,000+ businesses in 190 countries. The model we built was solid. But the design work that made it trustworthy was entirely in these five areas: disambiguation, fallback states, expectation-setting, escalation, and tone. None of that is AI engineering. All of it is UX.

The teams that ship AI experiences users actually trust are the ones who treat the experience design with the same rigour as the model design.

If your AI chatbot isn't performing the way you expected, audit the experience before you retrain the model. The answer is probably there.

What's the worst AI chatbot experience you've encountered? I'd love to hear what broke — and whether it was the model or the design.
