Fromdemotoproduction:whatbuildingrealAIchatbotstakes
My day job is building an AI chatbot product family: conversational assistants and lead-capture widgets that businesses embed on their own sites, answering visitors from the company's own content. I work across the whole thing, the retrieval pipeline, the LLM integration, the APIs and the interfaces, alongside an external dev team. The single biggest lesson so far is how wide the gap is between a chatbot that demos well and one that holds up in production.
A chatbot is easy to demo and hard to ship
Wiring a language model to a chat box takes an afternoon, and it will even sound impressive. The hard part starts when real users ask real questions: about products that changed yesterday, in ways you did not anticipate, expecting answers that are actually true. A demo has to work once. A production bot has to be right, or honest about not knowing, thousands of times a day, on someone else's business.
Retrieval is the real product
The model is a commodity. What makes a bot useful is what you feed it. The pattern is retrieval-augmented generation (RAG): instead of hoping the model memorised a customer's business, you index that customer's own content, crawl their site and documents, split it into chunks, turn each chunk into an embedding, and store those in a vector database. At question time you find the passages closest in meaning to what was asked and hand them to the model as context. Most of the engineering effort, and most of the quality, lives in that pipeline, not in the prompt.
Grounding, so the model stops guessing
Left alone, a language model will happily invent a confident, wrong answer. The whole point of retrieval is to keep it anchored to real material, and the rest is instructing it to lean on that material, to admit when it does not know, and to stay in its lane. Getting that behaviour consistent, across vague questions, hostile questions and questions the content simply does not cover, is an iterative craft of its own: change something, test against a batch of real questions, measure, repeat.
The unglamorous 80%
The demo is the model. The product is everything around it. In practice most of the work is the parts nobody posts screenshots of:
- Ingestion that survives messy real-world content: PDFs, awkward HTML, sites that change under you.
- Keeping the index fresh as customers update their material, without re-processing everything every night.
- Background jobs and queues, so a slow crawl or a big re-index never blocks a live conversation.
- Handling the model's bad days: rate limits, timeouts and oversized inputs, all turned into a graceful reply instead of a crash.
- Cost and latency, because every answer is a metered API call and users will not wait.
None of that shows up in a demo. All of it decides whether a business keeps the bot switched on.
Full-stack, because AI features do not stop at the model
This is why I think of it as full-stack work rather than 'AI' work. The retrieval pipeline is a data problem. Serving it reliably is a backend problem. Letting customers configure their own bot, tune how it answers and review real conversations is a frontend problem. And embedding a widget that loads fast on a stranger's website, without breaking their layout, is the same craft I brought from years of client web work. The model is one component. Turning it into something people trust is the actual job.
The interesting frontier right now is not a cleverer model, it is the engineering around it: retrieval that stays accurate, systems that stay up, and interfaces that make all of it feel simple. That is the part I get to build, and it is where the real work is.