Building a Chat System That's Actually a Workflow Engine
Most AI chat UIs are just message lists. Mine is a thin layer over a deterministic workflow engine. The UX is better for it.

Every AI product starts with a chat interface. Message in, message out. Simple.
Then reality hits. You need the AI to check inventory before answering. You need human approval before processing a refund. You need to call three APIs, merge results, and present them coherently.
Suddenly your "chat system" is an orchestration engine wearing a chat costume.
I decided to build it that way from the start.
Most AI chat systems work like this:
User message → LLM → Response
Add tools:
User message → LLM → Tool calls → Results → LLM → Response
Add memory:
User message → Retrieve context → LLM → Tool calls → Results →
LLM → Save to memory → Response
Add approvals:
User message → Retrieve context → LLM → Tool calls → Results →
Wait for approval → If approved → LLM → Save to memory → Response
Each addition makes the code more complex. The chat handler grows. Error handling multiplies. The logic becomes a tangled mess of conditionals.
And debugging? "Why did the AI do that?" requires reading through a monolithic handler to trace the path.
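To make the tangle concrete, here's a compressed sketch of where the bolt-on handler ends up. Every helper name in it (retrieveContext, runTools, waitForApproval, saveMemory) is hypothetical; the point is the shape, not any particular API:

// A compressed sketch of the bolt-on approach. All helpers are hypothetical
// stand-ins; note how every feature adds another branch to one function.
declare const llm: { complete(req: object): Promise<{ text: string; toolCalls: object[] }> };
declare function retrieveContext(userId: string, text: string): Promise<object>;
declare function runTools(calls: object[]): Promise<{ requiresApproval: boolean }[]>;
declare function waitForApproval(userId: string, results: object[]): Promise<boolean>;
declare function saveMemory(userId: string, response: object): Promise<void>;

async function handleMessage(userId: string, text: string): Promise<string> {
  const context = await retrieveContext(userId, text);       // memory bolt-on
  let response = await llm.complete({ context, text });
  if (response.toolCalls.length > 0) {                       // tools bolt-on
    const results = await runTools(response.toolCalls);
    if (results.some(r => r.requiresApproval)) {             // approvals bolt-on
      const approved = await waitForApproval(userId, results);
      if (!approved) return "Request rejected.";
    }
    response = await llm.complete({ context, text, results });
  }
  await saveMemory(userId, response);                        // memory again
  return response.text;
}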
Instead of building a chat system and bolting on features, I built a workflow engine and put chat on top.
User message → Trigger workflow → Execute nodes → Stream results → Chat UI
Every chat interaction is a workflow execution. The chat UI is just the rendering layer.
Even a basic "answer a question" flow is a workflow:
[Receive Message]
↓
[Retrieve Memory] → fetch relevant context
↓
[LLM Node] → reason with context + tools
↓
[Extract Entities] → identify facts to remember
↓
[Store Memory] → persist for future conversations
↓
[Respond] → stream response to chat UI
Six steps. Each explicit. Each debuggable. Each independently testable.
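In code, the whole thing can be a node interface and a list. A minimal sketch, assuming a Node shape I made up for illustration (not any specific library's API):

type Ctx = Record<string, unknown>;

interface Node {
  id: string;
  run(ctx: Ctx): Promise<Ctx>;   // context in, enriched context out
}

// Hypothetical node implementations; only the shape matters here.
declare const receiveMessage: Node, retrieveMemory: Node, llmNode: Node;
declare const extractEntities: Node, storeMemory: Node, respond: Node;

const answerQuestion: Node[] = [
  receiveMessage, retrieveMemory, llmNode,
  extractEntities, storeMemory, respond,
];

// A linear engine is just a fold over the node list.
async function execute(workflow: Node[], input: Ctx): Promise<Ctx> {
  let ctx = input;
  for (const node of workflow) {
    ctx = await node.run(ctx);   // each step is explicit, loggable, testable
  }
  return ctx;
}

Testing a node means calling run with a context fixture. No chat UI required.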
When the user asks something that requires real work:
[Receive Message]
↓
[Route by Intent]
├─→ Simple Q&A → [LLM Node] → [Respond]
├─→ Order Inquiry → [Check Order API] → [Format Response] → [Respond]
└─→ Refund Request → [Validate Eligibility] → [Human Gate] →
[Process Refund] → [Confirm] → [Respond]
The chat UI doesn't know about routing logic. It doesn't know about human gates. It receives a stream of events and renders them.
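Routing itself is just another node. A sketch reusing the Node, Ctx, and execute shapes from above (classifyIntent and the branch workflows are hypothetical):

type Intent = "simple_qa" | "order_inquiry" | "refund_request";

declare function classifyIntent(text: string): Promise<{ intent: Intent; confidence: number }>;
declare const simpleQA: Node[], orderInquiry: Node[], refundRequest: Node[];

const routeByIntent: Node = {
  id: "route_by_intent",
  async run(ctx) {
    const { intent } = await classifyIntent(ctx.message as string);
    const branches: Record<Intent, Node[]> = {
      simple_qa: simpleQA,
      order_inquiry: orderInquiry,
      refund_request: refundRequest,
    };
    return execute(branches[intent], ctx);   // hand off to the chosen sub-workflow
  },
};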
The chat interface renders workflow events as messages:
Text messages — Standard chat bubbles. User messages and AI responses.
Terminal messages — Execution output. When the workflow calls a tool or runs a query, the output appears in a terminal-style block.
┌─ Checking order status...
│ Order #12345: Shipped
│ Tracking: USPS 9400111899
└─ Done (0.3s)
Workflow steps — Visual indicators of progress. The user sees what's happening without seeing the internals.
✓ Understanding your request
✓ Checking inventory
→ Calculating pricing
○ Preparing response
Approval requests — When a human gate activates, the chat shows an approval card:
┌─────────────────────────────┐
│ Refund $49.99 to customer?  │
│                             │
│   [Approve]     [Reject]    │
│   Auto-reject in 5 minutes  │
└─────────────────────────────┘
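Under the hood, a human gate is a node that pauses the run and races the decision against a deadline. A sketch with hypothetical plumbing (emitEvent, waitForDecision):

declare function emitEvent(e: { type: "human_gate"; prompt: string }): void;
declare function waitForDecision(executionId: string): Promise<"approved" | "rejected">;

async function humanGate(executionId: string, prompt: string): Promise<boolean> {
  emitEvent({ type: "human_gate", prompt });               // renders the approval card
  const timeout = new Promise<"rejected">(resolve =>
    setTimeout(() => resolve("rejected"), 5 * 60 * 1000),  // auto-reject in 5 minutes
  );
  const decision = await Promise.race([waitForDecision(executionId), timeout]);
  return decision === "approved";
}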
The workflow engine streams events. The chat UI renders them in real-time.
Event types:
text_delta → append to current message
tool_start → show "executing..." indicator
tool_result → render terminal block
node_start → update progress indicator
node_complete → check off step
human_gate → show approval card
error → show error with context
This is Server-Sent Events (SSE) from the backend. The frontend consumes the stream and updates the UI incrementally.
No polling. No "loading..." spinners that last 30 seconds. The user sees progress as it happens.
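On the frontend, the event stream is a discriminated union and one switch. A sketch; the payload shapes and endpoint path are my assumptions:

type WorkflowEvent =
  | { type: "text_delta"; text: string }
  | { type: "tool_start"; tool: string }
  | { type: "tool_result"; output: string; durationMs: number }
  | { type: "node_start"; node: string }
  | { type: "node_complete"; node: string }
  | { type: "human_gate"; prompt: string }
  | { type: "error"; message: string };

function subscribe(executionId: string, render: (e: WorkflowEvent) => void): () => void {
  const source = new EventSource(`/executions/${executionId}/events`);
  source.onmessage = msg => render(JSON.parse(msg.data) as WorkflowEvent);
  return () => source.close();   // caller closes the stream when the chat unmounts
}

render is a switch over event.type: text_delta appends to the current bubble, tool_result draws a terminal block, node_complete checks off a step.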
The workflow engine doesn't care about the UI. The same execution can be rendered as a chat conversation, an activity feed, or a debugging trace.
Build the engine once. Build as many interfaces as you need.
Every workflow execution produces a trace:
Execution #abc123
├─ [Receive Message] 12ms
│ Input: "What's the status of my order?"
├─ [Route by Intent] 45ms
│ Result: order_inquiry (confidence: 0.94)
├─ [Check Order API] 230ms
│ Input: order_id=12345
│ Output: { status: "shipped", tracking: "..." }
├─ [Format Response] 89ms
│ Output: "Your order has shipped! Tracking..."
└─ [Respond] 5ms
Streamed 45 tokens
Total: 381ms, Cost: $0.02
Something went wrong? Read the trace. Which node failed? What were the inputs? What was the output? Everything is visible.
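That works because the trace is structured data, not log lines. Roughly (the exact shape here is an assumption):

interface NodeTrace {
  node: string;        // e.g. "check_order_api"
  durationMs: number;
  input: unknown;
  output: unknown;
  error?: string;      // present only on failure
}

interface ExecutionTrace {
  id: string;          // e.g. "abc123"
  nodes: NodeTrace[];
  totalMs: number;
  costUsd: number;
}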
Want to add memory? Add a memory node to the workflow.
Want to add approval? Insert a human gate node.
Want to add cost tracking? The workflow engine already tracks per-node costs.
Want A/B testing? Route to different workflow versions.
Each feature is a composable piece, not a code change in a monolithic handler.
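Concretely, the refund branch with an approval step is just a different node list (names hypothetical, same Node shape as earlier):

declare const validateEligibility: Node, humanGateNode: Node;
declare const processRefund: Node, confirmRefund: Node;

// Adding approval = inserting a node, not rewriting a handler.
const refundWithApproval: Node[] = [
  validateEligibility,
  humanGateNode,       // the new feature is one line
  processRefund,
  confirmRefund,
  respond,
];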
Every node has the same error contract:
Success → continue to next node
Retriable error → retry with backoff (max 3)
Fatal error → stop execution, report to user
Timeout → stop execution, report with partial results
The chat UI doesn't handle errors. The workflow engine does. The chat just renders whatever the engine produces.
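The retry half of that contract is a few lines in the engine. A sketch (RetriableError is a marker class of my own; the timeout case can be layered on with a race, as in the human gate):

class RetriableError extends Error {}

async function runWithRetry(node: Node, ctx: Ctx, maxRetries = 3): Promise<Ctx> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await node.run(ctx);                 // success: continue to next node
    } catch (err) {
      if (!(err instanceof RetriableError) || attempt >= maxRetries) {
        throw err;                                // fatal: stop and report to user
      }
      const delayMs = 250 * 2 ** attempt;         // 250ms, 500ms, 1s backoff
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
}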
Beyond the main chat, I built an activity sidebar:
┌─ Recent Activity ───────────┐
│                             │
│ 2:34 PM - Support workflow  │
│ Status: Completed           │
│ Cost: $0.03 | Duration: 2s  │
│                             │
│ 2:30 PM - Research workflow │
│ Status: Awaiting approval   │
│ Cost: $0.45 | Duration: 12s │
│                             │
│ 2:15 PM - Lookup workflow   │
│ Status: Completed           │
│ Cost: $0.01 | Duration: 0.5s│
└─────────────────────────────┘
Every execution is visible. Costs are transparent. Status is real-time.
This is trivial to build when the chat is backed by a workflow engine. Each execution has structured metadata. The sidebar is just a different view of the same data.
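A sketch of that view, assuming a listExecutions query and a summary shape (both hypothetical):

interface ExecutionSummary {
  startedAt: string;    // "2:34 PM"
  workflow: string;     // "Support workflow"
  status: "completed" | "running" | "awaiting_approval" | "failed";
  costUsd: number;
  durationMs: number;
}

declare function listExecutions(userId: string, limit: number): Promise<ExecutionSummary[]>;

async function renderSidebar(userId: string): Promise<string[]> {
  const recent = await listExecutions(userId, 10);
  return recent.map(e =>
    `${e.startedAt} - ${e.workflow} | ${e.status} | ` +
    `$${e.costUsd.toFixed(2)} | ${(e.durationMs / 1000).toFixed(1)}s`,
  );
}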
One practical detail: polling for execution status.
The full execution object can be 1-10MB (all node inputs, outputs, traces). For a status check — "is it done yet?" — that's absurd.
So I built a lightweight endpoint:
GET /executions/{id}/status
{
"id": "abc123",
"status": "running",
"current_node": "check_order_api",
"completed_nodes": 3,
"total_nodes": 6,
"error": null
}
~1KB. Fast. The chat UI polls this while streaming is in progress. The full execution data is only fetched when needed for debugging.
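Client-side, the poll loop is small. A sketch matching the payload above (the one-second interval is my choice, and the status union is an assumption):

interface ExecutionStatus {
  id: string;
  status: "running" | "completed" | "failed";
  current_node: string | null;
  completed_nodes: number;
  total_nodes: number;
  error: string | null;
}

async function pollUntilDone(id: string, intervalMs = 1000): Promise<ExecutionStatus> {
  for (;;) {
    const res = await fetch(`/executions/${id}/status`);   // the ~1KB payload
    const status = (await res.json()) as ExecutionStatus;
    if (status.status !== "running") return status;        // terminal state: stop polling
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}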
Not every chat needs a workflow engine behind it.
If your AI just answers questions from a knowledge base — no tools, no approvals, no multi-step reasoning — a simple chat handler is fine.
The workflow approach pays off when you need tool calls, human approvals, multi-step reasoning, or an auditable trace of every interaction.
For a weekend project chatbot, this is over-engineering. For a production system customers depend on, it's the minimum.
Stop treating chat as the architecture. Chat is the UI. The architecture is the workflow engine underneath.
When you build it this way, every feature becomes a node. Every interaction is traceable. Every interface is just a different rendering of the same execution. The chat looks simple to users because the complexity lives where it belongs — in the engine.