Feb 5, 2026 · 5 min read

Building a Chat System That's Actually a Workflow Engine

Most AI chat UIs are just message lists. Mine is a thin layer over a deterministic workflow engine. The UX is better for it.


Every AI product starts with a chat interface. Message in, message out. Simple.

Then reality hits. You need the AI to check inventory before answering. You need human approval before processing a refund. You need to call three APIs, merge results, and present them coherently.

Suddenly your "chat system" is an orchestration engine wearing a chat costume.

I decided to build it that way from the start.

The Problem With Chat-First Architecture

Most AI chat systems work like this:

User message → LLM → Response

Add tools:

User message → LLM → Tool calls → Results → LLM → Response

Add memory:

User message → Retrieve context → LLM → Tool calls → Results →
LLM → Save to memory → Response

Add approvals:

User message → Retrieve context → LLM → Tool calls → Results →
Wait for approval → If approved → LLM → Save to memory → Response

Each addition makes the code more complex. The chat handler grows. Error handling multiplies. The logic becomes a tangled mess of conditionals.

And debugging? "Why did the AI do that?" requires reading through a monolithic handler to trace the path.
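For concreteness, here's roughly the shape that handler takes. This is a hedged sketch, not code from my system — every helper is a stub so the tangle is visible end to end:

from dataclasses import dataclass, field

@dataclass
class Reply:
    text: str
    tool_calls: list = field(default_factory=list)

# Hypothetical stand-ins so the sketch runs.
async def retrieve_context(uid, msg): return []
async def call_llm(msg, ctx, results=None): return Reply("…")
async def run_tool(call): return {}
async def wait_for_approval(call, timeout): return True
async def save_memory(uid, msg, reply): pass

async def handle_message(user_msg, session):
    # One function owns every concern: context, tools, approvals,
    # memory. Each new feature adds another branch right here.
    context = await retrieve_context(session["user_id"], user_msg)
    response = await call_llm(user_msg, context)
    while response.tool_calls:
        results = []
        for call in response.tool_calls:
            if call["name"] == "process_refund":
                if not await wait_for_approval(call, timeout=300):
                    return "Refund request was rejected."
            results.append(await run_tool(call))  # retry logic lives here too
        response = await call_llm(user_msg, context, results)
    await save_memory(session["user_id"], user_msg, response)
    return response.text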

The Alternative: Workflow-First

Instead of building a chat system and bolting on features, I built a workflow engine and put chat on top.

User message → Trigger workflow → Execute nodes → Stream results → Chat UI

Every chat interaction is a workflow execution. The chat UI is just the rendering layer.

The Workflow Behind a Simple Chat

Even a basic "answer a question" flow is a workflow:

[Receive Message]
    ↓
[Retrieve Memory] → fetch relevant context
    ↓
[LLM Node] → reason with context + tools
    ↓
[Extract Entities] → identify facts to remember
    ↓
[Store Memory] → persist for future conversations
    ↓
[Respond] → stream response to chat UI

Six steps. Each explicit. Each debuggable. Each independently testable.
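As a sketch, the whole flow is a list of async functions from state to state. The node bodies below are stubs standing in for real memory and LLM calls:

import asyncio

# Stub node bodies; real ones would hit the memory store, the LLM, etc.
async def receive_message(state):  return state
async def retrieve_memory(state):  return {**state, "context": []}
async def llm(state):              return {**state, "draft": f"answer to {state['message']!r}"}
async def extract_entities(state): return {**state, "facts": []}
async def store_memory(state):     return state
async def respond(state):          print(state["draft"]); return state

WORKFLOW = [receive_message, retrieve_memory, llm,
            extract_entities, store_memory, respond]

async def execute(workflow, state):
    for node in workflow:
        state = await node(state)  # each step explicit, loggable, testable alone
    return state

asyncio.run(execute(WORKFLOW, {"message": "What's the status of my order?"}))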

A Complex Chat Interaction

When the user asks something that requires real work:

[Receive Message]
    ↓
[Route by Intent]
    ├─→ Simple Q&A → [LLM Node] → [Respond]
    ├─→ Order Inquiry → [Check Order API] → [Format Response] → [Respond]
    └─→ Refund Request → [Validate Eligibility] → [Human Gate] →
                          [Process Refund] → [Confirm] → [Respond]

The chat UI doesn't know about routing logic. It doesn't know about human gates. It receives a stream of events and renders them.
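A sketch of the routing step: one classifier call picks a branch, and the engine runs that branch's node list. classify_intent here is a keyword stub standing in for an LLM call or a small trained model:

# Branch table: intent -> node list, mirroring the diagram above.
BRANCHES = {
    "simple_qa":      ["llm", "respond"],
    "order_inquiry":  ["check_order_api", "format_response", "respond"],
    "refund_request": ["validate_eligibility", "human_gate",
                       "process_refund", "confirm", "respond"],
}

def classify_intent(message: str) -> tuple[str, float]:
    # Keyword stub; a real router would be an LLM or classifier call.
    text = message.lower()
    if "refund" in text:
        return "refund_request", 0.90
    if "order" in text:
        return "order_inquiry", 0.94
    return "simple_qa", 0.80

def route_by_intent(state: dict) -> list[str]:
    intent, confidence = classify_intent(state["message"])
    state["intent"], state["confidence"] = intent, confidence
    return BRANCHES[intent]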

The Chat UI Layer

The chat interface renders workflow events as messages:

Message Types

Text messages — Standard chat bubbles. User messages and AI responses.

Terminal messages — Execution output. When the workflow calls a tool or runs a query, the output appears in a terminal-style block.

┌─ Checking order status...
│  Order #12345: Shipped
│  Tracking: USPS 9400111899
└─ Done (0.3s)

Workflow steps — Visual indicators of progress. The user sees what's happening without seeing the internals.

✓ Understanding your request
✓ Checking inventory
→ Calculating pricing
○ Preparing response

Approval requests — When a human gate activates, the chat shows an approval card:

┌─────────────────────────────┐
│  Refund $49.99 to customer? │
│                             │
│  [Approve]  [Reject]        │
│  Auto-reject in 5 minutes   │
└─────────────────────────────┘
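Behind that card is a node that pauses the workflow. A sketch, assuming the engine hands each node an emit callback and a decision queue (both illustrative):

import asyncio

async def human_gate(state: dict, emit, decisions: asyncio.Queue) -> dict:
    # Surface the approval card in the chat UI...
    await emit({
        "type": "human_gate",
        "prompt": f"Refund ${state['amount']:.2f} to customer?",
    })
    try:
        # ...then block until a human decides, or time out.
        state["approved"] = await asyncio.wait_for(decisions.get(), timeout=300)
    except asyncio.TimeoutError:
        state["approved"] = False  # auto-reject in 5 minutes
    return state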

Streaming

The workflow engine streams events. The chat UI renders them in real-time.

Event types:
  text_delta    → append to current message
  tool_start    → show "executing..." indicator
  tool_result   → render terminal block
  node_start    → update progress indicator
  node_complete → check off step
  human_gate    → show approval card
  error         → show error with context

This is Server-Sent Events (SSE) from the backend. The frontend consumes the stream and updates the UI incrementally.

No polling. No "loading..." spinners that last 30 seconds. The user sees progress as it happens.
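On the wire, each event is one data: frame. A minimal server sketch, assuming a FastAPI backend and a fake event generator standing in for the real engine:

import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def run_workflow(message: str):
    # Fake engine: yields events shaped like the list above.
    yield {"type": "node_start", "node": "retrieve_memory"}
    yield {"type": "text_delta", "text": "Your order has shipped!"}
    yield {"type": "node_complete", "node": "respond"}

@app.get("/chat/stream")
async def chat_stream(message: str):
    async def sse():
        async for event in run_workflow(message):
            # SSE framing: a "data:" line per event, blank-line terminator.
            yield f"data: {json.dumps(event)}\n\n"
    return StreamingResponse(sse(), media_type="text/event-stream")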

Why This Architecture Wins

1. Same Engine, Multiple Interfaces

The workflow engine doesn't care about the UI. The same execution can be:

  • Rendered in a chat interface (customer-facing)
  • Displayed in a monitoring dashboard (internal)
  • Consumed via API (embedded in another product)
  • Triggered by a webhook (automated)

Build the engine once. Build as many interfaces as you need.

2. Debugging Is Trivial

Every workflow execution produces a trace:

Execution #abc123
├─ [Receive Message] 12ms
│  Input: "What's the status of my order?"
├─ [Route by Intent] 45ms
│  Result: order_inquiry (confidence: 0.94)
├─ [Check Order API] 230ms
│  Input: order_id=12345
│  Output: { status: "shipped", tracking: "..." }
├─ [Format Response] 89ms
│  Output: "Your order has shipped! Tracking..."
└─ [Respond] 5ms
   Streamed 45 tokens
Total: 381ms, Cost: $0.02

Something went wrong? Read the trace. Which node failed? What were the inputs? What was the output? Everything is visible.
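Traces like this fall out of the architecture almost for free: wrap every node call, record timing and I/O. A sketch:

import time

async def run_node(node, state: dict, trace: list) -> dict:
    # Wrap each node call; the trace above is just this list, rendered.
    start = time.perf_counter()
    output = await node(dict(state))  # copy so the input survives for the trace
    trace.append({
        "node": node.__name__,
        "ms": round((time.perf_counter() - start) * 1000),
        "input": state,
        "output": output,
    })
    return output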

3. Features Are Nodes, Not Code

Want to add memory? Add a memory node to the workflow.

Want to add approval? Insert a human gate node.

Want to add cost tracking? The workflow engine already tracks per-node costs.

Want A/B testing? Route to different workflow versions.

Each feature is a composable piece, not a code change in a monolithic handler.

4. Error Handling Is Consistent

Every node has the same error contract:

Success → continue to next node
Retriable error → retry with backoff (max 3)
Fatal error → stop execution, report to user
Timeout → stop execution, report with partial results

The chat UI doesn't handle errors. The workflow engine does. The chat just renders whatever the engine produces.
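As a sketch, the contract is one wrapper around every node call (the exception types are illustrative):

import asyncio

class RetriableError(Exception): ...
class FatalError(Exception): ...

async def run_with_contract(node, state: dict, retries: int = 3) -> dict:
    for attempt in range(retries):
        try:
            # Success -> caller continues to the next node.
            return await asyncio.wait_for(node(state), timeout=30)
        except RetriableError:
            await asyncio.sleep(2 ** attempt)  # retry with backoff, max 3
        except asyncio.TimeoutError:
            # Stop execution, report with partial results.
            return {**state, "status": "timeout"}
        except FatalError as e:
            # Stop execution, report to user.
            return {**state, "status": "failed", "error": str(e)}
    return {**state, "status": "failed", "error": "retries exhausted"}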

The Activity Sidebar

Beyond the main chat, I built an activity sidebar:

┌─ Recent Activity ──────────┐
│                             │
│ 2:34 PM - Support workflow  │
│ Status: Completed           │
│ Cost: $0.03 | Duration: 2s  │
│                             │
│ 2:30 PM - Research workflow │
│ Status: Awaiting approval   │
│ Cost: $0.45 | Duration: 12s │
│                             │
│ 2:15 PM - Lookup workflow   │
│ Status: Completed           │
│ Cost: $0.01 | Duration: 0.5s│
└─────────────────────────────┘

Every execution is visible. Costs are transparent. Status is real-time.

This is trivial to build when the chat is backed by a workflow engine. Each execution has structured metadata. The sidebar is just a different view of the same data.

The Lightweight Status Endpoint

One practical detail: polling for execution status.

The full execution object can be 1-10MB (all node inputs, outputs, traces). For a status check — "is it done yet?" — that's absurd.

So I built a lightweight endpoint:

GET /executions/{id}/status

{
  "id": "abc123",
  "status": "running",
  "current_node": "check_order_api",
  "completed_nodes": 3,
  "total_nodes": 6,
  "error": null
}

~1KB. Fast. This is the one place polling is cheap enough to be fine: the chat UI checks it for anything long-running, while the response itself still arrives over the SSE stream. The full execution data is only fetched when needed for debugging.
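The handler is just a projection of the execution record: a few fields out of the megabytes. A sketch, reusing the assumed FastAPI setup:

from fastapi import FastAPI

app = FastAPI()
EXECUTIONS: dict[str, dict] = {}  # id -> full execution record (the 1-10MB object)

@app.get("/executions/{exec_id}/status")
async def execution_status(exec_id: str):
    e = EXECUTIONS[exec_id]
    # Project out ~1KB of summary; never serialize node payloads here.
    return {
        "id": e["id"],
        "status": e["status"],
        "current_node": e.get("current_node"),
        "completed_nodes": len(e.get("completed", [])),
        "total_nodes": e["total_nodes"],
        "error": e.get("error"),
    }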

When Chat Should Just Be Chat

Not every chat needs a workflow engine behind it.

If your AI just answers questions from a knowledge base — no tools, no approvals, no multi-step reasoning — a simple chat handler is fine.

The workflow approach pays off when:

  • Multiple steps are involved
  • External APIs need to be called
  • Human approval is required
  • You need execution visibility
  • The same logic serves multiple interfaces
  • Debugging production issues matters

For a weekend project chatbot, this is over-engineering. For a production system customers depend on, this is the minimum.


Stop treating chat as the architecture. Chat is the UI. The architecture is the workflow engine underneath.

When you build it this way, every feature becomes a node. Every interaction is traceable. Every interface is just a different rendering of the same execution. The chat looks simple to users because the complexity lives where it belongs — in the engine.