Authors: Ohad Elhelo, Ori Cohen, Co-Founders
Task-oriented dialogue powers every conversational interaction that results in real-world action, from booking flights and processing payments to managing insurance claims and executing trades. Every scheduling, payment, and claim in the economy depends on this one missing capability in AI, and it demands three capabilities simultaneously:
- maintaining an explicit, typed state across the conversation;
- making deterministic decisions from that state; and
- coordinating reliably with external systems.
Together, these three capabilities enable something fundamental: behavioral certainty—when you define how an agent should behave, it behaves that way. Every time. A bank needs certainty that refunds over $200 always trigger ID verification. An airline needs certainty that business class upgrades are always offered before economy. A fashion retailer might need out-of-stock items to always trigger similar recommendations, while a luxury brand needs the same scenario to always show pre-order links instead.
These aren’t preferences. They’re requirements. And no approach has delivered this certainty.
This is why, despite three years and billions in investment, task-oriented dialogue remains largely undeployed. The best transformer-based agents achieve 60% task completion. Most hover below 25%.
Larger models won’t close this gap. The problem isn’t scale; it’s the absence of an authoritative state and the right computational architecture.
Transformers generate statistically plausible text through pattern matching. This revolutionized open-ended dialogue where plausibility equals success.
Task-oriented dialogue demands something transformers cannot provide: stateful reasoning over an explicit, typed symbolic state. When booking a flight, generating “I’ve booked your flight” means nothing without actually reserving seats, charging cards, and issuing tickets. These require maintaining a typed symbolic state, making deterministic decisions from that state, and guaranteeing coordination with external systems—capabilities that do not emerge from token prediction alone.
More fundamentally, transformers cannot guarantee behavioral consistency. Ask an LLM agent to ‘always offer insurance before payment’ and it might—usually. Configure Apollo-1 with that rule in the System Prompt, and it will—with certainty. This difference between ‘probably’ and ‘certainly’ is why task-oriented dialogue remains undeployed despite billions in investment.
This isn’t a scaling limitation. It’s architectural: token predictors lack native control flow and explicit state representation. LLM agents approximate intent through probability. Task-oriented dialogue requires certainty of behavior in key interactions.
In 2017, powered by a workforce of 60,000 human agents, we began solving millions of real-user task-oriented conversations and encoding them into structured data. The core insight wasn’t about data scale; it was about what must be represented. Task-oriented dialogue requires two kinds of knowledge, used together:
- procedural knowledge: the structure of the task itself, its steps, policies, and constraints; and
- descriptive knowledge: the contextual facts the conversation surfaces about the user, their intent, and the domain.
Training a transformer on multi-turn transcripts can capture conversational style, but it won’t teach the model how to handle interactions correctly. Transcripts are one-dimensional and stateless: surface text without explicit state, structure, or guarantees.
To compute correctly over both, we needed a representation that separates structure from context while still carrying each. So we constructed a symbolic language that encodes procedural roles and descriptive facts, giving the model a typed symbolic state it can reason over with certainty.
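To make the idea concrete, here is a minimal sketch of such a representation, assuming hypothetical names and a Python encoding. It is not Apollo-1’s actual symbolic language; it only illustrates carrying typed procedural structure and descriptive context in one state:

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional

# Illustrative sketch only; NOT Apollo-1's symbolic language.
# Procedural structure lives in typed slots and an explicit step;
# descriptive context lives alongside it as facts.

class Step(Enum):
    COLLECT_PARAMETERS = auto()
    VALIDATE_POLICY = auto()
    CONFIRM = auto()
    EXECUTE = auto()

@dataclass
class Slot:
    name: str                    # procedural role, e.g. "departure_date"
    value: Optional[str] = None  # filled in from the conversation
    required: bool = True

@dataclass
class TaskState:
    intent: str                           # e.g. "book_flight"
    step: Step = Step.COLLECT_PARAMETERS  # explicit procedural progress
    slots: list = field(default_factory=list)
    facts: dict = field(default_factory=dict)  # descriptive context

    def missing_slots(self):
        return [s for s in self.slots if s.required and s.value is None]

state = TaskState(
    intent="book_flight",
    slots=[Slot("origin"), Slot("destination"), Slot("departure_date")],
    facts={"loyalty_tier": "gold"},  # descriptive fact, not a procedural role
)
print([s.name for s in state.missing_slots()])  # all three slots still open
```

A state like this is something a model can reason over with certainty: which slots are filled, which step comes next, and which facts condition the decision are all explicit rather than implicit in text.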
In parallel, we observed that across use cases and domains—selling shoes, booking flights, processing loans—task-oriented dialogue follows universal procedural patterns. Food delivery, claims processing, and order management share similar procedural structures: parameter and constraint extraction, intent identification, policy validation, and so on.
Next, for the actual computation, we developed the Neuro-Symbolic Reasoner, a cognitive core that predicts the next task action from the current symbolic state, as opposed to token prediction. It maintains explicit state, enforces guarantees, and ensures tool invocations are structured, not guessed by token sampling.
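As a sketch of what next-action prediction over a symbolic state means (hypothetical names again, mirroring the shared procedural patterns above, not AUI’s published interface), note that action selection below is a pure function of the state, so identical states always yield identical decisions:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Illustrative sketch only; not Apollo-1's Neuro-Symbolic Reasoner.
# The reasoner predicts the next *task action* from the current state,
# rather than sampling the next token.

@dataclass(frozen=True)
class State:
    intent: Optional[str]      # identified intent, if any
    missing: Tuple[str, ...]   # required parameters not yet extracted
    policy_ok: Optional[bool]  # None = policy not yet validated
    confirmed: bool            # user confirmation received

def next_action(s: State) -> str:
    """Deterministic control flow: same state in, same action out."""
    if s.intent is None:
        return "IDENTIFY_INTENT"
    if s.missing:
        return f"REQUEST_PARAMETER:{s.missing[0]}"
    if s.policy_ok is None:
        return "VALIDATE_POLICY"
    if not s.policy_ok:
        return "ESCALATE"
    if not s.confirmed:
        return "CONFIRM_WITH_USER"
    return "EXECUTE_TOOL_CALL"

s = State(intent="book_flight", missing=("departure_date",),
          policy_ok=None, confirmed=False)
assert next_action(s) == "REQUEST_PARAMETER:departure_date"  # every time
```

Because the decision is computed rather than sampled, tool invocations can be emitted as structured calls with guaranteed arguments instead of free-form text that happens to look like a call.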
Together, the symbolic language and the reasoner form Apollo-1: the domain-agnostic foundation model for task-oriented dialogue.
Apollo-1’s breakthrough is stateful neuro‑symbolic reasoning: a computation built explicitly for task-oriented dialogue. Rather than forcing transformers—designed to predict statistically likely words—into deterministic execution roles, Apollo-1 places stateful neuro‑symbolic reasoning at its core, predicting the next action from the current symbolic state to deliver reliable, repeatable outcomes with certainty.
The Neuro-Symbolic Reasoner operates on symbolic structures—intents, constraints, parameters, actions—that remain constant across domains, while neural modules continuously enrich those structures with semantic nuance.
Architecture: encoder–stateful reasoning loop–decoder
The key insight: neural modules handle context, symbolic modules handle structure, and Apollo-1 unifies both in a single model. The symbolic state represents both procedural progress (what step we’re on) and descriptive facts (what we know). Neural components interpret language and enrich understanding; symbolic components ensure reliable execution. Perception is probabilistic, but given the same state, the Reasoner always makes the same decision, delivering the behavioral certainty that task-oriented dialogue requires and making task execution reproducible, auditable, and steerable.
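A minimal sketch of one turn through that encoder–reasoner–decoder loop, under the same caveat that every name here is an assumption for illustration: the neural edges (encode, decode) may be probabilistic, while the symbolic core in the middle is deterministic.

```python
# Hypothetical sketch of an encoder -> stateful reasoning loop -> decoder
# turn cycle; illustrative only, not Apollo-1's published architecture.

def encode(utterance: str, state: dict) -> dict:
    """Neural step (probabilistic): interpret the user's words and
    propose updates to the symbolic state, e.g. extracted parameters."""
    # A real system would call a neural model here; we stub it.
    updates = {"destination": "Paris"} if "Paris" in utterance else {}
    return {**state, **updates}

def reason(state: dict) -> str:
    """Symbolic step (deterministic): same state -> same decision."""
    for slot in ("origin", "destination", "departure_date"):
        if slot not in state:
            return f"ASK:{slot}"
    return "EXECUTE:book_flight"

def decode(action: str) -> str:
    """Neural step: render the chosen action as natural language."""
    kind, _, arg = action.partition(":")
    return {"ASK": f"Could you tell me your {arg.replace('_', ' ')}?",
            "EXECUTE": "Booking your flight now."}[kind]

state: dict = {}
state = encode("I'd like to fly to Paris", state)  # perception: probabilistic
action = reason(state)                             # decision: deterministic
print(decode(action))                              # "Could you tell me your origin?"
```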
The complete technical paper—including architectural specifications, formal proofs, high-level procedural ontology construction, evaluation methodologies, and turn-closure semantics—will be released alongside general availability in November 2025. [Request early access to the technical paper]
Augmented Intelligence (AUI) Inc. – Patents Pending
Apollo-1 is the first foundation model built not to be used as an agent, but to let every organization create its own task-oriented conversational agents. Apollo-1 ships with a Playground where any task-oriented dialogue use case can run from the System Prompt alone. The System Prompt exposes a symbolic interface that the model’s stateful neuro-symbolic loop executes against.
The System Prompt isn’t mere configuration; it’s a behavioral contract. You define exactly how your agent must behave in situations of interest. Apollo-1 guarantees those behaviors will execute.
Every such behavior is configured via the System Prompt alone.
This is behavioral certainty in practice: when a food ordering app configures ‘if allergy mentioned, always inform the restaurant,’ that safety protocol executes, always. When a telecom provider configures ‘third failed payment attempt triggers service suspension,’ that policy is enforced, without exception. When an insurance company configures ‘claims over $10,000 require two approvals,’ that workflow completes, every time. Not usually. Not probably. With certainty.
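A sketch of what such a behavioral contract can look like as data (the rule format below is an assumption for illustration, not Apollo-1’s System Prompt syntax): each rule pairs a condition on the symbolic state with an action that must fire whenever the condition holds.

```python
# Hypothetical sketch of behavioral rules as a declarative contract;
# the rule format is an illustrative assumption, not Apollo-1's syntax.

RULES = [
    # (condition on state, required action)
    (lambda s: s.get("allergy_mentioned"),        "INFORM_RESTAURANT"),
    (lambda s: s.get("failed_payments", 0) >= 3,  "SUSPEND_SERVICE"),
    (lambda s: s.get("claim_amount", 0) > 10_000, "REQUIRE_TWO_APPROVALS"),
]

def required_actions(state: dict) -> list:
    """Deterministic: every rule whose condition holds must fire."""
    return [action for cond, action in RULES if cond(state)]

assert required_actions({"allergy_mentioned": True}) == ["INFORM_RESTAURANT"]
assert required_actions({"claim_amount": 25_000}) == ["REQUIRE_TWO_APPROVALS"]
```

Because enforcement is a deterministic pass over declared rules rather than a generation-time tendency, ‘always’ means always.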
Airlines to insurance, retail to healthcare: same foundation model, different System Prompts. Production behavior on day one—often within hours—comes via the System Prompt; ongoing fine-tuning and System Prompt optimization deliver compounding gains and fine-grained control across conversational scenarios and tool invocations.
Conversational AI was never one problem. It was always two.
The first half—open-ended conversation—is solved brilliantly by transformers. ChatGPT writes and codes. Claude explains and analyzes. Gemini creates and explores. When the goal is creative, informative, or exploratory dialogue, statistical plausibility is exactly right. Whether generating Python functions, crafting emails, or explaining quantum physics, transformers excel because plausible variation creates value.
The second half, task-oriented dialogue, requires behavioral certainty. Apollo-1 provides it. When the goal is booking a stay, processing payments, or managing claims, you need certainty that your defined policies, procedures, and brand experiences will execute exactly as specified. Probability isn’t enough when real money, real appointments, and real customer relationships are at stake.
Transformers are architecturally designed for open-ended dialogue; their attention mechanisms and probabilistic generation create the variation and creativity these conversations require. Apollo-1 is architecturally designed for task-oriented dialogue; its stateful neuro‑symbolic reasoning and symbolic state management provide the reliability and guarantees task execution demands.
Transformers optimize for creative probability. Apollo-1 optimizes for behavioral certainty. Together, they form the complete spectrum of conversational AI.
| Test / Benchmark | Apollo‑1 | Best LLM Agent | Δ |
| --- | --- | --- | --- |
| τ‑Bench‑Airline (toughest public benchmark)* | 90.8–92.5% | Claude 4: 60% | +51% |
| Google Flights: 111 live booking chats | 83% | Gemini 2.5 Flash: 22% | +277% |
| Amazon Retail: 120 live shopping chats | 90.8% | Rufus: 16.7% | +444% |
These are order-of-magnitude reliability differences. Apollo-1 handles complete customer journeys end-to-end.
Every conversation that drives economic activity becomes automatable. With certainty of execution, enterprises can finally trust conversational agents with customer interactions, because defined policies, procedures, and brand experiences will execute exactly as specified.
While open-ended conversation enhances productivity, task-oriented dialogue is the productivity. Every transaction, every booking, every claim—these are the conversations that run the economy. Now they can run automatically.
Apollo-1 is already demonstrating transformative results at scale in undisclosed programs across leading Fortune 500 organizations in critical sectors. Its modular architecture is designed for easy integration with existing generative-AI-based workflows, enabling smooth transitions without operational disruption.
Apollo-1 launches through a strategic go-to-market partnership with Google, reaching General Availability in November 2025 complete with open APIs, full documentation, toolkits, rigorous evaluation methods, and new voice and image modalities. Starting November 2025, any organization—Fortune 500 to solo founder—can deploy production-ready agents within hours. The foundation model that cracked task-oriented dialogue becomes infrastructure for conversational automation.
What this delivers today: production-ready agents, configurable from a System Prompt and deployable within hours.
In November 2025, reliable task-oriented dialogue becomes possible at scale.