Authors: Ohad Elhelo, Ori Cohen, Co-Founders
Task-oriented dialogue powers every conversational interaction that results in real-world action, from booking flights and processing payments to managing insurance claims and executing trades. Every scheduling, payment, and claim in the economy depends on this one missing capability in AI, and it demands three capabilities simultaneously:
- maintaining an explicit, typed state across the conversation;
- making deterministic decisions from that state; and
- coordinating reliably with external systems.
Together, these three capabilities enable something fundamental: behavioral certainty—when you define how an agent should behave, it behaves that way. Every time. A bank needs certainty that refunds over $200 always trigger ID verification. An airline needs certainty that business class upgrades are always offered before economy. A fashion retailer might need out-of-stock items to always trigger similar recommendations, while a luxury brand needs the same scenario to always show pre-order links instead.
These aren’t preferences. They’re requirements. And no approach has delivered this certainty.
This is why, despite three years and billions in investment, task-oriented dialogue remains largely undeployed. The best transformer-based agents achieve 60% task completion. Most hover below 25%.
Larger models won’t close this gap. The problem isn’t scale; it’s the absence of an authoritative state and the right computational architecture.
Transformers generate statistically plausible text through pattern matching. This revolutionized open-ended dialogue where plausibility equals success.
Task-oriented dialogue demands something transformers cannot provide: stateful reasoning over an explicit, typed symbolic state. When booking a flight, generating “I’ve booked your flight” means nothing without actually reserving seats, charging cards, and issuing tickets. These require maintaining a typed symbolic state, making deterministic decisions from that state, and guaranteeing coordination with external systems—capabilities that do not emerge from token prediction alone.
More fundamentally, transformers cannot guarantee behavioral consistency. Ask an LLM agent to ‘always offer insurance before payment’ and it might—usually. Configure Apollo-1 with that rule in the System Prompt, and it will—with certainty. This difference between ‘probably’ and ‘certainly’ is why task-oriented dialogue remains undeployed despite billions in investment.
This isn’t a scaling limitation. It’s architectural: token predictors lack native control flow and explicit state representation. LLM agents approximate intent through probability. Task-oriented dialogue requires certainty of behavior in key interactions.
In 2017, powered by a workforce of 60,000 human agents, we began solving millions of real-user task-oriented conversations and encoding them into structured data. The core insight wasn’t about data scale; it was about what must be represented. Task-oriented dialogue requires two kinds of knowledge, used together:
- procedural knowledge: the structure of the task itself, its steps, policies, and constraints; and
- descriptive knowledge: the contextual facts the conversation surfaces about the user, their intent, and the domain.
Training a transformer on multi-turn transcripts can capture conversational style, but it won’t teach the model how to handle interactions correctly. Transcripts are one-dimensional and stateless: surface text without explicit state, structure, or guarantees.
To compute correctly over both, we needed a representation that separates structure from context while still carrying each. So we constructed a symbolic language that encodes procedural roles and descriptive facts, giving the model a typed symbolic state it can reason over with certainty.
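To make the idea concrete, here is a minimal sketch of such a representation, assuming hypothetical names and a Python encoding. It is not Apollo-1’s actual symbolic language; it only illustrates carrying typed procedural structure and descriptive context in one state:

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional

# Illustrative sketch only; NOT Apollo-1's symbolic language.
# Procedural structure lives in typed slots and an explicit step;
# descriptive context lives alongside it as facts.

class Step(Enum):
    COLLECT_PARAMETERS = auto()
    VALIDATE_POLICY = auto()
    CONFIRM = auto()
    EXECUTE = auto()

@dataclass
class Slot:
    name: str                    # procedural role, e.g. "departure_date"
    value: Optional[str] = None  # filled in from the conversation
    required: bool = True

@dataclass
class TaskState:
    intent: str                           # e.g. "book_flight"
    step: Step = Step.COLLECT_PARAMETERS  # explicit procedural progress
    slots: list = field(default_factory=list)
    facts: dict = field(default_factory=dict)  # descriptive context

    def missing_slots(self):
        return [s for s in self.slots if s.required and s.value is None]

state = TaskState(
    intent="book_flight",
    slots=[Slot("origin"), Slot("destination"), Slot("departure_date")],
    facts={"loyalty_tier": "gold"},  # descriptive fact, not a procedural role
)
print([s.name for s in state.missing_slots()])  # all three slots still open
```

A state like this is something a model can reason over with certainty: which slots are filled, which step comes next, and which facts condition the decision are all explicit rather than implicit in text.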
In parallel, we observed that across use cases and domains—selling shoes, booking flights, processing loans—task-oriented dialogue follows universal procedural patterns. Food delivery, claims processing, and order management share similar procedural structures: parameter and constraint extraction, intent identification, policy validation, and so on.
Next, for the actual computation, we developed the Neuro-Symbolic Reasoner, a cognitive core that predicts the next task action from the current symbolic state, as opposed to token prediction. It maintains explicit state, enforces guarantees, and ensures tool invocations are structured, not guessed by token sampling.
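As a sketch of what next-action prediction over a symbolic state means (hypothetical names again, mirroring the shared procedural patterns above, not AUI’s published interface), note that action selection below is a pure function of the state, so identical states always yield identical decisions:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Illustrative sketch only; not Apollo-1's Neuro-Symbolic Reasoner.
# The reasoner predicts the next *task action* from the current state,
# rather than sampling the next token.

@dataclass(frozen=True)
class State:
    intent: Optional[str]      # identified intent, if any
    missing: Tuple[str, ...]   # required parameters not yet extracted
    policy_ok: Optional[bool]  # None = policy not yet validated
    confirmed: bool            # user confirmation received

def next_action(s: State) -> str:
    """Deterministic control flow: same state in, same action out."""
    if s.intent is None:
        return "IDENTIFY_INTENT"
    if s.missing:
        return f"REQUEST_PARAMETER:{s.missing[0]}"
    if s.policy_ok is None:
        return "VALIDATE_POLICY"
    if not s.policy_ok:
        return "ESCALATE"
    if not s.confirmed:
        return "CONFIRM_WITH_USER"
    return "EXECUTE_TOOL_CALL"

s = State(intent="book_flight", missing=("departure_date",),
          policy_ok=None, confirmed=False)
assert next_action(s) == "REQUEST_PARAMETER:departure_date"  # every time
```

Because the decision is computed rather than sampled, tool invocations can be emitted as structured calls with guaranteed arguments instead of free-form text that happens to look like a call.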
Together, the symbolic language and the reasoner form Apollo-1: the domain-agnostic foundation model for task-oriented dialogue.
Apollo-1’s breakthrough is stateful neuro‑symbolic reasoning: a computation built explicitly for task-oriented dialogue. Rather than forcing transformers—designed to predict statistically likely words—into deterministic execution roles, Apollo-1 places stateful neuro‑symbolic reasoning at its core, predicting the next action from the current symbolic state to deliver reliable, repeatable outcomes with certainty.
The Neuro-Symbolic Reasoner operates on symbolic structures—intents, constraints, parameters, actions—that remain constant across domains, while neural modules continuously enrich those structures with semantic nuance.
Architecture: encoder–stateful reasoning loop–decoder
The key insight: neural modules handle context, symbolic modules handle structure, and Apollo-1 unifies both in a single model. The symbolic state represents both procedural progress (what step we’re on) and descriptive facts (what we know). Neural components interpret language and enrich understanding; symbolic components ensure reliable execution. Perception is probabilistic, but given the same state, the Reasoner always makes the same decision, delivering the behavioral certainty that task-oriented dialogue requires and making task execution reproducible, auditable, and steerable.
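A minimal sketch of one turn through that encoder–reasoner–decoder loop, under the same caveat that every name here is an assumption for illustration: the neural edges (encode, decode) may be probabilistic, while the symbolic core in the middle is deterministic.

```python
# Hypothetical sketch of an encoder -> stateful reasoning loop -> decoder
# turn cycle; illustrative only, not Apollo-1's published architecture.

def encode(utterance: str, state: dict) -> dict:
    """Neural step (probabilistic): interpret the user's words and
    propose updates to the symbolic state, e.g. extracted parameters."""
    # A real system would call a neural model here; we stub it.
    updates = {"destination": "Paris"} if "Paris" in utterance else {}
    return {**state, **updates}

def reason(state: dict) -> str:
    """Symbolic step (deterministic): same state -> same decision."""
    for slot in ("origin", "destination", "departure_date"):
        if slot not in state:
            return f"ASK:{slot}"
    return "EXECUTE:book_flight"

def decode(action: str) -> str:
    """Neural step: render the chosen action as natural language."""
    kind, _, arg = action.partition(":")
    return {"ASK": f"Could you tell me your {arg.replace('_', ' ')}?",
            "EXECUTE": "Booking your flight now."}[kind]

state: dict = {}
state = encode("I'd like to fly to Paris", state)  # perception: probabilistic
action = reason(state)                             # decision: deterministic
print(decode(action))                              # "Could you tell me your origin?"
```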
The complete technical paper—including architectural specifications, formal proofs, high-level procedural ontology construction, evaluation methodologies, and turn-closure semantics—will be released alongside general availability in November 2025. [Request early access to the technical paper]
Augmented Intelligence (AUI) Inc. – Patents Pending
Apollo-1 is the first foundation model built not to be used as an agent, but to let every organization create its own task-oriented conversational agents. Apollo-1 ships with a Playground where any task-oriented dialogue use case can run from the System Prompt alone. The System Prompt exposes a symbolic interface that the model’s stateful neuro-symbolic loop executes against.
The System Prompt isn’t mere configuration; it’s a behavioral contract. You define exactly how your agent must behave in situations of interest. Apollo-1 guarantees those behaviors will execute.
Every such behavior is configured via the System Prompt alone.
This is behavioral certainty in practice: when a food ordering app configures ‘if allergy mentioned, always inform the restaurant,’ that safety protocol executes, always. When a telecom provider configures ‘third failed payment attempt triggers service suspension,’ that policy is enforced, without exception. When an insurance company configures ‘claims over $10,000 require two approvals,’ that workflow completes, every time. Not usually. Not probably. With certainty.
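A sketch of what such a behavioral contract can look like as data (the rule format below is an assumption for illustration, not Apollo-1’s System Prompt syntax): each rule pairs a condition on the symbolic state with an action that must fire whenever the condition holds.

```python
# Hypothetical sketch of behavioral rules as a declarative contract;
# the rule format is an illustrative assumption, not Apollo-1's syntax.

RULES = [
    # (condition on state, required action)
    (lambda s: s.get("allergy_mentioned"),        "INFORM_RESTAURANT"),
    (lambda s: s.get("failed_payments", 0) >= 3,  "SUSPEND_SERVICE"),
    (lambda s: s.get("claim_amount", 0) > 10_000, "REQUIRE_TWO_APPROVALS"),
]

def required_actions(state: dict) -> list:
    """Deterministic: every rule whose condition holds must fire."""
    return [action for cond, action in RULES if cond(state)]

assert required_actions({"allergy_mentioned": True}) == ["INFORM_RESTAURANT"]
assert required_actions({"claim_amount": 25_000}) == ["REQUIRE_TWO_APPROVALS"]
```

Because enforcement is a deterministic pass over declared rules rather than a generation-time tendency, ‘always’ means always.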
Airlines to insurance, retail to healthcare: same foundation model, different System Prompts. Production behavior on day one—often within hours—comes via the System Prompt; ongoing fine-tuning and System Prompt optimization deliver compounding gains and fine-grained control across conversational scenarios and tool invocations.
Conversational AI was never one problem. It was always two.
The first half—open-ended conversation—is solved brilliantly by transformers. ChatGPT writes and codes. Claude explains and analyzes. Gemini creates and explores. When the goal is creative, informative, or exploratory dialogue, statistical plausibility is exactly right. Whether generating Python functions, crafting emails, or explaining quantum physics, transformers excel because plausible variation creates value.
The second half, task-oriented dialogue, requires behavioral certainty. Apollo-1 provides it. When the goal is booking a stay, processing payments, or managing claims, you need certainty that your defined policies, procedures, and brand experiences will execute exactly as specified. Probability isn’t enough when real money, real appointments, and real customer relationships are at stake.
Transformers are architecturally designed for open-ended dialogue; their attention mechanisms and probabilistic generation create the variation and creativity these conversations require. Apollo-1 is architecturally designed for task-oriented dialogue; its stateful neuro‑symbolic reasoning and symbolic state management provide the reliability and guarantees task execution demands.
Transformers optimize for creative probability. Apollo-1 optimizes for behavioral certainty. Together, they form the complete spectrum of conversational AI.
| Test / Benchmark | Apollo‑1 | Best LLM Agent | Δ |
| --- | --- | --- | --- |
| τ‑Bench‑Airline (toughest public benchmark)* | 90.8–92.5% | Claude 4: 60% | +51% |
| Google Flights: 111 live booking chats | 83% | Gemini 2.5 Flash: 22% | +277% |
| Amazon Retail: 120 live shopping chats | 90.8% | Rufus: 16.7% | +444% |
These are order-of-magnitude reliability differences. Apollo-1 handles complete customer journeys end-to-end.
Every conversation that drives economic activity becomes automatable. With certainty of execution, enterprises can finally trust conversational agents with customer interactions, because defined policies, procedures, and brand experiences will execute exactly as specified.
While open-ended conversation enhances productivity, task-oriented dialogue is the productivity. Every transaction, every booking, every claim—these are the conversations that run the economy. Now they can run automatically.
Apollo-1 is already demonstrating transformative results at scale in undisclosed programs across leading Fortune 500 organizations in critical sectors. Its modular architecture is designed for easy integration with existing generative-AI-based workflows, enabling smooth transitions without operational disruption.
Apollo-1 launches through a strategic go-to-market partnership with Google, reaching General Availability in November 2025 complete with open APIs, full documentation, toolkits, rigorous evaluation methods, and new voice and image modalities. Starting November 2025, any organization—Fortune 500 to solo founder—can deploy production-ready agents within hours. The foundation model that cracked task-oriented dialogue becomes infrastructure for conversational automation.
What this delivers today: production-ready agents, configurable from a System Prompt and deployable within hours.
In November 2025, reliable task-oriented dialogue becomes possible at scale.