
A Second Kind of Reasoning

Apollo-1 is the first foundation model for neuro-symbolic agents. Its reasoning was built through symbolic representation, not deep learning, shaped from the ground up to work on behalf of companies, not users.

Authors: Ohad Elhelo and Ori Cohen, Co-Founders


01. Summary

For the last decade, one kind of system could reason over open-ended language: the language model. It predicts the next token, and does so remarkably — powering the generative AI that now helps users think, employees work, and developers ship. Language models are not going anywhere. They keep doing what they do.

Apollo-1 is a second kind of system. It is a neuro-symbolic foundation model — reasoning built through symbolic representation, not deep learning — designed to handle open-ended conversation and enforce business rules in one forward pass. The same computation that writes the sentence checks the rule. This is the architecture task-oriented agents were missing. For more than three years, the industry has tried to deliver the agents that book a flight, file a claim, process a payment, authorize a return — and it hasn’t. The prevailing architectures put conversation and rules in separate systems, and no company will hand interactions that move money, carry legal weight, or touch customer trust to a system that treats its own policies as suggestions.

Apollo-1 puts them in the same model architecture. Business rules become a first-class object — readable and editable by the people whose policies they are, enforced formally at runtime. This is the kind of agent a company can send into a conversation. A different kind of model, for a different kind of agent, running on a different reasoning framework, doing the work language models were never built to do.


02. Two Kinds of Agents

Two different things are emerging under the name “agent,” and they are not variations of the same thing. They are different objects, with different principals, built for different jobs.

Open-ended agents work for users. Coding assistants, computer-use agents, personal AI. You are the principal. If the agent interprets your intent slightly differently each time, that is fine — you are in the loop and you will correct it. Flexibility is the point.

Task-oriented agents work on behalf of companies. An airline’s booking agent. A bank’s support agent. An insurer’s claims agent. These agents serve users, but they represent the company. The company is the principal — the one whose policies must be enforced.

Task-oriented agents require logical reasoning: the ability to evaluate conditions against state and produce guaranteed outcomes. The ticket won’t be cancelled unless the passenger is Business Class and Platinum Elite. The payment won’t process without explicit confirmation. The refund won’t be issued if required documentation is missing.

These are not preferences. They are requirements that determine whether AI can be trusted with interactions involving real money, real appointments, and real consequences.
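
Concretely, a guard like the first rule above can be sketched as a pure function of state. A minimal sketch in TypeScript, with invented names; in Apollo-1 these conditions are symbolic objects, not hand-written code:

```ts
// Hypothetical guard (names invented for illustration): the outcome is
// computed from state, not sampled from a distribution.
interface TicketState {
  cabin: "Economy" | "Business";
  tier: "Standard" | "Gold" | "PlatinumElite";
}

function mayCancelTicket(s: TicketState): boolean {
  // The ticket won't be cancelled unless the passenger is
  // Business Class AND Platinum Elite.
  return s.cabin === "Business" && s.tier === "PlatinumElite";
}
```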

But task-oriented agents also need to reason over open-ended language. Users don’t follow scripts. They ask unexpected questions, change their mind, go off on tangents. The agent has to reason over what they say while enforcing the rules that must hold.

That combination — reasoning over open-ended language and logical reasoning over state — is the hard problem. It needs both capabilities operating together inside one model. Not two systems trading messages. One architecture, running both.

Generative AI works for users. It follows their prompts. Neuro-Symbolic AI works for companies. It converses with users, but follows business rules.

03. Why Current Approaches Struggle

Every conversation that results in real-world action — booking trips, processing payments, managing claims, executing trades — could be automated. These interactions run the economy. The market for task-oriented agents dwarfs what open-ended assistants will ever capture.

And yet no company will trust AI with high-stakes interactions when the best guarantee is probabilistic. The industry has converged on two approaches. Both fail for the same structural reason: neither treats business rules as a first-class object.

Orchestration Frameworks

Orchestration wraps LLMs in workflow systems: state machines, routing logic, branching conditions. The state machine reasons. The LLM converses. Two systems, each doing one job.

The problem is that these systems don’t share understanding. A user is mid-payment and says “wait, what’s the cancellation policy before I pay?” No transition was coded for this. The system either breaks, gives a canned response, or forces the user back on script. You add a branch. Then users ask about refunds mid-payment. Or shipping. Or they change their mind. Real deployments accumulate hundreds of branches and still miss edge cases.
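
To make the failure mode concrete, here is a schematic of hand-coded orchestration (a sketch, not any real framework): every user move needs its own coded transition, and the branches only multiply:

```ts
type FlowState = "collect_payment" | "confirm_payment" | "done";

function transition(state: FlowState, intent: string): FlowState {
  if (state === "collect_payment") {
    if (intent === "provide_card") return "confirm_payment";
    if (intent === "ask_cancellation_policy") {
      // No transition was coded for this. Add a branch here...
      // ...then one for refunds, one for shipping, one for "changed my mind".
      return "collect_payment"; // canned response; user forced back on script
    }
    return "collect_payment";
  }
  if (state === "confirm_payment") {
    return intent === "confirm" ? "done" : "confirm_payment";
  }
  return state;
}
```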

The alternative is handing off to the LLM — but the LLM has no understanding of where you are in the flow, what logic applies, what state has accumulated. It might process the payment without confirmation because it is predicting tokens, not reasoning from state.

Conversational quality and reasoning reliability are inversely correlated. The tighter the state machine, the worse the user experience. The more you rely on the LLM, the less you can trust the behavior. And there is no model underneath — every branch, every condition is coded by hand. Each workflow is its own silo. When business rules change, you update them in multiple places.

Orchestration gives you reasoning over a flowchart. It does not give you an agent that reasons.

Function-Calling LLM Agents

Function-calling agents take the opposite approach: give the LLM access to tools and let it decide when to call them. Natural conversation works. But the LLM is still the decision-maker, and its decisions are sampled from a probability distribution, not computed from state.

You can make unwanted tool calls less likely through prompting, fine-tuning, or output filtering. You cannot make them impossible. The LLM might call the refund function without verifying documentation. It might skip the confirmation step. It might invoke a tool with incorrect parameters. These are not bugs. They are inherent to the architecture.

Validation layers help — check the tool call before executing, reject if conditions aren’t met. But validation is reactive. The agent already decided to take the action; you are just blocking it after the fact. And the validation logic is coded per tool, not derived from any shared model of the domain.
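
Sketched in code, the reactive pattern looks like this (names are hypothetical, not any particular framework’s API):

```ts
type ToolCall = { name: string; args: Record<string, unknown> };

declare function invokeTool(call: ToolCall): unknown; // stub for the sketch

function executeWithValidation(call: ToolCall, state: { docsVerified: boolean }) {
  // Hand-coded, per-tool check. The LLM has already decided to act;
  // the validator can only block the call after the fact.
  if (call.name === "issue_refund" && !state.docsVerified) {
    return { blocked: true, reason: "required documentation missing" };
  }
  return invokeTool(call);
}
```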

The Gap

Business rules are not a first-class object in either architecture. In orchestration, a rule is a branch — part of a flow, not part of a model. In function-calling, a rule is a sentence in a system prompt — advisory, soft, forgettable. Neither architecture gives the runtime something it can actually hold. And the symptom of that absence is the split the two approaches share: the state machine has no model of language, the LLM has no model of state. What is needed is a single architecture where the same computation that handles an unexpected question is the one that evaluates the rule.


04. Eight Years to Build

In 2017, we began solving and encoding millions of real-user task-oriented conversations into structured data, powered by a workforce of 60,000 human agents. The core insight wasn’t about data scale; it was about what must be represented.

Task-oriented conversational AI requires two kinds of knowledge working together:

  • Descriptive knowledge — entities, attributes, domain content.
  • Procedural knowledge — roles, logic, flows, policies.

Training a transformer on multi-turn transcripts can capture conversational style, but it won’t teach the model how to handle critical interactions correctly. Transcript datasets are linear and stateless. Without explicit state, how is the model supposed to learn when to block an action versus when to allow it?

We needed a representation that separates structure from context while carrying both. We constructed a symbolic language that encodes procedural roles and descriptive facts, giving the model a typed symbolic state it can reason over and evaluate logic against.
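
As a rough illustration (our field names, not Apollo-1’s actual schema), a typed symbolic state carries both kinds of knowledge at once:

```ts
interface SymbolicState {
  // Procedural: where the conversation is and what must hold before acting.
  flow: { stage: "collect" | "confirm" | "execute"; pendingAction?: string };
  obligations: string[]; // e.g. ["confirm_before_payment"]
  // Descriptive: entities and attributes bound so far.
  entities: Record<string, { type: string; attributes: Record<string, unknown> }>;
}
```

With state explicit in this form, when to block an action versus when to allow it becomes a question the model can evaluate, not a pattern it must have absorbed from transcripts.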

In parallel, we observed that across every domain we studied — selling shoes, booking flights, processing loans — task-oriented dialogue follows universal procedural patterns. Food delivery, claims processing, and order management share the same structures: parameter extraction, logic evaluation, intent identification, policy enforcement, state-dependent branching.

The key insight: task-oriented dialogue has a finite number of universal states. They can be mapped.

We built a unified model where neural modules handle context and symbolic modules handle structure, operating together rather than in sequence. For the computation itself, we developed the Neuro-Symbolic Reasoner — a cognitive core that computes next actions from the current symbolic state, as opposed to predicting the next token. Neural modules translate to and from the symbolic language. Symbolic modules maintain explicit state, evaluate logic, and ensure tool invocations are structured rather than probabilistically sampled.

Two pieces arrived from outside our own work. The first, around 2021, was the leap in language models. Our NLU, NLP, and NLG stack had been built on pre-transformer foundations; modern LLMs made it dramatically stronger, and we integrated them as Apollo-1’s neural modules. Language stopped being the bottleneck.

The second, in mid-2025, was coding agents. An Apollo-1 agent’s configuration is a symbolic description of its rules, tools, and flows, written as structured JSON. Until mid-2025, authoring and maintaining that description required humans fluent in both Apollo-1’s schema and its internal logic. Then coding agents became capable enough to author rules, map tool responses, and evolve configurations in natural language — acting as text-to-DSL compilers. The last structural argument against neuro-symbolic AI at scale closed in 2025.

Together, the symbolic language and the Neuro-Symbolic Reasoner form Apollo-1. With coding agents capable enough to operate over its representation, Apollo-1 could be built and maintained at scale.


05. Apollo-1

Apollo-1 is the first foundation model built on neuro-symbolic architecture. It is not a language model adapted for reasoning. It is not an orchestration layer around existing models. It is a new foundation — where deep learning gave us reasoning over open-ended language, neuro-symbolic architecture gives us reasoning over open-ended language and business rules, in the same computation.

Generation and enforcement in one forward pass. The same computation that writes the sentence checks the rule.

There is no moment at which the model could choose to break a rule, because there is no moment at which rules are separate from generation. The neural and symbolic components operate together on the same representation in the same computational loop: interpreting language, maintaining state, evaluating logic, and generating responses as one integrated process. The neural components handle conversation. The symbolic components handle logical reasoning. Both are native to the architecture, not glued on.

In practice: when a user asks about cancellation policy mid-payment, the neural side understands the question naturally while the symbolic side maintains the payment flow. State is explicit, not inferred from context. The rule “don’t process payment without confirmation” holds absolutely. No branch was coded for this. No handoff occurred.

When you define business rules about refund authorization, the model understands how they relate to customer status, order history, and documentation requirements — not because you coded those connections, but because the ontology is part of the model’s representation. When you have defined specific business rules, they hold absolutely. When you haven’t, the agent converses naturally and thinks for itself.

Because the reasoning is neuro-symbolic, it is white-box. Every decision is traceable and auditable — and because the symbolic structures live in real files, they are addressable in language.

Universal by Design

Apollo-1 is domain-agnostic and use-case-agnostic. The same model powers auto-repair scheduling, insurance claims, retail support, healthcare navigation, and financial services — without rebuilding logic per workflow or manual ontology creation. The symbolic structures (intents, logic, parameters, execution semantics) are universal primitives. This is what makes Apollo-1 a foundation model: not scale, but representational generality. Same model, different system prompt.


06. How It Works

The principle behind Apollo-1 is structure–content separation. Its symbolic representation carries structure — roles, relations, state transitions — as a universal schema, and holds descriptive knowledge as values bound to that schema. Any value can occupy any field: the symbolic layer knows where a piece of content sits in this turn’s state, not what it means in the world. The symbolic layer doesn’t need to know. Understanding content — parsing language, resolving ambiguity, mapping an utterance onto the right field — happens in the neural modules.

Because the structure itself is meaning-free, the set of universal states is finite. Novel inputs are approximated to those states at inference. This is the generalization mechanism: the Reasoner operates on symbolic structures — intents, logic, parameters, actions — that remain constant across every instance, while the values that populate them change turn to turn and domain to domain. Classical symbolic AI tried to encode meaning into the structure itself, which forced ontologies to represent the world — a project that proved impossible.

Neural modules handle content: perception, generation, and the language that binds symbols to their values. Symbolic modules handle structure: state, logic, execution. The two operate together on the same representation, not in sequence.
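
A toy illustration of the separation, with invented types: the structural shape stays constant while the bound values change across domains:

```ts
interface Slot { role: string; value: unknown } // structure: a role in this turn's state
interface TurnState { intent: string; params: Slot[] }

// Same structural shape, different domain content:
const flightTurn: TurnState = {
  intent: "book",
  params: [{ role: "item", value: "flight TLV-JFK" }, { role: "when", value: "2026-05-01" }],
};
const shoeTurn: TurnState = {
  intent: "order",
  params: [{ role: "item", value: "running shoes, size 42" }, { role: "when", value: "ASAP" }],
};
```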

Architecture: encoder, stateful reasoning loop, decoder

  • Domain-Agnostic Encoder. Parses natural language into typed symbolic objects — entities, their attributes, their relational roles — forming the initial symbolic state.
  • Stateful Reasoning Loop (iterates until turn completion):
    • Neuro-Symbolic State Machine maintains symbolic state, representing both procedural progress (what state we are in) and descriptive facts (what we know).
    • Symbolic Reasoning Engine computes next actions from state.
    • Neuro-Symbolic Planner compiles executable plans.
  • Domain-Agnostic Decoder. Generates natural language from final state, filling symbolic placeholders.

Within the loop, neural and symbolic modules operate jointly. The State Machine and Planner are neuro-symbolic by construction — symbolic in their operation, neural in the ways they bind symbolic objects to the specifics of a given turn (for example, forming an exact API query from a typed tool call).
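
In pseudocode form, one turn of the cycle looks roughly like this (function names are ours, not the product’s API):

```ts
type State = Record<string, unknown>;

declare function encode(utterance: string, prior: State): State; // neural: language -> symbols
declare function turnComplete(s: State): boolean;
declare function computeNextAction(s: State): string;            // symbolic: from state, not sampled
declare function apply(action: string, s: State): State;         // plan, execute tools, update state
declare function decode(s: State): string;                       // neural: final state -> language

function handleTurn(utterance: string, prior: State): string {
  let s = encode(utterance, prior);   // Domain-Agnostic Encoder
  while (!turnComplete(s)) {          // Stateful Reasoning Loop
    s = apply(computeNextAction(s), s);
  }
  return decode(s);                   // Domain-Agnostic Decoder
}
```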

Perception is probabilistic. Action selection is not. Given the same state, the Reasoner makes the same decisions, and every decision in a trace can be reproduced from the state that produced it. Apollo-1’s end-to-end outputs are not deterministic — perception runs through the neural modules, and two phrasings of the same request can produce different initial states. Once a state is formed, the logic over it is fixed. Failures in perception surface as task failure; they do not surface as policy violation.

The Symbolic Reasoning Engine is a formal, rule-based engine. Its procedural logic is not learned — it is the reasoning we taught the model. We built that logic over years, by dissecting millions of multi-turn task-oriented conversations with human agents into their symbolic elements, with a reputation system that ranked contributions by peer review. It surfaced consensus where it existed, flagged the gaps where it didn’t, and drove the iterations that filled them. The Symbolic Reasoning Engine is what that process exposed. It is also what gave the company its name: Augmented Intelligence, refined through cybernetic feedback loops.

Augmented Intelligence (AUI) Inc. Patents Pending.


07. Rules as a First-Class Object

Business rules are how a company operates — the policies, conditions, and constraints under which it acts. In function-calling LLM Agents, business rules are sentences in a system prompt: advisory, soft, forgettable. In orchestration frameworks, they are branches in a state machine: rigid, siloed, coded by hand. In Apollo-1, they are a first-class object — symbolic structures that the runtime enforces formally, and that the people who own those policies can read, edit, and audit.

When you define your tools, Apollo-1 automatically generates an ontology: a structured representation of your entities, parameters, and relationships. This ontology is shared across all your tools and interactions. Business rules defined once apply everywhere they are relevant.

Defining Business Rules

From the ontology, you define business rules for the scenarios where the agent must reason formally. Apollo-1 supports a growing set of rule types, each enforced symbolically at runtime (a schematic sketch of all three types follows the examples below):

Policy Rules. Business rules the agent must enforce unconditionally. These are the hardest guarantees — actions that must be blocked or required regardless of conversational context.

  • “Block disputes for transactions older than 8 days.”
  • “Never process a plan downgrade during an active billing dispute.”
  • “Block wire transfers to accounts not in the pre-approved list.”

Confirmation Rules. Actions that require explicit user consent before execution. The agent must pause, present the action it’s about to take, and wait for affirmative confirmation. It cannot proceed on implication or assumption.

  • “Require confirmation before processing payment.”
  • “Confirm cancellation terms before cancelling a subscription.”
  • “Show the user the full cost breakdown and get explicit approval before booking.”

Authentication Rules. Actions that require identity verification before execution. The agent must verify the user’s identity through a specified method before proceeding with sensitive operations.

  • “Require ID verification for refunds over $200.”
  • “Verify account ownership before changing billing information.”
  • “Require two-factor confirmation before processing address changes on active policies.”
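
Sketched as typed objects, the three rule types might look like the following. Field names and shapes are invented for illustration; the actual .aui.json schema may differ:

```ts
type Rule =
  | { kind: "policy"; action: string; blockWhen: string }
  | { kind: "confirmation"; action: string; require: "explicit_user_consent" }
  | { kind: "authentication"; action: string; verify: string; when?: string };

const rules: Rule[] = [
  { kind: "policy", action: "create_dispute", blockWhen: "daysSince(txn.date) > 8" },
  { kind: "confirmation", action: "process_payment", require: "explicit_user_consent" },
  { kind: "authentication", action: "issue_refund", verify: "id_check", when: "amount > 200" },
];
```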

How Enforcement Works

Logic enforcement is formal; perception is not. This is a critical distinction.

When the Symbolic Reasoning Engine evaluates a rule, the evaluation is deterministic. If the predicate says today - txn.date <= 8 days and the transaction is 9 days old, the action is blocked. Every time. The agent doesn’t weigh the pros and cons. It doesn’t approximate. The rule fires or it doesn’t.
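
The evaluation is a pure function of state. A sketch, with invented helper names:

```ts
const DISPUTE_WINDOW_DAYS = 8;

function disputeAllowed(txnDate: Date, today: Date): boolean {
  const ageInDays = (today.getTime() - txnDate.getTime()) / 86_400_000;
  return ageInDays <= DISPUTE_WINDOW_DAYS; // 9 days old -> false -> blocked. Every time.
}
```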

Perception — understanding what the user is asking — remains probabilistic, handled by the neural components. The system may misunderstand what is being requested. But it won’t “decide” to skip a required step or “forget” a policy mid-conversation. Misclassification affects whether an action is attempted, not whether business rules are enforced. If perception fails, the action isn’t invoked; the user experiences task failure, not policy violation.

This means failures are confined to a narrower, more auditable category. And because every rule evaluation is logged in the symbolic trace, failures can be diagnosed to the exact point where perception diverged from intent.

Why This Matters

When business rules become a first-class object, the locus of control over agent behavior shifts. The people who own the policies are no longer dependent on the people who wrote the code. Compliance teams can read the rules they are responsible for. Operations can adjust thresholds without filing tickets. Product can ship a behavior in a sentence. Engineers stop being a bottleneck for business rules, because business rules stop being an engineering artifact.

It also changes what deployment means. The cost of building enterprise agents collapses from months of engineering to hours of describing behavior. And the cost of changing an agent — historically the thing that kills deployments — collapses to a sentence.


08. The CLI: Agents as Code

Agents are built in the Apollo-1 CLI. An agent’s configuration is a typed JSON codebase — structured files the runtime reads directly, with no prompt in between.

The CLI lets developers and companies programmatically build, test, and deploy Apollo-1 agents using their own coding agents, editors, and development environments. Full setup and usage are covered in the documentation.

Edit .aui.json files in Cursor, VS Code, or any editor with full schema autocomplete. Version control is git. The schema is typed. The diff against rules.aui.json is a real diff. An agent is a codebase, not a prompt.

Because the substrate is a codebase, it supports a second environment: one that non-engineers can use without leaving natural language.


09. The Playground

The Apollo-1 Playground is an environment for the stakeholders in an organization whose questions do not stop at code: compliance officers, operations leads, product managers, customer-experience owners. Also for engineers who want to debug an agent’s reasoning, watch a rule fire in real time, or iterate on a policy without round-tripping through a CLI.

The Playground gives each of them a window into the same agent the runtime runs. Three things are visible at once. A conversation pane, where the agent talks and acts. A reasoning pane, where every turn’s trace — initial state, execution, rule evaluation, generation — is laid out in full, in both a structured white-box view and a raw trace. And a Business Rules Agent: an environment inside the environment.

The Business Rules Agent

The Business Rules Agent is a coding agent with read-write access to the same .aui.json files the CLI and the runtime read. It is pointed at the agent’s symbolic codebase and at the live reasoning trace of each turn. It is a coding agent in the literal sense — it can inspect, diff, and modify an agent’s configuration — but the codebase it operates on is not ordinary code. It is a structured natural-language representation of the agent’s business policies, and a compliance officer or operations lead can reason about it without reading a line of JSON.

You talk to it. It reads and writes the same files Apollo-1 reads at inference, and it has the live reasoning trace of every turn in view. It can:

  • Explain decisions by reference to the actual symbolic trace. “Why did you block that?” has a literal answer: this rule, this predicate, this state.
  • Locate rules by description. You don’t need to know the file or the schema, just the policy.
  • Author or edit business rules in plain English and apply them as a diff against the agent’s configuration.
  • Evaluate changes against other scenarios and edge cases before pushing live.
  • Verify that the change fires correctly in the next turn.

Every edit is a diff against a real file. Every change is a version: auditable, revertable, attributable. Stakeholders author, edit, and test in natural language, saving their versions and submitting them for evaluation in the Playground simulator. The same version is viewable as a code diff in the Playground itself, appears as a diff in the engineer’s IDE, and — when selected for production, or while active — is what Apollo-1 reads at inference.

What made this possible is the coding-agent breakthrough of 2025. Until coding agents became capable of operating fluently over structured codebases, the maintenance overhead of symbolic representations was the standing argument against neuro-symbolic AI at scale. That argument is now closed. A symbolic codebase authored in natural language and maintained by a coding agent is easier to evolve than a system prompt, and it is enforced.

An Example

We tested the full loop against the Credit Card Dispute agent, configured with the rule “block disputes for transactions older than 8 days.”

Enforcement. A user asks to dispute a Grocery Store charge from 9 days ago. The agent pulls the user record, fetches the transaction history, and refuses: the charge is more than 8 days old. The reasoning trace shows what happened. The planner selected Create Dispute and populated the parameters. Before the call could execute, the Symbolic Reasoning Engine evaluated the policy rule against the current state. The predicate today - txn.date <= 8 days returned false. The block was logged, and the tool call did not happen. The agent talked like an LLM and refused like a compiler.

Editing. In the Business Rules Agent, we typed: “Change the dispute blocking rule from 8 days to 3 days.” The agent located the existing rule, produced a diff against rules.aui.json, updated the user-facing explanation so the agent’s spoken refusal would match the new threshold, and pushed the change live. Both edits landed in the same place because the rule and its explanation are the same symbolic object.
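
The resulting rule object, sketched with the same invented field names as in section 07 (old values shown in comments):

```ts
const disputeWindowRule = {
  kind: "policy",
  action: "create_dispute",
  blockWhen: "daysSince(txn.date) > 3",                                      // was: > 8
  explanation: "Disputes must be filed within 3 days of the transaction.",  // was: 8 days
};
```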

Verification. In a fresh conversation, the agent refused to dispute a charge from 4 days ago — disputable under the old rule, blocked under the new one. Same trace shape. New predicate, new threshold. The behavior moved with the rule, by construction, because the behavior is the rule.

The loop took under a minute. No engineer. No redeployment. Every step is auditable: a specific rule, in a specific file, firing at a specific point in the reasoning loop, with a diff you can read.


10. What Apollo-1 Isn’t For

Apollo-1’s architecture makes deliberate trade-offs. By optimizing for task-oriented agents, it does not compete in other domains — by design.

Open-ended creative work. Creative writing, brainstorming, exploratory dialogue where variation creates value. Transformers remain the superior architecture. Apollo-1’s symbolic structures enforce consistency; creativity often requires the opposite.

Code generation. Apollo-1 can integrate with code execution tools, but its symbolic language is purpose-built for task execution, not software development.

Low-stakes, high-variation scenarios. Customer engagement campaigns, educational tutoring, entertainment chatbots — when conversational variety enhances user experience, probabilistic variation is preferable to formal enforcement.


11. Availability

Ahead of general availability, Apollo-1 is already deployed at scale across dozens of enterprises in regulated and unregulated industries, including Fortune 500 companies. Additional partnerships to power consumer-facing AI at some of the world’s largest companies in retail, automotive, financial services, and insurance will be announced alongside GA. A strategic go-to-market partnership with Google is in place.

A Preview Playground is accessible here, featuring Apollo-1 agents across HR, IT, regulated industries, retail, automotive warranties, and more — domains where we have active early deployments, simulated here for preview. Each agent runs from its system prompt alone, viewable as code in the Playground or in a UI view for non-technical stakeholders. The Business Rules Agent is available on every one of them.

A technical paper — architectural specifications, formal proofs, procedural ontology samples, turn-closure semantics — will be released alongside GA.

General Availability: Q2 2026.

Apollo-1 integrates with existing generative AI workflows and adapts to any API or external system — no changes to endpoints, no data preprocessing. Native connectivity with Salesforce, HubSpot, Zendesk, and others. Full MCP support.

GA launches with:

  • The Conversational API — the modality used throughout this paper, with Messaging and System Prompt endpoints.
  • The Apollo-1 Playground and CLI.
  • Full documentation and toolkits.

Following in 2026:

  • The Workflow Automation API — a second API modality, for task-oriented workflows that don’t originate in a user message.
  • Voice support.
  • Fully local agent development in the CLI — pushing the agent file as-is, without server-side compilation.

Looking ahead, Apollo-1 improves on three independent axes. Its neural modules improve with every advance in low-latency LLMs. Its symbolic language evolves as we extend its coverage of task-oriented reasoning — identifying gaps, surfacing new structures, and automatically deriving finer symbols and representations from patterns that emerge across instances. And as coding agents continue to improve, so does the ease of building on Apollo-1.


12. Conclusion

Open-ended agents work for users. Apollo-1 is the first foundation model for agents that work on behalf of the thing the user is talking to — a bank, an airline, an insurer, a hospital, a retailer. Every conversation that moves money, books a seat, files a claim, authorizes a return, or schedules a procedure is one of those conversations. They run the economy.

Until now, no model could be trusted to hold them. Generation and enforcement were separate systems, stapled together with a prompt and a prayer. Apollo-1 makes them the same computation.

Apollo-1 is the first foundation model for task-oriented agents an organization can stand up without an engineering army. The vision is that every organization in the world will.

A different kind of model, for a different kind of agent, running on a different reasoning framework, doing the work language models were never built to do.
