Fix: ElizaOS Action Triggers Return AI Model Responses
If your ElizaOS agent replies with conversational text instead of executing a configured action handler (e.g., executing a swap or sending a token), the LLM has broken the structured output schema. To fix this, you must explicitly inject the required JSON action template into the character configuration’s system prompt and enforce json_mode on the target model API.
Prompt Enforcement Configuration
{
  "system": "You are an autonomous execution node. You MUST respond ONLY with the following strict JSON schema: {\"action\": \"[ACTION_NAME]\", \"payload\": {}}. Do not include conversational text or markdown."
}
Architectural Context: Schema Validation and Execution Drops
The ElizaOS architecture maps natural language intent to programmatic TypeScript execution handlers. This bridge is the most fragile part of any autonomous agent system. While the agent’s “brain” is a probabilistic Large Language Model (LLM), the “body” (the execution runtime) is a deterministic engine.
The JSON Parsing Loop
When ElizaOS receives a message, it passes the context to the LLM. The expected output is a structured object. The runtime’s orchestrator then attempts to parse this output:
- String Sanitization: The orchestrator trims the response and removes markdown code blocks (e.g., ```json).
- JSON.parse(): It attempts to convert the string into a JavaScript object.
- Action Mapping: It searches for an exact action string (e.g., SEND_TOKEN) that matches a registered plugin handler.
If the model is not reliably constrained to the schema, it reverts to its base conversational alignment, a behavior learned from vast amounts of human chat data. Instead of returning {"action": "SWAP"}, it outputs, “Certainly! I’ll swap those tokens for you right away.” Because this string lacks the structured action key, the validation loop fails. This article calls that failure an Execution Drop: the engine aborts the programmatic handler and routes the text directly to the user interface as a standard reply.
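The three parsing steps and the drop path can be sketched roughly as follows. This is an illustration only, not the ElizaOS source: `sanitize`, `routeResponse`, `HandlerRegistry`, and the result shape are hypothetical names chosen for the example.

```typescript
// Hypothetical types; the real ElizaOS runtime may structure this differently.
type ActionHandler = (payload: Record<string, unknown>) => void;
type HandlerRegistry = Record<string, ActionHandler>;

type ParseResult =
  | { kind: "action"; name: string; payload: Record<string, unknown> }
  | { kind: "drop"; text: string };

// Three backticks, built up so the marker never appears literally in this sketch.
const FENCE = "`" + "`" + "`";
const FENCED_BLOCK = new RegExp(
  "^" + FENCE + "(?:json)?\\s*([\\s\\S]*?)\\s*" + FENCE + "$",
  "i"
);

// Step 1: trim and strip a surrounding markdown code fence, if present.
function sanitize(raw: string): string {
  const trimmed = raw.trim();
  const fenced = trimmed.match(FENCED_BLOCK);
  return fenced ? fenced[1] : trimmed;
}

// Steps 2 and 3: parse, then require an exact match in the handler registry.
function routeResponse(raw: string, handlers: HandlerRegistry): ParseResult {
  try {
    const parsed: unknown = JSON.parse(sanitize(raw));
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      const obj = parsed as Record<string, unknown>;
      const name = obj.action;
      if (
        typeof name === "string" &&
        Object.prototype.hasOwnProperty.call(handlers, name)
      ) {
        const payload =
          typeof obj.payload === "object" && obj.payload !== null
            ? (obj.payload as Record<string, unknown>)
            : {};
        return { kind: "action", name, payload };
      }
    }
  } catch {
    // Not JSON at all: fall through to the drop case.
  }
  // No registered action found: the text is returned as a plain reply.
  return { kind: "drop", text: raw };
}
```

A conversational reply such as “Certainly! I’ll swap those tokens for you right away.” fails `JSON.parse` and comes back with `kind: "drop"`, which mirrors the text-reply path described above.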
Context Collapse in Complex Workflows
Operators debugging complex interactions, such as a snapshot error during an ElizaOS ai16z token migration with a Tangem hardware wallet, often find that Execution Drops are the root cause. If the agent is supposed to trigger a signature request but instead explains why it is migrating, the cryptographic handshake never initiates. A dropped payload has the same effect as a dropped execution handler: if the structure is invalid, the engine’s deterministic components cannot fire.
Preventative Maintenance: Structured Output Fallbacks
To immunize your agent from conversational drift and execution drops, you must implement a multi-layered defense strategy at the API, prompt, and parser levels.
1. API-Level Constraints: OpenAI Structured Outputs
If you are using GPT-4o or newer models, the most effective solution is to utilize Structured Outputs. Unlike standard json_mode, Structured Outputs utilize a context-free grammar to force the model’s token sampling to strictly follow your provided JSON schema.
Implementation Logic:
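Inside your ElizaOS provider configuration, ensure the request's response_format uses type json_schema with strict set to true. This forces the model to treat your action schema as a hard constraint rather than a suggestion. Note that strict mode expects every object in the schema to list all of its properties as required and to set additionalProperties to false.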
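const responseFormat = {
  type: "json_schema",
  json_schema: {
    name: "action_trigger",
    strict: true,
    schema: {
      type: "object",
      properties: {
        // Restrict the action to the names of registered handlers.
        action: { type: "string", enum: ["SWAP_TOKENS", "SEND_MESSAGE", "NONE"] },
        // Strict mode needs a closed object here; replace this empty payload
        // with the fields each action actually requires.
        payload: { type: "object", properties: {}, required: [], additionalProperties: false }
      },
      required: ["action", "payload"],
      additionalProperties: false
    }
  }
};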
2. Regex-Based “Scraper” Fallback
Production-grade agents often encounter models that ignore system instructions (common in smaller local models like Llama-3-8B). Implement an interceptor function in your ElizaOS message handler that “scrapes” the conversational noise for valid JSON.
- Algorithm: Identify potential JSON objects within a larger text string. A recursive pattern like /\{(?:[^{}]|(?R))*\}/g works in PCRE-based engines, but JavaScript's RegExp does not support (?R), so a TypeScript handler should scan for balanced braces instead.
- Validation: Attempt to JSON.parse each match. The first object containing a valid action key is promoted to the execution handler, while the surrounding “fluff” text is discarded.
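A minimal sketch of that scraper in TypeScript, using a brace-depth scan rather than a recursive regex (which JavaScript's RegExp engine does not offer). The function names `extractJsonCandidates` and `scrapeAction` are hypothetical, and the scan only handles objects, not top-level arrays.

```typescript
// Returns every balanced top-level {...} substring found in `text`.
// Braces inside double-quoted strings within an object are ignored.
function extractJsonCandidates(text: string): string[] {
  const candidates: string[] = [];
  let depth = 0;
  let start = -1;
  let inString = false;
  let escaped = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (depth > 0 && inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (depth > 0 && ch === '"') {
      inString = true;
    } else if (ch === "{") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === "}" && depth > 0) {
      depth--;
      if (depth === 0) candidates.push(text.slice(start, i + 1));
    }
  }
  return candidates;
}

// Promotes the first candidate that parses and carries a string `action` key.
function scrapeAction(
  text: string
): { action: string; payload?: unknown } | null {
  for (const candidate of extractJsonCandidates(text)) {
    try {
      const parsed: unknown = JSON.parse(candidate);
      if (
        typeof parsed === "object" &&
        parsed !== null &&
        typeof (parsed as { action?: unknown }).action === "string"
      ) {
        return parsed as { action: string; payload?: unknown };
      }
    } catch {
      // Balanced braces but not valid JSON; try the next candidate.
    }
  }
  return null;
}
```

Before dispatching, the returned `action` should still be checked against the registered handler names, since the scraper only confirms that the key exists and is a string.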
3. System Prompt Engineering: The “Negative Constraint”
In practice, models tend to follow “Negative Constraints” (what NOT to do) more reliably when these are paired with “Few-Shot Examples.”
- Few-Shotting: Provide 3-5 examples of user inputs followed by the exact JSON output required.
- Penalty Clauses: Explicitly state: “PENALTY: If you include any text outside the JSON brackets, the operation will fail and assets will be lost. DO NOT include greetings like ‘Sure’ or ‘Okay’.”
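One way to assemble such a prompt is sketched below. The helper `buildSystemPrompt`, the `FewShotExample` shape, and the payload fields in the usage note are assumptions made for illustration; the wording of the constraint lines is adapted from the text above.

```typescript
// Hypothetical shape for one few-shot pair: a user input and the exact JSON expected.
interface FewShotExample {
  user: string;
  action: string;
  payload: Record<string, unknown>;
}

// Builds a system prompt from the schema rule, the negative constraint,
// and the few-shot examples rendered as User/Assistant pairs.
function buildSystemPrompt(examples: FewShotExample[]): string {
  const shots = examples
    .map(
      (ex) =>
        `User: ${ex.user}\nAssistant: ${JSON.stringify({
          action: ex.action,
          payload: ex.payload,
        })}`
    )
    .join("\n\n");

  return [
    'Respond ONLY with JSON matching {"action": "<ACTION_NAME>", "payload": {}}.',
    "DO NOT include greetings like 'Sure' or 'Okay', or any text outside the JSON brackets.",
    "Examples:",
    shots,
  ].join("\n\n");
}
```

For instance, passing `{ user: "Swap 1 ETH for USDC", action: "SWAP_TOKENS", payload: { from: "ETH", to: "USDC", amount: 1 } }` yields an example pair whose assistant line is the compact JSON the model is expected to reproduce.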
Security Policies: Handler Validation and Error Thresholds
When an action trigger fails, it’s not just a UI bug—it’s a security risk. A “hanging” state can lead to double-spends or lost opportunities in fast-moving DeFi markets.
- Execution Retries with Exponential Backoff: If the parser fails to find an action, the agent should not simply stop. Configure a “Schema Recovery” loop where the agent is re-prompted with the error message: “ERROR: Your last response was not valid JSON. Please retry using ONLY the schema.” Double the wait between attempts and cap the number of retries so a misbehaving model cannot loop indefinitely.
- Health-Check Monitors: Implement a dashboard that tracks “Action Success Rate,” the share of action-intended messages that end in a successful handler call. If that rate drops below 90%, the agent should be automatically paused for prompt re-calibration.
- Fallback to Manual Approval: For high-value actions (e.g., swapping >$1000), if the JSON schema is malformed but the “intent” seems clear, the agent should route the request to a human-in-the-loop dashboard rather than failing silently.
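The “Schema Recovery” loop from the first policy might look like the sketch below. It assumes a hypothetical `callModel` function that sends a prompt and resolves to the raw model text; `recoverAction`, `backoffDelay`, and the default retry and delay values are illustrative choices, not ElizaOS settings.

```typescript
type ModelCall = (prompt: string) => Promise<string>;
type Sleep = (ms: number) => Promise<void>;

const RETRY_MESSAGE =
  "ERROR: Your last response was not valid JSON. Please retry using ONLY the schema.";

// Delay doubles on each attempt: base, 2 * base, 4 * base, ...
function backoffDelay(attempt: number, baseMs: number): number {
  return baseMs * 2 ** attempt;
}

// Returns the parsed object only if it is JSON with a string `action` key.
function tryParseAction(raw: string): { action: string } | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as { action?: unknown }).action === "string"
    ) {
      return parsed as { action: string };
    }
  } catch {
    // Invalid JSON: treated the same as a missing action.
  }
  return null;
}

async function recoverAction(
  callModel: ModelCall,
  userPrompt: string,
  maxRetries = 3,
  baseMs = 250,
  sleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
): Promise<{ action: string } | null> {
  let prompt = userPrompt;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const parsed = tryParseAction(await callModel(prompt));
    if (parsed) return parsed;
    if (attempt === maxRetries) break;
    await sleep(backoffDelay(attempt, baseMs));
    // Re-prompt with the error message appended to the original request.
    prompt = `${userPrompt}\n\n${RETRY_MESSAGE}`;
  }
  // Retries exhausted: the caller decides whether to pause, alert,
  // or route the request to manual approval.
  return null;
}
```

Injecting `sleep` keeps the loop testable without real delays. A `null` result is the natural point to hand off to the manual-approval path for high-value actions.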
Advanced FAQ: Technical Contextual Analysis
Why does ‘json_mode’ still occasionally return invalid JSON?
json_mode ensures that the output is JSON, but it does not ensure the JSON is complete or matches your schema. For example, if the model reaches its token limit, it may return a truncated, invalid JSON string. Always set a high max_tokens limit and use a finish_reason check to ensure the model completed its generation.
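A sketch of that check, assuming an OpenAI-style chat completion response in which each choice carries `finish_reason` and `message.content`. The `CompletionChoice` interface below is a hand-written subset for the example, and `checkCompletion` is a hypothetical helper name.

```typescript
// Minimal subset of one chat completion choice; only the fields used here.
interface CompletionChoice {
  finish_reason: string | null;
  message: { content: string | null };
}

type CompletionCheck =
  | { ok: true; value: unknown }
  | { ok: false; reason: "truncated" | "empty" | "invalid_json" };

function checkCompletion(choice: CompletionChoice): CompletionCheck {
  // "length" means generation stopped at the token limit,
  // so any JSON in the content is likely cut off.
  if (choice.finish_reason === "length") {
    return { ok: false, reason: "truncated" };
  }
  if (!choice.message.content) {
    return { ok: false, reason: "empty" };
  }
  try {
    return { ok: true, value: JSON.parse(choice.message.content) };
  } catch {
    return { ok: false, reason: "invalid_json" };
  }
}
```

Checking `finish_reason` before parsing separates a truncated response, which usually calls for a higher token limit, from a formatting failure, which calls for a re-prompt.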
Can I use ElizaOS actions without JSON?
Technically, no. The ElizaOS runtime is built on the concept of “Plugins” which are triggered by a string-match against the action field. While you could write a custom parser that reads natural language, you would sacrifice the deterministic reliability that JSON provides. It is far better to fix the model’s formatting than to weaken the runtime’s validation.
Does fine-tuning help with action trigger reliability?
Yes, it can help substantially. Fine-tuning a model on a dataset of (Intent -> Action JSON) pairs is often treated as the “gold standard” for autonomous agents. A fine-tuned model (even a smaller one like Mistral-7B) can match or outperform a generic GPT-4 in schema adherence on the actions it was trained for, because it has “learned” that its only job is to generate structured data.