Overview

Input prompt token caching is now live for all Axon models! This feature significantly reduces costs and improves response times by caching frequently used prompt tokens.

Key Features

  • Default TTL: 5 minutes
  • Supported Modes: Both streaming and non-streaming responses
  • Automatic: Works transparently with all Axon models
  • Cost Reduction: Cached tokens are billed at a reduced rate

How It Works

When you send a request, the system automatically caches the tokenized prompt. Subsequent requests with identical or similar prompt content within the 5-minute TTL reuse the cached tokens, resulting in:
  • Faster response times
  • Lower token costs
  • Improved API performance
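
For illustration, here is a minimal sketch of the request pattern, assuming an OpenAI-compatible chat completions endpoint and the official openai Python client; the base URL, API key, and model wiring shown here are placeholders rather than confirmed Axon values:

from openai import OpenAI

# Placeholder endpoint and key; substitute your actual Axon base URL and credentials.
client = OpenAI(base_url="https://axon.example.com/v1", api_key="YOUR_API_KEY")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
]

# First request: the prompt is tokenized and cached for the 5-minute TTL.
first = client.chat.completions.create(model="axon-mini", messages=messages)

# Second request with the same prompt, sent within the TTL, reuses the cached tokens.
second = client.chat.completions.create(model="axon-mini", messages=messages)

details = second.usage.prompt_tokens_details
print("cached prompt tokens:", details.cached_tokens if details else 0)

Because both requests share the same system and user messages, the second call can reuse the cached prompt tokens as long as it arrives within the 5-minute window.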

Usage Example

The cached tokens are reported in the API response under the usage.prompt_tokens_details field:
{
    "id": "chatcmpl-954ab54b-ee3f-4199-b0a7-06457c426dc8",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "matched_stop": 151645,
            "message": {
                "content": "Hello. How can I assist you today?",
                "role": "assistant",
                "tool_calls": null
            }
        }
    ],
    "created": 1766466952,
    "model": "axon-mini",
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 25,
        "completion_tokens": 74,
        "total_tokens": 99,
        "completion_tokens_details": {
            "reasoning_tokens": 64
        },
        "prompt_tokens_details": {
            "cached_tokens": 25
        }
    }
}
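
In the example above, all 25 prompt tokens were served from the cache (cached_tokens equals prompt_tokens). Cached tokens are reported in streaming mode as well. The sketch below assumes the endpoint follows the OpenAI streaming convention, where a final chunk carrying usage is emitted when stream_options with include_usage is requested; that option is an assumption here, not a documented Axon parameter:

from openai import OpenAI

client = OpenAI(base_url="https://axon.example.com/v1", api_key="YOUR_API_KEY")

stream = client.chat.completions.create(
    model="axon-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
    stream=True,
    # Assumed OpenAI-style option: request a final usage chunk so cached-token
    # counts are visible in streaming mode.
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    # With include_usage, the last chunk has an empty choices list and carries usage.
    if chunk.usage is not None:
        details = chunk.usage.prompt_tokens_details
        print("\ncached prompt tokens:", details.cached_tokens if details else 0)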

Best Practices

To maximize the benefits of prompt caching:
  1. Reuse System Prompts: Keep system messages consistent across requests
  2. Batch Similar Requests: Send related requests within the 5-minute window
  3. Cache-Friendly Content: Use stable, reusable prompt components
  4. Monitor Usage: Track cached token metrics to optimize your integration (see the sketch after this list)
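
For the monitoring recommendation, a hypothetical helper like the one below can aggregate cached-token counts across responses returned by the Python client used in the earlier sketches; the function name and structure are illustrative only:

# Hypothetical helper: aggregate cached-token metrics across chat completion responses.
def cache_hit_ratio(responses):
    prompt_tokens = 0
    cached_tokens = 0
    for response in responses:
        usage = response.usage
        prompt_tokens += usage.prompt_tokens
        details = usage.prompt_tokens_details
        if details and details.cached_tokens:
            cached_tokens += details.cached_tokens
    return cached_tokens / prompt_tokens if prompt_tokens else 0.0

# Example: cache_hit_ratio([first, second]) using the responses from the earlier sketch.

A ratio close to 1.0 means most prompt tokens are being served from the cache; a consistently low ratio suggests prompts are varying too much or requests are falling outside the 5-minute window.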

Benefits

  • Cost Savings: Up to 70% reduction in prompt token costs for cached content
  • Performance: Faster response times for cached prompts
  • Scalability: Better API performance under high load
  • Transparency: Clear visibility into cached token usage via API responses