Overview
Input prompt token caching is now live for all Axon models! This feature significantly reduces costs and improves response times by caching frequently used prompt tokens.
Key Features
- Default TTL: 5 minutes
- Supported Modes: Both streaming and non-streaming responses
- Automatic: Works transparently with all Axon models
- Cost Reduction: Cached tokens are billed at a reduced rate
How It Works
When you send a request with similar or identical prompt content, the system automatically caches the tokenized prompt. Subsequent requests within the 5-minute TTL period reuse the cached tokens, resulting in:
- Faster response times
- Lower token costs
- Improved API performance
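The server-side mechanism is opaque to clients, but the behavior described above can be sketched as a prefix cache keyed by prompt content with a 5-minute TTL. This is a conceptual illustration only, not Axon's actual implementation; the `tokenize` stand-in is hypothetical.

```python
import time

# Conceptual sketch of a tokenized-prompt cache with a 5-minute TTL.
# Not MatterAI's actual implementation; for illustration only.
TTL_SECONDS = 5 * 60
_cache: dict[str, tuple[float, list[int]]] = {}

def tokenize(prompt: str) -> list[int]:
    # Stand-in tokenizer for illustration purposes.
    return [ord(c) for c in prompt]

def get_prompt_tokens(prompt: str) -> tuple[list[int], bool]:
    """Return (tokens, cache_hit) for a prompt, reusing entries inside the TTL."""
    now = time.monotonic()
    entry = _cache.get(prompt)
    if entry is not None and now - entry[0] < TTL_SECONDS:
        return entry[1], True           # cache hit: reuse stored tokens
    tokens = tokenize(prompt)
    _cache[prompt] = (now, tokens)
    return tokens, False                # cache miss: tokenize and store

tokens1, hit1 = get_prompt_tokens("Summarize this report.")
tokens2, hit2 = get_prompt_tokens("Summarize this report.")
# The first call misses and populates the cache; the second, sent within
# the TTL window, hits it.
```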
Usage Example
The cached tokens are reported in the API response under the `usage.prompt_tokens_details` field:
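As a minimal sketch, here is how the usage block might be read from a response body. The field path `usage.prompt_tokens_details` comes from the description above; the surrounding JSON shape and all numeric values are illustrative assumptions (modeled on the common OpenAI-compatible usage format), not guaranteed output.

```python
import json

# Hypothetical response body for illustration; only the
# usage.prompt_tokens_details path is taken from the docs above.
response_body = json.loads("""
{
  "usage": {
    "prompt_tokens": 1200,
    "completion_tokens": 80,
    "total_tokens": 1280,
    "prompt_tokens_details": {
      "cached_tokens": 1024
    }
  }
}
""")

usage = response_body["usage"]
cached = usage["prompt_tokens_details"]["cached_tokens"]
cache_hit_rate = cached / usage["prompt_tokens"]

print(f"cached tokens: {cached}")
print(f"cache hit rate: {cache_hit_rate:.0%}")
```

Tracking this ratio per request is a simple way to verify that your prompts are actually benefiting from the cache.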
Best Practices
To maximize the benefits of prompt caching:
- Reuse System Prompts: Keep system messages consistent across requests
- Batch Similar Requests: Send related requests within the 5-minute window
- Cache-Friendly Content: Use stable, reusable prompt components
- Monitor Usage: Track cached token metrics to optimize your integration
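The first two practices can be sketched as follows: keep the stable system prompt as the leading message so its tokens form a reusable prefix, and send related requests close together. The client shape, model name, and prompt text here are assumptions; only the "stable prefix first" pattern is the point.

```python
# A minimal sketch of cache-friendly request construction.
# The system prompt text below is hypothetical.
SYSTEM_PROMPT = (
    "You are a code-review assistant. Answer concisely and cite line numbers."
)

def build_messages(user_question: str) -> list[dict]:
    # The stable system prompt comes first, so its tokens form an identical
    # prefix that can be served from cache on every request sent inside the
    # 5-minute TTL window.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_question},
    ]

# Two related requests sent close together share the same cached prefix.
first = build_messages("Review this diff for style issues.")
second = build_messages("Any security issues in the same diff?")
assert first[0] == second[0]  # identical leading message -> cacheable prefix
```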
Benefits
- Cost Savings: Up to 70% reduction in prompt token costs for cached content
- Performance: Faster response times for cached prompts
- Scalability: Better API performance under high load
- Transparency: Clear visibility into cached token usage via API responses