Overview
Input prompt token caching is now live for all Axon models! This feature significantly reduces costs and improves response times by caching frequently used prompt tokens.
Key Features
- Default TTL: 5 minutes
- Supported Modes: Both streaming and non-streaming responses
- Automatic: Works transparently with all Axon models
- Cost Reduction: Cached tokens are billed at a reduced rate
How It Works
When you send a request with similar or identical prompt content, the system automatically caches the tokenized prompt. Subsequent requests within the 5-minute TTL period reuse the cached tokens (see the sketch after this list), resulting in:
- Faster response times
- Lower token costs
- Improved API performance
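Below is a minimal sketch of this behavior in Python. The endpoint URL, headers, and the model name "axon-1" are placeholders, not documented values; only the `usage.prompt_tokens_details` field comes from this announcement.

```python
# Minimal sketch (Python + requests), assuming a hypothetical chat completions
# endpoint and an example model name ("axon-1"); adjust to your actual API.
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}        # placeholder auth

payload = {
    "model": "axon-1",  # any Axon model; caching is automatic
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize our refund policy."},
    ],
}

# First request: the prompt is tokenized and cached (5-minute TTL).
first = requests.post(API_URL, headers=HEADERS, json=payload).json()

# Second request with the same prompt, sent within 5 minutes: the cached
# prompt tokens are reused and billed at the reduced rate.
second = requests.post(API_URL, headers=HEADERS, json=payload).json()

# Cached token counts are reported under usage.prompt_tokens_details.
print(first["usage"]["prompt_tokens_details"])
print(second["usage"]["prompt_tokens_details"])
```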
Usage Example
The cached tokens are reported in the API response under the `usage.prompt_tokens_details` field:
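The snippet below illustrates where that field sits in the response and how to read it. Only the `usage.prompt_tokens_details` location is documented above; the sub-field name `cached_tokens` and the sample numbers are illustrative assumptions.

```python
# Illustrative response shape only: "cached_tokens" and the counts shown here
# are assumptions, not documented values.
response = {
    "usage": {
        "prompt_tokens": 1024,
        "completion_tokens": 87,
        "prompt_tokens_details": {
            "cached_tokens": 896,  # tokens served from the prompt cache
        },
    }
}

cached = response["usage"]["prompt_tokens_details"].get("cached_tokens", 0)
print(f"Cached prompt tokens: {cached}")
```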
Best Practices
To maximize the benefits of prompt caching:
- Reuse System Prompts: Keep system messages consistent across requests (see the sketch after this list)
- Batch Similar Requests: Send related requests within the 5-minute window
- Cache-Friendly Content: Use stable, reusable prompt components
- Monitor Usage: Track cached token metrics to optimize your integration
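The following sketch shows one cache-friendly request pattern under the same assumptions as the earlier example (hypothetical endpoint and model name): keep the large system prompt byte-for-byte identical across requests and vary only the short user message.

```python
# Cache-friendly pattern: a stable system prompt reused across related requests,
# sent within the 5-minute TTL window. Endpoint, auth, and model name are
# placeholders.
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}        # placeholder auth

SYSTEM_PROMPT = (
    "You are a support assistant for Acme Corp. "
    "Answer using the policy excerpts provided."
)  # keep this constant across requests so its tokens stay cacheable


def ask(question: str) -> dict:
    payload = {
        "model": "axon-1",  # example model name
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},  # stable, cacheable
            {"role": "user", "content": question},          # varies per call
        ],
    }
    return requests.post(API_URL, headers=HEADERS, json=payload).json()


# Related questions batched within the 5-minute window reuse the cached
# system prompt; monitor the reported metrics to confirm cache hits.
for q in ["How do refunds work?", "What is the return window?"]:
    print(ask(q)["usage"]["prompt_tokens_details"])
```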
Benefits
- Cost Savings: Up to 70% reduction in prompt token costs for cached content
- Performance: Faster response times for cached prompts
- Scalability: Better API performance under high load
- Transparency: Clear visibility into cached token usage via API responses