KTransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Key Features
Advanced Optimization
Enhanced kernel optimizations and placement/parallelism strategies for improved performance.
Flexible Integration
Compatible with the 🤗 Transformers interface, plus RESTful APIs compliant with OpenAI and Ollama (see the example after this feature list).
Resource Efficient
Run powerful models like DeepSeek-Coder-V3 using only 14GB VRAM and 382GB DRAM.
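Because the server exposes OpenAI-compliant RESTful endpoints, any standard OpenAI client can talk to it. The sketch below uses the official openai Python package; the base URL, port, and model name are assumptions, not KTransformers defaults, so replace them with your own server's settings.

```python
# Minimal sketch: querying a locally running KTransformers server through its
# OpenAI-compatible REST API. The endpoint and model name are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:10002/v1",  # hypothetical local endpoint; use your server's host/port
    api_key="not-needed",                  # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="DeepSeek-V3",  # placeholder; use the model name your server serves
    messages=[{"role": "user", "content": "Summarize what KTransformers does."}],
)
print(response.choices[0].message.content)
```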
Latest Updates
Feb 15, 2025
KTransformers V0.2.1 Released
Longer context (from 4K to 8K tokens on 24GB VRAM) and slightly faster generation (+15%, up to 16 tokens/s).
Feb 10, 2025
DeepSeek-R1 and V3 Support
Support for single-GPU (24GB VRAM) and multi-GPU setups with 382GB DRAM, delivering up to a 3-28x speedup.