KTransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Key Features

Advanced Optimization

Enhanced kernel optimizations and placement/parallelism strategies for improved performance.

Flexible Integration

Compatible with the 🤗 Transformers interface, and exposes RESTful APIs compliant with OpenAI and Ollama (see the usage sketch after this list).

Resource Efficient

Run powerful models like DeepSeek-Coder-V3 using only 14GB VRAM and 382GB DRAM.
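Below is a minimal sketch of calling a locally running KTransformers server through its OpenAI-compatible RESTful API. The base URL, port, and model identifier are illustrative assumptions, not values confirmed by this README; substitute whatever your local server actually exposes.

```python
# Minimal sketch: query an OpenAI-compatible local endpoint.
# The base_url, port, and model name below are assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:10002/v1",  # assumed local KTransformers endpoint
    api_key="not-needed",                  # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="DeepSeek-V3",  # assumed model identifier on the local server
    messages=[{"role": "user", "content": "Write a quicksort function in Python."}],
    temperature=0.6,
)
print(response.choices[0].message.content)
```

Because the API follows the OpenAI schema, existing clients and tooling that speak that protocol can be pointed at the local server by changing only the base URL.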

Latest Updates

Feb 15, 2025

KTransformers V0.2.1 Released

Longer context (from 4K to 8K on 24GB VRAM) and slightly faster speed (+15%, up to 16 tokens/s).

Feb 10, 2025

DeepSeek-R1 and V3 Support

Support for single-GPU (24GB VRAM) and multi-GPU setups with 382GB DRAM, delivering up to a 3~28x speedup.