Starred repositories
A lightweight vLLM simulator for mocking out replicas.
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
slime is an LLM post-training framework for RL Scaling.
Open-source, secure environment with real-world tools for enterprise-grade agents.
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …
Python framework for creating, editing, and running Noisy Intermediate-Scale Quantum (NISQ) circuits.
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
PennyLane is an open-source quantum software platform for quantum computing, quantum machine learning, and quantum chemistry. Create meaningful quantum algorithms, from inspiration to implementation.
A debugging and profiling tool that can trace and visualize Python code execution
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
An open-source AI agent that brings the power of Gemini directly into your terminal.
📄 Configuration files that enhance Cursor AI editor experience with custom rules and behaviors
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
A stand-alone implementation of several NumPy dtype extensions used in machine learning.
DeepSeek-V3/R1 inference performance simulator
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
The official repository for the gem5 computer-system architecture simulator.
A lightweight design for computation-communication overlap.
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.




