Krasis is a hybrid LLM runtime focused on efficiently running larger models on consumer-grade, VRAM-limited hardware.
Updated Mar 20, 2026 - Rust
Android inference engine running 20B+ parameter LLMs on 4GB-8GB RAM devices. Features proprietary Layer-by-Layer (LBL) streaming, zero-copy mmap loading, and native C++/Kotlin architecture.
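The zero-copy mmap loading mentioned above can be sketched in a few lines. This is a hypothetical illustration, not the engine's actual format: it assumes a raw file of little-endian float32 weights and shows that a memory-mapped view lets values be read in place without copying the file into process buffers first.

```python
import mmap
import os
import struct
import tempfile

# Write a small stand-in "weight file" of four float32 values.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # A memoryview over the mapping: no bytes are copied until an
    # individual value is actually unpacked.
    view = memoryview(mm)
    first = struct.unpack_from("<f", view, 0)[0]
    view.release()
    mm.close()
```

Because the OS pages the mapping in on demand, a layer-streaming design can touch only the layer it needs next, which is what makes this approach attractive on 4GB-8GB devices.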
Convert and quantize LLM models.
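Quantization of the kind this tool performs can be sketched with a toy symmetric int8 scheme. This is an assumption-laden illustration: real converters (such as llama.cpp's GGUF quantizers) use block-wise formats with per-block scales, while this example quantizes one whole tensor with a single scale.

```python
def quantize_int8(values):
    # One scale for the whole tensor; real formats use per-block scales.
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    return [x * scale for x in q]

weights = [0.1, -0.5, 0.25, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Quantization is lossy: each restored value differs from the original
# by at most half the scale step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The trade-off is the usual one: int8 storage is 4x smaller than float32 at the cost of a bounded per-weight rounding error.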
Privacy-first Local RAG Server: Chat with PDF & DOCX using GGUF models via llama.cpp and Qdrant. A lightweight, standalone FastAPI server with a clean HTML UI. High-performance, fully offline document intelligence. No Ollama, no cloud, no API keys.
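The retrieval half of such a RAG pipeline can be sketched as: embed the document chunks, embed the query, and rank chunks by cosine similarity. The bag-of-words "embedding" and fixed vocabulary below are stand-ins for illustration; a real server would use a proper embedding model and a vector store such as Qdrant.

```python
import math

def embed(text):
    # Toy embedding: word-count vector over a tiny fixed vocabulary.
    vocab = ["offline", "document", "pdf", "model", "server"]
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "the server runs fully offline",
    "upload a pdf document to chat with it",
]
query = "which document formats can I upload as a pdf"
ranked = sorted(chunks, key=lambda c: cosine(embed(c), embed(query)),
                reverse=True)
```

The top-ranked chunk is then stuffed into the LLM prompt as context, which is the "chat with your documents" step.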
Splinter is an atomic, lock-free, persistable shared-memory KV & vector store that runs LLM inference without socket, mutex, or memcpy() overhead; it ingests, stores, and optionally persists huge amounts of data with minimal latency. Splinter fits in the size of most modern CPU instruction caches (875 LOC) and ships with a CLI, tools, and tests.
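The core idea of a shared-memory store avoiding socket and memcpy() round-trips can be sketched as follows. This only shows the transport, and is not Splinter's implementation: its atomic layout, lock-free operations, and persistence are far more involved. Here, a second handle attaches to the same region by name and reads the same physical pages directly.

```python
from multiprocessing import shared_memory

# "Writer" side: create a named shared-memory region and store a value.
shm = shared_memory.SharedMemory(create=True, size=64)
try:
    payload = b"value-for-key-42"
    shm.buf[:len(payload)] = payload

    # "Reader" side: attach by name. Both handles map the same pages,
    # so no bytes cross a socket and nothing is copied between processes.
    reader = shared_memory.SharedMemory(name=shm.name)
    got = bytes(reader.buf[:len(payload)])
    reader.close()
finally:
    shm.close()
    shm.unlink()
```

In a real multi-process setup the reader would run in a separate process and coordinate with the writer via atomic flags in the region itself rather than locks, which is the lock-free part of the design.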
A simple Gradio app for local translation using the GGUF versions of MADLAD-400
Emotica AI is a compassionate and therapeutic virtual assistant designed to provide empathetic and supportive conversations. It integrates a local LLaMA model for text generation, a vision model for image captioning, a RAG system for information retrieval, and emotion detection to tailor its responses.
Nectar-X-Studio is a powerful, local AI-inferencing application that lets the user download, create, and run agents and large language models on their own machine. With no internet connection required, Nectar ensures privacy-first, high-performance inference using cutting-edge open-source models from Hugging Face, Ollama, and beyond.
Containerized LLM for any use-case big or small
AI tool that helps users research topics using local LLMs and automated web search.