High-speed Large Language Model Serving for Local Deployment
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
Modern desktop application (Rust + Tauri v2 + Svelte 5 + Candle (HF)) for chatting with AI models that run completely locally on your computer. No subscriptions, no data sent to the internet: just you and your personal AI assistant.
On-device AI for iOS & Android
Notolog Markdown Editor
Tool for testing different large language models without writing code.
Desktop AI tutoring app with local inference using Ollama for privacy-focused education.
Local AI music generator with smart lyrics: Gradio web UI for HeartMuLa + Ollama/OpenAI, tags, history, and high-fidelity audio.
LLM chatbot example using OpenVINO with RAG (Retrieval Augmented Generation).
An overfitted SD prompt engine with severe "aesthetic snobbery," forcibly transforming mundane ideas into professional-grade physical rendering instructions.
Lightweight 6GB VRAM Gradio web app with auto-installer for running AuraFlow locally — no cloud, no clutter.
Edge Agent Lab is an Android testing platform for evaluating small language model (SLM) agents directly on mobile devices.
Local embeddings server for Apple Silicon using MLX, providing OpenAI-compatible API endpoints
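Since the MLX embeddings server advertises OpenAI-compatible endpoints, any OpenAI-style client can talk to it. Below is a minimal sketch of such a request using only the standard library; the base URL `http://localhost:8000/v1` and the model name `my-mlx-model` are assumptions for illustration, not part of the project's documented configuration:

```python
import json
import urllib.request


def build_embeddings_request(model: str, texts: list[str]) -> dict:
    # Payload shape expected by an OpenAI-compatible /v1/embeddings endpoint.
    return {"model": model, "input": texts}


def fetch_embeddings(base_url: str, model: str, texts: list[str]) -> list[list[float]]:
    # POST the payload to the local server; no API key is needed locally.
    payload = json.dumps(build_embeddings_request(model, texts)).encode()
    req = urllib.request.Request(
        f"{base_url}/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style responses return one {"embedding": [...]} object per input.
    return [item["embedding"] for item in body["data"]]


if __name__ == "__main__":
    # With the local server running, this would return one vector per text:
    # vectors = fetch_embeddings("http://localhost:8000/v1", "my-mlx-model", ["hello"])
    print(build_embeddings_request("my-mlx-model", ["hello", "world"]))
```

Because the request and response shapes match the OpenAI API, the same code works against any compatible local server by changing only the base URL.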
Verify claims using AI agents that debate using scraped evidence and local language models.
Privacy‑first, real‑time speech‑to‑text dictation. 100% local inference in Rust; hotkey to dictate anywhere (macOS, Linux, Windows).
An agentic, zero-shot document intelligence engine that sees, understands, and extracts from any PDF: no training, no hallucinations. Just define your fields and get trusted, structured outputs with confidence scores, deployed locally and built for the enterprise.
Local voice typing for Windows powered by SenseVoice. 15x faster than Whisper for Chinese input.
MCP server that runs local LLMs (with full access to MCP tools included). Callable by Python to chain MCP tools with local intelligence.
AI study assistant for engineering students.
A lightweight Python implementation of Microsoft's Phi-3 model running locally on CPU.