bonsai-ollama — Ollama Proxy for Q1_0 GGUF Models
bonsai-ollama is an Ollama-facing HTTP proxy that serves PrismML Bonsai 1.7B (Q1_0 GGUF) through llama-server, because stock Ollama cannot yet load Q1_0 quantizations.
Repository: github.com/eSlider/bonsai-ollama · last push 2026-04-22 · ★2
Problem
Ultra-low-bit quantizations such as Q1_0 save VRAM, but they need a loader that supports them. Ollama’s native loader rejects Q1_0, so this proxy exposes an OpenAI-compatible surface to clients while llama-server loads and serves the model file.
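To make "OpenAI-compatible surface" concrete, here is a minimal client sketch in Go. It is an illustration under assumptions, not code from the repository: the listen address `127.0.0.1:11434` (Ollama's usual port), the `/v1/chat/completions` path (the OpenAI convention) and the model name `bonsai` are all placeholders, so check the project's README for the real values.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// chatMessage and chatRequest mirror the basic OpenAI chat-completions body.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// buildChatRequest encodes a single-turn chat request as JSON.
func buildChatRequest(model, prompt string) ([]byte, error) {
	return json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
}

func main() {
	// "bonsai" is a hypothetical model name used only for this sketch.
	body, err := buildChatRequest("bonsai", "Say hello in five words.")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Assumed proxy address and path; adjust to the actual deployment.
	resp, err := http.Post(
		"http://127.0.0.1:11434/v1/chat/completions",
		"application/json",
		bytes.NewReader(body),
	)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	// Print the raw JSON response; a real client would decode it.
	if _, err := io.Copy(os.Stdout, resp.Body); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Because the request body follows the OpenAI shape, an existing OpenAI-style client library should work the same way once its base URL points at the proxy.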
Stack position
flowchart LR
Client[OpenAI-compatible client] --> Proxy[bonsai-ollama]
Proxy --> LlamaServer[llama-server]
LlamaServer --> GGUF[Bonsai Q1_0 GGUF]
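The middle hop in the diagram can be sketched with Go's standard reverse proxy. This is a simplified illustration, not the project's actual implementation: it forwards every request unchanged to llama-server and does no translation between API dialects. The environment variable names `LLAMA_SERVER_URL` and `LISTEN_ADDR` are hypothetical, and the defaults assume llama-server on its usual port 8080 and the proxy on Ollama's usual port 11434.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
)

// newProxy returns a handler that forwards each request to the
// llama-server instance at upstream (for example "http://127.0.0.1:8080").
func newProxy(upstream string) (http.Handler, error) {
	target, err := url.Parse(upstream)
	if err != nil {
		return nil, err
	}
	if target.Scheme == "" || target.Host == "" {
		return nil, fmt.Errorf("upstream %q needs a scheme and host", upstream)
	}
	return httputil.NewSingleHostReverseProxy(target), nil
}

// envOr reads an environment variable, falling back to a default.
func envOr(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

func main() {
	// Both variable names are assumptions made for this sketch.
	upstream := envOr("LLAMA_SERVER_URL", "http://127.0.0.1:8080")
	listen := envOr("LISTEN_ADDR", "127.0.0.1:11434")

	proxy, err := newProxy(upstream)
	if err != nil {
		log.Fatal(err)
	}

	log.Printf("forwarding %s -> %s", listen, upstream)
	log.Fatal(http.ListenAndServe(listen, proxy))
}
```

In this arrangement llama-server is started separately with the Bonsai Q1_0 GGUF file, and clients only ever talk to the proxy's address. A proxy that also answers Ollama-native routes would need extra handlers that map those requests onto llama-server's endpoints.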
Related
- go-ollama — Go client for Ollama + Open WebUI
- go-second-brain — Ollama embed/generate in RAG stack
Tech stack
Go · llama-server · GGUF · HTTP proxy · Ollama API compatibility
This post is licensed under CC BY 4.0 by the author.