Post

bonsai-ollama — Ollama Proxy for Q1_0 GGUF Models

HTTP proxy for PrismML Bonsai 1.7B Q1_0 via llama-server — stock Ollama cannot load Q1_0 yet.

bonsai-ollama — Ollama Proxy for Q1_0 GGUF Models

bonsai-ollama is an Ollama-facing HTTP proxy for PrismML Bonsai 1.7B (Q1_0 GGUF) via llama-server — because stock Ollama cannot load Q1_0 quantizations yet.

Repository: github.com/eSlider/bonsai-ollama · last push 2026-04-22 · ★2

Problem

Ultra-low-bit quants (Q1_0) save VRAM but need a loader that supports them. Ollama’s native loader rejects some formats — this proxy exposes an OpenAI-compatible surface while llama-server handles the model file.

Stack position

flowchart LR
  Client[OpenAI-compatible client] --> Proxy[bonsai-ollama]
  Proxy --> LlamaServer[llama-server]
  LlamaServer --> GGUF[Bonsai Q1_0 GGUF]

Tech stack

Go · llama-server · GGUF · HTTP proxy · Ollama API compatibility

This post is licensed under CC BY 4.0 by the author.