bonsai-ollama — Ollama Proxy for Q1_0 GGUF Models

HTTP proxy for PrismML Bonsai 1.7B Q1_0 via llama-server — stock Ollama cannot load Q1_0 yet.

Posted Apr 22, 2026

By Andriy Oblivantsev


bonsai-ollama is an Ollama-facing HTTP proxy that serves PrismML Bonsai 1.7B (Q1_0 GGUF) through llama-server, because stock Ollama cannot yet load Q1_0 quantizations.

Repository: github.com/eSlider/bonsai-ollama · last push 2026-04-22 · ★2

Problem

Ultra-low-bit quantizations such as Q1_0 save VRAM, but they only run on a loader that supports the format. Ollama's native loader cannot yet load Q1_0, so this proxy exposes an OpenAI-compatible surface to clients while llama-server loads and serves the model file.

Stack position

flowchart LR
  Client[OpenAI-compatible client] --> Proxy[bonsai-ollama]
  Proxy --> LlamaServer[llama-server]
  LlamaServer --> GGUF[Bonsai Q1_0 GGUF]
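
The request path in the diagram can be sketched in Go with the standard library's reverse proxy. This is only an illustration of the forwarding idea, not the code in the repository: `newProxy` and `roundTrip` are hypothetical helpers, the upstream is an in-process stand-in rather than a real llama-server, and the path `/v1/chat/completions` is used purely as an example of an OpenAI-style route.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// newProxy returns a reverse proxy that forwards every request to upstream.
// Hypothetical helper: in a real setup, upstream would be llama-server's address.
func newProxy(upstream string) (*httputil.ReverseProxy, error) {
	target, err := url.Parse(upstream)
	if err != nil {
		return nil, err
	}
	if target.Scheme == "" || target.Host == "" {
		return nil, fmt.Errorf("upstream %q needs a scheme and a host", upstream)
	}
	return httputil.NewSingleHostReverseProxy(target), nil
}

// roundTrip sends one GET through the proxy to a stand-in upstream
// and returns the response body.
func roundTrip(path string) (string, error) {
	// Stand-in for llama-server (assumption): echoes the path it received.
	upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "upstream saw %s", r.URL.Path)
	}))
	defer upstream.Close()

	proxy, err := newProxy(upstream.URL)
	if err != nil {
		return "", err
	}
	// The proxy is itself an http.Handler, so it can be served directly.
	front := httptest.NewServer(proxy)
	defer front.Close()

	resp, err := http.Get(front.URL + path)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	out, err := roundTrip("/v1/chat/completions")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Against a real deployment, the same `newProxy` value would be passed to `http.ListenAndServe` with llama-server's address as the upstream. Any translation between Ollama-style and OpenAI-style request bodies would sit in front of it and is not shown here.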

Related

go-ollama: Go client for Ollama + Open WebUI
go-second-brain: Ollama embed/generate in RAG stack

Tech stack

Go · llama-server · GGUF · HTTP proxy · Ollama API compatibility

Projects, Programming

Go Ollama LLM Docker

This post is licensed under CC BY 4.0 by the author.
