Edelweiss — Video Assistant MVP (Matrix + GraphRAG)
MVP: Element assistant bot — voice/text → STT → GraphRAG → Ollama humanize → TTS, with feedback into Qdrant and Neo4j.
Edelweiss video assistant · MVP design · 2026-06
Matrix-native knowledge assistant for healthcare users in Element: converse by text, voice message, or call; answers grounded in Edelweiss Markdown → Neo4j + Qdrant, humanized by local Ollama (Gemma / Bonsai).
Builds on
- Matrix / WebRTC stack — Synapse, coturn, LiveKit (architecture below)
- go-second-brain — GraphRAG bot SDK
- Edelweiss healthcare stack — pflege homeservers
MVP flow
- User speaks or types in Element room with bot
- STT (Faster-Whisper / Vosk) for audio
- GraphRAG — Qdrant semantic chunks + Neo4j associations
- Ollama — generate + humanize (Gemma / Bonsai)
- Reply as text + optional TTS (Piper / Coqui)
- Positive feedback → store refined context in Qdrant + Neo4j
Voice uses a dedicated bot media path; the room timeline stays the audit trail and text fallback.
Portfolio detail: eSlider/cv — video-assistant-mvp · v1 architecture · MVP v2 source · SVG diagram
Architecture v1 — modular pipeline
Original modular design: bot inside Matrix homeserver, STT → GraphRAG → Ollama → TTS, feedback via ingestor.
flowchart TD
subgraph Client["Element Client Web X"]
User["User text voice call"]
Element["Element interface"]
end
subgraph Matrix["Matrix Homeserver"]
Synapse["Synapse server"]
Coturn["coturn TURN STUN"]
LiveKit["LiveKit SFU MatrixRTC"]
Bot["Knowledge bot service"]
end
subgraph Voice["Voice pipeline"]
STT["STT Faster-Whisper or Vosk"]
TTS["TTS Piper or Coqui"]
end
subgraph RAG["GraphRAG layer"]
Qdrant["Qdrant semantic vectors"]
Neo4j["Neo4j association graph"]
Ingest["Ingestor Markdown to graph"]
end
subgraph AI["AI inference"]
LLM["Ollama Gemma Bonsai humanize"]
end
subgraph Storage["Persistence"]
KB["Knowledge base Markdown files"]
Feedback["Feedback store"]
end
User -->|text or audio| Element
Element -->|signaling events| Synapse
Synapse -->|signaling events| Element
Element -->|media streams| LiveKit
LiveKit -->|media streams| Element
Element -->|WebRTC media| Coturn
Synapse -->|bot events| Bot
Bot -->|bot events| Synapse
LiveKit -->|bot joins calls| Bot
Bot -->|bot joins calls| LiveKit
Bot -->|audio stream| STT
STT -->|transcript text| Bot
Bot -->|semantic search| Qdrant
Bot -->|graph traversal| Neo4j
Qdrant -->|retrieved context| Bot
Neo4j -->|retrieved context| Bot
Bot -->|prompt plus context| LLM
LLM -->|generated response| Bot
Bot -->|text reply| Element
Bot -->|audio reply| TTS
TTS -->|voice message| Element
KB -->|ingest| Ingest
Ingest -->|chunks and entities| Qdrant
Ingest -->|nodes and edges| Neo4j
User -->|reaction feedback| Bot
Bot -->|store refined data| Feedback
Feedback -->|update| Ingest
style Bot fill:#e1f5fe
style LLM fill:#f3e5f5
style LiveKit fill:#e8f5e8
Architecture MVP v2 — Wan Streamer lane
flowchart TB
subgraph Client["Element Client"]
User["User text voice call"]
Element["Element interface"]
User -->|text or audio| Element
end
subgraph Matrix["Matrix Homeserver"]
Synapse["Synapse server"]
Coturn["coturn TURN STUN"]
LiveKit["LiveKit SFU MatrixRTC"]
Element -->|signaling and events| Synapse
Element -->|WebRTC media| Coturn
Element -->|Element Call| LiveKit
end
Bot["Knowledge bot service"]
Synapse -->|bot events| Bot
LiveKit -->|audio stream| Bot
subgraph VoiceMVP["MVP voice pipeline"]
STT["STT Faster-Whisper or Vosk"]
TTS["TTS Piper or Coqui"]
Bot -->|audio stream| STT
STT -->|transcript text| Bot
Bot -->|answer text| TTS
TTS -->|audio reply| Bot
end
subgraph VoiceV2["v2 Wan Streamer when available"]
Wan["Wan Streamer duplex AV about 500ms"]
Bot -.->|future swap| Wan
Wan -.->|sync audio video| Bot
end
subgraph GraphRAG["GraphRAG layer"]
Qdrant["Qdrant semantic vectors"]
Neo4j["Neo4j association graph"]
Ingestor["Ingestor Markdown to graph"]
Bot -->|semantic search| Qdrant
Bot -->|graph traversal| Neo4j
Qdrant -->|retrieved context| Bot
Neo4j -->|retrieved context| Bot
end
subgraph AI["AI inference"]
LLM["Ollama Gemma Bonsai humanize"]
Bot -->|prompt plus context| LLM
LLM -->|generated response| Bot
end
subgraph Persistence["Persistence"]
KB["Knowledge base Markdown files"]
FB["Feedback store"]
KB -->|ingest| Ingestor
Ingestor -->|chunks and entities| Qdrant
Ingestor -->|nodes and edges| Neo4j
Bot --> FB
FB -->|update| Qdrant
FB -->|update| Neo4j
end
Bot -->|text reply| Synapse
Bot -->|voice message| Synapse
Synapse --> Element
Element -->|reaction feedback| Bot
One voice turn
sequenceDiagram
participant U as User Element
participant S as Synapse
participant B as Knowledge bot
participant W as STT
participant Q as Qdrant
participant N as Neo4j
participant L as Ollama
participant T as TTS
U->>S: voice message or call audio
S->>B: Matrix event
B->>W: audio bytes
W-->>B: transcript text
B->>Q: semantic search
Q-->>B: chunks
B->>N: graph associations
N-->>B: entities paths
B->>L: prompt history retrieval
L-->>B: humanized answer
B->>S: text reply
B->>T: answer text
T-->>B: audio
B->>S: voice message optional
S-->>U: text and or audio
U->>S: thumbs up reaction
S->>B: feedback event
B->>Q: store embedding
B->>N: store triples
Rollout
| Phase | Deliverable |
|---|---|
| 1 | Text !brain RAG (shipped in go-second-brain) |
| 2 | Voice message STT → RAG → TTS |
| 3 | Element Call / LiveKit bot join |
| 4 | Reaction feedback → graph + vector store |
| 5 | Wan Streamer — end-to-end duplex voice/video when available (~500 ms vs modular STT+RAG+TTS) |
Next version — Wan Streamer
MVP uses a modular voice path (STT → GraphRAG → Ollama → TTS). The next version adapts Wan Streamer as soon as it is available for integration: one end-to-end streaming Transformer for real-time, full-duplex audio-visual interaction (~200 ms model-side, ~500–550 ms total per their v0.1 figures — much faster than chained ASR/LLM/TTS).
GraphRAG (Qdrant + Neo4j) remains the knowledge layer; Wan Streamer becomes the interaction layer (voice reply, optional synchronized video agent).
Related
go-second-brain · Edelweiss healthcare · Bonsai Ollama proxy