Dreamteam — HOCR & Document Processing Pipelines

200+ commits on an hOCR content server, PDF generation, and traffic analytics; client work delivered via produktor.io.

Dreamteam · 2022–2023
Role: Client work delivered via ProProdukt SL / produktor.io (not a separate employer)

Summary

200+ author commits on document/OCR and traffic-analytics tooling. This work was the precursor to go-hocr and AI document workflows.

Key repositories

Repository          Commits  Focus
content-serve-hocr  ~96      hOCR content server
hocr                ~72      hOCR pipeline
dpo                 ~75      Data protection tooling
storyflash          ~36      Storyflash product
trafficdesk         ~25      Traffic analytics

Architecture

  • Go services that serve hOCR (HTML-based OCR output) generated by Tesseract
  • PDF generation and expert-report tooling
  • Docker Compose stacks for content server and address-parsing sidecars
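A minimal sketch of one step in serving that content, assuming Tesseract-style hOCR where each element carries a `title` attribute such as `bbox 36 92 96 116; x_wconf 95`. The hOCR format encodes properties as semicolon-separated groups of a name followed by values, so a service can split them before returning word boxes. The names `parseTitle`, `parseBBox` and `BBox` are hypothetical and are neither the project's code nor go-hocr's API. Quoted values containing spaces (for example an `image` path) are not handled here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// BBox is a pixel-coordinate box: (X0,Y0) top-left, (X1,Y1) bottom-right.
type BBox struct{ X0, Y0, X1, Y1 int }

// parseTitle splits an hOCR title attribute such as
// "bbox 36 92 96 116; x_wconf 95" into property name -> raw value fields.
// Assumption: no quoted values containing spaces.
func parseTitle(title string) map[string][]string {
	props := make(map[string][]string)
	for _, part := range strings.Split(title, ";") {
		fields := strings.Fields(part)
		if len(fields) == 0 {
			continue // skip empty segments, e.g. from a trailing ";"
		}
		props[fields[0]] = fields[1:]
	}
	return props
}

// parseBBox reads the "bbox" property; ok is false if it is missing or malformed.
func parseBBox(props map[string][]string) (BBox, bool) {
	v, found := props["bbox"]
	if !found || len(v) != 4 {
		return BBox{}, false
	}
	var n [4]int
	for i, s := range v {
		x, err := strconv.Atoi(s)
		if err != nil {
			return BBox{}, false
		}
		n[i] = x
	}
	return BBox{n[0], n[1], n[2], n[3]}, true
}

func main() {
	props := parseTitle("bbox 36 92 96 116; x_wconf 95")
	box, ok := parseBBox(props)
	fmt.Println(box, ok, props["x_wconf"]) // prints: {36 92 96 116} true [95]
}
```

Keeping values as raw strings until a specific property is requested lets engine-specific `x_` properties pass through untouched.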

Tech stack

Go · PHP · hOCR/OCR · PDF · Docker Compose

Follow-on

Public library: github.com/eSlider/go-hocr, an hOCR 1.2 parser with YAML/HTML export (Jun 2026). See the go-hocr post.
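To illustrate the HTML export direction, here is a hedged sketch that writes one recognized word back out as an hOCR `ocrx_word` span. The `Word` type and `toHOCR` function are illustrative assumptions, not go-hocr's actual API. The main point is that OCR text and IDs must be HTML-escaped before they go into markup.

```go
package main

import (
	"fmt"
	"html"
)

// Word is a hypothetical minimal record for one recognized word.
type Word struct {
	ID             string
	Text           string
	X0, Y0, X1, Y1 int // bbox in page pixels
	Conf           int // engine confidence 0-100, as in Tesseract's x_wconf
}

// toHOCR renders w as an hOCR ocrx_word span. ID and Text are escaped so
// OCR output containing '<', '&' or quotes cannot break the markup.
func toHOCR(w Word) string {
	return fmt.Sprintf(`<span class="ocrx_word" id="%s" title="bbox %d %d %d %d; x_wconf %d">%s</span>`,
		html.EscapeString(w.ID), w.X0, w.Y0, w.X1, w.Y1, w.Conf, html.EscapeString(w.Text))
}

func main() {
	w := Word{ID: "word_1_1", Text: "A&B", X0: 36, Y0: 92, X1: 96, Y1: 116, Conf: 95}
	// prints: <span class="ocrx_word" id="word_1_1" title="bbox 36 92 96 116; x_wconf 95">A&amp;B</span>
	fmt.Println(toHOCR(w))
}
```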

This post is licensed under CC BY 4.0 by the author.