What is Tambourine?

Tambourine is an open-source voice dictation platform built on Pipecat. It provides a modular pipeline for speech-to-text with LLM-based formatting, supporting 10+ STT/LLM providers or fully local inference with Whisper and Ollama.

Can I run it completely offline/locally?

Yes. Use Whisper for local STT and Ollama for local LLM formatting. Neither requires an API key or an internet connection. To enable them, set WHISPER_ENABLED=true and point OLLAMA_BASE_URL at your Ollama server in your .env file.
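As a rough illustration of those two settings, the sketch below reads them from an environment-style mapping and reports whether both local backends are configured. Only the variable names WHISPER_ENABLED and OLLAMA_BASE_URL come from the answer above. The helper name, the parsing rules, and the example URL (Ollama's usual local address) are assumptions, not Tambourine's actual loading code.

```python
import os


def local_mode_settings(env=None):
    """Read the two local-inference settings from an environment-style mapping.

    Hypothetical helper: Tambourine's real config loading may differ.
    """
    env = os.environ if env is None else env
    # Treat only the literal string "true" (any case) as enabled.
    whisper_enabled = env.get("WHISPER_ENABLED", "false").strip().lower() == "true"
    ollama_base_url = env.get("OLLAMA_BASE_URL", "").strip()
    return {
        "whisper_enabled": whisper_enabled,
        "ollama_base_url": ollama_base_url,
        # Fully local means both the STT and the LLM stage run on this machine.
        "fully_local": whisper_enabled and bool(ollama_base_url),
    }


# Example values as they might appear in a .env file (URL is an assumption).
example_env = {
    "WHISPER_ENABLED": "true",
    "OLLAMA_BASE_URL": "http://localhost:11434",
}
settings = local_mode_settings(example_env)
print(settings["fully_local"])
```

With both values set, the check reports a fully local setup. Leaving either one out makes it report otherwise.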

What's the architecture?

A Tauri desktop app (Rust + React) connects over WebRTC to a Python server running Pipecat pipelines. The server runs STT, passes the transcript to an LLM for formatting, and returns the cleaned text. Runtime configuration is sent over the RTVI protocol.
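The server-side flow (STT, then LLM formatting, then cleaned text back to the client) can be pictured as two chained stages. The sketch below is plain Python and deliberately does not use Pipecat's API. The class, the stub functions, and the sample transcript are all hypothetical stand-ins for real services.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DictationPipeline:
    """Hypothetical two-stage pipeline mirroring STT -> LLM formatting."""

    transcribe: Callable[[bytes], str]  # STT stage: audio in, raw transcript out
    format_text: Callable[[str], str]   # LLM stage: raw transcript in, cleaned text out

    def run(self, audio: bytes) -> str:
        raw = self.transcribe(audio)
        return self.format_text(raw)


def fake_stt(audio: bytes) -> str:
    # Stand-in for a real STT provider; ignores the audio entirely.
    return "um so send the report by friday"


def fake_formatter(raw: str) -> str:
    # Stand-in for the LLM: drop two filler words, capitalize, add a period.
    words = [w for w in raw.split() if w not in {"um", "uh"}]
    text = " ".join(words)
    if not text:
        return ""
    return text[0].upper() + text[1:] + "."


pipeline = DictationPipeline(transcribe=fake_stt, format_text=fake_formatter)
print(pipeline.run(b"\x00\x01"))
```

Keeping the two stages as independent callables is what makes it possible to swap one provider without touching the other.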

How do I add a new STT/LLM provider?

Tambourine uses Pipecat's service abstraction, so you can add any Pipecat-supported provider by implementing the service interface. See server/ for existing examples, including Deepgram, Cartesia, and Groq.
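To show the general shape of "implement an interface, then register it", here is a minimal sketch of a provider registry. It is not Pipecat's service interface or Tambourine's code. STTService, register_stt, create_stt, and EchoSTT are hypothetical names used only for illustration, so check server/ for the real pattern.

```python
from typing import Callable, Dict, Protocol


class STTService(Protocol):
    """Hypothetical interface: anything with a transcribe() method qualifies."""

    def transcribe(self, audio: bytes) -> str: ...


# Maps a provider name to a factory that builds the service.
STT_PROVIDERS: Dict[str, Callable[[], STTService]] = {}


def register_stt(name: str):
    """Decorator that adds a provider class to the registry under `name`."""

    def decorator(factory):
        STT_PROVIDERS[name] = factory
        return factory

    return decorator


@register_stt("echo")
class EchoSTT:
    # Toy provider: pretends the audio bytes are already UTF-8 text.
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8", errors="replace")


def create_stt(name: str) -> STTService:
    """Look up a provider by name and construct it."""
    try:
        return STT_PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"Unknown STT provider: {name!r}") from None


print(create_stt("echo").transcribe(b"hello"))
```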

How customizable is the formatting?

Fully customizable. The LLM formatting step uses editable prompts stored in settings. You can modify filler word removal, punctuation rules, and backtracking behavior, or add your own custom logic. A personal dictionary covers technical terms and proper nouns.
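One way to picture prompt-based formatting is a function that assembles the LLM's instructions from a settings object. The sketch below assumes a hypothetical FormattingSettings shape and made-up prompt wording. Tambourine's stored prompts and setting names are not shown in this FAQ and will differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FormattingSettings:
    """Hypothetical settings; field names are illustrative only."""

    remove_fillers: bool = True
    fix_punctuation: bool = True
    handle_backtracking: bool = True
    custom_instructions: str = ""
    dictionary: List[str] = field(default_factory=list)


def build_system_prompt(s: FormattingSettings) -> str:
    """Assemble the formatting instructions, one rule per enabled setting."""
    lines = ["Clean up the dictated text and return only the result."]
    if s.remove_fillers:
        lines.append("Remove filler words such as 'um' and 'uh'.")
    if s.fix_punctuation:
        lines.append("Add punctuation and capitalization.")
    if s.handle_backtracking:
        lines.append("When the speaker corrects themselves, keep only the correction.")
    if s.custom_instructions:
        lines.append(s.custom_instructions)
    if s.dictionary:
        # Personal dictionary: terms the LLM should spell exactly as given.
        lines.append("Spell these terms exactly as written: " + ", ".join(s.dictionary) + ".")
    return "\n".join(lines)


prompt = build_system_prompt(
    FormattingSettings(dictionary=["Pipecat", "Tauri"], custom_instructions="Use British spelling.")
)
print(prompt)
```

Because each rule is a separate line, turning a behavior off removes its instruction without affecting the others.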

What's the tech stack?

Desktop: Tauri (Rust) + React + TypeScript
Server: Python + FastAPI + Pipecat
Communication: WebRTC (SmallWebRTC)
State: Zustand + XState
Validation: Zod + Pydantic

Ready to try it?

Tambourine is open source and free to self-host.

View on GitHub