Transform your voice
into polished ideas.
Open-source AI dictation tool for knowledge workers. Set up in minutes and start dictating your ideas in seconds.
Built for power users who want speed, control, and an app that stays out of the way.
No subscription. Use your own provider keys and pay only for what you transcribe/refine.
Pick Deepgram for ultra-fast transcription or OpenAI Whisper for accuracy, and switch anytime. Optional AI refinement is supported through OpenAI, Gemini, Cerebras, Ollama, or custom providers.
Glossary + custom prompts keep product names, acronyms, and formatting consistent across sessions.
Hold-to-talk or toggle, global hotkeys, and live config updates. Dictated text is inserted directly into the active app, so you never break your flow.
Turn spoken thoughts into clean emails, prompts, tickets, and docs without leaving your editor.
Optimize for speed (Deepgram Nova-3) or accuracy (OpenAI Whisper) depending on the task.
Turn messy speech into professional-grade bullet points, emails, and PRDs with LLM refinement.
Stop acronyms, repo names, and product terms from getting mangled. Your vocabulary, preserved.
Tweak hotkeys, providers, and models while running—changes apply instantly, no restart.
Hold-to-talk for quick thoughts, or toggle for longer brain-dumps. Fully customizable hotkeys.
Results paste into whatever you’re typing in—issue tracker, doc, IDE comment—no copy/paste.
From speech to polished text in seconds
Hold your hotkey and speak naturally. Release when done.
AI converts your speech to text using leading STT providers.
Smart cleanup fixes grammar, punctuation, and formatting.
Polished text appears instantly in your active application.
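The four steps above can be read as a small pipeline: transcribe, optionally refine, then insert. The sketch below is illustrative only and is not PushToTalk's actual code; `Pipeline`, `fake_transcribe`, and `fake_refine` are hypothetical stand-ins for the real provider calls and for text insertion.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Pipeline:
    """Hypothetical dictation pipeline: audio -> text -> refined text -> active app."""
    transcribe: Callable[[bytes], str]        # speech-to-text step (a provider call in practice)
    refine: Optional[Callable[[str], str]]    # optional LLM cleanup; None skips refinement
    insert: Callable[[str], None]             # delivers the text to the active application

    def run(self, audio: bytes) -> str:
        text = self.transcribe(audio)
        if self.refine is not None:
            text = self.refine(text)
        self.insert(text)
        return text


# Stand-in implementations so the sketch runs without any provider keys.
def fake_transcribe(audio: bytes) -> str:
    return "um send the report by friday"


def fake_refine(text: str) -> str:
    # Drop a filler word, capitalize the first letter, and add a period.
    words = [w for w in text.split() if w != "um"]
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


inserted = []  # stands in for the active application's text field
pipeline = Pipeline(fake_transcribe, fake_refine, inserted.append)
result = pipeline.run(b"\x00\x01")

# Refinement is optional: passing None forwards the raw transcript.
raw_only = Pipeline(fake_transcribe, None, inserted.append).run(b"\x00\x01")
```

Keeping each step behind a plain callable mirrors the idea of switching providers without touching the rest of the flow.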
Quick answers on setup, privacy, and cost
PushToTalk is a local desktop app. Your audio/text is sent directly from your machine to the providers you configure for transcription/refinement—there’s no PushToTalk backend.
You'll need an API key for your chosen speech-to-text provider (OpenAI or Deepgram). Text refinement is optional (currently supported: OpenAI GPT, Cerebras, Gemini, Ollama, custom providers like Together AI). You don't need a PushToTalk account—just your provider API key(s).
No PushToTalk account is required. You'll need an account with whichever provider(s) you use (for example OpenAI, Deepgram, or Cerebras) to obtain API keys.
Yes. PushToTalk is free and open source (MIT). You’ll pay your providers directly for transcription/refinement usage (pay-as-you-go), and many offer free tiers to start. In the upcoming version (v0.6.0), you will also be able to use your own custom endpoints or local LLMs for text refinement.
Not currently. PushToTalk requires an internet connection to call your configured AI providers for transcription and optional text refinement.
Sometimes, yes. If your hotkey doesn't trigger recording, try running PushToTalk as an administrator or choose a different hotkey combination that doesn't conflict with other apps.
Language support depends on your chosen STT provider. OpenAI Whisper supports 50+ languages including English, Spanish, French, German, Chinese, Japanese, and many more. Deepgram also offers multi-language support.
PushToTalk doesn't run a backend or upload your data anywhere except to the provider(s) you configure. By default it doesn't keep audio; in debug mode it may save recordings locally for troubleshooting. Provider retention/privacy policies still apply.
Yes. Use custom system prompts and a glossary to control tone, formatting, and terminology. You can tweak settings while running and changes apply instantly.