New release: PushToTalk v0.5.0

Transform your voice
into polished ideas.

Open-source AI dictation tool for knowledge workers. Set up in minutes and start dictating your ideas in seconds.

GitHub Stars | Free & Open Source | Bring your own API keys | No account required

Powered by AI providers you trust

Why PushToTalk?

Built for power users who want speed, control, and an app that stays out of the way.

Free app, pay-as-you-go APIs

No subscription. Use your own provider keys and pay only for what you transcribe/refine.

Provider choice (no lock-in)

Pick Deepgram for ultra-fast transcription or OpenAI Whisper for accuracy. Switch anytime. Optional AI refinement is supported through OpenAI, Gemini, Cerebras, Ollama, or custom providers.

Customization that actually sticks

Glossary + custom prompts keep product names, acronyms, and formatting consistent across sessions.

Stay in flow

Hold-to-talk or toggle recording, global hotkeys, and live config updates. Dictated text is inserted directly into the active app, so you never break your flow.

Dictation that fits your
knowledge workflow

Turn spoken thoughts into clean emails, prompts, tickets, and docs without leaving your editor.

Multi-Provider STT

Optimize for speed (Deepgram Nova-3) or accuracy (OpenAI Whisper) depending on the task.

AI Text Refinement

Turn messy speech into professional-grade bullet points, emails, and PRDs with LLM refinement.

Custom Glossary

Stop acronyms, repo names, and product terms from getting mangled. Your vocabulary, preserved.

Live Config Sync

Tweak hotkeys, providers, and models while running—changes apply instantly, no restart.

Flexible Recording

Hold-to-talk for quick thoughts, or toggle for longer brain-dumps. Fully customizable hotkeys.

Auto Text Insertion

Results are inserted into whatever you're typing in (issue tracker, doc, IDE comment) with no manual copy/paste.
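For readers curious how the two recording modes under Flexible Recording differ, they can be pictured as a tiny state machine driven by hotkey events. This is an illustrative sketch only: `Recorder`, `on_key_down`, and `on_key_up` are hypothetical names chosen for the example, not PushToTalk's actual code.

```python
from dataclasses import dataclass


@dataclass
class Recorder:
    """Hypothetical recorder that reacts to hotkey press/release events."""

    mode: str = "hold"       # "hold" (hold-to-talk) or "toggle"
    recording: bool = False

    def on_key_down(self) -> None:
        if self.mode == "hold":
            # Hold-to-talk: recording lasts only while the key is held.
            self.recording = True
        else:
            # Toggle: each press flips recording on or off.
            self.recording = not self.recording

    def on_key_up(self) -> None:
        if self.mode == "hold":
            # Releasing the key ends a hold-to-talk recording.
            self.recording = False
        # In toggle mode, releasing the key changes nothing.
```

In hold mode a press and release bracket a single recording; in toggle mode the first press starts recording and the next press stops it, which suits longer brain-dumps.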

Common Use Cases

Slack messages • Jira tickets • Standup notes • PRDs & specs • Meeting follow-ups • Code review comments • Release notes • Design docs

How it works

From speech to polished text in seconds

Record

Hold your hotkey and speak naturally. Release when done.

Transcribe

AI converts your speech to text using leading STT providers.

Refine

Smart cleanup fixes grammar, punctuation, and formatting.

Insert

Polished text appears instantly in your active application.
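The four steps above can be sketched as a simple pipeline. This is a minimal illustration under stated assumptions: `dictate` and its `transcribe`, `refine`, and `insert` parameters are hypothetical placeholders for the provider calls and text insertion, and the stand-in lambdas exist only so the example runs without any API key.

```python
from typing import Callable, Optional


def dictate(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    insert: Callable[[str], None],
    refine: Optional[Callable[[str], str]] = None,
) -> str:
    """Run recorded audio through transcribe -> (optional) refine -> insert."""
    raw = transcribe(audio)                             # Transcribe: speech to text
    text = refine(raw) if refine is not None else raw   # Refine: optional cleanup
    insert(text)                                        # Insert: hand text to the active app
    return text


# Stand-in stages (assumptions for illustration, not real provider calls).
typed = []
result = dictate(
    audio=b"fake-audio",
    transcribe=lambda _audio: "um so ship the release notes today",
    refine=lambda s: s.replace("um so ", "", 1).capitalize() + ".",
    insert=typed.append,
)
```

Because refinement is optional, leaving out `refine` passes the raw transcript straight to the insert step.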

Frequently asked questions

Quick answers on setup, privacy, and cost

Privacy & control

PushToTalk is a local desktop app. Your audio/text is sent directly from your machine to the providers you configure for transcription/refinement—there’s no PushToTalk backend.

You'll need an API key for your chosen speech-to-text provider (OpenAI or Deepgram). Text refinement is optional (currently supported: OpenAI GPT, Cerebras, Gemini, Ollama, custom providers like Together AI). You don't need a PushToTalk account—just your provider API key(s).

You don't need a PushToTalk account. You'll need an account with whichever provider(s) you use (for example OpenAI, Deepgram, or Cerebras) to obtain API keys.

PushToTalk is free and open source (MIT). You'll pay your providers directly for transcription/refinement usage (pay-as-you-go), and many offer free tiers to start. In the upcoming version (v0.6.0), you will also be able to use your own custom endpoints or local LLM models for text refinement.

PushToTalk does not currently work offline. It requires an internet connection to call your configured AI providers for transcription and (optional) text refinement.

PushToTalk sometimes needs administrator rights. If your hotkey doesn't trigger recording, try running PushToTalk as admin or choose a different hotkey combo that doesn't conflict with other apps.

Language support depends on your chosen STT provider. OpenAI Whisper supports 50+ languages including English, Spanish, French, German, Chinese, Japanese, and many more. Deepgram also offers multi-language support.

PushToTalk doesn't run a backend or upload your data anywhere except to the provider(s) you configure. By default it doesn't keep audio; in debug mode it may save recordings locally for troubleshooting. Provider retention/privacy policies still apply.

You can customize the output. Use custom system prompts and a glossary to control tone, formatting, and terminology. You can tweak settings while running and changes apply instantly.
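As a rough picture of how a custom prompt and a glossary might work together during refinement, the sketch below folds glossary terms into a system prompt. The function name `build_system_prompt` and the prompt wording are assumptions made for illustration; PushToTalk's real prompt format may differ.

```python
from typing import Sequence


def build_system_prompt(custom_prompt: str, glossary: Sequence[str]) -> str:
    """Append glossary terms to a custom prompt (hypothetical format)."""
    # Drop blank entries and surrounding whitespace.
    terms = [t.strip() for t in glossary if t.strip()]
    if not terms:
        return custom_prompt
    term_list = "\n".join(f"- {t}" for t in terms)
    return (
        f"{custom_prompt}\n\n"
        "Spell these terms exactly as written:\n"
        f"{term_list}"
    )


prompt = build_system_prompt(
    "Rewrite the transcript as concise bullet points.",
    ["PushToTalk", "Deepgram", " PRD "],
)
```

Keeping the glossary separate from the prompt text means the same vocabulary can be reused when you switch prompts for different kinds of writing.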

Ready to transform your workflow?

Start dictating smarter today. Free and open source.

Windows 10/11 • Bring your own API key • Hotkeys may require admin • ~25MB download