You have an audio recording and need it as text. Maybe a meeting, an interview, a lecture, or a podcast episode. Here's every free method to get it done — with honest accuracy numbers and the trade-offs nobody tells you.
| Method | Input | Accuracy | Time | Limit | Difficulty |
|---|---|---|---|---|---|
| Browser STT + speakers | Live mic/speakers | 90-95% | Real-time | ✓ Unlimited | Easy |
| YouTube auto-captions | Video file upload | 85-90% | 5-15 min | ✓ Free | Easy |
| Whisper web tools | Audio file upload | 95%+ | 10-30 min | ~Varies by tool | Easy-Medium |
| Whisper self-hosted | Audio file (local) | 95%+ | 10-30 min | ✓ Unlimited | Hard (Python) |
| Otter.ai free | Live/recording | Very good | Real-time | 300 min/month | Easy |
| Google Docs voice typing | Live mic | 90-95% | Real-time | ✓ Unlimited | Easy |
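The simplest method for any audio recording is to play it through your speakers while a browser speech-to-text tool listens through your microphone.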
Tip: Position your microphone close to your speaker. Reduce background noise. This method works best for clear, single-speaker audio.
Limitation: Real-time only — a 1-hour recording takes 1 hour to transcribe. Multiple speakers may cause confusion.
Good for: Long recordings where you don't want to sit through real-time transcription. YouTube generates the captions in the background after you upload the recording as a video.
Limitation: 85-90% accuracy. Misses proper nouns, technical terms, and quiet speech. Requires uploading to Google's servers.
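The self-hosted Whisper option from the comparison table is the one marked "Hard (Python)", so here is a rough sketch of what it involves. It assumes the open-source `openai-whisper` package and ffmpeg are installed; the `"base"` model size and the helper names `format_timestamp`, `format_segments`, and `transcribe_file` are illustrative choices, not something this guide prescribes.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments) -> str:
    """Turn Whisper-style segments (dicts with 'start' and 'text') into lines."""
    lines = []
    for seg in segments:
        stamp = format_timestamp(seg["start"])
        lines.append(f"[{stamp}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_file(path: str, model_name: str = "base") -> str:
    """Transcribe a local audio file and return timestamped text."""
    # Imported lazily so the formatting helpers work without the package.
    import whisper

    model = whisper.load_model(model_name)  # downloads the model on first use
    result = model.transcribe(path)         # dict with "text" and "segments"
    return format_segments(result["segments"])
```

Calling `transcribe_file("meeting.mp3")` (a hypothetical file name) runs entirely on your own machine, so nothing is uploaded. Larger model sizes are generally more accurate but slower.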
No free transcription tool gives you perfect text. Budget 15-20 minutes of editing per hour of audio:
| Audio Quality | Expected Accuracy | Tips |
|---|---|---|
| Studio recording, single speaker | 95%+ | Best case — minimal editing needed |
| Good mic, quiet room | 90-95% | Normal use case — expect minor edits |
| Phone recording, some noise | 80-90% | Review carefully — expect word substitutions |
| Conference room, multiple speakers | 70-85% | Significant editing needed — consider paid tools |
| Outdoor, windy, distant | 60-75% | May not be worth automated transcription |
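To make those percentages concrete: accuracy figures like these are usually word-level, meaning one minus the word error rate (WER). The sketch below assumes that definition (this guide does not say how its numbers were measured) and a hypothetical speaking rate of 150 words per minute, and estimates how many words you would have to fix by hand.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from output
                curr[j - 1] + 1,     # extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def expected_word_errors(audio_minutes: float, accuracy: float,
                         words_per_minute: int = 150) -> int:
    """Rough number of words to correct; 150 wpm is an assumed rate."""
    total_words = audio_minutes * words_per_minute
    return round(total_words * (1.0 - accuracy))
```

Under those assumptions, a one-hour recording at 90% accuracy leaves roughly 900 words to correct, and at 95% roughly 450. To check a tool yourself, type out a minute of audio by hand and compare it with the machine output using `word_error_rate`.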
Free transcription saves time, not effort. Instead of typing from scratch, which is commonly estimated at four or more hours per hour of audio, you edit machine output: 15-20 minutes per hour for clean recordings, and considerably more for noisy ones. That's a speedup of at least 3-4x. But you will still need to edit, because no free tool gives you publish-ready text.