OpenAI Whisper: Audio and Video Transcription on VPS
Whisper is OpenAI's open-source speech recognition model supporting 99 languages. Runs fully locally on your VPS.
Install
apt install -y ffmpeg
pip install openai-whisperPython API
import whisper
model = whisper.load_model("medium")
result = model.transcribe("audio.mp3", language="en")
print(result["text"])Faster-Whisper (4x speed)
from faster_whisper import WhisperModel
model = WhisperModel("medium", device="cpu", compute_type="int8")
segments, _ = model.transcribe("audio.mp3")
for s in segments:
print(f"[{s.start:.2f}s] {s.text}")Tip: Use
small for speed+accuracy balance. large-v3 for maximum accuracy (needs 10GB VRAM).