Resource requirements for a Whisper STT web service
I've been working on a web service to provide speech-to-text transcriptions to Chief of Staff.
Originally, I presumed some GPU-muscled serverless function would be the way to go. However, on the services I looked at at the time, model loading meant extended delays in getting even short transcriptions done.
I wondered: "How cheaply can a persistent instance of Whisper as a web service be had, and how fast can it deliver reasonable quality?"
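To make the trade-off concrete, here is a minimal sketch of why a persistent instance helps with short clips. It does not use Whisper at all: `FakeModel`, `LOAD_SECONDS`, and the function names are hypothetical stand-ins, and the load time is an arbitrary placeholder rather than a measurement. It compares the worst case, where every request pays a cold start, against loading once and reusing the model.

```python
import time

# Placeholder load cost (assumption); real model loads take far longer.
LOAD_SECONDS = 0.05


class FakeModel:
    """Hypothetical stand-in for a speech-to-text model."""

    def __init__(self):
        time.sleep(LOAD_SECONDS)  # simulate the expensive model load

    def transcribe(self, audio: bytes) -> str:
        return f"<{len(audio)} bytes transcribed>"


def cold_start_each_time(audio: bytes) -> str:
    """Worst case for serverless: every request reloads the model."""
    return FakeModel().transcribe(audio)


_model = None


def persistent_instance(audio: bytes) -> str:
    """Persistent service: load on first use, then reuse the same model."""
    global _model
    if _model is None:
        _model = FakeModel()
    return _model.transcribe(audio)


def timed(fn, audio: bytes, n: int = 5) -> float:
    """Total wall-clock seconds for n back-to-back requests."""
    start = time.perf_counter()
    for _ in range(n):
        fn(audio)
    return time.perf_counter() - start


if __name__ == "__main__":
    clip = b"\x00" * 16000  # dummy audio payload
    print("load per request:", round(timed(cold_start_each_time, clip), 3), "s")
    print("load once:       ", round(timed(persistent_instance, clip), 3), "s")
```

The shorter the clip, the more the load cost dominates each request, which is the effect described above.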
Ahmet Oner has a great starting place with his `whispe…