Audio Transcriber API

Transcribe audio from URL or upload from local disk.

Sep 28, 2023 • 3 min read

Python
FastAPI
OpenAI Whisper
Docker

About the project

With this API, users can simply post a URL or upload an audio file, and the API will transcribe the audio and returns the transcription. This project was created so I can transcribe limitless audio files locally without calling external APIs.

I built the API endpoints using FastAPI. FastAPI is a modern and high-performance web framework for building APIs based on standard Python type hints. It provides automatic interactive documentation and it is a lot fun working with the tool.

How it works

When the client upload an audio to the API, it will assign a Celery task for transcribing the audio and returns the task ID.

Python

@app.post('/transcribe')
async def transcribe(audio: AudioFile = None, url: str = Form(None)):
    ...
    if audio:
        # Save the uploaded file into a temporary file
        ext = pathlib.Path(audio.filename).suffix
        _, filepath = tempfile.mkstemp(dir='/tmp', suffix=ext)
        with open(filepath, 'wb') as f:
            f.write(audio.file.read())

        # Transcribe asynchronously
        try:
            task = transcribe_from_file.delay(filepath)
        except TaskException as e:
            raise HTTPException(status_code=500, detail=str(e))

    return {'taskId': task.id}

The client then need to check the status of the given task on a separate endpoint. When the transcribing is done, the client will also receive the result.

Python

@app.get('/transcribe/{task_id}')
async def transcribe_status(task_id: str):
    task = celery.AsyncResult(task_id)
    if task.ready():
        return {'status': 'DONE', 'result': task.get()}
    else:
        return {'status': 'IN_PROGRESS'}

Under the hood, the API is using OpenAI Whisper model for transcribing the audio. The model is relatively small but it gives very good results.

Python

import whisper
...
@celery.task
def transcribe_from_file(filepath: str):
    model = whisper.load_model('base')
    result = model.transcribe(filepath)
    ...
    return result

Summary

By using this API, users can transcribe their audio files locally without having to spend additional costs for using third-party APIs. You can download the source code on my Github.