Introduction

Aura is an ASR (Automatic Speech Recognition) API providing transcription, alignment, punctuation, translation, and transcription evaluation services.

  • Transcription: Submit audio or video files for asynchronous transcription with support for multiple Whisper models (tiny, base, small, medium, large-v1, large-v2, large-v3).
  • Alignment: Align text with audio/video files, using JSON, HTML, or automatic-transcription alignment modes.
  • Translation: Translate JSON transcription files between language pairs in multiple modes (full, text, word).
  • Punctuation: Restore punctuation marks on raw transcribed text.
  • Transcription Evaluation: Evaluate transcription quality using WER (Word Error Rate) metrics in multiple modes (prediction, estimation, computation, full).
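The WER metric used by the evaluation service is standard: WER = (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, and N is the reference length. A minimal sketch of that computation (our own illustration, not Aura's implementation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the cat sat", "the cat")` is 1/3: one deletion against a three-word reference.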
Aura is available in two environments:

Environment      URL
Preproduction    https://aura-preprod.authot.app
Production       https://aura.authot.app
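A client typically selects the base URL once, by deployment stage. A small sketch (the `AURA_ENV` variable name is our convention, not Aura's):

```python
import os

# Map a deployment stage to the Aura base URLs listed above.
AURA_BASE_URLS = {
    "preprod": "https://aura-preprod.authot.app",
    "prod": "https://aura.authot.app",
}

# Default to preproduction unless AURA_ENV says otherwise.
AURA_BASE_URL = AURA_BASE_URLS[os.environ.get("AURA_ENV", "preprod")]
```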

The typical lifecycle of an Aura interaction follows this flow:

  1. Authentication: Sign in with email and password to obtain a Bearer access token.
  2. Job Submission: Submit a file (audio, video, or text depending on the service) via the REST API.
  3. Polling or Callback: Monitor job status by polling or provide a callback_url for automatic notification.
  4. Result Retrieval: Download the result once the job is finished.
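The lifecycle above can be sketched as client code. This is a hedged outline only: the `/login` path, the request field names, and the `"finished"` status string are assumptions for illustration, not confirmed by this page; the polling helper is generic.

```python
import json
import time
import urllib.request

BASE = "https://aura-preprod.authot.app"  # preproduction base URL

def authenticate(email: str, password: str) -> str:
    """Step 1: sign in to obtain a Bearer access token.
    NOTE: the /login path and field names are assumptions."""
    body = json.dumps({"email": email, "password": password}).encode()
    req = urllib.request.Request(
        f"{BASE}/login", data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]

def poll_until_finished(get_status, interval_s: float = 5.0,
                        max_attempts: int = 120) -> str:
    """Step 3 (polling variant): call get_status() until the job is done.
    A callback_url, if provided at submission, removes the need to poll."""
    for _ in range(max_attempts):
        status = get_status()
        if status == "finished":  # assumed terminal status string
            return status
        time.sleep(interval_s)
    raise TimeoutError("job did not finish within the polling budget")
```

Steps 2 and 4 (file submission and result download) follow the same pattern: an authenticated request with `Authorization: Bearer <token>` against the service-specific endpoint.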

This documentation is intended for:

  • Developers integrating speech recognition services into their applications.
  • System Integrators connecting Aura to existing audio/video infrastructure.