Introduction
Aura is an ASR (Automatic Speech Recognition) API providing transcription, alignment, punctuation, translation, and transcription evaluation services.
Core Capabilities
- Transcription: Submit audio or video files for asynchronous transcription with support for multiple Whisper models (`tiny`, `base`, `small`, `medium`, `large-v1`, `large-v2`, `large-v3`).
- Alignment: Align text against audio/video, supporting JSON, HTML, or automatic transcription alignment.
- Translation: Translate JSON transcription files between language pairs in multiple modes (`full`, `text`, `word`).
- Punctuation: Restore punctuation marks on raw transcribed text.
- Transcription Evaluation: Evaluate transcription quality using WER (Word Error Rate) metrics in multiple modes (`prediction`, `estimation`, `computation`, `full`).
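
For readers unfamiliar with WER, the sketch below computes it in the conventional way: the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference. It illustrates the metric only. How Aura normalizes text or computes WER in each of its modes is not described here, and the function name `word_error_rate` is an assumption made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level edit distance divided by reference length.

    Illustrative only; this is not Aura's implementation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the hypothesis "the cat sit on mat" gives one substitution and one deletion over six reference words, so the WER is 2/6, roughly 0.33. Because insertions count as errors, WER can exceed 1.0.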
Servers
| Environment | URL |
|---|---|
| Preproduction | https://aura-preprod.authot.app |
| Production | https://aura.authot.app |
How it Works
The typical lifecycle of an Aura interaction follows this flow:
- Authentication: Sign in with email and password to obtain a Bearer access token.
- Job Submission: Submit a file (audio, video, or text depending on the service) via the REST API.
- Polling or Callback: Monitor job status by polling, or provide a `callback_url` for automatic notification.
- Result Retrieval: Download the result once the job is finished.
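
The polling step of this flow can be sketched as follows, using only the Python standard library. This is a minimal sketch under stated assumptions, not the official client: the helper names (`get_json`, `wait_for_job`), the `status` field, and the status values `finished`, `failed`, and `error` are hypothetical, and endpoint paths are deliberately left out because they are not given in this introduction. Only the two base URLs and the use of a Bearer token come from the text above.

```python
import json
import time
import urllib.request
from typing import Callable

# Base URLs as listed in the Servers table.
BASE_URLS = {
    "preproduction": "https://aura-preprod.authot.app",
    "production": "https://aura.authot.app",
}


def get_json(url: str, token: str) -> dict:
    """GET a JSON document, authenticating with a Bearer access token."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status() repeatedly until the job reports it is finished.

    The "status" key and its values are assumptions made for illustration;
    check the API reference for the real field names.
    """
    deadline = clock() + timeout_s
    while True:
        job = fetch_status()
        state = job.get("status")
        if state == "finished":
            return job
        if state in ("failed", "error"):  # hypothetical failure states
            raise RuntimeError(f"job ended with status {state!r}")
        if clock() >= deadline:
            raise TimeoutError(f"job not finished after {timeout_s} seconds")
        sleep(interval_s)


# Usage sketch (the job path is a placeholder, not a documented endpoint):
#   job_url = BASE_URLS["preproduction"] + "<job status path>"
#   job = wait_for_job(lambda: get_json(job_url, token))
```

Passing the status fetcher in as a callable keeps the loop independent of any particular endpoint, and supplying a `callback_url` at submission time avoids the loop altogether.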
Target Audience
This documentation is intended for:
- Developers integrating speech recognition services into their applications.
- System Integrators connecting Aura to existing audio/video infrastructure.