Introduction

Aura Live is a high-performance, real-time transcription (Automatic Speech Recognition, ASR) platform built on FastAPI and designed for low-latency processing of live audio.

  • Real-time Transcription: Process audio streams and receive text results with minimal latency.
  • Multilingual Support: Transcription in a growing list of languages and dialects.
  • WebSocket Protocol: A robust, binary-friendly protocol for streaming audio data.
  • REST API: Comprehensive management of languages, models, sessions, and user settings.
  • Translation: Real-time translation of transcription output between language pairs.
  • Extensible Architecture: Built to integrate with various ASR engine backends.

The typical lifecycle of an Aura Live interaction follows this flow:

  1. Authentication: The client authenticates via the /login endpoint to obtain a JWT.
  2. Discovery: The client queries the REST API to discover available languages, ASR models, and translation pairs.
  3. Session Initiation: The client creates a “Live Session” via the REST API.
  4. Streaming: The client connects to the provided WebSocket URI and streams raw audio data.
  5. Result Retrieval: Aura Live sends transcription results back through the same WebSocket connection.
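The REST half of this lifecycle can be sketched as a minimal client. Only the /login endpoint is named above; the base URL, the /sessions path, and the "language" field are assumptions for illustration, not the documented API surface.

```python
import json
import urllib.request

# Hypothetical base URL -- substitute your deployment's host.
API_BASE = "https://aura.example.com/api"


def login_request(username: str, password: str) -> urllib.request.Request:
    """Step 1 (Authentication): POST /login to obtain a JWT."""
    body = json.dumps({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{API_BASE}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session_request(token: str, language: str) -> urllib.request.Request:
    """Step 3 (Session Initiation): create a Live Session via the REST API.

    The /sessions path and the request body shape are assumptions.
    """
    body = json.dumps({"language": language}).encode()
    return urllib.request.Request(
        f"{API_BASE}/sessions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Steps 4-5 (Streaming / Result Retrieval): connect to the WebSocket URI
# returned in the session response, stream raw audio bytes, and read
# transcription results back on the same connection -- typically with a
# WebSocket client library; the requests above are built but not sent here.
```

Sending a request is then a single call, e.g. `urllib.request.urlopen(login_request(...))`, with step 2 (Discovery) being analogous GET requests against the language and model endpoints.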

Aura Live supports two authentication methods for REST endpoints:

  • Bearer Token (Authorization: Bearer <token>) — JWT obtained from /login.
  • API Key (X-API-Key: <key>) — managed from your user settings.

For WebSocket connections, authentication is handled during the initial handshake via a token query parameter.
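In practice, the three authentication paths reduce to one header or one query parameter. A small sketch, assuming a hypothetical host (the header names, the Bearer scheme, and the token query parameter are as documented above):

```python
from urllib.parse import urlencode


def bearer_headers(jwt: str) -> dict:
    """REST option 1: JWT obtained from /login."""
    return {"Authorization": f"Bearer {jwt}"}


def api_key_headers(key: str) -> dict:
    """REST option 2: API key managed in user settings."""
    return {"X-API-Key": key}


def websocket_uri(base_uri: str, token: str) -> str:
    """WebSocket auth: token passed as a query parameter on the handshake."""
    return f"{base_uri}?{urlencode({'token': token})}"
```

For example, `websocket_uri("wss://aura.example.com/ws", jwt)` yields the URI to hand to your WebSocket client, while either headers dict can be merged into any REST request.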

This documentation is intended for:

  • Developers looking to integrate real-time transcription into their applications.
  • System Integrators connecting Aura Live to existing audio infrastructure.
  • Users who want to understand the underlying mechanics of the transcription service.