Introduction

Aura Live is a high-performance, real-time transcription (Automatic Speech Recognition, ASR) platform built on FastAPI and designed for low-latency processing of live audio.

  • Real-time Transcription: Process audio streams and receive text results with minimal latency.
  • Multilingual Support: Transcription in a growing list of languages and dialects.
  • WebSocket Protocol: A robust, binary-friendly protocol for streaming audio data.
  • REST API: Comprehensive management of languages, models, sessions, and user settings.
  • Translation: Real-time translation of transcription output between language pairs.
  • Extensible Architecture: Built to integrate with various ASR engine backends.

The typical lifecycle of an Aura Live interaction follows this flow:

  1. Authentication: The client authenticates via the /login endpoint to obtain a JWT.
  2. Discovery: The client queries the REST API to discover available languages, ASR models, and translation pairs.
  3. Session Initiation: The client creates a “Live Session” via the REST API.
  4. Streaming: The client connects to the provided WebSocket URI and streams raw audio data.
  5. Result Retrieval: Aura Live sends transcription results back through the same WebSocket connection.
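The REST half of this lifecycle can be sketched as a minimal client. Only the /login endpoint is named above; the base URL, the /sessions path, and the "language" field are assumptions for illustration, not the documented API surface.

```python
import json
import urllib.request

# Hypothetical base URL -- substitute your deployment's host.
API_BASE = "https://aura.example.com/api"


def login_request(username: str, password: str) -> urllib.request.Request:
    """Step 1 (Authentication): POST /login to obtain a JWT."""
    body = json.dumps({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{API_BASE}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session_request(token: str, language: str) -> urllib.request.Request:
    """Step 3 (Session Initiation): create a Live Session via the REST API.

    The /sessions path and the request body shape are assumptions.
    """
    body = json.dumps({"language": language}).encode()
    return urllib.request.Request(
        f"{API_BASE}/sessions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Steps 4-5 (Streaming / Result Retrieval): connect to the WebSocket URI
# returned in the session response, stream raw audio bytes, and read
# transcription results back on the same connection -- typically with a
# WebSocket client library; the requests above are built but not sent here.
```

Sending a request is then a single call, e.g. `urllib.request.urlopen(login_request(...))`, with step 2 (Discovery) being analogous GET requests against the language and model endpoints.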

Aura Live supports two authentication methods for REST endpoints:

  • Bearer Token (Authorization: Bearer <token>) — JWT obtained from /login.
  • API Key (X-API-Key: <key>) — managed from your user settings.

For WebSocket connections, authentication is handled during the initial handshake via a token query parameter.
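In practice, the three authentication paths reduce to one header or one query parameter. A small sketch, assuming a hypothetical host (the header names, the Bearer scheme, and the token query parameter are as documented above):

```python
from urllib.parse import urlencode


def bearer_headers(jwt: str) -> dict:
    """REST option 1: JWT obtained from /login."""
    return {"Authorization": f"Bearer {jwt}"}


def api_key_headers(key: str) -> dict:
    """REST option 2: API key managed in user settings."""
    return {"X-API-Key": key}


def websocket_uri(base_uri: str, token: str) -> str:
    """WebSocket auth: token passed as a query parameter on the handshake."""
    return f"{base_uri}?{urlencode({'token': token})}"
```

For example, `websocket_uri("wss://aura.example.com/ws", jwt)` yields the URI to hand to your WebSocket client, while either headers dict can be merged into any REST request.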

This documentation is intended for:

  • Developers looking to integrate real-time transcription into their applications.
  • System Integrators connecting Aura Live to existing audio infrastructure.
  • Users who want to understand the underlying mechanics of the transcription service.