WebSocket Protocol

The Aura Live WebSocket protocol provides full-duplex communication: you send raw binary audio to the server, and the server returns transcription results as JSON.

Connect to the WebSocket endpoint using your session ID and auth token.

URL Structure: wss://<api-domain>/ws/live?session_id=<SESSION_ID>&token=<TOKEN>
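
The URL structure above can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the function name `build_live_url` and the domain `api.example.com` are placeholders for illustration, not part of the Aura Live API. Percent-encoding the query values is a precaution in case a token contains characters such as `/` or `+`.

```python
from urllib.parse import urlencode


def build_live_url(api_domain: str, session_id: str, token: str) -> str:
    """Return the wss:// URL for a live session.

    The query values are percent-encoded so that reserved characters
    in the session ID or token do not corrupt the URL.
    """
    query = urlencode({"session_id": session_id, "token": token})
    return f"wss://{api_domain}/ws/live?{query}"


# Placeholder values; substitute your own domain, session ID, and token.
url = build_live_url("api.example.com", "sess-123", "tok/abc+1")
print(url)
```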

Once connected, send audio data as binary frames in the following format:

  • Encoding: Linear16 (PCM)
  • Sample Rate: 16000 Hz (16kHz)
  • Channels: Mono (1 channel)
  • Bit Depth: 16-bit

[!NOTE] Sending audio in 16kHz mono is significantly more efficient than higher sample rates and results in faster transcription with the same accuracy.
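
To stream audio in this format, a client typically splits the raw PCM bytes into small chunks and sends each chunk as one binary frame. The sketch below shows the arithmetic. The 100 ms chunk duration and the helper name `iter_pcm_chunks` are assumptions chosen for illustration; the protocol description does not prescribe a chunk size.

```python
SAMPLE_RATE_HZ = 16_000   # 16 kHz
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # mono


def iter_pcm_chunks(pcm: bytes, chunk_ms: int = 100):
    """Yield successive slices of raw PCM, each covering `chunk_ms` of audio.

    The final chunk may be shorter if the input does not divide evenly.
    """
    bytes_per_ms = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS // 1000  # 32 bytes
    chunk_size = bytes_per_ms * chunk_ms
    for offset in range(0, len(pcm), chunk_size):
        yield pcm[offset:offset + chunk_size]


# One second of silence in the required format is 32,000 bytes.
one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)
chunks = list(iter_pcm_chunks(one_second_of_silence))
print(len(chunks), len(chunks[0]))
```

Because the chunk size is a multiple of the 2-byte sample width, no sample is split across two frames.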

The server will send back Text Frames containing JSON objects.

{
  "text": "Hello world",
  "is_final": false,
  "start": 1.2,
  "end": 2.5,
  "language": "en"
}
  • text: The transcribed fragment.
  • is_final:
    • false: This is an interim result (the user is still speaking).
    • true: This is a final result (the engine has finished processing this segment).
  • start / end: Start and end times of the fragment, relative to the start of the session.
  • language: The language code of the fragment (for example, "en").
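
The fields above can be handled on the client roughly as follows. This is a sketch, not a reference implementation: the names `Transcript`, `parse_result`, and `TranscriptBuffer` are hypothetical, and the buffer assumes that each interim result supersedes the previous interim text until a final result arrives for that segment.

```python
import json
from dataclasses import dataclass


@dataclass
class Transcript:
    text: str
    is_final: bool
    start: float
    end: float
    language: str


def parse_result(frame: str) -> Transcript:
    """Decode one JSON text frame into a Transcript."""
    data = json.loads(frame)
    return Transcript(
        text=data["text"],
        is_final=data["is_final"],
        start=data["start"],
        end=data["end"],
        language=data["language"],
    )


class TranscriptBuffer:
    """Keeps finalized segments plus the latest interim text."""

    def __init__(self) -> None:
        self.finals: list[str] = []
        self.interim = ""

    def update(self, result: Transcript) -> None:
        if result.is_final:
            # Assumption: a final result replaces the interim text for its segment.
            self.finals.append(result.text)
            self.interim = ""
        else:
            self.interim = result.text

    def display(self) -> str:
        """Return finalized text followed by any pending interim text."""
        parts = self.finals + ([self.interim] if self.interim else [])
        return " ".join(parts)
```

A caller would pass each incoming text frame through `parse_result` and then `update`, and render `display()` whenever the buffer changes.
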

  1. Keep-Alive: The server manages idle timeouts. Keep streaming audio, or send a heartbeat if intermediaries in your network environment (such as proxies or load balancers) drop quiet connections.
  2. Closing: Send a Close frame to gracefully terminate the session. The server will respond with any remaining results before closing.
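
The closing behaviour described above suggests receiving concurrently with sending, so that results the server flushes after the Close frame are not lost. The sketch below illustrates that pattern with `asyncio`. It is written against a duck-typed `ws` object that is assumed to offer `await ws.send(...)`, `await ws.close()`, and async iteration over incoming text frames; connections from the third-party `websockets` package follow this shape, but treating it as a drop-in client for Aura Live is an assumption. The function name `stream_session` is hypothetical.

```python
import asyncio
import json


async def stream_session(ws, chunks):
    """Send audio chunks, close gracefully, and return the final transcript texts.

    `ws` is an assumed interface: send(), close(), and async iteration
    over incoming text frames that stops once the connection is closed.
    """
    finals = []

    async def receive():
        # Runs alongside the sender. Iteration ends when the connection
        # closes, so results sent after our Close frame are still collected.
        async for message in ws:
            result = json.loads(message)
            if result.get("is_final"):
                finals.append(result["text"])

    receiver = asyncio.create_task(receive())

    for chunk in chunks:
        await ws.send(chunk)  # a bytes payload is sent as a binary frame

    await ws.close()          # initiates the closing handshake
    await receiver            # wait until the remaining results are drained
    return finals
```

Only final results are collected here for brevity; interim results could be forwarded to a UI from inside `receive()`.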