WebSocket Protocol
The Aura Live WebSocket protocol is full-duplex: you send raw binary audio to the server, and the server returns JSON transcription results over the same connection.
Connection
Connect to the WebSocket endpoint using your session ID and auth token.
URL Structure:
wss://<api-domain>/ws/live?session_id=<SESSION_ID>&token=<TOKEN>
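A minimal sketch of assembling this URL in Python. The helper name `build_live_url` and the domain `api.example.com` are placeholders for illustration, not part of the API. Percent-encoding the query values is a precaution in case a token contains characters such as `+`, `/`, or `=`.

```python
from urllib.parse import urlencode


def build_live_url(api_domain: str, session_id: str, token: str) -> str:
    """Return the wss:// URL for the live endpoint (hypothetical helper).

    urlencode percent-encodes both values, so tokens containing
    reserved characters survive the query string intact.
    """
    query = urlencode({"session_id": session_id, "token": token})
    return f"wss://{api_domain}/ws/live?{query}"


# Placeholder domain and credentials, for illustration only.
print(build_live_url("api.example.com", "abc123", "s3cr3t+/="))
```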
Sending Audio
Once connected, send audio data as binary frames.
Recommended Format:
- Encoding: Linear16 (PCM)
- Sample Rate: 16000 Hz (16kHz)
- Channels: Mono (1 channel)
- Bit Depth: 16-bit
[!NOTE] Sending audio in 16kHz mono is significantly more efficient than higher sample rates and results in faster transcription with the same accuracy.
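The recommended format can be produced from floating-point samples with the standard library alone. This is a sketch under two assumptions the text above does not state: that Linear16 means little-endian byte order (the usual convention), and that roughly 100 ms of audio per binary frame is a reasonable chunk size. The function names are illustrative.

```python
import struct
from typing import Iterator, Sequence

SAMPLE_RATE_HZ = 16_000   # recommended sample rate
BYTES_PER_SAMPLE = 2      # 16-bit depth, mono
CHUNK_MS = 100            # assumption: frame size is not specified by the protocol


def floats_to_linear16(samples: Sequence[float]) -> bytes:
    """Convert mono samples in [-1.0, 1.0] to signed 16-bit PCM.

    Little-endian byte order is assumed here.
    """
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def iter_chunks(pcm: bytes, chunk_ms: int = CHUNK_MS) -> Iterator[bytes]:
    """Yield successive slices of PCM data, one per binary frame."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset:offset + chunk_bytes]
```

At 16 kHz, 16-bit mono, one second of audio is 32,000 bytes, so a 100 ms chunk is 3,200 bytes. Each yielded chunk would be sent as one binary frame.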
Receiving Results
The server sends back text frames containing JSON objects.
Typical Message Format:
{ "text": "Hello world", "is_final": false, "start": 1.2, "end": 2.5, "language": "en" }
- text: The transcribed fragment.
- is_final:
  - false: This is an interim result (the user is still speaking).
  - true: This is a final result (the engine has finished processing this segment).
- start / end: Timestamp of the fragment relative to the start of the session.
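One way a client might consume these messages is sketched below. It assumes that each interim result replaces the previous interim text for the segment in progress, and that a final result supersedes it. That behavior is inferred from the field descriptions above rather than stated by the protocol, and the `Transcript` class is illustrative.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transcript:
    """Accumulates final segments and tracks the latest interim text."""

    finals: List[str] = field(default_factory=list)
    interim: str = ""

    def feed(self, frame: str) -> None:
        """Apply one JSON text frame received from the server."""
        msg = json.loads(frame)
        text = msg.get("text", "")
        if msg.get("is_final", False):
            # Final result: commit it and clear the pending interim text.
            self.finals.append(text)
            self.interim = ""
        else:
            # Interim result: assumed to replace the previous interim text.
            self.interim = text

    def display(self) -> str:
        """Return committed segments followed by any pending interim text."""
        parts = self.finals + ([self.interim] if self.interim else [])
        return " ".join(parts)
```

Feeding an interim `"Hello"` followed by a final `"Hello world"` leaves `display()` returning `"Hello world"`, with no duplicated fragment.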
Handling the Connection
- Keep-Alive: The server handles idle timeouts. Ensure you are streaming data or sending a heartbeat if required by your specific network environment.
- Closing: Send a Close frame to gracefully terminate the session. The server will respond with any remaining results before closing.
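The closing sequence can be sketched independently of any particular WebSocket library. The `LiveSocket` interface below is hypothetical: its method names are placeholders to map onto whatever client you use, and `recv_text` returning `None` stands in for "the server has closed the connection". The point is the ordering: send the Close frame first, then keep reading until the server closes, so that trailing results are not dropped.

```python
from typing import Callable, Optional, Protocol


class LiveSocket(Protocol):
    """Hypothetical minimal transport; adapt to your WebSocket client."""

    def send_binary(self, data: bytes) -> None: ...
    def send_close(self) -> None: ...
    def recv_text(self) -> Optional[str]: ...  # None once the server has closed


def finish_session(sock: LiveSocket, on_result: Callable[[str], None]) -> int:
    """Send a Close frame, then hand every remaining text frame to on_result.

    Returns the number of frames received after the Close frame was sent.
    """
    sock.send_close()
    drained = 0
    while (frame := sock.recv_text()) is not None:
        on_result(frame)
        drained += 1
    return drained
```

Some client libraries discard incoming messages once a close has been initiated, so check how yours behaves before relying on this pattern.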