Create Live Session URL
POST /lives
🔐 User Authentication Required
This is the main endpoint for creating and launching a new real-time transcription session. It is highly flexible, letting you configure the AI model, the language, translations, the audio source, and even the appearance of the display.
- Resource Management (VRAM): Creating a session reserves GPU memory (VRAM) on the server. If the resources required by the model and translations exceed the available capacity, creation will fail.
- User Limits: Creation is subject to the limits defined in the user's User Settings (maximum number of sessions and translations).
- Auto Start: For stream-based audio sources (`hls`, `rtmp`, `rtsp`), the `auto_start: true` parameter automatically launches the transcription process in the background on the server.
- Security Tokens: Each session generates two unique tokens: `sender_token`, for the application that sends the audio stream to the WebSocket server, and `listener_token`, for the clients (viewers) that connect to receive the subtitles.
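The request below can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the base URL and the bearer-token authentication scheme are assumptions, and `build_payload` simply mirrors the documented rule that `audio_url` is required for URL-based formats.

```python
import json
import urllib.request

# Assumed deployment base URL -- replace with your server's address.
API_BASE = "http://localhost:8000"

def build_payload(model, language, audio_format, audio_url=None, **extra):
    """Assemble a /lives request body, enforcing the documented rule that
    audio_url is required when audio_format is a URL-based stream."""
    if audio_format in ("hls", "rtmp", "rtsp") and not audio_url:
        raise ValueError(f"audio_url is required when audio_format={audio_format!r}")
    payload = {"model": model, "language": language, "audio_format": audio_format}
    if audio_url:
        payload["audio_url"] = audio_url
    payload.update(extra)  # e.g. auto_start, translations, config, colors
    return payload

def create_live_session(token, payload):
    """POST the payload to /lives (bearer-token auth is an assumption here)."""
    req = urllib.request.Request(
        f"{API_BASE}/lives",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Build the same payload as the example below (no network call made here).
payload = build_payload(
    "tiny", "fr", "rtsp",
    audio_url="rtsp://localhost:8554/mystream",
    auto_start=True,
    translations=["en"],
)
```

Calling `create_live_session("your-token", payload)` would then return the session object described under Responses.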
Example:

```json
{
  "model": "tiny",
  "language": "fr",
  "audio_format": "rtsp",
  "audio_url": "rtsp://localhost:8554/mystream",
  "auto_start": true,
  "live_retention": false,
  "config": {
    "vad": true,
    "max_buffer": 5
  },
  "background_color": "1a1a1a",
  "color_text": "ffffff",
  "size_text": 24,
  "translations": ["en"]
}
```

Authorizations

Request Body

object
The unique name of the transcription model to use (e.g., "base", "large-v3"). Must be an active model.
Example
base

The two-letter ISO 639-1 code of the language to be transcribed (e.g., "en", "fr"). Must be an active language.
Example
en

The format of the audio that will be sent. Use `raw` for pushing audio via WebSocket. Use `hls`, `rtmp`, etc., for the server to pull from a URL specified in `audio_url`.
Example
rtmp

The source URL of the audio stream. This is required and only used when `audio_format` is `hls`, `rtmp`, or `rtsp`.
Example
rtmp://media.example.com/live/stream1

A list of target language ISO 639-1 codes for which to generate simultaneous translations.
Example
[ "fr", "es", "de" ]

If true, the service will immediately start trying to process the stream from the `audio_url`. This only applies to URL-based audio formats.

Request to save the final transcription result. This is only honored if the user's account has the `user_retention` permission enabled.

object
Enable or disable Voice Activity Detection (VAD). If true, the engine will only process audio segments that contain speech, which can reduce processing load and improve accuracy by filtering out silence or noise.
Example
true

The transcription mode, controlling how results are sent. Typically "partial" for faster, intermediate results or "final" for more accurate, completed segments.
Example
partial

The minimum amount of audio (in seconds) to buffer before sending it to the ASR model. Higher values can improve accuracy on short phrases but increase initial latency.
Example
2

The maximum amount of audio (in seconds) to buffer before forcing a transcription. This acts as an upper bound on latency.
Example
4

The maximum duration of silence (in seconds) to wait before considering a phrase complete and finalizing the transcription for the current buffer.
Example
4

The maximum allowed delay (in seconds) between the incoming audio and the returned transcription. The system may adjust buffer sizes to stay within this limit.
Example
4

The maximum number of characters to include in a single line of the returned transcription. Used for formatting the output for display.
Example
40

The maximum number of lines to display in the transcription output. When this limit is reached, older lines may be removed.
Example
2

Enable or disable punctuation. If true, the engine will add punctuation.
Example
true

object
The background color of the iframe content. Must be a valid hex color code without the leading "#" symbol.
Example
ffffff

The color of the transcribed text inside the iframe. Must be a valid hex color code without the leading "#" symbol.
Example
000000

The font size of the transcribed text, in pixels.
Example
16

An optional, user-provided token for listeners. If provided and not in use by another user, it will be assigned to this session. If omitted, a random token will be generated.
Example
my-custom-event-2025

Responses
Successful response (inferred from assertions)
object
object
The unique numeric identifier for the live session.
Example
101

The current state of the live session's lifecycle.
Example
active

If true, the final transcription result for this session will be saved and can be retrieved later. This is only possible if the user has the `user_retention` permission.
Example
true

The format of the incoming audio stream.
Example
raw

Indicates if the session was configured to start processing automatically for URL-based audio formats.

The total amount of Video RAM (in Megabytes) allocated for this session, including the ASR model and all requested translation models.
Example
2100

The source URL for the audio stream if the format is `hls`, `rtmp`, or `rtsp`. Will be null for `raw` format where audio is pushed via WebSocket.
Example
rtmp://media.example.com/live/stream1

Internal Process ID for the subprocess managing an auto-started session. Primarily used for system diagnostics.
Example
12345

A pre-generated HTML `<iframe>` snippet for easily embedding a view-only display of the live transcription. This is only returned if it was specifically requested upon session creation.
Example
<iframe src=... ></iframe>

The secret token required to authenticate and send audio data to the session's WebSocket endpoint. This token is sensitive and must be kept secure.
Example
sender_abc123...

The token required to connect to the WebSocket and receive transcription and translation results. This token can be shared publicly to allow others to view the live output.
Example
listener_xyz789...

The base WebSocket URL for connecting to the live session.
Example
ws://api.example.com

The relative path for the WebSocket endpoint.
Example
/ws/live

object
Enable or disable Voice Activity Detection (VAD). If true, the engine will only process audio segments that contain speech, which can reduce processing load and improve accuracy by filtering out silence or noise.
Example
true

The transcription mode, controlling how results are sent. Typically "partial" for faster, intermediate results or "final" for more accurate, completed segments.
Example
partial

The minimum amount of audio (in seconds) to buffer before sending it to the ASR model. Higher values can improve accuracy on short phrases but increase initial latency.
Example
2

The maximum amount of audio (in seconds) to buffer before forcing a transcription. This acts as an upper bound on latency.
Example
4

The maximum duration of silence (in seconds) to wait before considering a phrase complete and finalizing the transcription for the current buffer.
Example
4

The maximum allowed delay (in seconds) between the incoming audio and the returned transcription. The system may adjust buffer sizes to stay within this limit.
Example
4

The maximum number of characters to include in a single line of the returned transcription. Used for formatting the output for display.
Example
40

The maximum number of lines to display in the transcription output. When this limit is reached, older lines may be removed.
Example
2

Enable or disable punctuation. If true, the engine will add punctuation.
Example
true

object
The unique numeric identifier for the model. Use this ID when creating new resources that depend on a specific model, such as a live session.
Example
1

The unique, human-readable name of the transcription model (e.g., "tiny", "large-v1", "large-v2"). This name is used to select a model when creating a new live session via the API.
Example
tiny

Indicates whether the model is currently available for use. An inactive model (false) cannot be used to create new transcription sessions.
Example
true

The estimated amount of VRAM (in Megabytes) required to load and run this model. This value is used to calculate resource allocation and prevent overloading the system.
Example
1250

object
The unique numeric identifier for the language. This is the internal primary key.
Example
10

The standard two-letter ISO 639-1 code for the language. This is the primary field you should use when specifying a language for a new transcription session.
Example
en

The full, human-readable name of the language in English.
Example
English

Indicates whether the language is currently available for transcription. An inactive language (false) cannot be selected when creating a new live session.
Example
true

Represents an available translation capability from a specific source language to a target language.
object
The unique numeric identifier for this specific translation path (e.g., English to French).
Example
5

The name of the underlying machine translation model used for this language pair.
Example
nllb-200-distilled-600M

Indicates whether this translation path is available for use. An inactive translation (false) cannot be requested when creating a live session.
Example
true

The additional amount of Video RAM (in Megabytes) required to load this translation model. This is added to the VRAM of the base transcription model.
Example
850

The source language that can be translated from.
object
The unique numeric identifier for the language. This is the internal primary key.
Example
10

The standard two-letter ISO 639-1 code for the language. This is the primary field you should use when specifying a language for a new transcription session.
Example
en

The full, human-readable name of the language in English.
Example
English

Indicates whether the language is currently available for transcription. An inactive language (false) cannot be selected when creating a new live session.
Example
true

The target language that the source can be translated into.
object
The unique numeric identifier for the language. This is the internal primary key.
Example
10

The standard two-letter ISO 639-1 code for the language. This is the primary field you should use when specifying a language for a new transcription session.
Example
en

The full, human-readable name of the language in English.
Example
English

Indicates whether the language is currently available for transcription. An inactive language (false) cannot be selected when creating a new live session.
Example
true

Example

```json
{
  "data": {
    "created_at": "2025-11-18T17:13:25.370815+01:00",
    "updated_at": null,
    "live_retention": false,
    "config": {
      "vad": true,
      "mode": "partial",
      "max_chars": 40,
      "max_delay": 4,
      "max_lines": 2,
      "max_buffer": 5,
      "min_buffer": 1,
      "buffer_timeout": 3
    },
    "id": 12,
    "status": "pending",
    "vram": 1425,
    "iframe_html": null,
    "audio_format": "rtsp",
    "audio_url": "rtsp://localhost:8554/mystream",
    "auto_start": true,
    "auto_start_pid": null,
    "base_url": "ws://localhost:8000/backend",
    "sender_token": "_VtvMv3mYKtVs4sbjJC5lGMS-eSOQnnheT6Li30fmr8",
    "listener_token": "aSPhiwbTbHkCQItRpl9s9HZhbKYPrUv00A3aVCTzwRE",
    "relative_path": "/ws/live/12",
    "model": {
      "created_at": "2025-11-13T16:29:53.934766+01:00",
      "updated_at": null,
      "name": "tiny",
      "active": true,
      "vram": 625,
      "id": 1
    },
    "language": {
      "created_at": "2025-11-13T16:29:53.954832+01:00",
      "updated_at": null,
      "iso_639_1": "fr",
      "name": "French",
      "id": 7,
      "active": true
    },
    "translations": [
      {
        "created_at": "2025-11-13T16:29:53.978144+01:00",
        "updated_at": null,
        "model": "opus-mt-fr-en",
        "active": false,
        "vram": 800,
        "id": 589,
        "lang_from": {
          "created_at": "2025-11-13T16:29:53.954832+01:00",
          "updated_at": null,
          "iso_639_1": "fr",
          "name": "French",
          "id": 7,
          "active": true
        },
        "lang_to": {
          "created_at": "2025-11-13T16:29:53.954832+01:00",
          "updated_at": null,
          "iso_639_1": "en",
          "name": "English",
          "id": 1,
          "active": true
        }
      }
    ]
  }
}
```
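A short sketch of consuming a response like the one above: the session's `vram` is the base model's VRAM plus the VRAM of each translation model (625 + 800 = 1425 in the example), and the WebSocket address is formed from `base_url` and `relative_path`. Note that passing the token as a `token` query parameter is an assumption here; the actual WebSocket handshake may carry it differently.

```python
def session_vram(session):
    """Total VRAM = base ASR model VRAM + VRAM of each requested translation."""
    return session["model"]["vram"] + sum(t["vram"] for t in session["translations"])

def listener_ws_url(session):
    """Join base_url and relative_path; the 'token' query parameter is assumed."""
    return f'{session["base_url"]}{session["relative_path"]}?token={session["listener_token"]}'

# Trimmed-down version of the example response above.
session = {
    "vram": 1425,
    "base_url": "ws://localhost:8000/backend",
    "relative_path": "/ws/live/12",
    "listener_token": "aSPhiwbTbHkCQItRpl9s9HZhbKYPrUv00A3aVCTzwRE",
    "model": {"vram": 625},
    "translations": [{"vram": 800}],
}

assert session_vram(session) == session["vram"]  # 625 + 800 == 1425
```

This mirrors the documented accounting: if adding a translation would push `session_vram` past the server's available GPU memory, session creation fails.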