
Create Live Session URL

POST
/lives

👤 User Authentication Required

This is the main endpoint for creating and launching a new real-time transcription session. It is highly flexible and lets you configure the AI model, the language, the translations, the audio source, and even the appearance of the display.

  • Resource Management (VRAM): Creating a session reserves GPU memory (VRAM) on the server. If the resources required by the model and the translations exceed the available capacity, creation fails.
  • User Limits: Creation is subject to the limits defined in the user’s User Settings (maximum number of sessions and translations).
  • Auto Start: For stream-type audio sources (hls, rtmp, rtsp), setting auto_start: true automatically starts the transcription process in the background on the server.
  • Security Tokens: Each session generates two unique tokens:
    • sender_token: For the application that sends the audio stream to the WebSocket server.
    • listener_token: For the clients (viewers) that connect to receive the subtitles.

Example:

{
  "model": "tiny",
  "language": "fr",
  "audio_format": "rtsp",
  "audio_url": "rtsp://localhost:8554/mystream",
  "auto_start": true,
  "live_retention": false,
  "config": {
    "vad": true,
    "max_buffer": 5
  },
  "background_color": "1a1a1a",
  "color_text": "ffffff",
  "size_text": 24,
  "translations": ["en"]
}
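As a rough illustration of how the example body above could be submitted, the following sketch builds a POST /lives request with Python's standard library. The base URL, the token value, and the bearer-style Authorization header are assumptions for illustration only; this page states that user authentication is required but does not describe the scheme.

```python
import json
import urllib.request

# Hypothetical values: replace with your deployment's base URL and credentials.
API_BASE = "http://localhost:8000/backend"
API_TOKEN = "YOUR_API_TOKEN"


def build_create_live_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /lives request carrying a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/lives",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth. Check your deployment's auth scheme.
            "Authorization": f"Bearer {API_TOKEN}",
        },
    )


payload = {
    "model": "tiny",
    "language": "fr",
    "audio_format": "rtsp",
    "audio_url": "rtsp://localhost:8554/mystream",
    "auto_start": True,
    "translations": ["en"],
}
request = build_create_live_request(payload)
```

To actually send the request, pass it to urllib.request.urlopen and decode the JSON body of the response; the session fields are then found under its "data" key.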
object
model
required

The unique name of the transcription model to use (e.g., ‘base’, ‘large-v3’). Must be an active model.

string
Example
base
language
required

The two-letter ISO 639-1 code of the language to be transcribed (e.g., ‘en’, ‘fr’). Must be an active language.

string
Example
en
audio_format
required

The format of the audio that will be sent. Use raw for pushing audio via WebSocket. Use hls, rtmp, etc., for the server to pull from a URL specified in audio_url.

string
default: raw
Allowed values: raw hls rtmp file rtsp
Example
rtmp
audio_url

The source URL of the audio stream. It is used only when audio_format is hls, rtmp, or rtsp, and is required in those cases.

string
Example
rtmp://media.example.com/live/stream1
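The relationship between audio_format and audio_url can be checked on the client before calling the endpoint. The sketch below is a hypothetical pre-flight helper, not the server's validation logic; the function name and the messages are made up for illustration, and the file format is left unchecked because this page does not say how its source is supplied.

```python
ALLOWED_FORMATS = {"raw", "hls", "rtmp", "file", "rtsp"}
URL_FORMATS = {"hls", "rtmp", "rtsp"}  # formats the server pulls from audio_url


def check_audio_source(payload: dict) -> list[str]:
    """Return a list of problems found in the audio-related fields."""
    problems = []
    # The documented default for audio_format is "raw".
    fmt = payload.get("audio_format", "raw")
    if fmt not in ALLOWED_FORMATS:
        problems.append(f"unknown audio_format: {fmt!r}")
    elif fmt in URL_FORMATS and not payload.get("audio_url"):
        problems.append(f"audio_url is required when audio_format is {fmt!r}")
    return problems
```

For instance, check_audio_source({"audio_format": "rtmp"}) reports the missing audio_url, while a raw payload without a URL passes.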
translations

A list of target language ISO 639-1 codes for which to generate simultaneous translations.

Array<string>
Example
[
"fr",
"es",
"de"
]
auto_start

If true, the service will immediately start trying to process the stream from the audio_url. This only applies to URL-based audio formats.

boolean
live_retention

Request to save the final transcription result. This is only honored if the user’s account has the user_retention permission enabled.

boolean
config
object
vad

Enable or disable Voice Activity Detection (VAD). If true, the engine will only process audio segments that contain speech, which can reduce processing load and improve accuracy by filtering out silence or noise.

boolean
Example
true
mode

The transcription mode, controlling how results are sent. Typically ‘partial’ for faster, intermediate results or ‘final’ for more accurate, completed segments.

string
Allowed values: partial final
Example
partial
min_buffer

The minimum amount of audio (in seconds) to buffer before sending it to the ASR model. Higher values can improve accuracy on short phrases but increase initial latency.

number format: float
Example
2
max_buffer

The maximum amount of audio (in seconds) to buffer before forcing a transcription. This acts as an upper bound on latency.

number format: float
Example
4
buffer_timeout

The maximum duration of silence (in seconds) to wait before considering a phrase complete and finalizing the transcription for the current buffer.

number format: float
Example
4
max_delay

The maximum allowed delay (in seconds) between the incoming audio and the returned transcription. The system may adjust buffer sizes to stay within this limit.

number format: float
Example
4
max_chars

The maximum number of characters to include in a single line of the returned transcription. Used for formatting the output for display.

integer
Example
40
max_lines

The maximum number of lines to display in the transcription output. When this limit is reached, older lines may be removed.

integer
Example
2
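To make the combined effect of max_chars and max_lines concrete, here is a small client-side sketch that wraps text to a line width and keeps only the most recent lines. It is an illustration of what the two parameters describe, not the engine's actual formatting algorithm; the function name and the choice of textwrap are assumptions.

```python
import textwrap


def format_caption(text: str, max_chars: int = 40, max_lines: int = 2) -> list[str]:
    """Wrap text to max_chars per line and keep the newest max_lines lines."""
    if max_lines <= 0:
        return []
    lines = textwrap.wrap(text, width=max_chars)
    # Older lines are dropped first, so only the tail is kept.
    return lines[-max_lines:]


caption = format_caption(
    "Live captions scroll as new words arrive, and older lines are dropped "
    "once the display is full.",
    max_chars=40,
    max_lines=2,
)
```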
punctuate

Enable or disable punctuation. If true, the engine will add punctuation.

boolean
Example
true
iframe
object
background_color
required

The background color of the iframe content. Must be a valid hex color code without the leading ’#’ symbol.

string
default: ffffff
Example
ffffff
color_text
required

The color of the transcribed text inside the iframe. Must be a valid hex color code without the leading ’#’ symbol.

string
default: 000000
Example
000000
size_text
required

The font size of the transcribed text, in pixels.

integer
default: 16
Example
16
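Because the color fields expect a hex code without the leading '#', a quick client-side check can catch the common mistake of sending "#ffffff". The sketch below assumes the six-digit form only; whether the API also accepts three-digit shorthand is not stated on this page, and the helper name is hypothetical.

```python
import re

# Assumption: exactly six hexadecimal digits, no leading '#'.
HEX_COLOR = re.compile(r"[0-9a-fA-F]{6}")


def is_valid_iframe_color(value: str) -> bool:
    """Return True if value looks like a six-digit hex color without '#'."""
    return HEX_COLOR.fullmatch(value) is not None
```

With this check, "1a1a1a" and "ffffff" pass, while "#ffffff" and "white" are rejected.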
listener_token

An optional, user-provided token for listeners. If provided and not in use by another user, it will be assigned to this session. If omitted, a random token will be generated.

string
Example
my-custom-event-2025

Successful response (inferred from assertions)

object
data
required
object
id
required

The unique numeric identifier for the live session.

integer
Example
101
status
required

The current state of the live session’s lifecycle.

string
Allowed values: pending active finished error
Example
active
live_retention
required

If true, the final transcription result for this session will be saved and can be retrieved later. This is only possible if the user has the user_retention permission.

boolean
Example
true
audio_format
required

The format of the incoming audio stream.

string
Allowed values: raw hls rtmp file rtsp
Example
raw
auto_start
required

Indicates if the session was configured to start processing automatically for URL-based audio formats.

boolean
vram
required

The total amount of Video RAM (in Megabytes) allocated for this session, including the ASR model and all requested translation models.

integer
Example
2100
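The description above suggests that the session total is the ASR model's VRAM plus that of each requested translation model. The following sketch applies that reading to the figures in the example response on this page (a 625 MB model and one 800 MB translation, for a session total of 1425 MB). The server may account for memory differently; the function name is hypothetical.

```python
def estimate_session_vram(model_vram_mb: int, translation_vram_mb: list[int]) -> int:
    """Sum the model's VRAM and the VRAM of every requested translation (in MB)."""
    return model_vram_mb + sum(translation_vram_mb)


# Figures taken from the example response: model "tiny" plus one fr -> en translation.
total_mb = estimate_session_vram(625, [800])
```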
audio_url

The source URL for the audio stream if the format is hls, rtmp, or rtsp. Will be null for raw format where audio is pushed via WebSocket.

string
nullable
Example
rtmp://media.example.com/live/stream1
auto_start_pid

Internal Process ID for the subprocess managing an auto-started session. Primarily used for system diagnostics.

integer
nullable
Example
12345
iframe_html

A pre-generated HTML <iframe> snippet for easily embedding a view-only display of the live transcription. This is only returned if it was specifically requested upon session creation.

string
nullable
Example
<iframe src=... ></iframe>
sender_token
required

The secret token required to authenticate and send audio data to the session’s WebSocket endpoint. This token is sensitive and must be kept secure.

string
Example
sender_abc123...
listener_token
required

The token required to connect to the WebSocket and receive transcription and translation results. This token can be shared publicly to allow others to view the live output.

string
Example
listener_xyz789...
base_url
required

The base WebSocket URL for connecting to the live session.

string
Example
ws://api.example.com
relative_path
required

The relative path for the WebSocket endpoint.

string
Example
/ws/live
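One plausible way to obtain the full WebSocket address is to join base_url and relative_path, as sketched below. This is an assumption based on the field names and example values; this page does not spell out the joining rule, nor how sender_token or listener_token is attached to the connection, so the token is deliberately left out.

```python
def live_websocket_url(base_url: str, relative_path: str) -> str:
    """Join the two fields with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + relative_path.lstrip("/")


# Values taken from the example response on this page.
ws_url = live_websocket_url("ws://localhost:8000/backend", "/ws/live/12")
```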
config
required
object
vad

Enable or disable Voice Activity Detection (VAD). If true, the engine will only process audio segments that contain speech, which can reduce processing load and improve accuracy by filtering out silence or noise.

boolean
Example
true
mode

The transcription mode, controlling how results are sent. Typically ‘partial’ for faster, intermediate results or ‘final’ for more accurate, completed segments.

string
Allowed values: partial final
Example
partial
min_buffer

The minimum amount of audio (in seconds) to buffer before sending it to the ASR model. Higher values can improve accuracy on short phrases but increase initial latency.

number format: float
Example
2
max_buffer

The maximum amount of audio (in seconds) to buffer before forcing a transcription. This acts as an upper bound on latency.

number format: float
Example
4
buffer_timeout

The maximum duration of silence (in seconds) to wait before considering a phrase complete and finalizing the transcription for the current buffer.

number format: float
Example
4
max_delay

The maximum allowed delay (in seconds) between the incoming audio and the returned transcription. The system may adjust buffer sizes to stay within this limit.

number format: float
Example
4
max_chars

The maximum number of characters to include in a single line of the returned transcription. Used for formatting the output for display.

integer
Example
40
max_lines

The maximum number of lines to display in the transcription output. When this limit is reached, older lines may be removed.

integer
Example
2
punctuate

Enable or disable punctuation. If true, the engine will add punctuation.

boolean
Example
true
model
required
object
id
required

The unique numeric identifier for the model. Use this ID when creating new resources that depend on a specific model, such as a live session.

integer
Example
1
name
required

The unique, human-readable name of the transcription model (e.g., ‘tiny’, ‘large-v1’, ‘large-v2’). This name is used to select a model when creating a new live session via the API.

string
Example
tiny
active
required

Indicates whether the model is currently available for use. An inactive model (false) cannot be used to create new transcription sessions.

boolean
Example
true
vram
required

The estimated amount of VRAM (in Megabytes) required to load and run this model. This value is used to calculate resource allocation and prevent overloading the system.

integer
Example
1250
language
required
object
id
required

The unique numeric identifier for the language. This is the internal primary key.

integer
Example
10
iso_639_1
required

The standard two-letter ISO 639-1 code for the language. This is the primary field you should use when specifying a language for a new transcription session.

string
Example
en
name
required

The full, human-readable name of the language in English.

string
Example
English
active
required

Indicates whether the language is currently available for transcription. An inactive language (false) cannot be selected when creating a new live session.

boolean
Example
true
translations
required
Array<object>

Represents an available translation capability from a specific source language to a target language.

object
id
required

The unique numeric identifier for this specific translation path (e.g., English to French).

integer
Example
5
model
required

The name of the underlying machine translation model used for this language pair.

string
Example
nllb-200-distilled-600M
active
required

Indicates whether this translation path is available for use. An inactive translation (false) cannot be requested when creating a live session.

boolean
Example
true
vram
required

The additional amount of Video RAM (in Megabytes) required to load this translation model. This is added to the VRAM of the base transcription model.

integer
Example
850
lang_from
required

The source language that can be translated from.

object
id
required

The unique numeric identifier for the language. This is the internal primary key.

integer
Example
10
iso_639_1
required

The standard two-letter ISO 639-1 code for the language. This is the primary field you should use when specifying a language for a new transcription session.

string
Example
en
name
required

The full, human-readable name of the language in English.

string
Example
English
active
required

Indicates whether the language is currently available for transcription. An inactive language (false) cannot be selected when creating a new live session.

boolean
Example
true
lang_to
required

The target language that the source can be translated into.

object
id
required

The unique numeric identifier for the language. This is the internal primary key.

integer
Example
10
iso_639_1
required

The standard two-letter ISO 639-1 code for the language. This is the primary field you should use when specifying a language for a new transcription session.

string
Example
en
name
required

The full, human-readable name of the language in English.

string
Example
English
active
required

Indicates whether the language is currently available for transcription. An inactive language (false) cannot be selected when creating a new live session.

boolean
Example
true
created_at
string
updated_at
string
Example
{
"data": {
"created_at": "2025-11-18T17:13:25.370815+01:00",
"updated_at": null,
"live_retention": false,
"config": {
"vad": true,
"mode": "partial",
"max_chars": 40,
"max_delay": 4,
"max_lines": 2,
"max_buffer": 5,
"min_buffer": 1,
"buffer_timeout": 3
},
"id": 12,
"status": "pending",
"vram": 1425,
"iframe_html": null,
"audio_format": "rtsp",
"audio_url": "rtsp://localhost:8554/mystream",
"auto_start": true,
"auto_start_pid": null,
"base_url": "ws://localhost:8000/backend",
"sender_token": "_VtvMv3mYKtVs4sbjJC5lGMS-eSOQnnheT6Li30fmr8",
"listener_token": "aSPhiwbTbHkCQItRpl9s9HZhbKYPrUv00A3aVCTzwRE",
"relative_path": "/ws/live/12",
"model": {
"created_at": "2025-11-13T16:29:53.934766+01:00",
"updated_at": null,
"name": "tiny",
"active": true,
"vram": 625,
"id": 1
},
"language": {
"created_at": "2025-11-13T16:29:53.954832+01:00",
"updated_at": null,
"iso_639_1": "fr",
"name": "French",
"id": 7,
"active": true
},
"translations": [
{
"created_at": "2025-11-13T16:29:53.978144+01:00",
"updated_at": null,
"model": "opus-mt-fr-en",
"active": false,
"vram": 800,
"id": 589,
"lang_from": {
"created_at": "2025-11-13T16:29:53.954832+01:00",
"updated_at": null,
"iso_639_1": "fr",
"name": "French",
"id": 7,
"active": true
},
"lang_to": {
"created_at": "2025-11-13T16:29:53.954832+01:00",
"updated_at": null,
"iso_639_1": "en",
"name": "English",
"id": 1,
"active": true
}
}
]
}
}
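A client typically needs only a few of these fields after creating a session. The sketch below pulls out the session id, status, both tokens, and the translation language pairs from a response body shaped like the example above. The helper name and the trimmed sample are illustrative, and the token values are placeholders.

```python
import json


def summarize_session(response_body: str) -> dict:
    """Extract the fields a client usually needs from a create-session response."""
    data = json.loads(response_body)["data"]
    return {
        "id": data["id"],
        "status": data["status"],
        "sender_token": data["sender_token"],      # keep secret
        "listener_token": data["listener_token"],  # may be shared with viewers
        "translation_pairs": [
            (t["lang_from"]["iso_639_1"], t["lang_to"]["iso_639_1"])
            for t in data["translations"]
        ],
    }


# Trimmed sample with placeholder tokens, following the structure shown above.
sample = json.dumps({
    "data": {
        "id": 12,
        "status": "pending",
        "sender_token": "SENDER_TOKEN_PLACEHOLDER",
        "listener_token": "LISTENER_TOKEN_PLACEHOLDER",
        "translations": [
            {"lang_from": {"iso_639_1": "fr"}, "lang_to": {"iso_639_1": "en"}}
        ],
    }
})
summary = summarize_session(sample)
```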