Create Live Session URL
POST /lives
🔐 User Authentication Required
This is the main endpoint for creating and launching a new real-time transcription session. It is highly flexible, letting you configure the AI model, the language, translations, the audio source, and even the appearance of the display.
- Resource Management (VRAM): Creating a session reserves GPU memory (VRAM) on the server. If the resources required by the model and translations exceed the available capacity, creation will fail.
- User Limits: Creation is subject to the limits defined in the user's User Settings (maximum number of sessions and translations).
- Auto Start: For stream-based audio sources (`hls`, `rtmp`, `rtsp`), the `auto_start: true` parameter automatically launches the transcription process in the background on the server.
- Security Tokens: Each session generates two unique tokens: `sender_token`, for the application that sends the audio stream to the WebSocket server, and `listener_token`, for the clients (viewers) that connect to receive the subtitles.
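The request below can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the base URL and the bearer-token authentication scheme are assumptions, and `build_payload` simply mirrors the documented rule that `audio_url` is required for URL-based formats.

```python
import json
import urllib.request

# Assumed deployment base URL -- replace with your server's address.
API_BASE = "http://localhost:8000"

def build_payload(model, language, audio_format, audio_url=None, **extra):
    """Assemble a /lives request body, enforcing the documented rule that
    audio_url is required when audio_format is a URL-based stream."""
    if audio_format in ("hls", "rtmp", "rtsp") and not audio_url:
        raise ValueError(f"audio_url is required when audio_format={audio_format!r}")
    payload = {"model": model, "language": language, "audio_format": audio_format}
    if audio_url:
        payload["audio_url"] = audio_url
    payload.update(extra)  # e.g. auto_start, translations, config, colors
    return payload

def create_live_session(token, payload):
    """POST the payload to /lives (bearer-token auth is an assumption here)."""
    req = urllib.request.Request(
        f"{API_BASE}/lives",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Build the same payload as the example below (no network call made here).
payload = build_payload(
    "tiny", "fr", "rtsp",
    audio_url="rtsp://localhost:8554/mystream",
    auto_start=True,
    translations=["en"],
)
```

Calling `create_live_session("your-token", payload)` would then return the session object described under Responses.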
Example:

```json
{
  "model": "tiny",
  "language": "fr",
  "audio_format": "rtsp",
  "audio_url": "rtsp://localhost:8554/mystream",
  "auto_start": true,
  "live_retention": false,
  "config": {
    "vad": true,
    "max_buffer": 5
  },
  "background_color": "1a1a1a",
  "color_text": "ffffff",
  "size_text": 24,
  "translations": ["en"]
}
```

Authorizations

Request Body

object
The unique name of the transcription model to use (e.g., "base", "large-v3"). Must be an active model.
Example
base

The two-letter ISO 639-1 code of the language to be transcribed (e.g., "en", "fr"). Must be an active language.
Example
en

The format of the audio that will be sent. Use `raw` for pushing audio via WebSocket. Use `hls`, `rtmp`, etc., for the server to pull from a URL specified in `audio_url`.
Example
rtmp

The source URL of the audio stream. This is required and only used when `audio_format` is `hls`, `rtmp`, or `rtsp`.
Example
rtmp://media.example.com/live/stream1

A list of target language ISO 639-1 codes for which to generate simultaneous translations.
Example
[ "fr", "es", "de" ]

If true, the service will immediately start trying to process the stream from the `audio_url`. This only applies to URL-based audio formats.

Request to save the final transcription result. This is only honored if the user's account has the `user_retention` permission enabled.

object
Enable or disable Voice Activity Detection (VAD). If true, the engine will only process audio segments that contain speech, which can reduce processing load and improve accuracy by filtering out silence or noise.
Example
true

The transcription mode, controlling how results are sent. Typically "partial" for faster, intermediate results or "final" for more accurate, completed segments.
Example
partial

The minimum amount of audio (in seconds) to buffer before sending it to the ASR model. Higher values can improve accuracy on short phrases but increase initial latency.
Example
2

The maximum amount of audio (in seconds) to buffer before forcing a transcription. This acts as an upper bound on latency.
Example
4

The maximum duration of silence (in seconds) to wait before considering a phrase complete and finalizing the transcription for the current buffer.
Example
4

The maximum allowed delay (in seconds) between the incoming audio and the returned transcription. The system may adjust buffer sizes to stay within this limit.
Example
4

The maximum number of characters to include in a single line of the returned transcription. Used for formatting the output for display.
Example
40

The maximum number of lines to display in the transcription output. When this limit is reached, older lines may be removed.
Example
2

Enable or disable punctuation. If true, the engine will add punctuation.
Example
true

object
The background color of the iframe content. Must be a valid hex color code without the leading "#" symbol.
Example
ffffff

The color of the transcribed text inside the iframe. Must be a valid hex color code without the leading "#" symbol.
Example
000000

The font size of the transcribed text, in pixels.
Example
16

An optional, user-provided token for listeners. If provided and not in use by another user, it will be assigned to this session. If omitted, a random token will be generated.
Example
my-custom-event-2025

Responses
Successful response (inferred from assertions)
object
object
The unique numeric identifier for the live session.
Example
101

The current state of the live session's lifecycle.
Example
active

If true, the final transcription result for this session will be saved and can be retrieved later. This is only possible if the user has the `user_retention` permission.
Example
true

The format of the incoming audio stream.
Example
raw

Indicates if the session was configured to start processing automatically for URL-based audio formats.

The total amount of Video RAM (in Megabytes) allocated for this session, including the ASR model and all requested translation models.
Example
2100

The source URL for the audio stream if the format is `hls`, `rtmp`, or `rtsp`. Will be null for `raw` format where audio is pushed via WebSocket.
Example
rtmp://media.example.com/live/stream1

Internal Process ID for the subprocess managing an auto-started session. Primarily used for system diagnostics.
Example
12345

A pre-generated HTML `<iframe>` snippet for easily embedding a view-only display of the live transcription. This is only returned if it was specifically requested upon session creation.
Example
<iframe src=... ></iframe>

The secret token required to authenticate and send audio data to the session's WebSocket endpoint. This token is sensitive and must be kept secure.
Example
sender_abc123...

The token required to connect to the WebSocket and receive transcription and translation results. This token can be shared publicly to allow others to view the live output.
Example
listener_xyz789...

The base WebSocket URL for connecting to the live session.
Example
ws://api.example.com

The relative path for the WebSocket endpoint.
Example
/ws/live

object
Enable or disable Voice Activity Detection (VAD). If true, the engine will only process audio segments that contain speech, which can reduce processing load and improve accuracy by filtering out silence or noise.
Example
true

The transcription mode, controlling how results are sent. Typically "partial" for faster, intermediate results or "final" for more accurate, completed segments.
Example
partial

The minimum amount of audio (in seconds) to buffer before sending it to the ASR model. Higher values can improve accuracy on short phrases but increase initial latency.
Example
2

The maximum amount of audio (in seconds) to buffer before forcing a transcription. This acts as an upper bound on latency.
Example
4

The maximum duration of silence (in seconds) to wait before considering a phrase complete and finalizing the transcription for the current buffer.
Example
4

The maximum allowed delay (in seconds) between the incoming audio and the returned transcription. The system may adjust buffer sizes to stay within this limit.
Example
4

The maximum number of characters to include in a single line of the returned transcription. Used for formatting the output for display.
Example
40

The maximum number of lines to display in the transcription output. When this limit is reached, older lines may be removed.
Example
2

Enable or disable punctuation. If true, the engine will add punctuation.
Example
true

object
The unique numeric identifier for the model. Use this ID when creating new resources that depend on a specific model, such as a live session.
Example
1

The unique, human-readable name of the transcription model (e.g., "tiny", "large-v1", "large-v2"). This name is used to select a model when creating a new live session via the API.
Example
tiny

Indicates whether the model is currently available for use. An inactive model (false) cannot be used to create new transcription sessions.
Example
true

The estimated amount of VRAM (in Megabytes) required to load and run this model. This value is used to calculate resource allocation and prevent overloading the system.
Example
1250

object
The unique numeric identifier for the language. This is the internal primary key.
Example
10

The standard two-letter ISO 639-1 code for the language. This is the primary field you should use when specifying a language for a new transcription session.
Example
en

The full, human-readable name of the language in English.
Example
English

Indicates whether the language is currently available for transcription. An inactive language (false) cannot be selected when creating a new live session.
Example
true

Represents an available translation capability from a specific source language to a target language.
object
The unique numeric identifier for this specific translation path (e.g., English to French).
Example
5

The name of the underlying machine translation model used for this language pair.
Example
nllb-200-distilled-600M

Indicates whether this translation path is available for use. An inactive translation (false) cannot be requested when creating a live session.
Example
true

The additional amount of Video RAM (in Megabytes) required to load this translation model. This is added to the VRAM of the base transcription model.
Example
850

The source language that can be translated from.
object
The unique numeric identifier for the language. This is the internal primary key.
Example
10

The standard two-letter ISO 639-1 code for the language. This is the primary field you should use when specifying a language for a new transcription session.
Example
en

The full, human-readable name of the language in English.
Example
English

Indicates whether the language is currently available for transcription. An inactive language (false) cannot be selected when creating a new live session.
Example
true

The target language that the source can be translated into.
object
The unique numeric identifier for the language. This is the internal primary key.
Example
10

The standard two-letter ISO 639-1 code for the language. This is the primary field you should use when specifying a language for a new transcription session.
Example
en

The full, human-readable name of the language in English.
Example
English

Indicates whether the language is currently available for transcription. An inactive language (false) cannot be selected when creating a new live session.
Example
true

Example

```json
{
  "data": {
    "created_at": "2025-11-18T17:13:25.370815+01:00",
    "updated_at": null,
    "live_retention": false,
    "config": {
      "vad": true,
      "mode": "partial",
      "max_chars": 40,
      "max_delay": 4,
      "max_lines": 2,
      "max_buffer": 5,
      "min_buffer": 1,
      "buffer_timeout": 3
    },
    "id": 12,
    "status": "pending",
    "vram": 1425,
    "iframe_html": null,
    "audio_format": "rtsp",
    "audio_url": "rtsp://localhost:8554/mystream",
    "auto_start": true,
    "auto_start_pid": null,
    "base_url": "ws://localhost:8000/backend",
    "sender_token": "_VtvMv3mYKtVs4sbjJC5lGMS-eSOQnnheT6Li30fmr8",
    "listener_token": "aSPhiwbTbHkCQItRpl9s9HZhbKYPrUv00A3aVCTzwRE",
    "relative_path": "/ws/live/12",
    "model": {
      "created_at": "2025-11-13T16:29:53.934766+01:00",
      "updated_at": null,
      "name": "tiny",
      "active": true,
      "vram": 625,
      "id": 1
    },
    "language": {
      "created_at": "2025-11-13T16:29:53.954832+01:00",
      "updated_at": null,
      "iso_639_1": "fr",
      "name": "French",
      "id": 7,
      "active": true
    },
    "translations": [
      {
        "created_at": "2025-11-13T16:29:53.978144+01:00",
        "updated_at": null,
        "model": "opus-mt-fr-en",
        "active": false,
        "vram": 800,
        "id": 589,
        "lang_from": {
          "created_at": "2025-11-13T16:29:53.954832+01:00",
          "updated_at": null,
          "iso_639_1": "fr",
          "name": "French",
          "id": 7,
          "active": true
        },
        "lang_to": {
          "created_at": "2025-11-13T16:29:53.954832+01:00",
          "updated_at": null,
          "iso_639_1": "en",
          "name": "English",
          "id": 1,
          "active": true
        }
      }
    ]
  }
}
```
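A short sketch of consuming a response like the one above: the session's `vram` is the base model's VRAM plus the VRAM of each translation model (625 + 800 = 1425 in the example), and the WebSocket address is formed from `base_url` and `relative_path`. Note that passing the token as a `token` query parameter is an assumption here; the actual WebSocket handshake may carry it differently.

```python
def session_vram(session):
    """Total VRAM = base ASR model VRAM + VRAM of each requested translation."""
    return session["model"]["vram"] + sum(t["vram"] for t in session["translations"])

def listener_ws_url(session):
    """Join base_url and relative_path; the 'token' query parameter is assumed."""
    return f'{session["base_url"]}{session["relative_path"]}?token={session["listener_token"]}'

# Trimmed-down version of the example response above.
session = {
    "vram": 1425,
    "base_url": "ws://localhost:8000/backend",
    "relative_path": "/ws/live/12",
    "listener_token": "aSPhiwbTbHkCQItRpl9s9HZhbKYPrUv00A3aVCTzwRE",
    "model": {"vram": 625},
    "translations": [{"vram": 800}],
}

assert session_vram(session) == session["vram"]  # 625 + 800 == 1425
```

This mirrors the documented accounting: if adding a translation would push `session_vram` past the server's available GPU memory, session creation fails.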