# Create AI ASR task

> Transcribing is the process of writing down the words you hear in an audio recording.
Our solution allows you to transcribe audio from your video and get subtitles automatically. To do this, we use modern AI models.

The result:
- Transcription – subtitles in the original language. For example, if the audio is in English, the subtitles are in English too; if the audio is in German, the subtitles are in German too.
- Translation – subtitles are translated from the original language to any other language.

**How to use?**

- Explicit call to this AI method. Applicable to any file stored with us or located on the Internet.
- Standard video upload with automatic subtitle generation. See ["VOD uploading"](/docs/api-reference/streaming/videos/create-video).


**What language will the subtitles be in?**

You can set the languages explicitly: the source language of the audio ("audio_language") and the language of the resulting subtitles ("subtitles_language").
If the source language is not set, the system runs automatic language identification and the subtitles are created in the detected language. Language identification is also based on AI analysis.


Additionally, when the source language is not set, we support recognition of alternating languages in the video (code-switching). For example, when different speakers in a video speak several languages, or when they switch from their native language to English and back. So when a video contains multiple languages, it is better not to specify "audio_language"; otherwise the AI may be forced to recognize everything as a single language and produce gibberish.


  


**What can be transcribed?**

The service uses additional methods to detect the presence of speech in the audio track, which improves recognition of human conversations:
- Speech of one speaker,
- Speech of several speakers,
- Speech in different languages,
- etc.

Limitation: for music, lyrics will most likely not be transcribed.

  


**What about translation?**

It is also possible to automatically translate the subtitles from the original language into another language.

To create a translation, specify the desired language explicitly in the "subtitles_language" parameter. Otherwise, the subtitles will be in the original language. To translate into several languages, create a separate task for each language.
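
Because each task produces subtitles in a single language, translating one video into several languages means submitting several requests. The following is a minimal Python sketch of that fan-out. It only builds the request bodies, one per target language. The helper name `translation_payloads` and the `example.com` URL are hypothetical; the field names follow the request schema of this method.

```python
def translation_payloads(url, target_languages, audio_language=None):
    """Build one request body per target subtitle language."""
    payloads = []
    for language in target_languages:
        body = {
            "task_name": "transcription",
            "url": url,
            "subtitles_language": language,
        }
        # Leave "audio_language" out to keep automatic language detection.
        if audio_language is not None:
            body["audio_language"] = audio_language
        payloads.append(body)
    return payloads


# Hypothetical input: a German video translated into English and French.
bodies = translation_payloads(
    "https://example.com/video.mp4", ["eng", "fre"], audio_language="ger"
)
```

Each element of `bodies` would then be sent as its own POST request, and each request returns its own task ID.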

![Auto generated subtitles example](https://demo-files.gvideo.io/apidocs/captions.gif)


Use MP4 videos for processing. This method is not limited to videos stored in our video hosting (see how to get a link to an MP4 rendition), so you can also use a link to any external file accessible over HTTP/HTTPS.

  


For now, only the first audio track can be processed; later this functionality will be extended to allow any track to be used.
Also, not all language pairs are currently supported. If a language pair is not supported for automatic translation, the task status will be FAILURE with a description of the reason.
Example: `eng => uzb`.
To request a language pair you need for automatic translation, contact our support.
  


Examples of modes to transcribe and/or translate:
- Auto language detection: `{ "url":"..." }`
- From German explicitly: `{ "url":"...", "audio_language":"ger" }`
- From any auto-detected language to English: `{ "url":"...", "subtitles_language":"eng" }`
- From German to English explicitly: `{ "url":"...", "audio_language":"ger", "subtitles_language":"eng" }`
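
The four modes differ only in which optional language fields are present. As an illustration, a small Python helper can assemble the body for any of them. This is a sketch under assumptions: `build_payload` is a hypothetical name, the `example.com` URL is a placeholder, and `"task_name": "transcription"` is added because the request schema of this method lists it as required. The check only verifies that a code has the 3-letter shape; it does not check that the language is actually supported.

```python
import json
import re

# ISO 639-2 codes are three lowercase letters, for example "ger" or "eng".
_CODE = re.compile(r"[a-z]{3}")


def build_payload(url, audio_language=None, subtitles_language=None):
    """Build a request body; omitted languages are left to auto-detection."""
    body = {"task_name": "transcription", "url": url}
    optional = (
        ("audio_language", audio_language),
        ("subtitles_language", subtitles_language),
    )
    for key, value in optional:
        if value is None:
            continue
        if not _CODE.fullmatch(value):
            raise ValueError(f"{key} must be a 3-letter code, got {value!r}")
        body[key] = value
    return body


# The four modes from the list above, in the same order.
video = "https://example.com/video.mp4"
modes = [
    build_payload(video),
    build_payload(video, audio_language="ger"),
    build_payload(video, subtitles_language="eng"),
    build_payload(video, audio_language="ger", subtitles_language="eng"),
]
for body in modes:
    print(json.dumps(body))
```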

Example of creating a task to process an MP4 file (the video from the animated GIF above):

```
curl -L 'https://api.gcore.com/streaming/ai/tasks' \
-H 'Content-Type: application/json' \
-H 'Authorization: APIKey 1234$abcd...' \
-d '{
    "task_name": "transcription",
    "url": "https://demo-files.gvideo.io/apidocs/spritefright-blender-cut30sec.mp4"
}'
```
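
The same kind of request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function names are hypothetical, the URL is assumed from the `/streaming/ai/tasks` path in the OpenAPI section of this page, and the `task_id` field comes from the documented 201 response. Building the request is separated from sending it so the request can be inspected without network access.

```python
import json
import urllib.request

# Assumed from the path in the OpenAPI section of this page.
API_URL = "https://api.gcore.com/streaming/ai/tasks"


def build_request(api_key, payload):
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"APIKey {api_key}",
        },
        method="POST",
    )


def create_task(api_key, payload):
    """Send the request and return the ID of the created AI task.

    urlopen raises urllib.error.HTTPError for 4xx answers such as
    400 (bad request) or 422 (feature not enabled).
    """
    with urllib.request.urlopen(build_request(api_key, payload)) as response:
        return json.load(response)["task_id"]


# Inspect the request locally; "test-key" is a placeholder, not a real key.
request = build_request(
    "test-key",
    {
        "task_name": "transcription",
        "url": "https://demo-files.gvideo.io/apidocs/spritefright-blender-cut30sec.mp4",
    },
)
```

Calling `create_task` with a real API key would submit the task; the returned ID is then used to check the status and fetch the result.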


As described above, transcription is done automatically using AI. Therefore, the quality may differ from a manual transcription made by a professional. If that happens, you can download the subtitles and correct them in an external editor.

  


Transcription and translation are two different AI tasks:
- Transcription: performed when only transcription is requested.
- Translation: performed when a language other than the original is set for the subtitles.

Billing takes into account the duration of the analyzed original video.

  


At the heart of transcription is the Whisper AI model from OpenAI, with additional optimisations and services. The AI models run on our own infrastructure, so files and data are not transferred to any external services. After processing, the original files are also deleted from the local storage of the AI service.

Read more about our solution, its architecture, and its benefits in the knowledge base and blog.



## OpenAPI

````yaml /api-reference/services_documented/streaming_api.yaml post /streaming/ai/tasks#transcribe
openapi: 3.1.0
info:
  title: Gcore OpenAPI – Streaming API
  description: >-
    This OpenAPI is an aggregated OpenAPI specification that unifies all Gcore
    products into a single file. It covers Cloud, CDN, DNS, WAAP, DDoS
    Protection, Object Storage, Streaming, and FastEdge services.
  version: '2026-05-15T06:37:28.230198+00:00'
servers:
  - url: https://api.gcore.com
security:
  - APIKey: []
tags:
  - name: AI
    x-displayName: AI
  - name: Broadcasts
    x-displayName: Broadcasts
  - name: Directories
    x-displayName: Directories
  - name: Overlays
    x-displayName: Overlays
  - name: Players
    x-displayName: Players
  - name: Playlists
    x-displayName: Playlists
  - name: QualitySets
    x-displayName: QualitySets
  - name: Restreams
    x-displayName: Restreams
  - name: Streaming Statistics
    x-displayName: Statistics
  - name: Streams
    x-displayName: Streams
  - name: Subtitles
    x-displayName: Subtitles
  - name: Videos
    x-displayName: Videos
paths:
  /streaming/ai/tasks#transcribe:
    post:
      tags:
        - AI
      summary: Create AI ASR task
      description: >-
        Transcribing is the process of writing down the words you hear in an
        audio recording.

        Our solution allows you to transcribe audio from your video and get
        subtitles automatically. To do this, we use modern AI models.


        The result:

        - Transcription – subtitles in the original language. For example, if
        the audio is in English, the subtitles are in English too; if the audio
        is in German, the subtitles are in German too.

        - Translation – subtitles are translated from the original language to
        any other language.


        **How to use?**


        - Explicit call to this AI method. Applicable to any file stored with
        us or located on the Internet.

        - Standard video upload with automatic subtitle generation. See
        ["VOD uploading"](/docs/api-reference/streaming/videos/create-video).



        **What language will the subtitles be in?**


        You can set the languages explicitly: the source language of the audio
        ("audio_language") and the language of the resulting subtitles
        ("subtitles_language").

        If the source language is not set, the system runs automatic language
        identification and the subtitles are created in the detected language.
        Language identification is also based on AI analysis.



        Additionally, when the source language is not set, we support
        recognition of alternating languages in the video (code-switching). For
        example, when different speakers in a video speak several languages, or
        when they switch from their native language to English and back. So
        when a video contains multiple languages, it is better not to specify
        "audio_language"; otherwise the AI may be forced to recognize
        everything as a single language and produce gibberish.


          


        **What can be transcribed?**


        The service uses additional methods to detect the presence of speech in
        the audio track, which improves recognition of human conversations:

        - Speech of one speaker,

        - Speech of several speakers,

        - Speech in different languages,

        - etc.


        Limitation: for music, lyrics will most likely not be transcribed.

          


        **What about translation?**


        It is also possible to automatically translate the subtitles from the
        original language into another language.


        To create a translation, specify the desired language explicitly in the
        "subtitles_language" parameter. Otherwise, the subtitles will be in the
        original language. To translate into several languages, create a
        separate task for each language.


        ![Auto generated subtitles
        example](https://demo-files.gvideo.io/apidocs/captions.gif)



        Use MP4 videos for processing. This method is not limited to videos
        stored in our video hosting (see how to get a link to an MP4
        rendition), so you can also use a link to any external file accessible
        over HTTP/HTTPS.

          


        For now, only the first audio track can be processed; later this
        functionality will be extended to allow any track to be used.

        Also, not all language pairs are currently supported. If a language pair
        is not supported for automatic translation, the task status will be
        FAILURE with a description of the reason.

        Example: `eng => uzb`.

        To request a language pair you need for automatic translation, contact
        our support.
          


        Examples of modes to transcribe and/or translate:

        - Auto language detection: `{ "url":"..." }`

        - From German explicitly: `{ "url":"...",
        "audio_language":"ger" }`

        - From any auto-detected language to English: `{ "url":"...",
        "subtitles_language":"eng" }`

        - From German to English explicitly: `{ "url":"...",
        "audio_language":"ger", "subtitles_language":"eng" }`


        Example of creating a task to process an MP4 file (the video from the
        animated GIF above):


        ```

        curl -L 'https://api.gcore.com/streaming/ai/tasks' \

        -H 'Content-Type: application/json' \

        -H 'Authorization: APIKey 1234$abcd...' \

        -d '{
            "task_name": "transcription",
            "url": "https://demo-files.gvideo.io/apidocs/spritefright-blender-cut30sec.mp4"
        }'

        ```



        As described above, transcription is done automatically using AI.
        Therefore, the quality may differ from a manual transcription made by a
        professional. If that happens, you can download the subtitles and
        correct them in an external editor.

          


        Transcription and translation are two different AI tasks:

        - Transcription: performed when only transcription is requested.

        - Translation: performed when a language other than the original is set
        for the subtitles.


        Billing takes into account the duration of the analyzed original video.

          


        At the heart of transcription is the Whisper AI model from OpenAI, with
        additional optimisations and services. The AI models run on our own
        infrastructure, so files and data are not transferred to any external
        services. After processing, the original files are also deleted from
        the local storage of the AI service.


        Read more about our solution, its architecture, and its benefits in the
        knowledge base and blog.
      operationId: post_ai_transcribe
      parameters: []
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ai_transcribe'
      responses:
        '201':
          description: >-
            Response returns ID of the created AI task. Using this AI task ID,
            you can check the status and get the video processing result. Look
            at GET /ai/results method.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ai_post_response'
        '400':
          description: |-
            Bad request:
            - "url" is not specified,
            - Queue limit reached (100), try later,
            - etc
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/streaming_error'
        '422':
          description: >-
            This is advanced functionality; to enable it, contact your manager
            or the Support Team.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/upgraderequired'
components:
  schemas:
    ai_transcribe:
      type: object
      required:
        - url
        - task_name
      properties:
        task_name:
          type: string
          description: Name of the task to be performed
          enum:
            - transcription
        url:
          type: string
          description: >-
            URL to the MP4 file to analyse. File must be publicly accessible via
            HTTP/HTTPS.
        audio_language:
          type: string
          description: >-
            Language in original audio (transcription only). This value is used
            to determine the language from which to transcribe.


            If this is not set, the system will run auto language identification
            and the subtitles will be in the detected language. The method also
            works based on AI analysis. It's fairly accurate, but if it's wrong,
            then set the language explicitly.


            Additionally, when this is not set, we also support recognition of
            alternate languages in the video (language code-switching).


            Language is set by 3-letter language code according to ISO-639-2
            (bibliographic code). 


            We can process languages:

            - 'afr': Afrikaans

            - 'alb': Albanian

            - 'amh': Amharic

            - 'ara': Arabic

            - 'arm': Armenian

            - 'asm': Assamese

            - 'aze': Azerbaijani

            - 'bak': Bashkir

            - 'baq': Basque

            - 'bel': Belarusian

            - 'ben': Bengali

            - 'bos': Bosnian

            - 'bre': Breton

            - 'bul': Bulgarian

            - 'bur': Myanmar

            - 'cat': Catalan

            - 'chi': Chinese

            - 'cze': Czech

            - 'dan': Danish

            - 'dut': Dutch

            - 'eng': English

            - 'est': Estonian

            - 'fao': Faroese

            - 'fin': Finnish

            - 'fre': French

            - 'geo': Georgian

            - 'ger': German

            - 'glg': Galician

            - 'gre': Greek

            - 'guj': Gujarati

            - 'hat': Haitian creole

            - 'hau': Hausa

            - 'haw': Hawaiian

            - 'heb': Hebrew

            - 'hin': Hindi

            - 'hrv': Croatian

            - 'hun': Hungarian

            - 'ice': Icelandic

            - 'ind': Indonesian

            - 'ita': Italian

            - 'jav': Javanese

            - 'jpn': Japanese

            - 'kan': Kannada

            - 'kaz': Kazakh

            - 'khm': Khmer

            - 'kor': Korean

            - 'lao': Lao

            - 'lat': Latin

            - 'lav': Latvian

            - 'lin': Lingala

            - 'lit': Lithuanian

            - 'ltz': Luxembourgish

            - 'mac': Macedonian

            - 'mal': Malayalam

            - 'mao': Maori

            - 'mar': Marathi

            - 'may': Malay

            - 'mlg': Malagasy

            - 'mlt': Maltese

            - 'mon': Mongolian

            - 'nep': Nepali

            - 'nno': Nynorsk

            - 'nor': Norwegian

            - 'oci': Occitan

            - 'pan': Punjabi

            - 'per': Persian

            - 'pol': Polish

            - 'por': Portuguese

            - 'pus': Pashto

            - 'rum': Romanian

            - 'rus': Russian

            - 'san': Sanskrit

            - 'sin': Sinhala

            - 'slo': Slovak

            - 'slv': Slovenian

            - 'sna': Shona

            - 'snd': Sindhi

            - 'som': Somali

            - 'spa': Spanish

            - 'srp': Serbian

            - 'sun': Sundanese

            - 'swa': Swahili

            - 'swe': Swedish

            - 'tam': Tamil

            - 'tat': Tatar

            - 'tel': Telugu

            - 'tgk': Tajik

            - 'tgl': Tagalog

            - 'tha': Thai

            - 'tib': Tibetan

            - 'tuk': Turkmen

            - 'tur': Turkish

            - 'ukr': Ukrainian

            - 'urd': Urdu

            - 'uzb': Uzbek

            - 'vie': Vietnamese

            - 'wel': Welsh

            - 'yid': Yiddish

            - 'yor': Yoruba
          default: null
        subtitles_language:
          type: string
          description: >-
            Language to translate the subtitles into.

            If this is not set, the original language from attribute
            "audio_language" will be used.


            Please note that:

            - transcription into the original language is free,

            - translation from the original language into any other language
            is a "translation" procedure and is paid. More details in
            [POST
            /streaming/ai/tasks#transcribe](/docs/api-reference/streaming/ai/create-ai-asr-task).

            Language is set by 3-letter language code according to ISO-639-2
            (bibliographic code).
          default: null
        client_user_id:
          type: string
          maxLength: 256
          default: null
          description: >-
            Meta parameter, designed to store your own identifier. Can be used
            by you to tag requests from different end-users. It is not used in
            any way in video processing.
        client_entity_data:
          type: string
          maxLength: 4096
          default: null
          description: >-
            Meta parameter, designed to store your own extra information about a
            video entity: video source, video id, etc. It is not used in any way
            in video processing.


            For example, if an AI task was created automatically when you
            uploaded a video with the AI auto-processing option (transcription,
            translation), then the ID of the associated video for which the
            task was performed will be explicitly indicated here.
      example:
        url: https://demo-files.gvideo.io/apidocs/spritefright-blender-cut30sec.mp4
        task_name: transcription
        audio_language: ger
    ai_post_response:
      type: object
      required:
        - task_id
      properties:
        task_id:
          type: string
          format: uuid
          description: >-
            ID of the created AI task, from which you can get the execution
            result
      example:
        task_id: aafe70c6-0000-0000-0000-327b65f7670f
    streaming_error:
      type: object
      properties:
        error:
          type: string
          description: Text message with description of error.
      example:
        error: Queue limit reached (100), try later.
    upgraderequired:
      type: object
      properties:
        error:
          type: string
          description: >-
            This is advanced functionality; to enable it, contact your manager
            or support service.
      example:
        error: Feature is disabled. Contact support to enable.
  securitySchemes:
    APIKey:
      description: >-
        API key for authentication. Make sure to include the word `apikey`,
        followed by a single space and then your token.

        Example: `apikey 1234$abcdef`
      type: apiKey
      in: header
      name: Authorization

````