Create AI ASR task
Transcription is the process of writing down the words heard in an audio recording. Our solution transcribes the audio from your video and generates subtitles automatically, using modern AI models. The result:
- Transcription – subtitles in the original language. For example, if the audio is in English, the subtitles are in English; if the audio is in German, the subtitles are in German.
- Translation – subtitles translated from the original language into another language.
How to use?
- Explicit call to this AI method. Applicable to any file stored with us or located on the Internet.
- Standard video upload with automatic subtitle generation. See “VOD uploading”.
What language will the subtitles be in?
You can specify the languages explicitly, and they will be used to create the subtitles: the source language of the audio (“audio_language”) and the resulting subtitle language (“subtitles_language”).
If the language is not set, the system runs automatic language identification, and the subtitles are in the detected language. Language identification is also based on AI analysis.
Additionally, when the language is not set, we support recognition of alternating languages in a video (code-switching), for example when different speakers use different languages, or when a speaker switches from their native language to English and back. So when a video contains multiple languages, it is better not to specify “audio_language”; otherwise the forced language may cause the AI to produce gibberish.
What can be transcribed?
The service uses additional methods to detect the presence of speech in the audio track, which improves the detection of human conversation:
- Speech of one speaker,
- Speech of several speakers,
- Speech in different languages,
- etc.
Restriction: for music, subtitles with lyrics will most likely not be created.
What about translation?
Subtitles can also be translated automatically from the original language into another language.
To create a translation, specify the desired language explicitly in the “subtitles_language” parameter. Otherwise, the subtitles will be in the original language. To translate into several languages, create a separate task for each language.
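Since each task produces subtitles in one language, translating a single video into several languages means preparing several request bodies. A minimal sketch of that step follows. The helper name `translation_payloads` and the example URL are hypothetical, and only the “eng” and “ger” codes that appear in this document are used.

```python
import json


def translation_payloads(url, subtitles_languages, audio_language=None):
    """Return one request body per target language (hypothetical helper)."""
    payloads = []
    for language in subtitles_languages:
        body = {"url": url, "subtitles_language": language}
        # Set the source language only when it is known;
        # when it is omitted, the service auto-detects it.
        if audio_language is not None:
            body["audio_language"] = audio_language
        payloads.append(body)
    return payloads


if __name__ == "__main__":
    # Placeholder URL; each printed body would be sent as its own task.
    for body in translation_payloads("https://example.com/video.mp4", ["eng", "ger"]):
        print(json.dumps(body))
```

Each body is then submitted as its own task, and each task reports its own status.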
Use MP4 videos for processing. This method is not limited to videos stored in our video hosting (see how to get a link to an MP4 rendition); you can also use a link to any external file accessible over HTTP/HTTPS.
Currently, only the first audio track can be processed; later this functionality will be extended to allow any track to be used.
Also, not all language pairs are currently supported. If a language pair is not supported for automatic translation, the task status will be FAILURE with a description of the reason.
Example: eng => uzb.
To request a language pair for automatic translation, contact our support.
Examples of modes to transcribe and/or translate:
- Auto language detection:
{ "url":"..." }
- From German explicitly:
{ "url":"...", "audio_language":"ger" }
- From any auto-detected language to English explicitly:
{ "url":"...", "subtitles_language":"eng" }
- From German to English explicitly:
{ "url":"...", "audio_language":"ger", "subtitles_language":"eng" }
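The four modes above differ only in which optional parameters are present in the body. As an illustration, the sketch below builds each variant from the same function. The function name `build_task_body` and the example URL are assumptions made for this example; only the “url”, “audio_language” and “subtitles_language” fields come from this document.

```python
import json


def build_task_body(url, audio_language=None, subtitles_language=None):
    """Build the JSON body for a task, omitting parameters that are not set."""
    body = {"url": url}
    if audio_language:
        # Forces the source language instead of auto-detection.
        body["audio_language"] = audio_language
    if subtitles_language:
        # Requests subtitles in this language (a translation if it differs from the source).
        body["subtitles_language"] = subtitles_language
    return body


if __name__ == "__main__":
    url = "https://example.com/video.mp4"  # placeholder URL
    print(json.dumps(build_task_body(url)))
    print(json.dumps(build_task_body(url, audio_language="ger")))
    print(json.dumps(build_task_body(url, subtitles_language="eng")))
    print(json.dumps(build_task_body(url, audio_language="ger", subtitles_language="eng")))
```

Leaving a parameter out of the body, rather than sending an empty value, keeps auto-detection in effect for videos with several languages.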
Example of creating a task to process an MP4 file (the animated GIF shown above):
curl -L 'https://api.gcore.com/streaming/ai/transcribe' \
-H 'Content-Type: application/json' \
-H 'Authorization: APIKey 1234$abcd...' \
-d '{
"url": "https://demo-files.gvideo.io/apidocs/spritefright-blender-cut30sec.mp4"
}'
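For readers calling the API from code rather than from the command line, the same request can be sketched with the Python standard library. This is an illustration, not an official client: the function names are hypothetical, the endpoint and header format are copied from the curl example, and the shape of the returned JSON is not assumed beyond it being an object.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.gcore.com/streaming/ai/transcribe"


def build_transcribe_request(api_key, body):
    """Prepare (but do not send) the POST request for a transcription task."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"APIKey {api_key}",
        },
        method="POST",
    )


def create_task(api_key, body, timeout=30):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_transcribe_request(api_key, body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `create_task(api_key, {"url": "https://demo-files.gvideo.io/apidocs/spritefright-blender-cut30sec.mp4"})` would mirror the curl example. Splitting request construction from sending makes the headers and body easy to inspect before anything goes over the network.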
As described above, transcription is done automatically using AI. Therefore, the quality may differ from manual transcription by a professional. If needed, you can download the subtitles and edit them in an external editor.
Transcription and translation are two different AI tasks:
- Transcription, if only transcription is requested.
- Translation, if a language other than the original is set for the subtitles.
Billing takes into account the duration of the analyzed original video.
At the heart of transcription is the Whisper AI model from OpenAI, with additional optimisations and services. The AI models run on our own infrastructure, so files and data are not transferred to any external services. After processing, the original files are also deleted from the AI's local storage. Read more about our solution, its architecture, and its benefits in the knowledge base and blog.
Authorizations
API key for authentication.
Body
Response
The response returns the ID of the created AI task. Using this AI task ID, you can check the status and get the video processing result. See the GET /ai/results method.
The response is of type object.
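Checking the status with the task ID usually means polling until the task finishes. The sketch below shows one way to structure that loop. It rests on several assumptions not confirmed by this page: that the result object has a “status” field, and that “SUCCESS” is the name of the successful terminal status (only “FAILURE” is mentioned here). The call to GET /ai/results is left to a caller-supplied function, because its exact URL and response format are documented separately.

```python
import time

# Assumption: "FAILURE" appears in this document; "SUCCESS" is a guessed name
# for the successful terminal status and may differ in the real API.
TERMINAL_STATUSES = {"SUCCESS", "FAILURE"}


def wait_for_result(fetch_result, task_id, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Poll until the task reaches a terminal status or attempts run out.

    fetch_result is a caller-supplied function that calls GET /ai/results
    for task_id and returns the parsed JSON as a dict.
    """
    for attempt in range(max_attempts):
        result = fetch_result(task_id)
        # Assumption: the result object exposes its state in a "status" field.
        if result.get("status") in TERMINAL_STATUSES:
            return result
        # Wait before the next check, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} attempts")
```

Passing `fetch_result` and `sleep` as arguments keeps the loop independent of any HTTP library and lets it be exercised with canned responses.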