Captioning with Kaltura
When media is uploaded to Kaltura, it is automatically captioned using automatic speech recognition software. These machine-generated captions provide a starting point for creating accessible content. However, they are not reliable enough to provide equitable access and should be reviewed and edited so that they accurately represent the audio content of the media. By default, machine captioning assumes that the spoken language is English, but video owners can request automated captions in any of the 15 other available source languages. This is not a translation service, but selecting the correct source language improves the direct transcription of the content. If desired, translation services can be used after the captions are fully corrected to produce more useful subtitles.
Accessing the Caption Editor
The caption editor can be accessed by editing the media through your MediaSpace account.
- Log in to Kaltura MediaSpace.
- Navigate to the My Media page.
- From the media's menu, select Edit.

- On the Edit Media page, select the Captions tab from the navigation options.
- Select the Edit Captions button to launch the Caption Editor.

Caption Editor Features
The caption editor provides access to all caption files associated with the media. Copies of media often have multiple versions, and Zoom recordings may have associated caption files created by other services. When more than one file exists, they can be accessed from the Captions dropdown menu. The editor has a fully functional search and replace option for quick navigation or bulk replacements, as well as an option to add speaker tags to one or more cues at once. Changes can be saved or reverted as desired, and the captions can be set to auto scroll while previewing the media to validate the timing and accuracy of the current caption file.
Updating Caption Details
- Select the desired caption file in the Captions dropdown.
- Select Edit details to set the correct language, accuracy percentage, and label that will appear in the caption selector.
- The default Language is English. This can be changed from the dropdown if needed.
- Caption fidelity is indicated with a slider that adjusts the accuracy percentage. The goal is 100 percent. The default percentage is set by an algorithm and should be adjusted after completing new edits.
- The label needs to be adjusted because Auto Generated indicates a caption file that has not been reviewed or manually edited. After editing the captions, delete the Auto Generated text from the label and select Save.


Having accurate labels can ensure that the proper file is available for viewers.
The media editor displays all caption files in the Captions tab and indicates whether each caption file is displayed on the player and whether it is set as the default caption file.

The Actions column provides several icons for each caption file. The check mark sets the file as the default caption file used when a viewer enables captions on the player. The pencil icon gives quick access to the language, accuracy, and label. The x icon deletes the file, and the download icon saves the file locally. The final icon in the Actions column is a toggle button that hides the caption file or displays it as an option for viewers. An icon with a diagonal line through it indicates that the file is hidden and is not available to viewers. To make the caption file available to viewers, make sure the icon does not have the diagonal line across it.
Search and Replace Caption Content
The search option allows individuals to find specific words or phrases within the caption file. If you use the Replace with option, every instance that matches the search criteria is replaced with the new content.
To use the Replace with feature:
- Type a word or phrase into the Search in Captions field to find its instances within the caption file.
- Enter a replacement word or phrase into the Replace with field.
- Select Replace.
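Because Replace changes every match at once, it can help to preview a bulk replacement on a downloaded copy of the caption text before applying it in the editor. The following is a minimal Python sketch of that idea, not Kaltura's own code. The function name replace_in_captions and the sample cue text are hypothetical, and the case-insensitive, whole-word matching used here is an assumption; the editor's own matching rules may differ.

```python
import re

def replace_in_captions(cues, search, replacement):
    """Return (new_cues, count) after replacing every match of `search`.

    Matching is case-insensitive and limited to whole words. Both choices
    are assumptions for this sketch, not documented editor behavior.
    """
    pattern = re.compile(r"\b" + re.escape(search) + r"\b", re.IGNORECASE)
    new_cues = []
    total = 0
    for text in cues:
        # A function replacement keeps `replacement` literal (no backslash escapes).
        new_text, n = pattern.subn(lambda match: replacement, text)
        new_cues.append(new_text)
        total += n
    return new_cues, total

# Hypothetical cue text containing a typical speech recognition error.
cues = ["Welcome to cultura media space.", "Cultura stores your captions."]
updated, count = replace_in_captions(cues, "cultura", "Kaltura")
print(count)
for line in updated:
    print(line)
```

Checking the match count before committing a replacement is a simple way to catch a search term that is broader than intended.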

Add Speaker Tags
Speaker tags are required to identify who is speaking when media includes multiple speakers.
To use the Add Speaker feature:
- Select the check boxes for one or more caption cues that were spoken by a specific individual.
- Enter the speaker’s name in the Add Speaker field.
- Select Add.

Editing Captions
Auto-generated captions need to be reviewed and usually adjusted for full ADA compliance. When editing captions, it is important to follow the guidelines and best practices for captioning educational videos to help comply with ADA requirements for fully accessible content. The Described and Captioned Media Program provides a comprehensive Caption Key that should be used.
For more information on understanding captions, visit the Web Accessibility Initiative site from the World Wide Web Consortium.
Steps to Edit Captions
- Locate the block of text containing content that should be edited.
- Play the media clip to check for accuracy and timing. The captions are highlighted in light blue as they appear in the media.
- Edit the block of text for the caption cue. The total characters per block are displayed in real time for reference.

- Separate the caption blocks as needed, following the caption guidelines, so that modifiers are not split from the words they modify and each caption cue is between 1 and 6 seconds in duration, with 64 characters or fewer of content on no more than 2 lines of text.
- To add a new caption cue, select the add caption button at the desired location between existing caption cue blocks.
- Select Save.
- A confirmation window will provide an option to select Save to save or Cancel to return to editing.
- Review the captions by viewing the mini media player with the captions enabled.
- When finished captioning the media, remember to save one last time, then select Back to return to the media's Edit page.
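The length and timing limits in the steps above (1 to 6 seconds, 64 characters or fewer, no more than 2 lines) can also be checked mechanically on a downloaded caption file. The following is a minimal Python sketch under stated assumptions: the Cue class and check_cue function are hypothetical names, times are given in seconds, and line breaks are not counted toward the character total, which may differ from how the editor counts characters.

```python
from dataclasses import dataclass

# Limits taken from the caption guidelines in the steps above.
MIN_SECONDS = 1.0
MAX_SECONDS = 6.0
MAX_CHARS = 64
MAX_LINES = 2

@dataclass
class Cue:
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # caption text; lines separated by "\n"

def check_cue(cue):
    """Return a list of guideline problems for one cue (empty if none)."""
    problems = []
    duration = cue.end - cue.start
    if duration < MIN_SECONDS or duration > MAX_SECONDS:
        problems.append(f"duration {duration:.1f}s is outside 1-6 seconds")
    lines = cue.text.split("\n")
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines, limit is {MAX_LINES}")
    # Assumption: line breaks do not count toward the character limit.
    chars = sum(len(line) for line in lines)
    if chars > MAX_CHARS:
        problems.append(f"{chars} characters, limit is {MAX_CHARS}")
    return problems

# Hypothetical cues: one within the limits, one too short in duration.
good = Cue(1.0, 4.0, "Hello, welcome to this video.")
short = Cue(0.0, 0.5, "Hi.")
print(check_cue(good))
print(check_cue(short))
```

A check like this only flags length and timing. Decisions such as where to break a line so that modifiers stay with the words they modify still need a human reviewer.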
Advanced Styling Options
Kaltura's players support advanced styling options for caption cue blocks, but these changes cannot be applied through the online caption editor. Using WebVTT file formatting, caption blocks can include basic text styling such as bold, italics, or underlines. More importantly, the display location can be adjusted with positioning or alignment settings defined in the W3C standard. This helps ensure that on-screen content isn't obscured and can also add clarity when multiple speakers contribute to a dialogue.
Steps for including advanced styling
- Open a plain text editor. You can create a .vtt file using any plain text editor, such as Notepad or TextEdit.
- Add the header on the first line by typing WEBVTT.
- Separate each caption cue with a blank line, including a blank line between the WEBVTT header and the first cue.
- Ensure time range formatting has the proper structure (00:00:00.000 --> 00:00:00.000) and that the caption cues include the desired text.
- Add styling attributes after the time range. For example, 00:00:00.000 --> 00:00:03.000 line:0 displays that caption block at the top of the player.
WEBVTT

1
00:00:01.000 --> 00:00:04.000 line:0
Hello, welcome to this video.

2
00:00:05.000 --> 00:00:09.000
This is an example of a WebVTT file.
- Save the file with a .vtt extension and set the encoding to UTF-8.
- Upload the new caption file and change it to the default (if multiple caption files exist) using the Set as default link.
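For longer media, the file structure described in the steps above can be generated with a script instead of typed by hand. The following is a minimal Python sketch, not a Kaltura tool. The function names format_timestamp, build_webvtt, and save_webvtt are hypothetical, and cues are assumed to be supplied as (start seconds, end seconds, text, settings) tuples.

```python
def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"

def build_webvtt(cues):
    """Build WebVTT text from (start, end, text, settings) tuples."""
    blocks = ["WEBVTT"]  # header must be the first line of the file
    for number, (start, end, text, settings) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        if settings:
            # Cue settings such as line:0 follow the time range.
            timing += " " + settings
        blocks.append(f"{number}\n{timing}\n{text}")
    # A blank line separates the header and each cue block.
    return "\n\n".join(blocks) + "\n"

def save_webvtt(path, cues):
    """Write the cues to `path` as a UTF-8 encoded .vtt file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_webvtt(cues))

cues = [
    (1.0, 4.0, "Hello, welcome to this video.", "line:0"),
    (5.0, 9.0, "This is an example of a WebVTT file.", ""),
]
vtt_text = build_webvtt(cues)
print(vtt_text)
```

Calling save_webvtt("captions.vtt", cues), where captions.vtt is a placeholder file name, would write the same text to disk with UTF-8 encoding, ready to upload as a new caption file.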
