Published: August 20, 2020
Captions and screen reader descriptions are the only way many users can experience your videos, and in some jurisdictions, they're even required by law or regulation. The WebVTT (Web Video Text Tracks) format is used to describe timed text data, such as closed captions or subtitles, to make your videos more accessible.
Add <track> tags
To add captions or screen reader descriptions to a web video, add a <track> tag within a <video> tag. In addition to captions and screen reader descriptions, <track> tags may also be used for subtitles and chapter titles.
The <track> tag is similar to the <source> element in that both have a src attribute that points to referenced content. For a <track> tag, it points to a WebVTT file.
The label attribute specifies how a particular track can be identified in the interface.
To provide tracks for multiple languages, add a separate <track> tag for each WebVTT file you're providing and indicate the language using the srclang attribute.
Take a look at this example <video> tag with two <track> tags.
Add a <track> element as a child of the <video> element.
<video controls>
  <source src="https://storage.googleapis.com/webfundamentals-assets/videos/chrome.webm" type="video/webm" />
  <source src="https://storage.googleapis.com/webfundamentals-assets/videos/chrome.mp4" type="video/mp4" />
  <track src="chrome-subtitles-en.vtt" label="English captions" kind="captions" srclang="en" default>
  <track src="chrome-subtitles-zh.vtt" label="中文字幕" kind="captions" srclang="zh">
  <p>This browser does not support the video element.</p>
</video>
There's also a sample you can view on Glitch.
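Because each <track> tag carries a srclang value, a script can switch between the tracks at runtime. The following is a minimal sketch rather than a definitive implementation: the function name showTrackForLanguage is hypothetical, and it assumes it is given the video's textTracks list (or any array-like collection of objects with kind, language, and mode properties). Only captions and subtitles tracks are touched.

```javascript
// Show the first captions/subtitles track whose language matches `lang`
// and disable the other captions/subtitles tracks.
// `textTracks` is expected to be a TextTrackList such as video.textTracks,
// but any array-like collection of { kind, language, mode } objects works.
function showTrackForLanguage(textTracks, lang) {
  let shown = null;
  for (const track of Array.from(textTracks)) {
    // Leave chapters, descriptions, and metadata tracks alone.
    if (track.kind !== "captions" && track.kind !== "subtitles") continue;
    if (shown === null && track.language === lang) {
      track.mode = "showing";
      shown = track;
    } else {
      track.mode = "disabled";
    }
  }
  return shown; // null when no track matches
}

// Browser usage (assumes the page contains the <video> element shown above):
// const video = document.querySelector("video");
// showTrackForLanguage(video.textTracks, "zh");
```

If no track matches the requested language, every captions or subtitles track ends up disabled, so a caller may want to fall back to a default language.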
WebVTT file structure
Here's a hypothetical WebVTT file for the demo. This is a text file containing a series of cues. Each cue is a block of text to display on screen, and the time range during which it's displayed.
WEBVTT

00:00.000 --> 00:04.999
Man sitting on a tree branch, using a laptop.

00:05.000 --> 00:08.000
The branch breaks, and he starts to fall.

...
Each item within the track file is a cue. Each cue has a start time and end time, separated by an arrow, followed by cue text. Cues can also have IDs, such as railroad and manuscript. Cues are separated by an empty line.
WEBVTT

railroad
00:00:10.000 --> 00:00:12.500
Left uninspired by the crust of railroad earth

manuscript
00:00:13.200 --> 00:00:16.900
that touched the lead to the pages of your manuscript.
Cue times are in hours:minutes:seconds.milliseconds format, and parsing is strict: numbers must be zero-padded where necessary. Hours (which can be omitted, as in the first example), minutes, and seconds must each have two digits (00 for a zero value), and milliseconds must have three digits (000 for a zero value). There is an excellent WebVTT validator at Live WebVTT Validator, which checks for errors in time formatting and for problems such as non-sequential times.
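To make the padding rules concrete, here is a small sketch of a strict timestamp check. The names TIMESTAMP_PATTERN and parseTimestampMs are hypothetical, and the pattern assumes the hours field may be left out, as in the first example file. It illustrates the rules described here and is not a replacement for a full validator.

```javascript
// Accepts mm:ss.ttt or hh:mm:ss.ttt. Minutes and seconds need exactly two
// digits (00-59), milliseconds exactly three, hours at least two if present.
const TIMESTAMP_PATTERN = /^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$/;

// Returns the timestamp as a whole number of milliseconds,
// or null when the text does not follow the strict format.
function parseTimestampMs(text) {
  const match = TIMESTAMP_PATTERN.exec(text);
  if (match === null) return null;
  // An omitted hours group is undefined, so default it to "0".
  const [, hours = "0", minutes, seconds, millis] = match;
  const totalSeconds =
    (Number(hours) * 60 + Number(minutes)) * 60 + Number(seconds);
  return totalSeconds * 1000 + Number(millis);
}

console.log(parseTimestampMs("00:00:12.500")); // 12500
console.log(parseTimestampMs("00:05.000"));    // 5000
console.log(parseTimestampMs("0:05.000"));     // null (minutes not zero-padded)
```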
You can create a VTT file by hand, though there are many services that create them for you.
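If your caption data already lives in a script, generating the file is also straightforward. The sketch below assumes a hypothetical cue shape of { id, startMs, endMs, text }, with times given as non-negative whole milliseconds and cue text that contains neither blank lines nor the --> arrow. The function names formatTimestamp and buildWebVTT are illustrative only.

```javascript
// Format a non-negative whole number of milliseconds as hh:mm:ss.ttt.
function formatTimestamp(totalMs) {
  const pad = (n, width) => String(n).padStart(width, "0");
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Build the text of a WebVTT file: the WEBVTT header, then one block per cue,
// with an empty line between blocks. The cue ID line is optional.
function buildWebVTT(cues) {
  const blocks = cues.map((cue) => {
    const lines = [];
    if (cue.id) lines.push(cue.id);
    lines.push(`${formatTimestamp(cue.startMs)} --> ${formatTimestamp(cue.endMs)}`);
    lines.push(cue.text);
    return lines.join("\n");
  });
  return ["WEBVTT", ...blocks].join("\n\n") + "\n";
}

console.log(buildWebVTT([
  { id: "railroad", startMs: 10000, endMs: 12500,
    text: "Left uninspired by the crust of railroad earth" },
]));
```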
As you can see in our previous examples, the WebVTT format is pretty simple. Just add your text data along with timing.
However, what if you want your captions to render in a different position with left or right alignment? Perhaps to align the captions with the current speaker position, or to stay out of the way of in-camera text. WebVTT defines settings to do that, and more, directly inside the .vtt file. Take note of how the caption placement is defined by adding settings after the time interval definitions.
WEBVTT

00:00:05.000 --> 00:00:10.000 line:0 position:20% size:60% align:start
The first line of the subtitles.
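The settings are plain name:value pairs that follow the end time on the timing line. As a rough illustration of that layout, the sketch below splits a timing line into a settings object. The function name parseCueSettings is hypothetical, and the code does no validation of setting names or values, which a real parser would need.

```javascript
// Extract the name:value settings that follow "start --> end" on a timing line.
// Values are returned as raw strings; nothing is validated.
function parseCueSettings(timingLine) {
  const tokens = timingLine.trim().split(/\s+/);
  const settings = {};
  // tokens[0] is the start time, tokens[1] is "-->", tokens[2] is the end time.
  for (const token of tokens.slice(3)) {
    const colon = token.indexOf(":");
    // Skip tokens without a name or without a value.
    if (colon <= 0 || colon === token.length - 1) continue;
    settings[token.slice(0, colon)] = token.slice(colon + 1);
  }
  return settings;
}

console.log(parseCueSettings(
  "00:00:05.000 --> 00:00:10.000 line:0 position:20% size:60% align:start"
));
// { line: '0', position: '20%', size: '60%', align: 'start' }
```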
Another handy feature is the ability to style cues using CSS. Perhaps you want to use a gray linear gradient as the background, with a foreground color of papayawhip for all captions and all bold text colored peachpuff.
video::cue {
  background-image: linear-gradient(to bottom, dimgray, lightgray);
  color: papayawhip;
}
video::cue(b) {
  color: peachpuff;
}
If you're interested in learning more about styling and tagging of individual cues, the WebVTT specification is a good source for advanced examples.
Kinds of text tracks
Did you notice the kind attribute of the <track> element? It's used to indicate what relation the particular text track has to the video. The possible values of the kind attribute are:
captions: For closed captions from transcripts and possibly translations of any audio. Suitable for hearing-impaired users and for cases when the video is playing muted.
subtitles: For subtitles, that is, translations of speech and text in a language different from the main language of the video.
descriptions: For descriptions of visual parts of the video content. Suitable for visually impaired people.
chapters: Intended to be displayed when the user is navigating within the video.
metadata: Not visible, and may be used by scripts.
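As one illustration of how a script might use a metadata track, the sketch below listens for the cuechange event and passes the text of each active cue to a callback. The function name watchMetadataTrack and the callback are hypothetical. It assumes the track object exposes mode, activeCues, and addEventListener, as a browser TextTrack does, and that what the cue text means is up to your page.

```javascript
// Call `onCue` with the text of every cue that is active
// each time the set of active cues on the track changes.
function watchMetadataTrack(track, onCue) {
  // "hidden" keeps the track active so cue events fire, without rendering it.
  track.mode = "hidden";
  track.addEventListener("cuechange", () => {
    for (const cue of Array.from(track.activeCues || [])) {
      onCue(cue.text);
    }
  });
}

// Browser usage (assumes the video has a <track kind="metadata"> child):
// const video = document.querySelector("video");
// const track = Array.from(video.textTracks).find((t) => t.kind === "metadata");
// if (track) watchMetadataTrack(track, (text) => console.log("cue:", text));
```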
Now that you understand the basics of making a video available and accessible on your web page, you might wonder about more complex use cases. Learn about Media frameworks and how they can help you add videos to your web page, while providing advanced features.