A transcript extension for HTML
W3C Working Group Note 01 October 2015
This version:
Latest published version:
Latest editor's draft:
Editor: Chaals McCathie Nevile, Yandex
Copyright © 2015 W3C (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
Abstract
This document describes an extension to HTML which explicitly identifies a transcript linked to a media object such as audio or video. It was created to meet requirements for transcriptions that are described in the Media Accessibility User Requirements (MAUR).
Status of This Document
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document was developed through the HTML Accessibility Taskforce, and is published by the HTML Working Group with approval by the Protocols and Formats Working Group. It is published as a Working Group Note because the HTML Working Group is reaching the end of its charter; it is hoped and expected that a successor to that Working Group, perhaps the Timed Media Working Group, will continue the work of taking this document to become a W3C Recommendation.
If you wish to make comments regarding this document, please send them to public-html-a11y@w3.org (archives).

All comments are welcome. However, with the end of the charter of the HTML Working Group, comments might not receive a reply until this work is taken up again by a different Working Group.
Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 1 September 2015 W3C Process Document.
Table of Contents
1. Introduction
2. Conformance
3. Use cases and requirements
3.1 Use cases
3.2 Requirements
4. Denoting a transcript
4.1 The transcript element
5. Linking transcripts
5.1 Extending track to allow kind="transcript"
5.1.1 Example 1. Extending allowable track kind
6. Acknowledgements
7. Appendix: Alternative approaches
7.1 Alternative approach: create a new element
7.1.1 The relateTranscript element
7.1.2 Example 2: Using a relateTranscript element
7.2 Alternative approach: Use the source element
7.3 Alternative approach: Use the a element with rel and for attributes
7.3.1 Example 3: Using the a element with rel and for attributes
7.4 Alternative approach: Use an attribute
7.4.1 Example 4: Using a relateTranscript attribute
A. References
A.1 Normative references
1. Introduction
This section is non-normative.
[HTML5] allows the use of audio or video, and includes mechanisms for associating multiple timed tracks. However, where there is a transcript, which may not include timing information, there is no way to provide an explicit association between it and its associated media element.
2. Conformance
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
Throughout this document the terms "must", "should" and "may" must be interpreted in accordance with [RFC2119].
3. Use cases and requirements
Over a number of years use cases and requirements have been extensively discussed by the HTML Working Group, in particular by its accessibility and media task forces. The following section is intended to provide a brief summary of the key information.
3.1 Use cases
This section is non-normative.
Saving bandwidth - inline transcript
    A user chooses to read a transcript included in the same page as a media resource, because their connection will not support downloading the full media resource. Requires:
    Explicit relation
        Users who cannot determine from the default page formatting that part of the page is the transcript need a mechanism that allows an assistive technology to do so.
    Delineation
        Assistive technology needs a way to determine the extent of the transcript, i.e. where it begins and ends.
    Optional consumption
        For assistive technology users, the ability to skip over the transcript, e.g. when navigating through the page, is important.
    Inline transcript
        This is assumed by this use case. For many users who are not relying on assistive technologies, seeing the transcript in the page is sufficient to enable this use case.
Using the transcript for accessibility
    A user reads a linked transcript because the media resource in its original format is inaccessible to them due to a disability. Requires:
    Explicit relation
        The user needs a way to know that there is a transcript available. In certain cases, such as where they have to make a choice based on what kind of resource will be more accessible to them, they need a way for their assistive technology to determine that there is a transcript.
    Optional consumption
        It is important that the user not be forced to read the transcript every time they navigate the page, just as users are not forced to re-watch a full video every time they scroll past it.
Interactive transcript as controller
    A user agent renders a transcript which includes timing information alongside the media resource. Navigating to a particular point in the transcript scrubs through the media resource to that point. Requires:
    Explicit relation
        The user agent needs to determine that a particular resource is a transcript.
    Delineation
        The user agent needs to identify exactly the content that is part of the transcript.
    Format agnostic
        Publishers need to be able to provide transcripts in a format the user agent can use as a controller.
Using a transcript to improve video/audio search
    A search engine uses the explicit association of a transcript to collect textual information that can be reliably associated with an audio or video resource, to improve discoverability of the resource through text-based search. Requires:
    Explicit relation
        The search engine needs to determine that a particular resource is a transcript.
Increasing multilingual discovery of resources
    A publisher produces multiple translated transcripts of a media resource in order to improve discoverability of and access to the media resource. Requires:
    Multiple transcripts
        The publisher needs to link multiple transcripts to the same media resource.
    Linked transcript
        Many publishers do not want to include multiple transcripts in different languages inline in a page.
Use what is there
    A publisher links a script which is available as a PDF or Word document to provide a basic workable transcript for a media resource that would otherwise be inaccessible to some users. Requires:
    Format agnostic
        The publisher needs to be able to associate whatever resource is available.
    Linked transcript
        The publisher needs to be able to link resources which are not HTML. Content management workflows need to allow for resources kept and published separately.
3.2 Requirements
Delineation
    It must be possible to determine which part of a larger resource such as an HTML page is the transcript included within that resource.
Explicit relation
    It must be possible to unambiguously determine that a resource is a transcript for a given media resource.
Format agnostic
    It must be possible to use an arbitrary format for a transcript.
Inline transcript
    It must be possible to include the content of a transcript in the same page as a media resource.
Linked transcript
    It must be possible to link a media resource to an external transcript.
Multiple transcripts
    It must be possible to include more than one transcript for a media resource. It should be possible to differentiate transcripts for a given media resource to allow easy selection of the appropriate transcript for a given use case.
Optional consumption
    It should be possible for a user to choose whether or not to read the transcript. Note that this is particularly important for users interacting with their system through speech output, or for whom large amounts of text make content substantially more difficult to use.
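As a rough illustration of the "Multiple transcripts" requirement, the following Python sketch picks one transcript from several using a user's language preferences. The dictionary layout, the function names and the primary-subtag matching rule are assumptions made for this illustration; none of them is defined by this document.

```python
def primary_subtag(language_tag):
    """Return the lowercased primary subtag, e.g. "fr" for "fr-CA"."""
    return language_tag.lower().split("-")[0]


def choose_transcript(transcripts, preferred_languages):
    """Pick a transcript for the first preferred language that has a match.

    `transcripts` is a list of dicts with "url" and "lang" keys (a
    hypothetical layout); "lang" may be None when no language was given.
    Falls back to the first listed transcript, or None if there are none.
    """
    for wanted in preferred_languages:
        for transcript in transcripts:
            lang = transcript.get("lang")
            if lang and primary_subtag(lang) == primary_subtag(wanted):
                return transcript
    return transcripts[0] if transcripts else None


available = [
    {"url": "#theText", "lang": "en"},
    {"url": "http://transcripts.example.fr/qqchose#laTexte", "lang": "fr"},
]
print(choose_transcript(available, ["fr-CA", "en"])["url"])
```

Without some differentiating information such as a language, a consumer could only fall back to the first transcript listed, which is the situation the second sentence of the requirement aims to avoid.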
4. Denoting a transcript
To meet the delineation requirement this specification defines a new transcript element as follows:
4.1 The transcript element
Categories
    Flow content
    Palpable content
Contexts in which this element can be used
    Where flow content is expected.
Content model
    Flow content
Content attributes
    Global attributes
Tag omission in text/html
    Neither tag is omissible
Allowed ARIA role attribute values:
    Any role value
Allowed ARIA state and property attributes
    Global aria-* attributes
    Any aria-* attributes applicable to the allowed roles
DOM interface
    interface HTMLTranscriptElement : HTMLElement {};
The transcript element can contain any flow content. It represents a transcript for a media resource.
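To show how the transcript element could satisfy the delineation requirement for a consumer such as a search engine or assistive technology, here is a minimal Python sketch that collects the text between the start and end tags of each transcript element. It uses the standard library html.parser module; the class and function names are hypothetical, and no user agent is known to process the element this way.

```python
from html.parser import HTMLParser


class TranscriptExtractor(HTMLParser):
    """Collect the text inside each <transcript> element of an HTML page."""

    def __init__(self):
        super().__init__()
        self.transcripts = []  # list of (id or None, text) in document order
        self._depth = 0        # greater than 0 while inside a <transcript>
        self._current_id = None
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "transcript":
            return
        if self._depth == 0:
            # Outermost <transcript>: this is where the transcript begins.
            self._current_id = dict(attrs).get("id")
            self._chunks = []
        self._depth += 1

    def handle_endtag(self, tag):
        if tag != "transcript" or self._depth == 0:
            return
        self._depth -= 1
        if self._depth == 0:
            # Matching end tag: this is where the transcript ends.
            text = " ".join("".join(self._chunks).split())
            self.transcripts.append((self._current_id, text))

    def handle_data(self, data):
        if self._depth > 0:
            self._chunks.append(data)


def extract_transcripts(html):
    parser = TranscriptExtractor()
    parser.feed(html)
    parser.close()
    return parser.transcripts


page = (
    '<p>Intro</p>'
    '<transcript id="theText">This is the <em>English</em> language transcript...</transcript>'
    '<p>After</p>'
)
print(extract_transcripts(page))
```

Because the element marks both ends of the transcript, the same boundaries could also be used to skip over it, which relates to the optional consumption requirement.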
5. Linking transcripts
A transcript may include timing information, machine-readable or otherwise. The preferred solution includes the link to the transcript within the media element for which it is a transcript, and adds a transcript element as a container for a transcript. This can be included on the page in which the media object is embedded, which is a common use in practice, or can serve to separate multiple transcripts collected in a single page.
5.1 Extending track to allow kind="transcript"
This proposal adds transcript to the set of values defined for the kind attribute of the track element. This requires adding an entry to the table of values defined for the attribute in [HTML5], as follows:
Keyword | State | Brief description
transcript | transcript | Tracks intended to permit use independent of media source. May be displayed by the user agent instead of, or supplementary to, the media resource.
Issue 1: An objection that has been raised to this method is that it may require a change to the current definition of the track element in HTML5, which says that it "allows authors to specify explicit external timed text tracks for media elements", unless a transcript with no timing information included can be considered a "timed text track". However, this definition also appears to conflict with the allowed metadata state for tracks, so it will probably be changed anyway.
5.1.1 Example 1. Extending allowable track kind
Example 1
<video controls src="video.rm">
  <track kind="transcript" title="English transcript" href="#theText">
  <track kind="transcript" hreflang="fr" href="http://transcripts.example.fr/qqchose#laTexte" lang="fr" title="Transcription en français">
  <track kind="captions" src="#YouGetTheIdea,Right?" lang="ru">
</video>
<transcript id="theText">This is the English language transcript...</transcript>
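The explicit relation that this markup provides can be sketched from a consumer's point of view. The Python below, using the standard library html.parser module, lists the transcript tracks declared inside each media element and notes whether each one points into the same page. The class, function and dictionary key names are hypothetical, and the use of href on track simply follows Example 1 rather than any definition in [HTML5].

```python
from html.parser import HTMLParser

MEDIA_ELEMENTS = ("audio", "video")


class TranscriptTrackFinder(HTMLParser):
    """Find <track kind="transcript"> children of audio and video elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._open_media = []  # src values of media elements currently open

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in MEDIA_ELEMENTS:
            self._open_media.append(attributes.get("src"))
        elif tag == "track" and self._open_media:
            kind = (attributes.get("kind") or "").lower()
            if kind == "transcript":
                href = attributes.get("href") or ""
                self.links.append({
                    "media": self._open_media[-1],
                    "href": href,
                    "hreflang": attributes.get("hreflang"),
                    "title": attributes.get("title"),
                    # A fragment-only href refers to a transcript in this page.
                    "inline": href.startswith("#"),
                })

    def handle_endtag(self, tag):
        if tag in MEDIA_ELEMENTS and self._open_media:
            self._open_media.pop()


def find_transcript_tracks(html):
    finder = TranscriptTrackFinder()
    finder.feed(html)
    finder.close()
    return finder.links


example_1 = """
<video controls src="video.rm">
  <track kind="transcript" title="English transcript" href="#theText">
  <track kind="transcript" hreflang="fr"
         href="http://transcripts.example.fr/qqchose#laTexte" lang="fr"
         title="Transcription en français">
  <track kind="captions" src="#YouGetTheIdea,Right?" lang="ru">
</video>
<transcript id="theText">This is the English language transcript...</transcript>
"""
for link in find_transcript_tracks(example_1):
    print(link["media"], link["href"], link["inline"])
```

The captions track is ignored because only the kind value distinguishes a transcript from other tracks in this approach.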
6. Acknowledgements
This section is non-normative.
The editor would like to acknowledge the awe-filled respec, github and BlueGriffon, as well as direct contributions to this document by: Paul Cotton, Daniel Davis, Joan-Marie Diggs, Steve Faulkner, John Foliot, Theresa O'Connor, Silvia Pfeiffer, Janina Sajka, Richard Schwerdtfeger, Cynthia Shelly, Léonie Watson, and W3C's HTML Media Task Force.
The editor would like to apologise to anybody whose name was left out of this list, and invites corrections.
7. Appendix: Alternative approaches
This section is non-normative.
Several other approaches to meeting the requirements have been considered. They are included here in outline, with some notes, for completeness. This appendix is expected to be removed before requesting advancement to Candidate Recommendation.
7.1 Alternative approach: create a new element
Add a new element to HTML representing a link to a transcript for the parent media resource. This requires choosing a name (in the following we have used relateTranscript as a placeholder name, to avoid conflicting with the proposed transcript container element) and defining the new element as follows:
7.1.1 The relateTranscript element
Categories
    None.
Contexts in which this element can be used
    As a child of a media element
Content model
    Empty.
Content attributes
    Global attributes
    src - URL of the transcript
    type - the MIME type of the transcript
Tag omission in text/html
    No end tag
Allowed ARIA role attribute values:
    Issue 2: This could be rendered as a liveregion controlled by the media resource, or a control for the media resource.
Allowed ARIA state and property attributes
    Issue 3: rendering, interactive states?
    Any aria-* attributes applicable to the allowed roles
DOM interface
    interface HTMLRelateTranscriptElement : HTMLElement {
        attribute DOMString src;
        attribute DOMString type;
        attribute DOMString media;
    };
7.1.2 Example 2: Using a relateTranscript element
Example 2
<video controls src="video.rm">
  <relateTranscript title="English transcript" href="#theText">
  <relateTranscript hreflang="fr" href="http://transcripts.example.fr/qqchose#laTexte" lang="fr" title="Transcription en français">
  <track kind="captions" src="YouGetTheIdea?Right" lang="ru">
</video>
<transcript id="theText">This is the English language transcript...</transcript>
7.2 Alternative approach: Use the source element
The source element represents a version of the media resource that can be presented as an alternative to others. This is what a transcript is. This approach is not preferred as it will involve complex changes.
Issue 4: The element currently allows a MIME type attribute and a media query that can be used to determine when to render a given version. However, although transcripts are likely to have MIME types that are different from those used for audio or video resources, relying on this difference as a heuristic seems a weak approach to identifying a transcript.
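The weakness described in Issue 4 can be seen in a small Python sketch of such a heuristic. The function name and the rule itself (anything that is not audio/* or video/* is treated as a transcript) are assumptions made purely for illustration; this document does not propose them.

```python
def guess_is_transcript(mime_type):
    """Naive heuristic: a source that is not audio/* or video/* is a transcript."""
    # Drop parameters such as codecs, then keep only the top-level type.
    essence = mime_type.split(";")[0].strip().lower()
    top_level = essence.split("/")[0]
    return top_level not in ("audio", "video")


for candidate in ("video/mp4", "text/html", "application/pdf",
                  "application/ogg", "image/png"):
    print(candidate, guess_is_transcript(candidate))
```

The heuristic accepts text/html and application/pdf as intended, but it also accepts application/ogg, which can label audio or video content, and image/png, which is not a transcript at all. An explicit marker avoids this kind of guessing.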
7.3 Alternative approach: Use the a element with rel and for attributes
This meets the requirements, but requires defining a new value of rel, and changes to the for attribute.
Issue 5: Separating the link from the video code requires developers to include it in the visible content of the page, which leads many developers to try to hide it in the default presentation. A common result is that it is not available to people who need it, such as users with low vision, or is invisible but can be activated, confusing users.
Issue 6: Separating the link from the block of code can lead to it being lost when the source is copied to be used elsewhere.
7.3.1 Example 3: Using the a element with rel and for attributes
Example 3
<a rel="transcript" for="theVideo" title="English transcript" href="#theText">Transcript below</a>
<a rel="transcript" for="theVideo" hreflang="fr" href="http://transcripts.example.fr/qqchose#laTexte" lang="fr" title="Transcription en français">transcription aussi disponible en français</a>
<video controls id="theVideo" src="video.rm">
  <track kind="captions" src="YouGetTheIdea?Right" lang="ru">
</video>
<transcript id="theText">This is the English language transcript...</transcript>
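A consumer of this alternative has to match each link to its media element by id, because the two are no longer nested. The following Python sketch, built on the standard library html.parser module, shows one way that resolution could work; the class and function names are hypothetical, and the rel="transcript" value and for attribute on a elements exist only in this proposal.

```python
from html.parser import HTMLParser


class RelTranscriptFinder(HTMLParser):
    """Record media element ids and <a rel="transcript" for="..."> links."""

    def __init__(self):
        super().__init__()
        self.media_ids = set()
        self.links = []  # (value of for, value of href) in document order

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("audio", "video") and attributes.get("id"):
            self.media_ids.add(attributes["id"])
        elif tag == "a":
            # rel holds a space-separated list of link types.
            rel_tokens = (attributes.get("rel") or "").lower().split()
            if "transcript" in rel_tokens and attributes.get("for"):
                self.links.append((attributes["for"], attributes.get("href")))


def transcripts_by_media(html):
    """Map each media element id to the transcript URLs that point at it."""
    finder = RelTranscriptFinder()
    finder.feed(html)
    finder.close()
    # Resolve after parsing, since a link may appear before its media element.
    result = {media_id: [] for media_id in finder.media_ids}
    for target_id, href in finder.links:
        if target_id in result:
            result[target_id].append(href)
    return result


example_page = (
    '<a rel="transcript" for="theVideo" href="#theText">Transcript below</a>'
    '<video controls id="theVideo" src="video.rm"></video>'
    '<transcript id="theText">This is the English language transcript...</transcript>'
)
print(transcripts_by_media(example_page))
```

If the video markup is copied elsewhere without the link, or the link without the video, the lookup finds nothing to pair, which is the risk noted in Issue 6.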
7.4 Alternative approach: Use an attribute
An attribute could be defined, analogous to the longdesc attribute for images. This approach is not preferred as it makes it very difficult to fully meet the multiple transcripts requirement.
Issue 7: Allowing a space-separated list of URLs does not provide any information to help choose which transcript to link to, or use.
7.4.1 Example 4: Using a relateTranscript attribute
Example 4
<video controls relateTranscript="#theText" src="video.rm">
  <track kind="captions" src="YouGetTheIdea?Right" lang="ru">
</video>
<transcript id="theText">This is the English language transcript...</transcript>
A. References
A.1 Normative references
[HTML5]
    Ian Hickson; Robin Berjon; Steve Faulkner; Travis Leithead; Erika Doyle Navara; Theresa O'Connor; Silvia Pfeiffer. HTML5. 28 October 2014. W3C Recommendation. URL:
[RFC2119]
    S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: