Voice Extensible Markup Language (VoiceXML) Version 2.0

W3C Recommendation 16 March 2004

Editors:
Scott McGlashan, Hewlett-Packard (Editor-in-Chief)
Daniel C. Burnett, Nuance Communications
Jerry Carter, Invited Expert
Peter Danielsen, Lucent (until October 2002)
Jim Ferrans, Motorola
Andrew Hunt, ScanSoft
Bruce Lucas, IBM
Brad Porter, Tellme Networks
Ken Rehor, Vocalocity
Steph Tryphonas, Tellme Networks

Please refer to the errata for this document, which may include some normative corrections. See also translations.

Copyright © 2004 W3C (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.

Abstract

This document specifies VoiceXML, the Voice Extensible Markup Language. VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed initiative conversations. Its major goal is to bring the advantages of Web-based development and content delivery to interactive voice response applications.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document has been reviewed by W3C Members and other interested parties, and it has been endorsed by the Director as a W3C Recommendation. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.
This specification is part of the W3C Speech Interface Framework and has been developed within the W3C Voice Browser Activity by participants in the Voice Browser Working Group. The design of VoiceXML 2.0 has been widely reviewed (see the disposition of comments) and satisfies the Working Group's technical requirements. A list of implementations is included in the VoiceXML 2.0 implementation report, along with the associated test suite. Comments are welcome on www-voice@w3.org (archive); see the W3C mailing list and archive usage guidelines. The W3C maintains a list of any patent disclosures related to this work.

Conventions of this Document

In this document, the key words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" are to be interpreted as described in [RFC2119] and indicate requirement levels for compliant VoiceXML implementations.

Table of Contents

Abbreviated Contents

1. Overview
2. Dialog Constructs
3. User Input
4. System Output
5. Control flow and scripting
6. Environment and Resources
Appendices

Full Contents

1. Overview
  1.1 Introduction
  1.2 Background
    1.2.1 Architectural Model
    1.2.2 Goals of VoiceXML
    1.2.3 Scope of VoiceXML
    1.2.4 Principles of Design
    1.2.5 Implementation Platform Requirements
  1.3 Concepts
    1.3.1 Dialogs and Subdialogs
    1.3.2 Sessions
    1.3.3 Applications
    1.3.4 Grammars
    1.3.5 Events
    1.3.6 Links
  1.4 VoiceXML Elements
  1.5 Document Structure and Execution
    1.5.1 Execution within one Document
    1.5.2 Executing a Multi-Document Application
    1.5.3 Subdialogs
    1.5.4 Final Processing

2. Dialog Constructs
  2.1 Forms
    2.1.1 Form Interpretation
    2.1.2 Form Items
    2.1.3 Form Item Variables and Conditions
    2.1.4 Directed Forms
    2.1.5 Mixed Initiative Forms
    2.1.6 Form Interpretation Algorithm
  2.2 Menus
    2.2.1 menu element
    2.2.2 choice element
    2.2.3 DTMF in Menus
    2.2.4 enumerate element
    2.2.5 Grammar Generation
    2.2.6 Interpretation Model
  2.3 Form Items
    2.3.1 field element
    2.3.2 block element
    2.3.3 initial element
    2.3.4 subdialog element
    2.3.5 object element
    2.3.6 record element
    2.3.7 transfer element
  2.4 Filled
  2.5 Links

3. User Input
  3.1 Grammars
    3.1.1 Speech Grammars
    3.1.2 DTMF Grammars
    3.1.3 Scope of Grammars
    3.1.4 Activation of Grammars
    3.1.5 Semantic Interpretation of Input
    3.1.6 Mapping Semantic Interpretation Results to VoiceXML forms

4. System Output
  4.1 Prompt
    4.1.1 Speech Markup
    4.1.2 Basic Prompts
    4.1.3 Audio Prompting
    4.1.4 value Element
    4.1.5 Bargein
    4.1.6 Prompt Selection
    4.1.7 Timeout
    4.1.8 Prompt Queueing and Input Collection

5. Control flow and scripting
  5.1 Variables and Expressions
    5.1.1 Declaring Variables
    5.1.2 Variable Scopes
    5.1.3 Referencing Variables
    5.1.4 Standard Session Variables
    5.1.5 Standard Application Variables
  5.2 Event Handling
    5.2.1 throw element
    5.2.2 catch element
    5.2.3 Shorthand Notation
    5.2.4 catch Element Selection
    5.2.5 Default catch elements
    5.2.6 Event Types
  5.3 Executable Content
    5.3.1 var element
    5.3.2 assign element
    5.3.3 clear element
    5.3.4 if, elseif, else elements
    5.3.5 prompts
    5.3.6 reprompt element
    5.3.7 goto element
    5.3.8 submit element
    5.3.9 exit element
    5.3.10 return element
    5.3.11 disconnect element
    5.3.12 script element
    5.3.13 log element

6. Environment and Resources
  6.1 Resource Fetching
    6.1.1 Fetching
    6.1.2 Caching
    6.1.3 Prefetching
    6.1.4 Protocols
  6.2 Metadata Information
    6.2.1 meta element
    6.2.2 metadata element
  6.3 property element
    6.3.1 Platform-Specific Properties
    6.3.2 Generic Speech Recognizer Properties
    6.3.3 Generic DTMF Recognizer Properties
    6.3.4 Prompt and Collect Properties
    6.3.5 Fetching Properties
    6.3.6 Miscellaneous Properties
  6.4 param element
  6.5 Value Designations

Appendices
  Appendix A. Glossary of Terms
  Appendix B. VoiceXML Document Type Definition
  Appendix C. Form Interpretation Algorithm
  Appendix D. Timing Properties
  Appendix E. Audio File Formats
  Appendix F. Conformance
  Appendix G. Internationalization
  Appendix H. Accessibility
  Appendix I. Privacy
  Appendix J. Changes from VoiceXML 1.0
  Appendix K. Reusability
  Appendix L. Acknowledgements
  Appendix M. References
  Appendix N. Media Type and File Suffix
  Appendix O. VoiceXML XML Schema Definition
  Appendix P. Builtin Grammar Types

1. Overview

This document defines VoiceXML, the Voice Extensible Markup Language. Its background, basic concepts and use are presented in Section 1. The dialog constructs of form, menu and link, and the mechanism (Form Interpretation Algorithm) by which they are interpreted are then introduced in Section 2. User input using DTMF and speech grammars is covered in Section 3, while Section 4 covers system output using speech synthesis and recorded audio. Mechanisms for manipulating dialog control flow, including variables, events, and executable elements, are explained in Section 5. Environment features such as parameters and properties as well as resource handling are specified in Section 6.
The appendices provide additional information including the VoiceXML Schema, a detailed specification of the Form Interpretation Algorithm, timing properties, audio file formats, and statements relating to conformance, internationalization, accessibility and privacy.

The origins of VoiceXML began in 1995 as an XML-based dialog design language intended to simplify the speech recognition application development process within an AT&T project called Phone Markup Language (PML). As AT&T reorganized, teams at AT&T, Lucent and Motorola continued working on their own PML-like languages. In 1998, W3C hosted a conference on voice browsers. By this time, AT&T and Lucent had different variants of their original PML, while Motorola had developed VoxML and IBM was developing its own SpeechML. Many other attendees at the conference were also developing similar languages for dialog design, such as HP's TalkML and PipeBeach's VoiceHTML. The VoiceXML Forum was then formed by AT&T, IBM, Lucent, and Motorola to pool their efforts. The mission of the VoiceXML Forum was to define a standard dialog design language that developers could use to build conversational applications. They chose XML as the basis for this effort because it was clear to them that this was the direction technology was going. In 2000, the VoiceXML Forum released VoiceXML 1.0 to the public. Shortly thereafter, VoiceXML 1.0 was submitted to the W3C as the basis for the creation of a new international standard. VoiceXML 2.0 is the result of this work based on input from W3C Member companies, other W3C Working Groups, and the public. Developers familiar with VoiceXML 1.0 are particularly directed to Changes from Previous Public Version, which summarizes how VoiceXML 2.0 differs from VoiceXML 1.0.

1.1 Introduction

VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed initiative conversations.
Its major goal is to bring the advantages of Web-based development and content delivery to interactive voice response applications. Here are two short examples of VoiceXML. The first is the venerable "Hello World":
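The original example markup was lost in conversion; a minimal "Hello World" document consistent with the description that follows would look like this (the schema location details are omitted for brevity):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.0">
  <form>
    <!-- a block that synthesizes and presents "Hello World!" -->
    <block>Hello World!</block>
  </form>
</vxml>
```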
The top-level element is <vxml>, which is mainly a container for dialogs. There are two types of dialogs: forms and menus. Forms present information and gather input; menus offer choices of what to do next. This example has a single form, which contains a block that synthesizes and presents "Hello World!" to the user. Since the form does not specify a successor dialog, the conversation ends. Our second example asks the user for a choice of drink and then submits it to a server script:
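The example markup itself was lost in conversion; a sketch consistent with the sample interaction below might read as follows (the grammar file name drink.grxml is illustrative; the script name drink2.asp comes from the transcript):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.0">
  <form>
    <field name="drink">
      <prompt>Would you like coffee, tea, milk, or nothing?</prompt>
      <grammar src="drink.grxml" type="application/srgs+xml"/>
    </field>
    <block>
      <!-- submit the collected field to the server script -->
      <submit next="http://www.drink.example.com/drink2.asp"/>
    </block>
  </form>
</vxml>
```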
A field is an input field. The user must provide a value for the field before proceeding to the next element in the form. A sample interaction is:

C (computer): Would you like coffee, tea, milk, or nothing?
H (human): Orange juice.
C: I did not understand what you said. (a platform-specific default message.)
C: Would you like coffee, tea, milk, or nothing?
H: Tea
C: (continues in document drink2.asp)

1.2 Background

This section contains a high-level architectural model, whose terminology is then used to describe the goals of VoiceXML, its scope, its design principles, and the requirements it places on the systems that support it.

1.2.1 Architectural Model

The architectural model assumed by this document has the following components:

Figure 1: Architectural Model

A document server (e.g. a Web server) processes requests from a client application, the VoiceXML Interpreter, through the VoiceXML interpreter context. The server produces VoiceXML documents in reply, which are processed by the VoiceXML interpreter. The VoiceXML interpreter context may monitor user inputs in parallel with the VoiceXML interpreter. For example, one VoiceXML interpreter context may always listen for a special escape phrase that takes the user to a high-level personal assistant, and another may listen for escape phrases that alter user preferences like volume or text-to-speech characteristics.

The implementation platform is controlled by the VoiceXML interpreter context and by the VoiceXML interpreter. For instance, in an interactive voice response application, the VoiceXML interpreter context may be responsible for detecting an incoming call, acquiring the initial VoiceXML document and answering the call, while the VoiceXML interpreter conducts the dialog after answer. The implementation platform generates events in response to user actions (e.g. spoken or character input received, disconnect) and system events (e.g. timer expiration).
Some of these events are acted upon by the VoiceXML interpreter itself, as specified by the VoiceXML document, while others are acted upon by the VoiceXML interpreter context.

1.2.2 Goals of VoiceXML

VoiceXML's main goal is to bring the full power of Web development and content delivery to voice response applications, and to free the authors of such applications from low-level programming and resource management. It enables integration of voice services with data services using the familiar client-server paradigm. A voice service is viewed as a sequence of interaction dialogs between a user and an implementation platform. The dialogs are provided by document servers, which may be external to the implementation platform. Document servers maintain overall service logic, perform database and legacy system operations, and produce dialogs. A VoiceXML document specifies each interaction dialog to be conducted by a VoiceXML interpreter. User input affects dialog interpretation and is collected into requests submitted to a document server. The document server replies with another VoiceXML document to continue the user's session with other dialogs.

VoiceXML is a markup language that:

Minimizes client/server interactions by specifying multiple interactions per document.
Shields application authors from low-level and platform-specific details.
Separates user interaction code (in VoiceXML) from service logic (e.g. CGI scripts).
Promotes service portability across implementation platforms. VoiceXML is a common language for content providers, tool providers, and platform providers.
Is easy to use for simple interactions, and yet provides language features to support complex dialogs.

While VoiceXML strives to accommodate the requirements of a majority of voice response services, services with stringent requirements may best be served by dedicated applications that employ a finer level of control.
1.2.3 Scope of VoiceXML

The language describes the human-machine interaction provided by voice response systems, which includes:

Output of synthesized speech (text-to-speech).
Output of audio files.
Recognition of spoken input.
Recognition of DTMF input.
Recording of spoken input.
Control of dialog flow.
Telephony features such as call transfer and disconnect.

The language provides means for collecting character and/or spoken input, assigning the input results to document-defined request variables, and making decisions that affect the interpretation of documents written in the language. A document may be linked to other documents through Universal Resource Identifiers (URIs).

1.2.4 Principles of Design

VoiceXML is an XML application [XML].

The language promotes portability of services through abstraction of platform resources.
The language accommodates platform diversity in supported audio file formats, speech grammar formats, and URI schemes. While producers of platforms may support various grammar formats, the language requires a common grammar format, namely the XML Form of the W3C Speech Recognition Grammar Specification [SRGS], to facilitate interoperability. Similarly, while various audio formats for playback and recording may be supported, the audio formats described in Appendix E must be supported.
The language supports ease of authoring for common types of interactions.
The language has well-defined semantics that preserves the author's intent regarding the behavior of interactions with the user. Client heuristics are not required to determine document element interpretation.
The language recognizes semantic interpretations from grammars and makes this information available to the application.
The language has a control flow mechanism.
The language enables a separation of service logic from interaction behavior. It is not intended for intensive computation, database operations, or legacy system operations.
These are assumed to be handled by resources outside the document interpreter, e.g. a document server. General service logic, state management, dialog generation, and dialog sequencing are assumed to reside outside the document interpreter.
The language provides ways to link documents using URIs, and also to submit data to server scripts using URIs. VoiceXML provides ways to identify exactly which data to submit to the server, and which HTTP method (GET or POST) to use in the submittal.
The language does not require document authors to explicitly allocate and deallocate dialog resources, or deal with concurrency. Resource allocation and concurrent threads of control are to be handled by the implementation platform.

1.2.5 Implementation Platform Requirements

This section outlines the requirements on the hardware/software platforms that will support a VoiceXML interpreter.

Document acquisition. The interpreter context is expected to acquire documents for the VoiceXML interpreter to act on. The "http" URI scheme must be supported. In some cases, the document request is generated by the interpretation of a VoiceXML document, while other requests are generated by the interpreter context in response to events outside the scope of the language, for example an incoming phone call. When issuing document requests via http, the interpreter context identifies itself using the "User-Agent" header variable with the value "<name>/<version>", for example, "acme-browser/1.2".

Audio output. An implementation platform must support audio output using audio files and text-to-speech (TTS). The platform must be able to freely sequence TTS and audio output. If an audio output resource is not available, an error.noresource event must be thrown. Audio files are referred to by a URI. The language specifies a required set of audio file formats which must be supported (see Appendix E); additional audio file formats may also be supported.

Audio input.
An implementation platform is required to detect and report character and/or spoken input simultaneously and to control input detection interval duration with a timer whose length is specified by a VoiceXML document. If an audio input resource is not available, an error.noresource event must be thrown.

It must report characters (for example, DTMF) entered by a user. Platforms must support the XML form of DTMF grammars described in the W3C Speech Recognition Grammar Specification [SRGS]. They should also support the Augmented BNF (ABNF) form of DTMF grammars described in the W3C Speech Recognition Grammar Specification [SRGS].

It must be able to receive speech recognition grammar data dynamically. It must be able to use speech grammar data in the XML Form of the W3C Speech Recognition Grammar Specification [SRGS]. It should be able to receive speech recognition grammar data in the ABNF form of the W3C Speech Recognition Grammar Specification [SRGS], and may support other formats such as the JSpeech Grammar Format [JSGF] or proprietary formats. Some VoiceXML elements contain speech grammar data; others refer to speech grammar data through a URI. The speech recognizer must be able to accommodate dynamic update of the spoken input for which it is listening through either method of speech grammar data specification.

It must be able to record audio received from the user. The implementation platform must be able to make the recording available to a request variable. The language specifies a required set of recorded audio file formats which must be supported (see Appendix E); additional formats may also be supported.

Transfer. The platform should be able to support making a third party connection through a communications network, such as the telephone.

1.3 Concepts

A VoiceXML document (or a set of related documents called an application) forms a conversational finite state machine. The user is always in one conversational state, or dialog, at a time.
Each dialog determines the next dialog to transition to. Transitions are specified using URIs, which define the next document and dialog to use. If a URI does not refer to a document, the current document is assumed. If it does not refer to a dialog, the first dialog in the document is assumed. Execution is terminated when a dialog does not specify a successor, or if it has an element that explicitly exits the conversation.

1.3.1 Dialogs and Subdialogs

There are two kinds of dialogs: forms and menus. Forms define an interaction that collects values for a set of form item variables. Each field may specify a grammar that defines the allowable inputs for that field. If a form-level grammar is present, it can be used to fill several fields from one utterance. A menu presents the user with a choice of options and then transitions to another dialog based on that choice.

A subdialog is like a function call, in that it provides a mechanism for invoking a new interaction, and returning to the original form. Variable instances, grammars, and state information are saved and are available upon returning to the calling document. Subdialogs can be used, for example, to create a confirmation sequence that may require a database query; to create a set of components that may be shared among documents in a single application; or to create a reusable library of dialogs shared among many applications.

1.3.2 Sessions

A session begins when the user starts to interact with a VoiceXML interpreter context, continues as documents are loaded and processed, and ends when requested by the user, a document, or the interpreter context.

1.3.3 Applications

An application is a set of documents sharing the same application root document. Whenever the user interacts with a document in an application, its application root document is also loaded.
The application root document remains loaded while the user is transitioning between other documents in the same application, and it is unloaded when the user transitions to a document that is not in the application. While it is loaded, the application root document's variables are available to the other documents as application variables, and its grammars remain active for the duration of the application, subject to the grammar activation rules discussed in Section 3.1.4.

Figure 2 shows the transition of documents (D) in an application that share a common application root document (root).

Figure 2: Transitioning between documents in an application.

1.3.4 Grammars

Each dialog has one or more speech and/or DTMF grammars associated with it. In machine directed applications, each dialog's grammars are active only when the user is in that dialog. In mixed initiative applications, where the user and the machine alternate in determining what to do next, some of the dialogs are flagged to make their grammars active (i.e., listened for) even when the user is in another dialog in the same document, or on another loaded document in the same application. In this situation, if the user says something matching another dialog's active grammars, execution transitions to that other dialog, with the user's utterance treated as if it were said in that dialog. Mixed initiative adds flexibility and power to voice applications.

1.3.5 Events

VoiceXML provides a form-filling mechanism for handling "normal" user input. In addition, VoiceXML defines a mechanism for handling events not covered by the form mechanism. Events are thrown by the platform under a variety of circumstances, such as when the user does not respond, doesn't respond intelligibly, requests help, etc. The interpreter also throws events if it finds a semantic error in a VoiceXML document. Events are caught by catch elements or their syntactic shorthand. Each element in which an event can occur may specify catch elements.
Furthermore, catch elements are also inherited from enclosing elements "as if by copy". In this way, common event handling behavior can be specified at any level, and it applies to all lower levels.

1.3.6 Links

A link supports mixed initiative. It specifies a grammar that is active whenever the user is in the scope of the link. If user input matches the link's grammar, control transfers to the link's destination URI. A link can be used to throw an event or go to a destination URI.

1.4 VoiceXML Elements

Table 1: VoiceXML Elements

Element    Purpose                      Section
<assign>   Assign a variable a value    5.3.2
Attributes of
Please choose color of your new nineteen twenty four Ford Model T. Possible colors are black, black, or black. Please take your time.
If several prompts are queued before a field input, the timeout of the last prompt is used.

4.1.8 Prompt Queueing and Input Collection

A VoiceXML interpreter is at all times in one of two states: waiting for input in an input item (such as <field>, <record>, or <transfer>), or transitioning between input items in response to an input (including spoken utterances, DTMF key presses, and input-related events such as a noinput or nomatch event) received while in the waiting state.

While in the transitioning state no speech input is collected, accepted or interpreted. Consequently root and document level speech grammars (such as those defined in <link>s) may not be active at all times. However, DTMF input (including timing information) should be collected and buffered in the transitioning state. Similarly, asynchronously generated events not related directly to execution of the transition should also be buffered until the waiting state (e.g. connection.disconnect.hangup).

The waiting and transitioning states are related to the phases of the Form Interpretation Algorithm as follows: the waiting state is eventually entered in the collect phase of an input item (at the point at which the interpreter waits for input), and the transitioning state encompasses the process and select phases, the collect phase for control items (such as <block>s), and the collect phase for input items up until the point at which the interpreter waits for input.

This distinction of states is made in order to greatly simplify the programming model. In particular, an important consequence of this model is that the VoiceXML application designer can rely on all executable content (such as the content of <filled> and <block> elements) being run to completion, because it is executed while in the transitioning state, which may not be interrupted by input.

While in the transitioning state various prompts are queued, either by the <prompt> element in executable content or by the <prompt> element in form items. In addition, audio may be queued by the fetchaudio attribute.
The queued prompts and audio are played either when the interpreter reaches the waiting state, at which point the prompts are played and the interpreter listens for input that matches one of the active grammars, or when the interpreter begins fetching a resource (such as a document) for which fetchaudio was specified. In the latter case the prompts queued before the fetchaudio are played to completion, and then, if the resource actually needs to be fetched (i.e. a fresh copy is not already in the cache), the fetchaudio is played until the fetch completes. The interpreter remains in the transitioning state and no input is accepted during the fetch.

Note that when a prompt's bargein attribute is false, input is not collected and DTMF input buffered in a transition state is deleted (see Section 4.1.5). When an ASR grammar is matched, if DTMF input was consumed by a simultaneously active DTMF grammar (but did not result in a complete match of the DTMF grammar), the DTMF input may, at processor discretion, be discarded.

Before the interpreter exits, all queued prompts are played to completion. The interpreter remains in the transitioning state and no input is accepted while the interpreter is exiting.

It is a permissible optimization to begin playing prompts queued during the transitioning state before reaching the waiting state, provided that correct semantics are maintained regarding processing of the input audio received while the prompts are playing, for example with respect to bargein and grammar processing.

The following examples illustrate the operation of these rules in some common cases.

Case 1. Typical non-fetching case: a field, followed by executable content (such as <filled> and <block>), followed by another field.

in document d0
executable content e1 queues prompts {p1}
queues prompts {p2} enables grammars {g2}
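The original listing for this case was lost in conversion; a hedged sketch consistent with the skeleton above might look like the following (field names f0 and f1, executable content e1, prompts p1 and p2, and grammars g2 are the placeholder names used in the surrounding text; the grammar file name is illustrative):

```xml
<!-- in document d0 -->
<form>
  <field name="f0">
    <filled>
      <!-- executable content e1: queues prompts {p1} -->
      <prompt>p1</prompt>
    </filled>
  </field>
  <field name="f1">
    <!-- queues prompts {p2}, enables grammars {g2} -->
    <prompt>p2</prompt>
    <grammar src="g2.grxml" type="application/srgs+xml"/>
  </field>
</form>
```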
As a result of input received while waiting in field f0, the following actions take place:

in the transitioning state:
  execute e1 (without goto)
  queue prompts {p1}
  queue prompts {p2}
in the waiting state, simultaneously:
  play prompts {p1, p2}
  enable grammars {g2} and wait for input

Case 2. Typical fetching case: a field, followed by executable content (such as <filled> and <block>) ending with a <goto> that specifies fetchaudio, ending up in a field in a different document that is fetched from a server.

in document d0
executable content e1 queues prompts {p1} ends with goto f2 in d1 with fetchaudio fa
in document d1
queues prompts {p2} enables grammars {g2}
As a result of input received while waiting in field f0, the following actions take place:

in the transitioning state:
  execute e1
  queue prompts {p1}
  simultaneously fetch d1, play prompts {p1} to completion, and then play fa until the fetch completes
  queue prompts {p2}
in the waiting state, simultaneously:
  play prompts {p2}
  enable grammars {g2} and wait for input

Case 3. As in Case 2, but no fetchaudio is specified.

in document d0
executable content e1 queues prompts {p1} ends with goto f2 in d1 (no fetchaudio specified)
in document d1
queues prompts {p2} enables grammars {g2}
As a result of input received while waiting in field f0, the following actions take place:

in the transitioning state:
  execute e1
  queue prompts {p1}
  fetch d1
  queue prompts {p2}
in the waiting state, simultaneously:
  play prompts {p1, p2}
  enable grammars {g2} and wait for input

5. Control flow and scripting

5.1 Variables and Expressions

VoiceXML variables are in all respects equivalent to ECMAScript variables: they are part of the same variable space. VoiceXML variables can be used in a
5.2 Event Handling

The platform throws events when the user does not respond, doesn't respond in a way that the application understands, requests help, etc. The interpreter throws events if it finds a semantic error in a VoiceXML document, or when it encounters a <throw> element. Events are identified by character strings. Each element in which an event can occur has a set of catch elements.

An element inherits the catch elements ("as if by copy") from each of its ancestor elements, as needed. If a field, for example, does not contain a catch element for nomatch, but its form does, the form's nomatch catch element is used. In this way, common event handling behavior can be specified at any level, and it applies to all descendents.

The "as if by copy" semantics for inheriting catch elements implies that when a catch element is executed, variables are resolved and thrown events are handled relative to the scope where the original event originated, not relative to the scope that contains the catch element. For example, consider a catch element that is defined at document scope handling an event that originated in a <field> within the document. In such a catch element variable references are resolved relative to the <field>'s scope, and if an event is thrown by the catch element it is handled relative to the <field>. Similarly, relative URI references in a catch element are resolved against the active document and not relative to the document in which they were declared. Finally, properties are resolved relative to the element where the event originated. For example, a prompt element defined as part of a document level catch would use the innermost property value of the active form item to resolve its timeout attribute if no value is explicitly specified.

5.2.1 throw element

The <throw> element throws an event. These can be pre-defined events or application-defined events.

Attributes of <throw> are:

Table 41: <throw> Attributes

event: The event being thrown.
eventexpr: An ECMAScript expression evaluating to the name of the event being thrown.
message: A message string providing additional context about the event being thrown. For the pre-defined events thrown by the platform, the value of the message is platform-dependent. The message is available as the value of a variable within the scope of the catch element, see below.
messageexpr: An ECMAScript expression evaluating to the message string.

Exactly one of "event" or "eventexpr" must be specified; otherwise, an error.badfetch event is thrown. Exactly one of "message" or "messageexpr" may be specified; otherwise, an error.badfetch event is thrown. Unless explicitly stated otherwise, VoiceXML does not specify when events are thrown.

5.2.2 catch element

The catch element associates a catch with a document, dialog, or form item (except for blocks). It contains executable content.
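For instance, a catch with executable content might look like the following (an illustrative sketch; the event names and audio file are placeholders, not taken from the original examples):

```xml
<catch event="nomatch noinput">
  <!-- play an apology, then replay the prompt of the current form item -->
  <audio src="sorry.wav"/>
  <reprompt/>
</catch>
```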
The catch element's anonymous variable scope includes the special variable _event which contains the name of the event that was thrown. For example, the following catch element can handle two types of events:
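One way to write such a handler is sketched below; the final <reprompt/> stands in for whatever executable content is common to both events:

```xml
<catch event="event.foo event.bar">
 <!-- _event holds the name of the event that was thrown -->
 <if cond="_event == 'event.foo'">
  <audio src="foo.wav"/>
 <else/>
  <audio src="bar.wav"/>
 </if>
 <!-- executable content common to the handling of both event types -->
 <reprompt/>
</catch>
```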
The _event variable is inspected to select the audio to play based on the event that was thrown. The foo.wav file will be played for event.foo events. The bar.wav file will be played for event.bar events. The remainder of the catch element contains executable content that is common to the handling of both event types. The catch element's anonymous variable scope also includes the special variable _message, which contains the value of the message string from the corresponding <throw> element, or a platform-dependent value for the pre-defined events raised by the platform. If the thrown event does not specify a message, the value of _message is ECMAScript undefined. If a <catch> element contains a <throw> element with the same event, then there may be an infinite loop:
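For example, a catch that rethrows the very event it catches never terminates:

```xml
<catch event="event.loop">
 <throw event="event.loop"/>
</catch>
```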
A platform could detect this situation and throw a semantic error instead. Attributes of <catch> are:

Table 42: <catch> Attributes

event: The event or events to catch. A space-separated list of events may be specified, indicating that this element catches all the events named in the list. In such a case a separate event counter (see the "count" attribute) is maintained for each event. If the attribute is unspecified, all events are to be caught.

count: The occurrence of the event (default is 1). The count allows you to handle different occurrences of the same event differently. Each <form>, <menu>, and form item maintains a counter for each event that occurs while it is being visited.

cond: An expression which must evaluate to true after conversion to boolean in order for the event to be caught.
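As an illustrative sketch of count-based handling (the form, field, prompts, and the operator_dialog target are hypothetical), a form might escalate its response to repeated noinput events:

```xml
<form id="survey">
 <!-- first noinput: gentle nudge -->
 <catch event="noinput" count="1">
  <prompt>Please answer the question.</prompt>
  <reprompt/>
 </catch>
 <!-- third noinput: give up and hand off -->
 <catch event="noinput" count="3">
  <prompt>You seem to be having trouble. Transferring you to an operator.</prompt>
  <goto next="#operator_dialog"/>
 </catch>
 <field name="answer">
  <prompt>Did you find this survey useful?</prompt>
  <grammar>yes | no</grammar>
 </field>
</form>
```

The second handler's count of 3 is selected only once the event's counter reaches 3; until then the count="1" handler remains the best match.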
For example, if a document contains a document-level catch handler and a field with no handler of its own:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.w3.org/2001/vxml
  http://www.w3.org/TR/voicexml20/vxml.xsd">
 <catch event="nomatch">
  <prompt>I did not understand what you said.</prompt>
 </catch>
 <form>
  <field name="color">
   <prompt>Please say a primary color</prompt>
   <grammar>red | yellow | blue</grammar>
  </field>
 </form>
</vxml>

then the <catch> element is implicitly copied into the <field> as if defined below:

 <field name="color">
  <prompt>Please say a primary color</prompt>
  <grammar>red | yellow | blue</grammar>
  <catch event="nomatch">
   <prompt>I did not understand what you said.</prompt>
  </catch>
 </field>
When an event is thrown, the scope in which the event is handled and its enclosing scopes are examined to find the best qualified catch element, according to the following algorithm:

1. Form an ordered list of catches consisting of all catches in the current scope and all enclosing scopes (form item, form, document, application root document, interpreter context), ordered first by scope (starting with the current scope), and then within each scope by document order.
2. Remove from this list all catches whose event name does not match the event being thrown or whose "cond" evaluates to false after conversion to boolean.
3. Find the "correct count": the highest count among the catch elements still on the list that is less than or equal to the current count value.
4. Select the first element in the list with the "correct count".

The name of a thrown event matches the catch element event name if it is an exact match, a prefix match, or if the catch event attribute is not specified (note that the event attribute cannot be specified as an empty string - event="" is syntactically invalid). A prefix match occurs when the catch element event attribute is a token prefix of the name of the event being thrown, where the dot is the token separator, all trailing dots are removed, and a remaining empty string matches everything. For example,

 <catch event="connection.disconnect">
  <prompt>Caught a connection dot disconnect event</prompt>
 </catch>
will prefix match the event connection.disconnect.transfer.

 <catch event="com.example.myevent">
  <prompt>Caught a com dot example dot my event</prompt>
 </catch>
prefix matches com.example.myevent.event1., com.example.myevent. and com.example.myevent..event1 but not com.example.myevents.event1. Finally,

 <catch event=".">
  <prompt>Caught an event</prompt>
 </catch>
prefix matches all events (as does <catch> without an event attribute). Note that the catch element selection algorithm gives priority to catch elements that occur earlier in a document over those that occur later, but does not give priority to catch elements that are more specific over those that are less specific. Therefore it is generally advisable to specify catch elements in order from more specific to less specific. For example, it would be advisable to specify catch elements for "error.foo" and "error" in that order, as follows:

 <catch event="error.foo">
  <prompt>Caught an error dot foo event</prompt>
 </catch>
 <catch event="error">
  <prompt>Caught an error event</prompt>
 </catch>
If the catch elements were specified in the opposite order, the catch element for "error.foo" would never be executed.

5.2.5 Default catch elements

The interpreter is expected to provide implicit default catch handlers for the noinput, help, nomatch, cancel, exit, and error events if the author did not specify them. The system default behavior of catch handlers for various events and errors is summarized by the definitions below, which specify (1) whether any audio response is to be provided, and (2) how execution is affected. Note: where an audio response is provided, the actual content is platform-dependent.

Table 44: Default Catch Handlers

 Event Type             Audio Provided   Action
 cancel                 no               don't reprompt
 error                  yes              exit interpreter
 exit                   no               exit interpreter
 help                   yes              reprompt
 noinput                no               reprompt
 nomatch                yes              reprompt
 maxspeechtimeout       yes              reprompt
 connection.disconnect  no               exit interpreter
 all others             yes              exit interpreter

Specific platforms will differ in the default prompts presented.

5.2.6 Event Types

There are pre-defined events, and application- and platform-specific events. Events are also subdivided into plain events (things that happen normally) and error events (abnormal occurrences). The error naming convention allows for multiple levels of granularity. A conforming browser may throw an event that extends a pre-defined event string so long as the event contains the specified pre-defined event string as a dot-separated exact initial substring of its event name. Applications that write catch handlers for the pre-defined events will be interoperable. Applications that write catch handlers for extended event names are not guaranteed interoperability. For example, if a syntax error is detected while loading a grammar file, the platform must throw "error.badfetch"; throwing "error.badfetch.grammar.syntax" is an acceptable implementation. Components of event names in italics are to be substituted with the relevant information; for example, in error.unsupported.element, element is substituted with the name of the VoiceXML element which is not supported, as in error.unsupported.transfer. All other event name components are fixed. Further information about an event may be specified in the "_message" variable (see Section 5.2.2).

The pre-defined events are:

cancel: The user has requested to cancel playing of the current prompt.

connection.disconnect.hangup: The user has hung up.

connection.disconnect.transfer: The user has been transferred unconditionally to another line and will not return.

exit: The user has asked to exit.

help: The user has asked for help.

noinput: The user has not responded within the timeout interval.

nomatch: The user input something, but it was not recognized.

maxspeechtimeout: The user input was too long, exceeding the 'maxspeechtimeout' property.

In addition to transfer errors (Section 2.3.7.3), the pre-defined errors are:

error.badfetch: The interpreter context throws this event when a fetch of a document has failed and the interpreter context has reached a place in the document interpretation where the fetch result is required. Fetch failures result from unsupported scheme references, malformed URIs, client aborts, communication errors, timeouts, security violations, unsupported resource types, resource type mismatches, document parse errors, and a variety of errors represented by scheme-specific error codes. If the interpreter context has speculatively prefetched a document and that document turns out not to be needed, error.badfetch is not thrown. Likewise, if the fetch of an <audio> resource fails and there is a nested alternate <audio> resource whose fetch then succeeds, or if there is nested alternate text, no error.badfetch occurs. When an interpreter context is transitioning to a new document, the interpreter context throws error.badfetch on an error until the interpreter is capable of executing the new document, but again only at the point in time where the new document is actually needed, not before.
Whether or not variable initialization is considered part of executing the new document is platform-dependent.

error.badfetch.http.response_code, error.badfetch.protocol.response_code: In the case of a fetch failure, the interpreter context must use a detailed event type telling which specific HTTP or other protocol-specific response code was encountered. The value of the response code for HTTP is defined in [RFC2616]. This allows applications, for instance, to treat a missing document differently from a prohibited document. The value of the response code for other protocols (such as HTTPS, RTSP, and so on) is dependent upon the protocol.

error.semantic: A run-time error was found in the VoiceXML document, e.g. a substring bounds error, or an undefined variable was referenced.

error.noauthorization: Thrown when the application tries to perform an operation that is not authorized by the platform. Examples would include dialing an invalid telephone number or one which the user is not allowed to call, attempting to access a protected database via a platform-specific