Appendix B: Performance, Implementation, and Design Notes
Contents
  Notes on invalid documents
  Special characters in URI attribute values
    Non-ASCII characters in URI attribute values
    Ampersands in URI attribute values
  SGML implementation notes
    Line breaks
    Specifying non-HTML data
      Element content
      Attribute values
    SGML features with limited support
    Boolean attributes
    Marked Sections
    Processing Instructions
    Shorthand markup
  Notes on helping search engines index your Web site
    Search robots
      The robots.txt file
      Robots and the META element
  Notes on tables
    Design rationale
      Dynamic reformatting
      Incremental display
      Structure and presentation
      Row and column groups
    Recommended Layout Algorithms
      Fixed Layout Algorithm
      Autolayout Algorithm
  Notes on forms
    Incremental display
    Future projects
  Notes on scripting
    Reserved syntax for future script macros
      Current Practice for Script Macros
  Notes on frames
  Notes on accessibility
  Notes on security
    Security issues for forms
The following notes are informative, not normative. Despite the appearance
of words such as "must" and "should", all requirements in this section appear
elsewhere in the specification.
B.1 Notes on invalid documents
This specification does not define how conforming user agents handle general
error conditions, including how user agents behave when they encounter
elements, attributes, attribute values, or entities not specified in this
document.
However, to facilitate experimentation and interoperability between
implementations of various versions of HTML, we recommend the following
behavior:
If a user agent encounters an element it does not recognize, it should try
to render the element's content.
If a user agent encounters an attribute it does not recognize, it should
ignore the entire attribute specification (i.e., the attribute and its
value).
If a user agent encounters an attribute value it doesn't recognize, it
should use the default attribute value.
If it encounters an undeclared entity, the entity should be treated as
character data.
We also recommend that user agents provide support for notifying the user of
such errors.
Since user agents may vary in how they handle error conditions, authors and
users must not rely on specific error recovery behavior.
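The recommended recovery behavior for unknown elements and attributes can be sketched in Python with the standard html.parser module. This is only an illustration under stated assumptions: the element and attribute tables, the TolerantRenderer class, and the render() helper are hypothetical names, and the tiny vocabulary stands in for what a real user agent would take from the DTD.

```python
from html.parser import HTMLParser

# Hypothetical vocabulary for illustration only; a real user agent would
# derive these tables from the DTD.
KNOWN_ELEMENTS = {"p", "em", "a"}
KNOWN_ATTRIBUTES = {"p": {"align"}, "a": {"href", "name"}, "em": set()}


class TolerantRenderer(HTMLParser):
    """Collects text content while applying the recovery rules above."""

    def __init__(self):
        super().__init__()
        self.text = []        # rendered character data
        self.attributes = []  # (element, attribute, value) triples that were kept
        self.errors = []      # messages a user agent could show to the user

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_ELEMENTS:
            # Unknown element: note the error, but keep rendering its content.
            self.errors.append(f"unknown element {tag}")
            return
        for name, value in attrs:
            if name in KNOWN_ATTRIBUTES[tag]:
                self.attributes.append((tag, name, value))
            else:
                # Unknown attribute: ignore the attribute and its value.
                self.errors.append(f"unknown attribute {name} on {tag}")

    def handle_data(self, data):
        self.text.append(data)


def render(markup):
    """Return (text, kept attributes, error messages) for a markup string."""
    renderer = TolerantRenderer()
    renderer.feed(markup)
    renderer.close()
    return "".join(renderer.text), renderer.attributes, renderer.errors
```

The parser reports element and attribute names in lower case, so the tables above are written in lower case as well. The errors list corresponds to the recommendation that user agents be able to notify the user.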
The HTML 2.0 specification ([RFC1866]) observes that many HTML 2.0 user
agents assume that a document that does not begin with a document type
declaration refers to the HTML 2.0 specification. As experience shows that
this is a poor assumption, the current specification does not recommend this
behavior.
For reasons of interoperability, authors must not "extend" HTML through the
available SGML mechanisms (e.g., extending the DTD, adding a new set of entity
definitions, etc.).
B.2 Special characters in URI attribute values
B.2.1 Non-ASCII characters in URI attribute values
Although URIs do not contain non-ASCII values (see [URI], section 2.1),
authors sometimes specify them in attribute values expecting URIs (i.e.,
defined with %URI; in the DTD). For instance, the following href value is
illegal:
...
We recommend that user agents adopt the following convention for handling
non-ASCII characters in such cases:
Represent each character in UTF-8 (see [RFC2279]) as one or more bytes.
Escape these bytes with the URI escaping mechanism (i.e., by converting
each byte to %HH, where HH is the hexadecimal notation of the byte value).
This procedure results in a syntactically legal URI (as defined in
[RFC1738], section 2.2 or [RFC2141], section 2) that is independent of the
character encoding to which the HTML document carrying the URI may have been
transcoded.
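The two-step convention above can be sketched as a short Python function. The name escape_non_ascii and the sample URI in the usage note are illustrative assumptions, not part of this specification, and the sketch leaves every ASCII character untouched even if it is otherwise illegal in a URI.

```python
def escape_non_ascii(uri: str) -> str:
    """Percent-escape each non-ASCII character via its UTF-8 bytes."""
    out = []
    for ch in uri:
        if ord(ch) < 128:
            # ASCII characters are passed through unchanged.
            out.append(ch)
        else:
            # Step 1: represent the character in UTF-8.
            # Step 2: write each resulting byte as %HH.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)
```

For example, escape_non_ascii("http://foo.org/Håkon") yields "http://foo.org/H%C3%A5kon", because "å" (U+00E5) is the two bytes C3 A5 in UTF-8.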
Note. Some older user agents trivially process URIs in HTML using the bytes
of the character encoding in which the document was received. Some older HTML
documents rely on this practice and break when transcoded. User agents that
want to handle these older documents should, on receiving a URI containing
characters outside the legal set, first use the conversion based on UTF-8.
Only if the resulting URI does not resolve should they try constructing a URI
based on the bytes of the character encoding in which the document was
received.
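The fallback order described in this note might look as follows in Python. This is a sketch under assumptions: resolve_legacy and escape_with are hypothetical helpers, the resolves callback stands in for whatever mechanism a user agent uses to test whether a URI resolves, and re-encoding the decoded text approximates "the bytes in which the document was received".

```python
def escape_with(uri: str, encoding: str) -> str:
    """Percent-escape non-ASCII characters using the bytes of `encoding`."""
    out = []
    for ch in uri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode(encoding))
    return "".join(out)


def resolve_legacy(uri, document_encoding, resolves):
    """Try the UTF-8 based URI first, then the document-encoding based one.

    `resolves` is a caller-supplied predicate (an assumption of this sketch).
    Returns the first candidate it accepts, or None.
    """
    candidates = [escape_with(uri, "utf-8")]
    try:
        fallback = escape_with(uri, document_encoding)
    except UnicodeEncodeError:
        # The character cannot be expressed in the document encoding.
        fallback = None
    if fallback is not None and fallback not in candidates:
        candidates.append(fallback)
    for candidate in candidates:
        if resolves(candidate):
            return candidate
    return None
```

With a document received as ISO-8859-1, the candidates for "http://foo.org/Håkon" are "http://foo.org/H%C3%A5kon" followed by "http://foo.org/H%E5kon".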
Note. The same conversion based on UTF-8 should be applied to values of the
name attribute for the A element.
B.2.2 Ampersands in URI attribute values
The URI that is constructed when a form is submitted may be used as an
anchor-style link (e.g., the href attribute for the A element).
Unfortunately, the use of the "&" character to separate form fields interacts
with its use in SGML attribute values to delimit character entity references.
For example, to use the URI "http://host/?x=1&y=2" as a linking URI, it must
be written <A href="http://host/?x=1&#38;y=2"> or
<A href="http://host/?x=1&amp;y=2">.
We recommend that HTTP server implementors, and in particular CGI
implementors, support the use of ";" in place of "&" to save authors the
trouble of escaping "&" characters in this manner.
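Both sides of this recommendation can be illustrated with a small Python sketch. The helper names href_attribute and split_form_fields are hypothetical. The first shows an author-side tool escaping "&" when writing a URI into an attribute value, and the second shows a server-side parser that accepts either "&" or ";" between form fields. URL-decoding of names and values is left out for brevity.

```python
import html
import re


def href_attribute(uri):
    """Build an href attribute, escaping "&" (and quotes) for SGML."""
    return 'href="' + html.escape(uri, quote=True) + '"'


def split_form_fields(query):
    """Split a query string on "&" or ";" into (name, value) pairs."""
    pairs = []
    for field in re.split(r"[&;]", query):
        if field:  # skip empty fields such as a trailing separator
            name, _, value = field.partition("=")
            pairs.append((name, value))
    return pairs
```

A server that parses queries this way lets authors write href="http://host/?x=1;y=2" with no escaping at all.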
B.3 SGML implementation notes
B.3.1 Line breaks
SGML (see [ISO8879], section 7.6.1) specifies that a line break immediately
following a start tag must be ignored, as must a line break immediately
before an end tag. This applies to all HTML elements without exception.
The following two HTML examples must be rendered identically:

<P>Thomas is watching TV.</P>

<P>
Thomas is watching TV.
</P>

So must the following two examples:

<A>My favorite Website</A>

<A>
My favorite Website
</A>
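The line break rule can be approximated with two regular expressions in Python. This is a rough sketch rather than an SGML parser: the function name is hypothetical, and the patterns assume that tags contain no ">" inside attribute values and that the input has no comments or CDATA content.

```python
import re

# A start tag followed immediately by a line break.
_AFTER_START_TAG = re.compile(r"(<[A-Za-z][^<>]*>)\r?\n")
# A line break followed immediately by an end tag.
_BEFORE_END_TAG = re.compile(r"\r?\n(</[A-Za-z][^<>]*>)")


def drop_ignorable_line_breaks(markup):
    """Remove the line breaks that SGML says must be ignored."""
    markup = _AFTER_START_TAG.sub(r"\1", markup)
    return _BEFORE_END_TAG.sub(r"\1", markup)
```

Line breaks elsewhere in the content are left alone, so only the two positions named by the rule are affected.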

B.3.2 Specifying non-HTML data
Script and style data may appear as element content or attribute values. The
following sections describe the boundary between HTML markup and foreign
data.
Note. The DTD defines script and style data to be CDATA for both element
content and attribute values. SGML rules do not allow character references in
CDATA element content but do allow them in CDATA attribute values. Authors
should pay particular attention when cutting and pasting script and style
data between element content and attribute values.
This asymmetry also means that when transcoding from a richer to a
poorer character encoding, the transcoder cannot simply replace unconvertible
characters in script or style data with the corresponding numeric character
references; it must parse the HTML document and know about each script and
style language's syntax in order to process the data correctly.
Element content
When script or style data is the content of an element (SCRIPT and STYLE),
the data begins immediately after the element start tag and ends at the first
ETAGO ("</") delimiter followed by a name start character ([a-zA-Z]); note
that this may not be the element's end tag. Authors should therefore escape
"</" within the content. Escape mechanisms are specific to each scripting or
style sheet language.
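The end-of-data rule can be sketched in Python as a search for the first "</" that is followed by a letter. The function name script_data_end is a hypothetical helper, and the input is assumed to be the text that follows the element's start tag.

```python
import re

# ETAGO ("</") followed by an SGML name start character.
_ETAGO_NAME_START = re.compile(r"</[A-Za-z]")


def script_data_end(text):
    """Return the index where script or style data ends, or -1 if none."""
    match = _ETAGO_NAME_START.search(text)
    return match.start() if match else -1
```

Given data that contains "</EM>" inside a string literal, the function stops at that "</EM>" and not at the SCRIPT end tag, which is why the sequence has to be escaped. Data written with "<\/EM>" is scanned past, because "<" is no longer followed directly by "/".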
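ILLEGAL EXAMPLE:
The following script data incorrectly contains a "</" sequence (as part of
"</EM>") before the SCRIPT end tag:

<SCRIPT type="text/javascript">
  document.write ("<EM>This won't work</EM>")
</SCRIPT>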

In JavaScript, this code can be expressed legally by hiding the ETAGO
delimiter before an SGML name start character:

<SCRIPT type="text/javascript">
  document.write ("<EM>This will work<\/EM>")
</SCRIPT>

In Tcl, one may accomplish this as follows:

<SCRIPT type="text/tcl">
  document write "<EM>This will work<\/EM>"
</SCRIPT>

In VBScript, the problem may be avoided with the Chr() function:

"<EM>This will work<" & Chr(47) & "EM>"
Attribute values
When script or style data is the value of an attribute (either style or the
intrinsic event attributes), authors should escape occurrences of the
delimiting single or double quotation mark within the value according to the
script or style language convention. Authors should also escape occurrences
of "&" if the "&" is not meant to be the beginning of a character reference.
'"' should be written as "&quot;" or "&#34;"
'&' should be written as "&amp;" or "&#38;"
Thus, for example, one could write:

<INPUT name="num" value="0"
 onchange="if (compare(this.value, &quot;help&quot;)) {gethelp()}">
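A tool that generates such attribute values could apply the escaping mechanically. The Python sketch below uses the standard html.escape function, and the event_attribute helper is a hypothetical name. Note that html.escape also rewrites "<", ">", and the single quotation mark, which is harmless in an attribute value.

```python
import html


def event_attribute(name, script):
    """Build a double-quoted attribute holding script data."""
    # quote=True escapes '"' as &quot; in addition to '&' as &amp;.
    return f'{name}="{html.escape(script, quote=True)}"'
```

Passing the script text if (compare(this.value, "help")) {gethelp()} with the name onchange produces an attribute in which both quotation marks around help appear as &quot;.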
B.3.3 SGML features with limited support
SGML systems conforming to [ISO8879] are expected to recognize a number of
features that aren't widely supported by HTML user agents. We recommend that
authors avoid using all of these features.
B.3.4 Boolean attributes
Authors should be aware that many user agents only recognize the minimized
form of boolean attributes and not the full form.
For instance, authors may want to specify: