SHORTTAG and OMITTAG
Monday, September 11th, 2006
Errors reported by the HTML Validator often mention SHORTTAG and OMITTAG in the description. This has caused some confusion, so I will explain what these two features are and where they come from.
Every SGML language, including HTML and XML, has something called an SGML declaration that defines the lexical rules of the language. It defines which characters are used to delimit tags and other constructs, which character ranges can be used for element names, what kinds of constructs may exist in the document, and other things. The SGML declaration usually isn’t included in the document itself (in fact, most user agents won’t support it if it is); instead, it is buried somewhere in the language’s specifications (see the HTML SGML declaration and the XML SGML declaration). Browsers generally don’t parse actual SGML declarations; instead, they choose which parsing rules to follow based on the HTTP Content-Type header. The HTML Validator is unusual in that it selects the parsing mode based on the doctype, so it parses a document with an XHTML doctype as XML even if it’s sent with Content-Type: text/html.
The SGML declaration defines a hierarchy of settings. One of the main categories is FEATURES, whose first subcategory is MINIMIZE. This is where you will find the SHORTTAG and OMITTAG feature settings.
OMITTAG defines whether or not start or end tags may ever be omitted. If YES, elements may define in the DTD whether start or end tags may be omitted. If NO, regardless of what the DTD says, they may never be omitted. OMITTAG is YES in HTML, but NO in XML and thus XHTML.
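To make the difference concrete, here is a minimal Python sketch of what OMITTAG YES asks of a parser. The IMPLIED_END table, the OmittagSketch class, and the helper functions are hypothetical names invented for this illustration. A real SGML parser would derive the rules from the DTD rather than from a hand-written table, and Python’s html.parser is used here only as a tokenizer. The last lines show that an XML parser, where OMITTAG is NO, rejects the same markup.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical, hand-written rules: a start tag listed in the set implies the
# end tag of the open element named by the key. A DTD would normally say this.
IMPLIED_END = {"li": {"li"}, "p": {"p", "ul", "div"}}


class OmittagSketch(HTMLParser):
    """Records tag events, filling in end tags that the markup omitted."""

    def __init__(self):
        super().__init__()
        self.open_elements = []
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Close the current element while this start tag implies its end tag.
        while (self.open_elements
               and tag in IMPLIED_END.get(self.open_elements[-1], ())):
            self.events.append("(implied) </%s>" % self.open_elements.pop())
        self.open_elements.append(tag)
        self.events.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag not in self.open_elements:
            return  # stray end tag; ignored in this sketch
        # Pop back to the matching element, implying any omitted end tags.
        while True:
            top = self.open_elements.pop()
            if top == tag:
                self.events.append("</%s>" % tag)
                break
            self.events.append("(implied) </%s>" % top)


def implied_events(markup):
    parser = OmittagSketch()
    parser.feed(markup)
    parser.close()
    return parser.events


def xml_accepts(markup):
    """True if the markup is well-formed XML, where OMITTAG is NO."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


markup = "<ul><li>one<li>two</ul>"
print(implied_events(markup))
print(xml_accepts(markup))
print(xml_accepts("<ul><li>one</li><li>two</li></ul>"))
```

The sketch reports an implied end tag for each li element, while the XML parser accepts only the version with every end tag written out.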
SHORTTAG then defines whether or not general shorthand features may be used. The format for this differs between the HTML SGML declaration and the XML SGML declaration. XML uses an extended format that toggles a number of features individually, while HTML (and classic SGML declarations) uses a single boolean value for all of the features. SHORTTAG consists of three main categories: STARTTAG, ENDTAG, and ATTRIB.
STARTTAG deals with start tags and contains three features: EMPTY, UNCLOSED, and NETENABL.
EMPTY defines whether or not the contents of the tag may be omitted. This is not the same as whether or not the contents of the element may be omitted. An empty start tag may look like this: <>. Instead of specifying the element name, it is assumed to be the same kind of element as the previous sibling (the element that most recently closed). This is legal (YES) in HTML, although no major browser supports it. It is illegal (NO) in XML and thus in XHTML.
UNCLOSED defines whether or not the start tag needs to be closed. Again, this is not the same as whether or not the element needs to be closed. Here is an example of an unclosed start tag: <div<p>This is a P inside a DIV.</p></div>. The end of the start tag is implied by the beginning of the next tag. This is legal (YES) in HTML, although it is poorly supported. It is illegal (NO) in XML and thus XHTML.
NETENABL defines whether or not the start tag may use Null End Tag (NET) notation. This replaces the start tag’s closing delimiter and the end tag with special single-character delimiters. Here is an example of an element using a Null End Tag: <title/This is the title of the page/. The value for this feature may be NO, ALL (which is implied if SHORTTAG is simply YES), or IMMEDNET. Null End Tags are always legal (ALL) in HTML, although, as you might have guessed, no major browser supports them. In XML, the value is IMMEDNET, meaning that Null End Tags are supported, but only when the closing delimiter immediately follows the opening delimiter, which in turn means that the element must have no contents. XML also uses a different character for the closing Null End Tag delimiter: “>”. Therefore, a Null End Tag in XML looks like this: <br/>, which people familiar with XML should recognize.
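As a quick check of the IMMEDNET behaviour, the following Python sketch uses the standard library’s xml.etree.ElementTree, which is an XML parser rather than an SGML one. It parses the short form and the fully written form of an empty br element and compares the results. The sample markup and the xml_accepts helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The same paragraph, once with a Null End Tag and once written out in full.
short_form = ET.fromstring("<p>line one<br/>line two</p>")
long_form = ET.fromstring("<p>line one<br></br>line two</p>")

br = short_form.find("br")
print(br.tag, br.text, len(br))  # an element with no text and no children

# Both spellings produce the same tree, so they serialize identically.
same_tree = ET.tostring(short_form) == ET.tostring(long_form)
print(same_tree)


def xml_accepts(markup):
    """True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The HTML-style Null End Tag has content between its delimiters, which
# IMMEDNET does not allow, so an XML parser rejects it.
print(xml_accepts("<title/This is the title of the page/"))
```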
ENDTAG deals with end tags and contains two features: EMPTY and UNCLOSED.
This EMPTY is similar to the one in STARTTAG, but it applies to end tags: an empty end tag is assumed to close the most recently opened element that is still open. For example, if you have <div>Foo <span>bar</span> baz</>, the empty end tag closes the div element. This is legal (YES) in HTML, but illegal (NO) in XML and thus XHTML.
This UNCLOSED is also similar to the one in STARTTAG, and applies to end tags. The end of the end tag is implied by the beginning of the next tag. For example, <div><div>Foo</div<p>Bar</p</div>. This is legal (YES) in HTML, but illegal (NO) in XML and thus XHTML.
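The EMPTY and UNCLOSED shorthands for start and end tags can be checked against an XML parser with a short Python sketch. It uses xml.etree.ElementTree from the standard library. The SHORTHANDS table and the is_well_formed_xml helper are names invented here, and the first sample is a made-up instance of an empty start tag. The other three samples are the examples from the text.

```python
import xml.etree.ElementTree as ET

# One sample per shorthand feature. The first sample is invented for this
# sketch; the remaining three are the examples used in the article.
SHORTHANDS = {
    "STARTTAG EMPTY": "<r><a>x</a><>y</a></r>",
    "STARTTAG UNCLOSED": "<div<p>This is a P inside a DIV.</p></div>",
    "ENDTAG EMPTY": "<div>Foo <span>bar</span> baz</>",
    "ENDTAG UNCLOSED": "<div><div>Foo</div<p>Bar</p</div>",
}


def is_well_formed_xml(markup):
    """True if an XML parser accepts the markup without error."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


results = {name: is_well_formed_xml(sample)
           for name, sample in SHORTHANDS.items()}

for name, accepted in results.items():
    print("%-18s accepted by XML parser: %s" % (name, accepted))
```

Every sample is rejected, which matches the NO setting for all four features in XML.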
ATTRIB deals with attributes and contains three features: DEFAULT, OMITNAME, and VALUE.
DEFAULT defines whether or not attributes may have default values that are defined in the DTD. This is enabled (YES) in both HTML and XML and thus XHTML.
OMITNAME defines whether or not attribute names may be omitted. In such a case, the given attribute value will be used for both the attribute name and attribute value. For example, <input type="checkbox" checked> is equivalent to <input type="checkbox" checked="checked">. This is legal (YES) in HTML, although several major browsers don’t treat it literally in some areas like CSS attribute selectors. It is illegal (NO) in XML and thus XHTML.
VALUE defines whether or not attribute values may be specified without delimiting quotation marks if the value uses certain ranges of characters. This is legal (YES) in HTML, but it is illegal (NO) in XML and thus XHTML.
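The two attribute shorthands can be seen side by side in a small Python sketch. It relies on html.parser and xml.etree.ElementTree from the standard library. Neither is an SGML parser, so this only illustrates how one tolerant HTML tokenizer and one XML parser react. AttributeCollector, html_attributes, and xml_accepts are hypothetical helper names.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class AttributeCollector(HTMLParser):
    """Collects (tag, attributes) pairs for every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


def html_attributes(markup):
    collector = AttributeCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen


def xml_accepts(markup):
    """True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# OMITNAME and VALUE shorthands together: unquoted value, minimized attribute.
print(html_attributes("<input type=checkbox checked>"))

# XML allows neither shorthand; only the fully written form is well-formed.
print(xml_accepts('<input type="checkbox" checked />'))            # OMITNAME
print(xml_accepts('<input type=checkbox checked="checked" />'))    # VALUE
print(xml_accepts('<input type="checkbox" checked="checked" />'))  # full form
```

Note that html.parser reports the minimized attribute with a value of None rather than "checked", which is one more example of an implementation not treating the shorthand literally.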
So here’s the summary: HTML has a simple YES for both OMITTAG and SHORTTAG, meaning all of the above features are allowed. XML has NO for OMITTAG and has a feature breakdown for SHORTTAG, amounting to YES for ATTRIB DEFAULT, IMMEDNET for NETENABL, and NO for everything else.
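The same summary can be written down as a lookup table. The Python sketch below is only a restatement of the settings described in this article, not a parsed SGML declaration. The FEATURE_NAMES list and the HTML and XML dictionaries are names chosen for the illustration.

```python
# Feature names as used in this article, in the order they were discussed.
FEATURE_NAMES = [
    "OMITTAG",
    "SHORTTAG STARTTAG EMPTY",
    "SHORTTAG STARTTAG UNCLOSED",
    "SHORTTAG STARTTAG NETENABL",
    "SHORTTAG ENDTAG EMPTY",
    "SHORTTAG ENDTAG UNCLOSED",
    "SHORTTAG ATTRIB DEFAULT",
    "SHORTTAG ATTRIB OMITNAME",
    "SHORTTAG ATTRIB VALUE",
]

# HTML: a plain YES for OMITTAG and SHORTTAG, which implies ALL for NETENABL.
HTML = {name: "YES" for name in FEATURE_NAMES}
HTML["SHORTTAG STARTTAG NETENABL"] = "ALL"

# XML: NO for everything except the two settings called out in the summary.
XML = {name: "NO" for name in FEATURE_NAMES}
XML["SHORTTAG STARTTAG NETENABL"] = "IMMEDNET"
XML["SHORTTAG ATTRIB DEFAULT"] = "YES"

# Features where the two languages agree.
shared = [name for name in FEATURE_NAMES if HTML[name] == XML[name]]

for name in FEATURE_NAMES:
    print("%-28s HTML=%-4s XML=%s" % (name, HTML[name], XML[name]))
print("Same in both:", shared)
```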
Although it is technically legal to write your own SGML declaration right into an HTML document, extremely few user agents will even recognize it, let alone support it correctly. It is strictly illegal to write your own SGML declaration into an XML document. SHORTTAG and OMITTAG aren’t options you can toggle to please the browser; they are inherent traits of HTML and XML, and valid documents must conform to those rules.