HTML good practice checker
Sunday, July 1st, 2007
Do you like clean markup? Do you use HTML and still prefer to quote all your attribute values, use lower-case tag names, and generally follow good clean markup practices? Do you wish you could force the HTML Validator to be even more strict so you could quickly identify stray XHTML-style self-closing tags in your HTML and other issues that it usually ignores?
If so, then you may find my new HTML good practice checker useful. It sets up a custom SGML declaration (for markup parsing rules) and DTD (for document structure rules) which instruct the W3C HTML Validator to be more strict with your document.
Here is a partial list of the new rules enforced:
- All tag and attribute names must be lower-case.
- All attribute values must be quoted.
- Declarations are case-sensitive like in XML.
- SGML Null End Tags (NET) are not allowed. This means that the validator will recognize that a `<br />` in an HTML document is a problem.
- End tags must be used on all non-empty elements. Note: If an end tag is forbidden in normal HTML, it’s still forbidden here.
- Start tags must be used on all elements.
- You may not write `<tr>` tags directly inside the `table` contents; you must include them in a `tbody`. In fact, in terms of document structure, `tr` was never truly allowed as a child of `table` in HTML. They were normally assumed to be within a `tbody` element with omitted start and end tags. So this rule is actually just a natural consequence of the above two rules. Note that HTML’s behavior is different from XHTML, where `tr` actually is allowed as a child of `table`, and the good practice rule of an explicit `tbody` element improves consistency between HTML and XHTML.
- Nested tables are not allowed.
- Unclosed tags and empty tags (obscure and poorly-supported SGML shorthand rules) are no longer allowed.
- Attributes may no longer use minimized form (for example, the `disabled` attribute must be written `disabled="disabled"`).
- Hexadecimal character references must use a lower-case “x” like in XML.
- The following presentational elements may not be used: `tt`, `i`, `b`, `big`, `small`.
- The `q` element may not be used, due to major unresolvable compatibility issues.
- The `width` and `height` attributes are required on `img` elements.
- The `name` attribute has been removed on the `a` element. You should use `id` instead.
- The following attributes were removed from the `table` element: `width`, `border`, `frame`, `rules`, `cellspacing`, `cellpadding`, `datapagesize` (a reserved attribute).
- The following attributes were removed from all other table-related elements: `width`, `align`, `char`, `charoff`, `valign`.
- On the `script` element, the reserved `event` and `for` attributes have been removed.
- In order to avoid issues when user agents confuse `UTF-8` and `ISO-8859-1`, characters above `~` are no longer allowed to be written directly in the document. You should use character references for them.
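The last rule in the list, together with the lower-case “x” rule, is easy to apply mechanically before you validate. Here is a minimal Python sketch of one way to do it. The function name `escape_above_tilde` is made up for this example and is not part of the checker; it simply rewrites every character above `~` as a hexadecimal character reference and leaves everything else untouched.

```python
def escape_above_tilde(text: str) -> str:
    """Replace each character above '~' (U+007E) with a hexadecimal
    character reference that uses a lower-case "x"."""
    limit = ord("~")
    return "".join(
        # Characters up to and including "~" pass through unchanged.
        ch if ord(ch) <= limit else f"&#x{ord(ch):x};"
        for ch in text
    )


# "\u00e9" is e-acute, written as an escape so this file stays ASCII-only.
print(escape_above_tilde("caf\u00e9"))  # prints: caf&#xe9;
```

Note that this sketch does not look at existing markup, so it is only meant to be run on text where every character above `~` really should become a reference.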
I’m always open to feedback. For the most part, the things this system can check are currently limited to rules you can specify in the SGML declaration and DTD. Keep in mind that this system is new and it’s possible that there are bugs. If you come across any, please let me know.
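As a rough illustration of the purely lexical rules from the list (lower-case names, quoted attribute values, no minimized attributes, no XHTML-style self-closing tags), here is a small Python sketch that flags them with regular expressions. This is not how the checker works, since the checker relies on the SGML declaration and DTD. The name `lint_start_tags` and both patterns are assumptions made for this example, and the sketch ignores comments, script content, and attribute values that contain `<` or `>`.

```python
import re

# A start tag: "<", a name, then everything up to the next ">".
# Assumption: no attribute value contains "<" or ">".
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)([^<>]*)>")
# An attribute name, optionally followed by "=" and a value.
ATTRIBUTE = re.compile(r"""([^\s=/]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s]+))?""")


def lint_start_tags(html: str) -> list:
    """Return a list of messages for start tags that break the lexical rules."""
    problems = []
    for tag in START_TAG.finditer(html):
        name, rest = tag.group(1), tag.group(2).rstrip()
        if name != name.lower():
            problems.append(f"<{name}>: tag name is not lower-case")
        if rest.endswith("/"):
            problems.append(f"<{name}>: XHTML-style self-closing tag")
            rest = rest[:-1]  # drop the slash before reading attributes
        for attr in ATTRIBUTE.finditer(rest):
            attr_name, value = attr.group(1), attr.group(2)
            if attr_name != attr_name.lower():
                problems.append(f"<{name}>: attribute name {attr_name} is not lower-case")
            if value is None:
                problems.append(f"<{name}>: attribute {attr_name} is minimized")
            elif value[0] not in "\"'":
                problems.append(f"<{name}>: value of {attr_name} is not quoted")
    return problems


for message in lint_start_tags('<input type=text disabled>'):
    print(message)
```

A check like this can only catch surface syntax. Structural rules, such as requiring an explicit `tbody` or forbidding nested tables, still need the validator with the custom DTD.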