HTML good practice checker
Sunday, July 1st, 2007
Do you like clean markup? Do you use HTML and still prefer to quote all your attribute values, use lower-case tag names, and generally follow good clean markup practices? Do you wish you could force the HTML Validator to be even more strict so you could quickly identify stray XHTML-style self-closing tags in your HTML and other issues that it usually ignores?
If so, then you may find my new HTML good practice checker useful. It sets up a custom SGML declaration (for markup parsing rules) and DTD (for document structure rules) which instruct the W3C HTML Validator to be more strict with your document.
Here is a partial list of the new rules enforced:
- All tag and attribute names must be lower-case.
- All attribute values must be quoted.
- Declarations are case-sensitive like in XML.
- SGML Null End Tags (NET) are not allowed. This means that the validator will recognize that a <br /> in an HTML document is a problem.
- End tags must be used on all non-empty elements. Note: If an end tag is forbidden in normal HTML, it’s still forbidden here.
- Start tags must be used on all elements.
- You may not write <tr> tags directly inside the table contents; you must include them in a tbody. In fact, in terms of document structure, tr was never truly allowed as a child of table in HTML. They were normally assumed to be within a tbody element with omitted start and end tags. So this rule is actually just a natural consequence of the above two rules. Note that HTML’s behavior is different from XHTML, where tr actually is allowed as a child of table, and the good practice rule of an explicit tbody element improves consistency between HTML and XHTML.
- Nested tables are not allowed.
- Unclosed tags and empty tags (obscure and poorly-supported SGML shorthand rules) are no longer allowed.
- Attributes may no longer use minimized form (for example, the disabled attribute must be written disabled="disabled").
- Hexadecimal character references must use a lower-case “x” like in XML.
- The following presentational elements may not be used: tt, i, b, big, small.
- The q element may not be used, due to major unresolvable compatibility issues.
- The width and height attributes are required on img elements.
- The name attribute has been removed on the a element. You should use id instead.
- The following attributes were removed from the table element: width, border, frame, rules, cellspacing, cellpadding, datapagesize (a reserved attribute).
- The following attributes were removed from all other table-related elements: width, align, char, charoff, valign.
- On the script element, the reserved event and for attributes have been removed.
- In order to avoid issues when user agents confuse UTF-8 and ISO-8859-1, characters above ~ are no longer allowed to be written directly in the document. You should use character references for them.
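To make a few of the lexical rules above concrete, here is a rough Python sketch that flags some of the same problems with regular expressions. This is not how the checker itself works (it relies on a custom SGML declaration and DTD fed to the W3C validator); the function name `find_issues`, the patterns, and the message wording are all made up for illustration, and the sketch ignores comments, scripts, and `>` inside quoted attribute values.

```python
import re

# Crude start-tag matcher: tag name, raw attribute text, optional "/" before ">".
# Assumption: no comments, CDATA, or ">" inside quoted values.
START_TAG_RE = re.compile(r"<([A-Za-z][A-Za-z0-9]*)([^<>]*?)(/?)>")

# One attribute: a name, optionally followed by "=" and a quoted or bare value.
ATTR_RE = re.compile(r"""([^\s=/"']+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+))?""")

# Hexadecimal character reference written with an upper-case "X".
UPPER_HEX_REF_RE = re.compile(r"&#X[0-9A-Fa-f]+;")


def find_issues(html):
    """Return a list of messages for a few of the lexical rules (hypothetical helper)."""
    issues = []
    for tag in START_TAG_RE.finditer(html):
        name, attrs, slash = tag.groups()
        if name != name.lower():
            issues.append(f"upper-case tag name: {name}")
        if slash:
            issues.append(f"self-closing tag in HTML: {name}")
        for attr in ATTR_RE.finditer(attrs):
            attr_name, value = attr.groups()
            if attr_name != attr_name.lower():
                issues.append(f"upper-case attribute name: {attr_name}")
            if value is None:
                issues.append(f"minimized attribute: {attr_name}")
            elif value[0] not in "\"'":
                issues.append(f"unquoted attribute value: {attr_name}")
    for ref in UPPER_HEX_REF_RE.finditer(html):
        issues.append(f"upper-case hex reference: {ref.group(0)}")
    # Characters above "~" should be written as character references instead.
    for ch in sorted({c for c in html if c > "~"}):
        issues.append(f"raw character above '~': use &#x{ord(ch):x};")
    return issues


if __name__ == "__main__":
    sample = '<IMG SRC=a.png><br /><input disabled><p class="ok">caf\u00e9</p>'
    for message in find_issues(sample):
        print(message)
```

Structural rules such as the explicit tbody requirement, the ban on nested tables, or the removed elements and attributes need a real parser and a content model, which is exactly what the DTD provides, so this sketch does not attempt them.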
I’m always open to feedback. For the most part, the things this system can check are currently limited to rules you can specify in the SGML declaration and DTD. Keep in mind that this system is new and it’s possible that there are bugs. If you come across any, please let me know.