Web Devout tidings


Other web standards experts worried about HTML 5

May 11th, 2007 by David Hammond

More web standards experts have begun expressing worries about the direction in which HTML 5 is currently heading. Roger Johansson, writer for the excellent 456 Berea Street blog, has written a few posts on the subject. From one of the posts: “What is currently going on in the W3C HTML Working Group is very disappointing and something I never expected to see when I joined it. I was naive enough to think that everybody joining the HTML WG would be doing so out of a desire to improve the Web. Unfortunately, that does not seem to be the case.”

Check out the posts and comments in the following links:

Re: 55 reasons to design in XHTML/CSS

May 8th, 2007 by David Hammond

Someone pointed me to an article entitled “55 reasons to design in XHTML/CSS”, which attempts to explain some of the benefits of certain types of webpage design over others. I’m not going to argue against the points which favor semantic markup over presentational markup (because semantic markup is indeed a good thing), but there are lots of false or plain pointless claims about XHTML in there, so I want to address the points one-by-one. Open the above link in another tab so you can see the original points along with my responses.

1a. The CSS Zen Garden is about separation of content and presentation, not XHTML versus HTML. They happen to use XHTML (incorrectly, I might add, since most of the designs fall apart when you force a browser to parse it as XML instead of HTML like they’re currently doing), but this site doesn’t demonstrate any benefits of XHTML over HTML. If anything, it demonstrates problems with XHTML, for the reason described in the previous sentence.

1b. Stylegala accepts either XHTML or HTML content, as long as it’s valid. Like the Zen Garden, this is a separation of content and presentation issue, not an XHTML vs. HTML issue.

1c. CSS Import is the same deal. It’s about separation of content and presentation, not XHTML vs. HTML.

1d. CSS Beauty is the same deal again.

2. This point is ridiculous. I work exclusively with HTML, but I don’t have to spend any extra time thinking about whether or not to quote attribute values. I always quote attribute values. I always use lower-case tag names and attribute names. I always include end tags for non-empty elements. I always escape my ampersands and less-than signs with entities (as well as double quotes and greater-than signs, if only for consistency, although that isn’t a difference between HTML and XHTML). These are just good-practice markup rules, and anyone who does web design regularly has their own style (hopefully a subtle variant of already established best practices) which they apply without thinking. If you’re sitting at your computer contemplating whether to write “input” or “INPUT”, you must not have much experience.
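For illustration, both of these lines are legal HTML, but only the second reflects the habits I just described (the attribute names and values are made up):

<INPUT TYPE=text NAME=q> <!-- legal HTML, but sloppy: upper-case, unquoted -->
<input type="text" name="q" value="Tom &amp; Jerry"> <!-- quoted, lower-case, ampersand escaped -->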

3. How is this any different from XHTML following the legacy compatibility guidelines (which are required for Internet Explorer support)? If you’re serving XHTML as text/html, you simply cannot write <div /> and expect browsers to consider that closed. Likewise, you simply cannot write <br></br> and expect all browsers to consider that a single element. Several major browsers actually consider that last example to be two br elements! In XHTML, not only do you have to think about which elements require which style of closing, but you also have to think about new compatibility issues between typical HTML parsing of XHTML (which is currently the most common way XHTML is parsed) and proper XML parsing of XHTML (which, sadly, is often overlooked, even though this should be the single correct way of parsing).
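To make those parsing differences concrete, here are the constructs in question (illustrative snippets, served as text/html):

<br /> <!-- harmless: HTML parsers ignore the stray "/" per the compatibility guidelines -->
<div /> <!-- broken: HTML parsers see an open div that swallows everything after it -->
<br></br> <!-- broken: several major browsers parse this as two separate br elements -->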

4. That’s a semantic markup issue, not HTML vs. XHTML.

5. Another semantic markup issue.

6. This isn’t really true. XHTML 2.0 is, by design, completely incompatible with XHTML 1.x. They’re both supposedly parseable as XML, but so what? You can do an XSL transformation? The semantics and document structure in the current proposed drafts of XHTML 2.0 have some fundamental differences from XHTML 1.x, and you can’t just perform an XSL transformation on any XHTML 1.x document and expect it to result in XHTML 2.0 with all of the proper semantics. In order for XHTML 2.0 to be used correctly, you’ll have to do the markup from scratch. That is, unless you don’t mind the Web being polluted with a bunch of XHTML 2.0 documents containing improper semantics.

7. Separation of content and presentation issue, not HTML vs. XHTML.

8. Semantic markup / separation of content and presentation issue, not HTML vs. XHTML.

9. Semantic markup issue, not HTML vs. XHTML.

10. Separation of content and presentation issue, not HTML vs. XHTML.

11. Separation of content and presentation issue, not HTML vs. XHTML.

12. (X)HTML vs. Flash issue, not HTML vs. XHTML.

13. (X)HTML vs. Flash issue, not HTML vs. XHTML.

14. Separation of content and presentation issue, not HTML vs. XHTML.

15. Separation of content and presentation issue, not HTML vs. XHTML.

16. I’m not even sure what this point is trying to say. CMSs basically exist so that you don’t have to write a whole site backend yourself. What does clean markup have to do with whether or not you need an elaborate site backend?

17. I completely agree with this point, but it has nothing to do with HTML vs. XHTML. Check out the source at http://www.webdevout.net/. How is this any less “clean” than the XHTML equivalent at http://www.webdevout.net/?output=xhtml (aside from the removal of the comments, which was just done to simplify the on-demand conversion)? If anything, XHTML has more junk thrown in. You may personally prefer seeing “/” characters in anything that ends an element, but you must be pretty darn new to web design if you don’t realize that link elements always end immediately after starting. And if you’re that new, you probably can’t read the XHTML version any more easily than the HTML version. In any case, I have seen XHTML markup far, far uglier than my HTML markup. Following best practices, typical HTML and typical XHTML do not have significant differences in “cleanliness” compared to other issues like indentation and semantic markup.

18. Separation of content and presentation issue, not HTML vs. XHTML.

19. Semantic markup / separation of content and presentation issue, not HTML vs. XHTML.

20. Separation of content and presentation issue, not HTML vs. XHTML.

21. Semantic markup / separation of content and presentation issue, not HTML vs. XHTML. And I probably wouldn’t use the word “automatically” here.

22. This is more a CSS issue than a markup issue at this time, and I don’t think it’s quite as true as a lot of people seem to believe. In regard to XHTML, the pace of browser development isn’t likely to be swayed to any significant degree by a few more websites adopting “real” (IE-incompatible) XHTML. The Internet Explorer development team has already said that they plan to add real XHTML support in an upcoming version, but they just want to make sure they support it correctly before they roll it out. It would suck for IE to launch it early and have bugs in well-formedness detection (some well-formed pages might not display at all, or malformed pages might accidentally be seen as well-formed, which means the web developer might not catch the error and other browsers might not be able to display the page) or the kinds of CSS and DOM bugs that Opera currently has with XHTML parsed as XML.

23. In HTML, all elements are closed. An empty element with an omittable end tag is automatically closed right after the start tag in HTML (and any other SGML-based language that allows omittable end tags). I assume he meant it looks cleaner with end tags (or null end tags, the /> things). That’s a matter of personal preference. As far as the markup itself, I think XML generally looks less clean with the extra / characters everywhere and/or end tags immediately after start tags. Sure, you could say that the document structure in XML is more obvious without having to know any of the DTD rules, but then, the navigation structure at craigslist is obvious too, and I wouldn’t call it “clean”.

24. This is absolutely wrong. Since XHTML sent with the typical text/html content type isn’t even parsed as XML by any major browser, well-formedness is a non-issue when it comes to how those browsers parse the document. Take a well-formed XHTML page, change every instance of /> to just >, send it along as text/html, and see what browsers do with it. No difference. You have a horribly malformed XML document, and there isn’t a single major browser that even notices. Now go back to your original well-formed XHTML document and add a few <div /> things here and there. It’s still well-formed, and those few empty divs shouldn’t make much of a difference, right? Wrong. Most major browsers suddenly make a complete mess of your page because they think everything after each of the <div /> things is inside the div! Now take an XHTML document with an XML declaration, and insert a blank line above the XML declaration. Go ahead and validate that with the W3C Validator. It says it’s valid? Great, except guess what? It isn’t well-formed. The current W3C Validator doesn’t check for well-formedness, only validity with an SGML parser in XML mode, so when parsed with a real XML parser (any major non-IE browser if the document is sent as application/xhtml+xml), this lovely XHTML page to which the W3C Validator gave the green light completely fails to load! Try the new W3C Validator beta which uses a real XML parser. The new one says it failed validation, while the old one said it passed. Looks like well-formedness is a much more slippery issue than the writer of this article would have you believe.
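If you want to watch the <div /> trap happen for yourself, a snippet like this (purely illustrative) is enough:

<p>This paragraph is fine.</p>
<div />
<p>Under HTML parsing, this paragraph and everything after it is treated as being inside the div above.</p>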

25. Huh? You really think any major browser today is spending any significant amount of time on the HTML parser error handling anymore? Are we still living in the ’90s? That job is pretty much done. Instead, the author is suggesting that browsers like IE spend more time adding support for proper XHTML rather than things like CSS? Seems a bit self-contradictory to me.

26. Almost every page on the Web is currently sent with the text/html content type. This means all major browsers use an HTML parser on them. Because no major web browsers are enforcing well-formedness, the vast majority of XHTML content on the web is malformed and can only be parsed using a typical HTML parser. Nearly the entire Web currently requires an HTML parser. Future web browsers will have to support this content. This is why HTML 5 is going to define exactly what browsers are currently doing with HTML (rather than referencing the SGML specification which browsers were supposed to be following). If you use HTML, it will work in future web browsers, or else nearly the entire Web will be broken in them. No browser developers who want people to use their product for general web browsing will let all of that content just break. In the real world, HTML is very future-compatible.

27. It’s true that there are still some mobile browsers which use XML parsers exclusively, but these will die out in time. It’s already possible for a full HTML-parsing browser to run on a mobile device without a significant difference in energy consumption compared to an XML-only one. As technology improves, it will be trivial to put something like Opera or KHTML (Konqueror’s free and lightweight engine, from which Safari’s engine was forked) on any mobile device with no significant energy or performance impact. The XML-only mobile browsers today are seldom intended for regular web browsing, since well-formed and valid XHTML is nearly nonexistent on the Web, relatively speaking, and that isn’t likely to change any time in the near future. Market forces are simply going to require HTML parsing on those devices if they are intended for general web browsing. Again, let’s be realistic. HTML isn’t going away.

28. Again, no major browser treats typical XHTML as XML, so you aren’t really dealing with XML in the first place. You’re dealing with bad HTML that has adopted some aspects of XML, but is still treated as just HTML. Learning XML by using XHTML is like learning web design by going to an MS FrontPage class: you may acquire some superficial exposure to it, but you’ll probably learn a bunch of crap in the process.

29. Separation of content and presentation issue, not HTML vs. XHTML.

30. What exactly would I be converting XHTML to? I’ve never come across a situation where such a thing would be useful with HTML/XHTML. If I want to provide an RSS or Atom feed, I probably already have the data in an SQL database to begin with. Doing an XSL transformation in this case would cause unnecessary overhead. And then, if I really want to, it’s not exactly difficult to convert typical HTML to XHTML. Remember the ?output=xhtml thing I mentioned above? I wrote that converter in under an hour as an afterthought. If you have a specific document you’re working with in a specific format, it’s even easier to toss around a couple regular expressions and get the data you want. I always hear people referring to XSL as an advantage of XHTML, but I wonder how many of those people have even used XSL. I’ve never needed to.

31. Semantic markup issue, not HTML vs. XHTML.

32. Semantic markup issue, not HTML vs. XHTML.

33. Ha. Well, thankfully, I don’t have to use XHTML in order to put it on my resume.

34. This isn’t really an issue of HTML vs. XHTML, but it’s worth pointing out that Firefox 2.0 and below typically render XHTML content parsed as XML more slowly than HTML. Whether it’s HTML or XML, parsing speed is usually faster than download speed, so the browser has usually parsed the entire document by the time it finishes downloading. When using the HTML parser, it begins to display the webpage while it’s being parsed (and thus, while it’s still downloading). However, when it’s using the XML parser, it won’t display anything until it has checked for well-formedness throughout the entire document. That means nothing gets displayed to the user until the entire XHTML page has downloaded. So under both HTML and XML modes, the document usually finishes rendering at about the same time, but the HTML parser starts rendering much sooner. Firefox 3 will support incremental rendering of XML content, so the two will be about the same speed on typical Internet connections.

35. But what is “the right way” exactly?

36. They aren’t: Roger Johansson, Anne van Kesteren, Jonathan Snook, Eric Meyer, the Safari team

The links in the following sub-points go to a page which simply sends the webpage contents with the application/xhtml+xml content-type which triggers XML parsing in web browsers. This is exactly how you would see the respective websites if they sent the correct XHTML content type.
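In other words, the only difference between the two views of each site is a single HTTP response header:

Content-Type: text/html (what these sites actually send; browsers use their HTML parsers)
Content-Type: application/xhtml+xml (the correct XHTML type; browsers use their XML parsers)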

36a. Not only does SimpleBits not validate, it isn’t even well-formed, so an XML parser would completely fail to load it.

36b. Shame the “Job Board” section on Jeffrey Zeldman’s site doesn’t work when the page is parsed as XML. document.write and document.writeln don’t exist in XML documents. You’d think a supposed standards expert would know that.
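For what it’s worth, the fix is simple. Here’s a sketch of a DOM-based replacement that works under XML parsing (the “sidebar” id is made up for illustration):

<script type="text/javascript">
// document.write doesn't exist under XML parsing, so build the
// content through the DOM instead. createElementNS keeps the new
// element in the XHTML namespace; the "sidebar" id is hypothetical.
var p = document.createElementNS("http://www.w3.org/1999/xhtml", "p");
p.appendChild(document.createTextNode("Latest links"));
document.getElementById("sidebar").appendChild(p);
</script>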

36c. Wow, the stylesheet really falls apart when Jason Santa Maria’s page is parsed as XML. Perhaps he doesn’t realize that CSS isn’t supposed to follow the legacy HTML rules when the page is parsed as XML, and when you set the background on the body element, it really goes on the body element, not the html element. Too bad.
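If you ever expect XML parsing, the safe pattern is to set the page background on the root element yourself instead of relying on HTML’s special body-to-canvas propagation (values are illustrative):

<style type="text/css">
/* Renders the same under HTML and XML parsing: the canvas background
   comes from the root element, with no special-case propagation. */
html { background-color: #336699; }
body { margin: 0; padding: 1em; }
</style>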

36d. Shaun Inman should fix those weird spacing issues that happen when the page is parsed as XML. The search box looks out of shape.

36e. Cameron Moll’s site has a whole slew of validation errors, plus well-formedness errors. The page doesn’t display at all when parsed as XML.

36f. On StopDesign, the “Latest links” don’t appear when the page is parsed as XML. Once again, document.write doesn’t work in XML.

36g. Dave Shea’s mezzoblue has a bunch of validation and well-formedness errors. The page doesn’t display at all when parsed as XML.

For those keeping count, every single example the article gave of web standards experts using XHTML had problems when parsed as XML. Three of them (that’s 43%) couldn’t even be parsed as XML. XHTML was designed specifically to be an XML version of HTML. If these sites don’t work correctly when treated as XML, why are they XHTML? If they depend on browsers treating them like HTML and don’t make use of any benefit XHTML is supposed to offer, why weren’t they just written in HTML?

37. You’re part of the masses who use XHTML without really understanding it. I consider myself part of a movement to educate people about the problems using XHTML this way. To each his own.

38. This isn’t really specific to the HTML vs. XHTML issue.

39. Hooray! Although I wish the same fate on some other elements which are still lingering in drafts of future specifications.

40. Which is why I write my HTML to strict guidelines.

41. Thankfully, I can write books about XHTML without using it.

42. Yes, it’s always good to know the technologies. I know XHTML very well. Of course, it doesn’t mean I use it when HTML is the better option.

43. Given that the author doesn’t seem to understand that browsers handle most XHTML content on the Web as regular old HTML, I’m not sure he’s one to speak.

44. Sounds like a CSS issue.

45. You should be caring about this stuff if you’re using HTML, too.

46. Separation of content and presentation issue, not HTML vs. XHTML.

47. “XHTML has a cooler name than HTML”. This doesn’t warrant a response.

48. Yeah, and I’ve definitely seen a lot of disadvantages to using XHTML. Unfortunately, there are a lot of people who get religious about it and refuse to listen to anything bad about XHTML. But for those among you who are open-minded, I hope you read my articles on the subject.

49. Separation of content and presentation issue, not HTML vs. XHTML.

50. Or free tools, like Bluefish on Linux.

51. Google finds about 1,060,000 results for “columbus discovered the world was round” versus about 826,000 for “columbus didn’t discover the world was round”. The myth that Columbus had any connection with a debate over whether or not the world was round originated entirely from a fictional work published in 1828. If you read the actual history of Christopher Columbus, it was not a debate over whether or not the world was round (it was already well-known and proven that the world was round, and it was contested by very few among the masses and even fewer from more educated backgrounds); it was actually a debate over the circumference of the Earth. Columbus had some errors in his calculation (including confusing two different “mile” units from different measurement systems) and thought the world was much smaller than it turned out to be. In every respect, Columbus was wrong about his predictions as he set sail, and he had absolutely no part in proving that the world was round. Yet, according to Google search results, the popular belief is the one derived from the 1828 work of fiction. Just because something is popular on Google doesn’t mean it’s true.

52. Oh yes, just like those sites I covered in number 36. All I did for my initial check was use the Force Content-Type extension in Firefox, which just causes the browser to treat the site as if it had the application/xhtml+xml content type (also known as a MIME type). Every single one had problems. All of them. I sure hope everyone who’s writing XHTML content takes a moment to check how an XML parser would see the page, because there’s a very high occurrence of things in the markup, stylesheets, and scripts breaking, often to a major degree. If you’ve only been testing the page when it’s sent as text/html, you’ll probably have a lot more work to do than just switching the content type.

53. As mentioned before, Microsoft is already working on it. Making pages that break in IE and thus get very few visitors isn’t going to put any significant pressure on Microsoft to put more people on the job or work faster. All it means is that a lot of visitors will think your site is broken. Let Microsoft take their time and release a good XHTML engine. There’s no rush. Really, XHTML isn’t that immediately important.

54. That’s just valid markup, whether it’s HTML or XHTML.

55. Just 16 of the points above were about HTML vs. XHTML; the rest were about other issues like semantic markup. Among those 16, most were plainly false, and the remainder were misleading or irrelevant. Check out my Beware of XHTML article, which explains a lot more reasons why you should probably use HTML rather than XHTML.

By the way, the article entitled “55 reasons to design in XHTML/CSS” contains invalid and malformed XHTML, and a browser attempting to parse it as XML completely fails to load the page. Here’s the parsed-as-XML version. The only reason a browser is able to display the original article at all is because all major browsers normally treat his XHTML as regular old HTML. I suggest learning the technology before spreading myths about it.

Oops. New Webpage Test system sort of up

May 1st, 2007 by David Hammond

Well, I had planned to make this new Webpage Test version a big release with lots of fanfare, but during a minor site-wide update I accidentally put up part of the new system and overwrote the old version. So let’s call this a version 2.5.

Here’s the scoop:

Over the last few months, I’ve been working on a full-featured SGML parser in PHP in whatever spare time I had between work and dealing with a spine/leg problem. Right now, it’s nearly finished aside from a few important bugs and some tweaks here and there. But the SGML parser is not live right now. The point of using a full SGML parser for syntax highlighting rather than a traditional regexp-based highlighter is that I want it to be accurate, and it currently doesn’t handle certain situations correctly.

What’s up right now is the highlighter I planned to use for content that included PHP and other preprocessing languages. This makes use of a regexp-based highlighting framework I designed inspired by Bluefish’s syntax highlighting framework. Unlike the SGML parser, this syntax highlighter does not have knowledge of the document structure and does not check content against a DTD. It uses a hard-coded list of known elements, attributes, and entities, and generally makes an effort to highlight typical HTML markup reasonably well for a regexp-based highlighter. I plan to also use this framework for CSS and ECMAScript syntax highlighting in the future, at least at first.

This new version of the Webpage Test system displays the highlighted (X)HTML source when viewing a saved page. The next version will also display the highlighted CSS and ECMAScript source if relevant.

Here are some of the design criteria that went into this new feature:

  • Pure (X)HTML markup should be highlighted as accurately as possible with proper indicators for errors and common mistakes. This would have been delivered well with the SGML parser, but for now you’ll get the cheap imitation.
  • (X)HTML markup containing server-side preprocessing instructions like PHP cannot be assumed to be valid (X)HTML before those instructions are executed, so a full SGML parser is not appropriate for this source. A simpler regexp-based highlighter will be used.
  • Highlighting color schemes should be consistent across languages. Constructs from different languages which serve similar respective purposes should be the same color if possible. By default, strings are red, variables are cyan, “special” constructs are yellow, escaped characters are blue, comments are grey and italic, etc.
  • No highlighting color scheme will please everyone. The content is ambitiously highlighted using code elements and semantic class names, and a single modular stylesheet is used for the styling. This allows people to change the styles through user stylesheets or for Web Devout to easily provide options for highlighting schemes in the future.
  • The highlighted source shouldn’t be altered from the original source other than adding in the highlighting elements. The highlighter doesn’t mess with whitespace, throw in extra attributes, screw with empty elements, or anything of the sort. The characters you see are the characters that were inputted.
  • Lines should be numbered, but if possible, selecting the highlighted source and copy/pasting into another application shouldn’t result in any extra characters beyond the original source. An ordered list can’t be used since the different list items would overlap with the highlighting elements, and Firefox and other browsers are known to copy the numbers to the clipboard when copy/pasting. Instead, all modern browsers should get generated content with CSS counters (a sketch of this technique follows this list). Since Internet Explorer doesn’t support generated content and CSS counters, it is unfortunately given conditional comments with the numbers as inline data, which means copy/pasting would include the line numbers if you’re using Internet Explorer.
  • A single test case will likely be viewed several times. Because of this, the syntax highlighting is only done once upon submission and is cached.
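For those curious about the line numbering criterion above, the CSS counters technique looks roughly like this (the class names are hypothetical):

<style type="text/css">
/* Each line's number is generated content produced from a counter,
   so selecting and copying the source skips the numbers entirely. */
.source { counter-reset: line; }
.source .line:before {
    counter-increment: line;
    content: counter(line);
    padding-right: 1em;
    color: #999;
}
</style>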

There are several other improvements in this version besides the syntax highlighter:

  • Newly saved test cases now reuse expired IDs in order to minimize the length of the URL.
  • You may now easily load in remote sites by typing something like http://www.webdevout.net/test?http://www.w3.org/. The HTTP headers are displayed and highlighted for clarity.
  • There’s nicer feedback when saving a test case.

There are still some known bugs in this version. The highlighter doesn’t yet make sure that all highlighting code elements are closed, so it’s possible for the output to be invalid if your markup ends unexpectedly. As mentioned above, this wasn’t supposed to go live yet, but it’s reasonably stable anyway, so I figure it isn’t worth regressing it to the old version.

HTML 5: common practice vs. good practice

April 29th, 2007 by David Hammond

Because of the way it documents current browser practices, the HTML 5 specification may inadvertently encourage bad web development practices.

One of the big reasons for a lot of the junk that’s currently planned in HTML 5 is that new user agents developed in the future should be able to reasonably handle preexisting content on the Web. The HTML 5 specification will describe what browsers are currently doing with a lot of content which maybe wasn’t considered part of a standard before now, and future web browsers would only have to follow the HTML 5 specification in order to handle much of the legacy content on the Web.

I do recognize the need for this. However, I don’t think this idea is being delivered properly. If we aren’t careful, web developers will end up looking at things like the font and embed elements and say, “Well hey, they’re here, they’re standard, why not use them?” I don’t care how many times you say in the specification that authors shouldn’t use them, if people receive even the slightest hint that they’re considered standard, they will.

What I feel needs to happen is a much clearer, more physical separation between the parts of the standard meant only for browsers rendering legacy content and the parts meant for web developers following good practices. I’m not sure if I’d go as far as publishing two separate and complementary standards, but I feel that the legacy stuff should at least be isolated into its own major section of the specification with a fat heading along the lines of “Crap that browsers support for legacy reasons”. None of this legacy content should pass validation, and it should be made perfectly clear that use of those features on new webpages is a violation of the standard.

But I somehow have a feeling that this won’t happen. Even the current Web Applications 1.0 specification says that WYSIWYG editors are allowed to use font tags in their output. It mentions nothing to dissuade the use of the embed element even though common uses like Flash can be done purely through an object element in all of today’s major browsers. The specification generally carries an attitude of “if it’s out there right now, it’s valid” which I think will end up only encouraging bad practices and resulting in a lot more problems in the future.
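For reference, the object-only pattern I mean looks something like this (the file name is made up; the movie param is there for Internet Explorer):

<object type="application/x-shockwave-flash" data="example.swf" width="400" height="300">
    <param name="movie" value="example.swf">
    <p>Alternative content for visitors without Flash.</p>
</object>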

The whimsical world of HTML 5

April 23rd, 2007 by David Hammond

A lot of scary stuff is going on in HTML 5 development. You know all the things we’ve learned about browser/engine-neutral code, building standards on top of other standards, using semantic markup, and so on? Well from what I’ve seen, the HTML working group seems to be throwing all of that out the window.

I should first note that I only just recently subscribed to the HTML WG mailing list, and I haven’t yet had a chance to read the full breadth of the discussion, but the talk right now seems to center on something called “bugmode”, a proposed standard mechanism that would let browsers add an infinite number of “quirks modes” which webpages can subscribe to. It’s currently proposed as something like this:

<html bugmode="ie7 gecko1.8 opera9">

This would basically cause these browsers to use snapshots of the respective layout engines when displaying the page. All future versions of Internet Explorer would use the IE 7 engine, all future versions of Firefox would use the Gecko 1.8 engine, and all future versions of Opera would use the Opera 9 engine.

Am I the only one who thinks this is a terrible idea?

First of all, since when do web developers experience significant problems with new versions of Firefox or Opera? I’ve never had anything important break with the release of a new version. I’ve only experienced such a problem in Internet Explorer, since IE has to fix major implementation flaws in very fundamental areas of the standards, like the basic behavior of the width and height properties. IE is uniquely in this position because most of their engine was developed before the current CSS standards were in place (they basically extrapolated off of CSS 1 however they saw fit at the time) and the engine had no development work for half a decade to correct the inconsistencies.

So I personally wouldn’t mind it if IE added some sort of conditional comment type of thing to target new quirks modes in IE, but I don’t see why there should be a whole new attribute added to the HTML standard just for triggering new browser-specific quirks modes.

I should point out that this is still very much a brainstorming session, and this idea may fade away in a couple weeks, but I’m still bothered by the number of people who seem to be taking this discussion seriously.

I talked a little more about this issue in a comment on Chris Wilson’s blog.

Now, Ian Hickson, who was responsible for a lot of the WHATWG Web Applications 1.0 work and will serve as an editor for the W3C HTML 5 specification, has it in his mind that the HTML WG is chartered to deviate HTML 5 from SGML. That is, he believes it is one of the stated intentions of the HTML WG that future versions of HTML will not be SGML languages.

Here is the charter quote from which he derived this idea:

The Group will define conformance and parsing requirements for ‘classic HTML’, taking into account legacy implementations; the Group will not assume that an SGML parser is used for ‘classic HTML’.

The charter uses the term “classic HTML” to refer to non-XHTML HTML. In SGML terms, this would be the markup using HTML 4.01’s SGML declaration, rather than XML as used by XHTML. Currently, no major web browser uses a full-featured SGML parser to parse classic HTML content. Therefore, it is wise not to assume that a browser can handle any SGML rules thrown at it in a new version of HTML. What the charter is saying is that the group will take into account this fact when developing the new standard. It does not say that HTML 5 shouldn’t be parseable by an SGML parser; it just says not to assume that an SGML parser will always be used.

However, Ian Hickson and others in the HTML WG have used this twisted interpretation of the charter as an excuse to unnecessarily break compatibility with the SGML standard. I’ll say it again: unnecessarily breaking compatibility with the SGML standard. I haven’t yet seen anything they’re trying to accomplish with HTML 5 that couldn’t be done in an SGML-compatible way.

They want to circumvent the issue of XML-style self-closing tag constructs causing problems in user agents which support the default SGML syntax for null end tags? Just set NETENABL to NO in the SGML declaration. This wouldn’t expressly allow XML-style self-closing constructs in HTML, and in these cases the “/” character would be considered invalid, but it brings a fully-compliant SGML parser to the behavior that all major browsers currently exhibit. Note that this would only be handled as intended when used on elements defined as EMPTY, as is currently the case in all major web browsers. If you were to truly support XML-style self-closing tags even for non-EMPTY elements (which may indeed require a significant departure from how HTML is currently constructed), that would cause problems with legacy user agents, which the HTML WG charter says to avoid. A change to the SGML declaration would be somewhat of an issue for fully compliant SGML parsers, since they generally use the Content-type header to determine which SGML grammar is being used, and we should probably avoid giving HTML 5 a different content-type than HTML 4, but at least this would keep HTML 5 compatible with SGML so it isn’t impossible for an SGML parser to parse it.
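For the record, the change I’m describing is a small tweak to the FEATURES section of the SGML declaration, using the extended SHORTTAG syntax from the WebSGML annex (a rough fragment only; the rest of the declaration would be unchanged):

FEATURES
 MINIMIZE
  SHORTTAG
   STARTTAG
    NETENABL NO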

It has also been proposed that HTML 5 should have no DTD. For similar reasons, I ask, why? I’ve seen the proposed elements in Web Applications 1.0, which is roughly considered the starting point for HTML 5 development, and I don’t see anything there that would require the absence of a DTD. I’m curious what the W3C Validator development team thinks of this. The W3C Validator currently operates strictly via an SGML/DTD parser (the upcoming new version of the Validator also comes equipped with an XML parser in order to also check for well-formedness). Without a DTD, the validator would have to hard-code all of the rules for HTML 5. And how exactly does omitting a DTD benefit anyone?

Not only does Ian Hickson want to omit a DTD, but he doesn’t seem to think that a version indicator is even necessary. His proposed new doctype declaration is simply <!DOCTYPE html>. So that’s it. Every future version of HTML had better be 100% backwards compatible. No mistakes may be made or else the HTML standard is screwed for life. I think history has shown us that this assumption that we can reasonably keep a sane standard backwards-compatible forever is a bit unwise. At one time, the isindex element seemed like a good idea. There are plenty of people who want the q element redefined in HTML 5 so that the browser doesn’t display quotation marks by itself. HTML 5 already attempts to redefine some elements and attributes from HTML 4. I guarantee that there are features currently in Web Applications 1.0 which people are going to see as a mistake several years down the road and want to correct. It will end up causing compatibility problems if there isn’t a version number to go along with those changes. Maybe we’ll have to use bugmode after all.
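Compare the two declarations: the first carries no version or DTD information at all, while the second (the standard HTML 4.01 Strict doctype) identifies both:

<!DOCTYPE html>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">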

Speaking of new features, let’s talk about some of them. To start off, there are some good things proposed in Web Applications 1.0. I like the section element, nav element, article element, aside element, the redefinition of the dl element, and some of the other stuff. But there are some elements and attributes that just make me scratch my head:

  • Why do we have a canvas element? Why not simply use a script to apply some state to any given element to turn it into a canvas? People who have worked with the Google Maps API are familiar with the idea of using a script to replace an arbitrary element (be it a div, p, etc.) with a new object. In most cases, a canvas element could be replaced with a div element, and the script would just set it to be a canvas, just as browsers often allow scripts to set arbitrary elements to be contentEditable (see the sketch after this list). Whatever happened to semantic markup? What semantics does a canvas element express?
  • ping attributes? In my a? Thanks for slowing down my Web experience and using up more of my bandwidth so that advertising companies can track my habits. Much appreciated. I hope my browser quickly adds an option to disable this functionality, because I for one don’t want it. If a website is going to gossip to others about how I’m using the site, it should put in the effort to do it server-side with its own bandwidth.
  • embed element, why won’t you die? Is it the popular thing these days to just call whatever is out there on the Web “the standard”?
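To illustrate the canvas point, here’s roughly what I have in mind. The getDrawingContext method is entirely hypothetical; the point is only where the hook lives:

<div id="chart">Sales figures for April.</div>
<script type="text/javascript">
// Instead of a dedicated canvas element, a script could flip an
// ordinary element into a drawing surface, the way Google Maps takes
// over a plain div. getDrawingContext is a made-up method name.
var chart = document.getElementById("chart");
var context = chart.getDrawingContext("2d");
context.fillRect(0, 0, 100, 50);
</script>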

I could go on, but my point is that a lot of stuff is being proposed pretty quickly, and I question the motivation and thought behind a lot of these propositions. People seem to be caught up on how to add such-and-such functionality to web apps rather than focusing on semantics and other things we were supposed to have learned since the old boom days of the Web. I dunno, it just feels like we’ve been through all of this before. Even though this is being discussed in a public forum, the types of propositions are all too reminiscent of the seemingly random “sounded-good-at-the-time” features Netscape and Internet Explorer kept adding during the last browser wars. Does anyone know where I can buy some cheap shock collars?