Web Devout tidings


Archive for the 'Specifications' Category

Self-contradictions in the HTML WG

Tuesday, May 22nd, 2007

HTML 5 will include the embed element, font element, and other elements which could easily be replaced using better and well-supported features. Ian Hickson wants them included anyway.

HTML 5 will not include the headers attribute for tables, which was designed to aid accessibility. Ian’s reason for not including them? They can easily be replaced with another feature (the scope attribute).

Ian wants all of the bad practice features of HTML included in HTML 5 because they are widely used.

Ian doesn’t want the good practice headers attribute included even though it’s widely used.

Can someone please tell me what’s going on?

Update 2007-06-01: Here’s a relevant article from Juicy Studio: The HTML Scope/Headers Debate.

Other web standards experts worried about HTML 5

Friday, May 11th, 2007

More web standards experts have begun expressing worries about the direction HTML 5 is currently going. Roger Johansson, writer for the excellent 456 Berea Street blog, has written a few posts on the subject. From one of the posts: “What is currently going on in the W3C HTML Working Group is very disappointing and something I never expected to see when I joined it. I was naive enough to think that everybody joining the HTML WG would be doing so out of a desire to improve the Web. Unfortunately, that does not seem to be the case”

Check out the posts and comments in the following links:

HTML 5: common practice vs. good practice

Sunday, April 29th, 2007

By the way it’s documenting current browser practices, the HTML 5 specification may inadvertently be encouraging bad web development practices.

One of the big reasons for a lot of the junk that’s currently planned in HTML 5 is that new user agents developed in the future should be able to reasonably handle preexisting content on the Web. The HTML 5 specification will describe what browsers are currently doing with a lot of content which maybe wasn’t considered part of a standard before now, and future web browsers would only have to follow the HTML 5 specification in order to handle much of the legacy content on the Web.

I do recognize the need for this. However, I don’t think this idea is being delivered properly. If we aren’t careful, web developers will end up looking at things like the font and embed elements and say, “Well hey, they’re here, they’re standard, why not use them?” I don’t care how many times you say in the specification that authors shouldn’t use them, if people receive even the slightest hint that they’re considered standard, they will.

What I feel needs to happen is a much more clear and physical separation between the parts of the standard meant only for browsers rendering legacy content and the parts meant for web developers following good practices. I’m not sure if I’d go as far as publishing two separate and complimentary standards, but I feel that the legacy stuff should at least be isolated into its own major section of the specification with a fat heading something along the lines of “Crap that browsers support for legacy reasons”. None of this legacy content should pass a webpage validation, and it should be made perfectly clear that use of those features on new webpages is a violation of the standard.

But I somehow have a feeling that this won’t happen. Even the current Web Applications 1.0 specification says that WYSIWYG editors are allowed to use font tags in their output. It mentions nothing to dissuade the use of the embed element even though common uses like Flash can be done purely through an object element in all of today’s major browsers. The specification generally carries an attitude of “if it’s out there right now, it’s valid” which I think will end up only encouraging bad practices and resulting in a lot more problems in the future.

The whimzical world of HTML 5

Monday, April 23rd, 2007

A lot of scary stuff is going on in HTML 5 development. You know all the things we’ve learned about browser/engine-neutral code, building standards on top of other standards, using semantic markup, and so on? Well from what I’ve seen, the HTML working group seems to be throwing all of that out the window.

I should first note that I only just recently subscribed to the HTML WG mailing list, and I haven’t yet had a chance to read the full breadth of the discussion, but the talk right now seems to be gathered around something called “bugmode”, a new standard mechanism for browsers to add an infinite number of “quirks modes” which webpages can subscribe to. It’s currently proposed as something like this:

<html bugmode="ie7 gecko1.8 opera9">

This would basically cause these browsers to use snapshots of the respective layout engines when displaying the page. All future versions of Internet Explorer would use the IE 7 engine, all future versions of Firefox would use the Gecko 1.8 engine, and all future versions of Opera would use the Opera 9 engine.

Am I the only one who thinks this is a terrible idea?

First of all, since when do web developers experience significant problems with new versions of Firefox or Opera? I’ve never had anything important break with the release of a new version. I’ve only experienced such a problem in Internet Explorer, since IE has to fix major implementation flaws in very fundamental areas of the standards, like the basic behavior of the width and height properties. IE is uniquely in this position because most of their engine was developed before the current CSS standards were in place (they basically extrapolated off of CSS 1 however they saw fit at the time) and the engine had no development work for half a decade to correct the inconsistencies.

So I personally wouldn’t mind it if IE added some sort of conditional comment type of thing to target new quirks modes in IE, but I don’t see why there should be a whole new attribute added to the HTML standard just for triggering new browser-specific quirks modes.

I should point out that this is still very much a brainstorming session, and this idea may fade away in a couple weeks, but I’m still bothered by the number of people who seem to be taking this discussion seriously.

I talked a little more about this issue in a comment on Chris Wilson’s blog.

Now, Ian Hickson, who was responsible for a lot of the WHATWG Web Applications 1.0 work and will serve as an editor for the W3C HTML 5 specification, has it in his mind that the HTML WG is chartered to deviate HTML 5 from SGML. That is, he believes it is one of the stated intentions of the HTML WG that future versions of HTML will not be SGML languages.

Here is the charter quote from which he derived this idea:

The Group will define conformance and parsing requirements for ‘classic HTML’, taking into account legacy implementations; the Group will not assume that an SGML parser is used for ‘classic HTML’.

The charter uses the term “classic HTML” to refer to non-XHTML HTML. In SGML terms, this would be the markup using HTML 4.01’s SGML declaration, rather than XML as used by XHTML. Currently, no major web browser uses a full-featured SGML parser to parse classic HTML content. Therefore, it is wise not to assume that a browser can handle any SGML rules thrown at it in a new version of HTML. What the charter is saying is that the group will take into account this fact when developing the new standard. It does not say that HTML 5 shouldn’t be parseable by an SGML parser; it just says not to assume that an SGML parser will always be used.

However, Ian Hickson and others in the HTML WG have used this twisted interpretation of the charter as an excuse to unnecessarily break compatibility with the SGML standard. I’ll say it again: unnecessarily breaking compatibility with the SGML standard. I haven’t yet seen anything they’re trying to accomplish with HTML 5 that couldn’t be done in an SGML-compatible way.

They want to circumvent the issue of XML-style self-closing tag constructs causing problems in user agents which support the default SGML syntax for null end tags? Just set NETENABL to NO in the SGML declaration. This wouldn’t expressly allow XML-style self-closing constructs in HTML, and in these cases the “/” character would be considered invalid, but it brings a fully-compliant SGML parser to the behavior that all major browsers currently exhibit. Note that this would only be handled as intended when used on elements defined as EMPTY, as is currently the case in all major web browsers. If you were to truly support XML-style self-closing tags even for non-EMPTY elements (which may indeed require a significant departure from how HTML is currently constructed), that would cause problems with legacy user agents, which the HTML WG charter says to avoid. A change to the SGML declaration would be somewhat of an issue for fully compliant SGML parsers, since they generally use the Content-type header to determine which SGML grammar is being used, and we should probably avoid giving HTML 5 a different content-type than HTML 4, but at least this would keep HTML 5 compatible with SGML so it isn’t impossible for an SGML parser to parse it.

It has also been proposed that HTML 5 should have no DTD. For similar reasons, I ask, why? I’ve seen the proposed elements in Web Applications 1.0, which is roughly considered the starting point for HTML 5 development, and I don’t see anything there that would require the absence of a DTD. I’m curious what the W3C Validator development team thinks of this. The W3C Validator currently operates strictly via an SGML/DTD parser (the upcoming new version of the Validator also comes equipped with an XML parser in order to also check for well-formedness). Without a DTD, the validator would have to hard-code all of the rules for HTML 5. And how exactly does omitting a DTD benefit anyone?

Not only does Ian Hickson want to omit a DTD, but he doesn’t seem to think that a version indicator is even necessary. His proposed new doctype declaration is simply <!DOCTYPE html>. So that’s it. Every future version of HTML had better be 100% backwards compatible. No mistakes may be made or else the HTML standard is screwed for life. I think history has shown us that this assumption that we can reasonably keep a sane standard backwards-compatible forever is a bit unwise. At one time, the isindex element seemed like a good idea. There are plenty of people who want the q element redefined in HTML 5 so that the browser doesn’t display quotation marks by itself. HTML 5 already attempts to redefine some elements and attributes from HTML 4. I guarantee that there are features currently in Web Applications 1.0 which people are going to see as a mistake several years down the road and want to correct. It will end up causing compatibility problems if there isn’t a version number to go along with those changes. Maybe we’ll have to use bugmode after all.

Speaking of new features, let’s talk about some of them. To start off, there are some good things proposed in Web Applications 1.0. I like the section element, nav element, article element, aside element, the redefinition of the dl element, and some of the other stuff. But there are some elements and attributes that just make me scratch my head:

  • Why do we have a canvas element? Why not simply use a script to apply some state to any given element to turn it into a canvas? People who have worked with the Google Maps API are familiar with the idea of using a script to replace an arbitrary element (be it a div, p, etc.) with a new object. In most cases, a canvas element could be replaced with a div element, and then the script just sets it to a canvas just as browsers often allow scripts to set arbitrary elements to be contentEditable. What ever happened to semantic markup? What semantics does a canvas element express?
  • ping attributes? In my a? Thanks for slowing my Web experience and using up more of my bandwidth so that advertising companies can track my habits. Much appreciated. I hope my browser quickly adds an option to disable this functionality, because I for one don’t want it. If a website is going to gossip to others about how I’m using the site, it should put in the effort to do it server-side with its own bandwidth.
  • embed element, why won’t you die? Is it the popular thing these days to just call whatever is out there on the Web “the standard”?

I could go on, but my point is that a lot of stuff is being proposed pretty quickly, and I question the motivation and thought behind a lot of these propositions. People seem to be caught up on how to add such-and-such functionality to web apps rather than focusing on semantics and other things we were supposed to have learned since the old boom days of the Web. I dunno, it just feels like we’ve been through all of this before. Even though this is being discussed in a public forum, the types of propositions are all too reminiscent of the seemingly random “sounded-good-at-the-time” features Netscape and Internet Explorer kept adding during the last browser wars. Does anyone know where I can buy some cheap shock collars?

XHTML 1.1 Second Edition WD allows text/html… why?

Saturday, February 17th, 2007

An XHTML 1.1 Second Edition Working Draft has just been published in attempt to correct some problems with the previous Recommended specification. However, in the Strictly Conforming Documents section, I feel that they have introduced a brand new problem. The concerned paragraph is:

XHTML 1.1 documents SHOULD be labeled with the Internet Media Type text/html as defined in [RFC2854] or application/xhtml+xml as defined in [RFC3236]. For further information on using media types with XHTML, see the informative note [XHTMLMIME].

text/html is now included with application/xhtml+xml as one of the content types XHTML 1.1 “should” be sent as. I see this as a big mistake that should be fixed before the specification advances further. In theory and in practice, the content type header instructs the user agent on what kind of file it is dealing with. Every major web browser parses text/html documents as HTML, not XML / XHTML. The fact that XHTML 1.0 allowed this has lead to a huge overall misunderstanding in what XHTML is and how user agents handle it, which in turn has resulted in the vast majority of so-called “XHTML” documents being in such a state that, if properly handled as XHTML was meant to be handled, they would fall apart. Even supposed web standards experts fall victim to this misunderstanding all the time, as is evident in this list of standards-related sites that break as XHTML.

XHTML was designed in part to progress from the old state of the Web that tolerated invalid markup and sloppy legacy behaviors in CSS and elsewhere. However, because XHTML was allowed to be sent as text/html, people are in essence writing XHTML just like they wrote HTML before, or even worse. The document looks like XHTML but they depend on browsers treating it as HTML. Most so-called XHTML pages on the Web today aren’t well-formed, which XML and XHTML were designed to forcefully not tolerate. XHTML on the Web has been the same disaster HTML was, except the situation is even more complicated than before. XHTML 1.0 has failed.

Now, XHTML 1.1 is about to do the same thing. By allowing the use of an incorrect content type that instructs browsers to use incorrect behavior, the specification authors are promoting the incorrect use of XHTML.

What warrants this change? Is it because most XHTML pages on the Web use the wrong content type? The road to progress is not to simply approve of whatever poor and harmful practices are used on the Web. XHTML is an XML format. That’s the only significant thing that sets it apart from HTML. If you’re going to allow it to be served and handled as plain old HTML, why bother having an XHTML standard at all? To word it another way, if you’re going to allow an XHTML document to be sent as text/html, which in turn would cause all major browsers to treat it as plain old HTML, why not instead recommend the use of HTML for those documents? Doing otherwise simply further pollutes the already poor state of XHTML on the Web.

For further reading about the problems with XHTML on the Web today, see the Beware of XHTML article.