Web Devout tidings


The problem with the NET

April 16th, 2006 by David Hammond

Use of XHTML today is decidedly harmful to the health of the Web when the documents are sent with the text/html content type for Internet Explorer compatibility. This is why I wrote the Beware of XHTML article, explaining some of the reasons why this misuse can lead to future problems and illustrating the situation with some examples. Now I’d like to talk about another potential problem with XHTML sent as text/html: Null End Tags.

As explained in the article, when a document is sent with the text/html content type, most major browsers including Internet Explorer, Firefox, and Opera treat the webpage as if it is actually regular HTML. As a result, they don’t correctly handle self-closing tags, instead treating them as start tags with an erroneous character inside them.

However, there’s more of a problem than just this. Technically speaking, if the browsers are really treating the page like HTML (and therefore SGML), they shouldn’t think that the closing slash is an erroneous character. According to the rules of SGML, they should think that it’s part of a Null End Tag, a kind of shorthand for simple elements that only contain character data. For example, according to the rules of SGML, a title element could be written <title/This is the title of the page/ which would be the equivalent of <title>This is the title of the page</title>. The first slash finishes the start tag, and the second slash represents the end tag if the element can have one.

Now let’s look at a situation in which this would present a problem. Think about this markup: <div>This<br/>is<br/>a<br/>test.</div>. That would work as expected in XHTML, but here’s how a browser should see it when treating the page as HTML: <div>This<br>>is<br>>a<br>>test.</div>. Notice the extra > after each br tag. Since the br element is defined as an empty element (it doesn’t have contents and doesn’t have an end tag), only one slash is relevant for each element. The slash finishes the tag right there, meaning the > character isn’t considered part of the tag, but rather character data after the tag. The presence of a space before the slash makes no difference.

The above markup should result in the following output when treated like HTML:

This
>is
>a
>test.

…That is, if browsers supported Null End Tags. Unfortunately, the major ones currently don’t, so most web developers never notice the issue. Rather than properly treating the slash as part of a Null End Tag, browsers treat it as an error and simply skip the character, which often produces much the same result as correct XHTML treatment would, but for the wrong reasons.
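The Null End Tag reading described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real SGML parser: the function names as_sgml_net and expand_net are made up for this example, the list of empty elements is a subset of those HTML 4 defines, and tags with attributes are not handled.

```python
import re

# A subset of the elements HTML 4 declares EMPTY (no content, no end tag).
EMPTY_ELEMENTS = ("area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param")

# Matches attribute-free XHTML-style empty tags such as <br/> or <br />.
_SELF_CLOSING = re.compile(r"<(%s)\s*/>" % "|".join(EMPTY_ELEMENTS), re.IGNORECASE)

# Matches a Null End Tag form such as <title/some text/ (hypothetical, simplified).
_NET_FORM = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/([^/<>]*)/")


def as_sgml_net(markup: str) -> str:
    """Show how a NET-aware parser would read XHTML-style empty tags.

    The slash ends the tag, so the following '>' becomes character data.
    """
    return _SELF_CLOSING.sub(r"<\1>>", markup)


def expand_net(markup: str) -> str:
    """Expand <name/text/ into the equivalent <name>text</name>."""
    return _NET_FORM.sub(r"<\1>\2</\1>", markup)


if __name__ == "__main__":
    print(as_sgml_net("<div>This<br/>is<br/>a<br/>test.</div>"))
    # <div>This<br>>is<br>>a<br>>test.</div>
    print(expand_net("<title/This is the title of the page/"))
    # <title>This is the title of the page</title>
```

The two functions are kept separate on purpose: expand_net is meant for elements that can have content, while as_sgml_net covers empty elements, where only the first slash is relevant.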

Keep in mind that there are many issues like this. XHTML should be avoided unless it is served correctly, with a proper XHTML content type such as application/xhtml+xml.

Firefox reflow branch reportedly passes Acid2 test

April 12th, 2006 by David Hammond

The Gecko reflow branch, being developed by David Baron to significantly improve fundamental aspects of the rendering engine in Mozilla browsers, now reportedly passes the Acid2 test.

The Acid2 test was developed by the Web Standards Project (WaSP) as a way to demonstrate some inconsistencies major browsers have with literal interpretations of the standards. It covers a wide range of HTML and CSS features, including the box model, selectors, objects, strict CSS and comment parsing, CSS display values, generated content, and more. Although it only covers relatively small portions of the standards, it was designed specifically to illustrate some bugs in every major web browser.

Throughout April 2005, Dave Hyatt focused on getting the Safari web browser to pass the Acid2 test, and succeeded by the end of the month, making Safari the first major web browser to render the Acid2 test correctly in its internal developmental builds. By June 2005, the Macintosh browser iCab passed the test, followed by the Konqueror browser for Linux. In December 2005, the Prince XML file converter passed the test, and in March 2006, a technical preview of Opera 9 succeeded in passing it.

Now, developmental builds of Firefox join the list of browsers that pass the test. The reflow branch, which has received those last fixes, will eventually merge with the trunk to premiere in Firefox 3.0, currently planned for release in 2007. The upcoming Firefox 2.0 will not have any webpage layout engine changes, but will focus solely on user interface improvements instead. The layout engine changes, including the move to the Cairo graphics backend, are very significant and will require more time for testing, so they will all be incorporated in the Firefox 3.0 release.

The only major remaining graphical browser that doesn’t pass the Acid2 test in developmental builds is Microsoft’s Internet Explorer. The developers have said that passing the Acid2 test is not a high priority because their customers are asking for other, more specific features. After a several-year-long development halt of Trident, Internet Explorer’s layout engine, the developers are currently working to add support for features other browsers have supported for quite a while, rather than focusing on some of the refinement details illustrated by the Acid2 test.

W3C to standardize the Window object

April 10th, 2006 by David Hammond

Following the recent XMLHttpRequest draft, the World Wide Web Consortium (W3C) is continuing its move to define standards for common-in-practice technologies with a new Window object draft. The Window object is one of the oldest, most commonly used proprietary technologies for use on webpages, and the W3C has set out to define a minimal standard feature set for it.

The specification draft notes that despite the name, which is “Window” for legacy reasons, the object is not limited to visual user agents. The Window object extends the previously standardized DOM Level 2 AbstractView interface and provides interfaces for document locations and time-based events.

Several commonly supported features of the Window object, such as history navigation, dynamic generation of new windows, alerts, and prompts, are not yet covered in this draft. The draft is only a work in progress and is expected to be superseded by later drafts.

W3C to standardize the XMLHttpRequest object

April 6th, 2006 by David Hammond

The World Wide Web Consortium (W3C) has published the first working draft for the XMLHttpRequest object.

XMLHttpRequest is a popular tool for making dynamic requests from webpages to remote servers, and it is the cornerstone of what has come to be known as AJAX. It was originally implemented by Internet Explorer 5.0 as an ActiveX object, followed by Mozilla 1.0 as a native object, and then Opera 8.0 as a simple native frontend to the W3C-standardized DOM Load and Save model, which Internet Explorer and Mozilla browsers currently do not support. Internet Explorer 7 will offer the object natively, as Mozilla and Opera do.

Although DOM Load and Save became a W3C Recommendation in April 2004, its lack of support and relative difficulty of use have made it less attractive to web developers than the much simpler XMLHttpRequest object. Two years later, the W3C has acknowledged the popularity of XMLHttpRequest and is now attempting to standardize a minimal implementation of the object based on the WHATWG’s research on existing behavior in modern web browsers.

From the draft:

The XMLHttpRequest object is implemented today, in some form, by many popular Web browsers. Unfortunately the implementations are not completely interoperable. The goal of this specification is to document a minimum set of interoperable features based on existing implementations, allowing Web developers to use these features without platform-specific code. In order to do this, only features that are already implemented are considered. In the case where there is a feature with no interoperable implementations, the authors have specified what they believe to be the most correct behavior.

This specification is currently a working draft and is thus subject to change.

CSS Naked Day

April 5th, 2006 by David Hammond

Today is the first annual CSS Naked Day. This is the day we take down our stylesheets in support of structural markup.

As all professional web designers know, it is important to separate webpage content from presentation. The purpose of HTML is to express the content of the webpage: to describe the meaning of each component in a way that is useful to automated agents, such as search engines, that are attempting to understand the information your webpage is expressing. CSS then provides a presentation layer to define how human beings should experience the page content.

If you aren’t following this model correctly, there will likely be problems when the stylesheet is disabled. You may see things shoved up against the right side of the page, images cut up into pieces, border or background images placed oddly here and there, or other oddities that make the webpage very difficult to use. However, if you’re following the model correctly, everything should be nicely lined up against one side of the page, the font and color should be consistent, images should not look out of place, navigation and hierarchical information structures should be in nicely organized lists, and the page should generally look like a well-organized text document ready for printing.

Annual CSS Naked Day is a way to encourage web developers to write proper structural markup. The idea is that the quality of your markup should be reflected by how readable and usable your webpage is when stylesheets are disabled. The exercise is pointless if you are using presentation elements like font, big, and b. In most cases, those elements should be replaced with some combination of structural elements like p, h2, and strong and CSS to define how the elements should appear. There are special cases where elements like b and br should be used, but they are quite rare.
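As a rough aid for spotting the presentational elements mentioned above, here is a minimal Python sketch that counts them in a piece of markup using the standard library’s html.parser. The names PRESENTATIONAL and count_presentational are made up for this example, and the element set contains only font, big, and b; it is a starting point, not a complete audit. Because b has rare legitimate uses, treat the result as a list of things to review rather than a verdict.

```python
from collections import Counter
from html.parser import HTMLParser

# Presentational elements named in the post; extend as needed (assumption).
PRESENTATIONAL = {"font", "big", "b"}


class PresentationalCounter(HTMLParser):
    """Counts start tags of presentational elements."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag in PRESENTATIONAL:
            self.counts[tag] += 1


def count_presentational(markup: str) -> dict:
    """Return a mapping of presentational element name to occurrence count."""
    parser = PresentationalCounter()
    parser.feed(markup)
    parser.close()
    return dict(parser.counts)


if __name__ == "__main__":
    sample = '<p><font size="2">Hi</font> <b>there</b> <strong>you</strong></p>'
    print(count_presentational(sample))
```

Structural elements such as p and strong are ignored, so a page built as the post recommends should produce an empty result.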

The biggest problem this exercise is meant to expose is the use of tables for layout purposes. Table elements have structural meaning and are the correct elements to use in cases where the information being presented is tabular in nature. However, when table elements are used just to visually position things on a page, it sends the wrong message to search engines and other agents that try to make use of the markup semantics. The agent is told that each row and column is a set of strictly related data, when in fact the only relationship is the intended visual position of the table cell contents. Imagine that the agent is trying to learn something new from you and you tell it that a pig is to bacon as a list of cities is to both a book and a copyright statement. You can see how the agent might walk away quite confused by your website. That probably isn’t what you want to do to your blind visitors who expect the page to be read out in some kind of logical order, or to search engines if you want them to give you a high ranking.

So check out how your page looks without a stylesheet. In Firefox, go to the View menu » Page Style » No Style to see the current page without stylesheets. In Opera, go to the View menu » Style » User mode and uncheck any boxes below it. Internet Explorer and Safari don’t have a straightforward way to disable author stylesheets even though it is a CSS 2.1 conformance requirement. Fix up your page if you need to, make sure it validates with a Strict doctype, and have a nice CSS Naked Day!