XHTML in Layman's Terms

8 comments | Posted: 16 November 05 in Tutorials, by Nathan Smith

Needing Closure:

Everything that goes up must come _ _ _ _.

You know how that sentence is supposed to end. It does not conclude properly though. Instead, there are just four blanks. It is supposed to end with a four-letter word (not that one), yet it just drops off. Even though it’s an age-old saying, we are left in a state of suspense, and have to draw our own conclusions. It takes intuition on the part of the reader to properly finish the sentence.

Let’s go ahead and do that, so the OCD people out there can relax: “Everything that goes up must come down.” There, that’s better, right? No more guesswork, just a simple statement, with a proper beginning and end. It simply wouldn’t make sense if something were thrown into the air and didn’t eventually land again. That would not only defy gravity, but bother our sense of logic as well.

This is not unlike how XHTML differs from HTML 4.01. In XHTML, everything that has a beginning must have an end. In HTML, this is not the case. You can start a paragraph, but never finish it. You can have a list of things, but never actually bring each item to completion. If we had to communicate with someone who never finished sentences, it would be confusing to say the least.

Think of it this way: HTML was meant to be like a candy wrapper, with a twist at each end. However, even the strict specification is somewhat loose, meaning the candy wrapper need not be closed. Now, I don’t know about you, but I’m certainly more likely to eat a Snickers that’s completely sealed than one that’s been sitting open for who knows how long.

Examples:

Below is a code example of some ugly-looking HTML that is valid, even under the 4.01 Strict Document Type:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Strict//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<HEAD>
<meta http-EQUIV="Content-TyPe" content="text/html; CHARSET=utf-8">
<titlE>Testing HTML 4.01</TITLe>
</head>
<body>
<P>This is some sloppy HTML 4.01!
<oL>
<LI>List item one
<LI>List item two
<li>List item three</LI>
</Ol>
</BODy>
</HTmL>

By comparison, here is the same page rewritten the way XHTML 1.1 requires:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=utf-8" />
<title>Testing XHTML 1.1</title>
</head>
<body>
<p>This is some clean XHTML 1.1!</p>
<ol>
<li>List item one</li>
<li>List item two</li>
<li>List item three</li>
</ol>
</body>
</html>

As you can see, the second example is more readable (assuming you know HTML syntax). Everything (aside from the Doctype) is lowercase, and everything that is begun is properly finished, in the right order. This is not only easier to pick through, it saves a browser the guesswork of working out where each closing tag is supposed to go. For instance, an open <li> might end at the end of its line, but it could just as well contain further nested elements.

A browser doesn’t know until it reaches the next tag whether it should close the previous element or leave it open. So there is ambiguity not just for a person reading the code, but for the actual rendering of the page. By implication, that extra logic adds to loading time, more so than the few extra bytes each closing tag takes. As I see it, clean code renders faster. For more evidence on that, check out this article (not the site’s code).
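To make that guesswork concrete, here is a minimal sketch in Python (my pick for illustration only; nothing above depends on it). It uses the standard library’s html.parser to tally start and end tags exactly as they appear in the source. The names TagCounter and unclosed are made up for this example, and the script only counts tags; it does none of the inference a real browser performs.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start and end tags exactly as they appear in the source."""

    def __init__(self):
        super().__init__()
        self.opened = Counter()
        self.closed = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser hands us tag names already lowercased
        self.opened[tag] += 1

    def handle_endtag(self, tag):
        self.closed[tag] += 1


def unclosed(markup):
    """Return {tag: count of start tags that have no end tag in the source}."""
    counter = TagCounter()
    counter.feed(markup)
    counter.close()
    return {tag: n - counter.closed[tag]
            for tag, n in counter.opened.items()
            if n > counter.closed[tag]}


# Body fragments taken from the two examples above
SLOPPY = """<P>This is some sloppy HTML 4.01!
<oL>
<LI>List item one
<LI>List item two
<li>List item three</LI>
</Ol>"""

CLEAN = """<p>This is some clean XHTML 1.1!</p>
<ol>
<li>List item one</li>
<li>List item two</li>
<li>List item three</li>
</ol>"""

print(unclosed(SLOPPY))  # {'p': 1, 'li': 2}
print(unclosed(CLEAN))   # {}
```

Every entry in the first result is a closing tag the browser has to work out for itself.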

Self-Terminate:

At the end of Terminator 2, Arnold says “I cannot self-terminate.” In the case of XHTML though, self-terminating is a good thing. If everything that starts needs to have a finish, then there are a few spots where HTML, even when well-formed, runs into problems. In the example above, one such spot is the meta tag describing the content type.

In HTML, it ends with a ">" meaning that technically it would still be open. XHTML fixes this by self-terminating the tag with a trailing slash "/>". Other such tags include img, br, and hr – all typically left “open” in HTML, and all self-terminating in XHTML. Personally, I find this much easier to read, knowing that every time there’s a trailing slash, it’s the end of the tag.
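One way to see what the trailing slash buys you is to hand both spellings to an XML parser, which is the kind of parser a browser uses when a page is served as application/xhtml+xml. The sketch below is Python, using the standard xml.etree.ElementTree module. The helper name is_well_formed is hypothetical, and it checks well-formedness only, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Self-terminated br: the parser knows the element is finished
print(is_well_formed('<p>Line one<br />Line two</p>'))  # True

# HTML-style br: still open when </p> arrives, so the parser gives up
print(is_well_formed('<p>Line one<br>Line two</p>'))    # False
```

The same holds for img, hr, and meta: without the slash, an XML parser treats them as elements that were never closed.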

Summary:

So, laying aside the whole argument over whether to serve it as text/html or application/xhtml+xml, I prefer XHTML because of the inherent rigidity required for it to validate. This, of course, isn’t to say that HTML can’t be written with the same standard in mind; it’s just not required. Make sure you don’t get me wrong, or think I’m advocating XHTML as the de facto way.

This isn’t some sort of elitist attitude. I certainly acknowledge the fact that HTML can be just as semantic and structured as XHTML. Roger Johansson and Robert Nyman are experts in web development, and both make great use of HTML 4.01 Strict. They both properly close everything, and make use of lowercase code. Robert even wrote an article about it here. Roger is clearly passionate about proper closing of tags, as seen in his article here.

Doug Bowman, another legend in the field, makes use of XHTML 1.0 Strict, but serves it as text/html, no big deal. I can see the logic on each side of the argument in wanting to cater to older browsers, etc. If you’ve visited my site in IE, you’ve no doubt seen my attempt at humor within conditional comments, where I reveal some of my biases against Microsoft.

So, let’s get to the point. Have we solved any great mysteries here today? No. Is this article to prove that one Doctype is better than another? No. I just wanted to give a brief overview of what drew me to XHTML initially, and why I still prefer to use it. The basic point is, no matter what you choose, stick to it. Don’t mix and match, and please – keep tags lowercased and terminated.
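If you want to hold yourself to the lowercase rule, a rough check is easy to script. The sketch below is Python again. The regular expression is deliberately naive (a stray "<" inside text, comments, or attribute values will fool it), and uppercase_tags is a name I made up for illustration.

```python
import re

# Matches the name in an opening or closing tag, e.g. "<oL" or "</LI"
TAG_NAME = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)")


def uppercase_tags(markup):
    """Return tag names, in source order, that contain an uppercase letter."""
    return [name for name in TAG_NAME.findall(markup) if name != name.lower()]


SAMPLE = "<P>Sloppy<oL><LI>one<li>two</LI></Ol>"

print(uppercase_tags(SAMPLE))  # ['P', 'oL', 'LI', 'LI', 'Ol']
print(uppercase_tags("<p>Clean</p><ol><li>one</li></ol>"))  # []
```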

Discuss This Topic

  1. Robert

    Let’s not forget that it is XHTML, not HTML, that has helped focus attention on Web Standards.

    It is not that HTML cannot follow Web Standards, it is just that it can be validated without following Web Standards. Validation of HTML is not set to be strict, even when written as strict, to enforce Web Standards, as XHTML Validation does.

     
  2. Nathan Smith

    Robert: That is exactly the point I’m trying to make. It’s not that HTML can’t be standards compliant, but XHTML must be, which starts everyone out on a level playing field, so to speak.

     
  3. Robert

    Exactly, which really has to make one wonder why there are people who get so upset about XHTML. I tend to think it is a bit of geek snobbery.

     
  4. Tilde

    “It is not that HTML cannot follow Web Standards, it is just that it can be validated without following Web Standards. Validation of HTML is not set to be strict, even when written as strict, to enforce Web Standards, as XHTML Validation does.”

    You’re confusing “Web standards” with “clean code.” A Web standard is a set of rules for some Web technology or language. HTML 4.01 is a Web standard, a very good and very clearly defined standard. It’s all there, in black and white: http://www.w3.org/TR/html401/

    The W3C validator only checks that your document follows the DTD. It doesn’t check whether it follows the standards. You could, for example, use the p element as a heading in an XHTML document, and it would still validate, even though the standard says not to do this. You could use table elements to lay out a page in either language, even though this goes against both standards.

    And if you’re using XHTML Transitional, you can do all kinds of disgusting things—iframes, b and u elements—that you would not be allowed to do with HTML 4.01 Strict. Go ahead, look it up.

    Extra slashes are nice (I mean it, they are), but your choice of elements is much more important. I think we should end this XHTML vs. HTML debate and start just mandating good code in general. ;)

    As it stands now… There are advantages and disadvantages to both languages, but both of them are very, very, very good.

     
  5. Nathan Smith

    Tilde: Very good point. I stand (actually, sit) corrected. I was using the term “web standards” rather loosely. Interesting point about iframes, as they’re also allowed in XHTML 1.0 Transitional, but not HTML 4.01 Strict. So, in general – let’s just have good clean code.

    By setting the minimum for feature consideration on Godbit.com to XHTML 1.0 Transitional, we are making sure that there is a base-level of code cleanliness for all the people that are new to the concept of clean code. All the veterans, well they can march to their own drum.

     
  6. Tilde

    I know we’re all playing for the same team. ;)

     
  7. Nathan Smith

    Indeed. I think that too many times, we can start nitpicking over little details, and end up alienating each other over opinions on things that are trifling at best, rather than trying to help educate others. When there are two camps, each arguing strongly for its own side, it confuses the new people, and they give up on web standards altogether.

     
  8. paul haine

    “And if you’re using XHTML Transitional, you can do all kinds of disgusting things—iframes, b and u elements—that you would not be allowed to do with HTML 4.01 Strict. Go ahead, look it up.”

    ‘b’ is allowed in HTML 4.01 Strict. So is ‘i’, but you’re right about ‘u’ and ‘iframe’.

     
