Now that y'all have read my sermon Back To Basics: HTML, some of you might be scratching your heads: "Where do I start?" Well, you can head to a bookstore and pick up a 990-page HTML Bible or something like that. Make it two: two big thick "ultimate guides" that unleash and debunk HTML. I'm sure you can ditch that BBQ party or a trip to the mountains and spend your time mastering HTML. Or...
Or you can go to the W3C and download the very DTDs you see referenced in DOCTYPE declarations. If the word DOCTYPE looks new to you, please read What's In Your DOCTYPE? first and come back.
The W3C publishes several DTDs, which are listed on its site. Yep, there's an actual DTD file behind each HTML and XHTML DOCTYPE. You can download all three XHTML DTDs (Strict, Transitional and Frameset) from the XHTML Recommendation page (zip).
Let's take a look at the XHTML 1.0 Strict DTD and see what we can learn. Pull up xhtml1-strict.dtd from the DTD folder.
Block-Level and Inline-Level Elements
This is a quick refresher on block- and inline-level elements. In (X)HTML some elements are defined as block-level, i.e. each of them generates an element box that fills its parent element and generates line breaks before and after the element box. For example, <p> and <div> are block-level elements.
Other elements are defined as inline-level, i.e. an inline element generates a box inside its parent element and does not generate line breaks. For example, <span> and <strong> are inline-level elements.
Here comes the really interesting part. The spec tells you which element may contain what type(s) of other elements. For example, how would you know which code snippet is correct:
<p>
  <div>blah-blah</div>
</p>
or
<div>
  <p>blah-blah</p>
</div>
The paragraph tag, <p>, is "defined" as follows:
<!ELEMENT p %Inline;>
<!ATTLIST p
  %attrs;
  >
This declaration tells you what elements can appear inside <p>. Go back and see which elements are grouped into the %Inline; entity. There are quite a few: br, span, img, strong, abbr, q, etc. A whole bunch of inline elements, plus ins, del and script (but we won't touch these right now). Block-level elements, <div> among them, are excluded. Therefore, the first code snippet is incorrect. Any validator will tell you that <div> is not allowed within <p>.
Let's see how <div> is defined.
<!ELEMENT div %Flow;>  <!-- generic language/style container -->
<!ATTLIST div
  %attrs;
  >
The %Flow; entity mixes block-level and inline-level elements, and <p> is grouped with the block-level ones, so it is allowed inside <div>. Thus, the second code snippet is correct.
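If chasing entity references by hand gets tedious, here is a rough Python sketch of the same lookup. The entity table is my own transcription of the relevant parameter entities from xhtml1-strict.dtd, so check it against your downloaded copy. The names ENTITIES, CONTENT_MODELS, expand and allowed_children are made up for this example.

```python
import re

# Parameter entities transcribed by hand from xhtml1-strict.dtd
# (assumption: verify against your own copy of the file).
ENTITIES = {
    "special.pre": "br | span | bdo | map",
    "special": "%special.pre; | object | img",
    "fontstyle": "tt | i | b | big | small",
    "phrase": "em | strong | dfn | code | q | samp | kbd | var | cite "
              "| abbr | acronym | sub | sup",
    "inline.forms": "input | select | textarea | label | button",
    "misc.inline": "ins | del | script",
    "misc": "noscript | %misc.inline;",
    "inline": "a | %special; | %fontstyle; | %phrase; | %inline.forms;",
    "heading": "h1|h2|h3|h4|h5|h6",
    "lists": "ul | ol | dl",
    "blocktext": "pre | hr | blockquote | address",
    "block": "p | %heading; | div | %lists; | %blocktext; | fieldset | table",
    "Inline": "(#PCDATA | %inline; | %misc.inline;)*",
    "Flow": "(#PCDATA | %block; | form | %inline; | %misc;)*",
}

# Content models of the two elements discussed above.
CONTENT_MODELS = {"p": "%Inline;", "div": "%Flow;"}

ENTITY_REF = re.compile(r"%([\w.]+);")


def expand(model):
    """Replace %name; references until none are left."""
    while ENTITY_REF.search(model):
        model = ENTITY_REF.sub(lambda m: ENTITIES[m.group(1)], model)
    return model


def allowed_children(element):
    """Return the element names permitted directly inside `element`."""
    expanded = expand(CONTENT_MODELS[element])
    # Element names are lowercase, so #PCDATA is skipped by this pattern.
    return set(re.findall(r"[a-z][a-z0-9]*", expanded))
```

With this table, allowed_children("p") contains span and strong but not div, while allowed_children("div") contains p, which matches what the validator says about the two snippets.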
One more example. If you have server controls on a page which require postback handling you'll end up with two hidden fields in your form:
<form ....>
  <input type="hidden" name="__EVENTTARGET" value="" />
  <input type="hidden" name="__EVENTARGUMENT" value="" />
</form>
The mighty validator doesn't like it. Let's consult the spec.
<!-- form uses %Block; excluding form -->
<!ENTITY % form.content "(%block; | %misc;)*">
...
<!ELEMENT form %form.content;>
<!ATTLIST form
  %attrs;
  action         %URI;          #REQUIRED
  method         (get|post)     "get"
  enctype        %ContentType;  "application/x-www-form-urlencoded"
  onsubmit       %Script;       #IMPLIED
  onreset        %Script;       #IMPLIED
  accept         %ContentTypes; #IMPLIED
  accept-charset %Charsets;     #IMPLIED
  >
That is, a form wants block-level elements inside it plus a handful of "miscellaneous" ones: noscript, script, ins and del.
The input element is grouped with other inline-level elements and should not be nested in the form without a block-level parent. This code would be just right:
<form ...>
  <div>
    <input type="hidden" name="__EVENTTARGET" value="" />
    <input type="hidden" name="__EVENTARGUMENT" value="" />
  </div>
</form>
You can also put a table inside the form. A table is a block-level element as well, so it'll make the validator happy too.
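To spot this kind of mistake without a round trip to the validator, something like the following Python sketch might do. It only looks at direct children of form, it assumes the markup is well-formed and has no XML namespace, and the FORM_CHILDREN set is my hand expansion of %block; and %misc; rather than anything read from the DTD file. The function name is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hand-expanded form.content for XHTML 1.0 Strict: %block; plus %misc;
# (assumption: written out manually, not parsed from the DTD).
FORM_CHILDREN = {
    "p", "h1", "h2", "h3", "h4", "h5", "h6", "div", "ul", "ol", "dl",
    "pre", "hr", "blockquote", "address", "fieldset", "table",
    "noscript", "ins", "del", "script",
}


def misplaced_form_children(markup):
    """Return tags that sit directly inside a <form> but are not allowed there."""
    root = ET.fromstring(markup)
    # findall() searches descendants only, so add the root if it is a form.
    forms = [root] if root.tag == "form" else []
    forms += root.findall(".//form")
    return [
        child.tag
        for form in forms
        for child in form
        if child.tag not in FORM_CHILDREN
    ]
```

Fed a form whose hidden inputs are direct children, it returns ["input", "input"]. Wrap them in a div and the list comes back empty.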
On a related note, some elements are empty, i.e. they cannot have any content inside them. The spec in question declares several of them: hr, img, br, input, area, etc. Next time you're tempted to cut corners with <div />, don't: div isn't an empty element. Neither should you write <img ...></img>. It should be <img ... />. Validators won't complain and browsers will put up with this code, but don't trick them. My guess is a good browser ignores such a tag (I sense a big flame coming my way). For example, the W3C discourages the use of empty <p> elements:
We discourage authors from using empty P elements. User agents should ignore empty P elements.
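Since validators stay quiet about <div /> and <img ...></img>, you could catch them yourself. Below is a crude Python sketch that scans markup with regular expressions. It is not a parser and will trip over a ">" inside an attribute value. The EMPTY_ELEMENTS list is what I believe xhtml1-strict.dtd declares as EMPTY (double-check it against the file), and suspicious_tags is a name I made up.

```python
import re

# Elements declared EMPTY in XHTML 1.0 Strict, listed by hand
# (assumption: verify against the DTD).
EMPTY_ELEMENTS = {
    "base", "meta", "link", "hr", "br", "param", "img", "area", "input", "col",
}

SELF_CLOSED = re.compile(r"<([A-Za-z]\w*)\b[^<>]*/>")  # e.g. <div /> or <br/>
END_TAG = re.compile(r"</([A-Za-z]\w*)\s*>")           # e.g. </img>


def suspicious_tags(markup):
    """Flag self-closed non-empty elements and end tags of EMPTY elements."""
    problems = []
    for name in SELF_CLOSED.findall(markup):
        if name not in EMPTY_ELEMENTS:
            problems.append(f"<{name} /> is self-closed but not declared EMPTY")
    for name in END_TAG.findall(markup):
        if name in EMPTY_ELEMENTS:
            problems.append(f"</{name}> closes an element declared EMPTY")
    return problems
```

Run on a string containing <div /> and <img ...></img> it reports both. Run on <div></div> and <img ... /> it reports nothing.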
By the same token, you can look up which element attributes are required and which are optional. Taking another look at form, you'll see that the action attribute is mandatory (#REQUIRED) while the others are optional. For that matter, the method attribute defaults to "get" and enctype defaults to "application/x-www-form-urlencoded".
If you're still awake, congratulations: you're done with XHTML DTD 101. Spend some time getting familiar with the DTD, and instead of digging through PDF specs you'll be able to look things up in no time!
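The same lookup can be scripted. Here is a small Python sketch that reads the form ATTLIST quoted earlier and sorts attributes into required ones and ones with a default value. The regular expression only handles the simple one-attribute-per-line layout used in this DTD, and the names FORM_ATTLIST, ATTR_DEF and classify_attributes are invented for the example.

```python
import re

# The ATTLIST for form as quoted above, parameter entities left unexpanded.
FORM_ATTLIST = """
<!ATTLIST form
  %attrs;
  action         %URI;          #REQUIRED
  method         (get|post)     "get"
  enctype        %ContentType;  "application/x-www-form-urlencoded"
  onsubmit       %Script;       #IMPLIED
  onreset        %Script;       #IMPLIED
  accept         %ContentTypes; #IMPLIED
  accept-charset %Charsets;     #IMPLIED
  >
"""

# One definition per line: name, type, then a #KEYWORD or a quoted default.
ATTR_DEF = re.compile(r'^\s*([\w-]+)\s+(\S+)\s+(#\w+|"[^"]*")\s*$', re.MULTILINE)


def classify_attributes(attlist):
    """Return (required attribute names, {attribute: default value})."""
    required, defaults = [], {}
    for name, _type, default in ATTR_DEF.findall(attlist):
        if default == "#REQUIRED":
            required.append(name)
        elif default.startswith('"'):
            defaults[name] = default.strip('"')
        # #IMPLIED attributes are optional with no default, so they are skipped.
    return required, defaults
```

For the form declaration this yields action as the only required attribute, with defaults for method and enctype.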