
Doctypes

posted on 3:23 PM, July 18, 2008
Every web page can optionally declare its DOCTYPE in a special declaration at the start of the document.  This is not required, but without a DOCTYPE the browser falls back to a special mode ("quirks" mode) whose behaviour is not well-defined, because it attempts to support a wide range of features from different HTML standards and to guess which of these (or which mix of these) you require. Quirks mode is a powerful (indeed indispensable) feature of most web browsers, but all that guessing and uncertainty can make the rendering of your pages a lot less efficient in some cases.

Declaring a specific HTML standard to use for your pages can give the browser a big head start in guessing your intentions, and make your pages render more efficiently.  Here are the most common choices:
  • HTML 3.2 - a previous-generation HTML standard from the days of the browser wars, HTML 3.2 attempted to reconcile many of the differences between the big players such as Netscape and IE.  However, it supports many markup practices now widely perceived as bad, and is generally only supported for legacy files and documents, not for new content.
  • HTML 4.01 Transitional - Version 4 of HTML cleaned up a lot of the problems with earlier versions, and provided a relatively full-featured and level playing field for all browsers to support.   The transitional variant allows authors to use some deprecated practices from earlier HTML standards.
  • HTML 4.01 Strict - The strict version of HTML 4 does not permit the use of deprecated features from HTML 3 and earlier versions.  Rather, the author is expected to mark up documents exclusively in the modern way.
  • XHTML - a variation of HTML that uses pure XML syntax rules. XHTML is very strict in its markup rules, and does not support a lot of the user-friendly features of HTML syntax.  However, without some of the ambiguity that HTML allows for, XHTML allows for very exact specification of the document structure, which is useful in certain applications.
Each web page declares its standard with a DOCTYPE declaration at the start of the document.  For instance:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
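The declarations for the standards listed above can be collected in a small lookup table. The following is an illustrative Python sketch only, not ExSite code; the names `DOCTYPES` and `declared_standard` are hypothetical, and the match is an exact string comparison for simplicity (real browsers are more forgiving about case and whitespace).

```python
# Illustrative lookup of common DOCTYPE declarations (hypothetical helper,
# not part of ExSite).
DOCTYPES = {
    "HTML 3.2": '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">',
    "HTML 4.01 Transitional": (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
        '"http://www.w3.org/TR/html4/loose.dtd">'
    ),
    "HTML 4.01 Strict": (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
        '"http://www.w3.org/TR/html4/strict.dtd">'
    ),
    "XHTML 1.0 Transitional": (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
    ),
}


def declared_standard(document):
    """Return the name of the standard declared at the start of a document,
    or None if no known DOCTYPE is found (the quirks-mode case)."""
    head = document.lstrip()
    for name, doctype in DOCTYPES.items():
        if head.startswith(doctype):
            return name
    return None
```

A page that begins with the strict declaration would be reported as "HTML 4.01 Strict", while a page that starts directly with its markup returns None.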

ExSite System Screens

ExSite defaults to HTML 4.01 Transitional.  It uses transitional rather than strict because this allows for more backwards compatibility with older and badly-written plug-ins, and with old HTML content that is cut-and-pasted or uploaded into the CMS.  However, if you are starting fresh with new content, and select up-to-date plug-ins to work with, then you can change to a different standard if you wish.

We don't recommend moving to an old, out-of-date standard like HTML 3.2, but you may want to switch to HTML 4.01 Strict.  To do that, simply specify your preferred DOCTYPE in your configuration file, e.g.:

doctype = <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

ExSite's system output should be compatible with Strict mode, so no other changes are required.  However, control panels display content exactly as it is returned to them by various plug-ins.  ExSite does not validate that content as being compliant with the declared DOCTYPE, so old or poorly-written plug-ins may not be strictly compatible.  Fortunately, browsers don't consider this a fatal error, and will display the content correctly for the most part.  However, it may make them less efficient at parsing and rendering the content if it does not match the declared DOCTYPE, because when parsing fails due to an incompatibility, the browser may reset into quirks mode and start over.

XHTML

A little more effort is required to switch to XHTML, because XHTML is not compatible with HTML.  You must declare a new doctype, e.g.,

doctype = <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

but you must also set some markup rules to ensure that incompatible tags are written in XML format instead of HTML format.  (For example, an HTML linebreak tag <BR> has to be re-written as <br /> in XHTML.)  Adding the following configuration setting will do this automatically in all ExSite system screens:

markup.xml = 1

With these two settings, your ExSite screens should now be written in XHTML markup.
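To show how the two settings belong together, here is a minimal Python sketch that reads "key = value" lines like the ones above. It is not ExSite's configuration parser; the `parse_settings` function and the `#` comment syntax are assumptions made for illustration.

```python
def parse_settings(text):
    """Parse 'key = value' lines into a dict, splitting on the first '=' only."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # assumed comment syntax
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


sample = """
doctype = <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
markup.xml = 1
"""

settings = parse_settings(sample)
xml_mode = settings.get("markup.xml") == "1"
declares_xhtml = "DTD XHTML" in settings.get("doctype", "")

# The two settings should agree: XML-style markup goes with an XHTML DOCTYPE.
consistent = xml_mode == declares_xhtml
```

A check like `consistent` is one way to catch a configuration that declares XHTML but still emits HTML-style tags, or the reverse.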

Note that your pages will still be served with a Content-type of "text/html", however.  Technically, XHTML documents could be served with a Content-type of "application/xhtml+xml" to get even stricter interpretation of the markup. However, this may affect the compatibility of your pages with some older browsers, and it guarantees parsing errors if an incompatible plug-in or piece of injected content contaminates your pages with bad markup.  Leaving the content type header as "text/html" allows you to be more resilient in handling these cases.
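The strictness of XML parsing can be demonstrated with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` as a rough stand-in for a browser handling "application/xhtml+xml"; it is not a browser, and `is_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# XML-style empty tag: accepted by a strict XML parser.
xhtml_style = "<p>first line<br />second line</p>"

# HTML-style empty tag: the unclosed <br> makes the fragment malformed XML.
html_style = "<p>first line<br>second line</p>"

results = (is_well_formed(xhtml_style), is_well_formed(html_style))
```

A single HTML-style tag from one plug-in is enough to make the whole document fail a strict XML parse, which is the risk described above.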

Template Design

You can design your templates to use whatever DOCTYPE or markup standard you want.  Simply include the DOCTYPE declaration at the top of your template, and take care to ensure that your template markup is consistent with this declaration.

The nature of the CMS is to stuff content from various sources (plug-ins, direct data entry, cut-and-paste, file uploads, etc.) into your templates, so the stricter you go in your DOCTYPE choice the more careful you have to be in your content handling.  ExSite defaults to a lazy standard (HTML 4.01 Transitional) to allow for maximum leeway in this regard.

Note that the ExSite WYSIWYG editor uses the ExSite system DOCTYPE declaration for formatting WYSIWYG content.  That means it's a good idea to make sure your template DOCTYPE declarations are compatible with the system DOCTYPE.  For example, do not create strict XHTML templates on an installation that uses HTML 4 for system screens.  This would have the effect of composing HTML 4 content in the WYSIWYG editor, and then inserting that into your XHTML templates;  this would cause parsing errors, forcing your pages into quirks mode.  The worst of all worlds would result;  you would be going to the extra effort of writing XHTML markup in your templates, but your pages would end up using the least efficient rendering mode.
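A simple way to guard against this mismatch is to compare the markup family of the system DOCTYPE with that of each template. The following Python sketch is illustrative only; `markup_family` and `compatible` are hypothetical helpers, not ExSite functions, and the classification is a plain substring test on the public identifier.

```python
def markup_family(doctype):
    """Classify a DOCTYPE declaration as 'xhtml', 'html', or None (no DOCTYPE)."""
    text = doctype.upper()
    if "<!DOCTYPE" not in text:
        return None
    return "xhtml" if "DTD XHTML" in text else "html"


def compatible(system_doctype, template_doctype):
    """True when both declarations belong to the same markup family."""
    return markup_family(system_doctype) == markup_family(template_doctype)


system = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
          '"http://www.w3.org/TR/html4/loose.dtd">')
template = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
            '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')

mismatch = not compatible(system, template)
```

Here `mismatch` flags exactly the case warned about above: a strict XHTML template on an installation whose system screens use HTML 4.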

Plug-in Development

Developers writing plug-ins should give some thought to the type of markup they return to their callers.  In simple cases, you can just manually mark up your plug-in content and return it in a simple string, but this method quickly becomes limiting.  Any use of empty tags (such as BR and IMG, which have no closing tag in HTML but must be self-closed in XHTML) will create an incompatibility between HTML and XHTML systems, even if you are careful to write your tags in the most portable ways.

Using a package that can generate markup in various formats can help here. ExSite has its own built-in markup language generator (ExSite::ML) that automatically picks up the configuration settings noted above and generates markup that is compatible with these different modes.  Using this package (or another such as CGI) will ensure that your plug-in can easily switch between markup systems without problems.
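The idea behind such a generator can be sketched in a few lines. The Python class below is an illustration of the technique only; it does not reproduce the ExSite::ML or CGI interfaces, and the `Markup` class, its `xml` flag, and the `VOID_ELEMENTS` set are hypothetical names chosen for this example.

```python
from html import escape

# Elements that never have content or a closing tag (partial list).
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class Markup:
    """Generate tags in HTML or XML style depending on one setting."""

    def __init__(self, xml=False):
        self.xml = xml  # plays the role of a setting such as markup.xml

    def element(self, name, content="", **attrs):
        name = name.lower()
        attr_text = "".join(
            f' {key}="{escape(str(value), quote=True)}"'
            for key, value in attrs.items()
        )
        if name in VOID_ELEMENTS:
            # Only the empty elements differ between the two syntaxes.
            closer = " />" if self.xml else ">"
            return f"<{name}{attr_text}{closer}"
        return f"<{name}{attr_text}>{content}</{name}>"


html_ml = Markup(xml=False)
xhtml_ml = Markup(xml=True)

html_break = html_ml.element("br")
xhtml_break = xhtml_ml.element("br")
xhtml_image = xhtml_ml.element("img", src="logo.png", alt="Logo")
```

Because the plug-in calls `element` rather than writing tags by hand, switching the whole system between HTML and XHTML requires changing one flag and no plug-in code.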

Filed under: programming