This is a toolkit of convenience methods for generating tags and
elements in an SGML-like markup language. It has many shortcuts for
generating HTML markup, but can also be used for XHTML, and even XML.
Most commonly, it will be used to assemble well-formed HTML elements
and tags. This gives less risk of broken HTML, better portability
between HTML and XHTML, and the ability to build complex and aggregate
structures in a single convenient call.
It can be used to generate snippets of markup, or to assemble numerous
snippets into partial or complete documents.
The settings in $config{markup} define basic syntax rules:
- xml: Use xml syntax for self-closing tags, ie. <tag />, and force
  tags to lower case.
- minattr: Minimize unset attributes, eg. selected instead of
  selected="selected".
- safe_content: Assume content is HTML-safe. If not true, then content
  will be HTML-escaped before insertion into the document.
- safe_attributes: Assume attributes are HTML-safe. If not true, then
  attributes will be HTML-escaped before insertion into the tags.
- nl: Append newline characters to the end of each element, for
  formatting purposes.
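To make the minattr and safe_attributes settings concrete, here is a minimal standalone sketch of how an attribute list might be rendered under them. It does not call ExSite::ML; the helpers escape_html and render_attributes are hypothetical names invented for this illustration, and the real class may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ExSite::ML): escape HTML metacharacters.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # ampersand first, so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: render a hashref of attributes under the given settings.
sub render_attributes {
    my ($config, $attr) = @_;
    my $out = "";
    foreach my $name (sort keys %$attr) {
        my $value = $attr->{$name};
        if (!defined $value) {
            # unset attribute: minimized or expanded, depending on minattr
            $out .= $config->{minattr} ? " $name" : " $name=\"$name\"";
            next;
        }
        $value = escape_html($value) unless $config->{safe_attributes};
        $out .= " $name=\"$value\"";
    }
    return $out;
}

print render_attributes({ minattr => 1 }, { selected => undef }), "\n";
print render_attributes({ minattr => 0 }, { selected => undef }), "\n";
print render_attributes({ safe_attributes => 0 }, { title => 'a "quoted" word' }), "\n";
```

The first call prints the minimized form, the second the expanded form, and the third shows the quotes in the value being escaped because safe_attributes is off.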
Creates an ML object to work with. %opt contains settings to
override the default config settings, noted above.
You can also pass a doc option, which initializes the ML object with a
preformatted (already marked-up) document.
Example: create a markup language object with XML syntax rules:
my $ml = ExSite::ML->new(xml=>1);
Returns the current document as a string.
Prints the current document to stdout. The second form includes a
content-type header.
Sets the current document to $text.
Appends $text to the end of the current document.
Prepends $text to the beginning of the current document.
Blanks or resets the current document.
Sets a document preamble, which will be prepended to the whole
document when it is written or printed. This is typically used
for doctype declarations such as:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
or
<?xml version="1.0"?>
The value of $text should contain the entire preamble string.
Note that the doctype is not considered part of the document contents,
so it will always appear at the top, no matter how or when you use
Append(), Prepend(), or Wrap().
A markup element is a tag with a set of attributes and some contents.
The tag is essential, and the contents and attributes are optional.
Generates a markup element. We do not validate the element against
any DTD or other standard. We simply generate a text string with an
SGML-like or XML-like structure, eg.
# outputs <tag> in non-xml mode, <tag /> in xml mode,
# or <tag></tag> if the element normally holds content
$ml->Element("tag");
# outputs <tag>contents</tag>
$ml->Element("tag", "contents");
# outputs <tag attribute="value">contents</tag>
$ml->Element("tag", "contents", {attribute=>"value"});
The main purpose is to ensure consistent formatting, valid syntax, and
easier switching between XML- and non-XML-based formats (specifically,
HTML and XHTML).
$tag is a simple string, used as the tag name. This should be a
word (no whitespace), but this is not validated.
$attributes is a hashref of key/value pairs; values will be quoted.
Attributes with undefined values will be output as either
name="name"
or
name
depending on the minattr configuration option.
$data is the contents, which can be a scalar, arrayref, or hashref. This
will be interpreted by the Content() method, below.
The safe_content configuration setting indicates that string content
can be safely inlined right into the element; if false, the content
will be escaped (using HTML escape values) first. The safe_attributes
configuration setting has the same effect for attribute values.
This method only generates regular elements, not other markup such
as comments, document types, CDATA, etc.
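The rules above can be illustrated with a small standalone sketch. It is not the class's implementation: sketch_element is a hypothetical stand-in for Element(), the short list of empty tags is an assumption made for the example, and only string content is handled (no escaping, no data structures).

```perl
use strict;
use warnings;

# Assumption: a tiny list of tags that never hold content.
my %is_empty_tag = map { $_ => 1 } qw(br hr img input);

# Hypothetical stand-in for Element(); string content only, no escaping.
sub sketch_element {
    my ($config, $tag, $content, $attr) = @_;
    $tag = lc $tag if $config->{xml};           # xml mode forces lower case
    my $attrs = "";
    if ($attr) {
        $attrs .= qq{ $_="$attr->{$_}"} foreach sort keys %$attr;
    }
    if (!defined $content && $is_empty_tag{lc $tag}) {
        # empty element: the closing syntax depends on xml mode
        return $config->{xml} ? "<$tag$attrs />" : "<$tag$attrs>";
    }
    $content = "" unless defined $content;
    return "<$tag$attrs>$content</$tag>";
}

print sketch_element({ xml => 0 }, "BR"), "\n";    # <BR>
print sketch_element({ xml => 1 }, "BR"), "\n";    # <br />
print sketch_element({ xml => 0 }, "p"), "\n";     # <p></p>
print sketch_element({ xml => 0 }, "a", "Google",
                     { href => "http://google.com" }), "\n";
```

The last call prints <a href="http://google.com">Google</a>. Note that the sketch, like the method it imitates, never checks whether the tag or its attributes are legal HTML.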
Generates an HTML-style comment tag, ie.
<!-- $text -->
Double-hyphens are removed from the comment text to prevent accidental
premature closure of the comment.
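As a rough illustration of why the hyphens are stripped, the following standalone sketch builds a comment the same way. The function name sketch_comment is hypothetical, and the exact pattern the class uses is an assumption; here any run of two or more hyphens is dropped.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Comment(): strip hyphen runs, then wrap the text.
sub sketch_comment {
    my ($text) = @_;
    $text =~ s/--+//g;    # drop any run of two or more hyphens
    return "<!-- $text -->";
}

# Without the substitution, the "-->" inside the text would end the comment early.
print sketch_comment("done --> <script>"), "\n";    # <!-- done > <script> -->
```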
Most HTML 4 elements have a shortcut method
$ml->tag($content,$attributes);
where tag is an element name. This is equivalent to
$ml->Element($tag,$content,$attributes);
$content and $attributes are optional. $content is run
through Content() (below) to resolve data structures.
Tag shortcuts for the following elements are supported:
a, abbr, acronym, address, applet, area, b, base, big, blockquote,
body, br, button, caption, cite, code, col, colgroup, dd, del, dfn,
div, dl, dt, em, fieldset, form, frame, frameset, h1, h2, h3, h4, h5,
h6, head, hr, html, i, iframe, img, input, ins, kbd, label, legend,
li, link, map, meta, noscript, object, ol, optgroup, option, p, param,
pre, q, samp, script, select, small, span, strong, style, sub, sup,
table, tbody, td, textarea, tfoot, th, thead, title, tr, tt, ul, var.
Note that if you provide content and attributes, the element will be
built accordingly, even if HTML does not support attributes or content
for that tag. In other words, we compose a syntactically complete
element, not a semantically correct one.
Example:
my $link = $ml->a( "Google", { href => "http://google.com" } );
Elements can be cumulatively aggregated in the ML object. The
"current document" is just the current blob of marked-up text that has
been accumulated. Text can be accumulated from the top-down,
bottom-up, or in layers like an onion.
To add marked-up text to the beginning of the current document:
$ml->Prepend($text);
Note that in this and the Append() method below, the text is not
validated, which means you can break your syntax if you stuff your own
tags into it carelessly.
To add marked-up text to the end of the current document:
$ml->Append($text);
Or, you can use the auto-append methods. The methods Element(),
Comment(), and the HTML tag shortcuts above all have an auto-append
version which automatically appends their output to the current
document. The auto-append method begins with an underscore but the
rest of the method name is the same.
# compose a link and return it to the caller
$ml->a( "Google", { href => "http://google.com" } );
# compose a link and append it to the current document
$ml->_a( "Google", { href => "http://google.com" } );
To wrap the current document in a markup element (ie. create a markup
element with the current document as its content):
$ml->Wrap( $tag, $attributes );
As a convenience you can use the auto-wrap methods. The methods
Element(), Comment(), and the HTML tag shortcuts above all have
an auto-wrap version which automatically uses the current document
as the element contents. The auto-wrap method begins with a
double-underscore but the rest of the method name is the same.
# enclose current document in a body (with optional attributes)
$ml->__body( $attributes );
# prepend a head section (containing a title element)
$ml->Prepend( $ml->head( $ml->title("Document Title") ) );
# wrap the whole shebang in an html tag
$ml->__html();
Note that it is easy to create bizarre HTML constructions. The caller
is responsible for nesting their elements appropriately. For
instance, the following will be processed without complaint, despite
not being a legal HTML construction:
$ml->_p("A paragraph."); # add a paragraph to the document
$ml->__style(); # wrap document in style tags (!?)
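The accumulation model (append, prepend, wrap, plus a preamble kept apart from the document body) can be pictured with the following miniature. It is a hypothetical package written for this illustration, not ExSite::ML code; the package name DocSketch and its method names are assumptions, and the short doctype string is just a placeholder.

```perl
use strict;
use warnings;

# Hypothetical miniature of the "current document" idea; not ExSite::ML code.
package DocSketch;

sub new       { my ($class) = @_; return bless { doc => "", preamble => "" }, $class }
sub append    { my ($self, $text) = @_; $self->{doc} .= $text; return $self }
sub prepend   { my ($self, $text) = @_; $self->{doc} = $text . $self->{doc}; return $self }
sub wrap      { my ($self, $tag) = @_; $self->{doc} = "<$tag>$self->{doc}</$tag>"; return $self }
sub doctype   { my ($self, $text) = @_; $self->{preamble} = $text; return $self }
sub as_string { my ($self) = @_; return $self->{preamble} . $self->{doc} }

package main;

my $doc = DocSketch->new;
$doc->doctype("<!DOCTYPE html>\n");    # stored separately from the document body
$doc->append("<p>A paragraph.</p>");   # build from the inside out
$doc->wrap("body");
$doc->prepend("<head><title>Document Title</title></head>");
$doc->wrap("html");
print $doc->as_string, "\n";
```

Because the preamble is stored outside the document string, it stays at the top regardless of the order in which the other calls are made.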
Given a data structure of nested elements, we try to transform it into
markup text. The elements of our data structure may refer to text,
element parameters, or more data structures that have to be resolved
recursively. We do not necessarily know the tag in all of these
cases, but we can often infer the tag based on the element we are
nesting under (eg. if we are in an <ol>, then a nested
element is likely to be an <li>).
If $data is a scalar, it is taken to be explicit text or mark-up.
If $data is an arrayref, it is taken to be an Element description
([tag,content,attributes], or [content,attributes]), a list of
explicit markup text, or a list of more data structures.
If $data is a hashref, it is taken to be a set of tag => content pairs.
In cases where we are not given the tag explicitly, we can often
determine it from context. (Eg. if we are in an <ol>, then
a nested element is likely to be an <li>.) To get a
context, we need to have been called recursively from a parent
element, and that parent element must define a default child tag (see
the %default_child variable).
To get a sense for how this works, you can examine the HTML shortcut
calls in the following examples. The shortcut call defines the
top-level element, which gives a context for determining how the
content data structure should be converted into markup.
List Examples: These calls will all generate lists, using various
structures to represent the list items.
# list items are explicit contents
$ml->ul( [
    "list item 1",
    "list item 2",
    "list item 3",
] );
# list items are element descriptors (tag, content)
$ml->ol( [
        [ "li", "list item 1" ],
        [ "li", "list item 2" ],
        [ "li", "list item 3" ],
    ],
    { type=>"i" } );
# list items are hashes of tag=>content
$ml->dl( [
    { dt => $title1, dd => $description1 },
    { dt => $title2, dd => $description2 },
    { dt => $title3, dd => $description3 },
] );
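The context-based inference described above can be sketched in a few lines. This is a simplified standalone illustration, not the Content() implementation: it handles only scalars and arrayrefs, the functions resolve and wrap_tag are hypothetical names, and the small %default_child table is an assumption for the example.

```perl
use strict;
use warnings;

# Assumption: a small default-child table, for illustration only.
my %default_child = (ul => "li", ol => "li", table => "tr", tr => "td");

# Hypothetical resolver: scalars are literal text; arrayref items are
# wrapped in the parent's default child tag when one is defined.
sub resolve {
    my ($parent, $data) = @_;
    return "" unless defined $data;
    return $data unless ref($data) eq "ARRAY";
    my $child = $default_child{$parent};
    return join "", map {
        defined $child ? wrap_tag($child, $_) : resolve($parent, $_)
    } @$data;
}

# Hypothetical helper: wrap resolved content in a tag, giving it a context.
sub wrap_tag {
    my ($tag, $data) = @_;
    return "<$tag>" . resolve($tag, $data) . "</$tag>";
}

print wrap_tag("ul", [ "list item 1", "list item 2" ]), "\n";
print wrap_tag("table", [ [ 123, 456 ], [ 789, 123 ] ]), "\n";
```

The first call prints <ul><li>list item 1</li><li>list item 2</li></ul>. In the second, each inner arrayref becomes a tr, and the recursion then uses tr as the context to turn the numbers into td cells.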
Table Examples: These calls will all generate a 2-column table with
numeric data in the cells. Some have header and footer rows, others
do not.
# simple table, no headers or footers
$ml->table( [
        [ 123, 456 ],
        [ 789, 123 ],
        [ 456, 789 ],
    ],
    {class=>"Report"},
);
# table with head, body, foot, and caption
$ml->table( { caption => "Sample Table",
              thead => [
                  [ "head1", "head2" ]
              ],
              tbody => [
                  [ 123, 456 ],
                  [ 789, 123 ],
                  [ 456, 789 ],
              ],
              tfoot => [
                  [ 1368, 1368 ]
              ],
            },
    {class=>"Report"},
);
When creating markup, there are a few parameters that we use for defining
some basic nesting and formatting rules.
- alltags: This references a list of all standard tags. This is not
  used to validate tags, so you can create tags not in this list.
  However, it is used to help identify items that look like tag names
  in data structures.
- emptytags: This references a list of tags that are not supposed to
  contain content. If these tags are created with undefined content,
  they will result in a single (self-closing) tag; otherwise, an open
  and close tag will be created.
- no_nl: We normally terminate all closing tags with a newline
  character for tidier formatting. In inline elements, newlines are
  treated as whitespace, and can cause minor formatting defects in some
  cases. Tags in this list will not receive any terminating newline.
- default_child: This is a hashref of tag => child-tag, which helps us
  guess what element nests underneath a parent tag, if it has not been
  explicitly defined in a data structure.
- default_order: This is a hashref of tag => list of tags, which helps
  us figure out which order to output tags when they have been provided
  to us in an unordered hash.
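To show why an ordering rule is needed, the following standalone sketch sorts the keys of a tag => content hash against a rule table. The helper ordered_tags is a hypothetical name, and the table contents are an assumption for illustration (the table entry follows the HTML 4 content model, which places tfoot before tbody); the class's actual defaults may differ.

```perl
use strict;
use warnings;

# Assumption: an illustrative ordering rule; the class's real defaults may differ.
my %default_order = (
    dl    => [qw(dt dd)],
    table => [qw(caption thead tfoot tbody)],
);

# Hypothetical helper: return a hash's keys in rule order, followed by
# any keys the rule does not mention, sorted alphabetically.
sub ordered_tags {
    my ($parent, $content) = @_;
    my @rule    = @{ $default_order{$parent} || [] };
    my %in_rule = map { $_ => 1 } @rule;
    my @known   = grep { exists $content->{$_} } @rule;
    my @other   = sort { $a cmp $b } grep { !$in_rule{$_} } keys %$content;
    return (@known, @other);
}

# Perl hashes have no inherent key order, so the rule decides the output order.
print join(" ", ordered_tags("dl", { dd => "Description", dt => "Title" })), "\n";    # dt dd
```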
The ML class includes default rules for all of the above, which are
sufficient for HTML 4 or XHTML composition. If you are building a markup
document of a different type and want to use the data
structure feature to build complex markup in one call, then you will
need to provide a set of rules to replace the default HTML rules. You
can set these rules by providing alternate definitions for the above
parameters, like this:
$ml->set("emptytags",["foo", "bar"]);
To make your output XHTML-compatible, set the xml option when
creating your ML document, or set markup.xml=1 in your
configuration file to make this the default. This forces tags to be
lower case, and changes the format of self-closing tags. For example,
the call
$ml->Element("BR");
will produce <BR> if xml is off, and <br /> if xml is on.
Note that
$ml->br();
will produce a lower-case br in all cases.
This effectively changes the syntax to xml, but it still does not
validate against a DTD. It also does not manage the syntax of
explicitly-coded markup that may have been passed in as content. It
only affects the syntax of elements it itself has generated.
For instance, the following will generate correct output all of the time:
$ml->br();
However, if the safe_content flag is on, then the following will
not produce correct XML, since the content contains explicitly-coded
markup that is not XML-compatible:
$ml->p("Linebreak<br>");
(If safe_content is off, then the br tag will be escaped and
will be presented as regular content, which keeps it XML-compatible,
but may not be what the author intended.)
If you can avoid the latter situation, then it is possible to switch
quickly from HTML to XHTML with a single configuration setting.
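The effect of the safe_content flag on explicitly-coded markup can be seen in a short standalone sketch. The function sketch_p is a hypothetical stand-in for the p() shortcut and is not the class's code; it only models the escaping decision.

```perl
use strict;
use warnings;

# Hypothetical stand-in for p(): escape the content unless it is declared safe.
sub sketch_p {
    my ($config, $content) = @_;
    unless ($config->{safe_content}) {
        $content =~ s/&/&amp;/g;    # ampersand first, so entities are not re-escaped
        $content =~ s/</&lt;/g;
        $content =~ s/>/&gt;/g;
    }
    return "<p>$content</p>";
}

# safe_content on: the embedded <br> passes through and is not valid XML
print sketch_p({ safe_content => 1 }, "Linebreak<br>"), "\n";    # <p>Linebreak<br></p>
# safe_content off: the tag is escaped and shows up as literal text
print sketch_p({ safe_content => 0 }, "Linebreak<br>"), "\n";    # <p>Linebreak&lt;br&gt;</p>
```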
This class has a lot of convenience functions for HTML markup, but it
actually doesn't care about the tags you use. That means you can use
it to generate XML documents that have no relation to HTML. For
example, here is a recipe to generate an XML RSS file:
# make an RSS feed
my $rss = ExSite::ML->new(xml=>1);
$rss->Doctype('<?xml version="1.0"?>');
# note auto-append calls
$rss->_Element("title","My Feed");
$rss->_Element("description","About My Feed");
$rss->_Element("link","http://myurl.com");
# make an item - do not use auto-append methods
my $item = $rss->Element("title","1st Item");
$item .= $rss->Element("description","1st description");
$item .= $rss->Element("link","http://link.com");
# now append this item to the document
$rss->_Element("item",$item);
# repeat for as many items as necessary
# wrap the document up - note wrap calls
$rss->__Element("channel");
$rss->__Element("rss",{ version => "2.0" });
$rss->Print;