Introduction to Web Application Programming

posted on 5:01 PM, July 27, 2007

An ExSite Web Application has the usual features associated with web applications, most importantly 2-way interactivity with the web site user, in which data is taken in, and dynamic web pages generated in response. ExSite Web Applications also have some additional features made possible by the ExSite Website Operating System framework:

  • a web-friendly back-end administrator control panel
  • dynamic content management features that allow you to embed web application "objects" into normal HTML web pages
  • the ability for different web applications to call each other
  • support for AJAX/DHTML dynamic content handling

There are four common types of web applications:

Client-side Applications

Client-side applications use special browser-based languages such as Javascript, Java, and Flash, to run directly on the user's computer. This is good when fast response is needed and little contact with the server is required. Since the program (or links to the program) must be embedded into the web page, client-side applications are also another form of web page content that is managed using ExSite's Content Management tools.

PHP/ASP

These applications embed programs into documents that are published and served like normal HTML files. Because PHP and ASP programs are published right into web page files, they are handled in ExSite as simply another form of web page content. ExSite supports .php and .asp extensions to web page files, indicating that they should be pre-processed before being served to the end user. Since the programming elements of your pages are just another form of content, you also gain the advantages of ExSite's Content Management features, such as versioning and access control.

CGI

CGI (Common Gateway Interface) applications use self-contained programs to dynamically generate web pages. These programs are spawned by the web server when a CGI request arrives, and they generate complete web pages, which are given back to the server (and then to the client). ExSite can operate in CGI mode (and does so by default in the regular distribution), but web applications are not treated as separate CGI programs. Rather, web applications are handled as "callbacks" that the standard ExSite CGI programs (in particular page.cgi, which generates web pages) use to obtain their content and insert them into templates. Existing CGI programs can be modified to conform to this framework using Dynamic Content Drivers (see below).

Embedded Server Applications

Embedded Server Applications are coded right into the web server engine itself. This is typically done for performance reasons, since it is faster if the web server doesn't have to farm out dynamic page generation to other processes. ExSite supports the mod_perl mode of the Apache web server, allowing it to run as an embedded server application. This gives your web application access to the internals of the webserver as it runs. To use this functionality, your application will have to be coded as a Dynamic Content Driver (see below).

Dynamic Content Drivers

A Dynamic Content Driver (or DCD) is a way of coding a web application so that it can behave as a dynamic content module and take advantage of ExSite's website operating system framework. They are called dynamic content drivers because:

  1. their main purpose is to generate dynamic content
  2. their structure is that of a driver: that is to say, it is a simple API that connects the ExSite kernel to the high-level application.

In end-user documentation, DCDs are often referred to as "plug-ins", "modules", or "web applications".

DCDs are written in Perl, the native language of ExSite. There are methods for interfacing DCDs to applications written in other languages, if your existing code is not written in Perl.

DCDs are invoked on the public side using content management tags that are inserted into web pages. A single web page can contain DCD tags for multiple web apps. Many DCDs are like "applets", and accomplish small but useful tasks. Many different DCDs may have to work together to build a full page, by dynamically creating page elements such as:

  • menus
  • time stamps, calendars
  • personalizations
  • web counters
  • and more...

ExSite Dynamic Content Drivers follow the model of Unix device drivers, which is to say that they support three calls:

  1. read() - handles all input to the driver, i.e. data moving from the user to the web server, such as form inputs, query strings from the URL, cookies, and so forth.
  2. write() - handles all output from the driver, i.e. dynamically-generated content that is destined for a web page.
  3. ioctl() - any other operations that are neither reads nor writes. This includes a few standard requests that the ExSite kernel will use to query a driver to find out what features it supports.

These three calls into the DCD can be invoked using tags embedded into the raw (pre-published) HTML of the page, as described later on in DCD Calling Conventions.

ExSite interacts with the web application only through these three entry points. In particular, information is returned to the viewer only by the write() function. Access control to the application can be concentrated at these points, simplifying your security.
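The three entry points can be pictured as a bare Perl class. What follows is only a minimal sketch: the package name Modules::HelloSketch, the "name" parameter, and the stand-alone constructor are hypothetical, and a real DCD would normally inherit from the BaseDCD class (described later in this document) rather than define its own new().

```perl
use strict;
use warnings;

package Modules::HelloSketch;   # hypothetical module name

# Stand-alone constructor, for illustration only.
sub new {
    my ($class) = @_;
    my $this = bless { input => {} }, $class;
    $this->read();
    return $this;
}

# read(): gather input; this sketch looks at the query string only.
sub read {
    my ($this, $options) = @_;
    my $query = defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : "";
    foreach my $pair (split /&/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $this->{input}{$key} = defined $value ? $value : "";
    }
    return;
}

# write(): build content and return it as a string; nothing is printed here.
sub write {
    my ($this, $options) = @_;
    my $name = $this->{input}{name} || $options || "world";
    $name =~ s/[^\w ]//g;        # crude sanitizing, enough for a sketch
    return "<p>Hello, $name!</p>";
}

# ioctl(): answer requests that are neither reads nor writes.
sub ioctl {
    my ($this, $request) = @_;
    return "Hello" if defined $request && $request eq "ModuleName";
    return;
}

package main;

my $dcd = Modules::HelloSketch->new;
print $dcd->write("visitor"), "\n";
```

Because write() only returns a string, the caller decides where that content ends up in the page.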

read()

The read() method is used to process all input to the DCD. ExSite provides an Input class library that will manage various forms of input in such a way that they can be shared by multiple DCDs simultaneously. This is strongly recommended over reading your own input directly, since if there are multiple DCDs at work within a single web page, one can potentially steal input that belongs to another. In particular, once stdin is read, it is gone, and the other DCDs won't be able to see the input that may have been intended for them. The Input class library handles the sharing of input between different DCDs, and is documented fully in the appendices of this document.

Among the various possible inputs that a DCD's read() method could be sensitive to are:

  • POST data (stdin)
  • GET data (environment variable QUERY_STRING)
  • cookies (environment variable HTTP_COOKIE)
  • path information (environment variable PATH_INFO)
  • session manager data
  • browser or client system type
  • contents of a local data file
  • and so on

It is up to the application developer to determine which values are of interest to the application, and to acquire and store those values in the read() method.

The read() method will accept an optional argument string. Although not often used, this feature is available to give more control over how the read() method functions. The read() method can be called from a web page, using special content management tags, and alternative read options can be passed to it from those same tags. See DCD Calling Conventions, below.
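As a rough illustration of a read() method that picks up several of the inputs listed above, here is a minimal sketch. The package name and the helper functions url_decode() and parse_pairs() are hypothetical, and it assumes cookie values are URL-encoded. POST data is deliberately left out, since stdin should be read through the shared Input library rather than directly.

```perl
use strict;
use warnings;

package Modules::ReadSketch;   # hypothetical module name

sub new { return bless {}, shift }

# Decode one URL-encoded token ("+" means space, %XX is a hex byte).
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Split "a=1&b=2" (or "a=1; b=2" for cookies) into a hash reference.
sub parse_pairs {
    my ($string, $separator) = @_;
    my %data;
    return \%data unless defined $string;
    foreach my $pair (split $separator, $string) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $data{ url_decode($key) } = url_decode(defined $value ? $value : "");
    }
    return \%data;
}

# read(): store GET data, cookies, and path information in the object.
sub read {
    my ($this, $options) = @_;
    $this->{query}  = parse_pairs($ENV{QUERY_STRING}, qr/&/);
    $this->{cookie} = parse_pairs($ENV{HTTP_COOKIE},  qr/;\s*/);
    my $path = defined $ENV{PATH_INFO} ? $ENV{PATH_INFO} : "";
    $this->{path} = [ grep { length } split m{/}, $path ];
    return;
}

package main;

my $reader = Modules::ReadSketch->new;
$reader->read;
print "query parameters seen: ", join(", ", sort keys %{ $reader->{query} }), "\n";
```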

write()

The write() method handles the actual generation of dynamic content. Like the read method, it will accept an optional argument string, which can contain special arguments or parameters controlling the details of the requested page. The write method does not generate any output directly; rather, its output is simply assembled into a string that is returned to its caller.

The argument string that write() receives comes from the content management tag used to invoke the application's DCD. For example, this CMS tag:

<!--&MyDCD(options)-->

is translated into the following call:

&Modules::MyDCD::write("options");

A useful trick is to format the options string like URL-encoded form data or query string so that it can be treated like a "default" query string to initialize the DCD if no other input is provided. For example:

sub write {
    my ($this,$options) = @_;

    # use the query string preferentially, but default to the options string
    my $input = $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : $options;

    #...
}

Or, if you are using the built-in input handling:

sub write {
    my ($this,$options) = @_;

    my %hard_options = &DecodeString($options); # parse the options string
    my %soft_options = %{$this->{input}}; # pre-parsed input

    # use the user-supplied soft options preferentially,
    # but fall back on the hard-coded options coded in the tag,
    # or to a default command if nothing at all has been provided

    my $cmd = $soft_options{cmd} || $hard_options{cmd} || "default_command";

    #...
}

ioctl()

The ioctl() (I/O Control) method handles all non read/write operations. It behaves like a generic function wrapper, receiving a simple string argument (the "request"), and simply does its job, returning whatever value (or values) it deems appropriate. The return value is not treated as content, and will not appear in a web page if called directly from a CMS tag. Rather the caller must know what to do with the return value. If ioctl() is called from a CMS tag in a web page, the return value will be ignored.

ioctl() is used for all communications with the DCD that do not involve either reading web input, or generating web content. One of the major uses is responding to queries from the ExSite kernel to identify the attributes and capabilities of the DCD. For example, the following are standard ioctl() requests that will be used by the ExSite kernel and content management system:

Access
The access level an administrator should have to use this module. Defaults to 2.
Category
The category that the module is displayed under on the web-top. (Defaults to 'Applications'.)
ControlPanel
Two possible values can be returned from this request. If a simple string is returned, it is assumed to be a URL, pointing to a control panel page that administrators can use to configure the application's back-end. If a code reference is returned instead, it is assumed to be a directly-callable control panel routine in the DCD. This routine can do anything, but should return a block of HTML (representing the control panel itself, not an entire page) in a simple string value. (More on Control Panels later.) If neither of these values is returned from this request, ExSite assumes there is no valid control panel, and the application will have no back-end interface available from the administrator's page.
DynContentParameter
The module can generate its own parameter field on the Insert Web Application dialog. Simply return the HTML for the field, which should have the name 'param'.
isRestricted
returns true/false depending on whether the website needs explicit authorization to use this DCD (isRestricted is true), or whether the DCD is freely useable (isRestricted is false or not defined). (See Restricted Web Applications, below.)
isService
returns true/false depending on whether the DCD can operate as a service. (See Services, below.)
isStatic
should return true (1) if the module's content is always the same for a given page. This allows for some efficiency improvements when precompiling pages.
ModuleName
returns the name of the application. If the DCD does not respond to this request (i.e. ioctl() is not coded to return the name in a string when it receives this request), then the raw name (the name of the DCD file) will be used instead.
Publish
returns a code reference to an internal DCD routine that publishes files for better performance. (See Publishing.)
Search
returns a code reference to an internal DCD routine that can add special items into the general search index.
ToDo
returns a code reference to an internal DCD routine that generates a to-do list of tasks requiring the administrator's attention.
Unpublish
returns a code reference to an internal DCD routine that removes files that the DCD has published.

Any particular DCD can respond to these requests or not. In general a DCD can function without any ioctl() functionality at all, but invoking some of these special DCD features will require some ioctl() configuration.
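One way to answer a handful of these requests is a lookup table keyed by request name. The sketch below is only an illustration: the package name, the control_panel() routine, and the returned values are hypothetical, and a DCD is free to handle requests with plain if/elsif tests instead.

```perl
use strict;
use warnings;

package Modules::NewsSketch;   # hypothetical DCD name

sub new { return bless {}, shift }

# Hypothetical control panel routine; returns a block of HTML, not a whole page.
sub control_panel { return "<h1>News administration</h1>" }

# The standard requests this DCD chooses to answer.
my %IOCTL = (
    ModuleName   => sub { return "Latest News" },
    Category     => sub { return "Applications" },
    Access       => sub { return 2 },
    isStatic     => sub { return 0 },
    ControlPanel => sub { return \&control_panel },
);

# Any request not in the table simply gets no answer.
sub ioctl {
    my ($this, $request) = @_;
    return unless defined $request && exists $IOCTL{$request};
    return $IOCTL{$request}->($this);
}

package main;

my $news = Modules::NewsSketch->new;
print "Module name: ", $news->ioctl("ModuleName"), "\n";
```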

BaseDCD class

ExSite provides a BaseDCD class that can serve both as a template for writing DCDs, and also as a class that simple DCDs can inherit from to save on redundant code common to many DCDs. In particular, the BaseDCD class includes a generic constructor, new(), and a generic read() method.

The generic read() is particularly useful, because it automatically prefetches all QUERY_STRING and POST input, parses it using URL-encoding conventions, and stores the combined results in a hash reference under the {input} key of the DCD object. Furthermore it does this in a neighbourly way so that POST input can be shared with any other DCD that may need to use it. The generic new() method automatically calls read() so that this input is available to the write() method without any additional effort.

These generic methods are useful to many basic DCDs, meaning the developer only needs to code a write() method, telling the DCD how to generate content, and optionally an ioctl() method, if the DCD does anything extra or has an administrator interface.

The method for inheriting from the BaseDCD class is simply to include the following lines near the top of your DCD module:

# inherit from BaseDCD class
use Modules::BaseDCD;
use vars qw(@ISA);
@ISA = qw(Modules::BaseDCD);

In addition to generic versions of these standard methods, the BaseDCD class also has a few useful utility functions that may come in handy when programming the DCD:

icon()
returns a URL to an icon for the DCD. This icon is used on the webtop, for instance. ExSite looks for an icon in _Modules/DCD_name/icon.gif (or icon.png). If none is found, it uses _Modules/icon.gif.
module_name()
returns the name of the DCD file, less the .pm extension.
link()
returns a URL back to this DCD. Since the DCD can be inserted on any page, the location of the DCD is not fixed. If the DCD is not running as a service, then link() with no arguments effectively returns the current URL. However, a hash of parameters/values may be provided in the arguments to modify this URL. Setting a parameter/value will modify or add this parameter to the QUERY_STRING for the page. If the value is undef, however, the parameter will be removed from the QUERY_STRING. For example, if the current URL is /cgi-bin/page.cgi?_id=123&param1=value1&param2=value2, then the following call will return a URL to /cgi-bin/page.cgi?_id=123&param1=xxx:

$this->link(param1=>"xxx",param2=>undef)

This is extremely useful when the DCD has to generate new links back to itself for recursive operation. Note that this works, even if the original page is published to HTML.

If the DCD is running as a service, then the new link that is returned will be to whichever page is running that service.
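The add/modify/remove behaviour described for link() can be imitated with a small stand-alone function. This is not ExSite's implementation, just a sketch of the idea: modify_url() is a hypothetical name, and for brevity it does not URL-encode the new values.

```perl
use strict;
use warnings;

# Return $url with query parameters added, changed, or (if undef) removed.
sub modify_url {
    my ($url, %changes) = @_;
    my ($base, $query) = split /\?/, $url, 2;
    my (@order, %value);

    # Remember the existing parameters in their original order.
    foreach my $pair (split /&/, defined $query ? $query : "") {
        my ($key, $val) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        push @order, $key unless exists $value{$key};
        $value{$key} = defined $val ? $val : "";
    }

    # Apply the requested changes; undef means "remove this parameter".
    foreach my $key (sort keys %changes) {
        if (!defined $changes{$key}) {
            delete $value{$key};
            @order = grep { $_ ne $key } @order;
        }
        else {
            push @order, $key unless exists $value{$key};
            $value{$key} = $changes{$key};
        }
    }

    my $new_query = join "&", map { "$_=$value{$_}" } @order;
    return length($new_query) ? "$base?$new_query" : $base;
}

print modify_url("/cgi-bin/page.cgi?_id=123&param1=value1&param2=value2",
                 param1 => "xxx", param2 => undef), "\n";
```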

Recursive Pages

Dynamically-generated web pages can be visited recursively with different results; that is you can return to the same page using a different set of inputs, and end up with a different set of content being displayed.

This is no different in principle than a CGI program generating different output with different inputs. In fact, the page generator is a CGI program (page.cgi), so this is exactly what is happening. The important difference is that page.cgi is a clearing house for all of your web applications, so all input and output go through the same program, instead of having separate CGI programs for every function on your site.

The default version of a page is viewed using the URL to the published page, if it exists, or by using the page.cgi program to dynamically render it, ie:

http://mydomain.com/cgi-bin/page.cgi?_id=42

(This tells ExSite to display page number 42 in its database.) This page can call itself recursively, adding additional data to the URL as needed. For instance:

http://mydomain.com/cgi-bin/page.cgi?_id=42&action=view&product_id=295

Using these additional inputs, the DCD embedded in the page may make completely different decisions about what output to return to the page. This process can go on indefinitely, depending on how many variables and values your DCD can respond to.

The QUERY_STRING accepted by page.cgi is URL-encoded by convention, and the data parameters and values specified in it are visible to the internals of page.cgi and to all the DCDs that are embedded in that page.

For this reason, it helps to have some conventions to help sort out what data belongs to whom. By convention, data intended for page.cgi itself uses parameter names with a leading underscore (such as _id in the above URL examples). Data intended for DCDs is named with no leading underscore.

Creating Recursive Links

Inside a DCD it is not obvious which page you are generating content for. (Indeed it is not obvious that you are generating content for an actual page at all.) This can make it tricky to figure out what URL you need to point to when creating recursive links back to the DCD. The relink() and link() functions are invaluable for generating URLs to link back to yourself.

&relink(param=>"value");

The relink() function (in the ExSite::Misc library) returns a URL to the current page, but with the given arguments added or modified. For example, if the current URL is /cgi/page.cgi?_id=25&param=oldvalue, then the above call will return /cgi/page.cgi?_id=25&param=value. If you want to clear a parameter (i.e. remove it from the QUERY_STRING entirely), set its value to undef. Note that the relink() function works for any URL-encoded QUERY_STRING on any CGI program, not just page.cgi.

$this->link(param=>"value");

The link() function is similar, but has additional features. For starters, it is a class method inherited from Modules::BaseDCD. Rather than linking back to the same page, it generates a link back to the DCD. In simple cases, this links back to the same page (since the DCD is embedded in the page), which gives the same behaviour as relink(). However, if the DCD has been configured to be served from a specific page on the site (see Inter-page services), then the link could jump to another page entirely. Furthermore, link() is sensitive to AJAX/DHTML methods, and will, in certain cases, generate JavaScript to dynamically update certain elements of the page, instead of reloading the entire page.

Web Application Collisions

Since any number of DCDs can be embedded into a single page, it is normal for DCDs to see input that was not intended for them. DCDs should be programmed to gracefully ignore input that makes no sense, since that input might have been targeted at another web application on the same page.

It is also conceivable that two DCDs will see input that makes sense to both of them. It may be the case that the user is only trying to communicate with one of the DCDs, but the consequence is that both will respond, possibly with unexpected results. This is called a DCD collision. To avoid this situation:

  • Try to use parameter names that are likely to be unique to your DCD. Parameters like id and number are so generic that they could easily be acted upon by multiple drivers. More unique names like event_id and numWidgets are less likely to be confused.

  • Encode all your DCD input into a single unique parameter, which your read() method can decode and parse out into its sub-parameters.

  • If neither of the above is feasible, be careful about sharing page space with other DCDs that accept inputs.
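The second suggestion, packing all of a DCD's input into one uniquely-named parameter, might look roughly like this. The function names, the "evt" parameter name, and the "key:value,key:value" format are all hypothetical choices made for this sketch, which also assumes that keys and values contain only word characters.

```perl
use strict;
use warnings;

# Pack sub-parameters into one value, e.g. "cmd:view,event:42".
sub pack_params {
    my (%param) = @_;
    return join ",", map { "$_:$param{$_}" } sort keys %param;
}

# Reverse of pack_params(); items that do not look like key:value are skipped.
sub unpack_params {
    my ($packed) = @_;
    my %param;
    foreach my $item (split /,/, defined $packed ? $packed : "") {
        my ($key, $value) = $item =~ /^(\w+):(\w*)$/ or next;
        $param{$key} = $value;
    }
    return %param;
}

# Build a link that carries all of this DCD's input in the single "evt" parameter.
my $url = "/cgi-bin/page.cgi?_id=42&evt=" . pack_params(cmd => "view", event => 42);
print "$url\n";

# In read(), only the "evt" parameter would need to be examined.
my %mine = unpack_params("cmd:view,event:42");
print "command is $mine{cmd}\n";
```

Other DCDs on the page see only one unfamiliar parameter, which they can safely ignore.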

Control Panels

The control panel of an application is the administrator back-end, for configuring and managing the application. Not all applications have or need a control panel; they inform ExSite of the existence of a control panel by responding to the "ControlPanel" ioctl request. If the return value is a string scalar, that string is taken by ExSite to be a URL that will bring up the control panel. If the return value is a code reference, ExSite assumes that is a directly callable routine that will return the control panel HTML in a string. For example:

sub ioctl {
    my ($this,$request) = @_;
    if ($request eq "ControlPanel") {
        # case 1) return URL of the control panel:
        #     return "/cgi-bin/myapp.cgi";
        # case 2) return code reference to the control panel:
        return \&my_control_panel;
    }
    return;
}

sub my_control_panel {
    #...
}

If your application's control panel is also a form of content (that is, it can be accessed from one of your site's web pages in addition to the ExSite administrator interface), then you can simply include a hook in the write() method to call the control panel method. (This only works if your control panel method is encoded in the DCD itself; if it is an external CGI program, it cannot be used as content.) Example:

# control panel accessible from both admin interface and the site itself

sub write {
    my ($this,$options) = @_;
    if ($options eq "ControlPanel") {
        # insert the control panel into a site page
        # (probably want some access control or security checks here)
        return &my_control_panel;
    }
    else {
        #...
    }
}

sub ioctl {
    my ($this,$request) = @_;
    if ($request eq "ControlPanel") {
        # admin interface has built-in security checks
        return \&my_control_panel;
    }
    return;
}

sub my_control_panel {
    #...
}

Porting Existing CGI Web Applications

A quick-and-dirty method for bringing CGI programs into ExSite is to simply run the CGI program from within the DCD, capture the content, and pass it back to ExSite for further processing. Example:

# CGI wrapper DCD

sub write {
    my ($this,$options) = @_;
    # the options string is the URL of a CGI program
    $options =~ /(\w+\.cgi)(?:\?(.*))?$/ or return "";
    my $prog = $1;
    # local restores our own query string when write() returns
    local $ENV{QUERY_STRING} = defined $2 ? $2 : "";
    # execute the CGI program
    my $output = `$prog`;
    # the output will include headers and extraneous HTML, so strip that gak
    return $output =~ /<body[^>]*>(.*)<\/body>/is ? $1 : $output;
}

If your web application consists of Perl CGI programs, then it is relatively easy to completely absorb them into ExSite. In the simple case, you can take each Perl CGI program and convert it to its own DCD. Your original CGI program can be reformatted as a subroutine in this DCD, and the DCD write method set up to call this particular subroutine as required. Your code should be modified to only output the relevant content, since ExSite takes care of the templates and wrappers on its own.

If you have many CGI programs that act in concert to provide all of the functions of your web application, it may be more sensible in the long run to combine them into a single DCD. Here is one method to do this:

  1. convert each CGI program to a subroutine in the DCD module. If the CGI program also contains subroutines, those are also brought into the module, of course. The main subroutine for each script must have a unique name (it's easy just to name it similarly to the original CGI program), and should contain the necessary logic to generate the DCD-specific content and layout, but not additional content (such as HTML wrappers).

  2. give each program an action name by which it can be selected.

  3. configure the write() method of your DCD to act as a switch statement to choose between these actions.

  4. add an action parameter of some kind to your query strings, to select from the various options to the write() method.

If you formerly invoked your web application using a series of URLs such as:

http://mydomain.com/cgi-bin/task1.cgi
http://mydomain.com/cgi-bin/task2.cgi
http://mydomain.com/cgi-bin/task3.cgi

Those tasks/URLs are now invoked as:

http://mydomain.com/cgi-bin/page.cgi?_id=XXX&action=task1
http://mydomain.com/cgi-bin/page.cgi?_id=XXX&action=task2
http://mydomain.com/cgi-bin/page.cgi?_id=XXX&action=task3

where the web application has been embedded into page XXX. Additional parameters to the original CGI programs can still be used with the new URLs.
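The four steps above can be sketched with a dispatch table standing in for the switch statement. Everything named here is hypothetical (the package name, the task subroutines, and a constructor that takes pre-parsed input directly). In a real DCD the input would come from read().

```perl
use strict;
use warnings;

package Modules::MyAppSketch;   # hypothetical name for the combined DCD

# For the sketch, pre-parsed input is handed straight to the constructor.
sub new {
    my ($class, %input) = @_;
    return bless { input => {%input} }, $class;
}

# Step 1: each former CGI program becomes a subroutine returning only its content.
sub task1 { return "<p>output of task 1</p>" }
sub task2 { return "<p>output of task 2</p>" }
sub task3 { return "<p>output of task 3</p>" }

# Step 2: give each program an action name.
my %ACTION = (
    task1 => \&task1,
    task2 => \&task2,
    task3 => \&task3,
);

# Steps 3 and 4: write() selects a subroutine using the "action" parameter.
sub write {
    my ($this, $options) = @_;
    my $action  = $this->{input}{action} || $options || "task1";
    my $handler = $ACTION{$action}
        or return "<p>Unknown action.</p>";
    return $handler->($this);
}

package main;

# Equivalent of requesting page.cgi?_id=XXX&action=task2
my $app = Modules::MyAppSketch->new(action => "task2");
print $app->write(""), "\n";
```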

Forms and Shared Input

Well-behaved DCDs share their input with their fellow DCDs, by using the ExSite::Input class. This is especially important when dealing with POST data, since once it is read from stdin, it is no longer available for other DCDs to read. Normally a form's input will be directed at a single DCD, so it may not seem like a big deal to share it. However, if you have several DCDs in a page that can potentially respond to form input, then the first one will grab the input to see if it makes sense to it. Without shared input, subsequent DCDs will not see the input stream, even if the first DCD did nothing with it.

See the example DCD in the appendices for an example of this.
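The underlying idea is simply to read the POST body once and hand the cached copy to every caller. The sketch below shows that idea only. It is not the ExSite::Input API, and the package and method names are hypothetical. It also assumes a plain CGI process that handles one request, since the cache is never reset.

```perl
use strict;
use warnings;

package SharedInputSketch;   # hypothetical; not the real ExSite::Input

my $post_cache;   # raw POST body, read at most once

# Return the raw POST body, reading the input handle only on the first call.
sub post_data {
    my ($class, $fh) = @_;
    $fh ||= \*STDIN;
    unless (defined $post_cache) {
        my $length = $ENV{CONTENT_LENGTH} || 0;
        $post_cache = "";
        read($fh, $post_cache, $length) if $length > 0;
    }
    return $post_cache;
}

package main;

# Simulate a POST request with an in-memory filehandle.
my $body = "cmd=save&id=7";
open my $in, "<", \$body or die "cannot open in-memory handle: $!";
$ENV{CONTENT_LENGTH} = length $body;

my $seen_by_first_dcd  = SharedInputSketch->post_data($in);
my $seen_by_second_dcd = SharedInputSketch->post_data($in);   # handle is spent, cache is used
print "both DCDs saw the same input\n" if $seen_by_first_dcd eq $seen_by_second_dcd;
```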

DCD Calling Conventions

DCDs are "called" using tags embedded in the raw HTML of the ExSite Content Management System. The HTML is only "raw" before the page is fully constructed. The content management tags are replaced during the page construction process. Once a page is written to disk as a .html or .php file, it has been fully processed. No more content management tags exist in it, and no more replacements will occur. (PHP and other server-side substitutions will still occur, however, allowing a whole second level of content management to take over...)

If you have a web applet or application called "App", then it is called from your raw HTML using a tag like this:

<!--&App.method(options)-->

where "method" is one of read, write, or ioctl. For instance:

<!--&Debug.write(POST)-->

This would call our Debug applet (from the appendices, below), invoking the write() method, with argument "POST". More formally, it results in the equivalent of these Perl statements being executed:

require Modules::Debug;
my $d = new Modules::Debug;
print $d->write("POST");

Note that the only "methods" you can use in these tags are the three standard methods. All others (eg. showhash(), from the Debug example) will be ignored.

It is useful to note that if no method is given in the tag, then the write() method is assumed, so the above can be written more concisely as:

<!--&Debug(POST)-->

The default output of the DCD can be invoked with no arguments, eg:

<!--&Debug()-->

AJAX/DHTML content-handling methods can be used automatically with minor changes to this notation, as described here.

Alternative read logic can be invoked in some pages but not others by calling read() as needed, for instance (using an imaginary online shopping application):

<!--&Shop.read(currency-cookie)-->
<!-- a normal HTML comment -->
<!--&Shop(action=invoice)-->

(Note that the middle line here is a normal HTML comment, which will pass through ExSite unchanged.)

In principle, you can call ioctl() using CMS tags as well. Since ioctls generate no visible output, this is only useful for executing certain tasks behind the scenes. For example, to receive an email whenever a certain page is viewed, you might use an ioctl like this:

<!--&Shop.ioctl(send_to=morgan@foo.com)-->
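To make the calling conventions concrete, here is a rough sketch of how such tags could be expanded. It is not ExSite's page-construction code. The function names, the registry of pre-built DCD objects (standing in for loading Modules::Name), and the choice to replace ignored tags with an empty string are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Call one DCD method and return the text that should replace the tag.
sub call_dcd {
    my ($registry, $name, $method, $options) = @_;
    $method = "write" unless defined $method;      # write() is the default
    my $dcd = $registry->{$name};
    return "" unless $dcd && $method =~ /^(?:read|write|ioctl)$/;
    my $result = $dcd->$method($options);
    return "" unless $method eq "write";           # read/ioctl results are not content
    return defined $result ? $result : "";
}

# Replace every <!--&Name.method(options)--> tag; normal comments are untouched.
sub expand_tags {
    my ($html, $registry) = @_;
    $html =~ s{<!--&(\w+)(?:\.(\w+))?\((.*?)\)-->}{call_dcd($registry, $1, $2, $3)}ge;
    return $html;
}

package Modules::EchoSketch;   # hypothetical stand-in DCD

sub new   { return bless { log => [] }, shift }
sub read  { my ($this, $opt) = @_; push @{ $this->{log} }, "read($opt)"; return }
sub write { my ($this, $opt) = @_; return "[$opt]" }
sub ioctl { my ($this, $opt) = @_; push @{ $this->{log} }, "ioctl($opt)"; return "ignored" }

package main;

my %registry = (Echo => Modules::EchoSketch->new);
print expand_tags('<p><!--&Echo(POST)--></p><!-- plain comment -->', \%registry), "\n";
```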

Calling Other Applications

It is easy for one web application to call another. It simply has to include the tags to make the call as part of the output generated from its own write() methods. For example, if package A wants to invoke package B, it can do something like this:

# in package A...

sub write {
#...
return "<!--&B(options)-->";
}

This feature also makes it simple to maintain dynamic content libraries, consisting of re-useable utility functions, such as clocks, calendars, and so on, which can be re-used by any other applications. For example:

<!-- display current date and time, using the ExLib DCD -->
Page generated on <!--&ExLib(datetime)-->.

A more direct form of calling can be performed by simply using or requiring the DCD module, and then invoking its methods directly. To ensure portability and compatibility, it is best to restrict your calls to the "public" driver methods read(), write(), and ioctl(), but in fact you are not prevented from making a call to any "private" DCD method using this trick.

# direct call to another DCD
if (eval 'require Modules::OtherDCD') {
    my $dcd = new Modules::OtherDCD;
    $output = $dcd->write("option-string");
}
else {
    # a failed eval leaves its error message in $@
    $output = "OtherDCD: $@";
}

A similar effect can be achieved by inheriting from another DCD, and simply overriding its methods in the cases where the new DCD differs.
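As a small illustration of the inheritance approach, the sketch below derives one hypothetical DCD from another and replaces a single method. Both package names and the timestamp() helper are invented for the example.

```perl
use strict;
use warnings;

package Modules::ClockSketch;   # hypothetical parent DCD

sub new { return bless {}, shift }

# write() wraps whatever timestamp() returns.
sub write {
    my ($this, $options) = @_;
    return "<span>" . $this->timestamp() . "</span>";
}

sub timestamp { return scalar(localtime) }

package Modules::UtcClockSketch;   # hypothetical child DCD
our @ISA = ('Modules::ClockSketch');

# Only the part that differs is redefined; new() and write() are inherited.
sub timestamp { return scalar(gmtime) . " UTC" }

package main;

print Modules::UtcClockSketch->new->write(""), "\n";
```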

Services

Most DCDs will generate new content that is substituted for the same tag on the calling page. This effectively keeps the DCD confined to the space allocated for that tag. If the DCD is placed in a page body, that's no big deal, since the body is usually designed to expand or contract as much as necessary to hold its content. In other cases, however, the DCD may be embedded in a sidebar, header, footer, or other confined space. Rather than expanding the DCD there, you would prefer that the DCD request be served by a different DCD tag that has more room. In some cases, you want the request to be served by an entirely different page. Such DCDs are called "services" because there is a particular tag or page that services the requests that come from other tags.

In-page services

If you want the DCD request to be served by a different tag on the same page, that can be accomplished by selective processing of the DCD input. For example, say the following tag generates a form to search:

<!--&MyDCD(searchform)-->

And the next tag displays search results:

<!--&MyDCD(dosearch)-->

Then, the form can be embedded in some part of the page such as a header, sidebar, or right-aligned element, while the results of the search are displayed elsewhere (such as the body). Only one instance of the MyDCD object is created on each page, so both of these tags will reference the same code object, and will be privy to each other's private data. Thus, the search form could be primed with the same search term that was used to produce the search results.
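A DCD serving both tags might be organized along these lines. This is a sketch under stated assumptions: the package name, the "term" parameter, and the constructor that takes pre-parsed input directly are hypothetical.

```perl
use strict;
use warnings;

package Modules::SearchSketch;   # hypothetical DCD

sub new {
    my ($class, %input) = @_;
    return bless { input => {%input} }, $class;
}

# The options string from the tag selects which piece of content to produce.
sub write {
    my ($this, $options) = @_;
    $options = "" unless defined $options;
    my $term = defined $this->{input}{term} ? $this->{input}{term} : "";
    $term =~ s/[^\w ]//g;                      # crude sanitizing for the sketch
    if ($options eq "searchform") {
        # the form is primed with the term that produced the current results
        return qq{<form method="get"><input name="term" value="$term"></form>};
    }
    if ($options eq "dosearch") {
        return length($term) ? "<p>Results for: $term</p>" : "";
    }
    return "";
}

package main;

# One object serves both tags on the page.
my $search = Modules::SearchSketch->new(term => "perl");
print $search->write("searchform"), "\n";   # e.g. placed in a sidebar
print $search->write("dosearch"), "\n";     # e.g. placed in the body
```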

Inter-page services

If you want the DCD request to be served by a different page, that is a little more involved. The DCD must be configured to run as an inter-page service, as follows:

  • it must reply true to the "isService" ioctl() request.
  • the service must be bound to a particular page in that site, which will process all requests to that DCD.*
  • the DCD must be coded using service-sensitive recursive linking so that it knows how to find the service page.

* To bind a service to a page, you create a record in the Service table in the ExSite database (this can be done using the WebDB tool, or the Security plug-in). There are two fields that must be set:

  • Name - the name of the DCD, eg. "MyDCD"
  • Page to run this service - select the page here

Restricted Web Applications

To restrict a DCD so that it can only be used by selected websites, it must be configured similarly to an inter-page service:

  • it must reply true to the "isRestricted" ioctl() request.
  • the service must be bound to a particular site*

* To bind a service to a site, you create a record in the Service table in the ExSite database (this can be done using the WebDB tool, or the Security plug-in). There are two fields that must be set:

  • Name - the name of the DCD, eg. "MyDCD"
  • Site - select the website here