URI.pm

posted on 3:30 PM, July 12, 2009
Handling URIs/URLs


ExSite::URI

A class for parsing and composing URIs (web addresses).

Note that a URI is composed of the following components:

    scheme://authority/path?query#fragment

The scheme defaults to ``http''.

The authority is typically composed of a hostname, domain, and TLD, separated by ``.''.

The path may be composed of multiple sequential names, separated by ``/''. It typically consists of several path segments: the leading part refers to a script_name, and the remainder is the path_info, which may in turn consist of several sub-segments that are concatenated together. For example, in /cgi-bin/script.cgi/A/B/1/2 the script_name is /cgi-bin/script.cgi and the path_info is /A/B/1/2.

The query is typically composed of multiple key=value pairs, separated by a separator character (which defaults to ``&'').
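As an illustration of the component layout described above, the following sketch splits a URI string with the generic regular expression given in RFC 3986, Appendix B, using only core Perl. It is not the parser used by ExSite::URI; the helper name split_uri and the sample URL are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of ExSite::URI: split a URI into its five
# generic components using the regular expression from RFC 3986, Appendix B.
# Components that are absent from the input come back as undef.
sub split_uri {
    my ($uri) = @_;
    my ($scheme, $authority, $path, $query, $fragment) =
        $uri =~ m{^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?\z}s;
    return (
        scheme    => $scheme,
        authority => $authority,
        path      => $path,
        query     => $query,
        fragment  => $fragment,
    );
}

my %part = split_uri('http://www.example.com/cgi-bin/script.cgi/A/B/1/2?x=1&y=2#top');
print "$_ = $part{$_}\n" for qw(scheme authority path query fragment);
```

Note that this generic split cannot tell script_name from path_info; that distinction depends on knowledge of the server, not on URI syntax.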

Usage

    my $uri = ExSite::URI->new(%option);

%option can include:

    separator    => parameter separator character (e.g. ";" or "&")
    plaintext    => output plaintext URLs if true;
                    otherwise, output HTML URLs
    uri          => a URI string to initialize the object with
    secure_query => encrypt the query data to make it tamper-resistant

By default, the object will be initialized with the current URI, will use '&' as the parameter separator, and will output HTML URIs.

The only difference between HTML and plaintext URIs is whether or not HTML metacharacters such as '&' are escaped. (In plaintext mode they are left unescaped.)
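To make the difference concrete, here is a minimal core-Perl sketch of what escaping a URL for HTML output involves. The helper html_url is hypothetical, and the exact set of characters that ExSite escapes is an assumption; only '&' is named in the text above.

```perl
use strict;
use warnings;

# Hypothetical helper: escape HTML metacharacters in a URL so that it can be
# placed inside markup such as an href="..." attribute.
# Assumption: the set of escaped characters is & < > and the double quote.
sub html_url {
    my ($url) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $url =~ s/([&<>"])/$entity{$1}/g;
    return $url;
}

my $plain = '/cgi-bin/page.cgi?id=7&view=print';
print "plaintext: $plain\n";                      # '&' left as-is
print "HTML:      ", html_url($plain), "\n";      # '&' written as '&amp;'
```

Plaintext output suits contexts such as e-mail bodies or redirect headers, where an escaped '&amp;' would corrupt the link.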

You can change the separator character at any time:

    $uri->separator(';');

The current separator character is used for both parsing URIs and composing new URIs, so you may need to switch if you want to use a different separator character for your input and output.
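The effect of a single separator setting on both directions can be sketched in core Perl as follows. The helpers parse_query and compose_query are hypothetical stand-ins, not ExSite::URI methods, and URL-decoding of values is left out to keep the example short.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw query string on the given separator and
# return an ordered list of [key, value] pairs.
sub parse_query {
    my ($query, $sep) = @_;
    my @pairs;
    for my $pair (split /\Q$sep\E/, $query) {
        next unless length $pair;                 # skip empty pairs such as "a=1;;b=2"
        my ($key, $value) = split /=/, $pair, 2;
        push @pairs, [ $key, defined($value) ? $value : '' ];
    }
    return @pairs;
}

# Hypothetical helper: join [key, value] pairs back into a query string.
sub compose_query {
    my ($sep, @pairs) = @_;
    return join $sep, map { join '=', @$_ } @pairs;
}

my @pairs = parse_query('id=7;view=print', ';');  # the input uses ";"
print compose_query('&', @pairs), "\n";           # the output uses "&"
```

With the URI object, the same switch would be a call to separator() between reading the input URI and writing the output URI.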

You can change the text mode with the following calls:

    $uri->plaintext;    # output plaintext URIs
    $uri->html;         # output HTML URIs

At any time, you can extract a structure with all of the parsed URI data using:

    %parsed_uri = $uri->info;

You can also fetch individual URI components using:

    $data = $uri->get($component);

where $component is one of the keys in the hash returned by info(), namely ``scheme'', ``authority'', ``path'', ``path_info'', ``script_name'', ``query'', ``query_data'', or ``fragment''. Note that ``query'' is the raw query string, and ``query_data'' is a hash of parsed keys/values. Also, ``path'' is the concatenation of ``script_name'' and ``path_info''.

URI Sources

This class can manage URIs from any source, in principle. Its defaults are optimized for handling ExSite URIs. ExSite URIs use a conventional format which assumes the following additional rules:

  • path: The path component of the URI consists of a script_name and extra path_info concatenated together. For example: /cgi-bin/script.cgi/extra/path/data

  • query: The query component of the URI consists of multiple key=value pairs, joined by a separator character (``&'' by default).

These are common URI conventions, so this class should be fairly versatile, even with non-ExSite URIs. You might encounter minor issues with non-ExSite URIs that do not use the same conventions. For example, not all query strings are sequences of key/value pairs, so we might not be able to extract intelligible parameters from unconventional query strings. Also, it may not be possible for URI to tell which part of a path corresponds to a script_name and which to a path_info, or even if those are sensible ways to divide the path. In that case, you may get no script_name or path_info parsed out of the URI, and it will all be aggregated into a single path. Attempting to set query parameters or path segments may not give expected results in these cases.

If you do not pass an explicit URI, the object will initialize itself with the URI of the current request, as read from the Apache environment.

You can re-initialize the object with a different URI at any time:

    $uri->setup($new_uri);

Resetting the URI

After modifying the URI (see below), it is often the case that you want to reset it back to its initial state. You can do this:

    $uri->reset();

If the URI was explicitly passed to the object, this will restore the original state completely. If the URI was implicitly determined from the local environment, however, it may be different, depending on how local definitions have changed in the meantime. If the path or query data have been altered in ExSite's input buffers, then the URI will reflect those changes.

Sometimes you want this behaviour for explicit URIs. For example, the object may be forced to an explicit URI that is meant to reflect a local URI that would normally be implicit. (This happens when publishing, for instance, where we spoof the URI and environment for each page that we generate.) To get the implicit reset behaviour on an explicit URI, do this:

    $uri->use_input();

This tells the object to use any updated input data when constructing the implicit URI.

Query Strings

The query is the part of the URL after a question mark. It is typically broken into key=value pairs by a separator character, which is ``&'' by default.

To change a parameter in the URI:

    $uri->parameter($key, $value);

To remove a parameter completely:

    $uri->parameter($key,undef);  # OR
    $uri->parameter($key);

To change multiple parameters:

    $uri->query(%parameters);

The query string is written as key1=val1&key2=val2..., although the parameter separator character ``&'' can be changed as noted above.
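The set-or-remove convention used by parameter() can be modelled on a plain hash, as in the sketch below. The function set_parameter is a hypothetical stand-in written for this example, not the ExSite::URI implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parameter()-style setter:
# a defined value sets the key, a missing or undef value removes it.
sub set_parameter {
    my ($data, $key, $value) = @_;
    if (defined $value) { $data->{$key} = $value }
    else                { delete $data->{$key} }
    return;
}

my %query = (id => 7, view => 'print');

set_parameter(\%query, 'view', 'full');   # change an existing parameter
set_parameter(\%query, 'page', 2);        # add a new parameter
set_parameter(\%query, 'id');             # remove a parameter

# Keys are sorted only to make the output of this sketch predictable.
print join('&', map { "$_=$query{$_}" } sort keys %query), "\n";
```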

Secure Links

If you make the URI object secure:

    $uri->secure();

then your query strings will be encrypted, making them tamper-resistant. This is not recommended for normal usage, as it is quite convenient to be able to inspect and alter query strings. However, you may wish to make exceptions in cases where sensitive data may be exposed in the query string, or where editable query strings raise security issues.

To go back to normal query strings, use:

    $uri->insecure();

(This is a misnomer, since there is nothing really insecure about a normal query string.)

Path Info

The URI path includes the slash-separated values after the domain name and before the '?'. This is typically broken down into two parts, script_name and path_info.

    /path = 
    /script_name/path_info

The script_name typically corresponds to the disk path of a CGI program, while the path_info is treated as path-like data that is passed on to this program. For example:

    /script_name   +  /path_info =
    /cgi/page.cgi  +  /store/catalog.html/widgets/blue_grommet

In principle, the path_info can be further broken down into segments that refer to different types of resources and are concatenated together, e.g.:

    /path_info(CMS segment) +  /path_info(Catalog segment) = 
    /store/catalog.html     +  /widgets/blue_grommet

The breakdown of different path_info segments is done using the Input manager (ExSite::Input), if this is an implicitly defined URI. Once they are defined, you can redefine specific segments in isolation in the URI object. For example, if the path_info is divided into the CMS and Catalog segments, as in the above example, then we can redefine either segment alone as follows:

    $uri->path("CMS","/store/catalog.html");   # scalar method
    $uri->path("Catalog","widgets","red_grommet"); # array method

These new path segments will replace the original path segments, without altering the remaining segments of the path.

If you define a new path segment unknown to the Input manager, then the new path segment will be appended to those that are already defined. For example,

    $uri->path("extra","foo");

would result in ``/foo'' being appended to the existing path, resulting in a new path_info of:

    /store/catalog.html/widgets/blue_grommet/foo

To delete a path segment, just pass nothing as the segment data:

    $uri->path("Catalog",undef);
    $uri->path("Catalog");        # equivalent
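The replace, append, and delete rules above can be modelled with an ordered list of named segments. The following core-Perl sketch is only an illustration under that assumption; the data layout and the helpers set_segment and path_info are hypothetical and say nothing about how ExSite::URI or ExSite::Input store segments internally.

```perl
use strict;
use warnings;

# Hypothetical model: path_info kept as an ordered list of [name, path] pairs.
my @segments = (
    [ CMS     => '/store/catalog.html' ],
    [ Catalog => '/widgets/blue_grommet' ],
);

# Replace a known segment in place, append an unknown one at the end, or
# delete a segment when no (defined, non-empty) path parts are given.
sub set_segment {
    my ($segments, $name, @parts) = @_;
    my $path = join '/', grep { defined($_) && length($_) } @parts;
    $path = "/$path" if length($path) && $path !~ m{^/};
    for my $i (0 .. $#$segments) {
        next unless $segments->[$i][0] eq $name;
        if (length($path)) { $segments->[$i][1] = $path }
        else               { splice @$segments, $i, 1 }
        return;
    }
    push @$segments, [ $name, $path ] if length($path);
    return;
}

# Concatenate the segment paths in order.
sub path_info {
    my ($segments) = @_;
    return join '', map { $_->[1] } @$segments;
}

set_segment(\@segments, 'Catalog', 'widgets', 'red_grommet');  # replace
set_segment(\@segments, 'extra', 'foo');                       # append
print path_info(\@segments), "\n";
set_segment(\@segments, 'Catalog');                            # delete
print path_info(\@segments), "\n";
```

Keeping the segments in an ordered list is what lets one segment change without disturbing the position of the others.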

To completely override the path segments defined by the Input manager, and explicitly define your path, use these:

    $uri->script_name($path);
    $uri->path_info($path);

Service Pages

A service page is a special page in the ExSite CMS that services requests for a particular plug-in. If a page generates a URL that will be processed by that plug-in, it should automatically adjust the target URL so that it redirects to the service page. This is done in the URI class by the service_page() method.

To change the current URI so that it directs to the service page instead of whatever page it happens to be on, use this:

    $uri->service_page($module);

where $module is the plug-in (either a module object, or simply the name of the plug-in).

Not all plug-ins are configured to use service pages, but there is no harm in calling this method in those cases; it will leave the current URI unchanged.

Security - Privileged URIs

Some URIs direct to pages/screens that require a certain level of user access to view. Simply using the URI is not sufficient to view the contents; you also need to be logged in as a user with sufficient access. If you do not have this level of access, you are likely to get a permission denied error message, or be prompted for a login and password.

There is a feature by which you can include authentication credentials in a URI so that the user will not receive an error or login prompt. This trick uses encrypted ``authtokens'' embedded into the parameter string.

There are two operations to consider when using authtokens:

To generate an encrypted authtoken string:

    my $authtoken = $uri->authtoken($login_id, $expiry_in_days);

To modify the current URI to include an authtoken granting that URI special access:

    $uri->authorize($login_id, $expiry_in_days);

You then must output the URI (see below) to actually use it. You cannot really modify the URI any further at this point, because then the authtoken won't match the updated URI, and it will fail to validate. It may be necessary to reset the URI or remove the _auth parameter to get back to a working URI. To generate a URL with an embedded authtoken, but leave the URI object in a normal working state so that it can be further modified, use:

    my $auth_url = $uri->authorize_url($login_id, $expiry_in_days);
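Why a token stops validating once the URI changes can be shown with a small sketch. It assumes, purely for illustration, that the token is an HMAC computed over the login, the expiry time, and the exact URL; ExSite's real authtokens are encrypted and may be constructed quite differently. The names make_token, token_ok, and $secret are hypothetical, and Digest::SHA is a core Perl module.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Hypothetical server-side key; assumed never to be sent to clients.
my $secret = 'server-side secret';

# Assumption for this sketch: a token is "login:expiry:mac", where the MAC
# covers the login, the expiry time, and the exact URL being authorized.
# Logins containing ":" are not handled here.
sub make_token {
    my ($login, $expires, $url) = @_;
    return join ':', $login, $expires,
        hmac_sha256_hex("$login|$expires|$url", $secret);
}

# Returns 1 if the token is unexpired and matches this exact URL, else 0.
sub token_ok {
    my ($token, $url, $now) = @_;
    my ($login, $expires, $mac) = split /:/, $token, 3;
    return 0 unless defined($mac) && $expires =~ /^\d+\z/ && $expires > $now;
    return $mac eq hmac_sha256_hex("$login|$expires|$url", $secret) ? 1 : 0;
}

my $now   = time;
my $url   = '/cgi-bin/page.cgi?id=7';
my $token = make_token('jsmith', $now + 7 * 86400, $url);   # 7-day expiry

printf "%s\n", token_ok($token, $url, $now)           ? 'valid' : 'invalid';
printf "%s\n", token_ok($token, "$url&admin=1", $now) ? 'valid' : 'invalid';
```

Because the URL is part of the signed data, any later change to the path or parameters yields a different MAC, which is the behaviour described above for authorize().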

Output

After a URI has been modified using the above methods, you can obtain the changed URI using the write methods.

    $newuri = $uri->write($type);

$type can be ``relative'' or ``full'' (full is the default). Each form can also be requested directly:

    $newuri = $uri->write_relative();

This returns the URI after the authority. It presumes the same authority as the referrer.

    $newuri = $uri->write_full();

This returns the full URI including the scheme and authority.
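The difference between the two forms can be sketched as a function that reassembles a URI from its parsed components. The helper write_uri and the sample values are hypothetical, and including the fragment in the relative form is an assumption of this sketch rather than documented ExSite behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild a URI string from parsed components.
# 'full' prepends scheme://authority; 'relative' starts at the path.
sub write_uri {
    my ($type, %part) = @_;
    my $out = '';
    if ($type eq 'full') {
        $out .= ($part{scheme} || 'http') . '://' . $part{authority};
    }
    $out .= $part{path};
    $out .= "?$part{query}"
        if defined($part{query}) && length($part{query});
    $out .= "#$part{fragment}"
        if defined($part{fragment}) && length($part{fragment});
    return $out;
}

my %part = (
    scheme    => 'http',
    authority => 'www.example.com',
    path      => '/cgi-bin/page.cgi/store',
    query     => 'id=7',
);

print write_uri('relative', %part), "\n";
print write_uri('full',     %part), "\n";
```

The relative form is shorter and survives a change of hostname, but it only works where the reader of the link shares the same authority, for example in links within one site.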

Modifications to the URI are cumulative, so you can make changes, output the new URI, make more changes, output again, etc. If you want to reset the URI to its original state so that changes are not cumulative, use the reset method:

    $uri->reset();

This also syncs with the Input manager to retrieve any new path segments that were defined since the URI object was instantiated.