Data Persistence

posted on 1:54 PM, December 24, 2008
Because HTTP is a stateless protocol, the normal behaviour of websites is to forget everything about a visitor after a page has been served.  If we are to remember things about a visitor on subsequent page views, we must reload all of that data from storage when the visitor returns.  This is straightforward enough in principle: simply re-execute all of the SQL queries to load that data, every time the visitor requests a page.

This "brute force" approach has its drawbacks, however.  You are executing the same queries over and over, which can be computationally expensive.  This limits how much traffic your website can serve, since it may be spending most of its resources repeating similar operations.  What is desired is a way to remember data once you have gone to the effort of fetching it, so that it is much less costly to fetch it again.

ExSite has an optional feature, the persistent data store, which is useful for managing data persistence in this way. The persistent data store can be used to:
  • improve performance on repetitive queries and computations
  • reduce traffic/load to the database server
  • reduce start-up time by caching configuration files and settings
  • preserve the state of a user's visit
The persistent data store is sometimes called the "store", meaning a place to store things (not an e-commerce store).

Enabling the Persistent Data Store

To use the store, you first need to initialize the store database.  The store.pl utility script distributed in the bin directory of the ExSite distribution can be used to do this.  From your cgi-bin directory, run the command:
../bin/store.pl --reset
This will create the STORE database and the STORE.lock lockfile in your cgi-bin directory.

Now you need to configure ExSite to make use of the store.  The store configuration is encoded in the routine &store_conf() in the file cgi-bin/Local.pm.  This routine is disabled by default by being renamed to store_conf_disabled().  Simply remove the "_disabled" part to enable it.  (Add the "_disabled" part back at any time to temporarily disable the store.)  The contents of this routine are configuration parameters that can be modified to "tune" the store.

(The reason that store configuration is not placed in the system configuration file, exsite.conf, like all other configurations, is that we try to cache the system configuration in the store itself for faster loading, so we cannot read the general system configuration until after the store is configured.)

Database Cache

SQL queries are computationally expensive compared to fetching items from the persistent data store.  That means we can cache the results of our SQL queries in the store and save a lot of time when we repeat those queries.  This can get quite complicated, however, since if we alter the database, the results of previous queries may change.  We have to know which cached items are safe to continue using, and which ones should be forcibly expired from the cache because they may have been superseded by new data in the database.

This is managed for you by the Cache class, which provides a specialized, higher-level interface to the Persistent Data Store, specifically for caching SQL query results.  You do not need to make direct use of %store to use the database cache;  the Cache object will handle all interactions with %store on your behalf.

Note that the database cache is automatically configured and used whenever you instantiate a database object, so you do not have to do anything to get the benefit of this subsystem.  However, you may want to make direct use of the Cache object if you perform any custom queries whose results you would like to cache.

If inspecting the contents of the Persistent Data Store directly, you will see numerous items prefixed with the label "cache:...".  These are used by the Cache object to track the cached queries and which ones need to be forcibly expired on updates.

The database cache still works even if the Persistent Data Store is not enabled.  However, it only caches query results for the duration of that request.  On a subsequent page view, it will go back to the original database again.
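
The bookkeeping described in this section, remembering query results and forcibly expiring them when the underlying data changes, can be sketched in a few lines of Perl.  This is only an illustration of the idea: the QueryCacheSketch class, its method names, and the per-table tracking are assumptions made for this example, and are not the interface or the internals of ExSite's Cache class.

```perl
use strict;
use warnings;

package QueryCacheSketch;   # hypothetical; not ExSite's Cache class

sub new {
    my ($class) = @_;
    my $self = { result => {}, by_table => {} };
    return bless $self, $class;
}

# Remember a query result and note which tables it was read from.
sub save {
    my ($self, $query, $tables, $rows) = @_;
    $self->{result}{$query} = $rows;
    $self->{by_table}{$_}{$query} = 1 for @$tables;
    return;
}

# Return the cached rows, or undef if this query is not cached.
sub get {
    my ($self, $query) = @_;
    return $self->{result}{$query};
}

# Call after an insert, update, or delete on $table: forget every cached
# query that read from that table, since its result may now be stale.
sub clear {
    my ($self, $table) = @_;
    my $queries = delete $self->{by_table}{$table};
    $queries ||= {};
    delete $self->{result}{$_} for keys %$queries;
    return;
}

package main;

my $cache = QueryCacheSketch->new;
$cache->save("select * from page",   ["page"],   [ { id => 1 } ]);
$cache->save("select * from member", ["member"], [ { id => 7 } ]);

$cache->clear("page");    # e.g. after updating a row in the page table

my $page_rows   = $cache->get("select * from page");     # forcibly expired
my $member_rows = $cache->get("select * from member");   # still cached
```

Tracking results by table is a coarse but safe choice: an update to one table discards more cached queries than strictly necessary, but never leaves a stale result behind.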

Disabling Persistent Caching

There are some cases where you may not want your queries to be cached.  One case is when your database is being modified by 3rd-party applications that do not know about ExSite's caching system.  Then the cache can get out of sync with the database, and data integrity problems can result.

To disable persistent caching, without disabling the persistent store entirely, use the following configuration setting:
cache.persistent = 0
If you do this, queries will still be cached, but only for the duration of the current request.  Subsequent requests will have to fetch the data all over again.

Sessions

Session management is used to track the state of a user's visit.  You can store arbitrary key/value pairs in the user's %session hash, and this information will still be available in this hash on subsequent page views if persistent storage is enabled.  (If not enabled, the values in %session are forgotten after each page view.)

Unlike the general %store, the contents of %session are unique to each user.  In other words, a user sees only the contents of their own %session, and changes to the contents of %session are only seen by that user.

If inspecting the contents of the Persistent Data Store directly, you may see numerous items prefixed with the label "session:...".  These are individual private sessions, each one corresponding to a particular user.  When you manipulate the contents of %session, you are changing the values inside one of these only.

If persistent storage is not enabled, you can still read and write to %session, but the changes will not persist.
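
To illustrate how private sessions can live side by side in one shared store, here is a minimal sketch.  It uses a plain hash as a stand-in for the store, and the session_for() helper and the session IDs are hypothetical names invented for this example.  Only the "session:" key prefix comes from the description above; how ExSite assigns session IDs and writes changes back to real storage is glossed over here.

```perl
use strict;
use warnings;

my %store;   # stand-in for the shared store (a plain hash in this sketch)

# Hypothetical helper: return the private hash for one visitor, creating it
# on first use.  Each visitor's data lives under its own "session:" key.
sub session_for {
    my ($session_id) = @_;
    return $store{"session:$session_id"} ||= {};
}

my $alice = session_for("a1b2c3");
my $bob   = session_for("d4e5f6");

$alice->{cart_items} = 3;            # visible only in Alice's session
$bob->{last_page}    = "/contact";   # visible only in Bob's session

# A later lookup with the same ID finds the same data.
my $alice_again = session_for("a1b2c3");
print "Alice has $alice_again->{cart_items} item(s) in her cart\n";
print "Bob has no cart\n" unless exists $bob->{cart_items};
```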

Configurations

If ExSite detects that persistent storage is enabled, it will try to save its configuration files there, to avoid reloading and parsing them on each page view. These include the system conf files as well as the dbmap database description files.

An important consequence of this is that if you change any of these files, ExSite will ignore those changes, preferring the stored version instead.  To work around this, you should go into the Persistent Data Store plug-in, and clear any affected items in the store.  This will force them to reload.

Performance

The persistent data store has a significant effect on performance, especially reducing the amount of time spent on repetitive database queries.  If similar pages are being hit repeatedly, subsequent page views will often not hit the database at all, because the necessary queries (e.g. CMS content lookups, authentication requests) have all been cached by earlier page views.

The effect is most significant on smaller systems that run the database on the same server as the website, since you do not have to divert as many system resources to the database.  On systems with separate database servers, the effect may be different, depending on how heavily loaded the database servers tend to be.

Badly-behaved robots can clobber a site with hundreds or thousands of hits in a very short time span.  In this case, the store caching may not help as much as you would like because by the time a query result has been cached, the other page views have already checked the cache, found nothing, and issued their own queries to the database.  The store works best once the cache has been primed, so the effect of a bad robot will be reduced on a site that receives regular traffic that keeps the cache full.  Conversely, the effect of a bad robot will be worst when a robot strikes the site after a period of low activity that expires everything from the cache.  You can tune the lifetime of stored items in the store configuration to mitigate these effects.
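
The difference between a burst of hits on a cold cache and steady traffic can be shown with a toy simulation.  This is a simplified model, not a measurement: time is counted in abstract ticks, every request asks for the same query, expiry is ignored, and the count_db_queries() function is a hypothetical name used only here.

```perl
use strict;
use warnings;

# Count how many requests reach the database when each request checks the
# cache on arrival, and a query result only becomes visible in the cache
# $query_time ticks after the first query started.
sub count_db_queries {
    my ($query_time, @arrivals) = @_;
    my $cached_at;          # tick at which the first result lands in the cache
    my $db_queries = 0;
    for my $t (sort { $a <=> $b } @arrivals) {
        if (defined $cached_at && $t >= $cached_at) {
            next;           # cache hit: no database work
        }
        $db_queries++;      # cache miss: this request runs the query itself
        $cached_at = $t + $query_time unless defined $cached_at;
    }
    return $db_queries;
}

my @burst  = (0) x 100;                  # a robot: 100 hits in the same tick
my @steady = map { $_ * 10 } 0 .. 99;    # regular traffic: one hit per 10 ticks

my $burst_queries  = count_db_queries(5, @burst);
my $steady_queries = count_db_queries(5, @steady);
print "burst: $burst_queries queries, steady: $steady_queries queries\n";
```

In this model the burst sends all 100 requests to the database, because none of them finds a cached result in time, while the steady traffic issues a single query and serves the other 99 requests from the cache.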

In an extreme denial-of-service situation, the store may itself become a point of resource contention, as different processes struggle to get the lock on the database that they may need to make updates.  We recommend utilizing the "busy" kill-switch as a counter-measure against DoS and DDoS attacks against your website, if this is a concern.

In a plain CGI setup, the persistent data store can increase throughput by 25% or more.  In a Persistent Perl setup, the increase can be more than 250%.  And Persistent Perl together with the persistent data store can improve throughput over plain CGI by a factor of 10.  (Factors of 5 to 20 improvement are not unusual, depending on the specifics of the system and the nature of the traffic.)

Maintaining the Persistent Data Store

StoreAdm

Use the StoreAdm plug-in to perform manual maintenance of the store contents. For example:
  • you want to change some system configurations, and need to clear the old configurations from the cache
  • you want to inspect the contents of an item in the store for debugging or technical support purposes
  • you want to terminate the session of a certain user (by deleting their session entry in the store)
  • you want to clear and reset the entire persistent data store to a pristine state.

store.pl

ExSite also ships with a command-line tool called store.pl, which can be used to inspect and maintain the store from a shell session.  This is included in the bin directory of the distribution, but should be executed from the cgi-bin directory. Examples of use include:

List all items in the store, and their expiry times:
../bin/store.pl --list
Display a particular item in the store:
../bin/store.pl itemname
Reset the store back to a pristine state:
../bin/store.pl --reset
Reclaim unused disk space:
../bin/store.pl --rebuild
Note that this last command is useful when your store data file grows large over time.  The store does not free unused disk space when it expires old items, but holds onto the space for re-use.  It is a good idea to periodically rebuild the store file to free up disk space that may not be needed.

There are other switches to store.pl as well.  Consult the comments in the script for more information.

Task Manager

The ExSite task manager can be used for automated maintenance of the persistent data store.  Tasks can be set up for the StoreAdm plug-in, to run at hourly, daily, or weekly intervals, or at preset times.  The following task actions are accepted:

rebuild - compacts the store database file to reclaim unused disk space

reset - resets the store to its initial state (clears all data)

purge - clear all expired data.  Note that purging (i.e. garbage collection) should be done automatically by the store, so it is not strictly necessary to set up an automated task to do this.

delete - clear a specific item (which should be named in the task ID field)

Programming with the Persistent Data Store

The persistent data store is accessed through the tied hash %store.  To save a piece of data persistently, simply place it into this hash under a unique key.  For example:
$store{foo} = $bar;
$bar in this case can be a scalar, or a complex structure.

On a subsequent page view, you can quickly retrieve this data by looking it up under the key you specified:
my $bar = $store{foo};
By default, data placed into the store will persist for a limited time.  (The default is one hour.)  After that, garbage collectors will dispose of your data to free up space for other items.  That means if you are using the store, you have to be prepared to reload the data from the original source if you do not find it.  (Then you can place it back into the store again, if you like.)  Used in this way, the store acts as a cache for the original data.

Because the persistent data store is an optional feature, if it is disabled, then %store behaves like a normal hash variable.  Anything you place into it will persist only for the remainder of the current request.  There is no harm in using this hash variable, and as long as you don't require data to be present in the store, most code need not know whether the store is enabled or not to take advantage of its benefits.  Code should always check the store for a "fast" copy of some data, and then fall back on the original source of the data for a "slow" copy if the former is not found.  For example:
# this code works whether the store is enabled or not
my $bar = $store{foo};
if (! $bar) {
    # not found in store;  fetch from database instead
    $bar = $db->get_query("ReallySlowQuery");
    $store{foo} = $bar;
}
If there is a case where you do need to know whether the persistent store is enabled, you can use this:
(tied %store)->is_persistent();  # returns TRUE if store is persistent
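
The check-then-fall-back pattern shown above can be wrapped in a small helper so that it is written only once.  The following is a minimal sketch under stated assumptions: fetch_cached() is a hypothetical helper, not part of ExSite, and a plain hash stands in for %store (which is how %store behaves when the store is disabled).

```perl
use strict;
use warnings;

# Stand-in for ExSite's tied %store; a plain hash is enough for this sketch.
my %store;

# Hypothetical helper: return the stored copy of $key if there is one,
# otherwise call $loader, remember its result, and return that.
sub fetch_cached {
    my ($key, $loader) = @_;
    my $value = $store{$key};
    if (!defined $value) {
        $value = $loader->();      # the "slow" path
        $store{$key} = $value;     # prime the store for the next lookup
    }
    return $value;
}

my $slow_calls = 0;
my $loader = sub { $slow_calls++; return { title => "Home", hits => 0 } };

my $first  = fetch_cached("page:home", $loader);   # runs the loader
my $second = fetch_cached("page:home", $loader);   # served from %store
print "loader ran $slow_calls time(s)\n";
```

Unlike the earlier example, this helper tests whether the value is defined rather than whether it is true, so legitimately false values such as 0 or an empty string are served from the store instead of being reloaded every time.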

Managing expiry times in the store

If you want to control when your item expires from the store, you have two options at your disposal.  First, when you save the item in the store, you can explicitly specify the expiry time (as a Unix timestamp, eg. from the time() function):
(tied %store)->put($key,$value,$expiry);
Alternatively, if you want to change the expiry time of an item already in the store, you can modify the expiry time alone:
(tied %store)->renew($key,$expiry);
In either of these cases, if you specify an expiry time of zero, that will be understood to mean that the item should never be expired.
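
Because the expiry is an absolute Unix timestamp rather than a duration, it is normally computed by adding a number of seconds to time().  The sketch below shows that arithmetic against a small in-memory stand-in.  The MockStore class and its is_live() method are hypothetical and exist only to make the example runnable.  Only the put() and renew() call shapes, the one-hour default, and the meaning of zero are taken from the text above.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the object behind (tied %store).
# The internals are illustrative and are not ExSite's implementation.
package MockStore;

sub new {
    my ($class) = @_;
    my $self = { data => {}, expiry => {} };
    return bless $self, $class;
}

sub put {
    my ($self, $key, $value, $expiry) = @_;
    # No expiry given: fall back to one hour, the default mentioned above.
    $expiry = time() + 3600 unless defined $expiry;
    $self->{data}{$key}   = $value;
    $self->{expiry}{$key} = $expiry;     # Unix timestamp; 0 = never expire
    return;
}

sub renew {
    my ($self, $key, $expiry) = @_;
    return 0 unless exists $self->{data}{$key};
    $self->{expiry}{$key} = $expiry;
    return 1;
}

# Helper for this sketch only: is $key still valid at time $now?
sub is_live {
    my ($self, $key, $now) = @_;
    return 0 unless exists $self->{data}{$key};
    my $expiry = $self->{expiry}{$key};
    return ($expiry == 0 || $expiry > $now) ? 1 : 0;
}

package main;

my $st  = MockStore->new;
my $now = time();

$st->put("report", [1, 2, 3], $now + 15 * 60);   # expire in 15 minutes
$st->put("motd", "Welcome", 0);                  # zero: never expire

my $live_before_renew = $st->is_live("report", $now + 3600);   # already expired
$st->renew("report", $now + 24 * 60 * 60);                     # extend to one day
my $live_after_renew  = $st->is_live("report", $now + 3600);   # now still valid
my $motd_live         = $st->is_live("motd", $now + 10 * 365 * 24 * 3600);
```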

Store Implementation

The store is implemented at a low level using a DBM database, and GDBM_File in particular.  Data is encoded in the store using the Storable package for linearizing Perl data structures.  See ExSite::Store.pm for the ExSite implementation built on top of these technologies, which manages expiry times, locking, garbage collection, etc.
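
The role of Storable is easier to see with a small example.  A DBM file can hold only strings, so a Perl data structure has to be linearized before it is written and rebuilt when it is read back.  The sketch below uses a plain hash in place of a GDBM_File tie so that it runs anywhere.  The record layout (an expiry timestamp stored alongside the value) and the put_item() and get_item() helpers are assumptions made for illustration, not a description of ExSite's actual storage format.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my %dbm;   # stand-in for a hash tied to a DBM file, which holds only strings

# Assumed record layout for this sketch: [ expiry timestamp, value ].
sub put_item {
    my ($key, $value, $expiry) = @_;
    $dbm{$key} = freeze([ $expiry, $value ]);   # linearize to a byte string
    return;
}

sub get_item {
    my ($key, $now) = @_;
    return undef unless exists $dbm{$key};
    my ($expiry, $value) = @{ thaw($dbm{$key}) };
    if ($expiry && $expiry <= $now) {
        delete $dbm{$key};    # expired: drop it, as a garbage collector would
        return undef;
    }
    return $value;
}

my $now = time();
put_item("menu", { items => [ "Home", "About" ] }, $now + 3600);

my $fresh = get_item("menu", $now);           # structure comes back intact
my $stale = get_item("menu", $now + 7200);    # past expiry: undef
```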

Filed under: IT, programming