


Multilingual Content Management

posted on 2:41 AM, July 26, 2007
ExSite provides a number of tools and facilities for managing multilingual web sites.

Character Sets

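Which languages can be represented natively in your web site templates will depend on which character set your site uses. Character sets are typically set using the HTTP Content-Type header, but it is much easier to use the equivalent meta-tags in your templates. Typical choices are:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
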
The former (ISO-8859-1) supports Western European languages, and is the default on many older browsers and web sites. The latter (UTF-8) supports a wide range of European, non-Latin, and Asian languages, and is the default on more modern browsers and web sites. The two encode basic Latin characters (eg. unaccented US English text) identically, but they encode accented characters differently, and ISO-8859-1 cannot represent non-Latin scripts at all.
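To see where the two character sets agree and where they diverge, you can encode the same text both ways. The sketch below uses only Python's built-in codecs and is purely illustrative; it is not part of ExSite, and the function name `encode_both` is hypothetical.

```python
def encode_both(text: str):
    """Return (UTF-8 bytes, ISO-8859-1 bytes) for text.

    The ISO-8859-1 value is None when the text contains a character
    that ISO-8859-1 cannot represent at all.
    """
    utf8 = text.encode("utf-8")
    try:
        latin1 = text.encode("iso-8859-1")
    except UnicodeEncodeError:
        latin1 = None
    return utf8, latin1

for sample in ["Hello", "é", "中"]:
    print(repr(sample), encode_both(sample))
# 'Hello' (b'Hello', b'Hello')           identical bytes
# 'é' (b'\xc3\xa9', b'\xe9')             same character, different bytes
# '中' (b'\xe4\xb8\xad', None)           not representable in ISO-8859-1
```

Plain ASCII text produces the same bytes in both encodings, which is why simple English pages often appear to work no matter which charset is declared.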

If unsure which character set to use, we recommend UTF-8, which has the widest language support. As of ExSite 3.5, UTF-8 is the default character set for all ExSite administration screens and the HTML editor. This can, however, be changed in the system configuration file (system configuration parameter "charset"). If your web sites are predominantly in one character set, you may find it most reliable to have the ExSite admin screens and editor use that character set as well, eg.

# exsite.conf
charset = ISO-8859-1

Note that with some browsers, you can still enter unsupported characters into the editor (eg. pasting Chinese characters into an editor that is configured to use the ISO-8859-1 character set). The editor will convert the characters to HTML Unicode escape sequences so that they render correctly. This is sufficient to make the characters visible to a human reader (assuming their browser supports those escape codes). However, since the characters are escape sequences and not native characters, there may be problems with machine readability (which will impact accessibility and search engine compatibility).
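ExSite's editor performs this conversion itself. As an analogy only (this is not ExSite's code), Python's standard `xmlcharrefreplace` error handler does the same kind of thing: any character the target character set cannot represent is replaced with a numeric character reference that a browser can render. The helper name `to_charset` is hypothetical.

```python
import html

def to_charset(text: str, charset: str = "iso-8859-1") -> str:
    """Re-express text in `charset`, replacing unsupported characters
    with numeric character references such as &#20013;."""
    encoded = text.encode(charset, errors="xmlcharrefreplace")
    return encoded.decode(charset)

escaped = to_charset("Café 中文")
print(escaped)                 # Café &#20013;&#25991;
print(html.unescape(escaped))  # Café 中文
```

The "é" survives as a native character because ISO-8859-1 supports it; only the Chinese characters become escape sequences, which is why they render for a human reader but are no longer native text for search engines or screen readers.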

The reverse usually causes problems. If the system character set is UTF-8, it is easy to enter native international characters into the editor, but if that content is then inserted into a template that declares a more limited character set such as ISO-8859-1, it will display as gibberish.
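The gibberish comes from the browser reading each UTF-8 byte as a separate ISO-8859-1 character. A minimal Python simulation (the function name `misread` is hypothetical, and this only imitates what a browser does):

```python
def misread(text: str) -> str:
    """Simulate UTF-8 content shown on a page declared as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")

print(misread("Français"))  # FranÃ§ais
print(misread("Hello"))     # Hello (plain ASCII is unaffected)
```

The two-byte UTF-8 sequence for "ç" is displayed as the two Latin-1 characters "Ã§", which is the typical symptom of a charset mismatch between content and template.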

How to Enter Other Character Sets

Native speakers of languages with alternate character sets may be able to type the characters directly, if they have an appropriate keyboard.

The ExSite editor has a tool for entering accented characters, which will suffice for typing the following accented Latin and Western European characters:

À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß
à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ

Note that the accent tool inserts HTML escape codes for these characters, not the native character, so it should not matter which character set you are using. However, it may have implications for accessibility and search engines, which may not understand the escape codes. It is most useful for entering occasional foreign words, rather than for full support of a native language.
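As a rough illustration of what "inserting escape codes" means (not a description of how ExSite's accent tool is implemented), Python's `html.entities.codepoint2name` table maps each of the characters above to a named HTML entity, and the escaped text decodes back to the same characters. The function name `escape_accents` is hypothetical.

```python
import html
from html.entities import codepoint2name

def escape_accents(text: str) -> str:
    """Replace non-ASCII characters with named HTML entities where one
    exists, otherwise with a numeric character reference.
    ASCII passes through unchanged (this sketch does not escape &, <, >)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)

escaped = escape_accents("crème brûlée")
print(escaped)                                   # cr&egrave;me br&ucirc;l&eacute;e
print(html.unescape(escaped) == "crème brûlée")  # True
```

Because the escaped form is pure ASCII, it displays the same under either character set, which is the property the accent tool relies on.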

Otherwise, the only reliable way to enter alternate-language text with a keyboard that does not support that language is to cut and paste it from another source.

Multilingual Web Sites

We present here four strategies for maintaining a multilingual web site. Which one you choose will depend on the quantity of content, and how tightly your multilingual content is coupled -- in other words, does language B have to mirror language A as closely as possible, or do you simply need to present some basic text in an alternate language? How you work with translators and how proficient your content editors are with the alternate languages may also be of importance.

Same-page Mirroring

This approach involves repeating the same content in all supported languages on the same page. For instance, write out the English text, followed by the French text, with some kind of separator. The languages can be separated horizontally (side by side, using a table) or vertically (one above the other), depending on whether you want to give the languages the same or different precedence. You can also distinguish one language from another using font styles (or size, or colour), and intermix them. Another technique is to load some flag images into a library, and mark the beginning of each language section with an appropriate flag.


English text...


Texte Français...

The advantages of this method are that it is simple to manage, and easy for non-experts to understand. When new pages are created, even casual system users will know how to add the alternate languages to the page.

This method can be cumbersome if more than 2 or 3 languages need to be supported, as the number of languages on the page can make it difficult to find information. Another problem is that it only works in the page body; menu labels, page titles, and descriptions are only in one language.

Special Multilingual Pages

This method simply creates some special pages on the web site that are in the alternate language(s). For instance, if the main site language is English, but you want to welcome French visitors, you might create a special French page.


English text...



Texte Français...

One advantage of this method is that it isolates the alternate languages into selected pages so that normal content editors never need to touch them. This is useful if the normal content editors are not fluent in the alternate language, so there is less risk of them accidentally changing or breaking it. It is also useful if the alternate language pages are basic welcome/about pages that do not require regular maintenance. It also allows you to support as many alternate languages as are necessary. The alternate languages can show up in the menus, and their page titles and descriptions are in the alternate language. It also opens up the possibility of creating alternate templates for the other languages.

This method can quickly become cumbersome if the alternate language pages require regular maintenance, or have to mirror content in the main part of the site, because it means that you have two or more versions of certain pages, and the system is unaware of their relationship.

Alternate Language Subsections

This method isolates each language into its own section. Essentially each language has its own web site. The site can be given an appropriate URL such as http://mysite.com/japan or http://japan.mysite.com.

Each subsection can create as many pages as it requires. Because menus reflect the local section's pages, the site will appear to be exclusively dedicated to one language, avoiding the impression that the alternate language is just an afterthought.

It is very easy to set up different sets of administrators for the subsections, so content management can be divided up more appropriately among speakers of each language. Language-specific templates can also be isolated in these subsections.

This method gives complete freedom to the alternate language sections to structure themselves as they see fit. There is no requirement that the subsections mirror the main section in any way. This could be an advantage or a disadvantage, depending on the site.

Page Mirroring

The most advanced method of handling multilingual content is full page mirroring. This allows you to create pages which are alternate versions of regular pages. The alternate versions "inherit" all of their content from the regular versions, except where the alternate page provides replacement content.

Typically you would create a web site in a default language, such as English. Then you would go through the site, and create alternate versions of the pages in a second language, such as French. For instance, you would create a home page index.html, and a French version of the home page, say index_fr.html. The French version page should be configured (use the "Configure page" option in the website manager) with the following settings:

Type: alternate
Version: Français
Parent: index.html

This says that this page is an alternate version (the "Français" version, to be specific) of index.html. This modifies ExSite's content search algorithm slightly: the French version of the page will be searched first for missing content, then the parent (English) page, then the templates, etc. This means we can provide French versions of any page element (body, sidebars, images, etc.) and they will be used preferentially, but if no such French page element is found, the system will automatically fall back on the default (English) element, ensuring that a complete page is generated even if complete translations do not exist.
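The fallback search described above can be sketched as a walk up a chain of pages. This is a simplified model under stated assumptions: the `Page` class and `find_content` function are hypothetical stand-ins rather than ExSite's real API, and templates are collapsed into a single dictionary consulted last.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Page:
    """Hypothetical stand-in for an ExSite page (not ExSite's real API)."""
    name: str
    content: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Page"] = None  # set on alternate versions
    version: str = ""                # "" for the default version

def find_content(page: Page, element: str, templates: Dict[str, str]) -> Optional[str]:
    """Look for `element` on the page itself, then on its parent page(s),
    then in the templates; return None if nothing supplies it."""
    current: Optional[Page] = page
    while current is not None:
        if element in current.content:
            return current.content[element]
        current = current.parent
    return templates.get(element)

home = Page("index.html", {"body": "English body", "sidebar": "English sidebar"})
home_fr = Page("index_fr.html", {"body": "Texte Français"}, parent=home, version="Français")
templates = {"footer": "Template footer"}

print(find_content(home_fr, "body", templates))     # Texte Français
print(find_content(home_fr, "sidebar", templates))  # English sidebar
print(find_content(home_fr, "footer", templates))   # Template footer
```

The French page only has to store the elements that have actually been translated; everything else resolves to the English parent or the templates, so a complete page is always produced.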

Because the French page is defined as a separate page, it can also define its own templates, menu labels, title, and meta-data. The file name of the alternate language page can follow any convention you like, such as appending the 2-letter language code to the English version of the file name (eg. contact_us.html → contact_us_fr.html), or translating the filename into French (eg. contact_us.html → nous_contacter.html). It is a page like any other, so the only restriction is that there can be no other pages in the section with that name.
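If you adopt the language-code convention, generating the alternate file name and checking it against the section's existing names is simple. A minimal sketch, assuming a section is just a set of page names; `alternate_filename` and `add_page` are hypothetical helpers, not ExSite functions.

```python
from pathlib import PurePosixPath
from typing import Set

def alternate_filename(filename: str, lang_code: str) -> str:
    """Insert a language code before the extension,
    e.g. contact_us.html -> contact_us_fr.html."""
    path = PurePosixPath(filename)
    return f"{path.stem}_{lang_code}{path.suffix}"

def add_page(name: str, section_pages: Set[str]) -> None:
    """Register a page name, refusing duplicates within one section."""
    if name in section_pages:
        raise ValueError(f"a page named {name!r} already exists in this section")
    section_pages.add(name)

section = {"index.html", "contact_us.html"}
add_page(alternate_filename("contact_us.html", "fr"), section)
print(sorted(section))  # ['contact_us.html', 'contact_us_fr.html', 'index.html']
```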

Because the system ties the alternate language page directly to its master page, the system can automatically switch languages and build appropriate menus for the viewer. For instance, if the viewer visits a French version page, the system understands that it is now in French "mode", and will automatically populate the menus with French versions of any other pages that are available. If the web site does not have French versions of every page, then the system will fall back on the default (typically English) version for those missing pages, ensuring that something appropriate is presented if no translations are available. (Note that if the viewer visits an English page because no French version is available, the system will switch back to English "mode", and default to English menus.)
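The mode switching and menu fallback can be modelled in a few lines. This sketch assumes a hypothetical site map from each default page to its available versions; `SITE`, `current_mode`, and `build_menu` are illustrative names, not ExSite internals.

```python
from typing import Dict, List

# Hypothetical site map: default page -> {version name: alternate page}.
SITE: Dict[str, Dict[str, str]] = {
    "index.html": {"Français": "index_fr.html"},
    "about.html": {"Français": "a_propos.html"},
    "contact_us.html": {},  # no French version yet
}

def current_mode(page: str) -> str:
    """Return the version the viewed page belongs to ("" = default)."""
    for default, versions in SITE.items():
        if page == default:
            return ""
        for version, alternate in versions.items():
            if page == alternate:
                return version
    raise KeyError(page)

def build_menu(viewing: str) -> List[str]:
    """List menu links in the viewer's current mode, falling back to the
    default page wherever that version is missing. In default mode the
    lookup key "" never matches, so every entry is the default page."""
    mode = current_mode(viewing)
    return [versions.get(mode, default) for default, versions in SITE.items()]

print(build_menu("index_fr.html"))    # ['index_fr.html', 'a_propos.html', 'contact_us.html']
print(build_menu("contact_us.html"))  # ['index.html', 'about.html', 'contact_us.html']
```

The second call shows the behaviour noted above: following the menu to the untranslated contact page puts the viewer back in the default (English) mode.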

Any number of alternate versions can be supported, and all will be treated in the same fashion. The version string must be identical on all pages of a certain version, since this is how ExSite matches versions together. Since it is possible to use this Version string as a menu item/version switcher, it is recommended that the Version string be the native name of the language, eg. use "Français" instead of "French". The default version of a page should have no version name.

ExSite comes with a standard "VersionMenu" plug-in, which generates a language switcher menu for any given page. If your website supports three languages, say, English, French, and Spanish, and you are currently on a page that has all three versions available, then the VersionMenu plug-in will spit out a switcher menu that looks like this:

English | Français | Español

(Assuming you are currently viewing the English version.) Using this menu you can quickly switch modes back and forth between the available languages. If a version is missing, then it will not be shown in the menu.
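The switcher's behaviour can be approximated as follows. This is not the VersionMenu plug-in's code; it is a sketch assuming a hypothetical site-wide ordering of version names, with "" standing for the unnamed default version and a hypothetical `default_label` used to display it.

```python
from typing import Dict, List

def version_menu(available: Dict[str, str], order: List[str],
                 default_label: str = "English") -> str:
    """Build a plain-text language switcher for one page.

    available: version name ("" for the default) -> URL of that version.
    order: the site's version names in display order.
    Versions missing from `available` are simply left out, and names
    must match exactly, since that is how versions are paired up.
    """
    labels = [version or default_label for version in order if version in available]
    return " | ".join(labels)

order = ["", "Français", "Español"]
print(version_menu({"": "index.html", "Français": "index_fr.html",
                    "Español": "index_es.html"}, order))
# English | Français | Español
print(version_menu({"": "about.html", "Français": "a_propos.html"}, order))
# English | Français
```

Because matching is by exact string, a page configured with the version "Francais" (no cedilla) would not be paired with pages using "Français", and would not appear in this menu.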
