Software Globalization: Internationalization and Localization

An organization's response to Globalization needs to be pervasive: multilingual call centers, branch offices, brochures, signage, product packaging, and software in several languages.

Releasing software in several languages simultaneously is a common requirement today, so the tools used to construct software need to assume an international audience.

The process for building internationalized software and localized products depends on whether the project starts from the assumption of an American user base, as was common before lower-cost computers and the internet.


Internationalization

Source code with embedded language strings is "internationalized" by extracting the "hard coded" text strings and replacing them with references to a resource file external to the source code. This has the advantage of Unicode compliance. This separation of logic and data also makes it easier to distribute the work of translating strings into other languages.

Alternatively, text strings in source code can simply be replaced with text of another language. This carries less risk than changing source code logic.

The logic for translating specific words and phrases can reside in an organization's Translation Memory database.

Internationalization (I18N)

Internationalization is the process of designing applications so that they can be localized (adapted) to various languages and regions without engineering changes.

This word is commonly abbreviated to I18N by replacing the 18 characters between the first character (I) and the last character (n) of the word with the number 18.
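The abbreviation rule is simple enough to express in a few lines. The sketch below is only an illustration; the function name numeronym is made up for this example.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) < 4:
        return word  # too short to be worth abbreviating
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("Internationalization"))  # I18n
print(numeronym("Localization"))          # L10n
```

The same rule produces L10N for Localization, the abbreviation used later on this page.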

Internationalized References to Resources

Instead of hard coding application text intermingled with logic code, internationalized code obtains translated strings and objects using a key that appears in every localized resource file. Each resource file holds the text that goes with each key. Applications obtain text in a different language simply by looking in a different resource file.
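A minimal sketch of that key lookup follows, assuming two tiny resource tables held in memory instead of external files. The names RESOURCES and get_text, and the fallback to English, are assumptions made for this illustration, not part of any particular framework.

```python
# Hypothetical resource "files", shown inline as dictionaries for brevity.
# In a real application each table would live in its own external file.
RESOURCES = {
    "en": {"greeting": "Hello", "farewell": "Goodbye"},
    "it": {"greeting": "Ciao", "farewell": "Arrivederci"},
}


def get_text(key: str, language: str, default_language: str = "en") -> str:
    """Return the text for key in the requested language, else in the default."""
    table = RESOURCES.get(language, RESOURCES[default_language])
    return table.get(key, RESOURCES[default_language][key])


print(get_text("greeting", "en"))
print(get_text("greeting", "it"))
```

The calling code never changes when a language is added; only a new table (or file) with the same keys is needed.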

A Java program obtains its localized resources through the ResourceBundle class, either from external properties files containing key-value pairs or from a compiled ResourceBundle subclass loaded into memory.

With Microsoft .NET Framework Visual Basic and C# applications, each culture references a separate Satellite Assembly: .resources files compiled from .resx files and assembled into DLLs signed with strong names.
The actual string or object that gets loaded depends on the Thread.CurrentThread.CurrentUICulture property of each thread.
So, unlike Java, .NET applications need to start again after every switch in culture. Bummer, I know.
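[STAThreadAttribute]
static void Main()
{
    // Set the UI culture before any resources are loaded.
    // "fr-FR" is only an example culture name.
    System.Threading.Thread.CurrentThread.CurrentUICulture =
        new System.Globalization.CultureInfo("fr-FR");
}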

For VC++ in Microsoft Visual Studio 6.0, add the preprocessor directives #ifdef and #ifndef so that different resource files are used when building executables for different languages.

Mozilla source code stores cross-platform localizable strings in the file include/allxpstr.h. Message IDs within this file are organized into sections. To obtain a message from this file, #include "xpgetstr.h", which provides the function XP_GetString(MESSAGE_ID) that returns a pointer to a global buffer containing the human readable string. Use XP_GetStringForHTML(MESSAGE_ID, CharSetID, EnglishMessage) for HTML text.

For JavaScript, Guido Wesdorp (aka 'Johnny deBris') put up a tar library in 2005.
How to Internationalize presents a good overview.
Nicholas C. Zakas, Author, Professional JavaScript for Web Developers (ISBN 0764579088)

Extractions

Modifications to programming source code to enable localization include:

Look up and use the locale selections defined for the user's operating system.
Convert hard-coded programming constructs to library-based functions that are sensitive to specific locales.
- Date/time formats (separators, order of day/month/year, names of weekdays and months)
- Number formats (decimal and thousand separators)
- Currency formats (symbol and format)
- Sort order and string comparison - collate (sort) text based on a locale-specific sort key.
- Use string comparison functions that work with Unicode double-byte (16-bit rather than 8-bit) characters.
Replace hard-coded text with message keys used to retrieve actual text from a hierarchy of Resource Bundles -- files that encapsulate locale-sensitive data in a portable, independent way.
Store program configuration and control values (such as "Yes" or "TRUE") in databases where they can be modified by users or administrators.
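As a rough illustration of replacing hard-coded formats with locale-sensitive functions, the sketch below formats a number and a date from a small hand-written table of conventions. The table CONVENTIONS and the function names are hypothetical; a real application would take these conventions from the operating system or a locale library instead of maintaining its own table.

```python
from datetime import date

# Illustrative conventions only, covering just two locales.
CONVENTIONS = {
    "en_US": {"decimal": ".", "thousands": ",", "date": "{m:02d}/{d:02d}/{y}"},
    "de_DE": {"decimal": ",", "thousands": ".", "date": "{d:02d}.{m:02d}.{y}"},
}


def format_number(value: float, locale_name: str) -> str:
    """Format value with two decimals using the locale's separators."""
    conv = CONVENTIONS[locale_name]
    whole, frac = f"{value:,.2f}".split(".")
    return whole.replace(",", conv["thousands"]) + conv["decimal"] + frac


def format_date(day: date, locale_name: str) -> str:
    """Format a date in the locale's day/month/year order."""
    pattern = CONVENTIONS[locale_name]["date"]
    return pattern.format(d=day.day, m=day.month, y=day.year)


print(format_number(1234567.891, "en_US"))
print(format_number(1234567.891, "de_DE"))
print(format_date(date(2024, 3, 5), "de_DE"))
```

The point of the design is that the calling code passes only a locale name; separators and field order never appear in application logic.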

Localization (L10N)

Localization is the process of translating text into local languages for specific locales, then testing the product for each locale implemented.

Localization may include creating graphics files containing localized images (such as flags) as well as the translation of text in message resource properties files.

For application code that has not been internationalized, the localization process may also include adapting or adding software components for a specific locale. There are software programs that look for application text strings in programs.

An important part of localization is testing, to detect when localized text (such as long words in German) does not fit on a screen designed for shorter English words.
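Part of that check can be automated before anyone looks at a screen. The sketch below flags translations that are much longer than their source text. The function name find_overflows and the 1.3 growth threshold are assumptions chosen for illustration, and character counts are only a rough stand-in for the width of rendered text.

```python
def find_overflows(source: dict, translated: dict, max_growth: float = 1.3) -> list:
    """Return keys whose translation is longer than the source text allows."""
    overflows = []
    for key, original in source.items():
        text = translated.get(key, "")
        if len(text) > len(original) * max_growth:
            overflows.append(key)
    return overflows


english = {"ok": "OK", "save": "Save", "settings": "Settings"}
german = {"ok": "OK", "save": "Speichern", "settings": "Einstellungen"}

print(find_overflows(english, german))
```

Strings flagged this way still need to be checked on the actual screen, since fonts and layout determine whether the text really fits.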

Localization Associations

American Translators Association (ATA)
Society for Technical Communication (STC)
Localization Industry Standards Association
Society for the Industry of Multilingual Communication and Translation (SIMCAT)
Utah Information Technology Association (UITA)
PAL and TILP join forces to address needs of individual professionals
GALA and ALC represent companies
W3C Internationalization Work Group
XLIFF Web Site
Localisation Consideration in DTD design
XML Internationalization FAQ
Directory of Commercial Translation Resources

The Translation Process

Translation Memory

A translation memory (TM) is a database of translation assets, usually spanning several projects of an organization. TMs are created using CAT (Computer Aided Translation) software and localization software. The best known offerings include enterprise-priced Trados and the less expensive Wordfast.

Trados set up the translationzone.com portal.

Early translation software vendors each developed their own proprietary file formats. But most translation software now supports the TMX (Translation Memory eXchange) industry-wide open standard developed at lisa.org/tmx, a project of the OSCAR (Open Standards for Container/Content Allowing Re-use) Special Interest Group of LISA (Localization Industry Standards Association).

TMX is an XML-based standard. This means that an easy way to check whether a TMX file is well-formed is to rename the file with a ".xml" extension and open it in an Internet browser, which displays XML in tree view. The browser will point out where the file is not well-formed.
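The same check can be scripted with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the function name check_well_formed is made up for this example. Like the browser, it only confirms that the XML parses, not that the file follows the TMX specification.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str):
    """Return None if the text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return str(error)  # includes the line and column of the problem
    return None


print(check_well_formed("<tmx><body></body></tmx>"))
print(check_well_formed("<tmx><body></tmx>"))
```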

TuMatXa is a web server application (that runs in Linux Zope) for managing a TM repository.

translate.google.com enables the simple sharing of TMX files on the internet while improving on their automated translation.

The TMX XML format consists of a header and a body. Here is an example of the header, where <tmx> takes the place of the more familiar <html>:

<?xml version="1.0" ?>
<tmx version="1.4">
<header
	creationtool="XYZTool"
	creationtoolversion="1.01-023"
	datatype="PlainText"
	segtype="sentence"
	adminlang="en-us"
	srclang="EN"
	o-tmf="ABCTransMem">
</header>

Within the <body> element are <tu> (translation unit) tags, each identified by a tuid attribute. Translated text is between <seg> and </seg> (segment) tags within a <tuv> (translation unit variant) tag with an attribute such as xml:lang="en" identifying its language.

<body>
	<tu tuid="hello" datatype="plaintext">
		<tuv xml:lang="en">
			<seg>hello</seg>
		</tuv>
		<tuv xml:lang="it">
			<seg>ciao</seg>
		</tuv>
	</tu>
	<tu tuid="world" datatype="plaintext">
		<tuv xml:lang="en">
			<seg>world</seg>
		</tuv>
		<tuv xml:lang="it">
			<seg>mondo</seg>
		</tuv>
	</tu>
</body>
</tmx>
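To show how this structure maps onto a lookup table, here is a minimal sketch that reads a TMX document with Python's standard xml.etree.ElementTree module. The function name load_tmx and the shortened SAMPLE document are assumptions for illustration; a production reader would also need to handle inline markup inside <seg> and very large files.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def load_tmx(tmx_text: str) -> dict:
    """Map each tuid to a {language: segment text} dictionary."""
    root = ET.fromstring(tmx_text)
    memory = {}
    for tu in root.iter("tu"):
        memory[tu.get("tuid")] = {
            tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")
        }
    return memory


SAMPLE = """<tmx version="1.4">
<header creationtool="XYZTool" creationtoolversion="1.01-023"
        datatype="PlainText" segtype="sentence" adminlang="en-us"
        srclang="EN" o-tmf="ABCTransMem"></header>
<body>
  <tu tuid="hello">
    <tuv xml:lang="en"><seg>hello</seg></tuv>
    <tuv xml:lang="it"><seg>ciao</seg></tuv>
  </tu>
</body>
</tmx>"""

memory = load_tmx(SAMPLE)
print(memory["hello"]["it"])  # ciao
```

Keying the table by tuid and then by language mirrors the way a resource file is keyed, so the same message key can serve every language in the memory.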

Nicola Asuni implemented Masaki Itagaki's initial proposal with code on SourceForge which extends the Java ResourceBundle class so that it directly reads (usually large) TMX XML text files.

Microsoft Terminology Translations (.csv in zip file) of over 9,000 terms

Joom!Fish, a component of the Joomla! CMS, from JoomlaCode.org is used to manage the translation process and the dynamic content translated within a site.

Blogs on Internationalization

Communities

Proz.com

Lantra-L mailing list for translators.

OpenTag