JCEA Part 1: Internationalisation

 

 
 

Syllabus

  • State three aspects of any application that might need to be varied or customized in different deployment locales
  • List three features of the Java programming language that can be used to create an internationalizable/localizable application
 

Information

Background

An internationalized program has the following characteristics:

  • With the addition of localization data, the same executable can run worldwide.
  • Textual elements, such as status messages and the GUI component labels, are not hard-coded in the program. Instead they are stored outside the source code and retrieved dynamically.
  • Support for new languages does not require recompilation.
  • Culturally-dependent data, such as dates and currencies, appear in formats that conform to the end user's region and language.
  • It can be localized quickly.

Localization is the process of adapting a program for use in a specific locale. A locale is a geographic or political region that shares the same language and customs. Localization includes the translation of text such as GUI labels, error messages, and online help. It also includes the culture-specific formatting of data items such as monetary values, times, dates, and numbers.

There are many types of data that vary with region or language. Examples of this data are:

  • Messages
  • Labels on GUI components
  • Online help
  • Sounds
  • Colors
  • Graphics
  • Icons
  • Dates
  • Times
  • Numbers
  • Currencies
  • Measurements
  • Phone numbers
  • Honorifics and personal titles
  • Postal addresses
  • Page layouts
  • Legal rules, e.g. tax calculations

Java Technologies to Support Internationalization

Properties

The java.util.Properties class represents a persistent set of properties. A Properties object can be saved to a stream or loaded from a stream. Each key and its corresponding value in the property list is a String. A properties file stores information about the characteristics of a program or environment, including internationalization/localization information.

By creating a Properties object and using the load method a program can read a properties file. The program can then access the values by using the key as follows:

// Requires java.io.* and java.util.Properties
Properties props = new Properties();
props.load(new BufferedInputStream(new FileInputStream("filename")));
String value = props.getProperty("key");

Alternatively, system properties can be specified on the command line at application startup and read with System.getProperty, e.g.

java -Dmy.property=value MyApplication

If the key is not found getProperty returns null.
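
As a minimal sketch of the load-and-lookup steps above, the following reads properties from an in-memory string instead of a file so that it is self-contained. The class name PropertiesDemo, the helper loadFrom, and the keys are hypothetical, and Properties.load(Reader) assumes Java 6 or later.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertiesDemo {
    // Loads key=value pairs from text laid out like a .properties file.
    static Properties loadFrom(String text) {
        Properties props = new Properties();
        try (Reader in = new StringReader(text)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = loadFrom("greeting=Hello\nfarewell=Goodbye\n");
        System.out.println(props.getProperty("greeting"));       // Hello
        System.out.println(props.getProperty("missing"));        // null
        System.out.println(props.getProperty("missing", "n/a")); // n/a (default value)
        // A value passed with -Dmy.property=value is a system property:
        System.out.println(System.getProperty("my.property"));
    }
}
```

The two-argument getProperty avoids a null check when a sensible default exists.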

Locale

A java.util.Locale object represents a specific geographical, political, or cultural region. An operation that requires a Locale to perform its task is called locale-sensitive and uses the Locale to tailor information for the user. For example, displaying a number is a locale-sensitive operation--the number should be formatted according to the customs/conventions of the user's native country, region, or culture.

Because a Locale object is just an identifier for a region, no validity check is performed when you construct a Locale. If you want to see whether particular resources are available for the Locale you construct, you must query those resources. For example, ask the NumberFormat for the locales it supports using its getAvailableLocales method.

When you ask for a resource for a particular locale, you get back the best available match, not necessarily precisely what you asked for. For more information, refer to the ResourceBundle section. Locales are represented in two, or possibly three, parts:

  1. Language code (e.g. en)
  2. Country code (e.g. US or GB)
  3. [optional] variant name to allow the possibility of more than one locale per country/language combination

Locales are defined using the language and country code separated by an underscore, e.g.

en_GB

The Locale class provides a number of convenient constants that you can use to create Locale objects for commonly used locales. For example, the following creates a Locale object for the United States:

Locale.US

Once you've created a Locale you can query it for information about itself. Use getCountry to get the ISO Country Code and getLanguage to get the ISO Language Code. You can use getDisplayCountry to get the name of the country suitable for displaying to the user. Similarly, you can use getDisplayLanguage to get the name of the language suitable for displaying to the user. Interestingly, the getDisplayXXX methods are themselves locale-sensitive and have two versions: one that uses the default locale and one that uses the locale specified in the argument.
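
The query methods described above can be sketched as follows. The class LocaleDemo and its describe helper are hypothetical names chosen for illustration; Locale.forLanguageTag assumes Java 7 or later.

```java
import java.util.Locale;

class LocaleDemo {
    // Builds a description using the ISO codes plus display names
    // rendered in the given display locale.
    static String describe(Locale locale, Locale displayLocale) {
        return locale.getLanguage() + "_" + locale.getCountry() + " = "
                + locale.getDisplayLanguage(displayLocale)
                + " (" + locale.getDisplayCountry(displayLocale) + ")";
    }

    public static void main(String[] args) {
        Locale uk = Locale.UK;                           // predefined constant
        Locale germany = Locale.forLanguageTag("de-DE"); // built from a language tag

        System.out.println(uk);                                // en_GB
        System.out.println(describe(uk, Locale.ENGLISH));      // en_GB = English (United Kingdom)
        System.out.println(describe(germany, Locale.ENGLISH)); // de_DE = German (Germany)
        // The one-argument display methods are themselves locale-sensitive:
        System.out.println(germany.getDisplayCountry(Locale.FRENCH));
        // The no-argument versions use the default locale:
        System.out.println(germany.getDisplayCountry());
    }
}
```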

ResourceBundle

Resource bundles contain locale-specific objects. When your program needs a locale-specific resource, a String for example, your program can load it from the resource bundle that is appropriate for the current user's locale. In this way, you can write program code that is largely independent of the user's locale, isolating most, if not all, of the locale-specific information in resource bundles.

This allows you to write programs that can:

  • be easily localized, or translated, into different languages
  • handle multiple locales at once
  • be easily modified later to support even more locales

One resource bundle is, conceptually, a set of related classes that inherit from ResourceBundle. Each related subclass of ResourceBundle has the same base name plus an additional component that identifies its locale. For example, suppose your resource bundle is named MyResources. The first class you are likely to write is the default resource bundle which simply has the same name as its family--MyResources. You can also provide as many related locale-specific classes as you need: for example, perhaps you would provide a German one named MyResources_de.

Each related subclass of ResourceBundle contains the same items, but the items have been translated for the locale represented by that ResourceBundle subclass.

The resource bundle lookup searches for classes whose name is assembled from the following elements, concatenated together and separated by underscores:

  1. baseclass
  2. desired language
  3. desired country
  4. desired variant

If the class, or a properties file of the same name with .properties appended, cannot be found, then the lookup discards each of the elements in turn (last to first) until it finds a match. This means that by providing a class called only baseclass (with no suffixes) a match will always be found.
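
The name assembly and the last-to-first discarding described above can be observed through ResourceBundle.Control (available from Java 6). This is a sketch only: the class BundleNameDemo and the base name myPackage.MyResources are illustrative, and the real getBundle lookup also tries the default locale before it falls back to the base bundle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

class BundleNameDemo {
    // Lists the bundle names tried for a locale, most specific first.
    static List<String> candidateNames(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        List<String> names = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            names.add(control.toBundleName(baseName, candidate));
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints [myPackage.MyResources_de_DE, myPackage.MyResources_de, myPackage.MyResources]
        System.out.println(candidateNames("myPackage.MyResources",
                Locale.forLanguageTag("de-DE")));
    }
}
```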

The baseclass must be fully qualified (for example, myPackage.MyResources, not just MyResources). It must also be accessible by your code; it cannot be a class that is private to the package where ResourceBundle.getBundle is called.

Keys are always Strings but values can be any subclass of Object. It is possible to have several different baseclasses for one application, potentially one for exceptions and one for labels.
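
To illustrate that keys are Strings while values may be any Object, here is a small sketch built on ListResourceBundle. The class BundleValuesDemo and the keys okLabel and maxRetries are hypothetical, and the bundle is created directly rather than through ResourceBundle.getBundle to keep the example self-contained.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

class BundleValuesDemo {
    // A bundle defined in code; each row is a {key, value} pair.
    static ResourceBundle sampleBundle() {
        return new ListResourceBundle() {
            @Override
            protected Object[][] getContents() {
                return new Object[][] {
                    {"okLabel", "OK"},                  // String value
                    {"maxRetries", Integer.valueOf(3)}, // non-String value
                };
            }
        };
    }

    public static void main(String[] args) {
        ResourceBundle bundle = sampleBundle();
        System.out.println(bundle.getString("okLabel"));    // OK
        System.out.println(bundle.getObject("maxRetries")); // 3
    }
}
```

In a real application each locale-specific variant would be a named class (or properties file) following the naming scheme above, so that getBundle can find it.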

Unicode

Unicode is an international effort to provide a single character set that everyone can use.

Java uses the Unicode 2.0 (or 2.1) character encoding standard, in which every character occupies two bytes. Ranges of character codes represent different writing systems or other special symbols. For example, Unicode characters in the range 0x0000 through 0x007F represent the basic Latin alphabet, and characters in the range 0x4E00 through 0x9FFF represent the Han characters used in China, Japan, Korea, Taiwan, and Vietnam.

UTF-8 is a multibyte encoding format, which stores some characters as one byte and others as two or three bytes. If most of your data is ASCII characters, it is more compact than two-byte Unicode, but in the worst case a UTF-8 string can be 50 percent larger than the corresponding Unicode string. Overall, it is fairly efficient.
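
The size trade-off can be checked with a short sketch that compares encoded lengths. The class Utf8SizeDemo is a hypothetical name, UTF-16BE stands in here for the two-bytes-per-character form, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

class Utf8SizeDemo {
    // Number of bytes needed to store the text in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes at two bytes per character (UTF-16, big-endian, no byte-order mark).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String[] samples = {"A", "\u00e9", "\u4e2d"}; // ASCII, Latin-1 accent, Han character
        for (String s : samples) {
            System.out.println(utf8Length(s) + " vs " + utf16Length(s));
        }
        // Prints: 1 vs 2, then 2 vs 2, then 3 vs 2
    }
}
```

The last case is the 50 percent worst case mentioned above: three bytes instead of two.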

Despite the advantages of Unicode, there are some drawbacks: Unicode support is limited on many platforms because of the lack of fonts capable of displaying all the Unicode characters.

java.text Package

This package provides classes and interfaces for handling text, dates, numbers, and messages in ways that are independent of natural languages. This allows programs to be written in a language-independent manner and relies on separate, dynamically linked localized resources.

All classes in the java.text package are Locale sensitive. By default they will use the default locale, but this can be overridden.
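
As a small sketch of overriding the default locale, the following formats the same number for two explicitly chosen locales. The class NumberFormatDemo and its helper are hypothetical names.

```java
import java.text.NumberFormat;
import java.util.Locale;

class NumberFormatDemo {
    // Formats a number using the conventions of an explicit locale.
    static String format(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 1234567.891;
        System.out.println(format(value, Locale.US));      // 1,234,567.891
        System.out.println(format(value, Locale.GERMANY)); // 1.234.567,891
        // Without an argument, the default locale is used:
        System.out.println(NumberFormat.getNumberInstance().format(value));
    }
}
```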

These classes are capable of formatting and parsing dates, numbers, and messages; searching and sorting strings; and iterating over characters, words, sentences, and line breaks. This package contains three main groups of classes and interfaces:

  1. Classes for iteration over text
  2. Classes for formatting and parsing
  3. Classes for string collation

Some of the classes in the java.text package, together with two related stream classes from java.io, are:

Annotation

An Annotation object is used as a wrapper for a text attribute value if the attribute has annotation characteristics. These characteristics are:

  1. The text range that the attribute is applied to is critical to the semantics of the range. That means, the attribute cannot be applied to subranges of the text range that it applies to, and, if two adjacent text ranges have the same value for this attribute, the attribute still cannot be applied to the combined range as a whole with this value.
  2. The attribute or its value usually no longer applies if the underlying text is changed.

CollationKey

A CollationKey represents a String under the rules of a specific Collator object. Comparing two CollationKeys returns the relative order of the Strings they represent. Using CollationKeys to compare Strings is generally faster than using Collator.compare. Thus, when the Strings must be compared multiple times, for example when sorting a list of Strings, it is more efficient to use CollationKeys.

You cannot create CollationKeys directly. Rather, generate them by calling Collator.getCollationKey. You can only compare CollationKeys generated from the same Collator object.

Generating a CollationKey for a String involves examining the entire String and converting it to a series of bits that can be compared bitwise. This allows fast comparisons once the keys are generated. The cost of generating keys is recouped in faster comparisons when Strings need to be compared many times. On the other hand, the result of a comparison is often determined by the first couple of characters of each String. Collator.compare examines only as many characters as it needs, which allows it to be faster when doing single comparisons.
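
A minimal sketch of sorting with CollationKeys, assuming the US locale and a hypothetical class CollationKeySort: each key is generated once and then compared as often as the sort requires.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class CollationKeySort {
    // Sorts words by locale rules, generating one key per word up front.
    static String[] sortWithKeys(String[] words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            keys[i] = collator.getCollationKey(words[i]);
        }
        Arrays.sort(keys); // CollationKey implements Comparable
        String[] sorted = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            sorted[i] = keys[i].getSourceString();
        }
        return sorted;
    }

    public static void main(String[] args) {
        String[] words = {"banana", "apple", "Cherry"};
        // Prints [apple, banana, Cherry]; a plain Arrays.sort on the
        // Strings would put "Cherry" first because 'C' precedes 'a' in Unicode.
        System.out.println(Arrays.toString(sortWithKeys(words, Locale.US)));
    }
}
```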

Collator

The Collator class performs locale-sensitive String comparison. You use this class to build searching and sorting routines for natural language text.

Collator is an abstract base class. Subclasses implement specific collation strategies. You can use the static factory method, getInstance, to obtain the appropriate Collator object for a given locale.
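
The factory method and a direct comparison might look like the sketch below. The class CollatorDemo is hypothetical, and the strength setting is shown only to illustrate that case differences can be ignored or respected.

```java
import java.text.Collator;
import java.util.Locale;

class CollatorDemo {
    // Compares two strings under US English rules at the given strength.
    static int compare(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // Locale-aware order: "apple" sorts before "Banana" (negative result).
        System.out.println(compare("apple", "Banana", Collator.TERTIARY) < 0); // true
        // Raw Unicode order disagrees, because 'B' (66) precedes 'a' (97).
        System.out.println("apple".compareTo("Banana") < 0);                   // false
        // At PRIMARY strength, case differences are ignored.
        System.out.println(compare("abc", "ABC", Collator.PRIMARY) == 0);      // true
    }
}
```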

Format

Format is an abstract base class for formatting locale-sensitive information such as dates, messages, and numbers.

Format defines the programming interface for formatting locale-sensitive objects into Strings (the format method) and for parsing Strings back into objects (the parseObject method). Any String formatted by format is guaranteed to be parseable by parseObject.

Format has three subclasses: DateFormat, MessageFormat, and NumberFormat.
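
The format and parseObject pairing can be sketched with two of these subclasses. The class FormatRoundTripDemo, its helper methods, and the message pattern are hypothetical examples.

```java
import java.text.MessageFormat;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

class FormatRoundTripDemo {
    // Formats a number for a locale, then parses the text back.
    static Number roundTrip(double value, Locale locale) {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        String text = format.format(value); // e.g. "1.234,5" for Locale.GERMANY
        try {
            return (Number) format.parseObject(text);
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    // Fills the numbered placeholders of a message pattern.
    static String message(String name, int count) {
        MessageFormat format = new MessageFormat("{0} has {1} files", Locale.US);
        return format.format(new Object[] {name, count});
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(1234.5, Locale.GERMANY)); // 1234.5
        System.out.println(message("disk", 3));                // disk has 3 files
    }
}
```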

InputStreamReader

An InputStreamReader is a bridge from byte streams to character streams: It reads bytes and translates them into characters according to a specified character encoding. The encoding that it uses may be specified by name, or the platform's default encoding may be accepted.

The class has two constructors: one that uses the platform’s default encoding and one that takes an encoding name (as a String). Encodings can be represented by their ISO numbers, e.g. ISO 8859-9 is represented by “8859_9”.

There is no easy way to determine which encodings are supported. You can use the getEncoding method to get the name of the encoding being used by the Reader.

Characters that do not exist in a specific character set produce a substitution character, usually a question mark.
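
A minimal sketch of decoding bytes with a named encoding follows. The class ReaderDemo and its decode helper are hypothetical, and the bytes come from an in-memory array rather than a file so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

class ReaderDemo {
    // Decodes raw bytes into a String using the named character encoding.
    static String decode(byte[] bytes, String encoding) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), encoding)) {
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The same character (e with acute accent) is stored differently per encoding.
        byte[] latin1 = {(byte) 0xE9};
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9};
        System.out.println(decode(latin1, "ISO-8859-1").equals("\u00e9")); // true
        System.out.println(decode(utf8, "UTF-8").equals("\u00e9"));        // true
    }
}
```

Decoding with the wrong encoding name silently yields different characters, which is why the encoding should be stated explicitly whenever it is known.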

OutputStreamWriter

An OutputStreamWriter is a bridge from character streams to byte streams: Characters written to it are translated into bytes according to a specified character encoding. The encoding that it uses may be specified by name, or the platform's default encoding may be accepted.

Each invocation of a write() method causes the encoding converter to be invoked on the given character(s). The resulting bytes are accumulated in a buffer before being written to the underlying output stream. The size of this buffer may be specified, but by default it is large enough for most purposes. Note that the characters passed to the write() methods are not buffered.

The class has two constructors: one that uses the platform’s default encoding and one that takes an encoding name (as a String). Encodings can be represented by their ISO numbers, e.g. ISO 8859-9 is represented by “8859_9”.

There is no easy way to determine which encodings are supported. You can use the getEncoding method to get the name of the encoding being used by the Writer.

Characters that do not exist in a specific character set produce a substitution character, usually a question mark.
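
The encoding step and the substitution behaviour can be sketched as follows. The class WriterDemo and its encode helper are hypothetical, and the output goes to an in-memory buffer so the resulting bytes can be inspected.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class WriterDemo {
    // Encodes text into bytes using the named character encoding.
    static byte[] encode(String text, String encoding) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, encoding)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the writer above flushed its internal byte buffer.
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encode("\u00e9", "UTF-8").length);      // 2
        System.out.println(encode("\u00e9", "ISO-8859-1").length); // 1
        // The euro sign does not exist in ISO 8859-1, so '?' is written instead.
        System.out.println((char) encode("\u20ac", "ISO-8859-1")[0]); // ?
    }
}
```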

 


Page created by Leo Crawford
last updated in May 2002