Background
An internationalized program has the following characteristics:
- With the addition of localization data, the same executable
can run worldwide.
- Textual elements, such as status messages and the GUI component
labels, are not hard-coded in the program. Instead, they are stored
outside the source code and retrieved dynamically.
- Support for new languages does not require recompilation.
- Culturally dependent data, such as dates and currencies, appear
in formats that conform to the end user's region and language.
- It can be localized quickly.
Localization is the process of adapting a program for use in
a specific locale. A locale is a geographic or political region
that shares the same language and customs. Localization includes
the translation of text such as GUI labels, error messages, and
online help. It also includes the culture-specific formatting
of data items such as monetary values, times, dates, and numbers.
There are many types of data that vary with region or language.
Examples of this data are:
- Messages
- Labels on GUI components
- Online help
- Sounds
- Colors
- Graphics
- Icons
- Dates
- Times
- Numbers
- Currencies
- Measurements
- Phone numbers
- Honorifics and personal titles
- Postal addresses
- Page layouts
- Legal rules, e.g. tax calculations
Java Technologies to Support Internationalization
Properties
The java.util.Properties class represents a persistent set of properties.
A Properties object can be saved to a stream or loaded from a stream.
Each key and its corresponding value in the property list is a string.
A properties file stores information about the characteristics of
a program or environment, including internationalization/localization
information.
By creating a Properties object and calling its load method, a program
can read a properties file. The program can then look up each value
by its key, as follows:
// needs imports from java.io and java.util.Properties
Properties props = new Properties();
try (InputStream in = new BufferedInputStream(new FileInputStream(filename))) {
    props.load(in); // the stream is closed automatically
}
String value = props.getProperty(key);
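Alternatively, system properties (a separate set, read with
System.getProperty rather than from a loaded Properties object) can be
specified on the command line at application startup time with -D, e.g.
java -Dmy.property=value MyApplication
In both cases, if the key is not found, getProperty returns null.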
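A minimal sketch of both mechanisms. To stay self-contained it loads properties from an in-memory string with load(Reader) (available since Java 6) instead of a file; the class name PropertiesDemo, the helper loadFrom, and the keys greeting/farewell are hypothetical. It also shows the two-argument getProperty, which returns a default instead of null.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesDemo {
    // Hypothetical helper: parses "key=value" lines from a string.
    static Properties loadFrom(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return props;
    }

    public static void main(String[] args) {
        Properties labels = loadFrom("greeting=Hello\nfarewell=Goodbye\n");
        System.out.println(labels.getProperty("greeting"));          // Hello
        System.out.println(labels.getProperty("missing"));           // null
        System.out.println(labels.getProperty("missing", "(none)")); // (none)
        // System properties set with -Dmy.property=value are a separate set:
        System.out.println(System.getProperty("my.property"));       // null unless -D was given
    }
}
```

Note that load(InputStream) reads the stream as ISO 8859-1, so non-Latin text in such files has to be written as Unicode escapes; wrapping the stream in an InputStreamReader with an explicit encoding and calling load(Reader) avoids this.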
Locale
A java.util.Locale object represents a specific geographical,
political, or cultural region. An operation that requires a Locale
to perform its task is called locale-sensitive and uses the Locale
to tailor information for the user. For example, displaying a number
is a locale-sensitive operation: the number should be formatted
according to the customs/conventions of the user's native country,
region, or culture.
Because a Locale object is just an identifier for a region, no
validity check is performed when you construct a Locale. If you
want to see whether particular resources are available for the Locale
you construct, you must query those resources. For example, ask
NumberFormat for the locales it supports by calling its static
getAvailableLocales method.
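A sketch of that check, assuming the JDK's bundled locale data. The class and method names are hypothetical. Locale.forLanguageTag only checks syntax, so a made-up locale is constructed without complaint; only the availability query reveals that no number-formatting data exists for it.

```java
import java.text.NumberFormat;
import java.util.Arrays;
import java.util.Locale;

public class LocaleSupport {
    // True if NumberFormat lists data for exactly this locale.
    static boolean numberFormatSupports(Locale locale) {
        return Arrays.asList(NumberFormat.getAvailableLocales()).contains(locale);
    }

    public static void main(String[] args) {
        Locale madeUp = Locale.forLanguageTag("xx-YY"); // no validity check here
        System.out.println(madeUp.getLanguage() + "_" + madeUp.getCountry()); // xx_YY
        System.out.println(numberFormatSupports(madeUp));    // false
        System.out.println(numberFormatSupports(Locale.US)); // true on standard JDKs
    }
}
```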
When you ask for a resource for a particular locale, you get back
the best available match, not necessarily precisely what you asked
for. For more information, refer to the ResourceBundle section.
Locales are represented in two, or possibly three, parts:
- Language code (e.g. en)
- Country code (e.g. US or GB)
- [optional] variant name to allow the possibility of more than
one locale per country/language combination
The string form of a Locale joins the language and country codes
(and any variant) with underscores, e.g.
en_GB
The Locale class provides a number of convenient constants that
you can use to create Locale objects for commonly used locales.
For example, the following creates a Locale object for the United
States:
Locale.US
Once you've created a Locale, you can query it for information about
itself. Use getCountry to get the ISO country code
and getLanguage to get the ISO language code. You can
use getDisplayCountry to get the name of the country
suitable for displaying to the user. Similarly, you can use getDisplayLanguage
to get the name of the language suitable for displaying to the user.
Interestingly, the getDisplayXXX methods are themselves
locale-sensitive and have two versions: one that uses the default
locale and one that uses the locale specified in the argument.
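The queries above might look like this in practice; a sketch using the Locale.UK constant. The describe helper is hypothetical, and display names come from the JDK's locale data, so exact spellings can vary between releases.

```java
import java.util.Locale;

public class LocaleQuery {
    // Hypothetical helper combining the ISO codes with display names.
    static String describe(Locale locale, Locale displayIn) {
        return locale.getLanguage() + "/" + locale.getCountry() + ": "
                + locale.getDisplayLanguage(displayIn)
                + " (" + locale.getDisplayCountry(displayIn) + ")";
    }

    public static void main(String[] args) {
        Locale uk = Locale.UK;                            // constant for en_GB
        System.out.println(uk);                           // en_GB
        System.out.println(describe(uk, Locale.ENGLISH)); // names shown in English
        System.out.println(describe(uk, Locale.GERMAN));  // names shown in German
        System.out.println(uk.getDisplayCountry());       // names in the default locale
    }
}
```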
ResourceBundle
Resource bundles contain locale-specific objects. When your program
needs a locale-specific resource, a String for example, your program
can load it from the resource bundle that is appropriate for the
current user's locale. In this way, you can write program code that
is largely independent of the user's locale, isolating most, if not
all, of the locale-specific information in resource bundles.
This allows you to write programs that can:
- be easily localized, or translated, into different languages
- handle multiple locales at once
- be easily modified later to support even more locales
One resource bundle is, conceptually, a set of related classes
that inherit from ResourceBundle. Each related subclass of ResourceBundle
has the same base name plus an additional component that identifies
its locale. For example, suppose your resource bundle is named MyResources.
The first class you are likely to write is the default resource
bundle, which simply has the same name as its family: MyResources.
You can also provide as many related locale-specific classes as
you need: for example, perhaps you would provide a German one named
MyResources_de.
Each related subclass of ResourceBundle contains the same items,
but the items have been translated for the locale represented by
that ResourceBundle subclass.
The resource bundle lookup searches for classes with a name assembled
from the following components, concatenated and separated by underscores:
- baseclass
- desired language
- desired country
- desired variant
If neither the class nor a properties file of the same name with .properties
appended can be found, the lookup discards the components one at a time
(last to first) until it finds a match. This means that by providing
a class called only baseclass (with no suffixes), a match will always be found.
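The candidate names can be listed with ResourceBundle.Control, whose getCandidateLocales and toBundleName methods (Java 6 and later) implement the default lookup order. This sketch only computes names and loads no bundle; MyResources is the hypothetical base name from above, and the class and helper names are also hypothetical. The real getBundle additionally tries candidates for the default locale before settling on the bare base name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleLookup {
    // Bundle names the default Control would try, most specific first.
    static List<String> candidateNames(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        List<String> names = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            names.add(control.toBundleName(baseName, candidate)); // Locale.ROOT maps to baseName
        }
        return names;
    }

    public static void main(String[] args) {
        Locale swissGermanPosix = new Locale("de", "CH", "POSIX");
        System.out.println(candidateNames("MyResources", swissGermanPosix));
        // [MyResources_de_CH_POSIX, MyResources_de_CH, MyResources_de, MyResources]
    }
}
```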
The baseclass must be fully qualified (for example, myPackage.MyResources,
not just MyResources). It must also be accessible by
your code; it cannot be a class that is private to the package where
ResourceBundle.getBundle is called.
Keys are always Strings, but values can be any Object.
An application can have several different baseclasses, for example
one for exception messages and one for labels.
Unicode
Unicode is an international effort to provide a single character
set that everyone can use.
The original Java platform used the Unicode 2.0 (later 2.1) character
encoding standard; later releases support newer Unicode versions.
In Java, a char occupies two bytes (one UTF-16 code unit); since Java 5,
characters outside the Basic Multilingual Plane are stored as a pair of
chars. Ranges of character encodings represent different writing
systems or other special symbols.
For example, Unicode characters in the range 0x0000
through 0x007F represent the basic Latin alphabet,
and characters in the range 0x4E00 through 0x9FFF represent the
Han characters used in China, Japan, Korea, Taiwan, and Vietnam.
UTF-8 is a multibyte encoding format, which stores some characters
as one byte and others as two or three bytes. If most of your data
is ASCII characters, it is more compact than the two-byte form, but in the
worst case, a UTF-8 string can be 50 percent larger than the corresponding
two-byte Unicode string. Overall, it is fairly efficient.
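A quick way to check this size trade-off is to encode a few strings both ways. This sketch assumes "UTF" here means UTF-8 and uses UTF-16BE (no byte-order mark) for the two-byte form; the class name and sample strings are arbitrary, and non-ASCII samples are written as Unicode escapes so the source file's encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length; // 2 bytes per BMP char
    }

    public static void main(String[] args) {
        String[] samples = {
            "Hello",            // ASCII: 5 vs 10 bytes
            "Gr\u00fc\u00dfe",  // "Grüße": 7 vs 10 bytes
            "\u4e2d\u6587"      // two Han characters: 6 vs 4 bytes, the 50% worst case
        };
        for (String s : samples) {
            System.out.printf("UTF-8: %d bytes, UTF-16: %d bytes%n", utf8Bytes(s), utf16Bytes(s));
        }
    }
}
```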
Despite the advantages of Unicode, there are some drawbacks: Unicode
support is limited on many platforms because of the lack of fonts
capable of displaying all the Unicode characters.
java.text Package
This package provides classes and interfaces for handling text,
dates, numbers, and messages in ways that are independent of natural
languages. This allows programs to be written in a language-independent
manner, relying on separate, dynamically linked localized resources.
Most classes in the java.text package are locale-sensitive. By default
they use the default locale, but a specific Locale can be supplied instead.
These classes can format and parse dates, numbers, and messages;
search and sort strings; and iterate over characters,
words, sentences, and line breaks. This package contains three main
groups of classes and interfaces:
- Classes for iteration over text
- Classes for formatting and parsing
- Classes for string collation
Some of the classes in the java.text package are:
Annotation
An Annotation object is used as a wrapper for a text attribute
value if the attribute has annotation characteristics. These characteristics
are:
- The text range that the attribute is applied to is critical
to the semantics of the range. That is, the attribute cannot
be applied to subranges of the text range that it applies to,
and, if two adjacent text ranges have the same value for this
attribute, the attribute still cannot be applied to the combined
range as a whole with this value.
- The attribute or its value usually no longer applies if the
underlying text is changed.
CollationKey
A CollationKey represents a String under the rules of a specific
Collator object. Comparing two CollationKeys returns the relative
order of the Strings they represent. Using CollationKeys to compare
Strings is generally faster than using Collator.compare. Thus, when
the Strings must be compared multiple times, for example when sorting
a list of Strings, it is more efficient to use CollationKeys.
You cannot create CollationKeys directly. Rather, generate them
by calling Collator.getCollationKey. You can only compare CollationKeys
generated from the same Collator object.
Generating a CollationKey for a String involves examining the entire
String and converting it to a series of bits that can be compared
bitwise. This allows fast comparisons once the keys are generated.
The cost of generating keys is recouped in faster comparisons when
Strings need to be compared many times. On the other hand, the result
of a comparison is often determined by the first couple of characters
of each String. Collator.compare examines only as many characters
as it needs, which makes it faster for single comparisons.
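A sketch of the sorting case: each String is converted to a CollationKey once, and the sort then compares only keys. The class and method names are hypothetical; Collator.getCollationKey, CollationKey.compareTo, and CollationKey.getSourceString are standard java.text API.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollationKeySort {
    static String[] sortWithKeys(String[] words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // Pay the key-generation cost once per string...
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            keys[i] = collator.getCollationKey(words[i]);
        }
        // ...then every comparison during the sort is a cheap key comparison.
        Arrays.sort(keys);
        String[] sorted = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            sorted[i] = keys[i].getSourceString();
        }
        return sorted;
    }

    public static void main(String[] args) {
        String[] words = {"banana", "apple", "Cherry"};
        System.out.println(Arrays.toString(sortWithKeys(words, Locale.US))); // [apple, banana, Cherry]
        Arrays.sort(words); // plain String order puts uppercase letters first
        System.out.println(Arrays.toString(words));                          // [Cherry, apple, banana]
    }
}
```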
Collator
The Collator class performs locale-sensitive String comparison.
You use this class to build searching and sorting routines for natural
language text.
Collator is an abstract base class. Subclasses implement specific
collation strategies. You can use the static factory method, getInstance,
to obtain the appropriate Collator object for a given locale.
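A sketch of both uses: sorting with a Collator, which implements Comparator, and comparing at PRIMARY strength, where differences in case and accents are ignored. The class and helper names are hypothetical, and results depend on the JDK's collation rules for the chosen locale.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollatorDemo {
    static List<String> sortForLocale(List<String> words, Locale locale) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Collator.getInstance(locale)); // Collator is a Comparator<Object>
        return sorted;
    }

    // True if a and b differ at most in case or accents under this locale's rules.
    static boolean sameBaseLetters(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        System.out.println(sortForLocale(Arrays.asList("peach", "Apple", "banana"), Locale.US));
        // [Apple, banana, peach]
        System.out.println(sameBaseLetters("resume", "R\u00e9sum\u00e9", Locale.US)); // true
        System.out.println(sameBaseLetters("resume", "result", Locale.US));           // false
    }
}
```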
Format
Format is an abstract base class for formatting locale-sensitive
information such as dates, messages, and numbers.
Format defines the programming interface for formatting locale-sensitive
objects into Strings (the format method) and for parsing Strings
back into objects (the parseObject method). Any String formatted
by format is guaranteed to be parseable by parseObject.
Its three main subclasses are DateFormat, MessageFormat, and NumberFormat.
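A sketch of formatting and parsing with these subclasses, using explicit locales so the results do not depend on the machine's default. The class name, helper names, and message pattern are hypothetical; the date line has no expected output because it prints the current date.

```java
import java.text.DateFormat;
import java.text.MessageFormat;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Date;
import java.util.Locale;

public class FormatDemo {
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Parses text in the locale's number format; ParseException is rethrown unchecked.
    static double parseNumber(String text, Locale locale) {
        try {
            return NumberFormat.getNumberInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number in " + locale + ": " + text, e);
        }
    }

    static String formatMessage(String pattern, Locale locale, Object... args) {
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        System.out.println(formatNumber(1234567.891, Locale.US));      // 1,234,567.891
        System.out.println(formatNumber(1234567.891, Locale.GERMANY)); // 1.234.567,891
        System.out.println(parseNumber("1.234,5", Locale.GERMANY));    // 1234.5
        System.out.println(formatMessage("{0} has {1} files.", Locale.US, "Disk A", 3));
        System.out.println(DateFormat.getDateInstance(DateFormat.LONG, Locale.FRANCE).format(new Date()));
    }
}
```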
InputStreamReader
An InputStreamReader is a bridge from byte streams
to character streams: It reads bytes and translates them into characters
according to a specified character encoding. The encoding that it
uses may be specified by name, or the platform's default encoding
may be accepted.
The class has a constructor that uses the platform's default encoding
and constructors that take an encoding, either as a String name or
(since Java 1.4) as a Charset. Encodings can be named by their standard
names, e.g. ISO-8859-9, or by Java's historical aliases such as 8859_9.
Since Java 1.4, Charset.availableCharsets() lists the supported encodings
and Charset.isSupported(name) checks a single one.
You can use the getEncoding method to get the name
of the encoding being used by the Reader.
Byte sequences that are malformed or cannot be mapped to a character
are replaced with the Unicode replacement character U+FFFD.
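A sketch that decodes the same bytes with two different encodings, showing why the encoding given to the constructor matters, and how an invalid byte is replaced. It uses the Charset-based constructor (Java 1.4 and later) and in-memory bytes instead of a file; the class and helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReaderDemo {
    // Reads all bytes through an InputStreamReader using the given encoding.
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        String right = decode(utf8, StandardCharsets.UTF_8);      // 4 chars, ends in e-acute
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1); // 5 chars: each UTF-8 byte is one char
        String bad = decode(new byte[] {(byte) 0xFF}, StandardCharsets.UTF_8); // U+FFFD
        System.out.println(right.length() + " " + wrong.length() + " " + (int) bad.charAt(0)); // 4 5 65533
        System.out.println(Charset.isSupported("ISO-8859-9")); // check before relying on an encoding
    }
}
```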
OutputStreamWriter
An OutputStreamWriter is a bridge from character streams
to byte streams: Characters written to it are translated into bytes
according to a specified character encoding. The encoding that it
uses may be specified by name, or the platform's default encoding
may be accepted.
Each invocation of a write() method causes the encoding converter
to be invoked on the given character(s). The resulting bytes are
accumulated in a buffer before being written to the underlying output
stream. The size of this buffer may be specified, but by default
it is large enough for most purposes. Note that the characters passed
to the write() methods are not buffered.
The class has a constructor that uses the platform's default encoding
and constructors that take an encoding, either as a String name or
(since Java 1.4) as a Charset. Encodings can be named by their standard
names, e.g. ISO-8859-9, or by Java's historical aliases such as 8859_9.
Since Java 1.4, Charset.availableCharsets() lists the supported encodings
and Charset.isSupported(name) checks a single one.
You can use the getEncoding method to get the name
of the encoding being used by the Writer.
Characters that do not exist in the target character set are written
as a substitution byte sequence, usually a question mark.
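A sketch of that substitution: text containing Japanese characters is written through an OutputStreamWriter into ISO-8859-1, which has no mapping for them. The class and helper names are hypothetical; the Charset-based constructor is standard since Java 1.4.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterDemo {
    // Writes text through an OutputStreamWriter and returns the encoded bytes.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // close() flushed the writer's internal buffer
    }

    public static void main(String[] args) {
        String text = "Gr\u00fc\u00dfe, \u65e5\u672c"; // German word plus two Japanese characters
        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);
        // The Japanese characters are not in ISO-8859-1, so each becomes '?'
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1).endsWith("??")); // true
        System.out.println(latin1.length);                                 // 9
        System.out.println(encode(text, StandardCharsets.UTF_8).length);   // 15
    }
}
```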