First created at: 9/21/2009
Last updated at: 10/29/2009

Locale name alias support at libc
---------------------------------

OVERVIEW

When Solaris systems are newly introduced into a heterogeneous
environment dominated by other operating systems, say, an IBM shop or
an HP-UX shop, existing users' locale environment variable settings are
often not honored, even though Solaris has compatible and acceptable
locales. The cause is that the locale names used on Solaris and on
other platforms such as AIX, HP-UX, and various Linux distributions
differ slightly in many cases. This creates an operational
interoperability/compatibility issue, especially when such shops
consider accepting Solaris systems.

As an example, while Solaris uses fr_FR.UTF-8 as the name of the French
(France) UTF-8 locale, IBM AIX uses FR_FR, and HP-UX 11.11 and RHEL 5.4
use fr_FR.utf8. (It also appears that glibc-based Linux distributions
allow some variation in the codeset part of a locale name through some
kind of codeset name normalization mechanism, and hence accept not only
fr_FR.UTF-8 and fr_FR.UTF8 but also fr_FR.utf8.)

One way to resolve this interoperability/compatibility issue would be
to create and maintain thousands of locale name symbolic links in our
locale directories, but that would be quite messy and very difficult to
maintain.

Hence, this project proposes a transparent locale name alias support
mechanism in libc, with embedded locale name mapping tables as outlined
below, to remedy the interoperability/compatibility issue and to aid
users who want to migrate from other platforms to Solaris.

TECHNICAL DETAILS

Currently, when a locale is selected with setlocale(3C), the function
looks for the locale shared object at, as an example for the 32-bit
environment, /usr/lib/locale//.so.3. In locating the locale shared
object, the name given to setlocale(3C) and the corresponding component
of the path to the locale shared object must be identical byte for byte.
If the requested locale shared object, i.e., the locale, is not found,
setlocale() returns NULL and the current locale, i.e., the C locale in
most cases, is not changed. (The same also applies to the LC_MESSAGES
category directory.)

This project slightly changes the failure return mechanism noted above:
if the locale is not found, then before returning NULL, setlocale(3C)
checks whether the given locale name is an alias and, if so, which
canonical locale name supported on Solaris it matches. When there is a
matching canonical locale name, the function tries to locate and load
that locale. Details (including a simple normalization mechanism
applied to the codeset part of locale names during matching) are
described in the NOTES section of the updated setlocale(3C) man page
[2] and in subsection 1 of the DESCRIPTION section of the new man page,
locale_alias(5) [2].

This additional check does not overwrite or change the user's locale
environment variable settings; it just internally and transparently
maps a locale name alias to a canonical locale name, then locates and
loads the locale, if any and applicable. Afterward, all
internationalized APIs work transparently as if the locale name
supplied were the canonical locale name.

A similar approach, plus some additional steps, also applies in the
translated message retrieval functions:

If the given locale name is the canonical locale name for Solaris
locale names obsoleted by [3] and [4], and there is no translated
message object or catalog in the system under that locale name, then
for better backward compatibility the messaging functions additionally
look for the message object or catalog under the obsoleted Solaris
locale names.
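
The setlocale(3C) fallback described earlier in this section can be
sketched roughly as follows. This is only an illustration under stated
assumptions, not the libc implementation: the names alias_entry,
alias_table, normalize_name, resolve_alias, and setlocale_with_alias
are hypothetical, the two table entries are taken from the fr_FR
example in the OVERVIEW, and the normalization rule (lowercase the
codeset and drop non-alphanumeric characters) is an assumed stand-in
for the rule documented in locale_alias(5) [2].

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table; the real mappings are listed in locale_alias(5). */
struct alias_entry {
    const char *alias;      /* name used on another platform */
    const char *canonical;  /* canonical Solaris locale name */
};

static const struct alias_entry alias_table[] = {
    { "FR_FR",      "fr_FR.UTF-8" },    /* AIX */
    { "fr_FR.utf8", "fr_FR.UTF-8" },    /* HP-UX 11.11, RHEL 5.4 */
};

/*
 * Assumed normalization: copy everything up to and including the first
 * '.' unchanged, lowercase the codeset and drop its non-alphanumeric
 * characters, and copy any '@modifier' unchanged. "UTF-8", "UTF8", and
 * "utf8" all become "utf8". Output is truncated to fit outlen.
 */
static void
normalize_name(const char *name, char *out, size_t outlen)
{
    enum { BASE, CODESET, MODIFIER } part = BASE;
    size_t i = 0;

    assert(name != NULL && out != NULL && outlen > 0);
    for (; *name != '\0' && i + 1 < outlen; name++) {
        unsigned char c = (unsigned char)*name;

        if (c == '@')
            part = MODIFIER;
        if (part == CODESET) {
            if (isalnum(c))
                out[i++] = (char)tolower(c);
        } else {
            out[i++] = (char)c;
            if (part == BASE && c == '.')
                part = CODESET;
        }
    }
    out[i] = '\0';
}

/* Return the canonical name for an alias, or NULL if there is none. */
const char *
resolve_alias(const char *name)
{
    char want[64], have[64];
    size_t n = sizeof(alias_table) / sizeof(alias_table[0]);

    normalize_name(name, want, sizeof(want));
    for (size_t k = 0; k < n; k++) {
        normalize_name(alias_table[k].alias, have, sizeof(have));
        if (strcmp(want, have) == 0)
            return alias_table[k].canonical;
    }
    return NULL;
}

/*
 * User-level emulation of the proposed order of operations: try the
 * name as given first, and consult the alias table only on failure.
 * The caller's string and environment are never modified. The ""
 * (environment-driven) case is not covered by this sketch.
 */
char *
setlocale_with_alias(int category, const char *name)
{
    char *ret = setlocale(category, name);

    if (ret == NULL && name != NULL && *name != '\0') {
        const char *canon = resolve_alias(name);

        if (canon != NULL)
            ret = setlocale(category, canon);
    }
    return ret;
}
```

Under these assumptions, resolve_alias("fr_FR.UTF8") and
resolve_alias("FR_FR") both return "fr_FR.UTF-8", while an unknown name
returns NULL, so the caller still sees the usual failure return.
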
Also, as a part of the locale name alias support mechanism, if the
given locale name is an accepted and supported locale name alias of a
canonical locale name under this project, and there is no translated
message object or catalog in the system under that locale name, the
messaging functions additionally look for the message object or catalog
under the canonical locale name.

Details on these are described in the updated man pages gettext(1),
catopen(3C), gettext(3C), and environ(5) [2] and in the new man page
locale_alias(5) [2].

These additional checks are necessary to make our messaging
functionality work transparently for obsoleted Solaris locales and for
the supported locale name aliases. The project team is explicitly
updating the messaging function man pages because those interfaces
explicitly specify the locale directories and locale names. No other
internationalized interfaces appear to require such explicit man page
updates.

The mapping tables shown in locale_alias(5) [2] are formulated from
data extracted from [3] and [4] and, via some simple reverse
engineering, from operating systems such as AIX 6.1, HP-UX 11.11,
RHEL 5.4, Ubuntu 9.04, and the latest OpenSolaris/Solaris Nevada. They
will be embedded into libc in a read-only data section. (We expect no
significant changes to the tables, if any, in the future.)

Although this project does not change the locale(1) utility, it also
updates the NOTES section of the locale(1) man page as shown at [2] to
clarify that locale aliases are supported only as aliases and are not
shown in the "locale -a" output.

The scope of this project is libc only. Thus, if any system utilities
and/or libraries in this or other consolidations individually reference
and maintain locale names by themselves, they would need to undergo
similar changes to have similar alias support.
(It appears that such cases are neither many nor significant.)

INTERFACE STABILITY AND RELEASE BINDING

This project imports no notable interfaces.

This project exports:

Interface               Stability   Note
---------               ---------   ----
gettext(1)              Committed   Updated utility as shown at [2]
locale(1) man page      Committed   Updated man page as shown at [2]
  change only
catopen(3C)             Committed   Updated API as shown at [2]
gettext(3C) functions   Committed   Updated APIs as shown at [2]
setlocale(3C)           Committed   Updated API as shown at [2]
environ(5)              Committed   Updated NLSPATH as shown at [2]
locale_alias(5)         Committed   New man page on mappings [2]

This project asks for Micro/Patch release binding.

REFERENCES

[1] The Open Group, The Unix Internationalization Guide, Sep. 2003.
    http://www.opengroup.org/bookstore/catalog/g032.htm
[2] New and updated man pages in flat text and corresponding diff files
    at the materials directory of the case:
    gettext.1, gettext.1.diff, locale.1, locale.1.diff,
    catopen.3c, catopen.3c.diff, gettext.3c, gettext.3c.diff,
    setlocale.3c, setlocale.3c.diff, environ.5, environ.5.diff,
    locale_alias.5
[3] PSARC/2009/342 EOF of @euro locales.
[4] PSARC/2009/528 EOF of short form locales.