#ident "@(#)onepager-ldterm-csi.txt 1.8 99/04/28 SMI"

1. Introduction

    1.1. Project/Component Working Name:

        Codeset independent ldterm(7M) and stty(1) to fix 4089374

    1.2. Name of Document Author/Supplier:

        Ienup Sung

    1.3. Date of This Document:

        2/22/1999

    1.4. Name of Major Document Customer(s)/Consumer(s):

        PSARC
        SOESC
        Asian/European/Japan Localization Centers

2. Project Summary

    2.1. Project Description

        We currently support many locales based on non-EUC codesets,
        such as Shift JIS for Japan, BIG5 for Taiwan, GBK for China,
        and Unicode/UTF-8 locales for European and Asian countries
        including Germany, France, Italy, Spain, Sweden, Japan, and
        Korea, as well as for domestic customers.

        Because the current ldterm(7M) and stty(1) implementations
        depend on the EUC representation, we cannot directly support
        line discipline for terminals (and terminal emulators) in
        these locales without a workaround: a pair of code conversion
        STREAMS modules that surround the ldterm(7M) module. To use
        the workaround, customers must follow a long series of
        instructions, such as those in the "Overview of en_US.UTF-8
        Locale Support" chapter of the "Solaris Internationalization
        Guide for Developers", which is cumbersome and error-prone.

        This project makes ldterm(7M) and stty(1) codeset independent
        to solve these problems.

    2.2. Risks and Assumptions

        This project assumes that ldterm(7M) will internally contain
        three different method sets, one for each codeset type:

        - EUC codeset
        - PC environment originated codeset
        - Unicode/UTF-8 codeset

        For the UTF-8 codeset methods, we assume that we will support
        only the UTF-16 range of Unicode/ISO 10646 (U+0000 through
        U+10FFFF: the Basic Multilingual Plane plus 16 supplementary
        planes of 65536 code points each).

3. Business Summary

    3.1. Problem Area

        This project resolves the EUC codeset dependency in ldterm(7M)
        and stty(1).

    3.2. Market/Requester

        Domestic and international customers
        Field offices in Japan, Korea, Taiwan, China, and Ireland

    3.3. Business Justification

        Users and customers of non-EUC locales will no longer need to
        go through a complicated series of instructions to make the
        line discipline understand the widths of characters from
        non-EUC codesets.

        Localization centers can phase out/obsolete the surrounding
        STREAMS modules to minimize maintenance cost.

    3.4. Competitive Analysis

        Other Unix vendors already support codeset independent line
        discipline modules.

    3.5. Opportunity Window/Exposure

        Solaris 8 Alpha (build 22)

4. Technical Description

    To make ldterm(7M) and stty(1) codeset independent, the following
    will be done:

    - Provide three sets of internal methods in ldterm(7M) to handle
      various codesets:

        (1) EUC codeset methods (default)
        (2) PC environment originated codeset methods
        (3) UTF-8 codeset methods

      The default method set that ldterm(7M) starts and runs with
      is (1) above.

    - Three new I_STR ioctl message commands specific to ldterm(7M)
      will be added:

      CSINFO_SET
          This call takes a pointer to a ldterm_cs_header_t data
          structure and uses it to set the line discipline definition
          and, possibly, to switch the internal methods and data to
          those for the current locale's codeset. When this message
          arrives, ldterm(7M) will check its validity and, if the
          message contains correct info, save the header info.

      CSDATA_SET
          Depending on the header info previously set by the
          'CSINFO_SET' command, in particular the 'csinfo_num' data
          field of the header, ldterm(7M) will accept one or more
          'CSDATA_SET' messages and accumulate them internally. When
          it receives the final 'CSDATA_SET', ldterm(7M) will validate
          the messages received so far, install the received data as
          the data to be used in ldterm(7M), and then switch to the
          corresponding methods. If the validation fails, ldterm(7M)
          will negatively acknowledge the message.
          It is the responsibility of stty(1) to ensure that exactly
          'csinfo_num' 'CSDATA_SET' ioctl messages always follow the
          'CSINFO_SET'.

      CSINFO_GET
          This call takes a pointer to a ldterm_cs_header_t structure
          and returns in it the codeset header info currently in use
          by the ldterm(7M) module.

      The three new ioctl commands will be added to the header file.
      The existing EUC_WSET and EUC_WGET commands will not be removed.

    - Any locale that wants to use the (internal) non-EUC codeset
      methods of ldterm will provide a
      /usr/lib/locale//LC_CTYPE/ldterm.dat file. The ldterm.dat file
      will contain information such as the codeset type and the
      codeset and/or character widths of the current locale.

    - Upon a user request for the 'defeucw' mode setting, as in the
      following example:

        system% stty defeucw

      the stty(1) command will check whether the current locale has
      the /usr/lib/locale//LC_CTYPE/ldterm.dat file. If it does,
      stty(1) will read the file and pass its content down to the
      ldterm(7M) module by using the CSINFO_SET and CSDATA_SET ioctl
      message commands. The current behavior for EUC locales will not
      be changed.

      For a 'write settings' request, i.e., stty -a, we will not
      change the current implementation. Thus, if stty(1) is executed
      with the -a option and the current locale is not an EUC locale,
      it will print:

        eucw ?, scrw ?

      If the current locale is an EUC locale, stty(1) will print the
      byte widths and screen column widths for the EUC codesets. For
      instance, in any single byte locale we support, stty -a gives
      the following result:

        eucw 1:1:0:0, scrw 1:1:0:0

    - Please refer to the following documents for more details:

      -- Codeset independent ldterm(7M) and stty(1) design at:

         http://nirvana.eng/ldterm/ldterm-csi.txt

      -- Brief analysis of EUC dependency in ldterm(7M) at:

         http://nirvana.eng/ldterm/ldterm-euc-dependency.txt

      -- Brief analysis of EUC dependency in stty(1) at:

         http://nirvana.eng/ldterm/stty-euc-dependency.txt

5. Reference Documents

    [ISO 10646]  ISO/IEC, ISO/IEC 10646-1:1993(E), "Information
                 Technology-- Universal Multiple-Octet Coded Character
                 Set (UCS)--Part 1: Architecture and Basic Multilingual
                 Plane," May 1993, (International Standard).

    [Unicode1]   The Unicode Consortium, "The Unicode Standard:
                 Worldwide Character Encoding Version 1.0, Volume 1
                 and 2," October 1991.

    [Unicode2]   The Unicode Consortium, "The Unicode Standard:
                 Version 2.0," July 1996.

    [STREAMS]    Sun Microsystems, Inc., STREAMS Programmer's Guide
                 (Part No. 801-6679-10), 1994.

    [I18N Guide] Sun Microsystems, Inc., Solaris Internationalization
                 Guide For Developers (http://docs.sun.com), 1998.

6. Resources and Schedule

    6.1. Projected Availability

        4/1999 for Solaris 8 Alpha (build 22)

    6.2. Cost of Effort

        ------------------------------------------------------------------
        Work area                                           engineer weeks
        ------------------------------------------------------------------
        ldterm module change                                     2
        stty command change and preliminary unit testing         1
        Code review                                              0.5
        PIT and PCT testings                                     1
        ------------------------------------------------------------------
        Total                                                    4.5
        ------------------------------------------------------------------

    6.3. Cost of Capital Resources

        No additional capital resources required.

7. Prototype Availability

    7.1. Prototype Availability

        Not available.

    7.2. Prototype Cost

        3 engineer weeks

8. Acknowledgement

    We would like to thank Milan Bag for his review and also Cathe Ray
    (while she was an engineering manager) for her support on the
    project.
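
Illustrative sketch of the CSINFO_SET/CSDATA_SET sequence

    The message sequence described in section 4 (one CSINFO_SET
    header, then exactly 'csinfo_num' CSDATA_SET messages, with
    validation and the method switch deferred until the final message)
    can be modeled in a few lines of user-space C. This is only a
    sketch under stated assumptions, not the ldterm(7M) implementation:
    every type and function name, every structure field other than
    'csinfo_num', the MAX_CSINFO bound, and the validation rules are
    hypothetical and chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/* The three method sets named in section 4; numeric values are assumed. */
enum cs_type { CS_EUC = 0, CS_PC = 1, CS_UTF8 = 2 };

#define MAX_CSINFO 10   /* assumed upper bound on csinfo_num */

/* Hypothetical header; only 'csinfo_num' is named in this document. */
struct cs_header {
    int codeset_type;   /* one of enum cs_type */
    int csinfo_num;     /* number of CSDATA_SET messages to follow */
};

/* Hypothetical payload of a single CSDATA_SET message. */
struct cs_data {
    int byte_width;     /* bytes per character */
    int screen_width;   /* screen columns per character */
};

/* Hypothetical state of the model line discipline. */
struct ld_model {
    enum cs_type active;                /* method set currently in use */
    struct cs_data data[MAX_CSINFO];    /* data currently in use */
    int ndata;
    struct cs_header pending;           /* header saved by CSINFO_SET */
    struct cs_data staged[MAX_CSINFO];  /* accumulated CSDATA_SET payloads */
    int received;
    int have_header;
};

/* Start with the default (EUC) method set. */
void ld_init(struct ld_model *m)
{
    memset(m, 0, sizeof(*m));
    m->active = CS_EUC;
}

/* CSINFO_SET: check the header and save it. Returns 0 (ack) or -1 (nak). */
int ld_csinfo_set(struct ld_model *m, const struct cs_header *h)
{
    if (h == NULL ||
        h->codeset_type < CS_EUC || h->codeset_type > CS_UTF8 ||
        h->csinfo_num < 1 || h->csinfo_num > MAX_CSINFO)
        return -1;
    m->pending = *h;
    m->received = 0;
    m->have_header = 1;
    return 0;
}

/*
 * CSDATA_SET: accumulate payloads; on the final one, validate everything
 * received, install it, and switch method sets. Returns 0 or -1 (nak).
 */
int ld_csdata_set(struct ld_model *m, const struct cs_data *d)
{
    int i;

    if (d == NULL || !m->have_header)
        return -1;                      /* no preceding CSINFO_SET */
    m->staged[m->received++] = *d;
    if (m->received < m->pending.csinfo_num)
        return 0;                       /* more messages expected */

    m->have_header = 0;                 /* sequence is now complete */
    for (i = 0; i < m->received; i++) {
        if (m->staged[i].byte_width < 1 || m->staged[i].screen_width < 0)
            return -1;                  /* keep the old data and methods */
    }
    memcpy(m->data, m->staged, (size_t)m->received * sizeof(m->staged[0]));
    m->ndata = m->received;
    m->active = (enum cs_type)m->pending.codeset_type;
    return 0;
}
```

    In this model a caller playing the role of stty(1) would call
    ld_csinfo_set() once and then ld_csdata_set() exactly 'csinfo_num'
    times. Staging the payloads separately from the active data means a
    failed validation leaves the previously active method set
    untouched, which is one plausible reading of the negative
    acknowledgement behavior described in section 4.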