#ident  "@(#)ldterm-euc-dependency.txt 1.1 98/03/25 SMI"

1/14/1998
Ienup Sung
is@eng.sun.com

Brief analysis of EUC dependency in ldterm module
-------------------------------------------------

0. Overview

This memo presents the results of a brief analysis of the ldterm module.
The scope of the analysis is restricted to I18N and EUC dependency.

1. Header files

The following headers are included, and they contain EUC-specific
definitions:

    /usr/include/sys/euc.h
    /usr/include/sys/eucioctl.h

In addition, /usr/include/sys/ldterm.h defines a number of EUC-specific
macros, for instance EUCSIZE, EUC_TWIDTH, T_SS2 and so on. The same header
also defines the ldterm state data type, which contains EUC-specific
information, as in the following type definition fragment:

    typedef struct ldterm_mod {
        ...
        unsigned int t_state;   /* internal state of ldterm module */
        ...
        /*
         * The following are for EUC processing.
         */
        unchar t_codeset;       /* current code set indicator (read side) */
        unchar t_eucleft;       /* bytes left to get in current char (read) */
        unchar t_eucign;        /* bytes left to ignore (output post proc) */
        unchar t_eucpad;        /* padding ... for eucwioc */
        eucioc_t eucwioc;       /* eucioc structure (have to use bcopy) */
        unchar *t_eucp;         /* ptr to parallel array of column widths */
        mblk_t *t_eucp_mp;      /* the m_blk that holds parallel array */
        unchar t_maxeuc;        /* the max length in memory bytes of an EUC */
        int t_eucwarn;          /* bad EUC counter */
    } ldtermstd_state_t;

The "t_state" field has the TS_MEUC flag set if the codeset currently being
processed in the module is a multibyte and/or multicolumn EUC codeset.

The "t_codeset" field contains 0 (primary, ASCII codeset) or 1 ~ 3
(supplementary codesets), depending on which EUC codeset the character that
ldterm is processing belongs to.

"t_eucleft" contains the number of bytes still to be read for the EUC
character that the read side is currently processing.

"t_eucign" contains the number of remaining bytes of an EUC character that
are to be ignored during output post-processing.

"eucwioc" keeps the character widths of the current locale's EUC codesets,
in both byte and column sizes. It is a stripped-down version of eucwidth_t,
which is defined in /usr/include/sys/euc.h.

"t_eucp" is a pointer into a vector that keeps the screen column size of
each character in a canonical input line. The entry corresponding to the
first byte of each (multibyte) EUC character contains the column size of
the character. The entries for the subsequent bytes contain zero.

"t_eucp_mp" is the message block that holds the vector that 't_eucp' points
into.

"t_maxeuc" contains MB_CUR_MAX of the locale's codeset.

"t_eucpad" and "t_eucwarn" are for padding and for counting bad EUC
characters, respectively. They are used little or not at all in the current
implementation.

2. ldterm.c

The source file version that we analyzed is 1.82, with a final change date
of 10/22/1997.

The following 17 functions do EUC-specific or EUC-related processing:

    static int ldtermopen(queue_t *q, dev_t *devp, int oflag, int sflag,
        cred_t *crp)
    static int ldtermclose(queue_t *q, int cflag, cred_t *crp)
    static void ldtermrsrv(queue_t *q)
    static mblk_t * ldterm_docanon(unchar c, mblk_t *bpt, size_t ebsize,
        queue_t *q, ldtermstd_state_t *tp)
    static int ldterm_tabcols(ldtermstd_state_t *tp)
    static void ldterm_tokerase(queue_t *q, size_t ebsize,
        ldtermstd_state_t *tp)
    static void ldterm_kill(queue_t *q, size_t ebsize, ldtermstd_state_t *tp)
    static void ldterm_msg_upstream(queue_t *q, ldtermstd_state_t *tp)
    static mblk_t * ldterm_output_msg(queue_t *q, mblk_t *imp, mblk_t **omp,
        ldtermstd_state_t *tp, size_t bsize, int echoing)
    int movtuc(size_t size, unsigned char *from, unsigned char *origto,
        unsigned char *table)
    static void ldterm_do_ioctl(queue_t *q, mblk_t *mp)
    static void ldterm_euc_erase(queue_t *q, size_t ebsize,
        ldtermstd_state_t *tp)
    static void ldterm_eucwarn(ldtermstd_state_t *tp)
    static void cp_eucwioc(eucioc_t *from, eucioc_t *to, int dir)
    static int ldterm_memwidth(unchar c, eucioc_t *w)
    static int ldterm_dispwidth(unchar c, eucioc_t *w, int mode)
    static int ldterm_codeset(unchar c)

2.1. ldtermopen()

This function initializes the EUC portion of the ldtermstd_state_t instance
for this Stream. The initialization assumes that the initial/default state
is for EUC single-byte codesets.

2.2. ldtermclose()

Before the module removes itself from the Stream, this function frees any
memory allocated at 't_eucp_mp' of the ldtermstd_state_t state instance for
the Stream. It also assigns NULL to both 't_eucp_mp' and 't_eucp'.

2.3. ldtermrsrv()

This function initializes the EUC portion of the state if a rescan of the
input buffer has been requested.

2.4. ldterm_docanon()

Erase, word erase and kill line are processed in this function. Depending
on the current state, i.e., whether t_state has TS_MEUC set or not, the
EUC-specific operations ldterm_euc_erase(), ldterm_tokerase() and
ldterm_kill(), respectively, are performed.

Also, before this routine actually adds an EUC character, it checks whether
the current line can hold the character. If it cannot, and IMAXBEL is not
set, it resets 't_eucp' to the start address of 't_eucp_mp', since a new
current line is being started.

Since a character is added by inserting one byte at a time, a few fields
are used to keep track of the current state. The following fields are
updated in the routine:

    t_eucp, t_eucleft, and t_codeset

At the end of a line, the routine resets 't_eucp' to the start address of
't_eucp_mp', since another new current line will follow.

2.5. ldterm_tabcols()

If TS_MEUC is set in 't_state', this function computes the number of
columns to be deleted from the display column widths recorded between
't_eucp_mp' and 't_eucp'.

2.6. ldterm_tokerase()

This routine is solely for EUC operation. It erases any trailing
white-space characters (actually, space (0x20) and tab (0x09) characters
only), and then erases any non-white-space characters.
The erase operation is done by deleting one byte at a time until it reaches
a "white-space" character or the beginning of the line. The module
references 't_eucp' to decide the actual display width of each character.

2.7. ldterm_kill()

While 'killing' the line, this function works very much like
ldterm_tokerase() if the TS_MEUC bit is set in 't_state'.

2.8. ldterm_msg_upstream()

This function resets 't_eucp' to the start address of 't_eucp_mp', since we
are done with this particular message, even though we do not know whether
there will be more input or not.

2.9. ldterm_output_msg()

This function does output processing, including possible case conversion,
using movtuc() for as many bytes as possible. For ordinary EUC bytes,
including the SS2 and SS3 characters, it then does character-by-character
output processing (case conversion).

2.10. movtuc()

This function converts the given input characters by using a given table.

2.11. ldterm_do_ioctl()

This function contains the processing of the EUC_WSET and EUC_WGET
commands.

For EUC_WSET, it checks whether correct information has been provided to
the module. If so, it tries to figure out whether this EUC locale's
codesets have any multibyte and/or multicolumn characters. Depending on the
result, 't_maxeuc' is set to MB_CUR_MAX of the current locale and the
TS_MEUC flag is set in 't_state'.

Then, if 't_maxeuc' is bigger than 1 and 't_state' has the TS_MEUC flag
set, it allocates CANBSIZ (256) bytes of memory at 't_eucp_mp' and sets
't_eucp' accordingly.

After that, it passes a new ioctl command, EUCWSET, downstream.

If everything goes well, it ACKs the ioctl. Otherwise, it NACKs the ioctl.

For EUC_WGET, it copies the existing 'eucwioc' over to the buffer provided
and then ACKs the ioctl.

2.12. ldterm_euc_erase()

This function does the erase by using the contents of 't_eucp'.

2.13. ldterm_eucwarn()

This function does nothing but increase the warning count.
This is only for debugging.

2.14. cp_eucwioc()

This function copies the given 'eucioc_t' structure to the provided space.

2.15. ldterm_memwidth()

This function takes the first byte of an EUC character and provides the
byte length of that EUC character.

2.16. ldterm_dispwidth()

This function takes the first byte of an EUC character and provides the
number of column positions required by that EUC character.

2.17. ldterm_codeset()

This function takes the first byte of an EUC character and returns its
corresponding EUC codeset. As always, the primary codeset's number is 0.
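
The three lookups described in 2.15 through 2.17, and the parallel column
width vector described for 't_eucp' in section 1, can be illustrated with
a small user-level sketch. This is not the ldterm.c code. All names
prefixed with sketch_ or SKETCH_ are hypothetical. The width table layout
(byte widths and column widths indexed by codeset number) is an assumption
modeled on the description of 'eucwioc', and the sketch further assumes
that the byte widths for codesets 2 and 3 include the single-shift byte.
The 'mode' argument of ldterm_dispwidth() and any special handling of
control characters are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Single-shift bytes that introduce EUC codesets 2 and 3. */
#define SKETCH_SS2 0x8e
#define SKETCH_SS3 0x8f

/* Hypothetical stand-in for eucioc_t: widths indexed by codeset 0..3. */
struct sketch_eucioc {
    unsigned char eucw[4];  /* bytes per character (assumed to include SS2/SS3) */
    unsigned char scrw[4];  /* screen columns per character */
};

/* Map the first byte of an EUC character to its codeset number. */
static int sketch_codeset(unsigned char c)
{
    if ((c & 0x80) == 0)
        return 0;               /* primary (ASCII) codeset */
    if (c == SKETCH_SS2)
        return 2;
    if (c == SKETCH_SS3)
        return 3;
    return 1;                   /* any other byte with the high bit set */
}

/* Byte length of the character that starts with byte c. */
static int sketch_memwidth(unsigned char c, const struct sketch_eucioc *w)
{
    int cs = sketch_codeset(c);

    return cs == 0 ? 1 : w->eucw[cs];
}

/* Column width of the character that starts with byte c (ASCII counts as 1). */
static int sketch_dispwidth(unsigned char c, const struct sketch_eucioc *w)
{
    int cs = sketch_codeset(c);

    return cs == 0 ? 1 : w->scrw[cs];
}

/*
 * Fill widths[] (same length as line[]) in the manner described for t_eucp:
 * the slot of the first byte of each character gets the column width, and
 * the slots of the remaining bytes get zero.  Returns the total columns.
 */
static int sketch_fill_widths(const unsigned char *line, size_t len,
    const struct sketch_eucioc *w, unsigned char *widths)
{
    size_t i = 0;
    int cols = 0;
    int k;

    assert(line != NULL && w != NULL && widths != NULL);
    while (i < len) {
        int nbytes = sketch_memwidth(line[i], w);
        int ncols = sketch_dispwidth(line[i], w);

        if (nbytes < 1)
            nbytes = 1;         /* guard against a zero table entry */
        widths[i++] = (unsigned char)ncols;
        cols += ncols;
        for (k = 1; k < nbytes && i < len; k++)
            widths[i++] = 0;
    }
    return cols;
}
```

With such a vector, an erase routine can step backwards over zero entries
until it finds the non-zero entry of the character's first byte, which
tells it how many columns to remove from the display.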