#ident  "@(#)ldterm-euc-dependency.txt	1.1 98/03/25 SMI"
								1/14/1998
								Ienup Sung
								is@eng.sun.com
Brief analysis of EUC dependency in ldterm module
-------------------------------------------------

0. Overview

This memo gives a brief analysis of the ldterm module. The scope of the
analysis is restricted to I18N and EUC dependency.


1. Header files

The following headers are included, and they contain EUC-specific definitions:

	/usr/include/sys/euc.h
	/usr/include/sys/eucioctl.h

Also, /usr/include/sys/ldterm.h defines a number of EUC-specific macros,
for instance EUCSIZE, EUC_TWIDTH, T_SS2 and so on. The same header also
defines the ldterm state data type, which contains EUC-specific fields as
shown in the following type definition fragment:

typedef struct ldterm_mod {
		...
	unsigned int t_state;	/* internal state of ldterm module */
		...
        /*
         * The following are for EUC processing.
         */
        unchar  t_codeset;      /* current code set indicator (read side) */
        unchar  t_eucleft;      /* bytes left to get in current char (read) */
        unchar  t_eucign;       /* bytes left to ignore (output post proc) */
        unchar  t_eucpad;       /* padding ... for eucwioc */
        eucioc_t eucwioc;       /* eucioc structure (have to use bcopy) */
        unchar  *t_eucp;        /* ptr to parallel array of column widths */
        mblk_t  *t_eucp_mp;     /* the m_blk that holds parallel array */
        unchar  t_maxeuc;       /* the max length in memory bytes of an EUC */
        int     t_eucwarn;      /* bad EUC counter */
} ldtermstd_state_t;

The "t_state" field has the TS_MEUC flag set if the codeset currently being
processed in the module is a multibyte and/or multicolumn EUC codeset.

The "t_codeset" field contains 0 (primary, ASCII codeset) or 1 ~ 3
(supplementary codesets), depending on which EUC codeset the character that
ldterm is processing belongs to.

"t_eucleft" contains the number of bytes still to be read for the EUC
character that the read side is currently processing.

"t_eucign" contains the number of remaining bytes of an EUC character that
output post-processing should ignore.

"eucwioc" keeps the character widths of the EUC codesets of the current
locale, in both byte and column sizes. It is a stripped-down version of
eucwidth_t, which is defined in /usr/include/sys/euc.h.

"t_eucp" is a pointer into a vector that keeps the screen column widths of
the characters on the current canonical input line. The entry corresponding
to the first byte of each (multibyte) EUC character holds the display width
of that character; the entries for the subsequent bytes hold zero.
"t_eucp_mp" is the message block that holds the vector into which 't_eucp'
points.

"t_maxeuc" contains MB_CUR_MAX of the locale's codeset.

"t_eucpad" and "t_eucwarn" are padding and a bad EUC character counter,
respectively. They are used little, if at all, in the current implementation.


2. ldterm.c

The source file version analyzed is 1.82, with a final change date of
10/22/1997.

The following 17 functions do EUC-specific or EUC-related processing:

static int
ldtermopen(queue_t *q, dev_t *devp, int oflag, int sflag, cred_t *crp)

static int
ldtermclose(queue_t *q, int cflag, cred_t *crp)

static void
ldtermrsrv(queue_t *q)

static mblk_t *
ldterm_docanon(unchar c, mblk_t *bpt, size_t ebsize, queue_t *q,
		ldtermstd_state_t *tp)

static int
ldterm_tabcols(ldtermstd_state_t *tp)

static void
ldterm_tokerase(queue_t *q, size_t ebsize, ldtermstd_state_t *tp)

static void
ldterm_kill(queue_t *q, size_t ebsize, ldtermstd_state_t *tp)

static void
ldterm_msg_upstream(queue_t *q, ldtermstd_state_t *tp)

static mblk_t *
ldterm_output_msg(queue_t *q, mblk_t *imp, mblk_t **omp, ldtermstd_state_t *tp,
		size_t bsize, int echoing)

int
movtuc(size_t size, unsigned char *from, unsigned char *origto,
		unsigned char *table)

static void
ldterm_do_ioctl(queue_t *q, mblk_t *mp)

static void
ldterm_euc_erase(queue_t *q, size_t ebsize, ldtermstd_state_t *tp)

static void
ldterm_eucwarn(ldtermstd_state_t *tp)

static void
cp_eucwioc(eucioc_t *from, eucioc_t *to, int dir)

static int
ldterm_memwidth(unchar c, eucioc_t *w)

static int
ldterm_dispwidth(unchar c, eucioc_t *w, int mode)

static int
ldterm_codeset(unchar c)


2.1. ldtermopen()

This function initializes the EUC portion of the ldtermstd_state_t instance
for this Stream. The initialization assumes that the initial/default codeset
is a single-byte EUC codeset.


2.2. ldtermclose()

In this function, before the module removes itself from the Stream, it
frees any memory allocated at 't_eucp_mp' of the ldtermstd_state_t state
instance for the Stream. It also assigns NULL to both 't_eucp_mp' and
't_eucp'.


2.3. ldtermrsrv()

This function initializes the EUC portion of the state if a rescan of
the input buffer has been requested.


2.4. ldterm_docanon()

Erase, word erase and kill line are processed in this function. Depending on
the current state, i.e., whether t_state has TS_MEUC set, the EUC-specific
operations ldterm_euc_erase(), ldterm_tokerase() and ldterm_kill(),
respectively, are performed.

Also, before this routine actually adds an EUC character, it checks whether
the current line can hold the character. If it cannot and IMAXBEL is not
set, it resets 't_eucp' to the start address of 't_eucp_mp', since a new
current line is started.

Since a character is added by inserting one byte at a time, a few fields are
used to keep track of the current state. The following are updated in the
routine:

	t_eucp,
	t_eucleft, and
	t_codeset

If it is the end of line, the routine will reset the 't_eucp' to the start
address of 't_eucp_mp' since we will have another new current line.
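
The byte-at-a-time bookkeeping described above might be sketched as follows.
This is only an illustration, not the ldterm code: 'struct canon_state' and
track_byte() are hypothetical names, and the caller is assumed to supply the
codeset, byte length and column width of a character when it passes in the
first byte. The caller is also assumed to have checked that the width vector
has room.

```c
/* Hypothetical miniature of the read-side EUC fields. */
struct canon_state {
    unsigned char codeset;   /* plays the role of t_codeset */
    unsigned char eucleft;   /* plays the role of t_eucleft */
    unsigned char *eucp;     /* plays the role of t_eucp */
};

/*
 * Account for one incoming byte. The codeset, nbytes and cols
 * arguments are consulted only when the byte starts a new character.
 * Returns 1 when the byte completes a character, 0 otherwise.
 */
int
track_byte(struct canon_state *st, unsigned char codeset,
    unsigned char nbytes, unsigned char cols)
{
    if (st->eucleft == 0) {
        /* First byte: remember the codeset and bytes still to come. */
        st->codeset = codeset;
        st->eucleft = (unsigned char)(nbytes > 0 ? nbytes - 1 : 0);
        *st->eucp++ = cols;     /* first byte carries the width */
    } else {
        /* Continuation byte: one fewer to wait for, width entry 0. */
        st->eucleft--;
        *st->eucp++ = 0;
    }
    return (st->eucleft == 0);
}
```

For a two-byte, two-column character, the two calls leave the entries 2 and 0
in the width vector and bring the bytes-left counter back to zero.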


2.5. ldterm_tabcols()

If TS_MEUC is set in 't_state', it computes the number of columns that are
deleted by looking at the display column widths between 't_eucp_mp' and
't_eucp'.
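
As a rough sketch of that computation, assuming the width vector is a plain
byte array and ignoring any special handling of tabs or echoed control
characters, the columns occupied between the start of the vector and the
current position are the sum of the entries. The name columns_between() is
hypothetical.

```c
/*
 * Sum the display widths recorded between 'start' (beginning of the
 * width vector) and 'cur' (current position, exclusive).
 * Continuation bytes contribute 0, so each character counts once.
 */
int
columns_between(const unsigned char *start, const unsigned char *cur)
{
    int cols = 0;

    while (start < cur)
        cols += *start++;
    return (cols);
}
```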


2.6. ldterm_tokerase()

This routine is solely for EUC operation. It erases any trailing white-space
characters (actually, space (0x20) and tab (0x09) characters only), and then
erases the non-white-space characters that precede them, if any. The erase
operation is done by deleting one byte at a time until it reaches a
"white-space" character or the beginning of the line. The module references
't_eucp' to decide the actual display width of each character.
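
A minimal sketch of this two-phase erase follows, under the assumption that
the line and its width vector are plain parallel byte arrays. word_erase() is
a hypothetical name, and the real routine also has to emit the erase
sequences, which is omitted here. Scanning byte by byte is safe because the
bytes of supplementary-codeset EUC characters all have the high bit set and
so can never be mistaken for space or tab.

```c
#include <stddef.h>

/*
 * Erase the last word of 'line' (length *lenp bytes). 'widths' is the
 * parallel vector of column widths. Updates *lenp to the new length
 * and returns the number of display columns erased.
 */
int
word_erase(const unsigned char *line, const unsigned char *widths,
    size_t *lenp)
{
    size_t n = *lenp;
    int cols = 0;

    /* Phase 1: drop trailing spaces and tabs. */
    while (n > 0 && (line[n - 1] == ' ' || line[n - 1] == '\t')) {
        n--;
        cols += widths[n];
    }
    /* Phase 2: drop bytes back to the previous space/tab or line start. */
    while (n > 0 && line[n - 1] != ' ' && line[n - 1] != '\t') {
        n--;
        cols += widths[n];
    }
    *lenp = n;
    return (cols);
}
```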


2.7. ldterm_kill()

While 'killing' the line, this function works very similarly to
ldterm_tokerase() if the TS_MEUC bit is set in 't_state'.


2.8. ldterm_msg_upstream()

This function resets 't_eucp' to the start address of 't_eucp_mp', since we
are done with this particular message, even though we do not know whether
there will be more input or not.


2.9. ldterm_output_msg()

This function does output processing, including possible case conversion,
using movtuc() on as many bytes as possible. For ordinary EUC bytes,
including SS2 and SS3 characters, it also does character-by-character output
processing (case conversion).


2.10. movtuc()

This function converts given input characters by using a given table.
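
A table-driven conversion of this kind might look like the sketch below. It
is not the movtuc() source: translate_run() and build_upcase_table() are
hypothetical names, and the convention that a zero table entry stops the run
(so that the caller can handle that byte separately) is an assumption made
for the illustration. The example table also assumes an ASCII execution
character set.

```c
#include <stddef.h>

/*
 * Copy up to 'size' bytes from 'from' to 'to', mapping each byte
 * through 'table'. Stops early at the first byte whose table entry
 * is 0. Returns the number of bytes converted.
 */
size_t
translate_run(size_t size, const unsigned char *from, unsigned char *to,
    const unsigned char *table)
{
    size_t n = 0;
    unsigned char c;

    while (n < size && (c = table[from[n]]) != 0) {
        to[n] = c;
        n++;
    }
    return (n);
}

/* Example table: identity, lower case to upper case, newline stops. */
void
build_upcase_table(unsigned char table[256])
{
    int i;

    for (i = 0; i < 256; i++)
        table[i] = (unsigned char)i;
    for (i = 'a'; i <= 'z'; i++)
        table[i] = (unsigned char)(i - 'a' + 'A');
    table['\n'] = 0;
}
```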


2.11. ldterm_do_ioctl()

This function contains the processing for the EUC_WSET and EUC_WGET commands.

For EUC_WSET, it checks whether correct information has been provided to the
module. If so, it tries to determine whether this EUC locale's codesets have
any multibyte and/or multicolumn characters. Depending on the analysis,
't_maxeuc' is set to MB_CUR_MAX of the current locale and the TS_MEUC flag is
set in 't_state'. Then, if 't_maxeuc' is bigger than 1 and 't_state' has
TS_MEUC set, it allocates CANBSIZ (256) bytes of memory at 't_eucp_mp' and
sets 't_eucp' accordingly. It then passes a new ioctl command, EUC_WSET,
downstream. If everything goes all right, it ACKs the ioctl; otherwise, it
NACKs it.
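
The width analysis for EUC_WSET could be sketched as below. The function
name analyze_widths() is hypothetical, and the layout of the width data is
assumed here to be two four-entry arrays indexed by codeset number (byte
widths and column widths), with the byte widths taken to be the total number
of memory bytes per character.

```c
/*
 * Examine the per-codeset widths for supplementary codesets 1..3.
 * Returns the largest byte width (at least 1, for the primary
 * codeset) and sets *multi to 1 if any supplementary codeset is
 * multibyte or multicolumn, else 0.
 */
unsigned char
analyze_widths(const unsigned char eucw[4], const unsigned char scrw[4],
    int *multi)
{
    unsigned char maxw = 1;
    int i;

    *multi = 0;
    for (i = 1; i <= 3; i++) {
        if (eucw[i] > maxw)
            maxw = eucw[i];
        if (eucw[i] > 1 || scrw[i] > 1)
            *multi = 1;
    }
    return (maxw);
}
```

In terms of the memo, the return value corresponds to what is stored in
't_maxeuc', and a nonzero *multi corresponds to setting TS_MEUC.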

For EUC_WGET, it copies the existing 'eucwioc' to the buffer provided and
then ACKs the ioctl.


2.12. ldterm_euc_erase()

This function does erase by using 't_eucp' contents.


2.13. ldterm_eucwarn()

This function does nothing but increment the warning count. It is only for
debugging.


2.14. cp_eucwioc()

This function copies given 'eucioc_t' structure to the provided space.


2.15. ldterm_memwidth()

This function takes the first byte of an EUC character and provides the byte
length of the given EUC character.


2.16. ldterm_dispwidth()

This function takes the first byte of an EUC character and provides
the number of column positions required for the given EUC character.


2.17. ldterm_codeset()

This function takes the first byte of an EUC character and returns its
corresponding EUC codeset. As always, the primary codeset's number is 0.
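
The three helpers described in 2.15 through 2.17 all classify a character by
its first byte. The sketch below shows one way to do this using the standard
EUC single-shift values (SS2 = 0x8e, SS3 = 0x8f). The sketch_* names and the
plain width arrays indexed by codeset number are assumptions made for the
illustration; the 'mode' argument of ldterm_dispwidth() and any treatment of
control characters are left out.

```c
#define SKETCH_SS2 0x8e   /* single shift 2: introduces codeset 2 */
#define SKETCH_SS3 0x8f   /* single shift 3: introduces codeset 3 */

/* Codeset number (0..3) selected by the first byte of a character. */
int
sketch_codeset(unsigned char c)
{
    if ((c & 0x80) == 0)
        return (0);         /* primary (ASCII) codeset */
    if (c == SKETCH_SS2)
        return (2);
    if (c == SKETCH_SS3)
        return (3);
    return (1);             /* any other byte with the high bit set */
}

/* Byte length of the character that starts with 'c'. */
int
sketch_memwidth(unsigned char c, const unsigned char eucw[4])
{
    int cs = sketch_codeset(c);

    return (cs == 0 ? 1 : eucw[cs]);
}

/* Display columns needed by the character that starts with 'c'. */
int
sketch_dispwidth(unsigned char c, const unsigned char scrw[4])
{
    int cs = sketch_codeset(c);

    return (cs == 0 ? 1 : scrw[cs]);
}
```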