This fast-track was spun off of the CIFS Service (PSARC 2006/715) case
along with:

    PSARC 2007/218 caller_context_t in all VOPs
    PSARC 2007/227 VFS Feature Registration and ACL on Create

Although each of these changes is part of the bigger picture, they have
been broken down into smaller pieces so each gets the attention it
deserves.

Several of these CIFS-related fast-tracks will describe changes to the
signatures of vnode operations (VOPs). In some cases, these fast-tracks
will describe multiple changes to the same VOPs. The project team intends
to put all of these changes into ON in a single putback.

NOTE: The VFSDEF_VERSION number in sys/vfs.h will be bumped from 3 to 4
in order to prevent unbundled file system kernel modules with the old
signatures from loading. Once the unbundled file system modules are
updated with the new signatures and recompiled, they will also pick up
the new VFSDEF_VERSION number and be allowed to load.

Case-insensitive behavior on create and lookup is a fundamental
requirement for CIFS service. Traditionally, Solaris users access files
using case-sensitive create and lookup. Consequently, we need to preserve
the traditional behavior yet allow the CIFS server to request
case-insensitive behavior.

The following are the requirements for changes to the VFS/vnode interface
to support mixed-case (case-sensitive and case-insensitive) behavior in
Solaris.

REQUIREMENTS

1) The CIFS server requires at least one file system type to support
   case-preserving name creation and both case-sensitive (c-s) and
   case-insensitive (c-i) behavior (a.k.a. mixed-behavior) on lookup.

   Solaris will support both the NFS and CIFS protocols. Windows clients
   typically expect case-insensitive behavior while NFS clients typically
   expect case-sensitive behavior.

2) The VOP interface shall provide a way for the caller to specify an
   option for case-insensitive behavior.

   Any given file system volume (i.e., mount-point) may support both
   case-insensitive and case-sensitive behavior. The caller must be able
   to specify the behavior on lookup and create operations. The VOP
   interface must support those requests.

3) An interface is needed to determine if a file system supports requests
   for case-sensitivity, case-insensitivity, or both.

   The CIFS server negotiates with its clients on case behavior and needs
   an interface to query the file system volume on its capabilities.

4) The VOP_LOOKUP interface must return the case-sensitive name in the
   case of a case-insensitive lookup.

   Certain CIFS operations (e.g., TRANS2_QUERY_PATH_INFORMATION) require
   the case-sensitive name that was matched in the lookup operation.

5) When requested, the VOP_READDIR interface must return information for
   each directory entry to indicate if the entry is a case-insensitive
   duplicate of another entry. The VOP_LOOKUP interface must also return
   the same information.

   File creates done by a non-CIFS application may create file names that
   are a case-insensitive duplicate of an existing file name. For example,
   if "foo" exists then a non-CIFS application could create the file
   "FOO", which is a case-insensitive duplicate. This requirement states
   that both VOP_READDIR and VOP_LOOKUP, when requested, must provide the
   caller with an indication that a case-insensitive conflict exists. The
   caller (the CIFS server) may choose to present the name as a "mangled"
   name so that the user in the case-insensitive environment can
   distinguish between c-i conflicting file names.

NON-REQUIREMENTS

1) Case-insensitive emulation for file systems that do not support native
   case-insensitive behavior is not a requirement.

   The initial consumer of case-insensitive behavior is the CIFS client.
   The CIFS server can handle both case-sensitive and case-insensitive
   behavior. As long as we have one file system which supports native
   case-insensitive and case-sensitive behavior (see Req't 1), then there
   is no requirement for emulation.
   Note that we may need this feature in the future (and the interfaces
   we create should not preclude support for emulation).

INTERFACE CHANGES NEEDED

The following outlines the set of changes to the Solaris file system
interface which are needed to support mixed-case (case-sensitive and
case-insensitive) behavior in file systems. This behavior is necessary to
provide CIFS services in Solaris.

CASE BEHAVIOR

There are two forms of case behavior: case-sensitive and
case-insensitive.

Case-sensitive behavior is what Solaris currently supports: When an
object is created with a name, the case of the name is preserved. Objects
with names that only differ in case (e.g., "foo", "Foo", "FOO") can
co-exist in the same directory. Lookups on a particular name can only
succeed if the case matches.

Case-insensitive behavior, as required by the Solaris CIFS server,
follows a particular set of rules. In mixed-mode (volume supports both
c-i and c-s behavior) the case-insensitive rules do not apply to
case-sensitive accesses:

    Object Creation:
        Before a named object is created, the directory must be searched
        for any c-i matches of that name. If there is a c-i match, then
        the create operation must fail with EEXIST.

    Lookup:
        In mixed-mode it is possible for a c-s client to create names
        that are c-i clashes. For example, it may be possible for "foo",
        "Foo", and "FOO" to exist in the same directory. In those cases,
        a c-i lookup of any c-s form of "foo" (e.g., "foo", "FOO", "FoO",
        "fOo", ad nauseam) should always return the same name. In other
        words, if a c-i lookup of "foo" returns "foo", then a c-i lookup
        of "FOO" or "Foo" should also return "foo". It doesn't matter
        what the algorithm is as long as it is consistent.

        In addition, the VOP_LOOKUP() routine must return an indication
        if this file name is a c-i conflict with another file name in the
        same parent directory.
    Readdir:
        Since case-insensitive duplicates may exist in a directory, the
        VOP_READDIR() interface must return an indication if a directory
        entry contains a file name that is a c-i conflict with another
        entry in the same directory.

VFS INTERFACE CHANGES

An interface is required to determine the case behavior of a particular
file system volume. There are three possible modes for behavior: c-s
only, c-i only, and mixed-mode (both c-s and c-i requests are supported).

The user-level interface is pathconf(2). The pathconf(2) interface will
have a new variable added: {CASE_BEHAVIOR}/_PC_CASE_BEHAVIOR. The
pathconf(2) system call will return a value which represents the bitwise
OR of the following flags which indicate the case-behavior that the file
system supports:

    _CASE_INSENSITIVE   This file system supports case-insensitive
                        behavior
    _CASE_SENSITIVE     This file system supports case-sensitive
                        behavior

The fs_pathconf() routine (default routine for VOP_PATHCONF()) will be
modified to support _PC_CASE_BEHAVIOR. The default behavior for file
systems on Solaris is case-sensitive so a call to fs_pathconf() with
_PC_CASE_BEHAVIOR set as the cmd would set the _CASE_SENSITIVE bit.

Any file system that supports anything but case-sensitive-only behavior
is required to use the (new) VFS Feature Registration interfaces to
register its case behavior. File systems that support case-insensitive
behavior must register the VFSFT_CASEINSENSITIVE feature. File systems
that do *not* support case-sensitive behavior must register the
VFSFT_NOCASESENSITIVE feature.
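
As an illustration only, the mapping from registered features to the
_PC_CASE_BEHAVIOR value can be sketched in user-space C. This is not the
fs_pathconf() implementation. The numeric values of _CASE_SENSITIVE and
_CASE_INSENSITIVE are not given in this document, so the CASE_*_BIT and
FEAT_* constants below are placeholders, and case_behavior_bits() is a
hypothetical helper. A file system that disclaims c-s support without
offering c-i support is treated as an invalid registration.

```c
#include <assert.h>

/* Placeholder values (assumptions); the real constants are kernel-defined. */
#define CASE_SENSITIVE_BIT	0x1
#define CASE_INSENSITIVE_BIT	0x2
#define FEAT_CASEINSENSITIVE	0x1u	/* stands in for VFSFT_CASEINSENSITIVE */
#define FEAT_NOCASESENSITIVE	0x2u	/* stands in for VFSFT_NOCASESENSITIVE */

/*
 * Hypothetical helper: derive the case-behavior bitmask from the set of
 * registered features. Returns -1 when the file system would support
 * neither case-sensitive nor case-insensitive behavior.
 */
int
case_behavior_bits(unsigned feats)
{
	int bits = 0;

	/* Only the two placeholder feature bits are understood here. */
	assert((feats & ~(FEAT_CASEINSENSITIVE | FEAT_NOCASESENSITIVE)) == 0);

	if ((feats & FEAT_NOCASESENSITIVE) && !(feats & FEAT_CASEINSENSITIVE))
		return (-1);
	if (!(feats & FEAT_NOCASESENSITIVE))
		bits |= CASE_SENSITIVE_BIT;
	if (feats & FEAT_CASEINSENSITIVE)
		bits |= CASE_INSENSITIVE_BIT;
	return (bits);
}
```

With no features registered the helper reports only the case-sensitive
bit, which matches the default described for fs_pathconf().
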

The following new VFS features will be introduced to support case
behavior:

    #define VFSFT_CASEINSENSITIVE 0x100000002 /* Supports case-insensitive */
    #define VFSFT_NOCASESENSITIVE 0x100000004 /* NOT case-sensitive */
    #define VFSFT_DIRENTFLAGS     0x100000008 /* Supports dirent flags */

Expressed in a truth table form:

----------------------------------------------------------------------------
VFSFT_CASEINSENSITIVE    VFSFT_NOCASESENSITIVE    Case sensitivity
----------------------------------------------------------------------------
off                      off                      sensitive
on                       on                       insensitive
on                       off                      mixed
off                      on                       INVALID

VOP (Vnode Operation) INTERFACE CHANGES

The VOP/fop interfaces that deal with names (lookup, create, remove,
link, rename, mkdir, rmdir, and symlink) need to pass a new flag,
FIGNORECASE, to request c-i behavior as described above. The new flag
will be defined in sys/file.h:

    #define FIGNORECASE 0x80000 /* request case-insensitive lookups */

The readdir fop interface needs to pass a new flag, V_RDDIR_ENTFLAGS, to
request c-i conflict information via the flag field of a modified dirent
structure. The new flag is defined in vnode.h:

    /*
     * Flags for VOP_READDIR
     */
    #define V_RDDIR_ENTFLAGS 0x01 /* request dirent flags */

One dirent flag is being defined:

    #define ED_CASE_CONFLICT 0x10

If set, it implies the entry conflicts, when case is disregarded, with at
least one other entry in the directory.

The file systems that implement c-i behavior are responsible for
following the behaviors described below. Under each fop call is a
description of the behavior with FIGNORECASE set. Otherwise, the behavior
is the same as today.

The lookup (VOP_LOOKUP/fop_lookup) and create (VOP_CREATE/fop_create)
routines already have a flag(s) field. However, lookup requires
additional parameters to return the directory entry flags, direntflagp,
and the real (case-preserved) name, realpnp.

    int fop_lookup(
                  vnode_t *dvp,
                  char *nm,
                  vnode_t **vpp,
                  pathname_t *pnp,
                  int flags,
                  vnode_t *rdir,
                  cred_t *cr,
                  caller_context_t *ctp,  /* See PSARC/2007/218 */
    /* NEW */     int *direntflagp,       /* dirent-specific flags */
    /* NEW */     pathname_t *realpnp);   /* real case-sensitive name */

If FIGNORECASE is set and the file system supports c-i, return the vnode
in **vpp with a c-i name match, if it exists. Note that in the presence
of multiple case versions of a given name (e.g., "foo", "Foo", "FOO"),
the case of the name (nm) passed to fop_lookup does not affect which
case-sensitive name to return. In other words, if a lookup of "foo"
matches the vnode for "Foo", then a lookup of "Foo", "FOO", or "fOo"
should match the same vnode.

If FIGNORECASE is set and the file system does not support c-i, we return
EINVAL.

If FIGNORECASE is set and realpnp is non-NULL, then lookup fills in the
pathname structure with the case-preserved name that was looked up.

Also, if FIGNORECASE is set, direntflagp is non-NULL, and the file system
supports per-directory-entry flags, then the flags will be returned in
*direntflagp. If a file system supports c-i name matching and mixed case
sensitivity, it should also support the per-directory-entry flags.
Supporting the per-directory-entry flags is not mandatory, but if they
are not supported, the consumer of fop_lookup() will have to use
brute-force techniques to determine manually whether there are any case
conflicts within the directory.

As an example, a CIFS server might use the flags to know it needs to
mangle a looked-up name:

    error = VOP_LOOKUP(dvp, dname, &vp, FIGNORECASE, ...., &deflags, NULL);
    if (error == 0 && (deflags & ED_CASE_CONFLICT))
        mname = mangle(dname);

Having the dirent flags available to a fop_lookup() caller also hints at
intriguing possibilities for future flags.
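
The consistency rule for c-i lookups can be illustrated with a small
user-space sketch. It is an assumption-laden model, not how any file
system implements fop_lookup(): ci_lookup() is a hypothetical helper, the
directory is a plain array of names, matching is ASCII-only via POSIX
strcasecmp(), and the tie-break (pick the byte-wise smallest matching
name) is just one arbitrary choice, since any consistent algorithm
satisfies the rule above. ED_CASE_CONFLICT uses the value given in this
document.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>	/* strcasecmp() */

#define ED_CASE_CONFLICT	0x10	/* value taken from this document */

/*
 * Hypothetical helper: case-insensitive lookup of nm in an array of names.
 * Every spelling of a name resolves to the same entry because the
 * byte-wise smallest c-i match is always chosen. If direntflagp is
 * non-NULL it receives ED_CASE_CONFLICT when more than one entry matches.
 * Returns the index of the chosen entry, or -1 if nothing matches.
 */
int
ci_lookup(const char *const *names, size_t n, const char *nm,
    int *direntflagp)
{
	int best = -1;
	int matches = 0;

	assert(names != NULL && nm != NULL);

	for (size_t i = 0; i < n; i++) {
		if (strcasecmp(names[i], nm) != 0)
			continue;
		matches++;
		if (best < 0 || strcmp(names[i], names[best]) < 0)
			best = (int)i;
	}
	if (direntflagp != NULL)
		*direntflagp = (matches > 1) ? ED_CASE_CONFLICT : 0;
	return (best);
}
```

A caller in the position of the CIFS server would mangle the returned
name only when the conflict flag comes back set.
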

    int fop_create(
                  vnode_t *dvp,
                  char *name,
                  vattr_t *vap,
                  vcexcl_t excl,
                  int mode,
                  vnode_t **vpp,
                  cred_t *cr,
                  int flag,               /* New flag: FIGNORECASE */
                  caller_context_t *ctp,  /* See PSARC/2007/218 */
                  vsecattr_t *vsecp)      /* See PSARC/2007/227 */

If FIGNORECASE is set and the file system supports c-i, then the object
is created ONLY IF there is no object that has a c-i name match in the
directory.

If FIGNORECASE is set and the file system supports c-i, but an object
with a c-i name match exists, return EEXIST.

If FIGNORECASE is set and the file system does *not* support c-i, then
fop_create returns EINVAL.

The following naming routines need to have a "flags" field added.

    int fop_remove(
                  vnode_t *dvp,
                  char *nm,
                  cred_t *cr,
                  caller_context_t *ctp,  /* See PSARC/2007/218 */
    /* NEW */     int flags)              /* Takes FIGNORECASE */

If FIGNORECASE is set and the file system supports c-i, remove the name
of the object with a c-i name match, if it exists. Note that in the
presence of multiple case versions of a given name (e.g., "foo", "Foo",
"FOO"), the behavior should be the same as fop_lookup(). That is, the c-i
matching algorithm should be the same across fop_lookup, fop_remove,
fop_rename, and fop_rmdir.

If FIGNORECASE is set and the file system does not support c-i, return
EINVAL.

    int fop_link(
                  vnode_t *tdvp,
                  vnode_t *svp,
                  char *tnm,
                  cred_t *cr,
                  caller_context_t *ctp,  /* See PSARC/2007/218 */
    /* NEW */     int flags)              /* Takes FIGNORECASE */

If FIGNORECASE is set and the file system supports c-i, then the link is
created ONLY IF there is no object that has a c-i name match in the
target directory.

If FIGNORECASE is set and the file system does *not* support c-i, then
return EINVAL.

    int fop_rename(
                  vnode_t *sdvp,
                  char *snm,
                  vnode_t *tdvp,
                  char *tnm,
                  cred_t *cr,
                  caller_context_t *ctp,  /* See PSARC/2007/218 */
    /* NEW */     int flags)              /* Takes FIGNORECASE */

If FIGNORECASE is set and the file system supports c-i, then the object
to be renamed shall match the same name that fop_lookup would match.
That is, the c-i matching algorithm should be the same across fop_lookup,
fop_remove, fop_rename, and fop_rmdir. The same holds for the target
name, if it exists.

If FIGNORECASE is set and the file system does *not* support c-i, then we
return EINVAL.

If FIGNORECASE is set and snm and tnm are a case-insensitive match, the
return value of fop_rename should be zero and no other action should be
performed.

    int fop_mkdir(
                  vnode_t *dvp,
                  char *dirname,
                  vattr_t *vap,
                  vnode_t **vpp,
                  cred_t *cr,
                  caller_context_t *ctp,  /* See PSARC/2007/218 */
    /* NEW */     int flags,              /* Takes FIGNORECASE */
                  vsecattr_t *vsecp)      /* See PSARC/2007/227 */

If FIGNORECASE is set and the file system supports c-i, then the object
is created ONLY IF there is no object that has a c-i name match in the
directory.

If FIGNORECASE is set and the file system does *not* support c-i, then
return EINVAL.

    int fop_rmdir(
                  vnode_t *dvp,
                  char *nm,
                  vnode_t *cdir,
                  cred_t *cr,
                  caller_context_t *ctp,  /* See PSARC/2007/218 */
    /* NEW */     int flags)              /* Takes FIGNORECASE */

If FIGNORECASE is set and the file system supports c-i, remove the name
of the object with a c-i name match, if it exists. Note that in the
presence of multiple case versions of a given name (e.g., "foo", "Foo",
"FOO"), the behavior should be the same as fop_lookup(). That is, the c-i
matching algorithm should be the same across fop_lookup, fop_remove,
fop_rename, and fop_rmdir.

If FIGNORECASE is set and the file system does not support c-i, return
EINVAL.
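
The error rules shared by the name-creating operations (create, link,
mkdir) and the rename no-op rule can be sketched as follows. This is a
user-space model under stated assumptions, not file system code:
ci_create_check() and ci_rename_is_noop() are hypothetical helpers, the
directory is an array of names, matching is ASCII-only via POSIX
strcasecmp(), and the case-sensitive branch models an exclusive create.
FIGNORECASE uses the value given in this document.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>	/* strcasecmp() */

#define FIGNORECASE	0x80000	/* value taken from this document */

/*
 * Hypothetical helper: decide whether nm may be created in a directory.
 * Returns 0 if allowed, EINVAL if c-i was requested but is unsupported,
 * or EEXIST if a matching name is already present.
 */
int
ci_create_check(const char *const *names, size_t n, const char *nm,
    int flags, bool fs_supports_ci)
{
	bool ignorecase = (flags & FIGNORECASE) != 0;

	assert(names != NULL && nm != NULL);

	if (ignorecase && !fs_supports_ci)
		return (EINVAL);
	for (size_t i = 0; i < n; i++) {
		int cmp = ignorecase ?
		    strcasecmp(names[i], nm) : strcmp(names[i], nm);
		if (cmp == 0)
			return (EEXIST);
	}
	return (0);
}

/*
 * Hypothetical helper: true when a rename must succeed without doing
 * anything, i.e. FIGNORECASE is set and snm and tnm are a c-i match.
 */
bool
ci_rename_is_noop(const char *snm, const char *tnm, int flags)
{
	assert(snm != NULL && tnm != NULL);
	return ((flags & FIGNORECASE) != 0 && strcasecmp(snm, tnm) == 0);
}
```

Checking for EINVAL before searching the directory keeps the "file system
does not support c-i" case independent of directory contents.
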

    int fop_readdir(
                  vnode_t *vp,
                  uio_t *uiop,
                  cred_t *cr,
                  int *eofp,
                  caller_context_t *ctp,  /* See PSARC/2007/218 */
    /* NEW */     int flags)              /* Takes V_RDDIR_ENTFLAGS */

If "flags" has V_RDDIR_ENTFLAGS set and the file system supports
per-directory-entry flags, then the uio structure will contain the
following structure:

    typedef struct edirent {
        ino64_t         ed_ino;     /* "inode number" of entry */
        off64_t         ed_off;     /* offset of disk directory entry */
        uint32_t        ed_eflags;  /* per-entry flags */
        unsigned short  ed_reclen;  /* length of this record */
        char            ed_name[1]; /* name of file */
    } edirent_t;

If "flags" has V_RDDIR_ENTFLAGS set and the file system does *not*
support per-directory-entry flags, then fop_readdir() will return EINVAL.

If "flags" does not have V_RDDIR_ENTFLAGS set then the uio structure will
have the traditional dirent_t format.

Note that if the file system supports per-directory-entry flags, then the
file system must set VFSFT_DIRENTFLAGS using the VFS Feature Registration
interface.

Below is a simplified pseudo-code routine that uses the fop_readdir()
interface to rewrite the names of case-insensitive conflicts:

    char **
    readnames()
    {
        ...
        error = VOP_READDIR(vp, &uio, cr, &eof, ct, V_RDDIR_ENTFLAGS);
        if (error)
            return (NULL);

        compute # of entries and alloc rnames

        while entries remain and space left in rnames array {
            // get ed_eflags from edirent structure
            if (ed_eflags & ED_CASE_CONFLICT)
                strcpy(rnames[n++], mangle(ed_name));
            else
                strcpy(rnames[n++], ed_name);
        }
        return (rnames);
    }

    int fop_symlink(
                  vnode_t *dvp,
                  char *linkname,
                  vattr_t *vap,
                  char *target,
                  cred_t *cr,
                  caller_context_t *ctp,  /* See PSARC/2007/218 */
    /* NEW */     int flags)              /* Takes FIGNORECASE */

If FIGNORECASE is set and the file system supports c-i, then the symlink
is created ONLY IF there is no object that has a c-i name match to the
target name in the directory.

If FIGNORECASE is set and the file system does *not* support c-i, then
EINVAL is returned.
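
Returning to fop_readdir(): the variable-length records returned when
V_RDDIR_ENTFLAGS is set can be walked by stepping ed_reclen bytes at a
time. The sketch below is a user-space approximation only.
edirent_sketch_t is a stand-in for edirent_t that uses fixed-width types
in place of ino64_t and off64_t, ed_append() and ed_count_conflicts() are
hypothetical helpers, and the 8-byte record rounding is an assumption
rather than something this document specifies.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ED_CASE_CONFLICT	0x10	/* value taken from this document */

/* Simplified stand-in for edirent_t (assumed fixed-width field types). */
typedef struct edirent_sketch {
	uint64_t	ed_ino;		/* "inode number" of entry */
	int64_t		ed_off;		/* offset of disk directory entry */
	uint32_t	ed_eflags;	/* per-entry flags */
	unsigned short	ed_reclen;	/* length of this record */
	char		ed_name[1];	/* name of file */
} edirent_sketch_t;

#define ED_HDRLEN	offsetof(edirent_sketch_t, ed_name)

/*
 * Hypothetical helper: pack one record into buf at offset off.
 * Returns the offset just past the new record, or off if it does not fit.
 */
size_t
ed_append(char *buf, size_t buflen, size_t off, const char *name,
    uint32_t eflags)
{
	edirent_sketch_t hdr;
	size_t namelen = strlen(name) + 1;	/* include the NUL */
	size_t reclen = (ED_HDRLEN + namelen + 7) & ~(size_t)7;

	assert(buf != NULL && off <= buflen);
	if (reclen > buflen - off || reclen > USHRT_MAX)
		return (off);
	memset(&hdr, 0, sizeof (hdr));
	hdr.ed_eflags = eflags;
	hdr.ed_reclen = (unsigned short)reclen;
	memcpy(buf + off, &hdr, ED_HDRLEN);
	memcpy(buf + off + ED_HDRLEN, name, namelen);
	return (off + reclen);
}

/*
 * Hypothetical helper: count records flagged ED_CASE_CONFLICT in the
 * first len bytes of buf. Stops at the first malformed record.
 */
size_t
ed_count_conflicts(const char *buf, size_t len)
{
	size_t off = 0, conflicts = 0;

	assert(buf != NULL);
	while (len - off >= ED_HDRLEN) {
		edirent_sketch_t hdr;

		/* Copy the header out to avoid unaligned field access. */
		memcpy(&hdr, buf + off, ED_HDRLEN);
		if (hdr.ed_reclen < ED_HDRLEN || hdr.ed_reclen > len - off)
			break;
		if (hdr.ed_eflags & ED_CASE_CONFLICT)
			conflicts++;
		off += hdr.ed_reclen;
	}
	return (conflicts);
}
```

A consumer that mangles names would perform its rewrite at the point
where this sketch increments the counter.
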

CASE-INSENSITIVE BEHAVIOR SUPPORT IN ZFS FILE SYSTEMS

We will modify ZFS so that a ZFS file system can support the
case-insensitive behaviors required by the Solaris CIFS server.

MAN PAGE MODIFICATIONS

----------------------------------------------------
The following changes apply to the zfs(1M) man page:
----------------------------------------------------

The following three properties cannot be changed once the file system has
been created, and so should be set at file system creation time. If not
set in the "zfs create" command these properties will be inherited from
the parent dataset. If the parent lacks these properties due to having
been created prior to these features being supported, the new file system
will have the default values for these properties.

utf8only = on | off

    This property indicates if the file system should reject file names
    including characters not present in the UTF-8 character code set. If
    this property is explicitly set to "off", the normalization property
    (below) must either not be explicitly set or be set to "none". The
    default value for the "utf8only" property is "off". This property
    cannot be changed once the file system has been created.

normalization = none | formD | formKC

    This property indicates if the file system should perform a Unicode
    normalization of file names whenever two file names are compared, and
    which normalization algorithm should be used. File names are always
    stored unmodified; names are normalized as part of any comparison
    process. If this property is set to a legal value other than "none",
    and the "utf8only" property was left unspecified, the "utf8only"
    property will automatically be set to "on". The default value of the
    "normalization" property is "none". This property cannot be changed
    once the file system has been created.

casesensitivity = sensitive | insensitive | mixed

    This property indicates if the file name matching algorithm used by
    the file system should be case-sensitive, case-insensitive, or allow
    a combination of both styles of matching. The default value for the
    "casesensitivity" property is "sensitive". Traditionally, UNIX and
    POSIX file systems have case-sensitive file names.

    The "mixed" value for the "casesensitivity" property indicates the
    file system can support requests for both case-sensitive and
    case-insensitive matching behavior. Currently case-insensitive
    matching behavior on a file system that supports mixed behavior is
    limited to kernel modules. Accesses from a user process have no means
    to directly request case-insensitive behavior on these file systems.
    A user process can indirectly achieve case-insensitive access,
    though, through an intermediate kernel module explicitly requesting
    case-insensitive behavior, such as a CIFS server.

    When a case-insensitive matching request is made of a "mixed"
    sensitivity file system, the behavior is generally the same as would
    be expected of a purely case-insensitive file system. The difference
    is that a "mixed" sensitivity file system may contain directories
    with multiple names that are unique from a case-sensitive
    perspective, but not unique from the case-insensitive perspective.

    For example, a directory might contain files "foo", "Foo", and "FOO".
    If there is a request to case-insensitively match any of the possible
    forms of "foo" (for example "foo", "FOO", "FoO", "fOo", et cetera),
    one of the three existing files will be chosen as the match by the
    matching algorithm. Exactly which file the algorithm chooses as the
    match is not guaranteed, but what is guaranteed is that the same file
    will be chosen as the match for any of the forms of "foo". The file
    chosen as a case-insensitive match for "foo", "FOO", "foO", "Foo", et
    cetera will always be the same, so long as the directory remains
    unchanged.

    Regardless of the "casesensitivity" property setting, the file system
    will preserve the case of the name specified to create a file.

    The "casesensitivity" property cannot be changed once the file system
    has been created.

The "utf8only", "normalization", and "casesensitivity" properties will
also become new permissions that can be assigned to non-privileged users
via the ZFS delegated administration model PSARC/2006/465.

--------------------------------------------------------------------
The SEE ALSO section of zfs(1M) should also include these references:

    u8_textprep_str(9f)
    u8_strcmp(9f)
    u8_validate(9f)

--------------------------------------------------------------------
The following possible return value must be added to the open(2),
symlink(2), link(2), unlink(2), rename(2), mkdir(2), and rmdir(2) man
pages:
--------------------------------------------------------------------

    EILSEQ      The path argument includes non-UTF8 characters and the
                file system only accepts file names where all characters
                are part of the UTF-8 character codeset.

EXPORTED INTERFACES

                        |Proposed       |Specified      |
                        |Stability      |in what        |
Interface Name          |Classification |Document?      | Comments
===============================================================================
VFSFT_CASEINSENSITIVE   |Consolidation  |This           | VFS Feature #define
VFSFT_NOCASESENSITIVE   |Private        |Document       |
VFSFT_DIRENTFLAGS       |               |               |
------------------------+---------------+---------------+----------------------
FIGNORECASE             |Contracted     |This           | file.h #define
                        |Consolidation  |Document       |
                        |Private        |               |
------------------------+---------------+---------------+----------------------
V_RDDIR_ENTFLAGS        |Contracted     |This           | vnode.h #define
                        |Consolidation  |Document       |
                        |Private        |               |
------------------------+---------------+---------------+----------------------
edirent_t               |Contracted     |This           | Structure returned
                        |Consolidation  |Document       | within uio buf of
                        |Private        |               | readdir() caller
                        |               |               | requesting
                        |               |               | V_RDDIR_ENTFLAGS
------------------------+---------------+---------------+----------------------
ED_CASE_CONFLICT        |Contracted     |This           | extdirent.h #define
                        |Consolidation  |Document       |
                        |Private        |               |
                        |               |               |
------------------------+---------------+---------------+----------------------
VOP_LOOKUP, fop_lookup  |Contracted     |This           |New output parameters
                        |Consolidation  |Document       |int *direntflagp
                        |Private        |               |pathname_t *realpnp
                        |               |               |
VOP_CREATE, fop_create, |               |               |New input parameter
VOP_REMOVE, fop_remove, |               |               |int flag
VOP_LINK, fop_link,     |               |               |
VOP_RENAME, fop_rename, |               |               |
VOP_MKDIR, fop_mkdir,   |               |               |
VOP_RMDIR, fop_rmdir,   |               |               |
VOP_READDIR,fop_readdir,|               |               |
VOP_SYMLINK,fop_symlink |               |               |
                        |               |               |
------------------------+---------------+---------------+----------------------
{CASE_BEHAVIOR},        |Stable         |This           | pathconf(2) variable
_PC_CASE_BEHAVIOR       |               |Document       | name and value
                        |               |               |
_CASE_INSENSITIVE       |               |               | Bit values for
_CASE_SENSITIVE         |               |               | _PC_CASE_BEHAVIOR
------------------------+---------------+---------------+----------------------
utf8only                |Evolving       |This           | zfs(1M) file system
normalization           |               |Document       | properties
casesensitivity         |               |               |
------------------------+---------------+---------------+----------------------

[ 1.9 Last update: 05/09/07 11:23:40 ]