This document will contain all the information required to design and develop a CIFS client on Solaris. The reader will be able to get a high level understanding of the technical and organizational difficulties involved in creating the client and their proposed resolutions.

1.2 Document's intended audience

The intended audience includes the PSARC review team, the OpenSolaris user community, our Quality Engineering staff and our Technical Writing staff.

1.3 Document References

Requirements: http://jurassic.eng/home/thurlow/work/cifs/cifs_client_prd.html

1.4 Definitions of important terms, acronyms, or abbreviations

Term	Definition
SMB	Server Message Block - a network protocol mainly used by Microsoft Windows computers
CIFS	Common Internet File System - What SMB was renamed to by Microsoft in 1998
BSD	Berkeley Software Distribution - Any Unix derived from a distribution by the University of California, Berkeley, in the 1970s
RPC	Remote procedure call – A protocol allowing one host to call procedures on another.
RAP	Remote Administration Protocol – A Microsoft protocol allowing remote computer access.
ACL	Access control list – A list whose criteria provide controlled access to computer files.
IDL	Interface description language – A language that defines interfaces.
DES	Data Encryption Standard, a method of encryption.
NTLM	NT LAN Manager – A family of Microsoft challenge/response authentication protocols, one of the mechanisms used by Integrated Windows Authentication (IWA)
UCS-2	Fixed-length (16 bits) subset of UTF-16, able to represent the basic multilingual plane only
cp437	Code page 437 - the original character set of the IBM PC, circa 1981
UTF-8	8-bit Unicode Transformation Format
UTF-16	16-bit Unicode Transformation Format
NetBIOS	Network Basic Input/Output System – A networking protocol originally developed for small networks.
TCP	Transmission Control Protocol – A core internet protocol allowing computers to exchange packets.
DCE	Distributed Computing Environment, a specification from The Open Group

1.5 Summary/Abstract of document contents

This document describes the scope of this project, which is to deliver a standard virtual file system on the Solaris OS by designing a CIFS client that accesses files and directories on a CIFS server. Specifically, the document describes the user experience, the architecture, the design, the technical problems and resolutions, and the deliverables for the project.

2 Project Overview

This project will deliver a standard Solaris virtual file system which implements a CIFS client, providing access to files and directories on CIFS servers. The implementation on Solaris will be the result of a port from the smb version 217.2 package on Darwin 8.2, Darwin being a BSD variant underpinning MacOS X. A relatively large range of basic CIFS functionality is included in this code, so the main task is to get it ported to Solaris and running well. The Darwin package smb-217.2 has had an Open Source Code Review.

2.1 What is CIFS?

The CIFS protocol (Common Internet File System, a.k.a. SMB) is the native file sharing protocol on Microsoft Windows machines, and is implemented by Samba on Unix/Linux. Using a CIFS client, users can mount remote CIFS server shares (directories) to get read-write access to previously inaccessible files.

2.2 The user experience

A user wishing to use CIFS shares should normally find the packages SUNWsmbfsr and SUNWsmbfsu installed in their default package cluster.

There are 2 user scenarios that describe the CIFS client user model:

1. Browsing shares on a Windows/CIFS server

2. Mounting one or more shares for direct access through standard Solaris file utilities, like 'ls', 'cat', 'vi', etc.

2.2.1 Browsing

The user would use the new 'smbutil(1)' binary to browse shares on a CIFS server as follows:

% smbutil view //[<workgroup>;][<user>[:<password>]@]<server>

For example, browsing the shares on server 'nano':

bash-3.00# smbutil view //root@nano 
Password: 
Share       Type        Comment 
------------------------------- 
ipc$        pipe        IPC Service (Samba Server) 
tmp         disk        Temporary file space 
public      disk        Public Stuff
root        disk        Home Directories

4 shares listed from 4 available

2.2.2 Mounting

When the user wants to mount a CIFS share, they will do it in one of two ways:

- Manual mount via mount_smbfs(1M):

# mount -F smbfs //[<workgroup>;][<user>[:<password>]@]<server>/<share> /<path>

For example, mounting the share 'public' as user 'foo', on server 'nano', would give:

# mount -F smbfs //foo@nano/public /PUBLIC

- Via an automounter map entry:

# tail -1 /etc/auto_direct

/PUBLIC -fstype=smbfs //nano/public

% ls /PUBLIC

bin docs
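The resource names accepted by smbutil and mount_smbfs all follow the pattern //[<workgroup>;][<user>[:<password>]@]<server>[/<share>]. As an illustration only, the following C sketch splits such a name into its parts. The struct smb_unc type, its field sizes and smb_parse_unc() are hypothetical and are not the libsmb.so parser; the sketch also does not handle ';' or '/' inside a password.

```c
#include <string.h>

/*
 * Hypothetical parsed form of
 * //[<workgroup>;][<user>[:<password>]@]<server>[/<share>]
 */
struct smb_unc {
	char workgroup[64];
	char user[64];
	char password[64];
	char server[256];
	char share[256];
};

/* Copy n bytes into a fixed-size, NUL-terminated field; -1 if too long. */
static int
copy_field(char *dst, size_t dstsz, const char *src, size_t n)
{
	if (n >= dstsz)
		return (-1);
	memcpy(dst, src, n);
	dst[n] = '\0';
	return (0);
}

/* Returns 0 on success, -1 if the name is malformed or a field is too long. */
int
smb_parse_unc(const char *unc, struct smb_unc *out)
{
	const char *p, *end, *semi, *at = NULL, *colon, *slash, *q;

	memset(out, 0, sizeof (*out));
	if (strncmp(unc, "//", 2) != 0)
		return (-1);
	p = unc + 2;
	slash = strchr(p, '/');
	end = (slash != NULL) ? slash : p + strlen(p);

	/* Optional "<workgroup>;" prefix. */
	semi = memchr(p, ';', (size_t)(end - p));
	if (semi != NULL) {
		if (copy_field(out->workgroup, sizeof (out->workgroup), p,
		    (size_t)(semi - p)) != 0)
			return (-1);
		p = semi + 1;
	}

	/* Optional "<user>[:<password>]@": the last '@' precedes the server. */
	for (q = p; q < end; q++)
		if (*q == '@')
			at = q;
	if (at != NULL) {
		colon = memchr(p, ':', (size_t)(at - p));
		if (colon != NULL) {
			if (copy_field(out->user, sizeof (out->user), p,
			    (size_t)(colon - p)) != 0 ||
			    copy_field(out->password, sizeof (out->password),
			    colon + 1, (size_t)(at - colon - 1)) != 0)
				return (-1);
		} else if (copy_field(out->user, sizeof (out->user), p,
		    (size_t)(at - p)) != 0) {
			return (-1);
		}
		p = at + 1;
	}

	/* The server name is mandatory. */
	if (p == end || copy_field(out->server, sizeof (out->server), p,
	    (size_t)(end - p)) != 0)
		return (-1);

	/* Optional "/<share>"; anything after a further '/' is ignored. */
	if (slash != NULL) {
		const char *s = slash + 1;
		const char *send = strchr(s, '/');
		size_t n = (send != NULL) ? (size_t)(send - s) : strlen(s);

		if (copy_field(out->share, sizeof (out->share), s, n) != 0)
			return (-1);
	}
	return (0);
}
```

For example, smb_parse_unc("//foo@nano/public", &u) would leave "foo" in u.user, "nano" in u.server and "public" in u.share.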

Once the CIFS share has been mounted, users should be able to use ordinary Solaris tools to access the files. For the above scenario an example would be:

% soffice8 /PUBLIC/docs/somedoc.ods

CIFS does not support the same functionality as NFS on Solaris; a list of the expected behavioral differences can be found in section 2.2.4, Behavioural differences of filesystem commands.

As usual in Solaris, you would use umount(1M) to unmount a share that had been manually mounted and you would see automounted shares automatically unmounted after a period of inactivity. If the server and client are part of an Active Directory domain, the user will have credentials acquired at login and they would be able to access a mounted CIFS share without a password. If the share requires a password, the user would need to use 'smbutil(1)' to enter their password for the share before they could successfully access the share:

# smbutil login //[<workgroup>;][<user>[:<password>]@]<server>/<share>

For example:

% smbutil login //jones@nano/public
Password:

2.2.3 Nautilus

Nautilus integration will not be present in the initial release. However, we intend to provide early access to our bits to the Nautilus team and work with them to ensure integration in their project.

2.2.4 Behavioural differences of filesystem commands

No hard links - link(2) will return ENOSYS
No symlinks - symlink(2) will return ENOSYS
No byte-range locks - fcntl(2) lock operations will be arbitrated locally (MI_LLOCK)
No ACLs set/get - GETACL with acl(2) and facl(2) will return a fabricated ACL like pcfs(7FS), SETACL will return ENOSYS
No device nodes - mknod(2) will return ENOSYS
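Applications that may run on smbfs should therefore treat ENOSYS from calls such as link(2) as "not supported here" rather than as a hard failure. A minimal sketch follows; try_hard_link() is a hypothetical helper name, and treating ENOTSUP the same way is an extra assumption for other filesystems, not something this design specifies.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/*
 * Try to create a hard link.
 * Returns 1 if the link was made, 0 if the filesystem reports that hard
 * links are unsupported (ENOSYS, as smbfs is expected to return; ENOTSUP
 * is treated the same way by assumption), and -1 for any other error,
 * with errno left as set by link().
 */
int
try_hard_link(const char *existing, const char *newpath)
{
	if (link(existing, newpath) == 0)
		return (1);
	if (errno == ENOSYS || errno == ENOTSUP)
		return (0);
	return (-1);
}
```

A caller that gets 0 back could, for instance, fall back to copying the file instead.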

We will observe the behaviour of our client prototype to see if it presents any large surprises to user expectations.

2.3 High level Design diagram

3 Deliverables

The following deliverables will be putback into Solaris Nevada, and possibly into a Solaris 10 update.

3.1 User Space commands/libraries

3.1.1 smbutil

smbutil is a command-line utility that maps names, lists shares and performs other simple tasks as described in the man page. It links with libsmb.so. We will extend it to accept per-share passwords and pass them to the kernel. The man page is available here (http://jurassic.eng/home/thurlow/work/cifs/smbutil_man_final).

3.1.2 mount_smbfs

mount_smbfs is the smbfs-specific mount utility called by mount(1M). It links with libsmb.so. The man page is available here (http://jurassic.eng/home/thurlow/work/cifs/mount_smbfs_final).

3.1.3 libsmb.so

libsmb.so knows how to map NetBIOS names to IP addresses, and contains the DCE RPC and RAP code to perform some administration activities with the server (e.g. getting a list of shares). It calls into the netsmb module via ioctl(2). When the user runs either a mount or an smbutil command, ioctl() is called first to resolve the machine/domain names, then to negotiate the protocol, and finally to perform session setup and a tree connect.

3.2 Kernel modules

3.2.1 smbfs

smbfs is the Solaris virtual filesystem module with the vfsops and vnodeops. It will be patterned on the NFS Version 2 code in structure, and the CIFS-specific behavior will be ported in from Darwin. It calls into the netsmb module.

3.2.2 netsmb

The netsmb module is somewhat similar to NFS's RPC layer. It forms packets, and sends and receives them according to the CIFS/SMB protocol.

3.3 Binary list

The following binaries will be delivered:

SUNWsmbfsu:

/usr/lib/fs/smbfs/mount_smbfs

/usr/lib/smbfs/libsmb.so

/usr/bin/smbutil

SUNWsmbfsr:

/kernel/fs/smbfs

/kernel/fs/sparcv9/smbfs

/kernel/fs/amd64/smbfs

/kernel/drv/nsmb

/kernel/drv/sparcv9/nsmb

/kernel/drv/amd64/nsmb

3.4 Areas of Overlap

CIFS functionality will also be introduced into Solaris by the Pebble Beach project's CIFS server. Potential areas of overlap include NetBIOS, MS-RPC and the fact that the CIFS redirector in their code base acts like a limited-purpose client to communicate with other CIFS servers. We have not fully evaluated the overlap in any of these areas.

3.5 Other changes

We have not identified specific changes to 'snoop', but we plan to make any required changes as needed, to file bugs about the shortcomings, and to co-deliver the fixes with this project.

4 Interfaces

4.1 Exported interfaces

Interface	Classification	Comments
smbutil	Evolving	See man page (http://jurassic.eng/home/thurlow/work/cifs/smbutil_man_final)
mount_smbfs	Evolving	See man page (http://jurassic.eng/home/thurlow/work/cifs/mount_smbfs_final)
libsmb.so	Project Private	Detailed design here
smbfs	Consolidation Private	Detailed design here
netsmb	Project Private	Detailed design here

4.2 Imported interfaces

Interface	Classification	Comments
sockfs calls	Consolidation Private	See Section 8.5.2.3

5 Design Considerations

5.1 Assumptions and Dependencies

We assume that there will be no interesting functional overlap with the CIFS server port in progress based on the Procom code base

We assume that this project will be suitable to backport to a Solaris 10 Update

We assume that the requirements and design are changeable, with appropriate review, as we learn new information

We assume that most users of the CIFS client will be accessing files on Windows machines and will not require access to Unix-specific objects such as symbolic links and devices

5.2 Goals and Guidelines

The KISS principle "Keep it simple Solaris!"
Organize design so we can develop with the OpenSolaris community
Limit initial functionality to basics and don't over promise
Product will work seamlessly with existing Solaris commands

5.3 Development Methods

5.3.1 Description of methodology

In porting kernel module source from a BSD release, we have had to make choices about how to proceed. We wanted to create a code base that would be a maintainable part of Solaris, as we expect this code base to be extended to improve functionality in future, and we also want sustaining engineering to be as straightforward as possible. In that light, we rejected the option of using #defines to try to create a Darwin look-alike environment within Solaris, since that would not be familiar to future engineering staff. We instead want to write a top-level Solaris layer which replicates what other Solaris virtual filesystems do, and to port CIFS/SMB-specific logic and protocol code into that framework. This will limit our ability to apply simple diffs from the Darwin code base to our Solaris code, but will make it easier to find the home-grown bugs that we will introduce during the port, and will make the code more approachable for people with Solaris expertise.

5.3.2 Comparison of Solaris and Darwin interfaces

For sectionable areas of code such as vfsops and vnodes we have created interface tables for their functions in Solaris and functions in Darwin and analyzed how they compare to each other. Similarities and differences are noted and solutions devised which will allow code written for Darwin to interface with Solaris.

5.3.3 Re-use and adaptation of existing Solaris code

The nfsv2 code will be used as a basis for the filesystem operations portion of smbfs. It will be pared down, and the appropriate smbfs data structures and functions will be added to it.

5.3.4 Porting existing Darwin smbfs code

Darwin is a Unix-type OS forked from 4.4 BSD. Darwin's kernel is based on a combination of Mach 3.0, BSD and some proprietary functionality. SMBFS was taken from the NetBSD code and ported to Darwin, and extra functionality such as KeyChains, Active Directory support and GUI support was added to it.

We started with Darwin's SMBFS code (version smb-217.2) and, in the process of porting it to Solaris, changed the code related to mbufs/memory allocation, sleep/wakeup mechanisms, network operations, and the general driver interface for handling minor nodes and related device operations.

In the user context code, we are not going to port the IDL compiler code for compiling RPC stubs. Right now the only functionality requiring RPC/RAP is listing the shares on a CIFS server, and the previously generated stubs included in the codebase provide this functionality.

By choosing Darwin's existing functional code base for smb-217.2, we will avoid much development and debugging time as the code has had a lot of soak time in active usage.

5.3.5 Debugging via Tracing input and output on the wire

All the packets sent over the wire can be traced using Ethereal, which understands the SMB protocol. Ethereal shows the packet trace, each parameter, and the values sent over the wire. So a packet captured from a Windows/CIFS client talking to a CIFS server (Windows/Samba) can be compared with the corresponding packet generated by our CIFS client, to find out whether we differ in any parameters or values.

6 Security Considerations

6.1 Authentication

CIFS/SMB has had several different authentication models over time and many are still supported. Simple password schemes were used in the early life of the protocol, and as security awareness has grown, the quality and quantity of authentication methods have improved, culminating with the use of Kerberos authentication in Active Directory environments. With password-based schemes, the CIFS client will accept a password via the "login" subcommand of smbutil(1); the password will be passed to the kernel via a new ioctl() and stored per-user and per-share in kernel memory until needed for authentication. To better deal with sites where user accounts have synchronized passwords, we will investigate implementing a PAM module to use the ioctl() mechanism to save the user's password for future use.

Configuration settings for minimum permissible security model can be set for either the whole system via SMF, or for an individual user via $HOME/.nsmbrc. Policy is applied in such a way that the highest applicable level of authentication listed in either source is used.
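The "highest applicable level wins" rule can be sketched as follows. The ordering of the levels in enum smb_auth_level and all function names are hypothetical assumptions made for illustration; the real settings would come from SMF and $HOME/.nsmbrc.

```c
/* Hypothetical authentication levels, ordered weakest first. */
enum smb_auth_level {
	SMB_AUTH_ANON = 0,
	SMB_AUTH_PLAINTEXT,
	SMB_AUTH_LM,
	SMB_AUTH_NTLM,
	SMB_AUTH_NTLMV2,
	SMB_AUTH_KERBEROS
};

/* The stricter (higher) of the system-wide and per-user minimums applies. */
enum smb_auth_level
smb_effective_minimum(enum smb_auth_level system_min,
    enum smb_auth_level user_min)
{
	return (system_min > user_min ? system_min : user_min);
}

/*
 * Nonzero if the negotiated level satisfies the effective minimum;
 * otherwise the client would close the connection with the server.
 */
int
smb_auth_acceptable(enum smb_auth_level negotiated,
    enum smb_auth_level system_min, enum smb_auth_level user_min)
{
	return (negotiated >= smb_effective_minimum(system_min, user_min));
}
```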

The security model used to access a given CIFS share depends on both the "dialect" negotiated in the "Negotiate" exchange and the administrative choices made for the share by the admin, which are discovered in the "Tree Connect" exchange. If a negotiated security model is too weak per the preferences files on the client, the client will close the connection with the server. The following is a listing of the authentication models; all will be supported in the CIFS client except as noted.

6.1.1 Anonymous and Guest login

It is possible to send a "Session Setup" packet with a null username and password; this was sometimes used for shares that anyone was welcome to read. It is not clear that modern CIFS servers can be set to accept this authentication for actual file sharing, but it is still commonly used for enumerating shares via RAP.

An alternate way to handle world-readable shares is to use the username "GUEST" without a password. Modern CIFS servers can be set to allow such a login, but typically such access is disabled to prevent this distinct user account from being abused.

6.1.2 Plaintext passwords

Modern CIFS servers can be set to permit plaintext passwords with simple ASCII encoding, though this is never the default because the password can be revealed so easily. If a server is set up this way, we would need to supply the plaintext password or decline to work with that CIFS share. A useful distinction is between user-level security and share-level security. With user-level security, the client supplies a username together with that user's own password. Since a rogue user could set a trap to capture user passwords this way, we will not support plaintext passwords unless the user has specified the "-i" flag to the smbutil(1) "password" subcommand and there is no system-wide prohibition on this authentication level. With share-level security, a password is associated with just an individual share, so we will permit this if the user has used the "-s" option to the smbutil(1) "login" subcommand.

The Darwin code supports plaintext passwords, but this can be a security risk and is only used with the very oldest dialects of the CIFS protocol. We will disable this mode by default via an SMF property, which will result in the client declining to set up a connection with such a server; it will be possible for the sysadmin to permit plaintext passwords if necessary.

6.1.3 LM Challenge / Response

To get away from sending passwords over the wire, a challenge/response scheme was invented and remains supported. First, the server stores a hash of the user's password, computed by using the upper-cased password as a DES key to encrypt a well-known constant; this stored value is called the "LM Hash". At Session Setup time, the server generates a random string of bytes and sends them to the client as a "challenge". Both the client and server encrypt the challenge bytes via DES, using the LM Hash as the key. The client sends the encrypted result to the server, which accepts it if it matches its own computation. This scheme is not considered robust, both because the usual method of generating the challenge can permit replay attacks and because the specific conversion algorithm for the LM Hash makes dictionary attacks simpler.
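In more detail, as commonly documented (for example in Microsoft's [MS-NLMP] specification), the 16-byte LM Hash is padded with zeros to 21 bytes and split into three 7-byte DES keys, each of which encrypts the 8-byte challenge; the three results form a 24-byte response. The C sketch below shows that key handling. The DES primitive itself is left to a caller-supplied function pointer (des_ecb_fn, a hypothetical hook that could be backed by any DES-ECB implementation), and the function names are illustrative, not taken from the Darwin code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical hook: DES-ECB encrypt one 8-byte block with an 8-byte key. */
typedef void (*des_ecb_fn)(const uint8_t key[8], const uint8_t in[8],
    uint8_t out[8]);

/*
 * Spread 56 key bits (7 bytes) over the high 7 bits of 8 DES key bytes.
 * The low (parity) bit of each byte is left 0; many DES implementations
 * ignore parity.
 */
void
smb_expand_des_key(const uint8_t in[7], uint8_t key[8])
{
	key[0] = in[0] >> 1;
	key[1] = (uint8_t)(((in[0] & 0x01) << 6) | (in[1] >> 2));
	key[2] = (uint8_t)(((in[1] & 0x03) << 5) | (in[2] >> 3));
	key[3] = (uint8_t)(((in[2] & 0x07) << 4) | (in[3] >> 4));
	key[4] = (uint8_t)(((in[3] & 0x0f) << 3) | (in[4] >> 5));
	key[5] = (uint8_t)(((in[4] & 0x1f) << 2) | (in[5] >> 6));
	key[6] = (uint8_t)(((in[5] & 0x3f) << 1) | (in[6] >> 7));
	key[7] = in[6] & 0x7f;
	for (int i = 0; i < 8; i++)
		key[i] = (uint8_t)(key[i] << 1);
}

/*
 * 24-byte response: zero-pad the 16-byte hash to 21 bytes, split it into
 * three 7-byte keys, and DES-encrypt the 8-byte challenge with each.
 */
void
smb_lm_response(const uint8_t hash[16], const uint8_t challenge[8],
    des_ecb_fn des, uint8_t resp[24])
{
	uint8_t padded[21] = { 0 };
	uint8_t key[8];

	memcpy(padded, hash, 16);
	for (int i = 0; i < 3; i++) {
		smb_expand_des_key(padded + 7 * i, key);
		des(key, challenge, resp + 8 * i);
	}
}
```

The NTLM response described next uses the same three-key construction, with the NT hash in place of the LM Hash.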

6.1.4 NTLM Challenge / Response

This upgraded password-based scheme improves on the above by using Unicode (UCS-2LE-encoded), mixed-case passwords and RFC 1320 MD4 hashing, instead of DES with a well-known key, to compute the stored password hash. The result is stronger, but some clients undermine it by sending both an LM response and an NTLM response in the same Session Setup packet.

6.1.5 NTLMv2

NTLMv2 does not appear to be used by default on existing CIFS servers, but modern ones can be configured to use it instead of NTLM. This is not negotiated in any way, so if the server uses this, the client must be configured to use it or must try it in addition to other schemes it tries. It adds some random data and a time stamp to prevent replay attacks and replaces MD4 with HMAC-MD5 (RFC 2104) to make dictionary attacks more expensive, but it still has the same guessability weaknesses as NTLM.
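For reference, the NTLM and NTLMv2 computations can be summarized as below, with $P$ the password, $U$ the user name, $D$ the domain name, $C$ the 8-byte server challenge and $B$ the client "blob" carrying the time stamp and client random data. This is a summary in the spirit of Microsoft's [MS-NLMP] definitions, not taken from the Darwin code, and should be checked against that specification before use. The LM response has the same form as $R_{NTLM}$ with the LM Hash in place of $H_{NT}$. Note that NTLMv2 still starts from the MD4-based NT hash, which is why password guessing remains possible.

```latex
\begin{aligned}
H_{NT}     &= \mathrm{MD4}\big(\mathrm{UCS2LE}(P)\big) \\
R_{NTLM}   &= \mathrm{DES}_{K_1}(C) \,\|\, \mathrm{DES}_{K_2}(C) \,\|\, \mathrm{DES}_{K_3}(C),
              \qquad K_1 \| K_2 \| K_3 = H_{NT} \,\|\, 0^{40} \\
K_{v2}     &= \mathrm{HMAC\text{-}MD5}_{H_{NT}}\big(\mathrm{UCS2LE}(\mathrm{Upper}(U) \,\|\, D)\big) \\
R_{NTLMv2} &= \mathrm{HMAC\text{-}MD5}_{K_{v2}}(C \,\|\, B) \,\|\, B
\end{aligned}
```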

6.1.6 Extended Security and Kerberos

If the CIFS client uses the NTLM 0.12 protocol dialect and specifies the CAP_EXTENDED_SECURITY capability in the Negotiate exchange, modern CIFS servers agree to use Extended Security, which was introduced as a very different basis for authentication. The server response to the Negotiate includes a security blob with an "SPNEGO initiate" packet, which the client returns with a credential in the Session Setup request. The Session Setup will complete with the server picking a security flavor, which is usually Kerberos but could also be another challenge/response method. Once the Session Setup is complete, the client and the server are authenticated; the integrity of the TCP connection is assumed to be sufficient to secure CIFS/SMB traffic.
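The client-side check described above can be sketched as a small test on the Negotiate response. The CAP_EXTENDED_SECURITY value (0x80000000) and the "NT LM 0.12" dialect string are as defined for CIFS; the function name smb_use_extended_security() is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Capability bit returned by the server in the Negotiate response. */
#define	CAP_EXTENDED_SECURITY	0x80000000u

/*
 * Extended Security (SPNEGO security blobs) is used only when the
 * "NT LM 0.12" dialect was negotiated and the server advertised
 * CAP_EXTENDED_SECURITY; otherwise a challenge/response scheme applies.
 */
int
smb_use_extended_security(const char *dialect, uint32_t server_caps)
{
	return (strcmp(dialect, "NT LM 0.12") == 0 &&
	    (server_caps & CAP_EXTENDED_SECURITY) != 0);
}
```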

When used with Active Directory, Kerberos authentication permits true single-sign-on, where the login password permits you to access CIFS shares without typing further passwords. Solaris has the infrastructure to work with AD's Kerberos and LDAP implementations, and the "secret sauce" authorization information returned with the initial Kerberos ticket is only needed by CIFS servers, so we are optimistic that this will be supportable. The proposed approach for this is to uncomment the code and see what undefined symbols we need, and then to see if we can resolve them readily from either Solaris or Darwin sources. This will not be particularly easy work to estimate correctly.

6.2 Multi-user mounts

In Darwin, the smbfs module assumes that a mount will only be used by one user. This is not acceptable for Solaris. We will break this assumption by managing multiple SMB sessions per mounted smbfs filesystem, and finding the correct session for the user before doing over-the-wire SMB operations. A mount of an smbfs filesystem will be possible without valid authentication information present, in which case the Session Setup and the I/O will fail with EPERM until a password is provided by smbutil or a Kerberos ticket can be accessed. This will permit mounts from /etc/vfstab and via the automounter to be performed, but will not permit data access after a reboot until the user's credentials are available. This appears to be no worse than current Kerberos-enabled NFS behavior. Note that anonymous logins, usually used for read-only access to data, will work without credentials as expected.
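The per-user session selection could look roughly like the sketch below. The structures and smb_session_for() are hypothetical; the real code would take the uid from the caller's credential (e.g. with crgetuid()) and would hold a lock while walking the list. The point is only that the lookup is keyed by the calling user and that a missing or unauthenticated session yields EPERM. Anonymous mounts, which would use a shared credential-free session, are not shown.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical SMB session owned by one local user. */
struct smb_session {
	uid_t			ss_uid;			/* owning local user */
	int			ss_authenticated;	/* Session Setup done? */
	struct smb_session	*ss_next;
};

/* Hypothetical per-mount state: one list of sessions per smbfs mount. */
struct smb_mount {
	struct smb_session	*sm_sessions;
};

/*
 * Find the caller's authenticated session before an over-the-wire
 * operation. Returns 0 and sets *sspp, or EPERM (with *sspp == NULL)
 * until smbutil or a Kerberos ticket supplies credentials.
 */
int
smb_session_for(struct smb_mount *mp, uid_t uid, struct smb_session **sspp)
{
	struct smb_session *ssp;

	*sspp = NULL;
	for (ssp = mp->sm_sessions; ssp != NULL; ssp = ssp->ss_next) {
		if (ssp->ss_uid == uid && ssp->ss_authenticated) {
			*sspp = ssp;
			return (0);
		}
	}
	return (EPERM);
}
```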

Another issue is password management. For configurations not using Active Directory, it is common practice with CIFS/SMB to supply the username, and sometimes the password, as part of the UNC resource name. These UNC names are accepted by 'mount_smbfs', which can also prompt for a password. We need an alternate way for the user to enter the password in order to implement the multi-user mounts discussed above. On Darwin, it is also possible to use 'smbutil' with a UNC name to put a hashed password into the user's $HOME/.nsmbrc file, so that mount_smbfs can use it later. This will not be supported on Solaris due to rules against putting passwords or their equivalents in regular files. Instead, we will add an ioctl() to /dev/nsmb to permit smbutil to send the server/share/user/password tuple into the kernel, where it will be stored until needed for a Session Setup. These per-zone tables will be stored securely in kernel memory and will not be preserved across a reboot.
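A minimal user-level sketch of such a table follows, assuming a fixed-size array keyed by zone, local uid, server, share and CIFS user name. Everything here (struct smb_pwd_entry, smb_pwd_store(), smb_pwd_lookup(), the sizes) is hypothetical; a kernel version would also need locking and should zero entries when they are discarded.

```c
#include <stddef.h>
#include <string.h>

#define	SMB_PWD_MAX	16	/* hypothetical table size */
#define	SMB_PWD_STRLEN	64	/* hypothetical field size */

/* One stored server/share/user/password tuple (hypothetical layout). */
struct smb_pwd_entry {
	int		pe_inuse;
	int		pe_zoneid;	/* zone of the smbutil caller */
	unsigned int	pe_uid;		/* local user who ran smbutil */
	char		pe_server[SMB_PWD_STRLEN];
	char		pe_share[SMB_PWD_STRLEN];
	char		pe_user[SMB_PWD_STRLEN];	/* CIFS user name */
	char		pe_password[SMB_PWD_STRLEN];
};

static struct smb_pwd_entry smb_pwd_table[SMB_PWD_MAX];

/* Copy a string into a fixed field; -1 if it does not fit. */
static int
pwd_set(char *dst, const char *src)
{
	size_t n = strlen(src);

	if (n >= SMB_PWD_STRLEN)
		return (-1);
	memcpy(dst, src, n + 1);
	return (0);
}

static struct smb_pwd_entry *
pwd_find(int zoneid, unsigned int uid, const char *server,
    const char *share, const char *user)
{
	for (int i = 0; i < SMB_PWD_MAX; i++) {
		struct smb_pwd_entry *pe = &smb_pwd_table[i];

		if (pe->pe_inuse && pe->pe_zoneid == zoneid &&
		    pe->pe_uid == uid && strcmp(pe->pe_server, server) == 0 &&
		    strcmp(pe->pe_share, share) == 0 &&
		    strcmp(pe->pe_user, user) == 0)
			return (pe);
	}
	return (NULL);
}

/* Store or replace a tuple; -1 if the table is full or a field too long. */
int
smb_pwd_store(int zoneid, unsigned int uid, const char *server,
    const char *share, const char *user, const char *password)
{
	struct smb_pwd_entry tmp, *pe;

	memset(&tmp, 0, sizeof (tmp));
	if (pwd_set(tmp.pe_server, server) != 0 ||
	    pwd_set(tmp.pe_share, share) != 0 ||
	    pwd_set(tmp.pe_user, user) != 0 ||
	    pwd_set(tmp.pe_password, password) != 0)
		return (-1);
	tmp.pe_inuse = 1;
	tmp.pe_zoneid = zoneid;
	tmp.pe_uid = uid;

	/* Reuse a matching entry, else take the first free slot. */
	pe = pwd_find(zoneid, uid, server, share, user);
	for (int i = 0; pe == NULL && i < SMB_PWD_MAX; i++)
		if (!smb_pwd_table[i].pe_inuse)
			pe = &smb_pwd_table[i];
	if (pe == NULL)
		return (-1);
	*pe = tmp;
	return (0);
}

/* Password to use at Session Setup time, or NULL if none was supplied. */
const char *
smb_pwd_lookup(int zoneid, unsigned int uid, const char *server,
    const char *share, const char *user)
{
	struct smb_pwd_entry *pe = pwd_find(zoneid, uid, server, share, user);

	return (pe != NULL ? pe->pe_password : NULL);
}
```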

We don't currently believe that we need a way to prompt the user for a password when a share is accessed, but will need usability feedback on this.

6.3 RBAC and privileges

The CIFS client should follow the example of NFS here, as all that should be necessary is to make sure we have secpolicy* checks at appropriate places. The NFSv2 client calls secpolicy_fs_mount(), secpolicy_fs_unmount(), secpolicy_vnode_setattr(), secpolicy_vnode_access() and secpolicy_fs_linkdir(); we likely want all of these in smbfs, though an over-the-wire access check would replace the secpolicy_vnode_access() call.

6.4 Name Resolution

By default, the CIFS client uses gethostbyname() to resolve hostnames, falling back to use NetBIOS name resolution (NBNS) if needed. The use of NBNS can be risky due to past known exploits, so we will add an SMF property to disable the fallback to NBNS; the default setting of the property will permit the fallback so that Solaris clients in Microsoft environments will work out of the box.
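The resolution order can be sketched as below. This user-level example uses getaddrinfo(), the modern replacement for gethostbyname(); the NetBIOS lookup is a caller-supplied hook (nbns_lookup_fn, hypothetical), and the nbns_fallback flag stands in for the proposed SMF property, whose actual name is not specified here.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical NetBIOS name service hook: 0 and *addr filled on success. */
typedef int (*nbns_lookup_fn)(const char *name, struct in_addr *addr);

/*
 * Resolve a server name to an IPv4 address: host name service first,
 * then NBNS only if the fallback is enabled and a hook is available.
 * Returns 0 on success, nonzero on failure.
 */
int
smb_resolve(const char *name, int nbns_fallback, nbns_lookup_fn nbns,
    struct in_addr *addr)
{
	struct addrinfo hints, *res;

	memset(&hints, 0, sizeof (hints));
	hints.ai_family = AF_INET;
	hints.ai_socktype = SOCK_STREAM;
	if (getaddrinfo(name, NULL, &hints, &res) == 0) {
		*addr = ((struct sockaddr_in *)(void *)res->ai_addr)->sin_addr;
		freeaddrinfo(res);
		return (0);
	}
	if (nbns_fallback && nbns != NULL)
		return (nbns(name, addr));
	return (-1);
}
```

With the fallback disabled, a name that the host name service cannot resolve simply fails instead of being broadcast or sent to a NetBIOS name server.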

7 Architectural Strategies

7.1 Use of smb-217.2 from Darwin

We chose to use code from a BSD variant because of licensing which was compatible with the Solaris kernel and the CDDL, and we chose to use Darwin since it seemed like the most relevant and recent of the BSDs. The smbfs version at the time was smb-217.2. We have been tracking changes made since we took our copy of the code.

7.2 Use of PSARC/2005/446 interfaces (uconv functions) which implement codeset conversion

Although the code base of smb217-2 did come with codeset conversion capabilities which we could have ported, we have decided to standardize on the Solaris interfaces providing this capability, minimizing code duplication within the Solaris kernel environment.

7.3 Use of nfsv2 vfsops and vnops code as a re-engineering base

We chose nfsv2 as a base rather than nfsv4 because it is simpler, and therefore easier to pare down into a base for the smbfs code.

7.4 Error detection and recovery

The netsmb module does the majority of its error detection and recovery itself. Typical errors involve the I/O between the CIFS server and the CIFS client, or memory allocation. For all error conditions, the information is logged in the /var/adm/messages file, allocated memory is cleaned up, and the operation fails gracefully.

All the buffers allocated in the netsmb module for the requests and responses exchanged during setup are released once the results have been copied out to user space, before the ioctl() returns. So at user level, if libsmb.so fails for any reason, the program simply exits after printing the error to stdout.

The smbutil command does not have much potential for error. It simply requests an operation and receives the data back from libsmb.so and prints to stdout.

7.5 Memory management policies

7.5.1 Memory allocation

The standard kernel memory allocator calls (kmem_alloc() and friends) will be used; the conversion from Darwin's malloc() family of calls is mostly obvious.

7.5.2 Mbufs vs. mblocks

All the packets for negotiation, session setup, write and read are built in Darwin using mbufs. For Solaris, we will have a conversion layer called MBCHAIN which maps the mbuf operations to operations on data blocks associated with mblocks instead.

The problem with mblks is that they do not support the mbuf-style "previous" and "next" traversal that the Darwin code relies on. Two additional structures, MBCHAIN and MDCHAIN, are therefore declared on top of mblks to provide that traversal. Because these structures are very closely bound to the Darwin SMB code in the netsmb module, implementing them with mblks lets us take advantage of that code without making too many changes.
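To illustrate the idea, here is a user-level sketch of an MBCHAIN-style builder that appends little-endian fields to a chain of fixed-size blocks linked through a b_cont pointer, much as mblks are chained on Solaris. The function names echo FreeBSD's mchain(9) routines (mb_init(), mb_put_uint32le(), ...), but the signatures and structure layouts here are simplified and hypothetical; a kernel version would allocate blocks with allocb(9F) rather than calloc().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define	MB_BLKSIZE	16	/* tiny block size so chaining is easy to see */

/* Stand-in for an mblk: a data block plus a continuation pointer. */
struct blk {
	struct blk	*b_cont;
	size_t		b_len;
	uint8_t		b_data[MB_BLKSIZE];
};

/* Hypothetical mbchain: head block, current tail block, total length. */
struct mbchain {
	struct blk	*mb_top;
	struct blk	*mb_cur;
	size_t		mb_count;
};

int
mb_init(struct mbchain *mbp)
{
	mbp->mb_top = mbp->mb_cur = calloc(1, sizeof (struct blk));
	mbp->mb_count = 0;
	return (mbp->mb_top != NULL ? 0 : -1);
}

/* Append bytes, linking a new block whenever the tail block fills up. */
int
mb_put_mem(struct mbchain *mbp, const void *src, size_t len)
{
	const uint8_t *p = src;

	while (len > 0) {
		struct blk *bp = mbp->mb_cur;
		size_t room = MB_BLKSIZE - bp->b_len;

		if (room == 0) {
			struct blk *nb = calloc(1, sizeof (*nb));

			if (nb == NULL)
				return (-1);
			bp->b_cont = nb;
			mbp->mb_cur = nb;
			continue;
		}
		size_t n = (len < room) ? len : room;
		memcpy(bp->b_data + bp->b_len, p, n);
		bp->b_len += n;
		mbp->mb_count += n;
		p += n;
		len -= n;
	}
	return (0);
}

/* SMB integers go on the wire least significant byte first. */
int
mb_put_uint16le(struct mbchain *mbp, uint16_t x)
{
	uint8_t b[2] = { (uint8_t)(x & 0xff), (uint8_t)(x >> 8) };

	return (mb_put_mem(mbp, b, sizeof (b)));
}

int
mb_put_uint32le(struct mbchain *mbp, uint32_t x)
{
	uint8_t b[4] = { (uint8_t)(x & 0xff), (uint8_t)((x >> 8) & 0xff),
	    (uint8_t)((x >> 16) & 0xff), (uint8_t)(x >> 24) };

	return (mb_put_mem(mbp, b, sizeof (b)));
}

/* Free the whole chain. */
void
mb_done(struct mbchain *mbp)
{
	struct blk *bp = mbp->mb_top, *next;

	while (bp != NULL) {
		next = bp->b_cont;
		free(bp);
		bp = next;
	}
	mbp->mb_top = mbp->mb_cur = NULL;
	mbp->mb_count = 0;
}
```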

7.6 Concurrency and synchronization

7.6.1 Threading

Darwin uses current_thread()/current_proc() interfaces to check something in the current thread or process; these will be replaced with references to curthread and curproc in Solaris.

7.6.2 Thread synchronization primitives

BSD

msleep() and wakeup() are used to handle event-based process blocking. If a process has to wait for an external event, it is put to sleep by msleep(). It is later woken up by a wakeup() call indicating that the resource the process was blocking on is now available.

Solaris

Condition variables are the standard form of thread synchronization on Solaris when a signaling algorithm is being used. They are used together with a mutex, which allows the condition to be checked atomically. cv_wait() and cv_timedwait() put the thread to sleep until the condition variable is signaled (or, for cv_timedwait(), until the timeout expires); cv_wait_sig() also returns if a signal is delivered to the thread. cv_signal() wakes one thread waiting on the condition variable and cv_broadcast() wakes all of them, for example when the resource they are waiting for becomes available.

Darwin Function	Solaris Function
msleep()	cv_wait()/cv_timedwait()/cv_wait_sig()
wakeup()	cv_broadcast() (cv_signal() if only one waiter needs to run)
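The conversion pattern can be shown with POSIX threads, whose mutex/condition variable API mirrors the Solaris kernel's: pthread_mutex_lock(), pthread_cond_wait() and pthread_cond_broadcast() correspond to mutex_enter(), cv_wait() and cv_broadcast(). The struct smb_waiter type is a hypothetical example, not part of the port. The key difference from msleep() is that the waiter re-checks an explicit condition in a loop while holding the mutex.

```c
#include <pthread.h>

/* Hypothetical wait object: a mutex, a condition variable, a predicate. */
struct smb_waiter {
	pthread_mutex_t	w_lock;
	pthread_cond_t	w_cv;
	int		w_done;		/* the condition being waited for */
};

void
smb_waiter_init(struct smb_waiter *w)
{
	pthread_mutex_init(&w->w_lock, NULL);
	pthread_cond_init(&w->w_cv, NULL);
	w->w_done = 0;
}

/* Replaces msleep(): sleep until the condition holds, re-checking it. */
void
smb_waiter_wait(struct smb_waiter *w)
{
	pthread_mutex_lock(&w->w_lock);
	while (!w->w_done)
		pthread_cond_wait(&w->w_cv, &w->w_lock);
	pthread_mutex_unlock(&w->w_lock);
}

/* Replaces wakeup(): change the condition under the lock, then wake all. */
void
smb_waiter_wakeup(struct smb_waiter *w)
{
	pthread_mutex_lock(&w->w_lock);
	w->w_done = 1;
	pthread_cond_broadcast(&w->w_cv);
	pthread_mutex_unlock(&w->w_lock);
}
```

Because the predicate is set under the mutex, a wakeup that happens before the waiter arrives is not lost, unlike a bare sleep/wakeup on a channel.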

7.6.3 Timers

Darwin uses the nanotime() call to get a high-resolution version of the current time. This will be replaced on Solaris by gethrestime().

Darwin uses timespeccmp()/timespecadd()/timespecsub() to deal with time intervals, which are supported in Solaris.
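For reference, the semantics of these interval operations are sketched below. The names ts_add(), ts_sub(), ts_cmp() and ts_deadline_passed() are hypothetical, and the user-level clock_gettime(CLOCK_REALTIME) call stands in for the kernel's gethrestime().

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

#define	SMB_NSEC_PER_SEC	1000000000L

/* r = a + b, keeping tv_nsec in [0, 1e9). */
void
ts_add(const struct timespec *a, const struct timespec *b, struct timespec *r)
{
	r->tv_sec = a->tv_sec + b->tv_sec;
	r->tv_nsec = a->tv_nsec + b->tv_nsec;
	if (r->tv_nsec >= SMB_NSEC_PER_SEC) {
		r->tv_sec++;
		r->tv_nsec -= SMB_NSEC_PER_SEC;
	}
}

/* r = a - b, borrowing a second when the nanoseconds go negative. */
void
ts_sub(const struct timespec *a, const struct timespec *b, struct timespec *r)
{
	r->tv_sec = a->tv_sec - b->tv_sec;
	r->tv_nsec = a->tv_nsec - b->tv_nsec;
	if (r->tv_nsec < 0) {
		r->tv_sec--;
		r->tv_nsec += SMB_NSEC_PER_SEC;
	}
}

/* Negative, zero or positive as a is before, equal to or after b. */
int
ts_cmp(const struct timespec *a, const struct timespec *b)
{
	if (a->tv_sec != b->tv_sec)
		return (a->tv_sec < b->tv_sec ? -1 : 1);
	if (a->tv_nsec != b->tv_nsec)
		return (a->tv_nsec < b->tv_nsec ? -1 : 1);
	return (0);
}

/* Nonzero once the current time has reached the deadline. */
int
ts_deadline_passed(const struct timespec *deadline)
{
	struct timespec now;

	clock_gettime(CLOCK_REALTIME, &now);
	return (ts_cmp(&now, deadline) >= 0);
}
```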

7.6.4 Concurrent access issues

In Windows environments, a file create or open can specify the "share access" to be granted to other processes trying to open the file. A Windows app will often specify FILE_NO_SHARE or FILE_SHARE_READ, respectively to prohibit all competing access or to deny write access to other openers. This is extended to the over-the-wire case by support in the CIFS protocol, which means that two processes doing this cannot accidentally clobber a file's contents with competing writes. The only other protection against this would be byte-range locking, which is not in scope.

In Unix/Linux environments, the open(2) API does not accept an argument to specify this. Matching this, our client's opens will not impose restrictions on other processes (i.e. NT_CREATE_ANDX's ShareAccess is always set to NTCREATEX_SHARE_ACCESS_ALL), and the client will not reopen files to support fcntl()'s F_SHARE semantics. Future work could add support for fcntl()'s F_SHARE interface on files which are already open. The Pebble Beach project plans to add a new open(2) variant which accepts a share mode argument to be applied at open time, and we would like to support it when it is available.
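The check that a server applies can be sketched as follows, using the FILE_SHARE_* bit values from the Win32/CIFS definitions and NTCREATEX_SHARE_ACCESS_ALL as the union of all three. The read/write intent flags and smb_share_conflict() are simplified, hypothetical stand-ins for the server's real access-mask logic (delete access is ignored). Because our client always requests NTCREATEX_SHARE_ACCESS_ALL, its own opens never deny access to others, but they can still be refused when a Windows application holds the file without sharing.

```c
#include <stdint.h>

/* Share-access bits carried in NT_CREATE_ANDX. */
#define	FILE_SHARE_READ		0x00000001u
#define	FILE_SHARE_WRITE	0x00000002u
#define	FILE_SHARE_DELETE	0x00000004u
#define	NTCREATEX_SHARE_ACCESS_ALL \
	(FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE)

/* Hypothetical, simplified access intent of an open. */
#define	WANT_READ	0x1u
#define	WANT_WRITE	0x2u

/* Nonzero if a new open conflicts with an existing one (simplified). */
int
smb_share_conflict(uint32_t old_access, uint32_t old_share,
    uint32_t new_access, uint32_t new_share)
{
	/* The newcomer wants something the existing opener did not share. */
	if ((new_access & WANT_READ) && !(old_share & FILE_SHARE_READ))
		return (1);
	if ((new_access & WANT_WRITE) && !(old_share & FILE_SHARE_WRITE))
		return (1);
	/* The existing opener holds something the newcomer will not share. */
	if ((old_access & WANT_READ) && !(new_share & FILE_SHARE_READ))
		return (1);
	if ((old_access & WANT_WRITE) && !(new_share & FILE_SHARE_WRITE))
		return (1);
	return (0);
}
```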

7.6.5 TIMEOUT support on the port

The SO_RCVTIMEO socket option defines the receive timeout value, which is how long the system waits for calls like sorecvmsg() to complete before the operation times out.

In the smb_iod_recvall() function, the BSD code loops until it has received any pending packets from the network or until the socket's timeout expires. Solaris does not support the timeout value, so our port simply keeps waiting in the loop for incoming packets, on the assumption that more packets are to come.

Support for SO_SNDTIMEO and SO_RCVTIMEO is not yet implemented in the sockfs layer inside the kernel.
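Until sockfs supports the option, a bounded wait can be expressed by polling before reading. The user-level sketch below shows the intended SO_RCVTIMEO-like behavior (failing with EAGAIN when nothing arrives in time) using poll(2); wait_readable() and read_with_timeout() are hypothetical names, and this is not how the kernel port is implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable (or at EOF/error, which the next read reports),
 * 0 on timeout, -1 on error. An EINTR retry restarts the full timeout.
 */
int
wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd;
	int rc;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);
	if (rc < 0)
		return (-1);
	return (rc > 0 ? 1 : 0);
}

/* Like read(2) on a socket with SO_RCVTIMEO: EAGAIN if nothing arrives. */
ssize_t
read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
	int rc = wait_readable(fd, timeout_ms);

	if (rc == 0) {
		errno = EAGAIN;
		return (-1);
	}
	if (rc < 0)
		return (-1);
	return (read(fd, buf, len));
}
```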

7.7 Communication mechanisms

7.7.1 Kernel authorization

Darwin uses the routines suser()/kauth_cred_getuid()/kauth_cred_getgid() to identify the user. We will replace these with calls to crgetuid() and crgetgid() on Solaris. The NetSMB code uses these calls when setting up the connection for a particular user and when authorizing a user. No authentication takes place on the local machine while doing CIFS authentication.

7.7.2 Byte ordering

For the code to be portable, we need to ensure that integers are properly converted to and from the byte order used on the wire. Unlike conventional network byte order, which is big-endian, the CIFS protocol encodes its integers in little-endian order, so we must select functions supporting that conversion. The functions we use are:

Function	Usage
htoles(x)	host to little endian short
letohs(x)	little endian to host short
htobes(x)	host to big endian short
betohs(x)	big endian to host short
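A portable way to get the same effect is to encode and decode fields byte by byte, which works unchanged on big-endian SPARC and little-endian x86 hosts. The helpers below are a hypothetical sketch that operates on byte buffers, unlike the Darwin routines, which convert integer values; the big-endian pair would be needed for big-endian fields such as the NetBIOS session-service length (RFC 1002).

```c
#include <stdint.h>

/* Little-endian (CIFS) fields: least significant byte first. */
static inline void
put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
}

static inline uint16_t
get_le16(const uint8_t *p)
{
	return ((uint16_t)(p[0] | (p[1] << 8)));
}

static inline void
put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

static inline uint32_t
get_le32(const uint8_t *p)
{
	return ((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}

/* Big-endian fields: most significant byte first. */
static inline void
put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)v;
}

static inline uint16_t
get_be16(const uint8_t *p)
{
	return ((uint16_t)((p[0] << 8) | p[1]));
}
```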

Currently, we are using Darwin's implementation of the byte-order routines on Solaris. As equivalent code is also present in Solaris, we should eventually use the Solaris code instead.

7.8 Other topics

7.8.1 UIO routines

Darwin uses these routines to access fields in the uio structure; we will use direct field access in Solaris.

Function	Usage
uio_isuserspace()	Check the uio_segflag and return whether it refers to user space.
uio_iovcnt()	Return the active iovecs for the given uio_t.
uio_resid()	Return the residual IO value for the given uio_t.
uio_curriovbase()	Return base address of current iovec associated with the given uio_t.
uio_curriovlen()	Return the length of the current iovec associated with the given uio_t.
uio_update()	Update the given uio_t with the a_count of completed IO.
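With direct field access, code such as Darwin's uio_resid(uio) simply becomes uio->uio_resid. The one routine with real logic is uio_update(), which consumes completed bytes across the iovec array; a user-level sketch follows, with struct my_uio and my_uio_update() as hypothetical stand-ins for the kernel uio_t (in the kernel, uiomove(9F) already advances the uio as it copies).

```c
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical user-level stand-in for the kernel uio_t. */
struct my_uio {
	struct iovec	*uio_iov;	/* current iovec */
	int		uio_iovcnt;	/* iovecs remaining */
	size_t		uio_resid;	/* bytes still to transfer */
};

/* Equivalent of Darwin's uio_update(): account for n completed bytes. */
void
my_uio_update(struct my_uio *uio, size_t n)
{
	while (n > 0 && uio->uio_iovcnt > 0) {
		struct iovec *iov = uio->uio_iov;
		size_t step = (n < iov->iov_len) ? n : iov->iov_len;

		iov->iov_base = (char *)iov->iov_base + step;
		iov->iov_len -= step;
		uio->uio_resid -= step;
		n -= step;
		/* Move on once the current iovec is exhausted. */
		if (iov->iov_len == 0) {
			uio->uio_iov++;
			uio->uio_iovcnt--;
		}
	}
}
```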

7.8.2 vfs_context_t

vfs_context_t encapsulates the context in which the VFS operation is being performed, which holds the proc and cred structures.

       struct vfs_context {
             proc_t  vc_proc;
             ucred_t vc_ucred;
       };

The use of this structure to pass the context pervades the code; here is where it can be found:

Basic creation and initialization functions:

Function	Comments
smb_scred_init()	Initialize the context.
smb_sigintr()	Calls vfs_context_issignal with SMB_SIGMASK
vfs_context_ucred()	Return the credential structure.
vfs_context_is64bit()	Returns true if the proc is 64-bit.
vfs_context_rele()	Release the context.
vfs_context_issignal()	Return the pending signals if any for that particular process. (ISSIG() may be used.)
vfs_context_create()	Create a new context, initializing it if none was supplied.

Other functions:

       smbfs_vinvalbuf()
       smbi_getattr()
       smbi_setattr()
       smbi_open()
       smbi_close()
       smbi_fsync()
       smbfs_readvdir()
       smbfs_readvnode()
       smbfs_vinvalbuf_internal()
       smbfs_writevnode()
       smbfs_mount()
       smbfs_root()
       smbfs_start()
       smbfs_vfs_getattr()
       smbfs_sync()
       smbfs_unmount()
       smbfs_sysctl()
       smbfs_vget()
       smbfs_fhtovp()
       smbfs_vptofh()
       smb_flushvp()
       smbfs_composeacl()
       smbfs_close()
       smbfs_getattr()
       smbfs_setattr()
       smbfs_read()
       smbfs_write()
       smbfs_fixinheritance()
       smbfs_create()
       smbfs_create0()
       smbfs_remove()
       smbfs_rename()
       smbfs_link()
       smbfs_symlink()
       smbfs_readlink()
       smbfs_mknod()
       smbfs_mkdir()
       smbfs_rmdir()
       smbfs_readdir()
       smbfs_fsync()
       smbfs_pathconf()
       smbfs_ioctl()
       smbfs_advlock()
       smbfs_lookup()
       smbfs_offtoblk()
       smbfs_blktooff()
       smbfs_pagein()
       smbfs_pageout()
       smbfs_setxattr()
       smbfs_listxattr()
       smbfs_removexattr()
       smbfs_getxattr()
       smbfs_open()
       smbfs_reclaim()
       smbfs_inactive()
       smbfs_mmap()
       smbfs_mnomap()

7.8.3 Codeset conversions

In Darwin, the in-kernel iconv module performs the UCS-2 and cp437 conversions, because the path names and file names sent by SMB servers are in Unicode format. The Solaris kernel likewise has to translate these names to the native codeset in order to interpret them, and has to translate names back to Unicode when sending them to the server.

Darwin's smbfs uses UTF-8 <-> UCS-2 iconv conversion if the server returns the CAP_UNICODE capability; otherwise, it uses UTF-8 <-> UCS-2 <-> cp437 conversion.
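A minimal sketch of that selection, keyed on the CAP_UNICODE bit (0x0004) of the capabilities the server returns in its negotiate response. The enum and function names are hypothetical.

```c
#include <stdint.h>

#define CAP_UNICODE 0x00000004u   /* CIFS negotiate capability bit */

/* Hypothetical labels for the two conversion paths described above. */
typedef enum {
    SK_CONV_UTF8_UCS2,            /* server supports Unicode */
    SK_CONV_UTF8_UCS2_CP437       /* fallback for non-Unicode servers */
} sk_conv_path_t;

/* Pick the conversion chain from the server's capability flags. */
static sk_conv_path_t
sk_select_conversion(uint32_t server_caps)
{
    return (server_caps & CAP_UNICODE) ?
        SK_CONV_UTF8_UCS2 : SK_CONV_UTF8_UCS2_CP437;
}
```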

In Solaris, the PSARC/2005/446 interfaces (the uconv functions) implement codeset conversion. However, they do not currently support UCS-2 or cp437. UCS-2 is no longer in common use, and cp437, the original character set of the IBM PC, whose characters all have Unicode equivalents, is only used as a fallback when Unicode is not available.

The uconv functions support the necessary Unicode conversions, but under the name UTF-16 rather than UCS-2. Even Microsoft no longer recommends UCS-2, and it should be replaced with UTF-16. The two are the same except that UTF-16 (which current versions of Microsoft Windows support) can represent the entire Unicode code space of U+0000 to U+10FFFF, whereas UCS-2 supports only the so-called Basic Multilingual Plane (BMP), U+0000 to U+FFFF: the 16-bit range of Unicode rather than the current 21-bit range.

UTF-16 also differs in that, although its code unit is a 16-bit value (uint16_t), it represents characters from U+10000 to U+10FFFF using two 16-bit units (a surrogate pair).
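The surrogate-pair rule can be shown with a small encoder for a single Unicode scalar value; utf16_encode() is an illustrative name, not one of the uconv functions. BMP characters come out exactly as in UCS-2, while U+1F600, for example, becomes the pair 0xD83D 0xDE00, which a UCS-2-only peer cannot represent.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode scalar value as UTF-16 code units.
 * Returns the number of 16-bit units written (1 or 2), or 0 if cp is
 * not a valid scalar value (a surrogate, or beyond U+10FFFF).
 */
static size_t
utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)   /* surrogates are not characters */
        return 0;
    if (cp <= 0xFFFF) {                 /* BMP: identical to UCS-2 */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp <= 0x10FFFF) {               /* supplementary planes */
        cp -= 0x10000;                  /* leaves a 20-bit value */
        out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
        out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
        return 2;
    }
    return 0;
}
```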

In summary, the UTF-8 <-> UCS-2 conversions can be replaced with UTF-8 <-> UTF-16 conversions. As for cp437, we will expect any CIFS server we interact with to support Unicode, and will not support this codeset.

More information on the uconv functions can be found here: http://sac.sfbay.sun.com/Archives/CaseLog/arc/PSARC/2005/446/materials/uconv_functions.9f

7.8.4 Zones

Zones is a virtualization solution for Solaris that supports multiple virtual hosts on a single Solaris kernel instance. Kernel modules must be modified to keep their state in per-zone allocated memory rather than in global variables, and to ensure that threads cannot access resources they should not, or escalate privileges.

The expectation for the CIFS client is that it will work the same way in all zones without behavioural differences.

7.8.4.1 Conversion checklist

Almost all global variables (tunables excepted) must become members of a per-zone globals structure. To check this, compile the code, run 'nm' on the object files, and see which symbols show up as type OBJT.

Module startup functions must call zone_key_create() with a key and three callbacks: an init routine, a shutdown routine and a fini routine. The init and fini routines allocate and free the per-zone globals structure, and the shutdown routine must tell all threads in the zone to stop, which means every thread needs to poll on something through which it can be told to stop.

References to former globals must use zone_getspecific() with the key to get a pointer to the per-zone globals, and then dereference the item needed.
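The pattern can be simulated in user space as follows. sk_zone_key_create(), sk_zone_getspecific(), sk_zone_boot() and sk_zone_halt() are hypothetical stand-ins that only mimic the roles of zone_key_create(), zone_getspecific() and zone boot/halt; they are not the kernel API. The callbacks and struct nsmb_zone_globals are also illustrative, using smb_vcnext as one of the former globals.

```c
#include <stdlib.h>

#define SK_MAX_ZONES 4                  /* hypothetical fixed zone count */

typedef int sk_zoneid_t;

/* Callbacks registered with a key, plus one data pointer per zone. */
struct sk_zone_key {
    void *(*create)(sk_zoneid_t);              /* "init": allocate globals */
    void  (*shutdown)(sk_zoneid_t, void *);    /* tell zone threads to stop */
    void  (*destroy)(sk_zoneid_t, void *);     /* "fini": free globals */
    void  *data[SK_MAX_ZONES];
};

static void
sk_zone_key_create(struct sk_zone_key *key, void *(*create)(sk_zoneid_t),
    void (*shutdown)(sk_zoneid_t, void *), void (*destroy)(sk_zoneid_t, void *))
{
    key->create = create;
    key->shutdown = shutdown;
    key->destroy = destroy;
    for (int z = 0; z < SK_MAX_ZONES; z++)
        key->data[z] = NULL;
}

/* Simulated zone boot and halt, which drive the registered callbacks. */
static void
sk_zone_boot(struct sk_zone_key *key, sk_zoneid_t z)
{
    key->data[z] = key->create(z);
}

static void
sk_zone_halt(struct sk_zone_key *key, sk_zoneid_t z)
{
    key->shutdown(z, key->data[z]);
    key->destroy(z, key->data[z]);
    key->data[z] = NULL;
}

static void *
sk_zone_getspecific(struct sk_zone_key *key, sk_zoneid_t z)
{
    return key->data[z];
}

/* Former netsmb globals, now one copy per zone (subset for illustration). */
struct nsmb_zone_globals {
    int smb_vcnext;
    int shutting_down;                  /* polled by the zone's threads */
};

static void *
nsmb_zone_init(sk_zoneid_t z)
{
    (void) z;
    return calloc(1, sizeof (struct nsmb_zone_globals));
}

static void
nsmb_zone_shutdown(sk_zoneid_t z, void *arg)
{
    (void) z;
    ((struct nsmb_zone_globals *)arg)->shutting_down = 1;
}

static void
nsmb_zone_fini(sk_zoneid_t z, void *arg)
{
    (void) z;
    free(arg);
}

static struct sk_zone_key nsmb_zone_key;

/* What used to be "return smb_vcnext++;" on a global variable. */
static int
nsmb_next_vc(sk_zoneid_t z)
{
    struct nsmb_zone_globals *g = sk_zone_getspecific(&nsmb_zone_key, z);

    return g->smb_vcnext++;
}
```

Each zone gets an independent counter. In the kernel, we would expect the key to be created once at module load and each lookup to pass the current zone.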

Creds, networking setup and the file namespace are all per-zone, and threads must not operate on data structures from other zones. Most routines can simply ASSERT this, but VOP_INACTIVE() can be called during vnode recycling by a thread outside the vnode's zone, so it should hand the work off asynchronously to the correct zone, as NFS does. This is important because zones are used to implement compartmentalized security in the Trusted Solaris Extensions (aka Rampart).

7.8.4.2 Notes on the netsmb module

The netsmb module exports a device node that can be opened and ioctl()'ed to set up connections. Darwin sets up 1024 device nodes to service connections. For Solaris, we will implement a clonable device driver that allocates an smb_dev_state_t structure as needed each time /dev/nsmb is opened, with no small, arbitrary limit per zone or per system.
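A sketch of on-demand per-open state, indexed by minor number in a table that grows as needed instead of being fixed at 1024 entries. The names sk_dev_state_t, sk_dev_open(), sk_dev_lookup() and sk_dev_close() are hypothetical, the real smb_dev_state_t has more fields, and a real driver would serialize these routines with a lock such as dev_lck. Recording the opener's zone would also let nsmb_ioctl() reject calls from other zones.

```c
#include <stdlib.h>

/* Hypothetical per-open state. */
typedef struct sk_dev_state {
    int sds_minor;
    int sds_zoneid;                 /* zone that opened the device */
} sk_dev_state_t;

/* Table indexed by minor number; grows on demand, with no fixed cap. */
static sk_dev_state_t **sk_minors;
static int sk_nminors;

/* Allocate state for a new open; returns the minor number, or -1. */
static int
sk_dev_open(int zoneid)
{
    int m;

    for (m = 0; m < sk_nminors; m++)        /* reuse a free slot first */
        if (sk_minors[m] == NULL)
            break;
    if (m == sk_nminors) {                  /* table full: double it */
        int n = sk_nminors ? sk_nminors * 2 : 8;
        sk_dev_state_t **t = realloc(sk_minors, n * sizeof (*t));

        if (t == NULL)
            return -1;
        for (int i = sk_nminors; i < n; i++)
            t[i] = NULL;
        sk_minors = t;
        sk_nminors = n;
    }
    if ((sk_minors[m] = calloc(1, sizeof (sk_dev_state_t))) == NULL)
        return -1;
    sk_minors[m]->sds_minor = m;
    sk_minors[m]->sds_zoneid = zoneid;
    return m;
}

static sk_dev_state_t *
sk_dev_lookup(int minor)
{
    return (minor >= 0 && minor < sk_nminors) ? sk_minors[minor] : NULL;
}

/* Free the state for a closed minor so the slot can be reused. */
static void
sk_dev_close(int minor)
{
    sk_dev_state_t *sds = sk_dev_lookup(minor);

    if (sds != NULL) {
        free(sds);
        sk_minors[minor] = NULL;
    }
}
```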

The zone "init" routine will set up the per-zone state corresponding to the globals used in Darwin, with the fixed-size smb_dev_state_t table dropped. Simple lookup tables and device linkage tables (e.g. nsmb_ops) can be left as globals. Our analysis shows that only a few netsmb variables must be per-zone: dev_lck, smb_iod_next, smb_major, smb_minor, smb_vclist, smb_vcnext and smbechoes. The only threads netsmb creates are the smb_iod threads, which have a well-defined termination method, so the shutdown routine can iterate over the zone's current iods and set the termination flag in each one.
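The shutdown step can be illustrated with POSIX threads standing in for the smb_iod kernel threads: each worker waits on a flag under a lock, and the zone shutdown routine walks the list, sets each flag, wakes the thread and waits for it to exit. struct sk_iod and its functions are hypothetical; a real iod would process requests in its loop rather than only waiting.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical iod: a worker thread plus its termination flag. */
struct sk_iod {
    pthread_t       thread;
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    bool            shutdown;       /* set by the zone shutdown routine */
    bool            exited;
};

static void *
sk_iod_main(void *arg)
{
    struct sk_iod *iod = arg;

    pthread_mutex_lock(&iod->lock);
    while (!iod->shutdown)          /* a real iod would do work here */
        pthread_cond_wait(&iod->cv, &iod->lock);
    iod->exited = true;
    pthread_mutex_unlock(&iod->lock);
    return NULL;
}

static int
sk_iod_start(struct sk_iod *iod)
{
    pthread_mutex_init(&iod->lock, NULL);
    pthread_cond_init(&iod->cv, NULL);
    iod->shutdown = false;
    iod->exited = false;
    return pthread_create(&iod->thread, NULL, sk_iod_main, iod);
}

/* Zone shutdown: set each iod's flag, wake it, then wait for it to exit. */
static void
sk_zone_shutdown_iods(struct sk_iod *iods, int n)
{
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&iods[i].lock);
        iods[i].shutdown = true;
        pthread_cond_signal(&iods[i].cv);
        pthread_mutex_unlock(&iods[i].lock);
    }
    for (int i = 0; i < n; i++)
        pthread_join(iods[i].thread, NULL);
}
```

Because the flag is checked under the same lock that the shutdown routine takes, a wakeup cannot be lost even if shutdown runs before a thread first waits.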

nsmb_ioctl() must also fail if it is somehow called from a thread belonging to another zone.

7.8.4.3 Notes on the smbfs module

For the most part, smbfs can mimic the way NFSv2 handles Zones. Particular attention must be paid to thread creation and termination, since locking and concurrency issues there will be the thorniest problems. When looking for Zones-related calls in the NFS code, the interesting routines are:

zone_key_create()
getzoneid()
zone_find_by_path()
zone_hold()
zone_rele()
zone_status_get()
zone_getspecific()
zone_setspecific()
zcmn_err()
zthread_create()

An important thing NOT to copy is nfs_zone(), which permits cross-zone filesystem access for network installs; this is not appropriate for CIFS/SMB.

8 Detailed System Design

8.1 smbutil

8.1.1 Purpose

smbutil is a user-visible command that looks up names, lists shares and performs other simple tasks. It uses the interface provided by the libsmb.so library, processes the output returned by the library, and displays it to the user. There is little in this code beyond issuing calls to the library and displaying the results.

We will extend it to accept passwords and pass them to the kernel to be stored for later use. The expected normal mode of operation is for smbutil to prompt the user for a password, but it will also accept a password embedded in a UNC name on the command line. This permits access from command scripts and maintains compatibility with the behaviour of the Darwin code and of Microsoft's "net use" command.

smbutil's "view" subcommand is analogous to "showmount -e <host>" usage for NFS and to the "smbclient -L <host>" usage for the Samba ftp-like client. We do not believe it is appropriate to try to unify these commands because of the NFS-centric nature of the showmount command and because of licensing challenges in mixing smbutil and smbclient sources. We will consider whether there is value in integrating with the dfshares and dfmounts commands, which may not be widely used.

8.1.2 Interfaces

See the man page (http://jurassic.eng/home/thurlow/work/cifs/smbutil_man_final).

8.2 mount_smbfs

8.2.1 Purpose

mount_smbfs is the smbfs-specific mount utility called by mount(1M). It links with libsmb.so.

8.2.2 Interfaces

See the man page (http://jurassic.eng/home/thurlow/work/cifs/mount_smbfs_final).

8.3 libsmb.so

8.3.1 Purpose

This library provides the interface to the kernel nsmb driver through ioctl() calls, which it uses to set up an authenticated connection. On top of that, libsmb implements the RPC/RAP functionality used to perform administrative operations on the CIFS server. All of this is carried over transaction SMBs, and is used by the 'smbutil' and 'mount_smbfs' commands to search for a machine on the network, get the list of shares and log in at the user level.

The RPC stubs are generated by the IDL compiler. For now we are not porting the IDL compiler; we will use the RPC stubs that have already been generated.

8.3.2 Interfaces

libsmb.so exports the following interfaces, stability level 'Project Private':

Function	Usage
nb_ctx_create()	Create a NetBIOS context.
nb_ctx_readrcsection()	Read the user's rc file.
nb_ctx_resolve()	Resolve the address of the NetBIOS name server for the context.
nbns_resolvename()	Resolve a NetBIOS name to an address.
nbns_getnodestatus()	Query a machine's NetBIOS node status on port 137.
smb_ctx_setshare()	Set the share to examine.
smb_netshareenum()	RPC called after the session is set up to get the list of shares.
smb_ctx_lookup()	Set up the session by calling the appropriate ioctls.
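NetBIOS names are carried in name service packets in an encoded form. The sketch below shows the RFC 1001 first-level encoding, assuming the usual convention of uppercasing the name, padding it with spaces to 15 characters and appending a one-byte suffix (0x20 for the file server service); each of the 16 bytes then becomes two letters, 'A' plus each nibble. nb_encode_name() is a hypothetical name, not necessarily libsmb's routine.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NB_NAMELEN 16   /* 15 name characters plus a 1-byte suffix */

/*
 * RFC 1001 first-level encoding of a NetBIOS name.
 * out must hold 2 * NB_NAMELEN + 1 = 33 bytes.
 */
static void
nb_encode_name(const char *name, unsigned char suffix, char out[33])
{
    unsigned char raw[NB_NAMELEN];
    size_t len = strlen(name);

    if (len > NB_NAMELEN - 1)
        len = NB_NAMELEN - 1;               /* truncate to 15 characters */
    memset(raw, ' ', NB_NAMELEN - 1);       /* pad with spaces */
    for (size_t i = 0; i < len; i++)
        raw[i] = (unsigned char)toupper((unsigned char)name[i]);
    raw[NB_NAMELEN - 1] = suffix;

    for (int i = 0; i < NB_NAMELEN; i++) {  /* split each byte into nibbles */
        out[2 * i] = (char)('A' + (raw[i] >> 4));
        out[2 * i + 1] = (char)('A' + (raw[i] & 0x0F));
    }
    out[2 * NB_NAMELEN] = '\0';
}
```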

8.3.3 RPC and RAP

RPC (Remote Procedure Call) Protocol

RPC allows a process on one system to make function calls on another system. MS-RPC is Microsoft's implementation of RPC.

The client calls the local stub procedure - previously generated by the IDL compiler - which does this:

1. Collects the parameters required for a particular request (like NETSHAREENUM).

2. Marshals them into NDR (Network Data Representation) format.

3. Calls the ioctl() to perform a transaction SMB.
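Step 2 can be illustrated with a minimal NDR marshaller for two primitive types. NDR aligns each primitive on a boundary equal to its own size; this sketch assumes the little-endian data representation. struct ndr_buf and the ndr_put_*() functions are hypothetical and are not the interface of the pre-generated stubs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical NDR output buffer. */
struct ndr_buf {
    uint8_t *data;
    size_t   len;       /* bytes written so far */
    size_t   cap;       /* size of data */
};

/* Pad with zero bytes so the next item starts on an n-byte boundary. */
static int
ndr_align(struct ndr_buf *b, size_t n)
{
    while (b->len % n != 0) {
        if (b->len >= b->cap)
            return -1;
        b->data[b->len++] = 0;
    }
    return 0;
}

/* Marshal a 16-bit integer, least significant byte first. */
static int
ndr_put_u16(struct ndr_buf *b, uint16_t v)
{
    if (ndr_align(b, 2) != 0 || b->cap - b->len < 2)
        return -1;
    b->data[b->len++] = (uint8_t)(v & 0xff);
    b->data[b->len++] = (uint8_t)(v >> 8);
    return 0;
}

/* Marshal a 32-bit integer, least significant byte first. */
static int
ndr_put_u32(struct ndr_buf *b, uint32_t v)
{
    if (ndr_align(b, 4) != 0 || b->cap - b->len < 4)
        return -1;
    for (int i = 0; i < 4; i++)
        b->data[b->len++] = (uint8_t)(v >> (8 * i));
    return 0;
}
```

Marshalling a 16-bit value followed by a 32-bit value therefore produces two bytes of padding between them.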

The server does this:

1. Accepts the call and calls the server stub procedure.

2. Retrieves the parameters with the stubs and calls the actual procedure.

3. Returns the results as part of the CIFS transaction protocol reply.

Local stubs are generated by passing an interface definition through an IDL compiler. We are currently using pre-generated stubs; to add functionality in the future, we will need to port an IDL compiler from a BSD or another Solaris source.

RAP (Remote Administration Protocol)

RAP is similar to RPC: it can be used to submit requests to a CIFS server and obtain the results, and it is based on the transaction SMB of the CIFS protocol. The transaction protocol is used when large amounts of data need to be exchanged, and is mainly used for SMB remote procedure calls; it is layered above the SMB header. More information about the RAP commands and the parameters to send to the CIFS server can be found in the specification below.

CIFS RAP Specification(http://jurassic.eng/home/pk162140/proj_docs/draft-leach-cifs-rap-spec-00.txt)

RAP commands:

Command	Result
NETSHAREENUM	List shares on a server.
NETSERVERENUM2	Enumerate the computers on a particular domain.
NETSERVERGETINFO	Get info about a server.
NETSHAREGETINFO	Get info about a particular share.
NETWKSTAUSERLOGON	Log a user on a remote CIFS server.
NETWKSTAUSERLOGOFF	Log off a user from a remote CIFS server.
NETUSERGETINFO	Get detailed information about a particular user.
NETWKSTAGETINFO	Get detailed information about a workstation.
SAMOEMCHANGEPASSWORD	Change user's password on a CIFS server.

How does NetShareEnum work?

The NetShareEnum RAP function retrieves information about each shared resource on a CIFS server.

The definition is:

/* RCVBUF, RCVBUFLEN and ENTCOUNT are descriptor types from the RAP specification. */
unsigned short NetShareEnum(
   unsigned short       sLevel,          /* Level of information. Must be 1. */
   RCVBUF               pbBuffer,        /* Buffer to receive data. */
   RCVBUFLEN            cbBuffer,        /* Size of the receive buffer. */
   ENTCOUNT             pcEntriesRead,   /* Number of entries read, if the call succeeds. */
   unsigned short       *pcTotalAvail    /* Number of shared resources on the CIFS server. */
 );

The transaction parameters for this call are:

1. The 16-bit function number for NetShareEnum, which is 0.

2. The parameter descriptor string "WrLeh".

3. The data descriptor string "B13BWz". Because the share name field is 13 bytes including its terminating null, only shares whose names are 12 characters or fewer are returned.

4. The actual parameters: the 16-bit information level, which in this case is 1 (corresponding to "W"), followed by a 16-bit integer containing the size of the receive buffer (corresponding to "L").

There is no data as part of the request for this call.
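The request parameter block described above can be sketched as a byte-level builder. We assume little-endian 16-bit integers, as elsewhere in SMB, and that the "r", "e" and "h" descriptor characters describe values returned by the server, so nothing is sent for them. rap_netshareenum_params() and RAP_NETSHAREENUM are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RAP_NETSHAREENUM 0   /* function number for NetShareEnum */

/*
 * Build the NetShareEnum parameter block: function number, parameter
 * descriptor, data descriptor, information level 1, receive buffer size.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t
rap_netshareenum_params(uint8_t *buf, size_t buflen, uint16_t rcvbuflen)
{
    static const char pdesc[] = "WrLeh";   /* sizeof includes the NUL */
    static const char ddesc[] = "B13BWz";
    size_t need = 2 + sizeof (pdesc) + sizeof (ddesc) + 2 + 2;
    uint8_t *p = buf;

    if (buflen < need)
        return 0;
    *p++ = RAP_NETSHAREENUM & 0xff;        /* 16-bit function number */
    *p++ = RAP_NETSHAREENUM >> 8;
    memcpy(p, pdesc, sizeof (pdesc));
    p += sizeof (pdesc);
    memcpy(p, ddesc, sizeof (ddesc));
    p += sizeof (ddesc);
    *p++ = 1;                              /* "W": information level 1 */
    *p++ = 0;
    *p++ = rcvbuflen & 0xff;               /* "L": receive buffer size */
    *p++ = rcvbuflen >> 8;
    return (size_t)(p - buf);
}
```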

The transaction response consists of a number of SHARE_INFO_1 structures:

struct SHARE_INFO_1 {
        char                shi1_netname[13];   /* name of the resource. */
        char                shi1_pad;           /* alignment purposes. */
        unsigned short      shi1_type;          /* type of shared resource (disk/printer/communications/IPC) */
        char                *shi1_remark;       /* comment about the shared resource (a 32-bit pointer on the wire). */
};

There is no auxiliary data to receive.
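Decoding the returned entries is sketched below. It relies on our understanding of the RAP wire format, which matches the approach Samba's client code takes: each entry occupies 20 bytes (13-byte name, pad byte, 16-bit type, 32-bit remark pointer), and the low 16 bits of the remark pointer, minus the "converter" word from the response parameters, give the remark's offset within the returned data. struct share_info_1_parsed and rap_parse_share_info_1() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHARE_INFO_1_SIZE 20   /* B13 + B + W + z (4-byte pointer) */

/* Hypothetical host-side form of one entry. */
struct share_info_1_parsed {
    char        netname[13];
    uint16_t    type;
    const char *remark;        /* points into the data buffer, or "" */
};

static uint16_t
get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t
get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
        ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode entry i of the returned data. Returns 0, or -1 if out of range. */
static int
rap_parse_share_info_1(const uint8_t *data, size_t datalen,
    uint16_t converter, int i, struct share_info_1_parsed *si)
{
    const uint8_t *e;
    uint32_t ptr, off;

    if (i < 0 || (size_t)(i + 1) * SHARE_INFO_1_SIZE > datalen)
        return -1;
    e = data + (size_t)i * SHARE_INFO_1_SIZE;
    memcpy(si->netname, e, 13);
    si->netname[12] = '\0';                /* guard a missing terminator */
    si->type = get_le16(e + 14);           /* e[13] is the pad byte */

    ptr = get_le32(e + 16) & 0xffff;
    off = ptr - converter;                 /* wraps high if ptr < converter */
    si->remark = "";
    if (ptr != 0 && off < datalen &&
        memchr(data + off, '\0', datalen - off) != NULL)
        si->remark = (const char *)(data + off);
    return 0;
}
```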

8.4 smbfs

8.4.1 Purpose

The smbfs module provides a VFS plugin to satisfy all filesystem operations on CIFS/SMB filesystems.

8.4.2 Interfaces

The code will be based on the NFSv2 codebase, stripped of inapplicable code and repopulated with Darwin smbfs code where appropriate, as listed below. The stability level of all vfsops and vnodeops interfaces is 'Consolidation Private'.

8.4.2.1 vfsops

Note that some vfsops in Solaris correspond not to vfsops in the smb-217.2 codebase, but to initialization functions. Where this occurs, the match is prefixed with (non-vfsop).

Solaris VFS	smb-217.2	Comments
vfs_mount	smbfs_mount
vfs_unmount	smbfs_unmount
vfs_root	smbfs_root
vfs_statvfs	smbfs_vfs_getattr
vfs_sync	smbfs_sync
vfs_vget	smbfs_vget	returns (ENOTSUP) for smbfs flat namespace lookup
vfs_mountroot	(non-vfsop) smbfs_module_start
vfs_freevfs	(non-vfsop) smbfs_module_stop
vfs_vnstate	N/A	NULL in the Solaris NFSv2 code; we will leave it NULL as well.

8.4.2.2 vnodeops

Based on the NFSv2 codebase, stripped of inapplicable code and repopulated with Darwin smbfs code where appropriate.

Note: where the second column is blank, the operation will be based mainly on the NFSv2 code.

Solaris VFS	smb-217.2	Comments
vop_open	smbfs_open	Gets the attributes, checks for permissions and then calls smbfs_smb_open()
vop_close	smbfs_close	Flushes and invalidates all dirty pages and calls smbfs_smb_close(). Removes entries from attribute cache.
vop_read	smbfs_read	Breaks into chunks and calls smbfs_readvnode(). This calls smb_read(), which calls smb_smb_read().
vop_write	smbfs_write	Breaks into chunks, calls smbfs_writevnode() -> smb_write(). Finally smb_smb_write() is called.
vop_ioctl	smbfs_ioctl	The Darwin version returns EINVAL; we will return ENOTTY, per Solaris convention.
vop_setfl		not used by NFS, not needed for CIFS
vop_getattr	smbfs_getattr	Does an attribute cache lookup, calls smbfs_smb_lookup() and then updates the attribute cache.
vop_setattr	smbfs_setattr	If the file size needs to be set, calls smbfs_smb_setfsize(). If atime or mtime needs to be set, calls smbfs_smb_setpattr() or other functions, depending on whether the server is NT4 or DOS. Then invalidates the attribute cache and updates it with the modified values.
vop_access
vop_lookup	smbfs_lookup	Also used for named streams, so the Darwin extended attribute code will be converted to call this with a LOOKUP_XATTR flag.
	smbfs_setxattr
	smbfs_removexattr
	smbfs_listxattr
vop_create	smbfs_create0	Calls smbfs_create() with vnop_create_args, passing NULL for the second and third arguments.
vop_remove	smbfs_remove	Does a directory name purge and then calls smbfs_smb_delete().
vop_link	smbfs_link	Currently returns ENOTSUP (not supported).
vop_rename	smbfs_rename	Calls smbfs_smb_delete() to delete the target smbnode and then calls smbfs_smb_rename().
vop_mkdir	smbfs_mkdir	Calls smbfs_smb_mkdir(), then updates the cached copy of mtime and marks the attribute cache invalid.
vop_rmdir	smbfs_rmdir	Calls smbfs_smb_rmdir(), updates the cached copy of mtime, marks the attribute cache invalid and purges the directory cache.
vop_readdir	smbfs_readdir	Calls smbfs_readvnode().
vop_symlink	smbfs_symlink	Creates a buffer containing the symlink magic, the target length and the target, with file type VLINK, then calls smbfs_create(). The created file contains the buffer, and its length is the buffer length.
vop_readlink	smbfs_readlink	smbfs_smb_tmpopen(); smb_read(); smbfs_smb_tmpclose().
vop_fsync	smbfs_fsync	Calls smb_flushvp()
vop_inactive	smbfs_reclaim	Frees smbnode and gives the vnode back to the system.
vop_fid
vop_rwlock
vop_rwunlock
vop_seek
vop_cmp		not used by NFS, will not be used for SMBFS
vop_frlock
vop_space
vop_realvp		will just return EINVAL
vop_getpage	smbfs_pagein
vop_putpage	smbfs_pageout
vop_map
vop_addmap	smbfs_mmap	Calls smbfs_open() and, if successful, sets the NISMAPPING flag.
vop_delmap	smbfs_mnomap	Calls smbfs_close() and, if successful, clears the NISMAPPING flag.
vop_poll		not used by NFS
vop_dump
vop_pathconf	smbfs_pathconf
vop_pageio
vop_dumpctl		not used by NFS
vop_dispose		not used by NFS
vop_setsecattr
vop_getsecattr
vop_shrlock
vop_vnevent
N/A	smbfs_blktooff	not necessary in Solaris
N/A	smbfs_offtoblk	not necessary in Solaris

8.5 netsmb

8.5.1 Purpose

The netsmb module is the device driver that implements the network connections to the CIFS server. It appears in the device namespace as /dev/nsmb and will push the TCP streams module below it. It uses the MBLK interfaces to store and retrieve packet data, the SOCKFS module for the actual network interaction, the CRYPTO module to encrypt passwords, path names and file names, and the UCONV module for Unicode conversions. The user-level library libsmb.so makes ioctl() calls that are handled by the ioctls in this module.

The stability level of these ioctl()'s is 'Project Private'. The stability level of the device name is 'Consolidation Private'.

8.5.2 Interfaces

8.5.2.1 ioctl()'s available from device driver

ioctl	Usage
SMBIOC_NEGOTIATE	negotiate a dialect and ground rules
SMBIOC_SSNSETUP	set up a session for the current user
SMBIOC_TCON	issue a tree connect to a particular share
SMBIOC_REQUEST	send an SMB protocol request
SMBIOC_T2RQ	issue a TRANS2 call
SMBIOC_LOOKUP	lookup a machine or domain name
SMBIOC_READ	alternate file read interface
SMBIOC_WRITE	alternate file write interface
SMBIOC_TDIS	disconnect from a file tree
SMBIOC_FLAGS2

8.5.2.2 Interfaces exported to smbfs module

The following interfaces are exported for use by the smbfs module, at stability level 'Project Private'.

void smb_scred_init(struct smb_cred *scred, vfs_context_t vfsctx);
int  smb_read(struct smb_share *ssp, u_int16_t fid, uio_t uio, struct smb_cred *scred);
int  smb_write(struct smb_share *ssp, u_int16_t fid, uio_t uio, struct smb_cred *scred, int timo);
int  smb_sigintr(vfs_context_t);
int  smb_put_dmem(struct mbchain *mbp, struct smb_vc *vcp, const char *src, int len, int caseopt, int *lenp);
int  smb_dev2share(int fd, struct smb_share **sspp);
void smb_share_unlock(struct smb_share *ssp, struct proc *p);
void smb_share_put(struct smb_share *ssp, struct smb_cred *scred);
void smb_iod_shutdown_share(struct smb_share *ssp);
int  smb_checksmp(void);

8.5.2.3 Interface to networking code

The netsmb module uses a set of BSD calls into the networking code that have a rough equivalence in Solaris's socket filesystem. They map as follows:

BSD Call	Solaris Call
sock_connect()	soconnect()
sock_setsockopt()	sotpi_setsockopt()
sock_sendmbuf()	sosendmsg()
sock_receive()	sorecvmsg()
sock_receivembuf()	sotpi_recvmsg()
sock_connectwait()	sowaitokack()
sock_isconnected()	soisconnected()
sock_close()	soshutdown()*
sock_shutdown()	soshutdown()
sock_nointerrupt()	*

* No direct mapping exists; equivalent calls still need to be identified.

Port 139 (the NetBIOS session service) is used to send data to and receive data from SMB servers, and port 137 (the NetBIOS name service) is used by the user-level code to resolve NetBIOS names. Packets are built into MBLK data blocks using the MBCHAIN routines in the form the SMB protocol requires, and sosendmsg() sends them out onto the network. Data is received in the same way, as message blocks, using sorecvmsg().
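Each SMB message sent over the NetBIOS session service is preceded by the 4-byte session header defined in RFC 1002: a packet type (0x00 for a session message), a flags byte whose low bit extends the length, and a 16-bit big-endian length, giving a 17-bit maximum. nb_ssn_header() is a hypothetical helper illustrating that framing; in netsmb the header would be part of the MBCHAIN-built packet.

```c
#include <stddef.h>
#include <stdint.h>

#define NB_SSN_MESSAGE 0x00      /* RFC 1002 session message packet type */
#define NB_SSN_MAXLEN  0x1FFFF   /* 17-bit length with the extension bit */

/*
 * Write the NetBIOS session service header for an SMB message of len
 * bytes. Returns 0, or -1 if len does not fit in 17 bits.
 */
static int
nb_ssn_header(uint8_t hdr[4], uint32_t len)
{
    if (len > NB_SSN_MAXLEN)
        return -1;
    hdr[0] = NB_SSN_MESSAGE;
    hdr[1] = (uint8_t)((len >> 16) & 0x01);   /* length extension (E) bit */
    hdr[2] = (uint8_t)((len >> 8) & 0xff);    /* length, big-endian */
    hdr[3] = (uint8_t)(len & 0xff);
    return 0;
}
```

Note that this header is in network byte order, unlike the little-endian fields inside the SMB message itself.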

TCP connections are made when upper levels request traffic, and are closed after use by smbutil and left open by smbfs. The smbfs code may have some ability to multiplex CIFS traffic to the same server over virtual circuits; we will examine this code with an eye to behaviour on busy multiuser clients. We will also examine the value of an inactivity timeout. We will also study the use of port 445 (Raw TCP) for sending and receiving packets.

8.5.3 Consumers of the NetSMB module

The consumers of the netsmb module are the smbfs module and the libsmb.so library, in the context of the CIFS client. Although any application that wants to set up a connection to an SMB server could use this module, we currently want to restrict its use by other modules.

9 Acknowledgements

The format of this document is adapted from: http://www.construx.com/survivalguide/desspec.htm Web page copyright © 1993-2002 Steven C. McConnell. Permission is hereby given to copy, adapt, and distribute this material as long as this notice is included on all such materials and the materials are not sold, licensed, or otherwise distributed for commercial gain. Software Design Specification copyright © 1994-1997 by Bradford D. Appleton