Overview
========

In order to improve our TCP/IP performance many moons ago, Synchronous STREAMS was introduced via PSARC/1994/151. Since then, we have made additional architectural changes to the TCP/IP stack (especially as a result of FireEngine -- PSARC/2002/433) which have removed TCP/IP's use of Synchronous STREAMS and left it mostly adrift. However, there is one remaining consumer: the routing socket module, RTS. This module makes clever but (as explained below) inappropriate use of Synchronous STREAMS to return error codes to applications issuing errant routing socket operations.

To preserve these semantics, this case proposes to introduce a new lightweight Consolidation Private mechanism called SO_REPLY that allows STREAMS modules to return error codes in response to M_DATA messages, which RTS will be revised to use.

With that addressed, this case then proposes to remove Synchronous STREAMS from Solaris. This allows over a thousand lines of complicated and subtle code to be removed, and simplifies many critical codepaths.

Since Synchronous STREAMS was classified as Consolidation Private, and never leaked out of the ON consolidation, this case requests Micro Binding.

RTS and Synchronous STREAMS
===========================

Routing sockets were originally introduced as part of 4.4 BSD. Unlike Solaris, the BSD networking stack is mostly synchronous. Thus, BSD specified that the routing socket operations be issued using the write() system call, which would set errno upon failure.

These semantics complicated matters when routing sockets were incorporated into Solaris 2.6. Specifically, in order to easily plug into the Solaris networking stack, routing sockets were implemented as a STREAMS module (RTS) that was autopushed on top of the IP module.
However, STREAMS is usually asynchronous, meaning that STREAMS modules cannot affect the value returned by write() (which has usually returned successfully before the underlying STREAMS modules have even received the written data).

Fortunately, Synchronous STREAMS had recently been introduced into Solaris for TCP/IP performance, and seemed to provide a nice solution to the problem. As such, RTS was fitted with Synchronous STREAMS entry points, enabling it to return errno values in response to each write() operation.

However, Synchronous STREAMS was designed as a performance optimization, and thus there were a number of situations (such as the stream becoming flow controlled) that caused it to be disabled and the traditional asynchronous "slow-path" to be used instead. For this reason, RTS was coded to accept routing socket messages arriving through either mechanism.

Unfortunately, this slow-path feature was inherently flawed, since errors on the slow path are not synchronously reported to the calling application. Worse yet, the application cannot anticipate when the slow path might be used, forcing the application to be suspicious of every successful write().

As the above makes clear, RTS's use of Synchronous STREAMS is neither appropriate nor correct. Thus, this case proposes to change RTS to instead use a new lightweight mechanism dubbed SO_REPLY, detailed below.

SO_REPLY
========

In order to make use of the SO_REPLY mechanism, it must be explicitly enabled by the module or driver. This is done by allocating an M_SETOPTS streams message, setting the new SO_REPLY bit in the 'so_flags' member of the associated 'struct stroptions' structure, and sending the message up to the stream head. It is expected that modules will issue the M_SETOPTS in their open(9E) routine, before the consumer of the stream has had an opportunity to send any M_DATA messages downstream.

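The enabling handshake described above can be sketched as a small userland model. This is only an illustration under stated assumptions, not the kernel implementation: every model_* name and MODEL_* value below is a hypothetical stand-in for the corresponding STREAMS type or constant, and the real code would use allocb(9F) and putnext(9F) on a real queue rather than calling the stream head directly.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Simplified stand-ins for STREAMS definitions. These are NOT the real
 * <sys/stream.h> declarations; the numeric values are placeholders.
 */
enum { MODEL_M_DATA = 0, MODEL_M_SETOPTS = 1 };
#define	MODEL_SO_REPLY	0x1u	/* placeholder for the new so_flags bit */

struct model_stroptions {
	unsigned int so_flags;
};

struct model_mblk {
	int type;
	struct model_stroptions opts;
};

struct model_stream_head {
	unsigned int flags;	/* options latched from M_SETOPTS */
};

/* Build an M_SETOPTS message with SO_REPLY set; NULL if allocation fails. */
struct model_mblk *
model_make_setopts_reply(void)
{
	struct model_mblk *mp = calloc(1, sizeof (*mp));

	if (mp == NULL)
		return (NULL);
	mp->type = MODEL_M_SETOPTS;
	mp->opts.so_flags |= MODEL_SO_REPLY;
	return (mp);
}

/* Stream head read side: latch SO_REPLY, then consume the message. */
void
model_strhead_rput(struct model_stream_head *stp, struct model_mblk *mp)
{
	assert(stp != NULL && mp != NULL);
	if (mp->type == MODEL_M_SETOPTS &&
	    (mp->opts.so_flags & MODEL_SO_REPLY) != 0)
		stp->flags |= MODEL_SO_REPLY;
	free(mp);
}

/* Module open: enable SO_REPLY before any M_DATA can be sent downstream. */
int
model_module_open(struct model_stream_head *stp)
{
	struct model_mblk *mp = model_make_setopts_reply();

	if (mp == NULL)
		return (ENOMEM);
	model_strhead_rput(stp, mp);	/* kernel code would send this upstream */
	return (0);
}

/* Nonzero once the stream head will wait for a reply to each M_DATA. */
int
model_strhead_waits_for_reply(const struct model_stream_head *stp)
{
	return ((stp->flags & MODEL_SO_REPLY) != 0);
}
```

Because the model never clears the bit, it also reflects the fact that SO_REPLY cannot be disabled once it has been enabled on a stream.
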
Once SO_REPLY has been enabled on the stream, the stream head will block (interruptibly) after sending each M_DATA message, waiting for a reply from downstream. Rather than overload one of the existing message types to carry the reply, a new M_REPLY message type has been introduced. Each M_REPLY will have an associated strreply_t structure that contains the reply from downstream. At present, this structure is trivial and only contains the errno value associated with the requested operation:

	typedef struct strreply {
		int	sr_errno;
	} strreply_t;

While the definition of this structure is mostly syntactic sugar, it also makes it easy to pass additional information upstream in the future, if need be. Note that the new M_REPLY message type has a high-priority (>QPCTL) value to ensure that it will not be queued or delayed when it is sent upstream.

As a convenience, a new mreply() utility routine has also been provided:

	void mreply(queue_t *wq, mblk_t *mp, int errno);

This routine is analogous to miocack(9F): it converts the passed-in message to type M_REPLY, sets the sr_errno field of the resulting M_REPLY to the passed-in value, and sends the message upstream.

Note again that this Consolidation Private mechanism is intentionally minimal -- for instance, there is no way to disable it once it is enabled on a stream. We can add these features in the future if they prove necessary.

Revised RTS Module
==================

[ NOTE: This section is informational only, and is not part of the ARC review since this just describes implementation of rts and the private interface between rts and sockfs. ]

With the SO_REPLY mechanism in place, the changes to the RTS module itself are straightforward. First, as part of its open(9E) entry point, it enables SO_REPLY by sending the M_SETOPTS up to the stream head. Next, its write-side put(9E) entry point is modified to use mreply() to explicitly reply to all M_DATA messages.

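The reply path just described (M_DATA goes down, the module answers with mreply(), and the stream head turns sr_errno into the result of write()) can be sketched as a userland model. This is a sketch under assumptions, not the actual RTS or stream head code: the model_* functions, the MODEL_* values, and the minimum-length check that triggers EINVAL are all hypothetical, and the model omits the queue argument that the real mreply() takes because nothing is actually queued here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Reply payload, as defined in the text above. */
typedef struct strreply {
	int	sr_errno;
} strreply_t;

/* Placeholder values; the only property modeled is M_REPLY > QPCTL. */
#define	MODEL_QPCTL		0x80
#define	MODEL_M_DATA		0x00
#define	MODEL_M_REPLY		0x90
#define	MODEL_MIN_REQ_LEN	4	/* hypothetical minimum request size */

struct model_mblk {
	int		type;
	size_t		len;	/* bytes of M_DATA payload */
	strreply_t	reply;	/* valid once type is MODEL_M_REPLY */
};

/* Model of mreply(): convert the message in place and record the error. */
void
model_mreply(struct model_mblk *mp, int error)
{
	assert(mp != NULL);
	mp->type = MODEL_M_REPLY;
	mp->reply.sr_errno = error;
}

/* Model of a write-side put routine that replies to every M_DATA. */
void
model_rts_wput(struct model_mblk *mp)
{
	if (mp->type != MODEL_M_DATA)
		return;
	/* Hypothetical validation: reject requests that are too short. */
	model_mreply(mp, mp->len < MODEL_MIN_REQ_LEN ? EINVAL : 0);
}

/*
 * Model of the stream head write path with SO_REPLY enabled: send the
 * M_DATA, wait for the M_REPLY, and map sr_errno to a write()-style
 * result (byte count on success, -1 with *errp set on failure).
 */
long
model_strwrite(size_t len, int *errp)
{
	struct model_mblk m = { MODEL_M_DATA, len, { 0 } };

	model_rts_wput(&m);	/* the kernel would block here for the reply */
	assert(m.type == MODEL_M_REPLY);
	if (m.reply.sr_errno != 0) {
		*errp = m.reply.sr_errno;
		return (-1);
	}
	return ((long)len);
}
```

In this model every M_DATA receives exactly one reply, which is the property that lets the application trust a successful write().
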
For M_DATA messages that need further processing by the IP module (downstream from RTS), a new flag is set inside RTS before passing the message downstream. When the reply comes back from IP and the flag is set, RTS's read-side put(9E) entry point uses mreply() to convert the message to an M_REPLY and send it upstream.

Since SO_REPLY only causes the stream head to wait for replies to M_DATA messages, special care must be taken to ensure that all routing socket requests are sent using M_DATA messages, rather than M_PROTO/M_PCPROTO T_DATA_REQ messages (currently, either may be sent). To guarantee this:

	* RTS's open(9E) routine has been tightened to only allow it to succeed if called by sockfs. (We have never supported or documented any other use, but have not prevented it either.)

	* RTS's T_INFO_ACK has been changed to indicate a "0" TSDU_size. This tells sockfs to always send data in M_DATA messages (this functionality is already in sockfs).

Removed and Revised Interfaces
==============================

With RTS changed to use SO_REPLY, the Synchronous STREAMS mechanism can be retired. Again, all of these symbols were Consolidation Private, and are not known to be in use outside of ON. The affected header files are:

Specifically, the following non-static functions will be removed entirely:

	strwakeq()	struioget()	infonext()	uiodup()
	rwnext()	isuioq()	qwait_rw()	strsetuio()

The function signature of strmakedata() will be changed.

The following constants will be removed:

	D_SYNCSTR		QWANTWSYNC		QWANTRMQSYNC
	QSYNCSTR		QSTRUIOT		STRUIOT_IP
	STRUIOT_NONE		STRUIOT_DONTCARE	STRUIO_SPEC
	STRUIO_DONE		STRUIO_POSTPONE		STRUIOT_STANDARD
	STRUIO_MAPIN		DEF_IOV_MAX		INFOD_BYTES
	INFOD_FIRSTBYTES	INFOD_COUNT		INFOD_COPYOUT

The following fields will be removed:

	from stdata_t:	sd_wakeq  sd_struiordq  sd_struiowrq  sd_struiodnak  sd_struionak
	from queue_t:	q_struiot

The following structures will be removed:

	struct struiod (struiod_t)
	struct infod (infod_t)

For compatibility with modules that may have set the undocumented qi_rwp, qi_infop, and qi_struiot qinit(9S) fields to NULL, these fields will be left as-is.