Re: initial stab at a design document for SMB transport layer overhaul

Pavel Shilovsky <piastry@xxxxxxxxxxx> · Tue, 12 Oct 2010 22:11:35 +0400

2010/10/12 Jeff Layton <jlayton@xxxxxxxxx>:
> Many of the problems that we have with cifs I think come down to the
> fact that we have a very ad-hoc approach to the transport layer. It's
> been hacked on for years with no real clear goal in mind for its
> behavior.
>
> To complicate matters further, there's also a smb2fs in the works that
> uses a cut-and-pasted version of the socket handling code from cifs.
> For some time now, I've wanted to rip and replace much of the transport
> layer in cifs.
>
> About a year ago, I spent some time on a patchset to add the ability
> for the sunrpc layer to talk SMB. I got it working, but the patchset
> was pretty invasive and it wasn't a great fit for CIFS. I still think
> the overall idea of an SMB layer with well-defined behavior is the right
> approach however.
>
> The following document is a first stab at outlining the behavior and
> overall design for such a beast. I think also that a well-defined SMB
> layer in the kernel may have use beyond just cifs and smb2fs. Please
> take a look at it as you are able and comment.
>
> It's still very rough but I want to get this out there and have people
> start thinking about the design before I start coding. Once I have some
> feedback on the overall design then I'll plan to sit down and start
> working on an implementation.
>
> Questions, concerns and comments appreciated...
>
> --------------------------[snip]------------------------------
>
>                Proposal for A Unified SMB Layer for Linux
>
> Overview:
> =========
> The kernel has had two different SMB/CIFS implementations. One (smbfs)
> is now deprecated, in favor of the later one (cifs). Additionally, there
> is at the time of this writing a new filesystem being developed for the
> smb2 protocol. Much of that implementation was done via cut-and-paste
> from cifs. Obviously, this is less than ideal. Each of these
> implementations however has implemented its own transport code -- also
> less than ideal.
>
> This document is a proposal to add a new unified transport layer that
> will work for SMB and SMB2. I intend to loosely model this layer after the
> sunrpc layer in the kernel.
>
> Implementation:
> ===============
> The smb layer code will act as the mediator between the upper-layer
> filesystem code and the socket layer. The filesystem will request the
> creation of an smb "client". The smb layer will search for a suitable
> one and increase the refcount on the existing socket if one is
> available.
>
> If one isn't available then a new socket will be opened and the SMB
> layer will do a NEGOTIATE_PROTOCOL request and wait for the response.
> The caller is responsible for specifying the upper and lower bound of
> the SMB version that should be used. In general, we'll attempt a
> negprot for higher versions before lower versions. Once the
> NEGOTIATE_PROTOCOL exchange is completed, the results from it will be
> stored in the smb_client structure for use by the upper layers.
>
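
To check that I read the negotiation order correctly, here is a rough user-space sketch of "try higher versions before lower ones" within the caller's bounds. The names smb_negotiate, smb_negprot_fn and fake_negprot are hypothetical, the integer versions are stand-ins, and the callback only simulates the NEGOTIATE_PROTOCOL exchange:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical callback: performs one negprot attempt for "version".
 * Returns 0 if the server accepted it, nonzero otherwise.
 */
typedef int (*smb_negprot_fn)(int version, void *ctx);

/*
 * Walk from the upper bound down to the lower bound and return the
 * first version the server accepts, or -1 if none was accepted.
 */
static int smb_negotiate(int min_ver, int max_ver,
			 smb_negprot_fn negprot, void *ctx)
{
	int v;

	assert(negprot != NULL);
	if (min_ver > max_ver)
		return -1;

	for (v = max_ver; v >= min_ver; v--) {
		if (negprot(v, ctx) == 0)
			return v;
	}
	return -1;
}

/* Stand-in server that accepts any version up to *(int *)ctx. */
static int fake_negprot(int version, void *ctx)
{
	int server_max = *(int *)ctx;

	return version <= server_max ? 0 : -1;
}
```

The returned version is what I would expect to end up in the smb_client structure along with the rest of the negprot results.
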
> The code will use a state machine to manage the socket's receive path,
> and will overload the sk_* functions to handle the socket without
> needing a dedicated thread for this. When the sk_data_ready callback
> fires, data will be received off the socket in interrupt context until
> we can get down to the MID in the header. At that point, we'll be able
> to wake up whatever thread is waiting for the reply to do the rest. If

Should the state machine read the whole request and only then wake up
the waiting thread?
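
To make the question concrete, here is a small user-space model of the variant described in the document, where the receive path buffers only as far as the MID and leaves the rest to the waiter. It assumes SMB1 framing (a 4-byte length prefix followed by the 32-byte SMB header, with the MID in the last two bytes, little-endian). All names (smb_rx, smb_rx_feed, ...) are hypothetical, and nothing here touches real sk_* callbacks:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 4-byte length prefix + 32-byte SMB1 header; MID is the last field. */
#define SMB_RX_HDR_LEN 36
#define SMB_RX_MID_OFF 34

enum smb_rx_state {
	SMB_RX_HEADER,		/* still collecting header bytes */
	SMB_RX_HAVE_MID,	/* MID known, waiter can take over */
};

struct smb_rx {
	enum smb_rx_state state;
	size_t filled;
	uint16_t mid;
	uint8_t hdr[SMB_RX_HDR_LEN];
};

static void smb_rx_init(struct smb_rx *rx)
{
	memset(rx, 0, sizeof(*rx));
	rx->state = SMB_RX_HEADER;
}

/*
 * Model of the data-ready path: consume bytes until the header is
 * complete, then decode the MID and stop. Returns the number of bytes
 * consumed; the remainder is left for whoever owns that MID.
 */
static size_t smb_rx_feed(struct smb_rx *rx, const uint8_t *data, size_t len)
{
	size_t want, take;

	if (rx->state == SMB_RX_HAVE_MID || len == 0)
		return 0;

	assert(rx->filled < SMB_RX_HDR_LEN);
	want = SMB_RX_HDR_LEN - rx->filled;
	take = len < want ? len : want;
	memcpy(rx->hdr + rx->filled, data, take);
	rx->filled += take;

	if (rx->filled == SMB_RX_HDR_LEN) {
		rx->mid = (uint16_t)(rx->hdr[SMB_RX_MID_OFF] |
				     (rx->hdr[SMB_RX_MID_OFF + 1] << 8));
		rx->state = SMB_RX_HAVE_MID;
	}
	return take;
}
```

Reading the whole frame in the state machine instead would mean continuing to consume past SMB_RX_HAVE_MID until the length from the prefix is satisfied, which is the alternative I am asking about.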

> it's an async request, a workqueue task will be queued to a smbiod
> workqueue to handle it.

So, do you mean that there will be another thread that waits for all
async requests on the smbiod workqueue and then calls a callback to
handle them?

> To handle a truly async request from the server to the client (i.e. an
> oplock break or similar), the upper layer will need to register a
> callback that will be queued to the workqueue.
>
> Calls will be issued to the SMB layer in a similar fashion to how it
> works with the kernel's sunrpc layer. The upper function will create
> SMB "tasks" and those will be run using a smb_run_task function. This
> will allow for async requests as well, with async replies being handled
> by the smbiod workqueue.
>
> Tasks (processes) that are waiting for replies will be put to sleep in
> TASK_KILLABLE sleep. Fatal signals will stop the sleeping and return an
> error back to the upper layer.
>
> Reconnect behavior:
> ===================
> If the server issues a TCP RST on the socket, or the client decides that
> the kernel will call the sk_state_change callback for the socket.
>
> At this point, sending of new SMBs will be suspended and any calls in
> flight will be cancelled and waiters woken back up to reissue those
> calls.
>
> The SMB layer will then reconnect the socket (probably via a
> connect_worker workqueue task) and then re-do the
> NEGOTIATE_PROTOCOL request. Once that's complete, the smb client will be
> marked as being active again, and the smb layer will call back the upper
> layers to let them know that they should redo SESSION_SETUPs etc.
>
> Timeout behavior:
> =================
> Whenever a request is sent to the server, then a timer will be set, and
> the upper layer will need to specify whether it wants "hard" or "soft"
> semantics for dealing with timeouts.
>
> If the server does not respond within a certain amount of time, then the SMB
> layer will begin sending SMB echo requests to the server at a set
> interval (FIXME: stop sending these when send buffer is full?)
>
> If the server responds to the echo requests, then the client will wait
> indefinitely for the response to the original call. If the server is not
> responding to those requests, then there are two cases:
>
> hard: the client will wait indefinitely for a response from the server.
> If the server eventually starts responding to the echo requests, then
> things will proceed normally. If the server instead issues a TCP RST
> then we'll handle a reconnect. Otherwise, we'll keep sending SMB echoes
> (at least until we no longer have send buffers for the socket).
>
> soft: the client will wait for the server to respond for a certain
> period of time. If it doesn't respond within that interval, it will
> disconnect the socket and attempt to reconnect. If that reconnect fails
> (ETIMEDOUT or ECONNREFUSED, ENETUNREACH, etc...) then it will return an
> error back to the upper layer.
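
As I understand the hard/soft rules, the per-request decision could be modelled roughly as below. This is only a sketch under my own assumptions: the names (smb_timeo_next and friends) and both thresholds are hypothetical, times are plain seconds, and the "reconnect failed, return an error" step is left to the caller:

```c
#include <assert.h>
#include <stddef.h>

enum smb_timeo_mode { SMB_TIMEO_HARD, SMB_TIMEO_SOFT };

enum smb_timeo_action {
	SMB_ACT_WAIT,		/* request is still young, just wait */
	SMB_ACT_SEND_ECHO,	/* probe the server with an SMB echo */
	SMB_ACT_RECONNECT,	/* soft only: drop the socket and reconnect */
};

struct smb_timeo_params {
	unsigned int echo_after;	/* start echoing after this many seconds */
	unsigned int soft_limit;	/* soft: give up after this much silence */
};

/*
 * waited: seconds since the request was sent.
 * silent: seconds since the server last answered anything, echo
 *         replies included.
 */
static enum smb_timeo_action
smb_timeo_next(enum smb_timeo_mode mode, const struct smb_timeo_params *p,
	       unsigned int waited, unsigned int silent)
{
	assert(p != NULL);

	if (waited < p->echo_after)
		return SMB_ACT_WAIT;

	/* Soft mounts give up once the server has been silent too long. */
	if (mode == SMB_TIMEO_SOFT && silent >= p->soft_limit)
		return SMB_ACT_RECONNECT;

	/* Hard mounts, and soft mounts with a live server, keep probing. */
	return SMB_ACT_SEND_ECHO;
}
```

With this shape a hard request never reconnects on its own; it only does so when the socket state changes, for example after a TCP RST.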

In general, the doc looks very good to me. Maybe you could move it to Google Docs?

-- 
Best regards,
Pavel Shilovsky.