On Wed, Dec 11, 2013 at 2:32 PM, Matt W. Benjamin <matt@xxxxxxxxxxxx> wrote:
> Hi Ceph devs,
>
> For the last several weeks, we've been working with engineers at
> Mellanox on a prototype Ceph messaging implementation that runs on
> the Accelio RDMA messaging service (libxio).

Very cool! An RDMA Messenger has been a cool-sounding project for
which we haven't been able to get time for several years; I'm glad
somebody is getting the chance to explore it seriously.

> Accelio is a rather new effort to build a high-performance, high-throughput
> message passing framework atop openfabrics ibverbs and rdmacm primitives.
>
> It's early days, but the implementation has started to take shape, and
> gives a feel for what the Accelio architecture looks like when using the
> request-response model, as well as for our prototype mapping of the
> xio framework concepts to the Ceph ones.
>
> The current classes and responsibilities break down somewhat as follows.
> The key classes in the TCP messaging implementation are:
>
> Messenger (abstract, represents a set of bidirectional communication endpoints)
> SimpleMessenger (concrete TCP messenger)
>
> Message (abstract, models a message between endpoints, all Ceph protocol messages
> derive from Message, obviously)
>
> Connection (concrete, though it -feels- abstract; Connection models a communication
> endpoint identifiable by address, but has -some- coupling with the internals of
> SimpleMessenger, in particular, with its Pipe, below).
>
> Pipe (concrete, an active (threaded) object that encapsulates various operations on
> one side (send or recv) of a TCP connection. The Pipe is really where a -lot- of
> the heavy lifting of SimpleMessenger is localized, and not just in the obvious
> ways--eg, Pipe drives the dispatch queue in SimpleMessenger, so a lot of its
> visible semantics are built in cooperation with Pipe).
>
> Dispatcher (abstract, models the application processing messages and sending
> replies--ie, the upper edge of Messenger).

Good summary. You've left me feeling a little embarrassed about the
Connection class with that description. ;)

> The approach I took in incorporating Accelio was to build on the key abstractions
> of Messenger, Connection, Dispatcher, and Message, and build a corresponding
> family of concrete classes:
>
> XioMessenger (concrete, implements Messenger, encapsulates xio endpoints, aggregates
> dispatchers as normal).
>
> XioConnection (concrete, implements Connection)
>
> XioPortal (concrete, a new class that represents worker thread contexts for all
> XioConnections in a given XioMessenger)
>
> XioMsg (concrete, a "transfer" class linking a sequence of low-level Accelio
> datagrams with a Message being sent)
>
> XioReplyHook (concrete, derived from Ceph::Context [indirectly via
> Message::ReplyHook], links a sequence of low-level Accelio datagrams for a Message
> that has been received--that is, part of a new "reply" abstraction exposed to
> Message and Messenger).
>
> As noted above, there is some leakage of SimpleMessenger primitives into classes
> that are intended to be abstract, and some refactoring was needed to fit
> XioMessenger into the framework. The main changes I prototyped are as follows:
>
> All traces of Pipe are removed from Connection, which is made abstract. A new
> PipeConnection is introduced that knows about Pipes. SimpleMessenger now uses
> instances of PipeConnection as its concrete connection type.

This all makes sense.

> The most interesting changes I introduced are driven by the need to support
> Accelio's request/response model, which exists mainly to support RDMA memory
> registration primitives, and needs a concrete realization in the Messenger
> framework.
>
> To accommodate it, I've introduced two concepts. First, callers replying to a
> Message use a new Messenger::send_reply(Message *msg, Message *reply) method.
> In SimpleMessenger, this just maps to a call to send_message(Message *, Connection*),
> but in XioMessenger, the reply is delivered through a new Message::reply_hook
> completion functor that XioConnection sets when a message is being dispatched.
> This is a general mechanism; new Messenger implementations can derive from
> Message::ReplyHook to define their own reply behavior, as needed.

Can you talk more about the request/response model in the
communication layer and why you're explicitly specifying which messages
are replies to others? I'm not sure what makes that useful, or how a
model with explicit replies deals with stuff like
1) the two "ack/commit" responses to write requests, or
2) some of the requests in which there is not an explicit response
message (especially OSD->monitor stuff like failure reports), or
3) where a request does not get a direct response message, but
triggers a special indirect response of some kind (like the monitor
not acking a change request explicitly, but making sure to send new
maps to the person who requested the map change).
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
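
For concreteness, here is a rough sketch of how I read the send_reply()
and reply_hook split described in the quoted text. It is not the
prototype's code: every type below is a simplified, hypothetical
stand-in (the "Sketch" classes, prepare_dispatch(), and the string logs
are invented for illustration), a std::function stands in for the
Context-derived Message::ReplyHook, and references replace the raw
Message pointers of the quoted signature.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in: records payloads "sent" over a connection.
struct Connection {
    std::vector<std::string> sent;
};

// Hypothetical stand-in for Message. The real reply hook is described as
// a Context-derived functor; a std::function is assumed here for brevity.
struct Message {
    std::string payload;
    Connection* conn = nullptr;                 // connection the request arrived on
    std::function<void(Message&)> reply_hook;   // set by the transport at dispatch
};

class MessengerSketch {
public:
    virtual ~MessengerSketch() = default;

    // Plain send: append the payload to the connection's log.
    virtual void send_message(Message& m, Connection& c) {
        c.sent.push_back(m.payload);
    }

    // Reply path: each transport decides how a reply reaches the requester.
    virtual void send_reply(Message& request, Message& reply) = 0;
};

// TCP-style behavior: a reply is just a send on the request's connection.
class SimpleMessengerSketch : public MessengerSketch {
public:
    void send_reply(Message& request, Message& reply) override {
        assert(request.conn != nullptr);
        send_message(reply, *request.conn);
    }
};

// Request/response-style behavior: the reply goes through the hook that
// the transport installed when the request was dispatched.
class XioMessengerSketch : public MessengerSketch {
public:
    // Hypothetical helper standing in for what XioConnection does at dispatch.
    void prepare_dispatch(Message& request, std::vector<std::string>& xio_log) {
        request.reply_hook = [&xio_log](Message& reply) {
            xio_log.push_back("xio-reply:" + reply.payload);
        };
    }

    void send_reply(Message& request, Message& reply) override {
        assert(static_cast<bool>(request.reply_hook));
        request.reply_hook(reply);
    }
};
```

In this sketch a dispatcher only ever calls send_reply(request, reply).
Whether the reply becomes an ordinary send or goes through the hook is
decided by the messenger, which is the part my questions above are
about: a request with two responses, or with none, has no obvious
mapping onto a single hook invocation.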