On Thu, 2 Jun 2016, Haomai Wang wrote:
> On Thu, Jun 2, 2016 at 11:43 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> > Based on the discussion during CDM yesterday I wrote up a nicer-looking
> > spec of the protocol in rst:
> >
> > https://github.com/ceph/ceph/pull/9461
> >
> > Please let me know if this looks right. I have two questions:
> >
> > 1. Is TAG_START really necessary? I guess it doesn't hurt, and makes
> > it easy to add flags later.
> >
> > 2. We don't explicitly have anything here that indicates a session is
> > stateless or stateful. Currently this is determined by the Policy stuff
> > on either end and the peers just happen to agree. Setting/asserting
> > it explicitly as part of the handshake seems like a good idea. Maybe a
> > flags field in the TAG_IDENT message, with flags for lossy/lossless and
> > whether we initiate connections (true for clients or p2p servers)?
>
> We already have the CEPH_MSG_CONNECT_LOSSY flag in the handshake.

Oh yeah! I added a flags field to TAG_IDENT.

sage

> >
> > sage
> >
> >
> > On Sat, 28 May 2016, Yehuda Sadeh-Weinraub wrote:
> >
> >> On Fri, May 27, 2016 at 10:37 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> >> > On Fri, 27 May 2016, Yehuda Sadeh-Weinraub wrote:
> >> >> On Thu, May 26, 2016 at 11:17 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> >> >> > I wrote up a basic proposal for the new msgr2 protocol:
> >> >> >
> >> >> > http://pad.ceph.com/p/msgr2
> >> >> >
> >> >> > It is pretty similar to the current protocol, with a few key changes:
> >> >> >
> >> >> > 1. The initial banner has a version number for protocol features supported
> >> >> > and required. This will allow optional behavior later. The current
> >> >> > protocol doesn't allow this (the banner string is fixed and has to match
> >> >> > verbatim).
> >> >> >
> >> >> > 2. The auth handshake is a low-level msgr exchange now. This more or less
> >> >> > matches the MAuth and MAuthReply exchange with the mon.
> >> >> > Also, the authenticator/ticket presentation for established clients can
> >> >> > be sent here as part of this exchange, instead of as part of the
> >> >> > msg_connect and msg_connect_reply exchange.
> >> >> >
> >> >> > 3. The identification of peers during connect is moved to the TAG_IDENT
> >> >> > stage. This way it could happen after authentication and/or encryption,
> >> >> > if we like. (Not sure it matters.)
> >> >> >
> >> >> > 4. Signatures are a separate message now that follows the previous
> >> >> > message. If a message doesn't have a signature that follows, it is
> >> >> > dropped. Once authenticated we can sign all the other handshake exchanges
> >> >> > (TAG_IDENT, etc.) as well as the messages themselves.
> >> >> >
> >> >>
> >> >> Is there a reason why the signature needs to be a separate message? It
> >> >> would add extra overhead, and it seems to me that it would complicate
> >> >> implementation (in terms of message state and such).
> >> >
> >> > It doesn't have to be--I was just wanting to keep things simple. We could
> >> > similarly make it part of the underlying format, e.g.,
> >> >
> >> >   tag byte
> >> >   8 byte signature
> >> >   payload
> >>
> >> The signature should come after the payload, but yeah. We might need to
> >> define an extended envelope to allow future extensions.
> >>
> >> >
> >> > or whatever. That's basically the same thing, except we save 1 byte.
> >> >
> >> >> > 5. The reconnect behavior for stateful connections is a separate
> >> >> > exchange. This keeps the stateless connections free of clutter.
> >> >> >
> >> >> > 6. A few changes in the auth_none and cephx integration will be needed.
> >> >> > For example, all the current stubs assume that authentication happens over
> >> >> > the MAuth message and authorization happens in an authorizer blob in
> >> >> > ceph_msg_connect. Now both are part of TAG_AUTH_REQUEST, so we'll need to
> >> >> > multiplex the cephx message blobs.
> >> >> > Also, because the IDENT exchange happens later, we may need to pass
> >> >> > additional info in the auth handshake messages (like the peer type, or
> >> >> > whatever else is needed).
> >> >> >
> >> >> > 7. Lots of messages can go either way, and I tried to avoid a strict
> >> >> > request/response model so that things could be pipelined, and we'd spend a
> >> >> > minimal amount of time waiting for a response from the other end. For
> >> >> > example,
> >> >> >
> >> >> >   C:
> >> >> >     initiates connection
> >> >> >   S:
> >> >> >     accepts connection
> >> >> >     -> banner
> >> >> >     -> TAG_AUTH_METHODS
> >> >> >   C:
> >> >> >     -> banner
> >> >> >     -> TAG_AUTH_SET_METHOD
> >> >> >     -> TAG_AUTH_AUTH_REQUEST
> >> >> >   S:
> >> >> >     -> TAG_AUTH_REPLY
> >> >> >   C:
> >> >> >     -> TAG_ENCRYPT_BEGIN
> >> >> >     -> TAG_IDENT
> >> >> >     -> TAG_SIGNATURE
> >> >>
> >> >> Can we have the client start authenticating with some predetermined
> >> >> auth params, and resort to having the server respond with
> >> >> AUTH_METHODS only if it doesn't support the method selected by the
> >> >> client? Even if it isn't preconfigured, the auth method usually
> >> >> doesn't change across connection instances, so we can have the client
> >> >> cache that info per server. That would then be something like this:
> >> >>
> >> >> A first connection:
> >> >>
> >> >>   C:
> >> >>     initiates connection
> >> >>     -> banner
> >> >>     -> TAG_AUTH_GET_METHODS   <-- be explicit
> >> >>     -> TAG_AUTH_SET_METHOD    <-- opportunistically trying a specific
> >> >>                                   method type anyway
> >> >>     -> TAG_AUTH_AUTH_REQUEST
> >> >>
> >> >>   S:
> >> >>     accepts connection
> >> >>     -> banner
> >> >>     -> TAG_AUTH_REPLY
> >> >>
> >> >>
> >> >> A follow-up connection:
> >> >>
> >> >>   C:
> >> >>     initiates connection
> >> >>     -> banner
> >> >>     -> TAG_AUTH_SET_METHOD
> >> >>     -> TAG_AUTH_AUTH_REQUEST
> >> >>
> >> >>   S:
> >> >>     accepts connection
> >> >>     -> banner
> >> >>     -> TAG_AUTH_REPLY
> >> >
> >> > Yeah..
> >> > or even just make the initial connection try its preferred method
> >> > and only do the GET_METHODS if it is rejected.
> >> >
> >>
> >> Right. In any case, the protocol should enable this flexibility.
> >>
> >> >
> >> > If you do a connect and immediately write a few bytes to the TCP stream,
> >> > does that actually translate to fewer packets? I was guessing that the
> >> > server writing the first bytes of the exchange would be fine, but if it
> >> > speeds things up for the client to optimistically start the exchange too
> >> > we may as well...
> >> >
> >>
> >> While I haven't really looked at it recently, I don't think it'd be
> >> possible to embed data with the SYN packet using the plain vanilla TCP
> >> implementation. However, I believe that doing connect() and sending
> >> data immediately following it should improve things, specifically if
> >> doing async connect (as with the async messenger), but this still
> >> needs to be proven.
> >>
> >> Yehuda
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
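The inline-signature framing discussed above (a tag byte, then the payload, with the 8-byte signature following the payload rather than preceding it) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the msgr2 wire format: the 4-byte little-endian length field, the tag value, and the use of HMAC-SHA256 truncated to 8 bytes are all hypothetical choices made for the example; cephx computes its signatures differently.

```python
import hashlib
import hmac
import struct
from typing import Tuple

# Hypothetical tag value for illustration only; real msgr2 tag numbers
# come from the protocol spec, not from this sketch.
TAG_IDENT = 0x05
SIG_LEN = 8          # the thread proposes an 8-byte signature
HDR_FMT = "<BI"      # assumed header: tag (1 byte) + payload length (4 bytes, LE)
HDR_LEN = struct.calcsize(HDR_FMT)  # 5 bytes, no padding with "<"


def _sign(key: bytes, data: bytes) -> bytes:
    # Assumption: truncated HMAC-SHA256 stands in for the session signature.
    return hmac.new(key, data, hashlib.sha256).digest()[:SIG_LEN]


def encode_frame(key: bytes, tag: int, payload: bytes) -> bytes:
    """Build tag | length | payload | signature (signature last)."""
    body = struct.pack(HDR_FMT, tag, len(payload)) + payload
    # The signature is appended after the payload; a real sender could
    # compute it incrementally while streaming, this sketch signs at once.
    return body + _sign(key, body)


def decode_frame(key: bytes, buf: bytes) -> Tuple[int, bytes]:
    """Return (tag, payload); raise ValueError if truncated or badly signed."""
    if len(buf) < HDR_LEN + SIG_LEN:
        raise ValueError("truncated frame")
    tag, length = struct.unpack_from(HDR_FMT, buf, 0)
    end = HDR_LEN + length
    if len(buf) != end + SIG_LEN:
        raise ValueError("length mismatch")
    body, sig = buf[:end], buf[end:]
    # Constant-time check; a frame without a valid signature is dropped.
    if not hmac.compare_digest(sig, _sign(key, body)):
        raise ValueError("bad signature")
    return tag, buf[HDR_LEN:end]
```

For example, `decode_frame(key, encode_frame(key, TAG_IDENT, b"hello"))` returns `(TAG_IDENT, b"hello")`, while flipping any byte of the frame raises `ValueError`. Compared with a separate TAG_SIGNATURE message, this keeps the signature in the same frame as the data it covers, so the receiver needs no extra state to pair them up. The "extended envelope" mentioned in the thread would presumably add a version or flags field to the header; that is not modeled here.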