Re: msgr2 protocol

Sage Weil <sweil@xxxxxxxxxx> · Mon, 12 Sep 2016 13:21:14 +0000 (UTC)

On Mon, 12 Sep 2016, Haomai Wang wrote:
> On Mon, Sep 12, 2016 at 1:05 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> > On Sat, 10 Sep 2016, Haomai Wang wrote:
> >> About thing is v1/v2 compatible. I rethink the details:
> >>
> >> 0. we need to define the new banner which must longer than before("ceph v027")
> >> 1. assume msgr v2 banner is "ceph v2 %64llx %64llx\n"
> >> 2. both in simle/async codes, server side must issue banner firstly
> >> 3. if server side supports v2 and client only supports v1, client will
> >> receive 9 bytes and do memcmp, then reject this connection via closing
> >> socket. So server side could retry the older version
> >> 4. if server side only supports v1 and client supports v2, client
> >> according banner to reply corresponding banner
> >>
> >> This tricky design is based on the implementation fact "accept side
> >> issue the banner firstly" and "new banner is longer than old banner",
> >> and this way doesn't need to involve other dependences like mon port
> >> changes.
> >>
> >> Does this way has problem?
> >
> > I was thinking we avoid this problem and any hacky initial handshakes by
> > speaking v2 on the new port and v1 on the old port.  Then the monmap has
> > an entity_addrvec_t with both a v1 and v2 address (encoding with just the
> > v1 address for old clients). Same for the OSDs.
> 
> This way is ok to me. So another change is double messenger
> instances(to v1 and v2) or let each messenger support multi binding
> addresses(this may need to refactor messenger interface).

Yeah.  I'm guessing we'll want to have an entity_addrvec_t with address 
types mapped to different Messenger implementations (e.g., xio), so we'll 
wan to allow multiple instance eventually.  But we'll also just want to
allow multiple binding (v1 + v2, or ipv4 + ipv6).  :/

sage

> 
> >
> > The v1 handshake just isn't extensible (how do you tell a v2 client
> > connecting that you speak both v1 and v2?).
> >
> > sage
> >
> >
> >>
> >>
> >> On Sat, Sep 10, 2016 at 11:37 AM, Haomai Wang <haomai@xxxxxxxx> wrote:
> >> >
> >> >
> >> > On Sat, Sep 10, 2016 at 5:14 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> >> >>
> >> >> On Sat, 10 Sep 2016, Haomai Wang wrote:
> >> >> > @sage in current impl, when logic fault like state mismatch, data format
> >> >> > mismatch or anything else, connection will abort session via closing
> >> >> > socket.
> >> >> > And the peer side would do something according to policy too.
> >> >> > In msgr v2 when introducing multi streams in the same connection, we
> >> >> > can't
> >> >> > simply abort socket to indicate something wrong now. I think we need to
> >> >> > introduce TAG_ABORT with error message.
> >> >> >
> >> >> > But the peer side may stuck into a state like reading enough data as
> >> >> > "length" indicate. It may miss the TAG_ABORT notify or other reconnect
> >> >> > tag.
> >> >> > A tricky thing is we use tcp OOB bit to send which exactly trigger
> >> >> > "urgent"
> >> >> > signal when receiving, but it only occur 1 byte in tcp proto which can
> >> >> > be
> >> >> > used here to indicate the stream id(32bit designed now).
> >> >> >
> >> >> > What's more, multi stream mixed within one socket may make trouble to
> >> >> > message receiving when potential tcp packet silent error. So it looks we
> >> >> > can't use the same socket the multi stream to meet our demands.
> >> >> >
> >> >> > Any idea?
> >> >>
> >> >> I think we intorduce a TAG_ABORT to interrupt the stream.  And then we
> >> >> have to assume that the low-level msgr2 implementation that reads and
> >> >> writes frames (which have their own frame_len) is not buggy.  In practice,
> >> >> the aborts tend to happen because we get a message we don't understand
> >> >> (version mismatch, encoding compatibility bug, etc.), and that'll happen
> >> >> at a higher level after frames have been read... so a TAG_ABORT will be
> >> >> sufficient.
> >> >
> >> >
> >> > yes, if frame is ok. It should be ok.... Let's go through this firstly...
> >> > The worse case is the frame length is not expected as data transferred.
> >> >
> >> >>
> >> >>
> >> >> Also, we can have an option to make aborts close the socket.  That'll be
> >> >> fine for now anyway, although later it's probably to disruptive when
> >> >> multiple streams are sharing a socket...
> >> >>
> >> >> sage
> >> >>
> >> >>
> >> >>  >
> >> >> >
> >> >> >
> >> >> > On Mon, Jun 13, 2016 at 7:59 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> >> >> >       On Sat, 11 Jun 2016, Marcus Watts wrote:
> >> >> >       > If the client doesn't look at "features" before it sends
> >> >> >       stuff, it
> >> >> >       > will not be able to be very smart about taking advantage of
> >> >> >       some
> >> >> >       > future better method.  In fact, there isn't much advantage
> >> >> >       > to the server sending anything early - it could just as easily
> >> >> >       > wait until after it's seen the clients request.
> >> >> >       >
> >> >> >       > Failing hard & retrying on a failed reconnect is going to be
> >> >> >       slower.
> >> >> >       > On the bright side, at least it shouldn't happen often.
> >> >> >
> >> >> >       Yep.  Well, I think it is the client's (limited choice).  If it
> >> >> >       needs to
> >> >> >       know the server features, it needs to either wait for them, or
> >> >> >       make some
> >> >> >       optimistic choice and be prepared to pay the cost of a mistake.
> >> >> >       We should
> >> >> >       give the client choice, though, if we can.
> >> >> >
> >> >> >       > If you're sending encryption (w/ different auth or keys) from
> >> >> >       several
> >> >> >       > different streams, how are you planning to indicate which bits
> >> >> >       > go with which scheme?, and which bits are you planning to
> >> >> >       encrypt
> >> >> >       > and which not?
> >> >> >
> >> >> >       This is what he stream ids are for, and why the outer portion of
> >> >> >       the frame
> >> >> >       is unencrypted.  See
> >> >> >
> >> >> >
> >> >> > https://github.com/ceph/ceph/pull/9461/files#diff-83789b4be697d82eedbcbe330
> >> >> >       c44b436R68
> >> >> >
> >> >> >        +  stream_id (le32)
> >> >> >        +  frame_len (le32)
> >> >> >        +  tag (TAG_* byte)
> >> >> >        +  payload
> >> >> >        +  [payload padding -- only present after stream auth phase]
> >> >> >        +  [signature -- only present after stream auth phase]
> >> >> >
> >> >> >       The tag and payload (and padding) would be encrypted or signed,
> >> >> >       but not
> >> >> >       the stream id and frame_len.
> >> >> >
> >> >> >       > Byte count limits.  Basically, you don't want collisions
> >> >> >       because
> >> >> >       > of duplicated keys or data.  This depends on your crypto
> >> >> >       system,
> >> >> >       > so, for instance, you should not encrypt with one key more
> >> >> >       than
> >> >> >       >       aes, cbc        about 2^68 bytes
> >> >> >       >       aes, ctr        exactly 2^128 bytes
> >> >> >       > more generally, this depends on mode, blocksize, ...
> >> >> >       > This applies across *all* uses of the key - and so you would
> >> >> >       > generally want to use the session key directly as little as
> >> >> >       possible.
> >> >> >       > (in particular, using the session key for ctr directly would
> >> >> >       be very very bad.)
> >> >> >       >
> >> >> >       > If you've got multiple streams going already, you should be
> >> >> >       able
> >> >> >       > to include a fairly simple rekey method with little effort.
> >> >> >       > For instance, as part of the method, you could,
> >> >> >       >       up front as part of the method
> >> >> >       >               send a per-stream key encrypted under the shared
> >> >> >       secret.
> >> >> >       >       prepend to the first data sent in a payload
> >> >> >       >               byte limit, stream key #0 (encrypted under the
> >> >> >       per-stream key)
> >> >> >       >               then encrypt the next N bytes with stream key #0
> >> >> >       >       when the byte limit is reached, prepend to the
> >> >> >       >               next data sent in a payload
> >> >> >       >               byte limit, stream key #1 (encrypted under the
> >> >> >       per-stream key)
> >> >> >       >               then encrypt the next N bytes with stream key #1
> >> >> >       >       &etc.
> >> >> >
> >> >> >       Good idea.  If I understand correctly, it means that the
> >> >> >       session_key is
> >> >> >       only used to send the new/next random encryption key, and if we
> >> >> >       make the
> >> >> >       byte limit part of the initial protocol we get the rotation we
> >> >> >       need.  It
> >> >> >       might be simpler to do it as a frame limit instead of byte
> >> >> >       limit, and
> >> >> >       assume max-length frames (2^32 bytes).  We could still be super
> >> >> >       conservative and rotate the encryption key every 2^16 messages
> >> >> >       or
> >> >> >       something...?  And rotating the key on frame boundaries should
> >> >> >       be much
> >> >> >       simpler to implement.
> >> >> >
> >> >> >       Anyway, that part can be defined a bit later, I think.
> >> >> >
> >> >> >       Thanks!
> >> >> >       sage
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >
> >> >
> >>
> >>
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html