Re: msgr2 protocol

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Sep 12, 2016 at 1:05 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> On Sat, 10 Sep 2016, Haomai Wang wrote:
>> About thing is v1/v2 compatible. I rethink the details:
>>
>> 0. we need to define the new banner which must longer than before("ceph v027")
>> 1. assume msgr v2 banner is "ceph v2 %64llx %64llx\n"
>> 2. both in simle/async codes, server side must issue banner firstly
>> 3. if server side supports v2 and client only supports v1, client will
>> receive 9 bytes and do memcmp, then reject this connection via closing
>> socket. So server side could retry the older version
>> 4. if server side only supports v1 and client supports v2, client
>> according banner to reply corresponding banner
>>
>> This tricky design is based on the implementation fact "accept side
>> issue the banner firstly" and "new banner is longer than old banner",
>> and this way doesn't need to involve other dependences like mon port
>> changes.
>>
>> Does this way has problem?
>
> I was thinking we avoid this problem and any hacky initial handshakes by
> speaking v2 on the new port and v1 on the old port.  Then the monmap has
> an entity_addrvec_t with both a v1 and v2 address (encoding with just the
> v1 address for old clients). Same for the OSDs.

This way is ok to me. So another change is double messenger
instances(to v1 and v2) or let each messenger support multi binding
addresses(this may need to refactor messenger interface).

>
> The v1 handshake just isn't extensible (how do you tell a v2 client
> connecting that you speak both v1 and v2?).
>
> sage
>
>
>>
>>
>> On Sat, Sep 10, 2016 at 11:37 AM, Haomai Wang <haomai@xxxxxxxx> wrote:
>> >
>> >
>> > On Sat, Sep 10, 2016 at 5:14 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>> >>
>> >> On Sat, 10 Sep 2016, Haomai Wang wrote:
>> >> > @sage in current impl, when logic fault like state mismatch, data format
>> >> > mismatch or anything else, connection will abort session via closing
>> >> > socket.
>> >> > And the peer side would do something according to policy too.
>> >> > In msgr v2 when introducing multi streams in the same connection, we
>> >> > can't
>> >> > simply abort socket to indicate something wrong now. I think we need to
>> >> > introduce TAG_ABORT with error message.
>> >> >
>> >> > But the peer side may stuck into a state like reading enough data as
>> >> > "length" indicate. It may miss the TAG_ABORT notify or other reconnect
>> >> > tag.
>> >> > A tricky thing is we use tcp OOB bit to send which exactly trigger
>> >> > "urgent"
>> >> > signal when receiving, but it only occur 1 byte in tcp proto which can
>> >> > be
>> >> > used here to indicate the stream id(32bit designed now).
>> >> >
>> >> > What's more, multi stream mixed within one socket may make trouble to
>> >> > message receiving when potential tcp packet silent error. So it looks we
>> >> > can't use the same socket the multi stream to meet our demands.
>> >> >
>> >> > Any idea?
>> >>
>> >> I think we intorduce a TAG_ABORT to interrupt the stream.  And then we
>> >> have to assume that the low-level msgr2 implementation that reads and
>> >> writes frames (which have their own frame_len) is not buggy.  In practice,
>> >> the aborts tend to happen because we get a message we don't understand
>> >> (version mismatch, encoding compatibility bug, etc.), and that'll happen
>> >> at a higher level after frames have been read... so a TAG_ABORT will be
>> >> sufficient.
>> >
>> >
>> > yes, if frame is ok. It should be ok.... Let's go through this firstly...
>> > The worse case is the frame length is not expected as data transferred.
>> >
>> >>
>> >>
>> >> Also, we can have an option to make aborts close the socket.  That'll be
>> >> fine for now anyway, although later it's probably to disruptive when
>> >> multiple streams are sharing a socket...
>> >>
>> >> sage
>> >>
>> >>
>> >>  >
>> >> >
>> >> >
>> >> > On Mon, Jun 13, 2016 at 7:59 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>> >> >       On Sat, 11 Jun 2016, Marcus Watts wrote:
>> >> >       > If the client doesn't look at "features" before it sends
>> >> >       stuff, it
>> >> >       > will not be able to be very smart about taking advantage of
>> >> >       some
>> >> >       > future better method.  In fact, there isn't much advantage
>> >> >       > to the server sending anything early - it could just as easily
>> >> >       > wait until after it's seen the clients request.
>> >> >       >
>> >> >       > Failing hard & retrying on a failed reconnect is going to be
>> >> >       slower.
>> >> >       > On the bright side, at least it shouldn't happen often.
>> >> >
>> >> >       Yep.  Well, I think it is the client's (limited choice).  If it
>> >> >       needs to
>> >> >       know the server features, it needs to either wait for them, or
>> >> >       make some
>> >> >       optimistic choice and be prepared to pay the cost of a mistake.
>> >> >       We should
>> >> >       give the client choice, though, if we can.
>> >> >
>> >> >       > If you're sending encryption (w/ different auth or keys) from
>> >> >       several
>> >> >       > different streams, how are you planning to indicate which bits
>> >> >       > go with which scheme?, and which bits are you planning to
>> >> >       encrypt
>> >> >       > and which not?
>> >> >
>> >> >       This is what he stream ids are for, and why the outer portion of
>> >> >       the frame
>> >> >       is unencrypted.  See
>> >> >
>> >> >
>> >> > https://github.com/ceph/ceph/pull/9461/files#diff-83789b4be697d82eedbcbe330
>> >> >       c44b436R68
>> >> >
>> >> >        +  stream_id (le32)
>> >> >        +  frame_len (le32)
>> >> >        +  tag (TAG_* byte)
>> >> >        +  payload
>> >> >        +  [payload padding -- only present after stream auth phase]
>> >> >        +  [signature -- only present after stream auth phase]
>> >> >
>> >> >       The tag and payload (and padding) would be encrypted or signed,
>> >> >       but not
>> >> >       the stream id and frame_len.
>> >> >
>> >> >       > Byte count limits.  Basically, you don't want collisions
>> >> >       because
>> >> >       > of duplicated keys or data.  This depends on your crypto
>> >> >       system,
>> >> >       > so, for instance, you should not encrypt with one key more
>> >> >       than
>> >> >       >       aes, cbc        about 2^68 bytes
>> >> >       >       aes, ctr        exactly 2^128 bytes
>> >> >       > more generally, this depends on mode, blocksize, ...
>> >> >       > This applies across *all* uses of the key - and so you would
>> >> >       > generally want to use the session key directly as little as
>> >> >       possible.
>> >> >       > (in particular, using the session key for ctr directly would
>> >> >       be very very bad.)
>> >> >       >
>> >> >       > If you've got multiple streams going already, you should be
>> >> >       able
>> >> >       > to include a fairly simple rekey method with little effort.
>> >> >       > For instance, as part of the method, you could,
>> >> >       >       up front as part of the method
>> >> >       >               send a per-stream key encrypted under the shared
>> >> >       secret.
>> >> >       >       prepend to the first data sent in a payload
>> >> >       >               byte limit, stream key #0 (encrypted under the
>> >> >       per-stream key)
>> >> >       >               then encrypt the next N bytes with stream key #0
>> >> >       >       when the byte limit is reached, prepend to the
>> >> >       >               next data sent in a payload
>> >> >       >               byte limit, stream key #1 (encrypted under the
>> >> >       per-stream key)
>> >> >       >               then encrypt the next N bytes with stream key #1
>> >> >       >       &etc.
>> >> >
>> >> >       Good idea.  If I understand correctly, it means that the
>> >> >       session_key is
>> >> >       only used to send the new/next random encryption key, and if we
>> >> >       make the
>> >> >       byte limit part of the initial protocol we get the rotation we
>> >> >       need.  It
>> >> >       might be simpler to do it as a frame limit instead of byte
>> >> >       limit, and
>> >> >       assume max-length frames (2^32 bytes).  We could still be super
>> >> >       conservative and rotate the encryption key every 2^16 messages
>> >> >       or
>> >> >       something...?  And rotating the key on frame boundaries should
>> >> >       be much
>> >> >       simpler to implement.
>> >> >
>> >> >       Anyway, that part can be defined a bit later, I think.
>> >> >
>> >> >       Thanks!
>> >> >       sage
>> >> >
>> >> >
>> >> >
>> >> >
>> >
>> >
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux