Re: [GSoC] wrong patch "msg: change entity_addr_t encode/decode"

Haomai Wang <haomai@xxxxxxxx> · Thu, 16 Jun 2016 09:04:52 +0800

On Thu, Jun 16, 2016 at 5:17 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> I've rebased this and pushed a new wip-addrvec branch, PR
>
>         https://github.com/ceph/ceph/pull/9726
>
> I wonder if a good way to ease into using this would be to make
> AsyncMessenger allow binding to multiple addresses (IPv4 and IPv6).
>
> Haomai, what do you think?
>
> That will help us fix the multiple-address issues in msg/async (for
> example, announcing ourselves using the correct addr depending on which
> socket we accepted the connection on).  That will be one less thing in the
> way to making msg/async speak msgr2 on some connections.

Hmm, you mean in multi osd instance example, we need to make each osd
instance own a bind address?

What's the problem if they use the same address, I guess the reason
should be we have existing codes rely on this independent address
potentially.

>
>
> I started working on another branch that cleans up and refactors some of
> the authentication callbacks from the msgr into osd/mds/mon code so that
> we can accomodate both msgr1 and msgr2 authentication.
>
>         https://github.com/ceph/ceph/pull/9727
>
> A bit more cleanup still to do there, but soon I think we'll be to the
> point where we need the msg/async half that implements the protocol so we
> can wire the two together.

ok...

>
>
> sage
>
>
> On Sat, 11 Jun 2016, Junwang Zhao wrote:
>
> > Hi Sage,
> >
> > I have modified the patch to make it PASS the unittest, you may want
> > to have a look and cherry-pick the updated commit[0].
> >
> > Thanks,
> > Zhao
> >
> > [0] https://github.com/zhjwpku/ceph/commit/b1f1183e3e2f39bbdbfa326d0dfc6f9b95611321
> >
> > On Sat, Jun 11, 2016 at 1:07 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > > On Fri, 10 Jun 2016, Junwang Zhao wrote:
> > >> Hi Sage,
> > >>
> > >> Updated and pushed to another branch, see [0], the 'make check'
> > >> shows that the test cases FAILS, could you tell me how to debug
> > >> these unittest? I hope to see the result using "cout" but I don't know
> > >> where those output goes to :(
> > >>
> > >> [0]https://github.com/zhjwpku/ceph/commit/e7307deae01fd6b675350fdae9c493eae04fdde9
> > >
> > > I fixed a problem in the messengesr that was breaking the addr learning,
> > > and thus a bunch of the unit tests that try to run test clusters.  I also
> > > rebased it on master.  Pushed an updated wip-addr-more to my git tree
> > > (liewegas/ceph)!
> > >
> > > sage
> > >
> > >>
> > >> Thanks,
> > >> Zhao
> > >>
> > >> On Fri, Jun 10, 2016 at 10:37 AM, Junwang Zhao <zhjwpku@xxxxxxxxx> wrote:
> > >> > Hi Sage,
> > >> >
> > >> > I have go a little bit further, the cases you mentioned really make sense,
> > >> > please see [0] and make some comments, I am still not good at writting
> > >> > the unittest.
> > >> >
> > >> > [0]https://github.com/zhjwpku/ceph/commit/4da16085a650f1dd0b743560836fdf213507d3b4
> > >> >
> > >> > Thanks!
> > >> > Zhao
> > >> >
> > >> > On Fri, Jun 10, 2016 at 12:10 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > >> >> On Wed, 8 Jun 2016, Junwang Zhao wrote:
> > >> >>> Hi Sage,
> > >> >>>
> > >> >>> Write some unittest, but seems not right, see [0], give some comments
> > >> >>> please, I use `cout` to print the value, but I don't know where to see them.
> > >> >>>
> > >> >>> I got a lot of other "FAIL" in the wip-addr-more branch, like:
> > >> >>>   ceph-detect-init/run-tox.sh
> > >> >>>   ceph-disk/run-tox.sh
> > >> >>>   test/run-rbd-unit-tests.sh
> > >> >>> I am wondering is this because my working environment? My laptop
> > >> >>> is really not very powerful, "i5-3210M CPU, 8G memory" :(
> > >> >>> make check is quite slow.
> > >> >>>
> > >> >>> [0] https://github.com/zhjwpku/ceph/commit/e15a240d56c2b67a8b0906dc80650a91bff1518f
> > >> >>
> > >> >> I'm traveling today and not able to replicate this yet--I'll take a look
> > >> >> tomorrow.
> > >> >>
> > >> >> In the meantime, though, it doesn't look like your new tests are doing
> > >> >> quite what I think we need them to.  I think the important cases are:
> > >> >>
> > >> >> 1- make sure that any encoded entity_addr_t can be decoded as a
> > >> >> entity_addrvec_t, with size 1, where item 0 matches the addr.  Can use the
> > >> >> existing arrays of addrs to parse to test this.
> > >> >>
> > >> >> 2- make sure that in all of those cases, if we encode the entity_addrvec_t
> > >> >> with features 0, and decode as an entity_addr_t, we get back the same
> > >> >> entity_addr_t.
> > >> >>
> > >> >> 3- the addrvec size > 1 cases:
> > >> >>
> > >> >> - populate an addrvec with multiple addrs where one is legacy and others
> > >> >> are not, and ensure that encoding with features 0 and decoding as an addr
> > >> >> gives back the legacy item.
> > >> >>
> > >> >> - multiple legacy addrs with features 0 should encode the first one in the
> > >> >> list
> > >> >>
> > >> >> - a vec with all non-legacy addrs encoded with features 0 and decoded as
> > >> >> addr should give back an addr that is not entity_addr_t() but is
> > >> >> unusable/bogus.
> > >> >>
> > >> >> 4- the empty addrvec case:
> > >> >>
> > >> >> - encode with features 0 and decoding as addr should give back
> > >> >> entity_addr_t().
> > >> >>
> > >> >> I think that's it?
> > >> >>
> > >> >> sage
> > >> >>
> > >> >>
> > >> >>>
> > >> >>> On Mon, Jun 6, 2016 at 4:28 PM, Junwang Zhao <zhjwpku@xxxxxxxxx> wrote:
> > >> >>> > Hi, Sage,
> > >> >>> >
> > >> >>> > Sorry I did not see your reply util now, I will work on it.
> > >> >>> >
> > >> >>> > Thanks!
> > >> >>> > Zhao
> > >> >>> >
> > >> >>> > On Fri, Jun 3, 2016 at 1:46 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > >> >>> >> On Thu, 2 Jun 2016, Sage Weil wrote:
> > >> >>> >>> Hi Zhao,
> > >> >>> >>>
> > >> >>> >>> On Tue, 31 May 2016, Junwang Zhao wrote:
> > >> >>> >>> > Hi Sage & Haomai,
> > >> >>> >>> >
> > >> >>> >>> > I try to run './vstart.sh -d -x -n' to test the patches today, I got
> > >> >>> >>> > some error like 'wrong node', after some work, I found the
> > >> >>> >>> > error was introduced by this patch[0]. I change the code to
> > >> >>> >>> > use the legacy decode and encode, but got stuck somewhere.
> > >> >>> >>> >
> > >> >>> >>> > --- a/src/msg/msg_types.h
> > >> >>> >>> > +++ b/src/msg/msg_types.h
> > >> >>> >>> > @@ -371,7 +371,7 @@ struct entity_addr_t {
> > >> >>> >>> >    // broader study
> > >> >>> >>> >
> > >> >>> >>> >    void encode(bufferlist& bl, uint64_t features) const {
> > >> >>> >>> > -    if ((features & CEPH_FEATURE_MSG_ADDR2) == 0) {
> > >> >>> >>> > +    if ((features & CEPH_FEATURE_MSG_ADDR2) != 0) {
> > >> >>> >>> >        ::encode((__u32)0, bl);
> > >> >>> >>> >        ::encode(nonce, bl);
> > >> >>> >>> >        sockaddr_storage ss = get_sockaddr_storage();
> > >> >>> >>> >
> > >> >>> >>> >
> > >> >>> >>> > Can you please see the patch again and give some comments?
> > >> >>> >>> >
> > >> >>> >>> > [0]https://github.com/zhjwpku/ceph/commit/8934f038fdf28156038770e601e4a27f7a684390
> > >> >>> >>>
> > >> >>> >>> This was enough to make things work fo rme:
> > >> >>> >>>
> > >> >>> >>> diff --git a/src/msg/msg_types.h b/src/msg/msg_types.h
> > >> >>> >>> index 9c521e6..0da0614 100644
> > >> >>> >>> --- a/src/msg/msg_types.h
> > >> >>> >>> +++ b/src/msg/msg_types.h
> > >> >>> >>> @@ -199,8 +199,7 @@ WRITE_CLASS_ENCODER(ceph_sockaddr_storage)
> > >> >>> >>>
> > >> >>> >>>  struct entity_addr_t {
> > >> >>> >>>    typedef enum {
> > >> >>> >>> -    TYPE_NONE = 0,
> > >> >>> >>> -    TYPE_LEGACY = 1,
> > >> >>> >>> +    TYPE_LEGACY = 0,
> > >> >>> >>>    } type_t;
> > >> >>> >>>
> > >> >>> >>>    __u32 type;
> > >> >>> >>> @@ -211,7 +210,7 @@ struct entity_addr_t {
> > >> >>> >>>      sockaddr_in6 sin6;
> > >> >>> >>>    } u;
> > >> >>> >>>
> > >> >>> >>> -  entity_addr_t() : type(0), nonce(0) {
> > >> >>> >>> +  entity_addr_t() : type(TYPE_LEGACY), nonce(0) {
> > >> >>> >>>      memset(&u, 0, sizeof(u));
> > >> >>> >>>    }
> > >> >>> >>>    explicit entity_addr_t(const ceph_entity_addr &o) {
> > >> >>> >>>
> > >> >>> >>> but I think this is not the right fix.  I think we still want
> > >> >>> >>> TYPE_NONE, TYPE_LEGACY, and (soon) TYPE_MSGR2, and that the core problem
> > >> >>> >>> is places that are creating addrs aren't setting the type properly.  Going
> > >> >>> >>> to play with this a bit...
> > >> >>> >>
> > >> >>> >> I pushed a wip-addr-more branch:
> > >> >>> >>
> > >> >>> >>         https://github.com/liewegas/ceph/commits/wip-addr-more
> > >> >>> >>
> > >> >>> >> This fixes the parser to take type prefixes, default to legacy, and
> > >> >>> >> changes entity_addr_t() so that it renders are "-" instead of ":/0" (which
> > >> >>> >> was weird anyway).
> > >> >>> >>
> > >> >>> >> I think what we need next are some unit tests in test/test_addrs.cc that
> > >> >>> >> do things like encode a entity_addr_t with no features, decode as
> > >> >>> >> entity_addrvec_t, or construct varous addrvecs, encode with no features,
> > >> >>> >> an ensure a decode to entity_addr_t gives back the right addr from the
> > >> >>> >> list.
> > >> >>> >>
> > >> >>> >> sage
> > >> >>> --
> > >> >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > >> >>> the body of a message to majordomo@xxxxxxxxxxxxxxx
> > >> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >> >>>
> > >> >>>
> > >>
> > >>
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> >
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html