On Tue, Sep 13, 2016 at 6:59 AM, Willem Jan Withagen <wjw@xxxxxxxxxxx> wrote:
> Hi
>
> When running cephtool-test-mon.sh, part of it executes:
>     ceph tell osd.0 version
> I see reports on the commandline, I guess that this is the OSD
> complaining that things are wrong:
>
> 2016-09-12 23:50:39.239037 814e50e00 0 -- 127.0.0.1:0/1925715881 >>
> 127.0.0.1:6800/26384 conn(0x814fde800 sd=18 :-1
> s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0
> l=1)._process_connection connect claims to be 127.0.0.1:6800/1026384 not
> 127.0.0.1:6800/26384 - wrong node!
>
> Which it will run until it is shot down.... after 3600 secs.
>
> the nonce is incremented with 1000000 on every rebind.
>
> But what I do not understand is how this mismatch has occurred.
> I would expect port 6800 to be the port on which the OSD is connected
> too, so the connecting party (ceph in this case) thinks the nonce to be
> 1026384. Did the MON have this information? And where did the MON then
> get it from....
>
> Somewhere one of the parts did not receive the new nonce, or did not
> also increment it?

The nonce is part of ceph_entity_addr, so the OSDMap carries it along
with the rest of the address.

>
> Any suggestions welcomed on directions where to look,
>
> Thanx,
> --WjW
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
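
To make the mismatch in the quoted log line concrete, here is a small
Python sketch. It is an illustration only, not Ceph code: the names
parse_addr, rebinds_between and REBIND_NONCE_STEP are made up for this
example, and the step of 1000000 per rebind is taken from the quoted
mail rather than checked against the source. It splits the two
ip:port/nonce strings from the log and checks whether the nonce gap is
a whole number of rebinds.

```python
# Illustration only; not Ceph code. All names here are hypothetical.
from typing import NamedTuple, Optional

# Per-rebind nonce increment, as stated in the quoted mail (assumption).
REBIND_NONCE_STEP = 1000000


class Addr(NamedTuple):
    ip: str
    port: int
    nonce: int


def parse_addr(text: str) -> Addr:
    """Parse the 'ip:port/nonce' form that appears in the log line."""
    hostport, nonce = text.rsplit("/", 1)
    ip, port = hostport.rsplit(":", 1)
    return Addr(ip, int(port), int(nonce))


def rebinds_between(expected: Addr, claimed: Addr) -> Optional[int]:
    """Return the number of rebinds that would explain the nonce gap.

    Returns None if ip/port differ, or if the gap is negative or not a
    whole multiple of REBIND_NONCE_STEP.
    """
    if (expected.ip, expected.port) != (claimed.ip, claimed.port):
        return None
    gap = claimed.nonce - expected.nonce
    if gap < 0 or gap % REBIND_NONCE_STEP != 0:
        return None
    return gap // REBIND_NONCE_STEP


expected = parse_addr("127.0.0.1:6800/26384")    # address the client dialed
claimed = parse_addr("127.0.0.1:6800/1026384")   # address the peer announced
print(rebinds_between(expected, claimed))        # prints 1
```

With the values from the log, the gap is exactly one step. That would
be consistent with the client dialing an address recorded before the
OSD's most recent rebind, but this is only a reading of the log line,
not a confirmed diagnosis.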