Re: Mismatching nonce for 'ceph osd.0 tell'

On Tue, Sep 13, 2016 at 6:59 AM, Willem Jan Withagen <wjw@xxxxxxxxxxx> wrote:
> Hi
>
> When running cephtool-test-mon.sh, part of it executes:
>   ceph tell osd.0 version
> I see reports on the command line; I guess this is the OSD
> complaining that things are wrong:
>
> 2016-09-12 23:50:39.239037 814e50e00  0 -- 127.0.0.1:0/1925715881 >>
> 127.0.0.1:6800/26384 conn(0x814fde800 sd=18 :-1
> s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0
> l=1)._process_connection connect claims to be 127.0.0.1:6800/1026384 not
> 127.0.0.1:6800/26384 - wrong node!
>
> It keeps doing this until it is shot down after 3600 secs.
>
> The nonce is incremented by 1000000 on every rebind.
>
> But what I do not understand is how this mismatch occurred.
> I would expect port 6800 to be the port on which the OSD is connected
> to, so the connecting party (ceph in this case) thinks the nonce is
> 1026384. Did the MON have this information? And where did the MON then
> get it from?
>
> Somewhere, did one of the parties not receive the new nonce, or not
> also increment it?

The nonce is part of ceph_entity_addr, so the OSDMap will carry it along
with the address.
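
To make the comparison concrete, here is a small Python sketch. It is not
Ceph code: EntityAddr, parse_addr and rebinds_between are hypothetical names
used only for illustration. It treats the nonce as part of the address
identity, as the "wrong node" message in the quoted log suggests, and uses the
1000000-per-rebind increment mentioned in the quoted message to check whether
the two addresses differ by a whole number of rebinds.

```python
from typing import NamedTuple, Optional

# Per-rebind nonce increment as stated in this thread (an assumption here).
REBIND_STEP = 1_000_000


class EntityAddr(NamedTuple):
    """Illustrative stand-in for an address with a nonce (not Ceph's type)."""
    host: str
    port: int
    nonce: int


def parse_addr(text: str) -> EntityAddr:
    """Parse 'host:port/nonce' as printed in the quoted log line (IPv4 only)."""
    hostport, nonce = text.rsplit("/", 1)
    host, port = hostport.rsplit(":", 1)
    return EntityAddr(host, int(port), int(nonce))


def rebinds_between(expected: EntityAddr, claimed: EntityAddr) -> Optional[int]:
    """Return how many rebinds would explain the nonce gap, else None."""
    if (expected.host, expected.port) != (claimed.host, claimed.port):
        return None
    diff = claimed.nonce - expected.nonce
    if diff < 0 or diff % REBIND_STEP != 0:
        return None
    return diff // REBIND_STEP


expected = parse_addr("127.0.0.1:6800/26384")    # address the client dialed
claimed = parse_addr("127.0.0.1:6800/1026384")   # address the peer announced

print(expected == claimed)                  # False: nonce is part of identity
print(rebinds_between(expected, claimed))   # 1
```

With the two addresses from the log, the gap is exactly one step, which would
be consistent with the client holding an address from before a single rebind.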

>
> Any suggestions on where to look are welcome.
>
> Thanx,
> --WjW
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html