Re: Mismatching nonce for 'ceph osd.0 tell'

kefu chai <tchaikov@xxxxxxxxx> · Thu, 8 Dec 2016 18:03:19 +0800

On Tue, Oct 4, 2016 at 7:57 PM, Willem Jan Withagen <wjw@xxxxxxxxxxx> wrote:
> On 3-10-2016 19:50, Gregory Farnum wrote:
>>> Question here is:
>>>   If I ask 'ceph osd dump', I'm actually asking ceph-mon.
>>>   And ceph-mon has learned this from (crush?)maps being sent to it by
>>>   ceph-osd.
>>
>> The monitor has learned about specific IP addresses/nonces/etc via
>> MOSDBoot messages from the OSDs. The crush locations are set via
>> monitor command messages, generally invoked as part of the init
>> scripts. Maps are generated entirely on the monitor. :)
>>
>>> Is there an easy way to debug/monitor the content of what ceph-osd sends
>>> and ceph-mon receives in the maps?
>>> Just to make sure that it is clear where the problem occurs.
>>
>> You should be able to see the info going in and out by bumping the
>> debug levels up — every message's "print" function is invoked when
>> it's sent/received as long as you have "debug ms = 1". It looks like
>> the MOSDBoot message doesn't natively dump its addresses but you can
>> add them easily if you need to.
>
> Hi Greg,
>
> Thanx for the answer....
>
> I've got debug_ms already pumped up all the way to 20.
> So I do get to see what addresses are selected during bind. But still
> they do not end up at the MON, and 'ceph osd dump' reports:
>         :/0
> as bind address.
>
> I'm going to add some more debugs to actually see what MOSDBoot is doing....

ceph-osd uses multiple messengers. the one that rados clients connect
to is the external/public messenger; the osd also uses it to talk to
the monitor.

the nonce of an OSD's external address does not change once the OSD is
up: it's always the pid of the ceph-osd process. and the (peer) address
of the booting OSD that the monitor records comes from the connection's
peer_addr field, which is set when the monitor accepts the connection
from the OSD. see the STATE_ACCEPTING_WAIT_BANNER_ADDR case block in
AsyncConnection::_process_connection().

but an OSD may be restarted and fail to bind its external messenger to
the specified port. in that case, ceph-osd tries another port but keeps
the same nonce. the other messengers used by ceph-osd behave
differently: their nonces increase by 1000000 every time they rebind.
that's why "ceph osd thrash" can change the nonces of the cluster_addr,
heartbeat_back_addr and heartbeat_front_addr. the PR
https://github.com/ceph/ceph/pull/11706 actually changes the behavior
of these three messengers, and it has nothing to do with the external
messenger that the ceph cli client connects to.
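
to make the difference between the two kinds of messengers concrete,
here is a toy model of the behavior described above. it is only a
sketch: ToyMessenger, bump_nonce_on_rebind and the port numbers are
made up for illustration and are not the Ceph Messenger API.

```python
REBIND_NONCE_STEP = 1000000  # increment per rebind, as described in this thread


class ToyMessenger:
    """Illustrative stand-in for a messenger; not the real Ceph class."""

    def __init__(self, name, pid, bump_nonce_on_rebind):
        self.name = name
        self.nonce = pid  # the nonce starts out as the pid of the process
        self.port = None
        self.bump_nonce_on_rebind = bump_nonce_on_rebind

    def bind(self, port):
        self.port = port

    def rebind(self, new_port):
        # cluster/heartbeat messengers get a new nonce on every rebind;
        # the external messenger only moves to another port.
        if self.bump_nonce_on_rebind:
            self.nonce += REBIND_NONCE_STEP
        self.port = new_port

    def addr(self, ip="127.0.0.1"):
        return f"{ip}:{self.port}/{self.nonce}"


pid = 4242  # hypothetical pid of ceph-osd
public = ToyMessenger("public", pid, bump_nonce_on_rebind=False)
cluster = ToyMessenger("cluster", pid, bump_nonce_on_rebind=True)
public.bind(6800)
cluster.bind(6801)

public.rebind(6802)
cluster.rebind(6803)

print(public.addr())   # 127.0.0.1:6802/4242
print(cluster.addr())  # 127.0.0.1:6803/1004242
```

so a nonce of 1000000 + $pid on the external address is not what this
model predicts, which is what makes the report interesting.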

so you might want to check how/why
1) the nonce of the messenger in MonClient is 1000000 + $pid,
2) while the nonce of the same messenger is $pid when the ceph cli
connects to it.
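
a small helper can make that comparison less error-prone when reading
addresses out of logs or 'ceph osd dump'. this is a hedged sketch: it
assumes addresses are printed as "ip:port/nonce", and nonce_of and
rebinds_implied are hypothetical names, not part of any ceph tool.

```python
REBIND_NONCE_STEP = 1000000  # increment per rebind, as described in this thread


def nonce_of(addr):
    """Return the integer after the last '/' in an 'ip:port/nonce' string."""
    _, sep, nonce = addr.rpartition("/")
    if not sep:
        raise ValueError(f"no nonce in address: {addr!r}")
    return int(nonce)


def rebinds_implied(addr, pid):
    """Return how many 1000000 steps separate the nonce from pid.

    Returns None when the nonce is not pid plus a whole number of steps,
    e.g. for the ':/0' address reported earlier in this thread.
    """
    diff = nonce_of(addr) - pid
    if diff < 0 or diff % REBIND_NONCE_STEP != 0:
        return None
    return diff // REBIND_NONCE_STEP


pid = 4242  # hypothetical pid of ceph-osd
print(rebinds_implied("127.0.0.1:6800/4242", pid))     # 0
print(rebinds_implied("127.0.0.1:6800/1004242", pid))  # 1
print(rebinds_implied(":/0", pid))                     # None
```

a result of 0 means the nonce is the plain pid, and 1 means it looks
like one rebind happened before the address was recorded.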

my PR https://github.com/ceph/ceph/pull/11804 is more of a cleanup: it
avoids setting the nonce before the rebind finishes. i also tried your
reproducer on my linux box, but no luck =(

>
> It looks like I've tackled most of the EventKqueue forking trouble, by
> keeping a full administration.
> Got to make some more FreeBSD tests to see if I'm actually solving what
> I think is the problem. :)
>
> --WjW
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Regards
Kefu Chai