Looks like crush_hash32_3 is from prime_pg_temp and the buffer::ptr
calls are from reencoding the compat OSDMap. Given that the client
versions are not easy to upgrade, what would be the best workaround for
this issue? Should I add more monitors to the quorum so that the client
load is distributed more evenly?

2018-04-14 13:09 GMT+08:00 Xiaoxi Chen <superdebuger@xxxxxxxxx>:
> Thanks Sage, that makes sense. Most of our clients are pre-luminous:
> jewel accounts for 83% and the rest are mostly hammer.
> But we don't enable the balancer, nor do we run
> `reweight-by-utilization` at all.
>
> I just ran `ceph osd down` on osd.0 through osd.9, and osd.9 took a
> very long time to _boot. During that period, `crush_hash32_3` looks
> like the biggest consumer, reaching 30%+, and an unnamed function at
> `0x00000038e9a` also takes ~20%.
>
> Sometimes buffer::ptr also takes 20%+:
>
>   21.69%  ceph-mon  [.] ceph::buffer::ptr::ptr(ceph::buffer::ptr const&, unsigned int, unsigned int)
>   20.60%  ceph-mon  [.] ceph::buffer::ptr::copy_out(unsigned int, unsigned int, char*) const
>   15.77%  ceph-mon  [.] ceph::buffer::ptr::release()
>
> From `top -H -p <pid of mon>`, pipe_writer threads fill the list.
>
> xiaoxi
>
> 2018-04-14 6:05 GMT+08:00 Sage Weil <sweil@xxxxxxxxxx>:
>> On Sat, 14 Apr 2018, Xiaoxi Chen wrote:
>>> Hi,
>>>
>>> We are consistently seeing this issue after upgrading from jewel to
>>> luminous. The behavior looks like the monitor cannot handle
>>> mon_subscribe requests from clients fast enough, and then we see:
>>>
>>>  - high CPU (1600%+ with simple messenger) on the monitor
>>>  - cluster PG state changing slowly, as OSDs cannot get the latest
>>>    map fast enough
>>>
>>> In some cases, such as rebooting an OSD node (24 OSDs per node), the
>>> impact is even bigger: OSDs cannot even update their auth in time,
>>> and after a while we saw massive numbers of OSDs marked down due to
>>> heartbeat failures, like:
>>>
>>> 2018-04-11 21:19:24.772558 7f6bbb7f5700  0 cephx server
>>> osd.234: unexpected key: req.key=690bba2ca98774a2
>>> expected_key=f63feaae2014a837
>>> 2018-04-11 21:19:26.539295 7f6bbb7f5700  0 cephx server osd.365:
>>> unexpected key: req.key=a0eb995e1bef1bf4 expected_key=bafe2e4d55a63478
>>>
>>> There are a few more details about the attempts we have made in the
>>> ticket http://tracker.ceph.com/issues/23713.
>>>
>>> Any suggestion is much appreciated. Thanks.
>>
>> My guess is that this is the compat reencoding of the OSDMap for the
>> pre-luminous clients.
>>
>> Are you by chance making use of the crush-compat balancer? That would
>> additionally require a reencoded crush map.
>>
>> Can you do a 'perf top -p `pidof ceph-mon`' while this is happening
>> to see where the time is being spent?
>>
>> Thanks!
>> sage
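
For reference, a sketch of the diagnostics discussed in this thread,
roughly as they could be run on a luminous cluster. The pid lookup, the
osd id list 0-9, and the use of `ceph features` are illustrative
additions, not commands quoted verbatim from the thread.

    # Report which feature releases the connected clients advertise;
    # pre-luminous (jewel/hammer) clients are what force the mon to
    # reencode a compat OSDMap for them.
    ceph features

    # Sample where ceph-mon is spending CPU while the load spike is
    # happening.
    MON_PID=$(pidof ceph-mon)
    perf top -p "$MON_PID"

    # Per-thread CPU view, to check whether simple-messenger pipe_writer
    # threads dominate as reported above.
    top -H -p "$MON_PID"

    # Reproduce the map churn by marking a handful of OSDs down
    # (ids 0-9 here, matching the test described in the thread).
    ceph osd down 0 1 2 3 4 5 6 7 8 9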