Re: High MON cpu usage when cluster is changing

Thanks Sage, that makes sense. Most of our clients are pre-luminous:
jewel accounts for about 83% and the rest are mostly hammer.
But we do not enable the balancer, nor do we run `reweight-by-utilization` at all.
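
For reference, this is roughly how we checked the client version split
and the balancer state (a minimal sketch; `ceph features` on a luminous
mon summarizes the releases/features of connected clients, and
`ceph balancer status` requires the mgr balancer module -- the exact
mon id and output fields depend on the deployment):

    # summarize connected client releases/features (luminous+ monitors)
    ceph features

    # list individual sessions on one monitor, including client features
    ceph daemon mon.<mon-id> sessions

    # confirm the mgr balancer module is off / not in crush-compat mode
    ceph balancer status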

I just ran `ceph osd down` on osd.0~9, and osd.9 took a very long time
to _boot. During that period, `crush_hash32_3` looked like the biggest
consumer, reaching 30%+, and an unnamed function at `0x00000038e9a` also
took about 20%.

Sometimes ceph::buffer::ptr also takes ~20%:
  21.69%  ceph-mon  [.] ceph::buffer::ptr::ptr(ceph::buffer::ptr const&, unsigned int, unsigned int)
  20.60%  ceph-mon  [.] ceph::buffer::ptr::copy_out(unsigned int, unsigned int, char*) const
  15.77%  ceph-mon  [.] ceph::buffer::ptr::release()
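
To see which callers are driving crush_hash32_3 and the buffer::ptr
symbols, I plan to grab call graphs as well (a sketch, assuming perf and
the ceph debuginfo packages are installed; these are plain perf options,
nothing Ceph-specific):

    # sample the monitor for 30s with call graphs, so the hot symbols
    # show up together with their callers (e.g. map reencoding paths)
    perf record -g --call-graph dwarf -p $(pidof ceph-mon) -- sleep 30
    perf report --stdio | head -n 100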



From `top -H -p <pid of ceph-mon>`, the thread list is full of pipe_writers.
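
Since the pipe_writer threads point at the simple messenger, one thing
we may try is switching the mon to the async messenger (a sketch only;
`ms_type` is the usual knob, and async is already the default in
luminous, so this assumes simple was pinned explicitly in our config):

    # ceph.conf on the monitor nodes, then restart ceph-mon
    [mon]
        ms_type = async+posix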

xiaoxi

2018-04-14 6:05 GMT+08:00 Sage Weil <sweil@xxxxxxxxxx>:
> On Sat, 14 Apr 2018, Xiaoxi Chen wrote:
>> Hi,
>>
>>     we are consistently seeing this issue after upgrading from jewel
>> to luminous. The behavior looks like the monitor cannot handle
>> mon_subscribe requests from clients fast enough; then we see
>>
>>     high CPU (1600%+ with the simple messenger) on the monitors
>>     cluster PG states changing slowly, as OSDs cannot get the latest map fast enough
>>     in some cases, such as rebooting an OSD node (24 OSDs per node),
>> an even bigger impact: OSDs cannot even update their auth in time, and
>> after a while we saw massive numbers of OSDs marked down due to heartbeat
>> failures, like
>>        2018-04-11 21:19:24.772558 7f6bbb7f5700  0 cephx server
>> osd.234:  unexpected key: req.key=690bba2ca98774a2
>> expected_key=f63feaae2014a837
>> 2018-04-11 21:19:26.539295 7f6bbb7f5700  0 cephx server osd.365:
>> unexpected key: req.key=a0eb995e1bef1bf4 expected_key=bafe2e4d55a63478
>>
>>    There are more details about the attempts we have made in the
>> ticket: http://tracker.ceph.com/issues/23713.
>>
>>    Any suggestion is much appreciated. Thanks.
>
> My guess is that this is the compat reencoding of the OSDMap for the
> pre-luminous clients.
>
> Are you by chance making use of the crush-compat balancer? That would
> additionally require a reencoded crush map.
>
> Can you do a 'perf top -p `pidof ceph-mon`' while this is happening to see
> where the time is being spent?
>
> Thanks!
> sage
>
>



