Hi Vladimir,
just try a manual DB compaction through ceph-kvstore-tool for these OSDs.
This is very likely a known issue where DB performance drops after bulk
data removal.
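A rough sketch, assuming a default data path and non-containerized
systemd units, with osd.74 used as an example id (adjust for your
deployment; the OSD must be stopped while the tool runs):
# stop the OSD so its store is not in use
systemctl stop ceph-osd@74
# offline compaction of the OSD's RocksDB
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-74 compact
systemctl start ceph-osd@74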
Thanks,
Igor
On 4/25/2022 8:47 PM, Vladimir Brik wrote:
Hello
I have 3 OSDs that are stuck in a perpetual loop of
heartbeat_map is_healthy ... had timed out after 15.000000954s
<repeated many, many times>
heartbeat_map is_healthy ... had suicide timed out after 150.000000000s
*** Caught signal (Aborted) **
This began happening some time after I had moved a pool off these
OSDs. Now the pools that still use these 3 OSDs are in trouble and I
don't know how to resolve this situation. I am running 16.2.7.
Can anybody help?
Not sure if it's relevant, but these OSDs have a custom device class
"nvme", so there is this line in the logs:
7fe6308fc080 -1 osd.74 15842 mon_cmd_maybe_osd_create fail: 'osd.74
has already bound to class 'nvme', can not reset class to 'ssd'; use
'ceph osd crush rm-device-class <id>' to remove old class
first': (16) Device or resource busy
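If it matters, the message itself seems to point at clearing and then
re-applying the class, roughly like this (osd.74 as the example id):
# drop the current device class, then re-apply the custom one
ceph osd crush rm-device-class osd.74
ceph osd crush set-device-class nvme osd.74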
I tried to set the following in ceph.conf, but it didn't seem to make
a difference.
[osd]
osd_max_scrubs = 0
osd_heartbeat_grace = 200
osd_scrub_thread_suicide_timeout = 600
osd_op_thread_suicide_timeout = 1500
osd_command_thread_suicide_timeout = 9000
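In case it helps, this is roughly how the values could be checked on a
running daemon and, while it is up, overridden without a restart
(osd.74 as an example; same option names as above):
# show the config the running daemon actually uses
ceph config show osd.74 | grep -e suicide -e heartbeat_grace
# push a value into the running daemon at runtime
ceph tell osd.74 injectargs '--osd_heartbeat_grace 200'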
Thanks,
Vlad
--
Igor Fedotov
Ceph Lead Developer
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx