Re: OSDs stuck in heartbeat_map is_healthy "suicide timed out" infinite loop

Just in case somebody runs into a similar issue:

In my case, a manual compaction was apparently needed. To do that, I added the following to my ceph.conf and restarted the affected OSDs:

[osd]
        osd_compact_on_start = true

(Apparently osd_compact_on_start has to be set in ceph.conf itself, because compaction is initiated before the OSD has connected to the monitors.)

Vlad

On 4/25/22 12:47, Vladimir Brik wrote:
Hello

I have 3 OSDs that are stuck in a perpetual loop of

heartbeat_map is_healthy ... had timed out after 15.000000954s
<repeated many, many times>
heartbeat_map is_healthy ... had suicide timed out after 150.000000000s
*** Caught signal (Aborted) **
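
To gauge how often an OSD goes through this cycle, the log pattern above can be tallied with a short script. This is only a rough sketch: the function name summarize is made up for illustration, and the part of each line between "is_healthy" and "had" (shown as "..." above) is matched loosely because it is elided in this post.

```python
import re
from collections import Counter

# Patterns follow the log excerpt quoted above. The text between
# "is_healthy" and "had" is elided in the post, so it is captured as-is
# and used only as a grouping key (assumption: it identifies the thread).
SOFT = re.compile(r"heartbeat_map is_healthy (.*?) had timed out after ([\d.]+)s")
HARD = re.compile(r"heartbeat_map is_healthy (.*?) had suicide timed out after ([\d.]+)s")


def summarize(lines):
    """Count ordinary timeouts and suicide timeouts per captured key."""
    soft, hard = Counter(), Counter()
    for line in lines:
        match = HARD.search(line)
        if match:
            hard[match.group(1)] += 1
            continue
        match = SOFT.search(line)
        if match:
            soft[match.group(1)] += 1
    return soft, hard
```

It could be run over an OSD log with something like summarize(open(path)), where path is wherever that OSD's log lives on the host. A steady stream of ordinary timeouts followed by a single suicide timeout per restart would match the loop described here.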

This began happening some time after I had moved a pool off these OSDs. Now the pools that still use these 3 OSDs are in trouble and I don't know how to resolve this situation. I am running 16.2.7.

Can anybody help?

Not sure if it's relevant, but these OSDs have a custom device class "nvme", so there is this line in the logs:

7fe6308fc080 -1 osd.74 15842 mon_cmd_maybe_osd_create fail: 'osd.74 has already bound to class 'nvme', can not reset class to 'ssd'; use 'ceph osd crush rm-device-class <id>' to remove old class first': (16) Device or resource busy

I tried to set the following in ceph.conf, but it didn't seem to make a difference.
[osd]
         osd_max_scrubs = 0
         osd_heartbeat_grace = 200
         osd_scrub_thread_suicide_timeout = 600
         osd_op_thread_suicide_timeout = 1500
         osd_command_thread_suicide_timeout = 9000


Thanks,

Vlad
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx