Random heartbeat_map timed out

Seena Fallah <seenafallah@xxxxxxxxx> · Wed, 23 Dec 2020 15:54:02 +0330

Hi,

All the OSD nodes in my SSD tier randomly log heartbeat_map timeouts,
and I can't figure out why:

7ff2ed3f2700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7ff2c8943700' had timed out after 15

This happens many times a day and takes my cluster down.

Is there any way to find out why the OSDs time out? I don't think the
heartbeat itself is the problem. It looks like some issue inside the OSD
is what makes the heartbeat time out: the OSDs don't suicide, they just
get too slow, and because the queue fills up this causes downtime on RBD
and the S3 gateway!
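
To at least see which thread pools and threads are affected and how often, something like the following could tally the timeouts from an OSD log. This is only a rough sketch: the regular expression is inferred from the single log line quoted above (other Ceph releases may format the message differently), and the function name count_timeouts and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the one log line quoted above; this is an
# assumption, not a documented Ceph log format.
TIMEOUT_RE = re.compile(
    r"heartbeat_map is_healthy '(?P<pool>\S+) thread (?P<thread>0x[0-9a-f]+)' "
    r"had timed out after (?P<secs>\d+)"
)


def count_timeouts(lines):
    """Count heartbeat_map timeout messages per (thread pool, thread id)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("pool"), match.group("thread"))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample input; in practice, pass the lines of an OSD log file.
    sample = [
        "7ff2ed3f2700  1 heartbeat_map is_healthy "
        "'OSD::osd_op_tp thread 0x7ff2c8943700' had timed out after 15",
        "some unrelated log line",
    ]
    for (pool, thread), n in count_timeouts(sample).most_common():
        print(f"{pool} thread {thread}: {n} timeout(s)")
```

Running it over each OSD's log would show whether the timeouts are concentrated on a few OSDs or threads, or spread evenly across the tier.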

Thanks.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx