Hi,

All the OSD nodes in my SSD tier are randomly hitting heartbeat_map timeouts, and I can't find out why:

    7ff2ed3f2700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7ff2c8943700' had timed out after 15

This happens many times a day and brings my cluster down.

Is there any way to find out why the OSDs time out? I don't think the heartbeat itself is the problem. I think some issue in the OSD is causing the heartbeat to time out: the OSDs don't suicide, but they become very slow, and because the queue is full this causes downtime on RBD and the S3 gateway.

Thanks.