Re: Random heartbeat_map timed out

Hi Seena,

One frequent cause of such timeouts is slow RocksDB operation, which in turn can be caused by bluefs_buffered_io being set to false and/or by DB "fragmentation" after massive data removal.

Hence the potential workarounds are setting bluefs_buffered_io to true and running a manual RocksDB compaction.
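
As a rough sketch of those two workarounds, assuming a release with the centralized config database (`ceph config set`) and the `compact` command available through `ceph tell`: the helper names and the OSD ids 0, 1 and 2 below are hypothetical placeholders, and the script only prints the commands unless APPLY=1 is set. Depending on the release, the bluefs_buffered_io change may need an OSD restart to take effect.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set APPLY=1 in the environment to really run them against the cluster.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Enable buffered I/O for BlueFS on all OSDs.
set_buffered_io() {
    run ceph config set osd bluefs_buffered_io true
}

# Trigger an online RocksDB compaction on one OSD, given its numeric id.
compact_osd() {
    run ceph tell "osd.$1" compact
}

set_buffered_io
for id in 0 1 2; do   # hypothetical OSD ids; substitute your own
    compact_osd "$id"
done
```

Compaction adds load of its own, so it is probably safer to go through the OSDs one at a time rather than all at once.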

This topic has been discussed on this mailing list and in the relevant tickets multiple times.


Thanks,

Igor

On 12/23/2020 3:24 PM, Seena Fallah wrote:
Hi,

All my OSD nodes in the SSD tier are randomly hitting heartbeat_map
timeouts and I can't find out why!

7ff2ed3f2700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7ff2c8943700' had timed out after 15

It occurs many times a day and brings my cluster down.

Is there any way to find out why the OSDs time out? I don't think the
heartbeat itself is the problem. Rather, some issue in the OSD causes the
heartbeat to time out: the OSDs don't suicide, but they become so slow
that the queue fills up, causing downtime on RBD and the S3 gateway!

Thanks.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx