Tuning OSD heartbeat interval and grace period

I am wondering if anyone has had experience tuning the following options to get faster failure detection of a storage node:
- osd heartbeat interval (default 6s)
- osd heartbeat grace (default 20s)

I am working with a very small cluster:
- 2 storage nodes
- 1 to 6 OSDs per storage node
- replication factor of 2

In this configuration, losing a storage node (e.g. to a power failure) interrupts service to users of the cluster for 30 seconds or more, due to the length of the heartbeat interval and grace period. Why are the defaults for these so high, and does anyone have experience tuning them to reduce the service interruption on storage node failure? I know there is always a trade-off between detecting failures faster and incorrectly detecting a failure; I would like to find out how much room there is to reduce these settings.
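
To make the trade-off concrete, here is a rough Python sketch of how the two settings bound the time for a peer OSD to notice a dead node. It assumes a simplified model that is not taken from the Ceph source: peers ping every `interval` seconds, and a peer is considered failed once no reply has been seen for `grace` seconds, checked on each heartbeat tick. The function name `detection_window` and the candidate values other than the defaults are hypothetical. Monitor processing and recovery time are ignored, so real interruptions will be longer.

```python
def detection_window(interval_s: float, grace_s: float) -> tuple[float, float]:
    """Estimate (best, worst) seconds from node failure to a peer noticing.

    Simplified model (an assumption, not Ceph's actual implementation):
    - the last successful reply arrived at most `interval_s` before the failure;
    - the peer checks on each tick whether `grace_s` has elapsed since that reply.
    """
    # Best case: the last reply was almost a full interval old at failure time,
    # and a check fires right as the grace period expires.
    best = max(grace_s - interval_s, 0.0)
    # Worst case: the last reply arrived just before the failure, and the
    # grace period expires just after a check, so one more tick is needed.
    worst = grace_s + interval_s
    return best, worst


if __name__ == "__main__":
    # First pair is the defaults quoted above; the others are illustrative only.
    for interval, grace in [(6.0, 20.0), (3.0, 10.0), (1.0, 5.0)]:
        best, worst = detection_window(interval, grace)
        print(f"interval={interval}s grace={grace}s -> "
              f"peer notices after roughly {best}s to {worst}s")
```

Under this model, lowering the grace period shortens detection almost one for one, while lowering only the interval mainly narrows the window around the grace period. The cost of a short grace period is that a briefly overloaded or slow OSD is more likely to be marked down by mistake.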

Bart Wensley, Wind River


