On 17-12-06 07:01 AM, Stefan Kooman wrote:
[osd]
# http://docs.ceph.com/docs/master/rados/configuration/mon-osd-interaction/
osd crush update on start = false
osd heartbeat interval = 1 # default 6
osd mon heartbeat interval = 10 # default 30
osd mon report interval min = 1 # default 5
osd mon report interval max = 15 # default 120

The OSDs would almost immediately see that they were cut off from their partner OSDs in the placement group. By default they wait 6 seconds before sending their report to the monitors. During our analysis, this was exactly the time the monitors were holding an election. By tuning all of the above we could get them to send their reports faster, and by the time the election process was finished the monitors would handle the reports from the OSDs, conclude that a DC is down, flag it down, and allow normal client IO again.

Of course, stability and data safety are most important to us. So if any of these settings make you worry, please let us know.
Heartbeats, especially in Luminous, are quite heavy bandwidth-wise if you have a lot of OSDs in the cluster. You may want to keep "osd heartbeat interval" at 3 at the lowest, or, if that's not acceptable, at least set "osd heartbeat min size" to 0.
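To put rough numbers on that, here is a back-of-envelope sketch of how ping traffic scales with the interval. Everything in it is an assumption for illustration: the function name, the peer count of 10, the 2000-byte padded ping size (check the default of "osd heartbeat min size" in your release), and the idea that each peer is pinged on two networks (front and back). It ignores replies and protocol overhead, so treat the output as an order-of-magnitude guide only.

```python
def heartbeat_rate_bytes_per_s(num_peers, interval_s, payload_bytes, networks=2):
    """Rough outbound ping traffic for one OSD.

    Hypothetical model: one ping of payload_bytes to every peer, on every
    network, once per interval. Replies and protocol overhead are ignored.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return num_peers * networks * payload_bytes / interval_s


# Assumed inputs: 10 heartbeat peers per OSD, pings padded to 2000 bytes.
for interval in (6, 3, 1):
    rate = heartbeat_rate_bytes_per_s(10, interval, 2000)
    print(f"interval={interval}s -> {rate:.0f} B/s per OSD")
```

Under this model the traffic is inversely proportional to the interval, so going from 6 to 1 multiplies it by six, and the per-OSD figure then has to be multiplied by the number of OSDs in the cluster. Shrinking the padded ping size reduces the same product from the other side, which is why it is the fallback if the interval has to stay at 1.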
--
Piotr Dałek
piotr.dalek@xxxxxxxxxxxx
https://www.ovh.com/us/