On Tue, Feb 3, 2015 at 2:38 PM, Christian Eichelmann
<christian.eichelmann@xxxxxxxx> wrote:
> Hi all,
>
> during some failover tests and some configuration tests, we currently
> discover a strange phenomenon:
>
> Restarting one of our monitors (5 in sum) triggers about 300 of the
> following events:
>
> osd.669 10.76.28.58:6935/149172 failed (20 reports from 20 peers after
> 22.005858 >= grace 20.000000)
>
> The osds come back up shortly after they have been marked down. What I
> don't understand is: How can a restart of one monitor prevent the osds
> from talking to each other and marking them down?
>
> FYI:
> We are currently using the following settings:
> mon osd adjust hearbeat grace = false
> mon osd min down reporters = 20
> mon osd adjust down out interval = false
>
> Regards,
> Christian

I can confirm similar behavior, though on a smaller scale: restarting the
leader mon may cause a small number of OSDs to be wrongly marked down, or
may trigger a PG rebalance. The exact preconditions are unclear.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com