Monitor restart causes half of our OSDs to be marked down

Christian Eichelmann <christian.eichelmann@xxxxxxxx> · Tue, 03 Feb 2015 12:38:55 +0100

Hi all,

During some failover and configuration tests, we have discovered a
strange phenomenon:

Restarting one of our monitors (5 in total) triggers about 300 of the
following events:

osd.669 10.76.28.58:6935/149172 failed (20 reports from 20 peers after
22.005858 >= grace 20.000000)
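
For anyone who wants to tally these events from a monitor log, here is a
minimal sketch in Python. It assumes each event sits on a single line in
exactly the format quoted above; the regular expression and the function
name parse_failed_events are illustrative only and not part of Ceph.

```python
import re

# Pattern inferred from the single log line quoted above (an assumption,
# not an official Ceph log format specification).
FAILED_RE = re.compile(
    r"osd\.(?P<osd>\d+) (?P<addr>\S+) failed "
    r"\((?P<reports>\d+) reports from (?P<peers>\d+) peers after "
    r"(?P<elapsed>[\d.]+) >= grace (?P<grace>[\d.]+)\)"
)


def parse_failed_events(lines):
    """Return one dict per 'osd.N <addr> failed (...)' event found in lines."""
    events = []
    for line in lines:
        m = FAILED_RE.search(line)
        if m:
            events.append({
                "osd": int(m.group("osd")),
                "addr": m.group("addr"),
                "reports": int(m.group("reports")),
                "peers": int(m.group("peers")),
                "elapsed": float(m.group("elapsed")),
                "grace": float(m.group("grace")),
            })
    return events


# The event from this mail, joined back onto one line.
sample = ("osd.669 10.76.28.58:6935/149172 failed (20 reports from 20 peers "
          "after 22.005858 >= grace 20.000000)")
events = parse_failed_events([sample])
print(len({e["osd"] for e in events}), "distinct OSDs reported failed")
```

Feeding it the lines of a monitor log instead of the single sample would
give the number of distinct OSDs affected and how far each one exceeded
the grace period.
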

The OSDs come back up shortly after they have been marked down. What I
don't understand is: how can restarting a single monitor prevent the
OSDs from talking to each other, so that they get marked down?

FYI:
We are currently using the following settings:
mon osd adjust heartbeat grace = false
mon osd min down reporters = 20
mon osd adjust down out interval = false

Regards,
Christian