Re: Monitor Restart triggers half of our OSDs marked down

Andrey Korolyov <andrey@xxxxxxx> · Tue, 3 Feb 2015 15:41:19 +0400

On Tue, Feb 3, 2015 at 2:38 PM, Christian Eichelmann
<christian.eichelmann@xxxxxxxx> wrote:
> Hi all,
>
> During some failover and configuration tests, we have discovered a
> strange phenomenon:
>
> Restarting one of our monitors (5 in total) triggers about 300 of the
> following events:
>
> osd.669 10.76.28.58:6935/149172 failed (20 reports from 20 peers after
> 22.005858 >= grace 20.000000)
>
> The OSDs come back up shortly after they have been marked down. What I
> don't understand is: how can a restart of one monitor prevent the OSDs
> from talking to each other and cause them to be marked down?
>
> FYI:
> We are currently using the following settings:
> mon osd adjust hearbeat grace = false
> mon osd min down reporters = 20
> mon osd adjust down out interval = false
>
> Regards,
> Christian

I can confirm similar behavior, though on a smaller scale: restarting
the leader mon may cause a small number of OSDs to be wrongly marked
down, or trigger a PG rebalance. The preconditions are unclear.
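
To compare how many OSDs get reported in each case, the "failed" events
could be counted from the cluster log. Below is a rough Python sketch,
not a tested tool. It assumes the events look exactly like the single
line quoted above (osd id, address, report count, peer count, elapsed
time, grace); the function name parse_failures and the field names are
made up for illustration, and a real log may need the pattern adjusted.

```python
import re

# Pattern modelled on the one event quoted in this thread; the layout of
# other log lines is an assumption and may differ between Ceph versions.
FAILED_RE = re.compile(
    r"(?P<osd>osd\.\d+) (?P<addr>\S+) failed "
    r"\((?P<reports>\d+) reports from (?P<peers>\d+) peers "
    r"after (?P<elapsed>[\d.]+) >= grace (?P<grace>[\d.]+)\)"
)


def parse_failures(log_text):
    """Return one dict per 'failed' event found in log_text."""
    # Collapse all whitespace first, because the quoted event is wrapped
    # across two lines in the mail.
    flat = " ".join(log_text.split())
    events = []
    for m in FAILED_RE.finditer(flat):
        events.append({
            "osd": m["osd"],
            "addr": m["addr"],
            "reports": int(m["reports"]),
            "peers": int(m["peers"]),
            "elapsed": float(m["elapsed"]),
            "grace": float(m["grace"]),
        })
    return events


# The event from the original message, wrapped as it was quoted.
SAMPLE = """osd.669 10.76.28.58:6935/149172 failed (20 reports from 20 peers after
22.005858 >= grace 20.000000)"""

if __name__ == "__main__":
    events = parse_failures(SAMPLE)
    print(len(events), "failed event(s)")
    for e in events:
        # How far past the grace period the reports were.
        print(e["osd"], "over grace by", round(e["elapsed"] - e["grace"], 6), "s")
```

Comparing the number of distinct OSDs, and how far past the grace
period each report was, between a leader restart and a peon restart
might help narrow down the preconditions.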
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com