Re: Monitor Restart triggers half of our OSDs marked down

On Tue, Feb 3, 2015 at 3:38 AM, Christian Eichelmann
<christian.eichelmann@xxxxxxxx> wrote:
> Hi all,
>
> during some failover and configuration tests, we have discovered a
> strange phenomenon:
>
> Restarting one of our monitors (5 in total) triggers about 300 of the
> following events:
>
> osd.669 10.76.28.58:6935/149172 failed (20 reports from 20 peers after
> 22.005858 >= grace 20.000000)
>
> The OSDs come back up shortly after they have been marked down. What I
> don't understand is: how can restarting one monitor prevent the OSDs
> from talking to each other and cause them to be marked down?
>
> FYI:
> We are currently using the following settings:
> mon osd adjust hearbeat grace = false
> mon osd min down reporters = 20
> mon osd adjust down out interval = false

That's really strange. I think maybe you're seeing some kind of
secondary effect; what kind of CPU usage are you seeing on the
monitors during this time? Have you checked the log on any OSDs which
have been marked down?
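
To see how many OSDs were affected and by how far they exceeded the
grace period, the failure reports can be tallied from the log. The
following is only a rough sketch: the pattern is derived from the single
sample line quoted above, and the names parse_failures and summarize are
hypothetical helpers, not part of Ceph.

```python
import re
from collections import Counter

# Matches the failure report quoted in the original message. The field
# layout is taken from that single sample line, so treat it as an assumption.
FAILED_RE = re.compile(
    r"(?P<osd>osd\.\d+)\s+(?P<addr>\S+)\s+failed\s+"
    r"\((?P<reports>\d+)\s+reports\s+from\s+(?P<peers>\d+)\s+peers\s+after\s+"
    r"(?P<elapsed>\d+(?:\.\d+)?)\s+>=\s+grace\s+(?P<grace>\d+(?:\.\d+)?)\)"
)


def parse_failures(lines):
    """Yield one dict per line that contains an OSD failure report."""
    for line in lines:
        m = FAILED_RE.search(line)
        if m is None:
            continue  # not a failure report; skip it
        elapsed = float(m.group("elapsed"))
        grace = float(m.group("grace"))
        yield {
            "osd": m.group("osd"),
            "addr": m.group("addr"),
            "reports": int(m.group("reports")),
            "peers": int(m.group("peers")),
            "elapsed": elapsed,
            "grace": grace,
            # Seconds by which the reported silence exceeded the grace period.
            "overshoot": elapsed - grace,
        }


def summarize(lines):
    """Return (failure count per OSD, largest overshoot past grace or None)."""
    per_osd = Counter()
    worst = None
    for ev in parse_failures(lines):
        per_osd[ev["osd"]] += 1
        if worst is None or ev["overshoot"] > worst:
            worst = ev["overshoot"]
    return per_osd, worst
```

Passing an open log file to summarize() gives a per-OSD count and the
largest overshoot, which should help show whether the failures cluster
right at the grace boundary after the monitor restart.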

I suspect the OSDs are detecting their failed monitor connection and
are unable to reconnect to another monitor quickly enough, but I'm not
certain how those two things interact.
-Greg