Re: cluster not coming up after reboot

On 04/23/2015 06:58 PM, Craig Lewis wrote:
> Yes, unless you've adjusted:
> [global]
>    mon osd min down reporters = 9
>    mon osd min down reports = 12
>
> OSDs talk to the MONs on the public network.  The cluster network is
> only used for OSD to OSD communication.
>
> If one OSD node can't talk on that network, the other nodes will tell
> the MONs that its OSDs are down.  And that node will also tell the MONs
> that all the other OSDs are down.  Then the OSDs marked down will tell
> the MONs that they're not down, and the cycle will repeat.

Thanks for the explanation, that makes sense now! Good to know I should set those values :)

> I'm somewhat surprised that your cluster eventually stabilized.

The OSDs of that one node were eventually set 'out' of the cluster. I guess the OSDs were down long enough to get marked out? (Or the monitors took some action after too many failures?) After that, the other OSDs could stay up, I guess :)

> I have 8 OSDs per node.  I set my min down reporters high enough that no
> single node can mark another node's OSDs down.
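
The sizing rule Craig describes can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming the monitors count each reporting OSD as one reporter and mark a target down once the number of distinct reporters reaches the configured threshold. The function names are made up for this example and are not part of Ceph.

```python
def min_down_reporters(max_osds_per_node: int) -> int:
    """Smallest reporter threshold that one node's OSDs cannot reach alone.

    Hypothetical helper: one more than the largest OSD count on any node.
    """
    if max_osds_per_node < 1:
        raise ValueError("a node must host at least one OSD")
    return max_osds_per_node + 1


def can_mark_down(reporting_osds: int, threshold: int) -> bool:
    """True if this many distinct reporting OSDs meets the threshold."""
    return reporting_osds >= threshold


# With 8 OSDs per node, the threshold matches the value quoted above.
threshold = min_down_reporters(8)
print(threshold)                    # 9
print(can_mark_down(8, threshold))  # False: one isolated node is not enough
print(can_mark_down(9, threshold))  # True: needs OSDs from a second node
```

With this threshold, a node that loses its cluster network link cannot, on its own, get the rest of the cluster marked down. The value of 12 for min down reports is not explained in the thread, so it is not modelled here.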
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



