Re: Network failure scenarios

----- Original Message -----
> From: "Sage Weil" <sage@xxxxxxxxxxx>
> To: "Keith Phua" <keith@xxxxxxxxxxxxxxxxxx>
> Cc: ceph-users@xxxxxxxx
> Sent: Friday, August 23, 2013 12:48:18 PM
> Subject: Re:  Network failure scenarios
> 
> On Fri, 23 Aug 2013, Keith Phua wrote:
> > Hi,
> > 
> > It was mentioned on the devel mailing list that in a 2-network setup,
> > if the cluster network fails, the cluster behaves pretty badly. Ref:
> > http://article.gmane.org/gmane.comp.file-systems.ceph.devel/12285/match=cluster+network+fail
> > 
> > May I know if this problem still exists in cuttlefish or dumpling?
> 
> This is fixed in dumpling.  When an osd is marked down, it verifies that
> it is able to connect to other hosts on both its public and cluster
> network before trying to add itself back into the cluster.
>  

Alright, that's great!
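
For reference, our two-network setup is along these lines in ceph.conf (the subnets below are just placeholders, not our real ones):

    [global]
        # client and monitor traffic
        public network  = 192.168.1.0/24
        # osd replication and heartbeat traffic
        cluster network = 192.168.2.0/24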

> > Suppose I have 2 racks of servers in a cluster and a total of 5 mons:
> > rack1 contains 3 mons and 120 osds, and rack2 contains 2 mons and 120
> > osds. In a 2-network setup, may I know what will happen when the
> > following problems occur?
> > 
> > 1. The public network links between rack1 and rack2 fail, so the rack1
> > mons cannot contact the rack2 mons, while the osds of both racks stay
> > connected. Will the cluster see it as 2 out of 5 mons failed or 3 out
> > of 5 mons failed?
> 
> This is a classic partition.  One rack will see 3 working and 2 failed
> mons, and the cluster will appear "up".  The other rack will see 2 working
> and 3 failed mons, and will be effectively down.

For this scenario, since only the public network link between the two racks is down while the cluster network between the racks is still up, will the cluster treat it as "up" with 2 mons down?  Will rack2 still be effectively down?
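
Just to spell out the mon layout I have in mind (hostnames and addresses below are made up):

    [mon.a]
        host = rack1-mon1
        mon addr = 192.168.1.11:6789
    [mon.b]
        host = rack1-mon2
        mon addr = 192.168.1.12:6789
    [mon.c]
        host = rack1-mon3
        mon addr = 192.168.1.13:6789
    [mon.d]
        host = rack2-mon1
        mon addr = 192.168.1.21:6789
    [mon.e]
        host = rack2-mon2
        mon addr = 192.168.1.22:6789

If I understand the quorum rule, a majority means 3 of the 5 mons, which only rack1 can form on its own.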

> 
> > 2. The cluster network links between rack1 and rack2 fail, so the osds
> > in rack1 and the osds in rack2 are disconnected from each other, as
> > mentioned above.
> 
> Here all the mons are available.  OSDs will get marked down by peers in
> the opposite rack because the cluster network link has failed.  They will
> only try to mark themselves back up if they are able to reach 1/3 of their
> peers.  This value is currently hard-coded; we can easily make it tunable.
> (https://github.com/ceph/ceph/pull/533)
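
If I am reading that right, the check amounts to something like the sketch below (my own sketch in Python of the behaviour you describe, not the actual ceph-osd code):

    # Sketch of the rule described above: an osd only asks to be marked
    # back up if it can reach at least a third of its heartbeat peers.
    def should_try_mark_up(reachable_peers, total_peers, min_ratio=1.0 / 3):
        return total_peers > 0 and reachable_peers >= min_ratio * total_peers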

So in this case, if the crushmap is configured to distribute data across hosts, the OSDs that can reach their peers within the same rack will stay up, while their peers across the racks will be marked down.  Will the OSDs whose peers are in the other rack then start to self-heal and replicate within the same rack after some time?
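
To be concrete, by "distribute data across hosts" I mean a rule along these lines in the decompiled crushmap (the rule name and numbers are only an example):

    rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
    }

If the rule instead said "step chooseleaf firstn 0 type rack", replicas would be forced onto separate racks, which I assume changes what can be re-replicated once the racks are cut off from each other.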

> 
> > 3. Both network links between rack1 and rack2 fail, so a split brain
> > seems to occur.  Will the cluster halt? Or will rack1 start to
> > self-heal and replicate data within rack1, since rack1 will have 3 of
> > the 5 mons working?
> 
> This is really the same as 1.  Only the half with a majority of
> communicating monitors will be 'up'; the other part of the cluster will
> not be allowed to do anything.
> 

Does that also mean the half with the majority of mons up will start to self-heal and replicate the data within that rack after some time, and that if the rack is near full, the 'cluster' will halt?
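
By "near full" I am thinking of the usual osd full thresholds; if I have the defaults right, they are roughly:

    mon osd nearfull ratio = 0.85   # health warning once an osd passes this
    mon osd full ratio     = 0.95   # writes are blocked once an osd passes this

so I assume that once re-replication pushes osds past the full ratio, client writes would stop.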

Thanks Sage!

> sage
> 
> > In the above scenarios, all links within each rack are still working.
> > 
> > Your valuable comments are greatly appreciated.
> > 
> > Keith
> > 
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



