On 29.03.2018 10:25, ST Wong (ITSC) wrote:
My guesstimate is that the server room with 3 mons will retain quorum and continue operation, while the room with 2 mons will notice it has been split off and block.

Assuming you have 3+2 pools and at least one shard of every object always lands in the other server room: some pgs will stay active because the working room still holds at least k=3 shards, but pgs left with only 2 shards are below k and will be inactive; and since a pg with fewer than k shards cannot be rebuilt at all, those pgs cannot self-heal until the other room comes back. I assume a wider profile such as 4+2 could reduce how many pgs drop below k, but note that with only two rooms, no k+m profile with m < k can survive the loss of either room, because at least one room must then hold more than m shards. (Creating such a profile is sketched below.)

Of course the 4 osds left working will now want to self-heal by recreating all the objects stored on the 4 split-off osds, which is a huge recovery job. You also risk the osds running into too_full errors unless there is enough free space to recreate all the data from the defective part of the cluster; otherwise they will be stuck in recovery mode until you get the second room running again. This depends on your crush map. (The free-space arithmetic is sketched below.)

If you really need to split a cluster into separate rooms, I would use 3 rooms with redundant data paths between them: the primary path between rooms A and C is direct, and the redundant path goes A-B-C. This should reduce the disaster when a single path is broken. With 1 mon in each room you can lose a whole room to power loss and still have a working cluster, and you only need 33% instead of 50% of the cluster capacity free in order to self-heal. (A room-aware crush layout is sketched below.)

The point is that splitting the cluster hurts. If HA is the most important requirement, you may want to check out rbd-mirror instead, which replicates rbd images to a second, independent cluster. (Also sketched below.)
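For reference, creating such a profile could look roughly like this (profile name, pool name and pg counts are just placeholders):

    ceph osd erasure-code-profile set myec k=4 m=2 crush-failure-domain=host
    ceph osd pool create ecpool 128 128 erasure myec
    # with m=2 the pool tolerates losing any 2 shards; check the pool's
    # activation threshold with:
    ceph osd pool get ecpool min_size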
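The free-space numbers are just back-of-the-envelope, assuming data is spread evenly over N rooms and the survivors must absorb the dead room's share:

    lost data         = u * C             (u = usage fraction, C = one room's capacity)
    free in survivors = (N-1) * C * (1-u)
    recovery fits when  u * C <= (N-1) * C * (1-u)   =>   u <= (N-1)/N

    so keep 1/N of the cluster free:  N=2 rooms -> 50% free,  N=3 rooms -> 33% free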
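A room-aware crush layout could be sketched like this (bucket, host and rule names are made up for illustration):

    ceph osd crush add-bucket roomA room
    ceph osd crush add-bucket roomB room
    ceph osd crush add-bucket roomC room
    ceph osd crush move roomA root=default
    ceph osd crush move roomB root=default
    ceph osd crush move roomC root=default
    ceph osd crush move host1 room=roomA      # repeat for every host
    # replicated rule that puts one copy in each room:
    ceph osd crush rule create-replicated one-per-room default room
    ceph osd pool set rbd crush_rule one-per-room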
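And enabling rbd-mirror on a pool is roughly this (pool, image and peer names are examples; the rbd-mirror daemon has to run on the backup cluster):

    # on both clusters, enable pool-mode mirroring:
    rbd mirror pool enable rbd pool
    # images need the journaling feature to be mirrored:
    rbd feature enable rbd/myimage exclusive-lock journaling
    # on the backup cluster, add the primary cluster as a peer:
    rbd mirror pool peer add rbd client.admin@primary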
kind regards
Ronny Aasen