Re: split-brain recovery automation, any plans?

Dmitry Melekhov <dm@xxxxxxxxxx> · Wed, 13 Jul 2016 08:13:01 +0400



    13.07.2016 07:44, Pranith Kumar
      Karampuri wrote:

    
          On Tue, Jul 12, 2016 at 9:27 PM,
            Dmitry Melekhov <dm@xxxxxxxxxx> wrote:

            
                12.07.2016 17:38, Pranith Kumar Karampuri wrote:

                
                    Did you wait for heals to complete
                      before upgrading the second node?

                    
                 no...
            
            
            So basically, if you have operations in progress on the
              mount, you should wait for heals to complete before you
              upgrade the second node. If all operations on all the
              mounts are stopped, or you have unmounted all the mounts
              for the volume, then you can upgrade all the servers one
              by one and then the clients. Otherwise it will lead to
              problems. That said, in a 3-way replica it shouldn't cause
              split-brains, so I would like to know the exact steps that
              led to this problem.
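
            A minimal sketch of "wait for heals to complete" before
            moving on to the next server. It assumes the output of
            `gluster volume heal VOLNAME info` contains one
            "Number of entries: N" line per brick; the function names
            and the 10-second default poll interval are illustrative,
            not part of Gluster. A non-numeric count (as may be shown
            for an unreachable brick) is treated as still pending.

```shell
# pending_heal_entries: read `gluster volume heal VOLNAME info` output on
# stdin and print the total number of entries still waiting to be healed.
pending_heal_entries() {
    awk -F': *' '
        /^Number of entries:/ {
            gsub(/[ \t\r]+$/, "", $2)        # drop trailing whitespace
            if ($2 ~ /^[0-9]+$/) n += $2     # normal numeric count
            else n += 1                      # unknown count: assume not healed
        }
        END { print n + 0 }'
}

# wait_for_heals VOLNAME [INTERVAL_SECONDS]: poll until nothing is pending.
wait_for_heals() {
    vol="$1"
    interval="${2:-10}"
    while :; do
        pending=$(gluster volume heal "$vol" info | pending_heal_entries)
        [ "$pending" -eq 0 ] && return 0
        echo "$pending entries still pending heal on $vol" >&2
        sleep "$interval"
    done
}
```

            Usage would be something like `wait_for_heals myvol` on a
            server that has just been upgraded, where `myvol` is a
            placeholder volume name.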
          
        
    Thank you, this is all I can remember :-(

    
            We know of one issue which leads to split-brains in
              case of VM workloads where we take down bricks in a cyclic
              manner without waiting for heals to complete. I wonder if
              the steps that led to split-brain on your setup are
              similar. We are targeting this for future releases...

            
    I guess we hit this...

    
                        On Tue, Jul 12, 2016 at
                          3:08 PM, Dmitry Melekhov <dm@xxxxxxxxxx>
                          wrote:

                          
                              12.07.2016 13:31, Pranith Kumar
                                Karampuri wrote:

                              
                                      On Mon,
                                        Jul 11, 2016 at 2:26 PM, Dmitry
                                        Melekhov <dm@xxxxxxxxxx>
                                        wrote:

                                        11.07.2016
                                          12:47, Gandalf Corvotempesta
                                          wrote:

                                            
                                              2016-07-11 9:54 GMT+02:00
                                              Dmitry Melekhov <dm@xxxxxxxxxx>:

                                              
                                                We just got split-brain
                                                during update to 3.7.13
                                                ;-)

                                              
                                              This is an interesting
                                              point.

                                              Could you please tell me
                                              which replica count you
                                              set?

                                            
                                           3

                                             
                                              With replica "3" split
                                              brain should not occur,
                                              right?

                                            
                                           I guess we did
                                          something wrong :-)
                                        

                                        Or there is a bug we never
                                          found? Could you please share
                                          details about what you did?

                                        
                               upgraded to 3.7.13 from 3.7.11
                              using yum, while at least one VM was
                              running :-)

                              on all 3 servers, one by one:

                              
                              yum upgrade

                              systemctl stop glusterd 

                              then killed glusterfsd processes using
                              kill

                              and systemctl start glusterd

                              
                              then next server....
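
                              The per-server steps above could be
                              sketched as a small script. The `run`
                              helper and the `DRY_RUN` variable are
                              hypothetical additions so the sequence
                              can be previewed without touching the
                              system, and `pkill` stands in for the
                              manual `kill` of the glusterfsd
                              processes. This only mirrors the steps as
                              described; it does not check heal status
                              between servers.

```shell
# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# upgrade_this_server: upgrade packages, stop glusterd, kill the brick
# processes, then start glusterd again (same order as in the message).
upgrade_this_server() {
    run yum upgrade || return 1
    run systemctl stop glusterd || return 1
    # pkill exits non-zero when no glusterfsd process is left; that is fine.
    run pkill glusterfsd || true
    run systemctl start glusterd || return 1
}
```

                              With `DRY_RUN=1` set, calling
                              `upgrade_this_server` only prints the four
                              commands, each prefixed with `+`.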

                              
                              after this we tried to restart the VM, but
                              it failed, because we forgot to restart
                              libvirtd, and it used the old libraries.

                              I guess this is the point where we got
                              this problem.

                                
                                                I'm planning a new
                                                cluster and I would like
                                                to be protected against
                                                split brains.

                                              
_______________________________________________

                                              Gluster-users mailing list

                                              Gluster-users@xxxxxxxxxxx

                                              http://www.gluster.org/mailman/listinfo/gluster-users
                                          
                                        
                                      -- 

                                      
                                        Pranith

                                        

            