Re: Gluster does not seem to detect a split-brain situation

(Oops... I hate it when I reply off-list.)

That warning should, imho, be an error. That's saying that the handle, which should be a hardlink to the file, doesn't have a matching inode. It should if it's a hardlink.
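
A quick way to check this, assuming GNU coreutils stat (paths taken from the quoted warning below): both names should report the same inode number and a hard-link count greater than 1.

    # %i = inode number, %h = hard link count, %n = file name
    stat -c '%i %h %n' \
        /export/sdb1/data/Archive/S21/21008971/studies.dat \
        /export/sdb1/data/.glusterfs/97/c2/97c2a65d-36e0-4566-a5c1-5925f97af1fd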

If it were me, I would:

    find /export/sdb1/data/.glusterfs -type f -links 1 -print0 | xargs -0 /bin/rm

This would clean up any handles that are not hardlinked where they should be, and would allow gluster to repair them.
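
To preview what that would remove before actually deleting anything, the same find can simply print the candidates (handles with a link count of 1, i.e. not hardlinked to any file):

    find /export/sdb1/data/.glusterfs -type f -links 1 -print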

Btw, the self-heal errors would be in glustershd.log and/or the client mount log(s), not (usually) the brick logs.

On 06/07/2015 12:21 PM, Sjors Gielen wrote:
Oops! I accidentally ran the command as non-root on Curacao; that's why there was no output. The actual output is:

curacao# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat
getfattr: Removing leading '/' from absolute path names
# file: export/sdb1/data/Case/21000355/studies.dat
trusted.afr.data-client-0=0x000000000000000000000000
trusted.afr.data-client-1=0x000000000000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xfb34574974cf4804b8b80789738c0f81

For reference, the output on bonaire:

bonaire# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat
getfattr: Removing leading '/' from absolute path names
# file: export/sdb1/data/Case/21000355/studies.dat
trusted.gfid=0xfb34574974cf4804b8b80789738c0f81

On Sun, Jun 7, 2015 at 21:13, Sjors Gielen <sjors@xxxxxxxxxxxxxx> wrote:
I'm reading about quorums, I haven't set up anything like that yet.

(In reply to Joe Julian, who responded off-list)

The output of getfattr on bonaire:

bonaire# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat
getfattr: Removing leading '/' from absolute path names
# file: export/sdb1/data/Case/21000355/studies.dat
trusted.gfid=0xfb34574974cf4804b8b80789738c0f81

On curacao, the command gives no output.

From `gluster volume status`, it seems that while the brick "curacao:/export/sdb1/data" is online, it has no associated port number. Curacao can connect to the port number provided by Bonaire just fine. There are no firewalls on or between the two machines; they are on the same subnet, connected by Ethernet cables and two switches.
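
(From what I've read so far, a brick that is listed as online but has no port usually means its glusterfsd process is not listening, and the documented way to respawn only the missing brick processes seems to be the command below. I haven't verified that this is the right fix here.)

    bonaire# gluster volume start data force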

By the way, warning messages just started appearing in /var/log/glusterfs/bricks/export-sdb1-data.log on Bonaire, saying "mismatching ino/dev between file X and handle Y". They may have only just begun, even though I started the full self-heal hours ago.

[2015-06-07 19:10:39.624393] W [posix-handle.c:727:posix_handle_hard] 0-data-posix: mismatching ino/dev between file /export/sdb1/data/Archive/S21/21008971/studies.dat (9127104621/2065) and handle /export/sdb1/data/.glusterfs/97/c2/97c2a65d-36e0-4566-a5c1-5925f97af1fd (9190215976/2065)

Thanks again!
Sjors

On Sun, Jun 7, 2015 at 19:13, Sjors Gielen <sjors@xxxxxxxxxxxxxx> wrote:
Hi all,

I work at a small, 8-person company that uses Gluster for its primary data storage. We have a volume called "data" that is replicated over two servers (details below). This worked perfectly for over a year, but lately we've been noticing mismatches between the two bricks, so it seems a split-brain situation has occurred that is not being detected or resolved. I have two questions about this:

1) I expected Gluster to (eventually) detect a situation like this; why doesn't it?
2) How do I fix this situation? I've tried an explicit 'heal', but that didn't seem to change anything.

Thanks a lot for your help!
Sjors

------8<------

Volume & peer info: http://pastebin.com/PN7tRXdU
curacao# md5sum /export/sdb1/data/Case/21000355/studies.dat
7bc2daec6be953ffae920d81fe6fa25c  /export/sdb1/data/Case/21000355/studies.dat
bonaire# md5sum /export/sdb1/data/Case/21000355/studies.dat
28c950a1e2a5f33c53a725bf8cd72681 /export/sdb1/data/Case/21000355/studies.dat

# mallorca is one of the clients
mallorca# md5sum /data/Case/21000355/studies.dat
7bc2daec6be953ffae920d81fe6fa25c  /data/Case/21000355/studies.dat

I expected an input/output error when reading this file, because of the split-brain situation, but got none. There are no entries in the GlusterFS logs of either bonaire or curacao.
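
If I understand the documentation correctly, gluster 3.6 can also list the entries it has actually flagged as split-brain, which should make any detected discrepancy visible:

    bonaire# gluster volume heal data info split-brain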

bonaire# gluster volume heal data full
Launching heal operation to perform full self heal on volume data has been successful
Use heal info commands to check status
bonaire# gluster volume heal data info
Brick bonaire:/export/sdb1/data/
Number of entries: 0

Brick curacao:/export/sdb1/data/
Number of entries: 0

(Same output on curacao, and hours after this, the md5sums on both bricks still differ.)
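
One thing I still plan to try, based on what I've read (unverified): accessing the file through a client mount is supposed to trigger a lookup, and with it a self-heal attempt on that single file:

    mallorca# stat /data/Case/21000355/studies.dat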

curacao# gluster --version
glusterfs 3.6.2 built on Mar  2 2015 14:05:34
Repository revision: git://git.gluster.com/glusterfs.git
(Same version on Bonaire)


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
