Re: Self-heal doesn't appear to be happening

On 03/15/2015 11:16 AM, Jonathan Heese wrote:

Hello all,


I have a two-node, two-brick replicate Gluster volume that I'm having trouble making fault-tolerant (a seemingly basic feature!) under CentOS 6.6 using EPEL packages.


Both nodes are as close to identical in hardware and software as possible, and I'm running the following packages:

glusterfs-rdma-3.6.2-1.el6.x86_64
glusterfs-fuse-3.6.2-1.el6.x86_64
glusterfs-libs-3.6.2-1.el6.x86_64
glusterfs-cli-3.6.2-1.el6.x86_64
glusterfs-api-3.6.2-1.el6.x86_64
glusterfs-server-3.6.2-1.el6.x86_64
glusterfs-3.6.2-1.el6.x86_64

3.6.2 is not considered production stable. Based on your expressed concern, you should probably be running 3.5.3.
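
If you do move to a 3.5.x build, it's worth confirming afterwards what every node is actually running (packages and running daemons can drift apart across an upgrade), e.g.:

[root@duchess ~]# glusterfs --version
[root@duchess ~]# rpm -qa 'glusterfs*'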


They both have dual-port Mellanox 20Gbps InfiniBand cards with a straight (i.e. "crossover") cable and opensm to facilitate the RDMA transport between them.


Here are some data dumps to set the stage (and yes, the output of these commands looks the same on both nodes):


[root@duchess ~]# gluster volume info

Volume Name: gluster_disk
Type: Replicate
Volume ID: b1279e22-8589-407b-8671-3760f42e93e4
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: duke-ib:/bricks/brick1
Brick2: duchess-ib:/bricks/brick1


[root@duchess ~]# gluster volume status
Status of volume: gluster_disk
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick duke-ib:/bricks/brick1                            49153   Y       9594
Brick duchess-ib:/bricks/brick1                         49153   Y       9583
NFS Server on localhost                                 2049    Y       9590
Self-heal Daemon on localhost                           N/A     Y       9597
NFS Server on 10.10.10.1                                2049    Y       9607
Self-heal Daemon on 10.10.10.1                          N/A     Y       9614

Task Status of Volume gluster_disk
------------------------------------------------------------------------------
There are no active volume tasks


[root@duchess ~]# gluster peer status
Number of Peers: 1

Hostname: 10.10.10.1
Uuid: aca56ec5-94bb-4bb0-8a9e-b3d134bbfe7b
State: Peer in Cluster (Connected)


Before putting any real data on these nodes (the data will eventually be a handful of large image files backing an iSCSI target via tgtd for ESXi datastores), I wanted to simulate the failure of one of them. So I stopped glusterfsd and glusterd on duchess, waited about 5 minutes, then started them back up again while tailing /var/log/glusterfs/* and /var/log/messages. I'm not sure exactly what I should be looking for, but the logs quieted down within a minute or so of restarting the daemons, and I didn't see much indicating that self-healing was going on.
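
Note that you don't have to wait for the self-heal crawl; you can kick a heal off and monitor it explicitly. Assuming the volume name above, something along these lines:

[root@duchess ~]# gluster volume heal gluster_disk
[root@duchess ~]# gluster volume heal gluster_disk full
[root@duchess ~]# gluster volume heal gluster_disk info
[root@duchess ~]# gluster volume heal gluster_disk statistics

The first two queue an index-based and a full heal respectively; "info" and "statistics" report what the self-heal daemon thinks is outstanding.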


Every now and then (and seemingly more often than not), when I run "gluster volume heal gluster_disk info", I get no output from the command, and the following dumps into my /var/log/messages:


Mar 15 13:59:16 duchess kernel: glfsheal[10365]: segfault at 7ff56068d020 ip 00007ff54f366d80 sp 00007ff54e22adf8 error 6 in libmthca-rdmav2.so[7ff54f365000+7000]

This is a segfault in the Mellanox driver (libmthca). Please report it to the driver developers.

Mar 15 13:59:17 duchess abrtd: Directory 'ccpp-2015-03-15-13:59:16-10359' creation detected
Mar 15 13:59:17 duchess abrt[10368]: Saved core dump of pid 10359 (/usr/sbin/glfsheal) to /var/spool/abrt/ccpp-2015-03-15-13:59:16-10359 (225595392 bytes)
Mar 15 13:59:25 duchess abrtd: Package 'glusterfs-server' isn't signed with proper key
Mar 15 13:59:25 duchess abrtd: 'post-create' on '/var/spool/abrt/ccpp-2015-03-15-13:59:16-10359' exited with 1
Mar 15 13:59:25 duchess abrtd: Deleting problem directory '/var/spool/abrt/ccpp-2015-03-15-13:59:16-10359'
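
If you want to keep a core to attach to that bug report: abrt is discarding the dump because the glusterfs-server package isn't signed with a key it recognizes. Assuming the stock CentOS 6 abrt configuration (this is a suggestion on my part, not something from your logs), relaxing the GPG check in /etc/abrt/abrt-action-save-package-data.conf should let it keep the next one:

OpenGPGCheck = no

[root@duchess ~]# service abrtd restart
[root@duchess ~]# abrt-cli list
[root@duchess ~]# gdb /usr/sbin/glfsheal /var/spool/abrt/ccpp-*/coredump

("bt" inside gdb will give the driver developers a backtrace to work with.)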

Other times, when I'm lucky, I get messages from the "heal info" command indicating that datastore1.img (the file that I intentionally changed while duchess was offline) is in need of healing:


[root@duke ~]# gluster volume heal gluster_disk info
Brick duke.jonheese.local:/bricks/brick1/
/datastore1.img - Possibly undergoing heal

Number of entries: 1

Brick duchess.jonheese.local:/bricks/brick1/
/datastore1.img - Possibly undergoing heal

Number of entries: 1


But watching df on the bricks and tailing glustershd.log doesn't seem to indicate that anything is actually happening -- and df indicates that the brick on duke *is* a different size from the brick on duchess. It's been over an hour now, and I'm not confident that the self-heal functionality is working at all... nor do I know how to do anything about it!

File sizes are not necessarily any indication. If the changes you made were nulls, the file may be sparse. Checking apparent size with du --apparent-size on the brick files is a slightly better indicator. Comparing hashes would be even better.
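
For example, run the following on each node (the path is taken from your volume info; hashing a file that large will take a while, and the comparison is only meaningful once no heal is in flight):

[root@duke ~]# du --apparent-size -h /bricks/brick1/datastore1.img
[root@duke ~]# sha256sum /bricks/brick1/datastore1.img

If the sha256sum output matches on duke and duchess, the replicas are identical regardless of what df reports.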

The extended attributes on the file itself, on the bricks, can tell you the heal state. Look at "getfattr -m . -d -e hex $file". The trusted.afr attributes, if non-zero, show pending changes destined for the other server.
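
For illustration only (the values below are made up; the client index follows the brick order in the volume info, so client-0 corresponds to duke's brick and client-1 to duchess's):

[root@duke ~]# getfattr -m . -d -e hex /bricks/brick1/datastore1.img
# file: bricks/brick1/datastore1.img
trusted.afr.gluster_disk-client-0=0x000000000000000000000000
trusted.afr.gluster_disk-client-1=0x000000020000000000000000
trusted.gfid=0x...

All zeros means nothing is pending; a non-zero trusted.afr.gluster_disk-client-1 on duke's brick means duke is holding changes that still need to be healed onto duchess.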


Also, I find it a little troubling that I'm using the aliases duke-ib and duchess-ib (defined in /etc/hosts on both servers) for the Gluster node configuration, but the "heal info" command refers to my nodes by their internal FQDNs, which resolve to their 1Gbps interface IPs. That doesn't mean they're trying to communicate over those interfaces (the volume is configured with "transport rdma", as you can see above), does it?


I'd call that a bug. It should report the hostnames as they're listed in the volume info.
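
Independent of how the hostnames are reported, you can sanity-check which path the heal traffic actually takes by watching the 1GbE interface counters while a heal is running, e.g. (the interface name is just an example, and sar needs the sysstat package):

[root@duchess ~]# sar -n DEV 1
[root@duchess ~]# ip -s link show eth0

If the byte counters on the 1GbE interface barely move during a heal, the data is going over the InfiniBand link as intended.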


Can anyone throw out any ideas on how I can:

1. Determine whether this is intentional behavior (or a bug?),

2. Determine whether my data has been properly resync'd across the bricks, and

3. Make it work correctly if not.


Thanks in advance!


Regards,

Jon Heese



_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
