I just saw the following bug which was fixed in 3.8.15:
Is it possible that the problem I described in this post is related to that bug?
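For context, a quick way to confirm which release each node is actually running (execute it on every node; `gluster --version` reports the same information):

$ glusterfs --version | head -n1

Anything older than 3.8.15 would of course predate that fix.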
-------- Original Message --------
Subject: Re: [Gluster-users] self-heal not working
Local Time: August 22, 2017 11:51 AM
UTC Time: August 22, 2017 9:51 AM
From: ravishankar@xxxxxxxxxx
To: mabi <mabi@xxxxxxxxxxxxx>
Cc: Ben Turner <bturner@xxxxxxxxxx>, Gluster Users <gluster-users@xxxxxxxxxxx>
On 08/22/2017 02:30 PM, mabi wrote:
> Thanks for the additional hints, I have the following 2 questions first:
>
> - In order to launch the index heal is the following command correct:
>   gluster volume heal myvolume

Yes.

> - If I run a "volume start force" will it have any short disruptions on my
>   clients which mount the volume through FUSE? If yes, how long? This is a
>   production system, that's why I am asking.

No. You can actually create a test volume on your personal Linux box to try these kinds of things without needing multiple machines. This is how we develop and test our patches :)
`gluster volume create testvol replica 3 /home/mabi/bricks/brick{1..3} force` and so on.

HTH,
Ravi
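As a side note, here is a sketch of the full single-box test cycle behind that one-liner (hostname and paths are placeholders; `force` is needed because all three bricks sit on the same node):

$ mkdir -p /home/mabi/bricks/brick{1..3}
$ gluster volume create testvol replica 3 $(hostname):/home/mabi/bricks/brick{1..3} force
$ gluster volume start testvol
$ mkdir -p /mnt/testvol && mount -t glusterfs $(hostname):/testvol /mnt/testvol
$ gluster volume heal testvol    # now experiment freely, e.g. with "volume start force"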
-------- Original Message --------
Subject: Re: [Gluster-users] self-heal not working
Local Time: August 22, 2017 6:26 AM
UTC Time: August 22, 2017 4:26 AM
From: ravishankar@xxxxxxxxxx
To: Gluster Users <gluster-users@xxxxxxxxxxx>

Explore the following:

- Launch index heal and look at the glustershd logs of all bricks for possible errors.
- See if the glustershd in each node is connected to all bricks.
- If not, try to restart shd with `volume start force`.
- Launch index heal again and try.
- Try debugging the shd log by setting client-log-level to DEBUG temporarily (a rough command mapping for these steps follows below).
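A rough mapping of those steps onto commands, assuming the volume is called myvolume and logs live in the stock location (adjust both to your setup):

$ gluster volume heal myvolume                                    # launch index heal
$ grep -iE 'error|fail' /var/log/glusterfs/glustershd.log         # on each node: scan the shd log
$ gluster volume status myvolume                                  # Self-heal Daemon should show Online "Y" everywhere
$ gluster volume start myvolume force                             # restart shd if one is not connected
$ gluster volume set myvolume diagnostics.client-log-level DEBUG  # temporary verbose logging
$ gluster volume reset myvolume diagnostics.client-log-level      # back to the default afterwards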
On 08/22/2017 03:19 AM, mabi wrote:

Sure, it doesn't look like a split brain based on the output:

Brick node1.domain.tld:/data/myvolume/brick
Status: Connected
Number of entries in split-brain: 0

Brick node2.domain.tld:/data/myvolume/brick
Status: Connected
Number of entries in split-brain: 0

Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
Status: Connected
Number of entries in split-brain: 0

-------- Original Message --------
Subject: Re: [Gluster-users] self-heal not working
Local Time: August 21, 2017 11:35 PM
UTC Time: August 21, 2017 9:35 PM
From: bturner@xxxxxxxxxx
To: mabi <mabi@xxxxxxxxxxxxx>
Cc: Gluster Users <gluster-users@xxxxxxxxxxx>

Can you also provide:

gluster v heal <my vol> info split-brain

If it is split brain just delete the incorrect file from the brick and run heal again. I haven't tried this with arbiter but I assume the process is the same.

-b

----- Original Message -----
> From: "mabi" <mabi@xxxxxxxxxxxxx>
> To: "Ben Turner" <bturner@xxxxxxxxxx>
> Cc: "Gluster Users" <gluster-users@xxxxxxxxxxx>
> Sent: Monday, August 21, 2017 4:55:59 PM
> Subject: Re: [Gluster-users] self-heal not working
>
> Hi Ben,
>
> So it is really a 0 kBytes file everywhere (all nodes including the arbiter
> and from the client).
> Here below you will find the output you requested. Hopefully that will help
> to find out why this specific file is not healing... Let me know if you need
> any more information. Btw node3 is my arbiter node.
>
> NODE1:
>
> STAT:
> File: ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
> Size: 0  Blocks: 38  IO Block: 131072  regular empty file
> Device: 24h/36d  Inode: 10033884  Links: 2
> Access: (0644/-rw-r--r--)  Uid: (33/www-data)  Gid: (33/www-data)
> Access: 2017-08-14 17:04:55.530681000 +0200
> Modify: 2017-08-14 17:11:46.407404779 +0200
> Change: 2017-08-14 17:11:46.407404779 +0200
> Birth: -
>
> GETFATTR:
> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
> trusted.bit-rot.version=0sAgAAAAAAAABZhuknAAlJAg==
> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOyo=
>
> NODE2:
>
> STAT:
> File: ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
> Size: 0  Blocks: 38  IO Block: 131072  regular empty file
> Device: 26h/38d  Inode: 10031330  Links: 2
> Access: (0644/-rw-r--r--)  Uid: (33/www-data)  Gid: (33/www-data)
> Access: 2017-08-14 17:04:55.530681000 +0200
> Modify: 2017-08-14 17:11:46.403704181 +0200
> Change: 2017-08-14 17:11:46.403704181 +0200
> Birth: -
>
> GETFATTR:
> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
> trusted.bit-rot.version=0sAgAAAAAAAABZhu6wAA8Hpw==
> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOVE=
>
> NODE3:
>
> STAT:
> File: /srv/glusterfs/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
> Size: 0  Blocks: 0  IO Block: 4096  regular empty file
> Device: ca11h/51729d  Inode: 405208959  Links: 2
> Access: (0644/-rw-r--r--)  Uid: (33/www-data)  Gid: (33/www-data)
> Access: 2017-08-14 17:04:55.530681000 +0200
> Modify: 2017-08-14 17:04:55.530681000 +0200
> Change: 2017-08-14 17:11:46.604380051 +0200
> Birth: -
>
> GETFATTR:
> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
> trusted.bit-rot.version=0sAgAAAAAAAABZe6ejAAKPAg==
> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOc4=
>
> CLIENT GLUSTER MOUNT:
>
> STAT:
> File: "/mnt/myvolume/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png"
> Size: 0  Blocks: 0  IO Block: 131072  regular empty file
> Device: 1eh/30d  Inode: 11897049013408443114  Links: 1
> Access: (0644/-rw-r--r--)  Uid: (33/www-data)  Gid: (33/www-data)
> Access: 2017-08-14 17:04:55.530681000 +0200
> Modify: 2017-08-14 17:11:46.407404779 +0200
> Change: 2017-08-14 17:11:46.407404779 +0200
> Birth: -
>
> > -------- Original Message --------
> > Subject: Re: [Gluster-users] self-heal not working
> > Local Time: August 21, 2017 9:34 PM
> > UTC Time: August 21, 2017 7:34 PM
> > From: bturner@xxxxxxxxxx
> > To: mabi <mabi@xxxxxxxxxxxxx>
> > Cc: Gluster Users <gluster-users@xxxxxxxxxxx>
> >
> > ----- Original Message -----
> >> From: "mabi" <mabi@xxxxxxxxxxxxx>
> >> To: "Gluster Users" <gluster-users@xxxxxxxxxxx>
> >> Sent: Monday, August 21, 2017 9:28:24 AM
> >> Subject: [Gluster-users] self-heal not working
> >>
> >> Hi,
> >>
> >> I have a replica 2 with arbiter GlusterFS 3.8.11 cluster and there is
> >> currently one file listed to be healed as you can see below, but it never
> >> gets healed by the self-heal daemon:
> >>
> >> Brick node1.domain.tld:/data/myvolume/brick
> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
> >> Status: Connected
> >> Number of entries: 1
> >>
> >> Brick node2.domain.tld:/data/myvolume/brick
> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
> >> Status: Connected
> >> Number of entries: 1
> >>
> >> Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
> >> Status: Connected
> >> Number of entries: 1
> >>
> >> As once recommended on this mailing list I have mounted that glusterfs
> >> volume temporarily through fuse/glusterfs and ran a "stat" on the file
> >> listed above, but nothing happened.
> >>
> >> The file itself is available on all 3 nodes/bricks but on the last node
> >> it has a different date. By the way this file is 0 kBytes big. Is that
> >> maybe the reason why the self-heal does not work?
> >
> > Is the file actually 0 bytes or is it just 0 bytes on the arbiter (0 bytes
> > are expected on the arbiter, it just stores metadata)? Can you send us the
> > output from stat on all 3 nodes:
> >
> > $ stat <file on back end brick>
> > $ getfattr -d -m - <file on back end brick>
> > $ stat <file from gluster mount>
> >
> > Let's see what things look like on the back end, it should tell us why
> > healing is failing.
> >
> > -b
> >
> >> And how can I now make this file heal?
> >>
> >> Thanks,
> >> Mabi
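One detail worth pulling out of the getfattr output above: trusted.afr.dirty carries the identical value 0sAAAAAQAAAAAAAAAA on all three bricks. Dumped as hex it becomes readable; as I understand the AFR changelog layout (an assumption worth double-checking), the twelve bytes are three big-endian 32-bit counters for pending data, metadata and entry operations:

$ getfattr -n trusted.afr.dirty -e hex \
    /data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
# trusted.afr.dirty=0x000000010000000000000000
#                     data=1, metadata=0, entry=0

If that reading is right, every brick has one pending data operation flagged on itself while no trusted.afr.<volname>-client-N xattr blames any other brick, leaving the self-heal daemon with no way to pick a source; that would fit the question about the 3.8.15 fix at the top of this thread.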
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users