On Fri, Feb 9, 2018 at 11:46 AM, Karthik Subrahmanya <ksubrahm@xxxxxxxxxx> wrote:

Hey,

Did the heal complete, and do you still have some entries pending heal?
If yes, can you provide the following information to debug the issue:
1. Which version of gluster you are running
2. Output of gluster volume heal <volname> info summary or gluster volume heal <volname> info
3. getfattr -d -e hex -m . <filepath-on-brick> output of any one of the files which is pending heal, from all the bricks
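For reference, the output for item 3 would look roughly like the following (the path and values here are made up for illustration; client indices follow brick order starting from zero, so trusted.afr.myvol-client-5 would refer to Brick6, gv1:/data/gv23-arbiter):

# getfattr -d -e hex -m . /data/glusterfs/path/to/pending-file
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/path/to/pending-file
trusted.afr.myvol-client-5=0x000000010000000000000000
trusted.gfid=0x1a2b3c4d5e6f708192a3b4c5d6e7f809

A non-zero trusted.afr.* value on a brick means that copy is blaming the corresponding brick for pending heals; the three 4-byte fields of the hex value count pending data, metadata, and entry heals, in that order.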
Regards,
Karthik

On Thu, Feb 8, 2018 at 12:48 PM, Seva Gluschenko <gvs@xxxxxxxxxxxxx> wrote:

Hi folks,
I'm having trouble moving an arbiter brick to another server because of I/O load issues. My setup is as follows:
# gluster volume info
Volume Name: myvol
Type: Distributed-Replicate
Volume ID: 43ba517a-ac09-461e-99da-a197759a7dc8
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: gv0:/data/glusterfs
Brick2: gv1:/data/glusterfs
Brick3: gv4:/data/gv01-arbiter (arbiter)
Brick4: gv2:/data/glusterfs
Brick5: gv3:/data/glusterfs
Brick6: gv1:/data/gv23-arbiter (arbiter)
Brick7: gv4:/data/glusterfs
Brick8: gv5:/data/glusterfs
Brick9: pluto:/var/gv45-arbiter (arbiter)
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
storage.owner-gid: 1000
storage.owner-uid: 1000
cluster.self-heal-daemon: enable
The gv23-arbiter is the brick that was recently moved from another server (chronos) using the following command:
# gluster volume replace-brick myvol chronos:/mnt/gv23-arbiter gv1:/data/gv23-arbiter commit force
volume replace-brick: success: replace-brick commit force operation successful
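For reference, the replaced brick can be checked with gluster volume status, which should report it online, roughly like this (the port and PID values here are illustrative):

# gluster volume status myvol gv1:/data/gv23-arbiter
Status of volume: myvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gv1:/data/gv23-arbiter                49153     0          Y       12345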
This is not the first time I've moved an arbiter brick, and the heal-count was zero for all bricks before the change, so I didn't expect much trouble. What probably went wrong is that I then forced chronos out of the cluster with the gluster peer detach command. Ever since, over the course of the last 3 days, I have been seeing this:
# gluster volume heal myvol statistics heal-count
Gathering count of entries to be healed on volume myvol has been successful
Brick gv0:/data/glusterfs
Number of entries: 0
Brick gv1:/data/glusterfs
Number of entries: 0
Brick gv4:/data/gv01-arbiter
Number of entries: 0
Brick gv2:/data/glusterfs
Number of entries: 64999
Brick gv3:/data/glusterfs
Number of entries: 64999
Brick gv1:/data/gv23-arbiter
Number of entries: 0
Brick gv4:/data/glusterfs
Number of entries: 0
Brick gv5:/data/glusterfs
Number of entries: 0
Brick pluto:/var/gv45-arbiter
Number of entries: 0
According to /var/log/glusterfs/glustershd.log, self-heal is in progress, so it might be worth just sitting and waiting, but I'm wondering why this heal-count of 64999 persists. Is it a limitation of the counter? (In fact, the gv2 and gv3 bricks contain roughly 30 million files.) I also feel bothered by the following output:
# gluster volume heal myvol info heal-failed
Gathering list of heal failed entries on volume myvol has been unsuccessful on bricks that are down. Please check if all brick processes are running.
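Since that message suggests a brick process might be down, the brick and self-heal daemon processes can be checked with volume status, e.g. (output trimmed; the Online column should read Y for every brick and every self-heal daemon):

# gluster volume status myvol
...
# gluster volume status myvol shd
...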
I re-attached the chronos server to the cluster, with no noticeable effect. Any comments and suggestions would be much appreciated.
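For completeness, the detach and re-attach steps mentioned above are of the form below (exact flags may differ, e.g. detach may need "force" if the peer still hosts bricks); gluster peer status afterwards shows whether chronos rejoined the trusted pool:

# gluster peer detach chronos
# gluster peer probe chronos
# gluster peer status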
--
Best Regards,
Seva Gluschenko
CTO @ http://webkontrol.ru
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users