Hi folks,
I'm having trouble after moving an arbiter brick to another server, which I did because of I/O load issues. My setup is as follows:
# gluster volume info
Volume Name: myvol
Type: Distributed-Replicate
Volume ID: 43ba517a-ac09-461e-99da-a197759a7dc8
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: gv0:/data/glusterfs
Brick2: gv1:/data/glusterfs
Brick3: gv4:/data/gv01-arbiter (arbiter)
Brick4: gv2:/data/glusterfs
Brick5: gv3:/data/glusterfs
Brick6: gv1:/data/gv23-arbiter (arbiter)
Brick7: gv4:/data/glusterfs
Brick8: gv5:/data/glusterfs
Brick9: pluto:/var/gv45-arbiter (arbiter)
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
storage.owner-gid: 1000
storage.owner-uid: 1000
cluster.self-heal-daemon: enable
The gv23-arbiter brick was recently moved from another server (chronos) using the following command:
# gluster volume replace-brick myvol chronos:/mnt/gv23-arbiter gv1:/data/gv23-arbiter commit force
volume replace-brick: success: replace-brick commit force operation successful
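Since detaching chronos right after this step is the suspected mistake, one cautious habit is to hold off on `gluster peer detach` until every brick reports zero pending entries. Below is a minimal sketch of such a guard. The function name `all_bricks_healed` is hypothetical (it is not part of the gluster CLI), and it only assumes the `Number of entries: N` line format visible in the heal-count output in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the gluster CLI.
# Reads the output of `gluster volume heal <vol> statistics heal-count`
# on stdin and exits 0 only if at least one brick was listed and every
# brick reports "Number of entries: 0". Any other value is treated as
# "heal still pending", so the check errs on the side of waiting.
all_bricks_healed() {
  awk '
    /^Number of entries:/ {
      seen = 1
      if ($4 != "0") pending = 1
    }
    END {
      if (seen && !pending) exit 0
      exit 1
    }
  '
}
```

For example, `until gluster volume heal myvol statistics heal-count | all_bricks_healed; do sleep 600; done` would block until all counts reach zero, and only then would `gluster peer detach chronos` be run. The 10-minute interval is arbitrary.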
This is not the first time I've moved an arbiter brick, and the heal-count was zero for all bricks before the change, so I didn't expect much trouble. What probably went wrong is that I then forced chronos out of the cluster with the gluster peer detach command. Ever since, over the last 3 days, I have been seeing this:
# gluster volume heal myvol statistics heal-count
Gathering count of entries to be healed on volume myvol has been successful
Brick gv0:/data/glusterfs
Number of entries: 0
Brick gv1:/data/glusterfs
Number of entries: 0
Brick gv4:/data/gv01-arbiter
Number of entries: 0
Brick gv2:/data/glusterfs
Number of entries: 64999
Brick gv3:/data/glusterfs
Number of entries: 64999
Brick gv1:/data/gv23-arbiter
Number of entries: 0
Brick gv4:/data/glusterfs
Number of entries: 0
Brick gv5:/data/glusterfs
Number of entries: 0
Brick pluto:/var/gv45-arbiter
Number of entries: 0
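To tell whether the 64999 figures on gv2 and gv3 ever change, it may help to reduce each heal-count run to a compact per-brick table that can be logged with a timestamp and compared over time. A minimal awk-based sketch, assuming the `Brick ...` / `Number of entries: N` layout shown above; the function name `summarize_heal_count` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn `gluster volume heal <vol> statistics heal-count`
# output (read from stdin) into "brick<TAB>count" lines plus a TOTAL line.
# Non-numeric counts are printed as-is but left out of the total.
summarize_heal_count() {
  awk '
    /^Brick / { brick = $2 }
    /^Number of entries:/ {
      printf "%s\t%s\n", brick, $4
      if ($4 ~ /^[0-9]+$/) total += $4
    }
    END { printf "TOTAL\t%d\n", total }
  '
}
```

Running `gluster volume heal myvol statistics heal-count | summarize_heal_count` periodically (from cron, for instance, with `date` prepended) would show whether the per-brick numbers move at all. For the output above, the TOTAL line would read 129998.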
According to /var/log/glusterfs/glustershd.log, self-healing is in progress, so it might be worth just sitting and waiting. However, I'm wondering why the heal-count stays at exactly 64999 (is the counter capped? The gv2 and gv3 bricks actually contain roughly 30 million files), and I'm also bothered by the following output:
# gluster volume heal myvol info heal-failed
Gathering list of heal failed entries on volume myvol has been unsuccessful on bricks that are down. Please check if all brick processes are running.
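The message above asks whether all brick processes are running, and `gluster volume status myvol` lists each brick with an Online column. A hedged sketch that flags bricks not reported online, assuming each brick row starts with `Brick `, fits on a single line, and has Online as its second-to-last column (long brick paths can wrap onto a second line, which this sketch does not handle). The function name `offline_bricks` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `gluster volume status <vol>` output on stdin
# and print every brick whose Online column is not "Y".
# Assumes one line per brick row, with Online as the second-to-last field
# (the last field being the PID). Wrapped rows are not handled.
# Exit status: 1 if any listed brick is not online, 0 otherwise.
offline_bricks() {
  awk '
    /^Brick / {
      if ($(NF-1) != "Y") { print $2; bad = 1 }
    }
    END { if (bad) exit 1 }
  '
}
```

If `gluster volume status myvol | offline_bricks` prints nothing and exits 0, every brick is at least reported as online, which would suggest looking elsewhere for the cause of the heal-failed message.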
I have since re-attached the chronos server to the cluster, with no noticeable effect. Any comments and suggestions would be much appreciated.
--
Best Regards,
Seva Gluschenko
CTO @ http://webkontrol.ru
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users