I am running:

glusterfs 3.5.9 built on Mar 28 2016 07:10:17

Other volume info:

Type: Distributed-Replicate
Number of Bricks: 8 x 3 = 24
Transport-type: tcp
Options Reconfigured:
performance.cache-refresh-timeout: 30
performance.cache-size: 768MB
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 51

When I try to manipulate a file (def/ghi.gz) on the mounted glusterfs folder (abc) I get an Errno 5 input/output error. Most of the files work, but there are lots that have this same problem.

I visited each brick in my volume to see what the extended file attributes are for this file.

On my_volume-replicate-0 there is an empty file with the filename. When I run “ls -al” it looks like this:

---------T 2 root root 0 Mar 1 14:56 ghi.gz

On the first two bricks (bricks 0 and 1) of my_volume-replicate-0, when I run “getfattr -d -m. -e hex ghi.gz” I get the following results:

# file: ghi.gz
trusted.afr.my_volume-client-0=0x000000000000000000000000
trusted.afr.my_volume-client-1=0x000000000000000000000000
trusted.afr.my_volume-client-2=0x000000020000000200000000
trusted.gfid=0xabb0369b05844390add6ea72ce7e107a
trusted.glusterfs.dht.linkto=0x686f7374696e672d7265706c69636174652d3400

The linkto value looks like the following when I use text encoding instead of hex encoding:

trusted.glusterfs.dht.linkto="my_volume-replicate-4"

The third brick (brick 2) of my_volume-replicate-0 has these extended attributes:

# file: ghi.gz
trusted.gfid=0xc5c99fe21c3f4582b48e6f69ff76e33b
trusted.glusterfs.dht.linkto=0x686f7374696e672d7265706c69636174652d3400

So the third brick has a DIFFERENT trusted.gfid.

The first two bricks have trusted.afr.my_volume-client-2=0x000000020000000200000000. Does that mean that the first two bricks think that the third brick (brick 2) has differences?

All three bricks are linking to my_volume-replicate-4.
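As a minimal sketch of how a trusted.afr value like the one above can be read: this assumes the usual AFR changelog layout of three big-endian 32-bit counters (pending data, metadata, and entry operations, in that order). The function name decode_afr_changelog is hypothetical and not part of any Gluster tool.

```python
import struct


def decode_afr_changelog(hex_value: str) -> dict:
    """Split a trusted.afr.* value, as printed by `getfattr -e hex`,
    into its three pending-operation counters.

    Assumption: the value is 12 bytes holding three big-endian
    unsigned 32-bit integers in the order data, metadata, entry.
    """
    text = hex_value.strip()
    if text.lower().startswith("0x"):
        text = text[2:]
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Value that bricks 0 and 1 hold for my_volume-client-2 (brick 2).
print(decode_afr_changelog("0x000000020000000200000000"))
# {'data': 2, 'metadata': 2, 'entry': 0}
```

Under that assumption, the non-zero value on bricks 0 and 1 would mean two pending data operations and two pending metadata operations recorded against brick 2, while an all-zero value means nothing is pending for that brick.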
All three bricks (bricks 12, 13, and 14) of my_volume-replicate-4 have the actual file with these extended attributes:

# file: ghi.gz
trusted.afr.my_volume-client-12=0x000000000000000000000000
trusted.afr.my_volume-client-13=0x000000000000000000000000
trusted.afr.my_volume-client-14=0x000000000000000000000000
trusted.gfid=0xabb0369b05844390add6ea72ce7e107a

So, my_volume-replicate-4’s trusted.gfid matches bricks 0 and 1 of my_volume-replicate-0. And they all have 0x000000000000000000000000 for all three trusted.afr.my_volume-client-## attributes. I assume this means that the file is the same on all three bricks of my_volume-replicate-4.

No other bricks in the system have the ghi.gz file on them.

When I go to .glusterfs/indices/xattrop on bricks 0 and 1 there is a file there named abb0369b-0584-4390-add6-ea72ce7e107a. This means that this file ID is in need of healing, correct? There is NOT a file named abb0369b-0584-4390-add6-ea72ce7e107a on brick 2.

When I run “gluster volume heal my_volume info heal-failed” it lists <gfid:abb0369b-0584-4390-add6-ea72ce7e107a> four times.

I have tried to do a full heal and a rebalance of the system, but neither fixes this problem.

How do I fix this problem? Is there an easy way to fix all of the files with this problem in bulk?

Thank you very much for any insights or help you may have!!

Dave