Hi,

I have the following volume:

Volume Name: virt_images
Type: Replicate
Volume ID: 9f3c8273-4d9d-4af2-a4e7-4cb4a51e3594
Status: Started
Snapshot Count: 2
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: virt3:/data/virt_images/brick
Brick2: virt2:/data/virt_images/brick
Brick3: printserver:/data/virt_images/brick (arbiter)
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
features.barrier: disable
features.scrub: Active
features.bitrot: on
nfs.rpc-auth-allow: on
server.allow-insecure: on
user.cifs: off
features.shard: off
cluster.shd-wait-qlength: 10000
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: enable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
nfs.disable: on
transport.address-family: inet
server.outstanding-rpc-limit: 512

After a reboot of one server (brick 1), a single file has become unavailable:

# touch fedora27.qcow2
touch: setting times of 'fedora27.qcow2': Input/output error

Checking the split-brain status from the client-side CLI:

# getfattr -n replica.split-brain-status fedora27.qcow2
# file: fedora27.qcow2
replica.split-brain-status="The file is not under data or metadata split-brain"

However, the client-side log does mention a split-brain:

[2017-12-20 18:05:23.570762] E [MSGID: 108008] [afr-transaction.c:2629:afr_write_txn_refresh_done] 0-virt_images-replicate-0: Failing SETATTR on gfid 7a36937d-52fc-4b55-a932-99e2328f02ba: split-brain observed. [Input/output error]
[2017-12-20 18:05:23.576046] W [MSGID: 108027] [afr-common.c:2733:afr_discover_done] 0-virt_images-replicate-0: no read subvols for /fedora27.qcow2
[2017-12-20 18:05:23.578149] W [fuse-bridge.c:1153:fuse_setattr_cbk] 0-glusterfs-fuse: 182: SETATTR() /fedora27.qcow2 => -1 (Input/output error)

On the server side there is no mention of a possible split-brain:

# gluster volume heal virt_images info split-brain
Brick virt3:/data/virt_images/brick
Status: Connected
Number of entries in split-brain: 0

Brick virt2:/data/virt_images/brick
Status: Connected
Number of entries in split-brain: 0

Brick printserver:/data/virt_images/brick
Status: Connected
Number of entries in split-brain: 0

The info command does list the file on every brick:

# gluster volume heal virt_images info
Brick virt3:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1

Brick virt2:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1

Brick printserver:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1

The heal and heal full commands do nothing, and I can't find anything in the logs about them trying and failing to fix the file.

Trying to resolve the split-brain manually from the CLI gives the following:

# gluster volume heal virt_images split-brain source-brick virt3:/data/virt_images/brick /fedora27.qcow2
Healing /fedora27.qcow2 failed: File not in split-brain.
Volume heal failed.

The attrs from virt2 and virt3 are as follows:

[root@virt2 brick]# getfattr -d -m . -e hex fedora27.qcow2
# file: fedora27.qcow2
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.virt_images-client-1=0x000002280000000000000000
trusted.afr.virt_images-client-3=0x000000000000000000000000
trusted.bit-rot.version=0x1d000000000000005a3aa0db000c6563
trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a49eb0000000000000000001
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001

# file: fedora27.qcow2
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.virt_images-client-2=0x000003ef0000000000000000
trusted.afr.virt_images-client-3=0x000000000000000000000000
trusted.bit-rot.version=0x19000000000000005a3a9f82000c382a
trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a2fbe0000000000000000001
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001

I don't know how to find similar information from the arbiter...

Versions are the same on all three systems:

# glusterd --version
glusterfs 3.12.2

# gluster volume get all cluster.op-version
Option                Value
------                -----
cluster.op-version    31202

I might try upgrading to version 3.13.0 tomorrow, but I want to hear you out first.

How do I fix this? Do I have to manually change the file attributes?

Also, in the guides for manual resolution through setfattr, every brick is listed with a "trusted.afr.<volume>-client-<brick>" attribute. But on my system (as can be seen above), each brick only shows attributes for the other bricks. So which attributes should be changed, and to what?

I hope someone might know a solution. If you need any more information I'll try and provide it.
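A note on reading the trusted.afr values in the two listings above: as far as I understand the AFR changelog format, each trusted.afr.* value is 12 bytes holding three big-endian 32-bit counters, in the order pending data, pending metadata, and pending entry operations. The Python sketch below decodes the two non-zero values under that assumption. The function name decode_afr_pending is made up for illustration and is not part of any Gluster tool.

```python
import struct


def decode_afr_pending(hex_value):
    """Decode a trusted.afr.* value as printed by `getfattr -e hex`.

    Assumption: the value is 12 bytes holding three big-endian unsigned
    32-bit counters in the order data, metadata, entry.
    """
    # Strip the optional "0x" prefix that getfattr prints.
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Values copied from the two getfattr listings above.
# First listing: trusted.afr.virt_images-client-1
first_listing = decode_afr_pending("0x000002280000000000000000")
# Second listing: trusted.afr.virt_images-client-2
second_listing = decode_afr_pending("0x000003ef0000000000000000")

print(first_listing)
print(second_listing)
```

If that reading of the format is right, only the data counter is non-zero in each listing, so each copy records pending data operations against a different client while neither records anything against client-3.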
I can probably change the virtual machine to another image for now.

Best regards,
Henrik Juul Pedersen
LIAB ApS