Hi Karthik,

Thanks for the info. Maybe the documentation should be updated to explain
the different AFR versions; I know I was confused.

Also, here are the changelogs from my three bricks before fixing:

Brick 1:
trusted.afr.virt_images-client-1=0x000002280000000000000000
trusted.afr.virt_images-client-3=0x000000000000000000000000

Brick 2:
trusted.afr.virt_images-client-2=0x000003ef0000000000000000
trusted.afr.virt_images-client-3=0x000000000000000000000000

Brick 3 (arbiter):
trusted.afr.virt_images-client-1=0x000002280000000000000000

I would think that the changelog for client 1 should win by majority vote?
Or how does the self-healing process work?

I assumed this was the correct version, and reset client 2 on brick 2:

# setfattr -n trusted.afr.virt_images-client-2 -v 0x000000000000000000000000 fedora27.qcow2

I then did a directory listing, which might have started a heal, but heal
statistics show (I also did a full heal):

Starting time of crawl: Fri Dec 22 11:34:47 2017
Ending time of crawl: Fri Dec 22 11:34:47 2017
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 1

Starting time of crawl: Fri Dec 22 11:39:29 2017
Ending time of crawl: Fri Dec 22 11:39:29 2017
Type of crawl: FULL
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 1

I was immediately able to touch the file, so gluster was okay with it;
however, heal info still showed the file for a while:

# gluster volume heal virt_images info
Brick virt3:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1

Brick virt2:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1

Brick printserver:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1

Now heal info shows 0 entries, and the two data bricks have the same
md5sum, so it's back in sync.
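In case it helps anyone reading along: if I understand the AFR changelog
format correctly, each trusted.afr value packs three big-endian 32-bit
counters (pending data, metadata, and entry operations), so the values
above can be decoded with a small bash sketch like this (decode_afr is
just a hypothetical helper name, not a gluster tool):

```shell
# Decode a trusted.afr changelog value into its three counters:
# bytes 0-3 = pending data ops, 4-7 = metadata ops, 8-11 = entry ops.
# (My reading of the AFR changelog layout; please verify against the docs.)
decode_afr() {
    v=${1#0x}                   # strip a leading 0x if present
    echo "data=$((16#${v:0:8})) metadata=$((16#${v:8:8})) entry=$((16#${v:16:8}))"
}

decode_afr 0x000002280000000000000000   # brick 1 blaming client-1 -> data=552 metadata=0 entry=0
decode_afr 0x000003ef0000000000000000   # brick 2 blaming client-2 -> data=1007 metadata=0 entry=0
```

So both data bricks were blaming each other for pending data operations,
which (as far as I can tell) is exactly the data split-brain case.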
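On the monitoring side, the best I have come up with so far is to
periodically parse the heal info output and alert on pending entries; a
rough sketch (count_pending is my own hypothetical helper, and the cron /
alerting integration is left out):

```shell
# Sum the pending heal entries across bricks from
# `gluster volume heal <vol> info` output. In a live check you would
# pipe the real command in, e.g.:
#   gluster volume heal virt_images info | count_pending
count_pending() {
    awk '/^Number of entries:/ && $NF > 0 { n += $NF } END { print n+0 }'
}

# demo with the output captured earlier; prints 3
count_pending <<'EOF'
Brick virt3:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1

Brick virt2:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1

Brick printserver:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1
EOF
```

A nonzero result for more than a few minutes would then be worth an alert.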
I have a few questions after all of this:

1) How can a split brain happen in a replica 3 arbiter 1 setup with both
server- and client-quorum enabled?
2) Why was it not able to self-heal, when two bricks seemed in sync with
their changelogs?
3) Why could I not see the file in heal info split-brain?
4) Why could I not fix this through the CLI split-brain resolution tool?
5) Is it possible to force a sync in a volume? Or maybe test sync status?
It might be smart to be able to "flush" changes when taking a brick down
for maintenance.
6) How am I supposed to monitor events like this? I have a gluster volume
with ~500.000 files; I need to be able to guarantee data integrity and
availability to the users.
7) Is glusterfs "production ready"? Because I find it hard to monitor and
thus trust these setups. Also, performance with small / many files seems
horrible at best - but that's for another discussion.

Thanks for all of your help, I'll continue to try and tweak some
performance out of this. :)

Best regards,
Henrik Juul Pedersen
LIAB ApS

On 22 December 2017 at 07:26, Karthik Subrahmanya <ksubrahm@xxxxxxxxxx> wrote:
> Hi Henrik,
>
> Thanks for providing the required outputs. See my replies inline.
>
> On Thu, Dec 21, 2017 at 10:42 PM, Henrik Juul Pedersen <hjp@xxxxxxx> wrote:
>>
>> Hi Karthik and Ben,
>>
>> I'll try and reply to you inline.
>>
>> On 21 December 2017 at 07:18, Karthik Subrahmanya <ksubrahm@xxxxxxxxxx> wrote:
>> > Hey,
>> >
>> > Can you give us the volume info output for this volume?
>>
>> # gluster volume info virt_images
>>
>> Volume Name: virt_images
>> Type: Replicate
>> Volume ID: 9f3c8273-4d9d-4af2-a4e7-4cb4a51e3594
>> Status: Started
>> Snapshot Count: 2
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: virt3:/data/virt_images/brick
>> Brick2: virt2:/data/virt_images/brick
>> Brick3: printserver:/data/virt_images/brick (arbiter)
>> Options Reconfigured:
>> features.quota-deem-statfs: on
>> features.inode-quota: on
>> features.quota: on
>> features.barrier: disable
>> features.scrub: Active
>> features.bitrot: on
>> nfs.rpc-auth-allow: on
>> server.allow-insecure: on
>> user.cifs: off
>> features.shard: off
>> cluster.shd-wait-qlength: 10000
>> cluster.locking-scheme: granular
>> cluster.data-self-heal-algorithm: full
>> cluster.server-quorum-type: server
>> cluster.quorum-type: auto
>> cluster.eager-lock: enable
>> network.remote-dio: enable
>> performance.low-prio-threads: 32
>> performance.io-cache: off
>> performance.read-ahead: off
>> performance.quick-read: off
>> nfs.disable: on
>> transport.address-family: inet
>> server.outstanding-rpc-limit: 512
>>
>> > Why are you not able to get the xattrs from arbiter brick? It is the
>> > same way as you do it on data bricks.
>>
>> Yes, I must have confused myself yesterday somehow; here it is in full
>> from all three bricks:
>>
>> Brick 1 (virt2): # getfattr -d -m .
-e hex fedora27.qcow2
>> # file: fedora27.qcow2
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.virt_images-client-1=0x000002280000000000000000
>> trusted.afr.virt_images-client-3=0x000000000000000000000000
>> trusted.bit-rot.version=0x1d000000000000005a3aa0db000c6563
>> trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
>> trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
>> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a49eb0000000000000000001
>> trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
>>
>> Brick 2 (virt3): # getfattr -d -m . -e hex fedora27.qcow2
>> # file: fedora27.qcow2
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.virt_images-client-2=0x000003ef0000000000000000
>> trusted.afr.virt_images-client-3=0x000000000000000000000000
>> trusted.bit-rot.version=0x19000000000000005a3a9f82000c382a
>> trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
>> trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
>> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a2fbe0000000000000000001
>> trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
>>
>> Brick 3 - arbiter (printserver): # getfattr -d -m .
-e hex fedora27.qcow2
>> # file: fedora27.qcow2
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.virt_images-client-1=0x000002280000000000000000
>> trusted.bit-rot.version=0x31000000000000005a39237200073206
>> trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
>> trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
>> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000000000000000000000000001
>> trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
>>
>> I was expecting trusted.afr.virt_images-client-{1,2,3} on all bricks?
>
> From AFR-V2 we do not have self-blaming attrs, so you will see a brick
> blaming other bricks only.
> For example, brick1 can blame brick2 & brick3, not itself.
>>
>> > The changelog xattrs are named trusted.afr.virt_images-client-{1,2,3}
>> > in the getxattr outputs you have provided.
>> > Did you do a remove-brick and add-brick any time? Otherwise it will be
>> > trusted.afr.virt_images-client-{0,1,2} usually.
>>
>> Yes, the bricks were moved around initially; brick 0 was re-created as
>> brick 2, and the arbiter was added later on as well.
>>
>> > To overcome this scenario you can do what Ben Turner had suggested.
>> > Select the source copy and change the xattrs manually.
>>
>> I don't mind doing that, but again, the guides assume that I have
>> trusted.afr.virt_images-client-{1,2,3} on all bricks, so I'm not sure
>> what to change to what, where.
>>
>> > I am suspecting that it has hit the arbiter becoming source for data
>> > heal bug. But to confirm that we need the xattrs on the arbiter brick also.
>> >
>> > Regards,
>> > Karthik
>> >
>> > On Thu, Dec 21, 2017 at 9:55 AM, Ben Turner <bturner@xxxxxxxxxx> wrote:
>> >>
>> >> Here is the process for resolving split brain on replica 2:
>> >>
>> >> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Recovering_from_File_Split-brain.html
>> >>
>> >> It should be pretty much the same for replica 3; you change the xattrs
>> >> with something like:
>> >>
>> >> # setfattr -n trusted.afr.vol-client-0 -v 0x000000000000000100000000 /gfs/brick-b/a
>> >>
>> >> When I try to decide which copy to use I normally run things like:
>> >>
>> >> # stat /<path to brick>/path/to/file
>> >>
>> >> Check out the access and change times of the file on the back end
>> >> bricks. I normally pick the copy with the latest access / change
>> >> times. I'll also check:
>> >>
>> >> # md5sum /<path to brick>/path/to/file
>> >>
>> >> Compare the hashes of the file on both bricks to see if the data
>> >> actually differs. If the data is the same it makes choosing the
>> >> proper replica easier.
>>
>> The files on the bricks differ, so something was changed and not
>> replicated.
>>
>> Thanks for the input. I've looked at that, but couldn't get it to fit,
>> as I don't have trusted.afr.virt_images-client-{1,2,3} on all bricks.
>
> You can choose any one of the copies as good, based on the latest ctime/mtime.
> Before doing anything, keep a backup of both copies, so that if
> something bad happens you will have the data safe.
> Now choose one copy as good (based on timestamps/size/choosing a brick as
> source), and reset the xattrs set for that on the other brick. Then do a
> lookup on that file from the mount.
> That should resolve the issue.
> Once you are done, please let us know the result.
>
> Regards,
> Karthik
>>
>> >> Any idea how you got in this situation? Did you have a loss of NW connectivity?
>> >> I see you are using server-side quorum, maybe check the logs
>> >> for any loss of quorum? I wonder if there was a loss of quorum and
>> >> there was some sort of race condition hit:
>> >>
>> >> http://docs.gluster.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#server-quorum-and-some-pitfalls
>> >>
>> >> "Unlike in client-quorum where the volume becomes read-only when
>> >> quorum is lost, loss of server-quorum in a particular node makes
>> >> glusterd kill the brick processes on that node (for the participating
>> >> volumes) making even reads impossible."
>>
>> I might have had a loss of server quorum, but I can't seem to see
>> exactly why or when from the logs:
>>
>> Times are synchronized between servers. Virt3 was rebooted for
>> service at 17:29:39. The shutdown logs show an issue with unmounting
>> the bricks, probably because glusterd was still running:
>> Dec 20 17:29:39 virt3 systemd[1]: Failed unmounting /data/virt_images.
>> Dec 20 17:29:39 virt3 systemd[1]: data-filserver.mount: Mount process exited, code=exited status=32
>> Dec 20 17:29:39 virt3 systemd[1]: Failed unmounting /data/filserver.
>> Dec 20 17:29:39 virt3 systemd[1]: Unmounted /virt_images.
>> Dec 20 17:29:39 virt3 systemd[1]: Stopped target Network is Online.
>> Dec 20 17:29:39 virt3 systemd[1]: Stopping GlusterFS, a clustered file-system server...
>> Dec 20 17:29:39 virt3 systemd[1]: Stopping Network Name Resolution...
>> Dec 20 17:29:39 virt3 systemd[1]: Stopped GlusterFS, a clustered file-system server.
>>
>> I believe it was around this time that the virtual machine (running on
>> virt2) was stopped by qemu.
>>
>> Brick 1 (virt2) only experienced loss of quorum when starting gluster
>> (glusterd.log confirms this):
>> Dec 20 17:22:03 virt2 systemd[1]: Starting GlusterFS, a clustered file-system server...
>> Dec 20 17:22:05 virt2 glusterd[739]: [2017-12-20 16:22:05.997472] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume filserver. Stopping local bricks.
>> Dec 20 17:22:05 virt2 glusterd[739]: [2017-12-20 16:22:05.997666] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume virt_images. Stopping local bricks.
>> Dec 20 17:22:06 virt2 systemd[1]: Started GlusterFS, a clustered file-system server.
>> Dec 20 17:22:11 virt2 glusterd[739]: [2017-12-20 16:22:11.387238] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume filserver. Starting local bricks.
>> Dec 20 17:22:11 virt2 glusterd[739]: [2017-12-20 16:22:11.390417] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume virt_images. Starting local bricks.
>> -- Reboot --
>> Dec 20 18:41:35 virt2 systemd[1]: Starting GlusterFS, a clustered file-system server...
>> Dec 20 18:41:41 virt2 systemd[1]: Started GlusterFS, a clustered file-system server.
>> Dec 20 18:41:43 virt2 glusterd[748]: [2017-12-20 17:41:43.387633] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume filserver. Starting local bricks.
>> Dec 20 18:41:43 virt2 glusterd[748]: [2017-12-20 17:41:43.391080] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume virt_images. Starting local bricks.
>>
>> Brick 2 (virt3) shows a network outage on the 19th, but everything
>> worked fine afterwards:
>> Dec 19 13:11:34 virt3 glusterd[10058]: [2017-12-19 12:11:34.382207] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume filserver. Starting local bricks.
>> Dec 19 13:11:34 virt3 glusterd[10058]: [2017-12-19 12:11:34.387324] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume virt_images. Starting local bricks.
>> Dec 20 17:29:39 virt3 systemd[1]: Stopping GlusterFS, a clustered file-system server...
>> Dec 20 17:29:39 virt3 systemd[1]: Stopped GlusterFS, a clustered file-system server.
>> -- Reboot --
>> Dec 20 17:30:21 virt3 systemd[1]: Starting GlusterFS, a clustered file-system server...
>> Dec 20 17:30:22 virt3 glusterd[394]: [2017-12-20 16:30:22.826828] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume filserver. Stopping local bricks.
>> Dec 20 17:30:22 virt3 glusterd[394]: [2017-12-20 16:30:22.827188] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume virt_images. Stopping local bricks.
>> Dec 20 17:30:23 virt3 systemd[1]: Started GlusterFS, a clustered file-system server.
>> Dec 20 17:30:29 virt3 glusterd[394]: [2017-12-20 16:30:29.488000] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume filserver. Starting local bricks.
>> Dec 20 17:30:29 virt3 glusterd[394]: [2017-12-20 16:30:29.491446] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume virt_images. Starting local bricks.
>> Dec 20 18:31:06 virt3 systemd[1]: Stopping GlusterFS, a clustered file-system server...
>> Dec 20 18:31:06 virt3 systemd[1]: Stopped GlusterFS, a clustered file-system server.
>> -- Reboot --
>> Dec 20 18:31:46 virt3 systemd[1]: Starting GlusterFS, a clustered file-system server...
>> Dec 20 18:31:46 virt3 glusterd[386]: [2017-12-20 17:31:46.958818] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume filserver. Stopping local bricks.
>> Dec 20 18:31:46 virt3 glusterd[386]: [2017-12-20 17:31:46.959168] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume virt_images. Stopping local bricks.
>> Dec 20 18:31:47 virt3 systemd[1]: Started GlusterFS, a clustered file-system server.
>> Dec 20 18:33:10 virt3 glusterd[386]: [2017-12-20 17:33:10.156180] C [MSGID: 106001] [glusterd-volume-ops.c:1534:glusterd_op_stage_start_volume] 0-management: Server quorum not met. Rejecting operation.
>> Dec 20 18:35:58 virt3 glusterd[386]: [2017-12-20 17:35:58.440395] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume filserver. Starting local bricks.
>> Dec 20 18:35:58 virt3 glusterd[386]: [2017-12-20 17:35:58.446203] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume virt_images. Starting local bricks.
>>
>> Brick 3 - arbiter (printserver) shows no loss of quorum at that time
>> (again, glusterd.log confirms):
>> Dec 19 15:33:24 printserver systemd[1]: Starting GlusterFS, a clustered file-system server...
>> Dec 19 15:33:26 printserver glusterd[306]: [2017-12-19 14:33:26.432369] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume filserver. Stopping local bricks.
>> Dec 19 15:33:26 printserver glusterd[306]: [2017-12-19 14:33:26.432606] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume virt_images. Stopping local bricks.
>> Dec 19 15:33:26 printserver systemd[1]: Started GlusterFS, a clustered file-system server.
>> Dec 19 15:34:18 printserver glusterd[306]: [2017-12-19 14:34:18.158756] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume filserver. Starting local bricks.
>> Dec 19 15:34:18 printserver glusterd[306]: [2017-12-19 14:34:18.162242] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume virt_images. Starting local bricks.
>> Dec 20 18:28:52 printserver systemd[1]: Stopping GlusterFS, a clustered file-system server...
>> Dec 20 18:28:52 printserver systemd[1]: Stopped GlusterFS, a clustered file-system server.
>> -- Reboot --
>> Dec 20 18:30:40 printserver systemd[1]: Starting GlusterFS, a clustered file-system server...
>> Dec 20 18:30:42 printserver glusterd[278]: [2017-12-20 17:30:42.441675] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume filserver. Stopping local bricks.
>> Dec 20 18:30:42 printserver glusterd[278]: [2017-12-20 17:30:42.441929] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume virt_images. Stopping local bricks.
>> Dec 20 18:30:42 printserver systemd[1]: Started GlusterFS, a clustered file-system server.
>> Dec 20 18:33:49 printserver glusterd[278]: [2017-12-20 17:33:49.005534] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume filserver. Starting local bricks.
>> Dec 20 18:33:49 printserver glusterd[278]: [2017-12-20 17:33:49.008010] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume virt_images. Starting local bricks.
>>
>> >> I wonder if the killing of brick processes could have led to some
>> >> sort of race condition where writes were serviced on one brick / the
>> >> arbiter and not the other?
>> >>
>> >> If you can find a reproducer for this please open a BZ with it, I
>> >> have been seeing something similar (I think) but I haven't been able
>> >> to run the issue down yet.
>> >>
>> >> -b
>>
>> I'm not sure if I can replicate this; a lot has been going on in my
>> setup the past few days (trying to tune some horrible small-file and
>> file creation/deletion performance).
>>
>> Thanks for looking into this with me.
>>
>> Best regards,
>> Henrik Juul Pedersen
>> LIAB ApS

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users