Sorry, I had to manually sync due to imminent server upgrades. 50 min. after
the initial sync I was asked to bring the servers into a safe state for an
upgrade, and did a manual "touch-on-server13-client-mountpoint" which
triggered an immediate self-heal on the rest of the files. All files were in
sync across all four servers after this action. Will run this command next
time!!

Best, Martin

On 29.04.2011 19:30, Pranith Kumar Karampuri wrote:
> hi Martin,
>       Could you please send the output of -m "trusted*" instead of
> "trusted.afr" for the remaining 24 files from both the servers. I would
> like to see the gfids of these files on both the machines.
>
> Pranith.
> ----- Original Message -----
> From: "Martin Schenker" <martin.schenker at profitbricks.com>
> To: gluster-users at gluster.org
> Sent: Friday, April 29, 2011 8:39:46 PM
> Subject: Server outage, file sync/self-heal doesn't sync ALL files?!
>
> Hi all!
>
> We have another incident over here.
>
> One of the servers (pserver12) in a pair (12 & 13) has been rebooted.
> pserver13 showed 63 files not in sync after the outage for 2h.
>
> Both servers are clients as well.
>
> Starting pserver12 brought up the self-heal mechanism, but only 39 files
> were triggered within the first 10 min. Now the system seems dormant and
> 24 files are left hanging.
>
> On the other three servers no inconsistencies are seen.
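[Editor's note: the manual trigger Martin describes amounts to touching or stat()ing the affected files through a GlusterFS client mountpoint, which makes the replicate (AFR) translator re-check and heal them. A minimal sketch of walking a whole mount this way — the mountpoint path in the example is an assumption, not taken from the thread:]

```shell
#!/bin/sh
# Force GlusterFS self-heal by stat()ing every file through a client mount.
# Accessing a file via the mountpoint makes the replicate (AFR) translator
# compare the replicas and heal any that are out of sync.
trigger_selfheal() {
    find "$1" -print0 | xargs -0 stat >/dev/null
}

# Example (hypothetical mountpoint -- substitute your own):
# trigger_selfheal /mnt/storage0
```

[On GlusterFS versions of this era, before a background self-heal daemon existed, a full walk like this was the usual way to heal everything rather than only the files a client happens to touch.]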
>
> tail of client log file:
>
> [2011-04-29 14:48:23.820022] I [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done]
> 0-storage0-replicate-2: diff self-heal on /pserver13-17: 1960 blocks of 22736 were different (8.62%)
> [2011-04-29 14:48:23.887651] E [afr-common.c:110:afr_set_split_brain]
> 0-storage0-replicate-2: invalid argument: inode
> [2011-04-29 14:48:23.887740] I [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk]
> 0-storage0-replicate-2: background data self-heal completed on /pserver13-17
> [2011-04-29 14:48:24.272220] I [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done]
> 0-storage0-replicate-2: diff self-heal on /pserver13-19: 1960 blocks of 22744 were different (8.62%)
> [2011-04-29 14:48:24.341868] E [afr-common.c:110:afr_set_split_brain]
> 0-storage0-replicate-2: invalid argument: inode
> [2011-04-29 14:48:24.341959] I [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk]
> 0-storage0-replicate-2: background data self-heal completed on /pserver13-19
> [2011-04-29 14:48:24.758131] I [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done]
> 0-storage0-replicate-2: diff self-heal on /pserver13-23: 1952 blocks of 22752 were different (8.58%)
> [2011-04-29 14:48:24.766054] E [afr-common.c:110:afr_set_split_brain]
> 0-storage0-replicate-2: invalid argument: inode
> [2011-04-29 14:48:24.766137] I [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk]
> 0-storage0-replicate-2: background data self-heal completed on /pserver13-23
> [2011-04-29 14:48:24.884613] I [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done]
> 0-storage0-replicate-2: diff self-heal on /pserver13-10: 1952 blocks of 22760 were different (8.58%)
> [2011-04-29 14:48:24.895631] E [afr-common.c:110:afr_set_split_brain]
> 0-storage0-replicate-2: invalid argument: inode
> [2011-04-29 14:48:24.895721] I [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk]
> 0-storage0-replicate-2: background data self-heal completed on
> /pserver13-10
>
> 0 root at pserver13:/var/log/glusterfs # date
> Fri Apr 29 15:08:18 UTC 2011
>
>
> Search for mismatch:
>
> 0 root at pserver13:~ # getfattr -R -d -e hex -m "trusted.afr." /mnt/gluster/brick?/storage | grep -v 0x000000000000000000000000 | grep -B1 -A1 trusted | grep -c file
> getfattr: Removing leading '/' from absolute path names
> *24*
>
>
> 0 root at pserver13:~ # getfattr -R -d -e hex -m "trusted.afr." /mnt/gluster/brick?/storage | grep -v 0x000000000000000000000000 | grep -B1 trusted
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-33
> trusted.afr.storage0-client-4=0x270000010000000000000000
> --
> # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-26
> trusted.afr.storage0-client-4=0x270000010000000000000000
> --
> # file: mnt/gluster/brick0/storage/images/1959/cd55c5f3-9aa1-bfd9-99a0-01c13a7d8559/hdd-images
> trusted.afr.storage0-client-4=0x000000000000001600000001
> --
> # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-24
> trusted.afr.storage0-client-4=0x270000010000000000000000
> --
> # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-8
> trusted.afr.storage0-client-4=0x270000010000000000000000
> --
> # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-21
> trusted.afr.storage0-client-4=0x270000010000000000000000
> --
> # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-22
> trusted.afr.storage0-client-4=0x270000010000000000000000
> --
> # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-30
> trusted.afr.storage0-client-4=0x270000010000000000000000
> --
> # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-20
> trusted.afr.storage0-client-4=0x270000010000000000000000
> --
> # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-9
> trusted.afr.storage0-client-4=0x270000010000000000000000
> --
> # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-38
> trusted.afr.storage0-client-4=0x270000010000000000000000
> --
> # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-18
> trusted.afr.storage0-client-6=0x270000010000000000000000
> --
> # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-2
> trusted.afr.storage0-client-6=0x270000010000000000000000
> --
> # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-23
> trusted.afr.storage0-client-6=0x270000010000000000000000
> --
> # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-4
> trusted.afr.storage0-client-6=0x270000010000000000000000
> --
> # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-3
> trusted.afr.storage0-client-6=0x270000010000000000000000
> --
> # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-34
> trusted.afr.storage0-client-6=0x270000010000000000000000
> --
> # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-37
> trusted.afr.storage0-client-6=0x270000010000000000000000
> --
> # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-12
> trusted.afr.storage0-client-6=0x270000010000000000000000
> --
> # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-27
> trusted.afr.storage0-client-6=0x270000010000000000000000
> --
> # file: mnt/gluster/brick1/storage/images/1831/9a039a81-60fe-5fa3-f562-8f6d3828382b/hdd-images/13169
> trusted.afr.storage0-client-6=0x100000020000000000000000
> --
> # file: mnt/gluster/brick1/storage/images/1959/cd55c5f3-9aa1-bfd9-99a0-01c13a7d8559/hdd-images
> trusted.afr.storage0-client-6=0x000000000000001600000002
> --
> # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-25
> trusted.afr.storage0-client-6=0x270000010000000000000000
> --
> # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-7
> trusted.afr.storage0-client-6=0x270000010000000000000000
>
>
> I could trigger it manually, but why isn't the sync/self-heal working on
> all the files shown as inconsistent? Or am I assuming something wrongly
> here?!?
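[Editor's note: the output Pranith requests above can be produced per file with a `-m "trusted*"` match, which dumps the trusted.afr.* pending counters shown in the listing together with trusted.gfid, so the gfids can be compared across servers. A minimal sketch, run as root on each server; the example path is taken from the listing above:]

```shell
#!/bin/sh
# Dump every trusted.* extended attribute of a brick path in hex.
# This shows both the trusted.afr.* pending-heal counters and the
# trusted.gfid, which should match for the same file on all replicas.
dump_trusted_xattrs() {
    getfattr -d -e hex -m "trusted*" "$1"
}

# Example path from the listing above; run on both servers of the pair:
# dump_trusted_xattrs /mnt/gluster/brick0/storage/de-dc1-c1-pserver5-33
```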
>
> Best, Martin
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>