Hi Stephan,

GlusterFS keeps track of whether an operation happened on one copy but not on the replica, in case a replica was not accessible. The attributes remote1 and remote2 show that there is no pending operation on the other replica.

From the attributes you have shown, it seems that you have gone to the backend directly, bypassed GlusterFS, and handcrafted such a situation. The way the code is written, we do not think that we can reach the state you have shown in your example.

The remote1 and remote2 attributes show all zeroes, which means that there were no operations pending on any server. If this was not handcrafted, then please give the detailed test case that can lead to this situation based on just file size.

If this situation was handcrafted, then it would be akin to overwriting the section of a disk which carries the metadata of a filesystem and then claiming that the FS is getting corrupted.

Please look at the code around the part you pointed to in the other mail; you will see the other, higher-order checks that are made.

Regards,
Tejas.
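For readers following the thread, here is a minimal sketch of how such a trusted.afr value could be read. It assumes the 12-byte value is three big-endian 32-bit counters for pending data, metadata, and entry operations; that layout is an assumption here, not something stated in this mail, and the function names are hypothetical helpers, not part of GlusterFS.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a trusted.afr.* hex string into three pending-operation counters.

    Assumption (not from this thread): the value is 12 bytes holding three
    big-endian 32-bit counters, in the order data, metadata, entry.
    """
    if hex_value.lower().startswith("0x"):
        hex_value = hex_value[2:]
    raw = bytes.fromhex(hex_value)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def has_pending_operations(hex_value):
    """Return True if any counter in the changelog value is non-zero."""
    return any(decode_afr_changelog(hex_value).values())


if __name__ == "__main__":
    # Value reported for both remote1 and remote2 on both servers.
    reported = "0x000000000000000000000000"
    print(decode_afr_changelog(reported))
    print(has_pending_operations(reported))  # False: nothing marked pending
```

Under that assumption, an all-zero value decodes to zero pending operations of every kind, which is why the attributes alone give no hint about which copy should be the source.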
----- Original Message -----
From: "Stephan von Krawczynski" <skraw@xxxxxxxxxx>
To: gluster-devel@xxxxxxxxxx
Sent: Tuesday, March 23, 2010 7:33:17 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
Subject: Re: self heal problem

Let me show you this further information for one file falsely self-healed:

server1:

# getfattr -d -m '.*' -e hex <filename>
getfattr: Removing leading '/' from absolute path names
# file: <filename>
trusted.afr.remote1=0x000000000000000000000000
trusted.afr.remote2=0x000000000000000000000000
trusted.posix.gen=0x4b9bb33c00001be6

# stat <filename>
  File: <filename>
  Size: 4509   Blocks: 16   IO Block: 4096   reguläre Datei
Device: 804h/2052d   Inode: 16560280   Links: 1
Access: (0644/-rw-r--r--)   Uid: ( 0/ root)   Gid: ( 0/ root)
Access: 2010-03-23 11:10:36.000000000 +0100
Modify: 2010-03-23 00:32:25.000000000 +0100
Change: 2010-03-23 12:36:40.000000000 +0100

server2:

# getfattr -d -m '.*' -e hex <filename>
getfattr: Removing leading '/' from absolute path names
# file: <filename>
trusted.afr.remote1=0x000000000000000000000000
trusted.afr.remote2=0x000000000000000000000000
trusted.posix.gen=0x4b9bb2f600001be6

# stat <filename>
  File: <filename>
  Size: 4024   Blocks: 8   IO Block: 4096   reguläre Datei
Device: 804h/2052d   Inode: 42762291   Links: 1
Access: (0644/-rw-r--r--)   Uid: ( 0/ root)   Gid: ( 0/ root)
Access: 2010-03-23 11:10:36.000000000 +0100
Modify: 2010-03-23 14:32:23.000000000 +0100
Change: 2010-03-23 14:32:23.000000000 +0100

As you can see, the latest file version is on server2 (modify date) and is _smaller_ in size.

Now on client 2, an ls shows interesting values:

# ls -l <filename>
-rw-r--r-- 1 root root 4509 Mar 23 14:37 <filename>

As you can see here, the file date has moved forward, and the size clearly shows that self-heal went wrong.
Consequently the server2 copy now looks like:

# stat <filename>
  File: <filename>
  Size: 4509   Blocks: 16   IO Block: 4096   reguläre Datei
Device: 804h/2052d   Inode: 42762291   Links: 1
Access: (0644/-rw-r--r--)   Uid: ( 0/ root)   Gid: ( 0/ root)
Access: 2010-03-23 11:10:36.000000000 +0100
Modify: 2010-03-23 00:32:25.000000000 +0100
Change: 2010-03-23 14:41:13.000000000 +0100

The modification date went back and the file size increased, so the older file version was chosen to overwrite the newer one.

--
Regards,
Stephan

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel