While searching for possible causes of the wrong self-heal I ran into this
code in /xlators/cluster/afr/src/afr-self-heal-common.c:

        if (type == AFR_SELF_HEAL_DATA) {
                size_differs = afr_sh_mark_if_size_differs (sh, child_count);
        }

        if (afr_sh_all_nodes_innocent (characters, child_count)) {
                if (size_differs) {
                        nsources = afr_sh_mark_biggest_as_source (sh,
                                                                  child_count);
                }
        } else if (afr_sh_wise_nodes_exist (characters, child_count)) {
                afr_sh_compute_wisdom (pending_matrix, characters, child_count);

                if (afr_sh_wise_nodes_conflict (characters, child_count)) {
                        /* split-brain */
                        nsources = -1;
                        goto out;
                } else {
                        nsources = afr_sh_mark_wisest_as_sources (sources,
                                                                  characters,
                                                                  child_count);
                }
        } else {
                nsources = afr_sh_mark_biggest_fool_as_source (sh, characters,
                                                               child_count);
        }

afr_sh_mark_biggest_as_source seems to do exactly what its name says: it
looks at the file size. Can someone with more insight into this code please
explain which healing case can depend on the file size? I really cannot see
how this can work. The latest copy of a file can be either bigger or smaller
than an older copy, so the only valid criterion for choosing is the
modification time, and never the size.
Is there some general misunderstanding in my thinking or in my reading of
the code?
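To make the concern concrete, here is a minimal sketch (not the GlusterFS
code) that contrasts the two selection rules. The struct copy_info and the
helpers pick_biggest and pick_newest are hypothetical names made up for this
illustration; they only model "largest size wins" versus "latest modify time
wins" and say nothing about how AFR stores its state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one replica's copy of a file. */
struct copy_info {
        const char *server;
        long        size;   /* bytes, as reported by stat */
        long        mtime;  /* seconds since some common reference point */
};

/* Size-based rule: return the index of the largest copy. */
static int pick_biggest (const struct copy_info *c, int n)
{
        int best = 0;

        assert (c != NULL && n > 0);
        for (int i = 1; i < n; i++) {
                if (c[i].size > c[best].size)
                        best = i;
        }
        return best;
}

/* Time-based rule: return the index of the most recently modified copy. */
static int pick_newest (const struct copy_info *c, int n)
{
        int best = 0;

        assert (c != NULL && n > 0);
        for (int i = 1; i < n; i++) {
                if (c[i].mtime > c[best].mtime)
                        best = i;
        }
        return best;
}
```

Fed with the two copies from the stat output quoted below (server1: 4509
bytes, modified 00:32:25; server2: 4024 bytes, modified 14:32:23), the size
rule picks server1 and the time rule picks server2, which is exactly the
disagreement I am asking about.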
-- 
Regards,
Stephan


On Tue, 23 Mar 2010 15:03:17 +0100
Stephan von Krawczynski <skraw@xxxxxxxxxx> wrote:

> Let me show you this further information for one file falsely self-healed:
>
> server1:
>
> # getfattr -d -m '.*' -e hex <filename>
> getfattr: Removing leading '/' from absolute path names
> # file: <filename>
> trusted.afr.remote1=0x000000000000000000000000
> trusted.afr.remote2=0x000000000000000000000000
> trusted.posix.gen=0x4b9bb33c00001be6
>
> # stat <filename>
>   File: <filename>
>   Size: 4509       Blocks: 16         IO Block: 4096   reguläre Datei
> Device: 804h/2052d Inode: 16560280    Links: 1
> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2010-03-23 11:10:36.000000000 +0100
> Modify: 2010-03-23 00:32:25.000000000 +0100
> Change: 2010-03-23 12:36:40.000000000 +0100
>
>
> server2:
>
> # getfattr -d -m '.*' -e hex <filename>
> getfattr: Removing leading '/' from absolute path names
> # file: <filename>
> trusted.afr.remote1=0x000000000000000000000000
> trusted.afr.remote2=0x000000000000000000000000
> trusted.posix.gen=0x4b9bb2f600001be6
>
> # stat <filename>
>   File: <filename>
>   Size: 4024       Blocks: 8          IO Block: 4096   reguläre Datei
> Device: 804h/2052d Inode: 42762291    Links: 1
> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2010-03-23 11:10:36.000000000 +0100
> Modify: 2010-03-23 14:32:23.000000000 +0100
> Change: 2010-03-23 14:32:23.000000000 +0100
>
>
> As you can see, the latest file version is on server2 (modify date) and is
> _smaller_ in size.
>
> Now on client 2 an ls shows interesting values:
>
> # ls -l <filename>
> -rw-r--r-- 1 root root 4509 Mar 23 14:37 <filename>
>
> As you can see here, the file date appears to have moved forward, and the
> size clearly shows that self-heal went wrong.
>
> Consequently the server2 copy now looks like:
>
> # stat <filename>
>   File: <filename>
>   Size: 4509       Blocks: 16         IO Block: 4096   reguläre Datei
> Device: 804h/2052d Inode: 42762291    Links: 1
> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2010-03-23 11:10:36.000000000 +0100
> Modify: 2010-03-23 00:32:25.000000000 +0100
> Change: 2010-03-23 14:41:13.000000000 +0100
>
> The modification date went back and the file size increased, so the older
> file version was chosen to overwrite the newer one.
>
> --
> Regards,
> Stephan
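
A note on reading the getfattr output above: each trusted.afr.* value is 12
bytes long. As far as I understand it, these are three 32-bit big-endian
pending counters (data, metadata, entry); please treat that layout as an
assumption, not as confirmed by the output itself. The sketch below uses
hypothetical helpers (decode_pending, no_pending) to parse such a hex string.
An all-zero value, as on both servers here, would mean that neither copy
accuses the other, which would explain why only the size comparison is left
to decide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: three big-endian 32-bit counters. */
struct pending {
        uint32_t data;
        uint32_t metadata;
        uint32_t entry;
};

/* Convert one hex digit to its value, or return -1 if it is not hex. */
static int hex_val (char c)
{
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
}

/* Hypothetical helper: parse "0x" followed by exactly 24 hex digits.
 * Returns 0 on success, -1 on malformed input. */
static int decode_pending (const char *hex, struct pending *out)
{
        uint32_t v[3] = { 0, 0, 0 };

        assert (hex != NULL && out != NULL);
        if (strncmp (hex, "0x", 2) != 0 || strlen (hex) != 2 + 24)
                return -1;

        for (int i = 0; i < 24; i++) {
                int d = hex_val (hex[2 + i]);
                if (d < 0)
                        return -1;
                /* Digits arrive most significant first (big-endian text). */
                v[i / 8] = (v[i / 8] << 4) | (uint32_t) d;
        }

        out->data     = v[0];
        out->metadata = v[1];
        out->entry    = v[2];
        return 0;
}

/* True if this changelog records no pending operations at all. */
static int no_pending (const struct pending *p)
{
        return p->data == 0 && p->metadata == 0 && p->entry == 0;
}
```

Decoding 0x000000000000000000000000 with this sketch yields three zero
counters, so no_pending is true for every trusted.afr.* value shown above.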