Re: self heal problem

While searching for possible causes of the incorrect self-heal, I ran into this
code:

/xlators/cluster/afr/src/afr-self-heal-common.c:

        if (type == AFR_SELF_HEAL_DATA) {
                size_differs = afr_sh_mark_if_size_differs (sh, child_count);
        }

        if (afr_sh_all_nodes_innocent (characters, child_count)) {
                if (size_differs) {
                        nsources = afr_sh_mark_biggest_as_source (sh,
                                                                  child_count);
                }

        } else if (afr_sh_wise_nodes_exist (characters, child_count)) {
                afr_sh_compute_wisdom (pending_matrix, characters,
                                       child_count);

                if (afr_sh_wise_nodes_conflict (characters, child_count)) {
                        /* split-brain */
 
                        nsources = -1;
                        goto out;

                } else {
                        nsources = afr_sh_mark_wisest_as_sources (sources,
                                                                  characters,
                                                                  child_count);
                }
        } else {
                nsources = afr_sh_mark_biggest_fool_as_source (sh, characters,
                                                               child_count);
        }

afr_sh_mark_biggest_as_source seems to do exactly what its name says: it looks
at the file size. Can someone who knows this code better please explain what
kind of healing case can depend on the file size? I really cannot see how this
can work. The latest copy of a file can be either bigger or smaller than the
older one, so the only valid criterion is the modification date, never the
size. Or is there some general misunderstanding in my thinking and my reading
of the code?
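
To make the concern concrete, here is a minimal sketch of the two selection
rules. It is not the AFR implementation: struct replica, example_replicas,
all_innocent, pick_biggest and pick_newest are hypothetical names made up for
illustration. It assumes, as I read the excerpt above, that all-zero
trusted.afr.* counters put every replica in the "innocent" branch. The sizes
and modify times are the ones from the stat output quoted below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPEERS 2

/* Hypothetical, simplified view of one replica; not the real AFR structs. */
struct replica {
        uint32_t pending[NPEERS]; /* counters from trusted.afr.remote1/2 */
        long     size;            /* st_size in bytes */
        long     mtime;           /* seconds since midnight, 2010-03-23 +0100 */
};

/* Values taken from the getfattr/stat output quoted below. */
const struct replica example_replicas[] = {
        { {0, 0}, 4509,  1945 },  /* server1: Modify 00:32:25 */
        { {0, 0}, 4024, 52343 },  /* server2: Modify 14:32:23 */
};

/* Return 1 if no replica records a pending operation against any peer. */
int all_innocent (const struct replica *r, size_t n)
{
        for (size_t i = 0; i < n; i++)
                for (size_t j = 0; j < NPEERS; j++)
                        if (r[i].pending[j] != 0)
                                return 0;
        return 1;
}

/* Size-based choice: index of the largest copy (first one wins a tie). */
size_t pick_biggest (const struct replica *r, size_t n)
{
        size_t best = 0;

        assert (n > 0);
        for (size_t i = 1; i < n; i++)
                if (r[i].size > r[best].size)
                        best = i;
        return best;
}

/* mtime-based choice: index of the most recently modified copy. */
size_t pick_newest (const struct replica *r, size_t n)
{
        size_t best = 0;

        assert (n > 0);
        for (size_t i = 1; i < n; i++)
                if (r[i].mtime > r[best].mtime)
                        best = i;
        return best;
}
```

With these values all_innocent() returns 1, pick_biggest() returns 0 (server1,
the older copy) and pick_newest() returns 1 (server2). Note that an mtime-based
choice is only as good as the clock agreement between the servers.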

--
Regards,
Stephan





On Tue, 23 Mar 2010 15:03:17 +0100
Stephan von Krawczynski <skraw@xxxxxxxxxx> wrote:

> Let me show you some further information for one file that was falsely self-healed:
> 
> server1:
> 
> # getfattr -d -m '.*' -e hex <filename>
> getfattr: Removing leading '/' from absolute path names
> # file: <filename>
> trusted.afr.remote1=0x000000000000000000000000
> trusted.afr.remote2=0x000000000000000000000000
> trusted.posix.gen=0x4b9bb33c00001be6
> 
> # stat <filename>
>   File: <filename>
>   Size: 4509            Blocks: 16         IO Block: 4096   regular file
> Device: 804h/2052d      Inode: 16560280    Links: 1
> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2010-03-23 11:10:36.000000000 +0100
> Modify: 2010-03-23 00:32:25.000000000 +0100
> Change: 2010-03-23 12:36:40.000000000 +0100
> 
> 
> server2:
> 
> # getfattr -d -m '.*' -e hex <filename>
> getfattr: Removing leading '/' from absolute path names
> # file: <filename>
> trusted.afr.remote1=0x000000000000000000000000
> trusted.afr.remote2=0x000000000000000000000000
> trusted.posix.gen=0x4b9bb2f600001be6
> 
> # stat <filename>
>   File: <filename>
>   Size: 4024            Blocks: 8          IO Block: 4096   regular file
> Device: 804h/2052d      Inode: 42762291    Links: 1
> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2010-03-23 11:10:36.000000000 +0100
> Modify: 2010-03-23 14:32:23.000000000 +0100
> Change: 2010-03-23 14:32:23.000000000 +0100
> 
> 
> As you can see, the latest file version is on server2 (by modify date) and is _smaller_ in size.
> 
> Now on client 2, an ls shows interesting values:
> 
> # ls -l <filename>
> -rw-r--r--  1 root root 4509 Mar 23 14:37 <filename>
> 
> As you can see here, the file date appears to have advanced, and the size clearly shows that self-heal went wrong.
> 
> Consequently the server2 copy now looks like:
> 
> # stat <filename>
>   File: <filename>
>   Size: 4509            Blocks: 16         IO Block: 4096   regular file
> Device: 804h/2052d      Inode: 42762291    Links: 1
> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2010-03-23 11:10:36.000000000 +0100
> Modify: 2010-03-23 00:32:25.000000000 +0100
> Change: 2010-03-23 14:41:13.000000000 +0100
> 
> The modification date went back and the file size increased, so the older file version was chosen to overwrite the newer one.
> 
> -- 
> Regards,
> Stephan
> 
> 
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
> 




