Self-heal's behavior: problem on "replace" -- it leaves garbage.

keith at nttpc.co.jp (Keisuke TAKAHASHI) · Tue, 16 Dec 2008 15:12:21 +0900

Hi, Mr. Freedman.
Thanks for replying.

>At 09:26 PM 12/15/2008, Keisuke TAKAHASHI wrote:
>>Hi.
>>I'm using GlusterFS v1.3.12 (glusterfs-1.3.12.tar.gz) via FUSE 
>>(fuse-2.7.3glfs10.tar.gz) on CentOS 5.2 x86_64 (Linux kernel 
>>2.6.18-92.el5) now.
>>The nodes are HP ProLiant DL360 G5 (as the GlusterFS client) and DL180 
>>G5 (as the GlusterFS servers).
>>All connections are TCP/IP over Gigabit Ethernet.
>>
>>I tested self-heal and found a problem with "replace": if a file's 
>>contents shrink on the other nodes while one node is down, self-heal 
>>leaves garbage in the recovered node's copy.
>>I would appreciate any ideas on how to resolve or avoid it.
>>
>>First, my GlusterFS configuration is as follows:
>>   - 1 GlusterFS Client (client) and 3 GlusterFS Servers 
>> (server1,server2,server3)
>>   - using cluster/unify to add GlusterFS Servers
>>   - using cluster/afr between the 3 GlusterFS Servers underneath the 
>> cluster/unify
>>   - namespace volume is on the GlusterFS Client
>>
>>So self-heal operates among server1, server2 and server3.
>>
>>My fault scenario and self-heal procedure are as follows:
>>   (1) All nodes are active and the mount point on the client is 
>> /mnt/glusterfs. The operating user on the client is root.
>>   (2) Root creates fileA and fileBC in a local directory on the client 
>> (not on the FUSE mount point)
>>       - fileA contains the string "aaa"
>>       - fileBC contains the string "bbb\nccc" (\n is a line break.)
>>   (3) Root copies fileBC to /mnt/glusterfs.
>>   (4) Bring server2 down. (# ifdown eth0)
>>   (5) Root redirects fileA into fileBC (# cat fileA > fileBC)
>>   (6) Bring server2 up. (# ifup eth0)
>>   (7) Now the state of fileBC on the servers is as follows:
>>       - server1: fileBC contains "aaa", trusted.glusterfs.version is 3
>>       - server2: fileBC contains "bbb\nccc", trusted.glusterfs.version is 2
>>       - server3: fileBC contains "aaa", trusted.glusterfs.version is 3
>>   (8) Execute self-heal. (# find /mnt/glusterfs -type f -print0 | 
>> xargs -0 head -c1 >/dev/null)
>
>On which server did you run this? It seems to matter for some reason, 
>from what I can tell. If it's run from the server that has the new 
>version, all is well, but otherwise afr sometimes doesn't work (although 
>this is likely fixed in the newer versions; I haven't specifically tested).
>

I ran it on the client.
As a result, fileBC on server2 was self-healed, as shown in step (9).
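
For reference, here is a minimal Python sketch of what the command in step (8) does: it walks the mount point and reads the first byte of every regular file, and it is that open-and-read which I understand prompts self-heal. The function name touch_all_files is hypothetical, and the sketch knows nothing about GlusterFS itself.

```python
import os


def touch_all_files(mount_point):
    """Read the first byte of every regular file under mount_point.

    Rough equivalent of:
        find <mount_point> -type f -print0 | xargs -0 head -c1 >/dev/null
    Returns the list of paths that were read.
    """
    touched = []
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # "find -type f" matches regular files only, not symlinks.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                f.read(1)  # the data itself is discarded
            touched.append(path)
    return touched


if __name__ == "__main__":
    # /mnt/glusterfs is the mount point used in the procedure above.
    for p in touch_all_files("/mnt/glusterfs"):
        print(p)
```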

>>   (9) Then the state of fileBC on the servers is as follows:
>>       - server1: fileBC contains "aaa", trusted.glusterfs.version is 3
>>       - server2: fileBC contains "aaa\nccc", trusted.glusterfs.version is 3
>>       - server3: fileBC contains "aaa", trusted.glusterfs.version is 3
>>
>>So fileBC on server2 was overwritten from the others, but the 
>>"replace" seems to work sequentially from the start of the file 
>>(the original "bbb" was replaced by "aaa", but "\nccc" was left).
>>In this case, the "\nccc" remaining in fileBC on server2 looks like 
>>garbage.
>>I would like self-heal to replace old file(s) with new file(s) completely.
>
>You actually wouldn't want this. Imagine if the file were a 30GB 
>log file and all you really care about are the new bits. What's 
>better is if it does an rsync-like update of the file, which it seems 
>to be doing, but then forgetting to mark the end-of-file position.
>

I understand that.
But the data types, sizes and usage intended for my GlusterFS are not settled yet.
So I have to anticipate cases like this one.
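
To make the case concrete, here is a minimal Python sketch of what I suspect happens on server2. It only works on local temporary files and is not GlusterFS code; the names overwrite_in_place and demo are hypothetical, and I assume both files end with a line break. Writing the new contents over the old file without setting the new end of file reproduces the garbage, and truncating to the new length removes it.

```python
import os
import tempfile


def overwrite_in_place(fresh_path, stale_path, truncate):
    """Copy fresh_path over stale_path without recreating the stale file."""
    with open(fresh_path, "rb") as src, open(stale_path, "r+b") as dst:
        data = src.read()
        dst.write(data)  # overwrites only the first len(data) bytes
        if truncate:
            dst.truncate(len(data))  # set the end of file to the new length


def demo(truncate):
    """Replay step (9) on two local files and return the healed contents."""
    with tempfile.TemporaryDirectory() as tmp:
        fresh = os.path.join(tmp, "fileBC.server1")  # up-to-date copy
        stale = os.path.join(tmp, "fileBC.server2")  # copy from the failed node
        with open(fresh, "wb") as f:
            f.write(b"aaa\n")
        with open(stale, "wb") as f:
            f.write(b"bbb\nccc\n")
        overwrite_in_place(fresh, stale, truncate)
        with open(stale, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(demo(truncate=False))  # b'aaa\nccc\n'  (the result I observed)
    print(demo(truncate=True))   # b'aaa\n'       (the result I expected)
```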

>>Can self-heal do it? Or is there any good idea to resolve it?
>
>I'd run your test with 1.4rc2 and see if you have the same problem.
>

Thanks a lot.
I will try that as well.

Regards,
Keisuke Takahashi

_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
Keisuke TAKAHASHI / NTTPC Communications,Inc.
   E-Mail: keith at NOSPAM.nttpc.co.jp
   http://www.nttpc.co.jp/english/index.html
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/