Ok, a little more testing .. wait, it gets better!

I now have a x10 stripe.

10 stripes on node a -
                        10 AFR's        -  1 x10 Stripe
10 stripes on node b -  (for self heal)    (to divide the heal chunk size)

Sample;

a. create 500M file
b. take down one glusterfsd process
c. append 2M to file
d. bring glusterfsd back up
e. head -c1 on file

Problem #1; On a self-heal, it heals every stripe, regardless of the fact that I only appended 2M.

Problem #2; Self-heal ignores the fact that the file is sparse and copies the entire stripe .. so for a 500M file, the healing process actually copies 5GB!!

Help! Bug!

Here's a client config summary; (server config is fairly obvious)

...
volume stripes-stripe
  type cluster/stripe
  subvolumes afr-1 afr-2 afr-3 afr-4 afr-5 afr-6 afr-7 afr-8 afr-9 afr-10
  option block-size *:1MB
end-volume
...
volume afr-1
  type cluster/afr
  subvolumes node1A node1B
  option replicate *:2
  option scheduler rr
end-volume
...
volume node1A
  type protocol/client
  option transport-type tcp/client
  option remote-host nodea
  option remote-subvolume stripes-1A
end-volume

volume node1B
  type protocol/client
  option transport-type tcp/client
  option remote-host nodez
  option remote-subvolume stripes-1B
end-volume

----- Original Message -----
From: "Csibra Gergo" <gergo@xxxxxxxxx>
To: "Gareth Bult" <gareth@xxxxxxxxxxxxx>
Sent: Friday, December 28, 2007 3:47:03 PM (GMT) Europe/London
Subject: Re: Choice of Translator question

Friday, December 28, 2007, 3:57:52 PM, Gareth Bult wrote:

>> Oh. I don't understand this, can you explain why you need to change
>> configs regularly?

> To add new systems,

Add new systems to...? To the server? Adding new clients?

> install newer versions of fuse and glusterfs,

Neither this nor the above is a reason to rebuild/remirror AFR-ed files.
If a new version of glusterfs or fuse comes out, or you need to add new systems anywhere in the glusterfs setup, you do a shutdown (unmount, kill glusterfsd (this is actually a regular shutdown)), install the new version and restart glusterfs. The xattrs of the mirrored files (they contain the version information) will be the same before and after the version change, so the files will not be remirrored.

> to recover from fuse kernel lockups ..

Yes. If this happens during a write, yes. In this situation AFR-ed files need to be healed.

> .. I've not yet seen raid/afr comments from anyone who actually
> understands the problem, so I'm not likely to see a fix (?!)

I understand this problem, but it is caused by bugs. The people at Z Research are working hard to fix them, and you give them a lot of information when you file bug reports :)

--
Best regards,
Csibra Gergo
mailto:gergo@xxxxxxxxx
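
For reference, the arithmetic behind the two problems in the test case at the top of this message can be sketched in Python. This is only an illustration, not GlusterFS code. It assumes that blocks are placed on the stripes round-robin and that each stripe's backing file is sparse with the same apparent size as the whole file; the function names are made up for the example.

```python
MB = 1024 * 1024


def stripes_touched(offset, length, block_size, stripe_count):
    """Stripe indices hit by a write of `length` bytes at `offset`.

    Assumption: block N of the file lives on stripe N % stripe_count.
    """
    if length <= 0:
        return []
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return sorted({b % stripe_count for b in range(first_block, last_block + 1)})


def full_heal_bytes(file_size, stripe_count):
    """Bytes copied if every stripe file is copied in full, holes included.

    Assumption: each stripe file has the apparent size of the whole file.
    """
    return file_size * stripe_count


file_size = 500 * MB   # step a: create 500M file
appended = 2 * MB      # step c: append 2M while one glusterfsd is down

# Stripes that actually received new data from the 2M append.
print(stripes_touched(file_size, appended, 1 * MB, 10))

# Data moved when all 10 stripes are healed without skipping holes, in MB.
print(full_heal_bytes(file_size, 10) // MB)
```

Under these assumptions the append lands on only 2 of the 10 stripes, while a heal that copies every stripe in full moves about 5000 MB, which matches the 5GB figure reported above.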