Alexey Filin wrote:
Hi Kevan,
consistency of afr'ed files is an important question with respect to
failures in the backend fs too; afr is a medicine against node
failures, not backend-fs failures (at least not directly). In the
latter case files can be changed "legally", bypassing glusterfs, by
fsck after a hw/sw failure, and those changes have to be handled for
the corrupted replica, otherwise reading the same file can return
different data (especially with the forthcoming load-balanced reads of
replicas). Fortunately, rsync'ing from the original should create a
consistent replica in this case too (if cluster/stripe under afr
treats all replicas identically); unfortunately extended attributes
aren't rsync'ed (I tested it), and they can be required during repair.
It seems glusterfs could try to handle hw/sw failures in the backend
fs with checksums stored in extended attributes. The checksums should
be calculated per file chunk, because a single whole-file checksum
would require full recalculation after appending or changing one byte
in a gigabyte file. In that scheme glusterfs would either have to
recalculate the checksums of all files on a corrupted fs (which may be
far too long; rsync'ing has the same problem) or obtain a list of
corrupted files from the backend fs in some way (e.g. via a flag set
by fsck in extended attributes). Maybe some kind of distributed RAID
is a better solution; a first step in that direction was already made
by cluster/stripe (unfortunately one implementation, DDRaid
http://sources.redhat.com/cluster/ddraid/ by Daniel Phillips, seems to
be suspended). Perhaps that is too computationally/network intensive,
and RAID under the backend fs is the best solution even taking the
disk-space overhead into account.
I'm very interested to hear thoughts about it from glusterfs
developers to clear my misunderstanding.
The rsync case can probably be handled by a separate find of the
appropriate attributes on the source and a corresponding set on the
target. A simple bash/perl script could handle this in a few lines.
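As a sketch of what such a script might look like (Python here rather
than bash/perl; copy_xattrs and the attribute prefix are my own
hypothetical names, not glusterfs code). Note that glusterfs keeps its
metadata in the trusted.* namespace, which requires root to read and
write, so the prefix is a parameter:

```python
import os


def copy_xattrs(src, dst, prefix="trusted.afr."):
    """Copy the extended attributes that rsync skipped from src to dst.

    glusterfs AFR metadata lives under trusted.* (root only); the
    prefix is parameterized so the sketch can be tried with user.*
    attributes by an unprivileged user.
    """
    copied = []
    for name in os.listxattr(src):
        if name.startswith(prefix):
            os.setxattr(dst, name, os.getxattr(src, name))
            copied.append(name)
    return copied
```

Walking the replica with os.walk and calling this per file after the
rsync pass would be the "few lines" mentioned above.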
The fsck case is more interesting, but if you could get fsck to report
the file/directory names that have problems without fixing them, it's
easy to pipe that list to a script that removes the trusted.afr.version
attribute on those files, and then AFR will heal them itself.
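The attribute-stripping half of that pipeline could look something
like this (a hypothetical sketch; drop_afr_version is my own name, and
removing trusted.* attributes requires root on the backend fs):

```python
import errno
import os

# attribute AFR consults when deciding whether a copy needs healing
AFR_VERSION = "trusted.afr.version"


def drop_afr_version(path, attr=AFR_VERSION):
    """Remove the AFR version xattr so AFR re-heals the file.

    Returns True if the attribute was removed, False if it was absent,
    the fs lacks xattr support, or we are not root.
    """
    try:
        os.removexattr(path, attr)
        return True
    except OSError as e:
        if e.errno in (errno.ENODATA, errno.ENOTSUP, errno.EPERM):
            return False
        raise
```

Feeding it one pathname per line from the (hypothetical) fsck report,
e.g. via sys.stdin, would complete the pipe described above.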
Checksums would of course give you much better tracking of corrupted
files, but I imagine the CPU strain and speed decrease would make them
non-feasible.
--
-Kevan Benson
-A-1 Networks