Alexey Filin wrote:
Hi Kevan,
consistency of afr'ed files is an important question with respect to
failures in the backend fs too; afr is a medicine against node
failures, not backend-fs failures (at least not directly). In the
latter case files can be changed "legally", bypassing glusterfs, by
fsck after a hw/sw failure, and those changes have to be handled for
the corrupted replica, otherwise reading the same file can return
different data (especially with the forthcoming load-balanced reads of
replicas). Fortunately, rsync'ing from the original should create a
consistent replica in this case too (if cluster/stripe under afr
treats all replicas identically); unfortunately extended attributes
aren't rsync'ed (I tested it), and they can be required during repair.
It seems glusterfs could try to handle hw/sw failures in the backend
fs with checksums stored in extended attributes. The checksums should
be calculated per file chunk, because a single whole-file checksum
would require full recalculation after appending or changing one byte
in a gigabyte file. In that scheme glusterfs would either have to
recalculate the checksums of all files on a corrupted fs (which may be
far too long; rsync'ing has the same problem) or obtain a list of
corrupted files from the backend fs in some way (e.g. via a flag set
by fsck in extended attributes). Maybe some kind of distributed RAID
is a better solution; a first step in that direction was already made
by cluster/stripe (unfortunately one implementation, DDRaid
http://sources.redhat.com/cluster/ddraid/ by Daniel Phillips, seems to
be suspended). Perhaps that is too computationally/network intensive,
and RAID under the backend fs is the best solution even taking the
disk-space overhead into account.
I'm very interested to hear thoughts about it from glusterfs
developers to clear my misunderstanding.
The rsync case can probably be handled by a separate find of the
appropriate attributes on the source and a corresponding set on the
target. A simple bash/perl script could handle this in a few lines.
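As a sketch of what such a script might look like (Python here rather
than bash/perl; copy_xattrs and the attribute prefix are my own
hypothetical names, not glusterfs code). Note that glusterfs keeps its
metadata in the trusted.* namespace, which requires root to read and
write, so the prefix is a parameter:

```python
import os


def copy_xattrs(src, dst, prefix="trusted.afr."):
    """Copy the extended attributes that rsync skipped from src to dst.

    glusterfs AFR metadata lives under trusted.* (root only); the
    prefix is parameterized so the sketch can be tried with user.*
    attributes by an unprivileged user.
    """
    copied = []
    for name in os.listxattr(src):
        if name.startswith(prefix):
            os.setxattr(dst, name, os.getxattr(src, name))
            copied.append(name)
    return copied
```

Walking the replica with os.walk and calling this per file after the
rsync pass would be the "few lines" mentioned above.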
The fsck case is more interesting, but if you could get fsck to report
the file/directory names that have problems without fixing them, it's
easy to pipe that list to a script that removes the trusted.afr.version
attribute on those files, and then AFR will heal them itself.
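The attribute-stripping half of that pipeline could look something
like this (a hypothetical sketch; drop_afr_version is my own name, and
removing trusted.* attributes requires root on the backend fs):

```python
import errno
import os

# attribute AFR consults when deciding whether a copy needs healing
AFR_VERSION = "trusted.afr.version"


def drop_afr_version(path, attr=AFR_VERSION):
    """Remove the AFR version xattr so AFR re-heals the file.

    Returns True if the attribute was removed, False if it was absent,
    the fs lacks xattr support, or we are not root.
    """
    try:
        os.removexattr(path, attr)
        return True
    except OSError as e:
        if e.errno in (errno.ENODATA, errno.ENOTSUP, errno.EPERM):
            return False
        raise
```

Feeding it one pathname per line from the (hypothetical) fsck report,
e.g. via sys.stdin, would complete the pipe described above.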
Checksums would of course give you much better tracking of corrupted
files, but I imagine the CPU strain and speed decrease would make them
non-feasible.
--
-Kevan Benson
-A-1 Networks