Hi Kevan,

Consistency of AFR'ed files is an important question for backend filesystem failures too. AFR is a remedy for node failures, not for backend fs failures (at least not directly). In the latter case, files can be changed "legally" behind GlusterFS's back by fsck after a hw/sw failure, and those changes have to be handled for the corrupted replica, otherwise reading the same file can return different data (especially with the forthcoming load-balanced reads across replicas).

Fortunately, rsync'ing from the original should produce a consistent replica in that case as well (assuming cluster/stripe under AFR behaves identically across replicas). Unfortunately, extended attributes aren't copied by rsync (I tested it), and they may be needed during repair.

It seems GlusterFS could try to handle hw/sw failures in the backend fs with checksums stored in extended attributes. The checksums should be calculated per file chunk, because a single whole-file checksum requires a full recalculation after appending or changing one byte in a gigabyte file (a rough sketch of the idea is at the end of this mail). In that scheme GlusterFS would either have to recalculate checksums for all files on the corrupted fs (which may take far too long; rsync has the same problem) or obtain a list of corrupted files from the backend fs in some other way (e.g. a flag set by fsck in extended attributes).

Maybe some kind of distributed RAID is a better solution; a first step in that direction was already taken by cluster/stripe (unfortunately one implementation, DDRaid http://sources.redhat.com/cluster/ddraid/ by Daniel Phillips, seems to be suspended). Then again, it may be too computation/network intensive, and RAID underneath the backend fs may be the best solution even taking the disk space overhead into account.

I'm very interested to hear the GlusterFS developers' thoughts on this and to clear up my misunderstandings.

Regards,
Alexey.

On 10/16/07, Kevan Benson <kbenson@xxxxxxxxxxxxxxx> wrote:
>
> When an afr encounters a file that exists on multiple shares that
> doesn't have the trusted.afr.version set, it sets that attribute for all
> the files and assumes they contain the same data.
>
> I.e. if you manually create the files on the servers directly and with
> different content, appending to the file through the client will set the
> trusted.afr.version for both files, and append to both files, but the
> files still contain different content (the content from before the
> append).
>
> Now, this would be really hard to replicate without this arbitrary
> example, it would probably require a write fail to all afr subvolumes,
> possibly at different times of the write operation, in which case the
> file content can't be trusted anyway, so it's really not a big deal. I
> only mention it in case it might not be the desired behavior, and
> because it might be useful to have the first specified afr subvolume
> supply the file to the others in the case that none has the
> trusted.afr.version attribute set in cases of pre-populating the share
> (such as rsyncs from a dynamic source). The problem is easily mitigated
> (rsync to a single share and trigger a self-heal or rsync to the client
> mount point), I just figured I'd mention it, and that's only required if
> you really NEED pre-population of data.
>
> --
>
> -Kevan Benson
> -A-1 Networks
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
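P.S. To make the chunked-checksums-in-xattrs idea above concrete, here is a rough sketch in Python. This is not anything GlusterFS actually does today; the chunk size, attribute names, hash choice and test path are my own illustrative assumptions, and it needs a Linux filesystem with user xattrs enabled (os.setxattr/os.getxattr are Linux-only).

```python
import hashlib
import os

CHUNK_SIZE = 128 * 1024          # illustrative chunk size, not a GlusterFS value
XATTR_PREFIX = "user.chksum."    # hypothetical attribute namespace

def store_chunk_checksums(path):
    """Write one checksum xattr per CHUNK_SIZE chunk of the file."""
    with open(path, "rb") as f:
        index = 0
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            digest = hashlib.sha1(chunk).hexdigest()
            os.setxattr(path, XATTR_PREFIX + str(index), digest.encode())
            index += 1

def find_corrupted_chunks(path):
    """Return indices of chunks whose current checksum no longer matches
    the stored xattr (e.g. after fsck silently changed the file)."""
    bad = []
    with open(path, "rb") as f:
        index = 0
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            digest = hashlib.sha1(chunk).hexdigest()
            try:
                stored = os.getxattr(path, XATTR_PREFIX + str(index)).decode()
            except OSError:
                stored = None    # attribute missing, e.g. lost by a plain rsync
            if stored != digest:
                bad.append(index)
            index += 1
    return bad

if __name__ == "__main__":
    path = "/tmp/afr-test-file"   # example path, must already exist
    store_chunk_checksums(path)
    print(find_corrupted_chunks(path))
```

The point of the per-chunk layout is that appending or changing one byte only invalidates the chunk(s) it touches instead of forcing a rehash of the whole gigabyte file; it also shows why losing the xattrs during an rsync-based repair would defeat the check.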