Distributed RAID was abandoned by Red Hat; the only available version is for kernel 2.6.11.

Best regards

2007/10/17, Alexey Filin <alexey.filin@xxxxxxxxx>:
> Hi Kevan,
>
> Consistency of AFR'ed files is an important question with respect to failures
> in the backend fs too. AFR is a medicine against node failures, not backend fs
> failures (at least not directly). In the latter case files can be changed
> "legally", bypassing glusterfs, by fsck after a hw/sw failure, and those
> changes have to be handled for the corrupted replica, else reading the same
> file can give different data (especially with the forthcoming load-balanced
> read of replicas). Fortunately, rsync'ing from the original should create a
> consistent replica in that case too (if cluster/stripe under AFR treats
> replicas identically); unfortunately, extended attributes aren't rsync'ed (I
> tested it), which can be required during repair.
>
> It seems glusterfs could try to handle hw/sw failures in the backend fs with
> checksums kept in extended attributes, and the checksums would have to be
> calculated per file chunk (because a single whole-file checksum requires full
> recalculation after appending/changing one byte to/in a gigabyte file). In
> that case glusterfs has to recalculate the checksums of all files on a
> corrupted fs (which may take far too long; it is the same problem as with
> rsync'ing) or get a list of corrupted files from the backend fs in some way
> (e.g. via a flag set by fsck in extended attributes). Maybe some kind of
> distributed RAID is a better solution; a first step in that direction was
> already taken by cluster/stripe (unfortunately one of the implementations,
> DDRaid, http://sources.redhat.com/cluster/ddraid/ by Daniel Phillips, seems
> to be suspended). Perhaps it is too computationally/network intensive, and
> RAID under the backend fs is the best solution even taking the disk space
> overhead into account.
>
> I'm very interested to hear the glusterfs developers' thoughts on this to
> clear up my misunderstanding.
>
> Regards, Alexey.
>
> On 10/16/07, Kevan Benson <kbenson@xxxxxxxxxxxxxxx> wrote:
> >
> > When an AFR encounters a file that exists on multiple shares but doesn't
> > have the trusted.afr.version attribute set, it sets that attribute for all
> > the files and assumes they contain the same data.
> >
> > I.e. if you manually create the files on the servers directly and with
> > different content, appending to the file through the client will set the
> > trusted.afr.version for both files and append to both files, but the files
> > still contain different content (the content from before the append).
> >
> > Now, this would be really hard to replicate outside of this arbitrary
> > example; it would probably require a write failure to all AFR subvolumes,
> > possibly at different points of the write operation, in which case the
> > file content can't be trusted anyway, so it's really not a big deal. I
> > only mention it in case it might not be the desired behavior, and because
> > it might be useful to have the first specified AFR subvolume supply the
> > file to the others when none of them has the trusted.afr.version attribute
> > set, for cases of pre-populating the share (such as rsyncs from a dynamic
> > source). The problem is easily mitigated (rsync to a single share and
> > trigger a self-heal, or rsync to the client mount point); I just figured
> > I'd mention it, and that's only required if you really NEED pre-population
> > of data.
> >
> > --
> > -Kevan Benson
> > -A-1 Networks
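To make Kevan's trusted.afr.version observation easier to check, here is a minimal sketch (Python 3 on Linux; the brick paths are made up, and reading trusted.* attributes normally needs root) that reports whether the attribute is set on each backend copy of a file:

    import errno
    import os

    # Hypothetical backend (brick) paths for the same AFR'ed file on two servers.
    BRICKS = ["/data/export1/shared.txt", "/data/export2/shared.txt"]

    for path in BRICKS:
        try:
            value = os.getxattr(path, "trusted.afr.version")
            print(f"{path}: trusted.afr.version = {value!r}")
        except OSError as e:
            if e.errno == errno.ENODATA:  # attribute not set on this copy
                print(f"{path}: trusted.afr.version not set")
            else:
                raise

If the attribute is missing from every copy, AFR will, per Kevan's description, stamp it onto all of them and assume the contents already match, which is exactly the pre-population pitfall he describes.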
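And a rough sketch of the per-chunk checksum idea from Alexey's mail: one checksum per fixed-size chunk, stored in a user.* extended attribute, so that appending a byte only dirties the last chunk. The attribute names (user.cksum.<n>) and the 1 MiB chunk size are invented for illustration; glusterfs does nothing like this today:

    import hashlib
    import os

    CHUNK = 1024 * 1024  # 1 MiB per chunk (arbitrary choice)

    def store_chunk_checksums(path):
        """Write one user.cksum.<n> xattr per chunk of the file."""
        with open(path, "rb") as f:
            index = 0
            while True:
                chunk = f.read(CHUNK)
                if not chunk:
                    break
                digest = hashlib.sha1(chunk).hexdigest()
                # Appending to the file only changes the last chunk, so only
                # that chunk's checksum would need recomputing.
                os.setxattr(path, f"user.cksum.{index}", digest.encode())
                index += 1

    def find_corrupted_chunks(path):
        """Return indices of chunks whose data no longer matches the stored checksum."""
        bad = []
        with open(path, "rb") as f:
            index = 0
            while True:
                chunk = f.read(CHUNK)
                if not chunk:
                    break
                stored = os.getxattr(path, f"user.cksum.{index}").decode()
                if hashlib.sha1(chunk).hexdigest() != stored:
                    bad.append(index)
                index += 1
        return bad

This also makes the rsync caveat concrete: a plain rsync copies the file data but not these attributes (newer rsync releases do offer an -X/--xattrs option), so any repair scheme that relies on extended attributes has to copy them explicitly.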
--
Leonardo Rodrigues de Mello
jabber: l@xxxxxxxxxxxxx