Re: Bit-rot functionality

Jeff Darcy <jdarcy@xxxxxxxxxx> · Mon, 23 Jul 2012 09:30:00 -0400

On 07/23/2012 05:37 AM, Fred van Zwieten wrote:
> Bit-rot detection can be done through check-summing. It should be a very low
> priority job running on one of the bricks. The job walks the complete file
> system and, per file, calculates the check-sum, compares it with the stored
> check-sum (if present, otherwise it stores the check-sum on all involved
> bricks, because it hasn't been checked before).

I think this is basically a good idea, but it could be implemented more
efficiently if we ran processes on *all* bricks, each one calculating checksums
for the files in that brick.  That way all disk accesses are local, which is
important because this kind of "crawl" can take a long time.  We could also
take advantage of the marker/xtime framework to reduce the number of files we
have to check, just like we already use that framework in gsyncd to reduce the
number of files that must be replicated.  Another possibility would be to have
a translator queue a check when a file is closed.
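
To make the per-brick crawl concrete, here is a minimal Python sketch of what
one such process might do on its own brick. It is only an illustration under
assumptions, not existing GlusterFS code: `scrub_brick`, `file_checksum` and
the `store` dict are hypothetical names, SHA-256 is an arbitrary choice of
checksum, and plain mtime is used as a crude stand-in for knowing whether a
file was legitimately written since the last check (a real version would
presumably lean on marker/xtime and keep the checksum with the file).

```python
import hashlib
import os


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scrub_brick(brick_root, store):
    """Crawl one brick's local directory tree and check every file.

    `store` maps path -> (mtime_ns, checksum). It is a hypothetical
    stand-in for whatever per-file metadata a real implementation keeps.
    Returns the paths whose content changed although their mtime did
    not, i.e. the bit-rot suspects.
    """
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(brick_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.stat(path).st_mtime_ns
            current = file_checksum(path)
            recorded = store.get(path)
            if recorded is None or recorded[0] != mtime:
                # Never checked, or modified since the last check:
                # record the new checksum instead of raising an alarm.
                store[path] = (mtime, current)
            elif recorded[1] != current:
                # Same mtime but different content: possible bit-rot.
                suspects.append(path)
    return suspects
```

Each brick would run this against its own export directory only, so every read
stays local; only the list of suspects needs to leave the brick.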

> Bit-rot restoration could be implemented by comparing the check-sums of the
> replicas. If there is a mismatch, a more thorough check must be performed, like
> running a check-sum on all replicas for that file again, doing
> a bit-wise compare, or whatever. If the files are still the same,
> the check-sum(s) must be replaced. If not, there is actual bit-rot detected.
> Now what to do? Which replica holds the clean version (the truth)? With an
> odd number of replicas one could simply make it a democratic process and
> have it fully automated. It should however save the to-be-replaced version in a
> separate store and notify the admin for verification. Another method would be
> to just notify the admin and do nothing.

If we detect bit-rot on a file, it's almost the same as if we detect pending
operations, and many of the same resolution strategies would apply.  If we have
another replica that's "clean" in either sense we can use it as the source.  If
all replicas have rotted, then it's equivalent to split brain.
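
As a rough sketch of that decision, assuming each replica reports both its
stored checksum and a freshly computed one (the `plan_repair` function, its
input shape and its return values are all made up for illustration, not
anything that exists in AFR today):

```python
def plan_repair(replicas):
    """Decide how to handle one file given per-replica checksums.

    `replicas` maps a brick name to (stored_checksum, current_checksum).
    A replica counts as clean when the two match. Returns a tuple
    (action, source, targets):
      ("ok", None, [])               nothing has rotted
      ("heal", source, targets)      copy from a clean replica
      ("split-brain", None, targets) every replica has rotted
    """
    clean = sorted(name for name, (stored, current) in replicas.items()
                   if stored == current)
    rotted = sorted(name for name in replicas if name not in clean)
    if not rotted:
        return ("ok", None, [])
    if not clean:
        # No trustworthy copy left: same situation as split brain.
        return ("split-brain", None, rotted)
    # Any clean replica can serve as the source for the rotted ones.
    return ("heal", clean[0], rotted)


print(plan_repair({"brick1": ("aa", "aa"), "brick2": ("aa", "bb")}))
print(plan_repair({"brick1": ("aa", "cc"), "brick2": ("aa", "bb")}))
```

The first call has one clean copy and plans a heal from brick1 to brick2; the
second has none and falls through to the split-brain case.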