On 06/27/2015 02:32 PM, Raghavendra Bhat wrote:
Hi,
There is a patch that is submitted for review to deny access to
objects which are marked as bad by scrubber (i.e. the data of the
object might have been corrupted in the backend).
http://review.gluster.org/#/c/11126/10
http://review.gluster.org/#/c/11389/4
The above 2 patch sets solve the problem of denying access to the bad
objects (they have passed regression and received a +1 from Venky).
But in our testing we found that there is a race window (depending
upon the scrubber frequency, the race window can be larger) in which
the self-heal daemon can heal the contents of the bad file before the
scrubber can mark it as bad.
I am not sure whether there is a chance of hitting this issue when the
data truly gets corrupted in the backend. But in our testing, to simulate
backend corruption we modify the contents of the file directly in the
backend. Now in this case, before the scrubber can mark the object as
bad, the self-heal daemon kicks in and heals the contents of the bad
file onto the good copy. Or, before the scrubber marks the file as bad,
if a client accesses it, AFR finds that there is a mismatch in
metadata (since we modified the contents of the file in the backend)
and does data and metadata self-healing, thus copying the contents of
the bad copy to the good copy. From then on, clients accessing
that object always get bad data.
I understand from Ravi (ranaraya@) that AFR-v2 would choose the "biggest"
file as the source, provided that afr xattrs are "clean" (AFR-v1 would
give back EIO). If a file is modified directly from the brick but leaves
the size unchanged, contents can be served from either copy. For
self-heal to detect anomalies, there needs to be verification
(checksum/signature) at each stage of its operation. But this might be
too heavy on the I/O side. We could still cache mtime [but update on
client I/O] after pre-check, but this still would not catch bit flips
(unless a filesystem scrub is done).
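To make the verification idea concrete, here is a rough sketch in Python.
It is only an illustration, not AFR or bitrot code: the Replica class, the
sign() helper and both pick_source_* functions are hypothetical names, and
it assumes each copy carries a checksum recorded when the object was last
signed. The first function models size-only source selection as described
above; the second adds a checksum pre-check before a copy may be a source.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    """Hypothetical stand-in for one brick's copy of a file."""
    name: str
    data: bytes
    signature: str             # checksum recorded when the object was signed
    xattrs_clean: bool = True  # assumed: no pending-heal markers on this copy


def sign(data: bytes) -> str:
    """Checksum used for illustration; the real signature scheme may differ."""
    return hashlib.sha256(data).hexdigest()


def pick_source_by_size(replicas: List[Replica]) -> Optional[Replica]:
    """Size-only selection: among clean copies, the biggest one wins.

    On a tie, max() returns the first candidate, so a same-size
    corrupted copy can be chosen as the heal source.
    """
    candidates = [r for r in replicas if r.xattrs_clean]
    if not candidates:
        return None
    return max(candidates, key=lambda r: len(r.data))


def pick_source_verified(replicas: List[Replica]) -> Optional[Replica]:
    """Same selection, but a copy must match its signature to be a source.

    This reads every candidate in full, which is the I/O cost in question.
    """
    verified = [
        r for r in replicas
        if r.xattrs_clean and sign(r.data) == r.signature
    ]
    if not verified:
        return None
    return max(verified, key=lambda r: len(r.data))


if __name__ == "__main__":
    good_data = b"hello world"
    sig = sign(good_data)
    # Same length, one byte changed directly "on the brick".
    bad = Replica("brick-1", b"hellO world", sig)
    good = Replica("brick-2", good_data, sig)
    print(pick_source_by_size([bad, good]).name)
    print(pick_source_verified([bad, good]).name)
```

With a same-size modification, the size-only picker can return the
corrupted copy, while the verified picker skips it. If no copy matches its
signature, the verified picker returns None, and the caller would have to
decide what to do (for example, fail the heal).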
Thoughts?
Pranith? Do you have any solution for this? Venky and I are trying to
come up with a solution for this.
But does this issue block the above patches in any way? (Those 2
patches are still needed to deny access to objects once they are
marked as bad by scrubber).
Regards,
Raghavendra Bhat
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel