On 03/20/2017 06:31 PM, Bernhard Dübi wrote:
Hi Ravi,
thank you very much for looking into this.
The gluster volumes are used by CommVault Simpana to store
backup data. Nothing/Nobody should access the underlying
infrastructure.
While looking at the xattrs of the files, I noticed that the
only difference was the bit-rot.version. So I assume that
something went wrong in the synchronization of the bit-rot data,
that differing bit-rot.versions are treated like a split-brain
situation, and that access is denied because there is
no guarantee of correctness. This is just a wild guess.
Hi Bernhard,
The bit-rot version can differ between the bricks of a replica when
I/O succeeds on only one brick while the other brick is down.
(AFR self-heal will later heal the contents, but it does not modify
the bitrot xattrs.) So that is not a problem.
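The xattr comparison described above can be sketched in Python. This is only a minimal illustration, not a Gluster tool: the helper names `read_xattrs` and `diff_xattrs` are hypothetical, the calls `os.listxattr` and `os.getxattr` are Linux-only, and reading attributes in the `trusted.` namespace on a brick normally requires root.

```python
import os


def read_xattrs(path):
    """Return {name: raw bytes} for every extended attribute on path (Linux only)."""
    return {name: os.getxattr(path, name) for name in os.listxattr(path)}


def diff_xattrs(a, b):
    """Return {name: (value_in_a, value_in_b)} for attributes that differ.

    None marks an attribute that is missing on one side.
    """
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Calling `diff_xattrs(read_xattrs(copy_on_brick1), read_xattrs(copy_on_brick2))` with the two brick-side paths of the same file (placeholder names here) lists only the attributes that differ. Per the explanation above, a result that contains nothing but the bit-rot version is expected after one brick was down and is not an error by itself.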
Over the weekend I identified hundreds of files with
input/output errors. I compared the sha256sum of the copies on
both bricks; they were always the same. I then deleted the affected
files from gluster and recreated them. This should have fixed the
issue. Verification is still running.
If you're interested in the root cause, I can send you more
log files and the xattrs of some files.
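The checksum comparison between the two bricks can be sketched as follows. This is an assumed equivalent of running sha256sum on each copy, not the commands that were actually used; the function names `sha256_of` and `copies_match` are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(path_a, path_b):
    """Return True when both files have the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Matching digests on both bricks indicate that the file contents are identical, which points away from data corruption as the cause of the input/output errors.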
If you did not access the underlying bricks directly, as you said,
then it could possibly be a bitrot bug. If you don't mind, please
raise a BZ under the bitrot component for the appropriate gluster
version, with all client and brick logs attached.
Also, if you have some kind of reproducer, that would help a lot.
-Ravi
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users