On 07/08/2015 12:06 PM, Ravishankar N wrote:
On 07/08/2015 11:42 AM, Raghavendra Bhat wrote:
Adding the correct gluster-devel id.
Regards,
Raghavendra Bhat
On 07/08/2015 11:38 AM, Raghavendra Bhat wrote:
Hi,
In the bit-rot feature, the scrubber marks corrupted objects (objects
whose data has gone bad) as bad objects via an extended attribute. If
the volume is a replicate volume and an object in one of the replicas
goes bad, the client can still read the data through the good copy on
the other replica. But as of now, self-heal does not heal bad objects.
So the way to heal a bad object is to remove it directly from the
backend and let self-heal recreate it from the good copy.
The above method has a problem. The bit-rot-stub xlator sitting in the
brick graph remembers an object as bad in its inode context (either
when the object is being marked bad by the scrubber, or during the
first lookup of the object if it was already marked bad).
Bit-rot-stub uses that information to block any read/write operations
on such bad objects. So it also blocks the operations self-heal issues
to correct the object: the object was deleted directly in the backend,
but the in-memory inode is still present and considered valid.
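To illustrate, the stub's behaviour boils down to something like the
sketch below. This is only a sketch, not the actual bit-rot-stub code:
it assumes the generic inode_ctx_set()/inode_ctx_get() helpers from
libglusterfs, and BAD_OBJECT is a made-up marker value (the real
xlator keeps a richer per-inode context).

/* Sketch only -- not the actual bit-rot-stub code.  Needs the usual
 * xlator headers (xlator.h, inode.h).  BAD_OBJECT is a made-up value. */
#define BAD_OBJECT 1ULL

/* Remember that the object is bad, e.g. when the scrubber marks it or
 * when the first lookup finds the bad-object xattr on disk. */
static int
stub_mark_bad (xlator_t *this, inode_t *inode)
{
        uint64_t ctx = BAD_OBJECT;

        return inode_ctx_set (inode, this, &ctx);
}

/* Consulted in the read/write fops: if the inode is remembered as bad,
 * the fop is failed (typically with EIO) instead of being wound down. */
static gf_boolean_t
stub_is_bad (xlator_t *this, inode_t *inode)
{
        uint64_t ctx = 0;

        if (inode_ctx_get (inode, this, &ctx) == 0 && ctx == BAD_OBJECT)
                return _gf_true;

        return _gf_false;
}

Because this state lives in the in-memory inode, deleting the file in
the backend alone does not clear it.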
There are 2 methods that I think can solve the issue.
1) In server_lookup_cbk, if the lookup of an object fails with ENOENT
*AND* the lookup is a revalidate lookup, then forget the inode
associated with that object (not just unlinking the dentry, but
forgetting the inode as well, if and only if there are no more
dentries associated with it). At least this way the inode is
forgotten, and later when self-heal wants to correct the object it has
to create a new object (the old one was removed directly from the
backend), which means a new in-memory inode is created and the
read/write operations issued by the self-heal daemon are not blocked.
I have sent a patch for review for the above method:
http://review.gluster.org/#/c/11489/
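The gist of the change is roughly the following. This is a sketch of
the idea only, not the actual patch (see the review above);
is_revalidate and inode_has_no_dentry() are placeholders for whatever
checks the real code uses.

/* Sketch of approach 1; not the actual patch. */
static void
server_forget_if_gone (inode_t *inode, loc_t *loc,
                       int32_t op_ret, int32_t op_errno,
                       gf_boolean_t is_revalidate)
{
        if (op_ret != -1 || op_errno != ENOENT || !is_revalidate)
                return;

        /* The object was removed directly from the backend: drop its
         * dentry from the server-side inode table. */
        inode_unlink (inode, loc->parent, loc->name);

        /* If that was the last dentry, forget the inode too, so that
         * the next create/lookup from self-heal gets a fresh in-memory
         * inode (and, with it, a fresh bit-rot-stub context). */
        if (inode_has_no_dentry (inode))        /* placeholder check */
                inode_forget (inode, 0);
}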
OR
2) Do not block write operations on the bad object if they come from
self-heal; allow self-heal to completely heal the file, and once
healing is done, remove the bad-object information from the inode
context.
Requests coming from the self-heal daemon can be identified by
checking the pid of the frame (it is negative). But if the self-heal
happens from the glusterfs client itself, I am not sure whether it
runs with a negative pid for the frame or with the same pid as the
frame of the original fop that triggered the self-heal.
Pranith? Can you clarify this?
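For illustration, the check that approach 2 relies on would look
roughly like the sketch below in the stub's write path, reusing the
stub_is_bad() helper from the earlier sketch. It assumes exactly what
is being asked above, namely that self-heal traffic carries a negative
pid in frame->root->pid; the function name is made up.

/* Sketch only.  Assumes self-heal traffic has a negative frame pid. */
static gf_boolean_t
stub_allow_write_on_bad (call_frame_t *frame, xlator_t *this,
                         inode_t *inode)
{
        if (!stub_is_bad (this, inode))
                return _gf_true;        /* not a bad object, carry on  */

        if (frame->root->pid < 0)
                return _gf_true;        /* self-heal: let it repair    */

        return _gf_false;               /* ordinary client I/O: block  */
}

Once the heal completes, the stub would also have to drop the
bad-object flag from the inode context, e.g. via inode_ctx_del().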
For afr-v2, heals that happen via the client run in a synctask with
the same negative pid (GF_CLIENT_PID_AFR_SELF_HEALD) as the self-heal
daemon.
I think approach 1 is better, as it is independent of who does the
heal (I'm not sure what the pid behaviour is with disperse volume
heals), and it makes sense to forget the inode when the corresponding
file is no longer present on the back-end.
+1 for approach #1. It might also benefit other xlators that want to
freshly initialize the inode context in such scenarios.
Thanks,
Ravi
Please provide feedback.
Regards,
Raghavendra Bhat
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel