On 07/08/2015 12:06 PM, Ravishankar N wrote:
On 07/08/2015 11:42 AM, Raghavendra Bhat wrote:
Adding the correct gluster-devel id.
Regards,
Raghavendra Bhat
On 07/08/2015 11:38 AM, Raghavendra Bhat wrote:
Hi,
In the bit-rot feature, the scrubber marks corrupted objects (objects
whose data has gone bad) as bad objects via an extended attribute. If
the volume is a replicate volume and an object in one of the replicas
goes bad, the client can still read the data through the good copy on
the other replica. But as of now, self-heal does not heal bad objects.
So the way to heal a bad object is to remove it directly from the
backend and let self-heal recreate it from the good copy.
The above method has a problem. The bit-rot-stub xlator sitting in the
brick graph remembers an object as bad in its inode context (either
when the object is being marked bad by the scrubber, or during the
first lookup of the object if it was already marked bad).
Bit-rot-stub uses that information to block any read/write operations
on such bad objects. So it also blocks the operations self-heal issues
to correct the object: the object was deleted directly in the backend,
but the in-memory inode is still present and considered valid.
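To illustrate, the stub's behaviour boils down to something like the
sketch below. This is only a sketch, not the actual bit-rot-stub code:
it assumes the generic inode_ctx_set()/inode_ctx_get() helpers from
libglusterfs, and BAD_OBJECT is a made-up marker value (the real
xlator keeps a richer per-inode context).

/* Sketch only -- not the actual bit-rot-stub code.  Needs the usual
 * xlator headers (xlator.h, inode.h).  BAD_OBJECT is a made-up value. */
#define BAD_OBJECT 1ULL

/* Remember that the object is bad, e.g. when the scrubber marks it or
 * when the first lookup finds the bad-object xattr on disk. */
static int
stub_mark_bad (xlator_t *this, inode_t *inode)
{
        uint64_t ctx = BAD_OBJECT;

        return inode_ctx_set (inode, this, &ctx);
}

/* Consulted in the read/write fops: if the inode is remembered as bad,
 * the fop is failed (typically with EIO) instead of being wound down. */
static gf_boolean_t
stub_is_bad (xlator_t *this, inode_t *inode)
{
        uint64_t ctx = 0;

        if (inode_ctx_get (inode, this, &ctx) == 0 && ctx == BAD_OBJECT)
                return _gf_true;

        return _gf_false;
}

Because this state lives in the in-memory inode, deleting the file in
the backend alone does not clear it.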
There are 2 methods that I think can solve the issue.
1) In server_lookup_cbk, if the lookup of an object fails with ENOENT
*AND* the lookup is a revalidate lookup, then forget the inode
associated with that object (not just unlinking the dentry, but
forgetting the inode as well, if and only if there are no more
dentries associated with it). At least this way the inode is
forgotten, and later when self-heal wants to correct the object it has
to create a new object (the old one was removed directly from the
backend), which means a new in-memory inode is created and the
read/write operations issued by the self-heal daemon are not blocked.
I have sent a patch for review for the above method:
http://review.gluster.org/#/c/11489/
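The gist of the change is roughly the following. This is a sketch of
the idea only, not the actual patch (see the review above);
is_revalidate and inode_has_no_dentry() are placeholders for whatever
checks the real code uses.

/* Sketch of approach 1; not the actual patch. */
static void
server_forget_if_gone (inode_t *inode, loc_t *loc,
                       int32_t op_ret, int32_t op_errno,
                       gf_boolean_t is_revalidate)
{
        if (op_ret != -1 || op_errno != ENOENT || !is_revalidate)
                return;

        /* The object was removed directly from the backend: drop its
         * dentry from the server-side inode table. */
        inode_unlink (inode, loc->parent, loc->name);

        /* If that was the last dentry, forget the inode too, so that
         * the next create/lookup from self-heal gets a fresh in-memory
         * inode (and, with it, a fresh bit-rot-stub context). */
        if (inode_has_no_dentry (inode))        /* placeholder check */
                inode_forget (inode, 0);
}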
OR
2) Do not block write operations on the bad object if they come from
self-heal; allow self-heal to completely heal the file, and once
healing is done, remove the bad-object information from the inode
context.
Requests coming from the self-heal daemon can be identified by
checking the pid of the frame (it is negative). But if the self-heal
happens from the glusterfs client itself, I am not sure whether it
runs with a negative pid for the frame or with the same pid as the
frame of the original fop that triggered the self-heal.
Pranith? Can you clarify this?
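For illustration, the check that approach 2 relies on would look
roughly like the sketch below in the stub's write path, reusing the
stub_is_bad() helper from the earlier sketch. It assumes exactly what
is being asked above, namely that self-heal traffic carries a negative
pid in frame->root->pid; the function name is made up.

/* Sketch only.  Assumes self-heal traffic has a negative frame pid. */
static gf_boolean_t
stub_allow_write_on_bad (call_frame_t *frame, xlator_t *this,
                         inode_t *inode)
{
        if (!stub_is_bad (this, inode))
                return _gf_true;        /* not a bad object, carry on  */

        if (frame->root->pid < 0)
                return _gf_true;        /* self-heal: let it repair    */

        return _gf_false;               /* ordinary client I/O: block  */
}

Once the heal completes, the stub would also have to drop the
bad-object flag from the inode context, e.g. via inode_ctx_del().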
For afr-v2, heals that happen via the client run in a synctask with
the same negative pid (GF_CLIENT_PID_AFR_SELF_HEALD) as the self-heal
daemon.
I think approach 1 is better, as it is independent of who does the
heal (I'm not sure what the pid behaviour is with disperse volume
heals), and it makes sense to forget the inode when the corresponding
file is no longer present on the back-end.
+1 for approach #1. It might also benefit other xlators that want to
freshly initialize the inode context in such scenarios.
Thanks,
Ravi
Please provide feedback.
Regards,
Raghavendra Bhat
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel