Re: question on glustershd

Ravishankar N <ravishankar@xxxxxxxxxx> · Wed, 03 Dec 2014 12:43:29 +0530

On 12/03/2014 12:09 PM, Krutika Dhananjay wrote:
> 
> 
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
>     *From: *"Krutika Dhananjay" <kdhananj@xxxxxxxxxx>
>     *To: *"Emmanuel Dreyfus" <manu@xxxxxxxxxx>
>     *Cc: *"Gluster Devel" <gluster-devel@xxxxxxxxxxx>
>     *Sent: *Wednesday, December 3, 2014 11:54:03 AM
>     *Subject: *Re:  question on glustershd
> 
> 
> 
>     --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
>         *From: *"Emmanuel Dreyfus" <manu@xxxxxxxxxx>
>         *To: *"Ravishankar N" <ravishankar@xxxxxxxxxx>, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
>         *Sent: *Wednesday, December 3, 2014 10:14:22 AM
>         *Subject: *Re:  question on glustershd
> 
>         Ravishankar N <ravishankar@xxxxxxxxxx> wrote:
> 
>         > afr_shd_full_healer() is run only when we run 'gluster vol heal <volname>
>         > full', doing a full brick traversal (readdirp) from the root and
>         > attempting heal for each entry.
> 
>         Then we agree that "gluster vol heal $volume full" may fail to heal some
>         files because of inode lock contention, right?
> 
>         If that is expected behavior, then the tests are wrong. For instance in
>         tests/basic/afr/entry-self-heal.t we do "gluster vol heal $volume full"
>         and we check that no unhealed files are left behind.
> 
>         Did I miss something, or do we have to either fix afr_shd_full_healer()
>         or tests/basic/afr/entry-self-heal.t ?
> 
> 
>     Typical use of "heal full" is in the event of a disk replacement where one of the bricks in the replica set is totally empty.
>     And in a volume where both children of AFR (assuming 2-way replication to keep the discussion simple) are on the same node, SHD would launch two healers.
>     Each healer does readdirp() only on the brick associated with it (see how @subvol is initialised in afr_shd_full_sweep()).
>     I guess in such scenarios, the healer associated with the brick that was empty would have no entries to read, and as a result, nothing to heal from it to the other brick.
>     In that case, there is no question of lock contention of the kind that you explained above?
> 
> Come to think of it, it does not really matter whether the two bricks are on the same node or not.
> In either case, there may not be a lock contention between healers associated with different bricks, irrespective of whether they are part of the same SHD or SHDs on different nodes.
> -Krutika
> 

Actually, there is a bug with full heal in afr-v2.
When full heal is triggered, glusterd sends the heal op to only one shd of the replica pair: the one whose node has the highest uuid.
That shd then triggers heal only on the bricks that are local to it. So in a 1x2 volume where the bricks are on different nodes, only
one shd gets the op, and it triggers readdirp + heal on its local client (brick) only (see BZ 1112158).
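
To make the afr-v2 behaviour concrete, here is a toy model of the dispatch. It is only a sketch: Brick, pick_shd_node() and bricks_crawled() are hypothetical names that do not exist in the GlusterFS tree, the uuids are made up, and comparing uuids as plain strings is an assumption about how "highest" is decided.

```python
# Toy model only: none of these names exist in the GlusterFS source tree.
from dataclasses import dataclass


@dataclass(frozen=True)
class Brick:
    name: str
    node_uuid: str  # uuid of the node hosting this brick (made-up values below)


def pick_shd_node(bricks):
    """Return the uuid of the node whose shd receives the heal-full op.

    Assumption: "highest uuid" is modelled as plain string comparison.
    """
    return max(b.node_uuid for b in bricks)


def bricks_crawled(bricks):
    """Bricks the chosen shd runs readdirp + heal on: only its local ones."""
    chosen = pick_shd_node(bricks)
    return [b.name for b in bricks if b.node_uuid == chosen]


# A 1x2 volume with the two bricks on different nodes.
replica_pair = [
    Brick("brick0", "1111"),
    Brick("brick1", "9999"),
]

crawled = bricks_crawled(replica_pair)
skipped = [b.name for b in replica_pair if b.name not in crawled]

print("crawled:", crawled)
print("skipped:", skipped)
```

In this model only brick1 is ever crawled. If brick1 happens to be the freshly replaced (empty) brick, the crawl would presumably find no entries at all, and nothing would be healed from brick0.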

In afr-v1 as well, only one shd receives the heal full op, but the readdirp is done at the afr level (as opposed to the client xlator level in v2),
and a conservative merge is performed.
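
As a rough illustration of what a conservative merge means for directory entries, the sketch below treats it as a union of the entry names seen on the two bricks, with nothing deleted. conservative_merge() is a hypothetical helper, not an AFR function, and real entry self-heal also deals with changelogs, gfids and locking, none of which is modelled here.

```python
# Hypothetical helper: models a conservative merge as a set union of names.
def conservative_merge(entries_a, entries_b):
    """Return (merged, missing_on_a, missing_on_b) for two entry listings.

    No entry is ever removed: anything present on either brick survives,
    and each brick gets the names it was missing.
    """
    a, b = set(entries_a), set(entries_b)
    merged = a | b
    return merged, sorted(merged - a), sorted(merged - b)


# brick_a holds the data, brick_b was just replaced and is empty.
brick_a = {"file1", "file2", "dir1"}
brick_b = set()

merged, to_create_on_a, to_create_on_b = conservative_merge(brick_a, brick_b)

print("create on brick_a:", to_create_on_a)
print("create on brick_b:", to_create_on_b)
```

Because the merge looks at both listings, the result does not depend on which side the crawl started from, which is the contrast with the per-brick readdirp in v2.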

>     -Krutika
> 
>         -- 
>         Emmanuel Dreyfus
>         http://hcpnet.free.fr/pubz
>         manu@xxxxxxxxxx
>         _______________________________________________
>         Gluster-devel mailing list
>         Gluster-devel@xxxxxxxxxxx
>         http://supercolony.gluster.org/mailman/listinfo/gluster-devel
> 
> 
> 
> 