Re: Query regards to heal xattr heal in dht

Raghavendra G <raghavendra@xxxxxxxxxxx> · Thu, 15 Sep 2016 15:01:13 +0530

On Thu, Sep 15, 2016 at 12:02 PM, Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:

On 8 September 2016 at 12:02, Mohit Agrawal <moagrawa@xxxxxxxxxx> wrote:
Hi All,

   I have another solution for healing user xattrs, but before implementing it I would like to discuss it with you.

   Can I call dht_dir_xattr_heal (which internally calls syncop_setxattr) to heal xattrs at the end of dht_getxattr_cbk, after making sure we have a valid xattr?
   dht_dir_xattr_heal could either blindly copy all user xattrs to every subvolume, or I could compare each subvol's xattrs with the valid xattr and call syncop_setxattr only if there is a mismatch.

This can be problematic if a particular xattr is being removed - it might still exist on some subvols. IIUC, the heal would go and reset it again? 

One option is to use the hash subvol for the dir as the source - so perform xattr op on hashed subvol first and on the others only if it succeeds on the hashed. This does have the problem of being unable to set xattrs if the hashed subvol is unavailable. This might not be such a big deal in case of distributed replicate or distribute disperse volumes but will affect pure distribute. However, this way we can at least be reasonably certain of the correctness (leaving rebalance out of the picture).
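
A minimal sketch of the "hashed subvol as source" idea, written in Python rather than the C used by GlusterFS. Everything here is a hypothetical model and not DHT code: a subvolume is a plain dict of xattrs, None stands for a subvolume that is down, and the function names are made up for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical model: xattr name -> value, or None when the subvol is down.
Subvol = Optional[Dict[str, bytes]]


def setxattr_hashed_first(subvols: List[Subvol], hashed: int,
                          name: str, value: bytes) -> bool:
    """Apply the xattr on the hashed subvol first, then on the others."""
    if subvols[hashed] is None:
        # Hashed subvol unavailable: fail without touching any other subvol.
        return False
    subvols[hashed][name] = value
    for i, sv in enumerate(subvols):
        if i != hashed and sv is not None:
            sv[name] = value
    return True


def heal_from_hashed(subvols: List[Subvol], hashed: int) -> bool:
    """Make every reachable subvol match the hashed subvol's user xattrs."""
    source = subvols[hashed]
    if source is None:
        return False
    for i, sv in enumerate(subvols):
        if i == hashed or sv is None:
            continue
        # Drop user xattrs that no longer exist on the source, so a removed
        # xattr is not resurrected by the heal.
        for key in [k for k in sv if k.startswith("user.") and k not in source]:
            del sv[key]
        for key, val in source.items():
            if key.startswith("user."):
                sv[key] = val
    return True
```

Because the hashed subvol is the only source, a stale copy on a non-hashed subvol can never win, at the cost of failing the operation while the hashed subvol is down.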

* What is the behavior of getxattr when the hashed subvol is down? Should we succeed with values from non-hashed subvols, or should we fail getxattr? With the hashed subvol as the source of truth, it's difficult to determine the correctness of xattrs and their values when it is down.

* setxattr is an inode operation (as opposed to an entry operation). So we cannot calculate the hashed subvol, because in (get)(set)xattr the parent layout and "basename" are not available. This forces us to store the hashed subvol in the inode-ctx. Now, when the hashed subvol changes, we need to update these inode-ctxs too.

What do you think about a Quorum based solution to this problem?

1. setxattr succeeds only if it is successful on at least (n/2 + 1) subvols.
2. getxattr succeeds only if it is successful and the values match on at least (n/2 + 1) subvols.
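
The two quorum rules above could be sketched roughly as follows. This is a Python model and not GlusterFS code: subvolumes are plain dicts, None stands for a subvolume that is down, n/2 is taken as integer division, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical model: xattr name -> value, or None when the subvol is down.
Subvol = Optional[Dict[str, bytes]]


def quorum(n: int) -> int:
    """Smallest number of subvols that forms a majority: n/2 + 1."""
    return n // 2 + 1


def quorum_setxattr(subvols: List[Subvol], name: str, value: bytes) -> bool:
    """Report success only if the xattr was set on at least n/2 + 1 subvols."""
    ok = 0
    for sv in subvols:
        if sv is not None:
            sv[name] = value
            ok += 1
    # Note: partial writes are not rolled back in this sketch.
    return ok >= quorum(len(subvols))


def quorum_getxattr(subvols: List[Subvol], name: str) -> Optional[bytes]:
    """Return the value only if at least n/2 + 1 subvols agree on it."""
    values = Counter(sv[name] for sv in subvols
                     if sv is not None and name in sv)
    if not values:
        return None
    value, count = values.most_common(1)[0]
    return value if count >= quorum(len(subvols)) else None
```

With three subvols and one of them down, both calls still succeed; once the two remaining subvols disagree on a value, quorum_getxattr returns None.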

The flip side of this solution is that we are increasing the probability of failure of (get)(set)xattr operations compared to the hashed-subvol-as-source-of-truth solution. Or are we? How do we compare the probability of the hashed subvol going down with the probability of (n/2 + 1) nodes going down simultaneously? Is it 1/n vs (1/n * 1/n * ... (n/2 + 1) times)? Is 1/n the correct probability for _a specific subvol (the hashed subvol)_ going down (as opposed to _any one subvol_ going down)?
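
One way to put numbers on this comparison is sketched below, under an assumption that is not in the thread: every subvol is down independently with the same probability p. Under that assumption, the chance that one specific subvol (the hashed subvol) is down is p itself. Quorum is lost as soon as fewer than n/2 + 1 subvols are up, that is, when at least n - (n/2 + 1) + 1 subvols are down, which is a binomial tail. The function name is hypothetical.

```python
from math import comb


def quorum_loss_probability(n: int, p: float) -> float:
    """Probability that fewer than n/2 + 1 of n subvols are up.

    Assumes each subvol is down independently with probability p.
    """
    need_up = n // 2 + 1
    # Smallest number of simultaneous failures that breaks quorum.
    min_down = n - need_up + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(min_down, n + 1))
```

For example, with n = 3 and p = 0.01 the quorum is lost with probability 0.000298, against 0.01 for the hashed subvol alone. With n = 2 the quorum needs both subvols, so it is lost with probability 0.0199, which is worse than relying on the hashed subvol.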

   Let me know if this approach is suitable.

Regards
Mohit Agrawal

On Wed, Sep 7, 2016 at 10:27 PM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:

On Wed, Sep 7, 2016 at 9:46 PM, Mohit Agrawal <moagrawa@xxxxxxxxxx> wrote:
Hi Pranith,

In the current approach I get the list of xattrs from the first up subvolume and copy the user attributes from it to all other subvolumes.

I have assumed that the first up subvol is the source and the rest are sinks, as we do the same in dht_dir_attr_heal.

As per my understanding, the first up subvol can be different for different mounts; I could be wrong.

Regards
Mohit Agrawal

On Wed, Sep 7, 2016 at 9:34 PM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
hi Mohit,
       How does dht find which subvolume has the correct list of xattrs? i.e. how does it determine which subvolume is source and which is sink?

On Wed, Sep 7, 2016 at 2:35 PM, Mohit Agrawal <moagrawa@xxxxxxxxxx> wrote:
Hi,

  I am trying to find a solution to a problem in dht specific to healing user xattrs.
  I tried to fix it the same way we heal directory attributes, but I feel that is not the best solution.

  I would like to discuss the right way to heal xattrs with you, in case anyone has a better solution.

  Problem:
   In a distributed volume, a custom extended attribute set on a directory does not show the correct value after a brick is stopped and started. If an extended attribute is set on a directory while a brick is stopped, the value is not updated on that brick after it is started again.

  Current approach:
    1) Function set_user_xattr stores the user extended attributes in a dictionary.
    2) Function dht_dir_xattr_heal calls syncop_setxattr to update the attributes on all subvolumes.
    3) dht_dir_xattr_heal is called for every directory lookup in dht_lookup_revalidate_cbk.

  Pseudocode for function dht_dir_xattr_heal is as below:

   1) Fetch the attributes from the first up subvolume and store them in xattr.
   2) Loop over all subvolumes and fetch the existing attributes from each one.
   3) Replace the user attributes in the current attributes with the user attributes from xattr.
   4) Set the latest extended attributes (current + old user attributes) on the subvol.
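
The four steps above can be sketched as follows. This is a Python model for illustration, not the C code in DHT: subvolumes are plain dicts, None stands for a subvolume that is down, and the function name is hypothetical. Step 3 is read here as "drop the sink's user xattrs and take the source's", which is one interpretation of the pseudocode.

```python
from typing import Dict, List, Optional

# Hypothetical model: xattr name -> value, or None when the subvol is down.
Subvol = Optional[Dict[str, bytes]]

USER_PREFIX = "user."  # Linux namespace for user extended attributes


def dir_xattr_heal_sketch(subvols: List[Subvol]) -> int:
    """Copy user xattrs from the first up subvol to every other up subvol.

    Returns the number of subvols that were written.
    """
    # 1) Fetch the attributes from the first up subvol.
    source = next((sv for sv in subvols if sv is not None), None)
    if source is None:
        return 0
    xattr = {k: v for k, v in source.items() if k.startswith(USER_PREFIX)}

    written = 0
    for sv in subvols:
        if sv is None or sv is source:
            continue
        # 2) Fetch the existing attributes of this subvol.
        current = dict(sv)
        # 3) Keep non-user attributes, replace the user attributes.
        merged = {k: v for k, v in current.items()
                  if not k.startswith(USER_PREFIX)}
        merged.update(xattr)
        # 4) Set the merged attributes on the subvol.
        sv.clear()
        sv.update(merged)
        written += 1
    return written
```

Note that this writes to every up subvol unconditionally, which is the cost the mail goes on to describe.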

   The problems with the current approach are:

   1) It calls the heal function (dht_dir_xattr_heal) on every directory lookup without comparing xattrs.
   2) The function internally calls syncop_setxattr for every subvolume, which is an expensive operation.

   I have another way to fix it, shown below, but it depends on time (I am not sure whether time is in sync on all bricks):

   1) At the time of setting an extended attribute (setxattr), update the change time in the metadata on the server side.
   2) Compare the change times before calling the heal function in dht_revalidate_cbk.
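
A rough sketch of the change-time check, as a Python model and not DHT code. The SubvolState type and both function names are hypothetical, and treating the subvol with the newest change time as the heal source is an assumption added here; it is only meaningful if the bricks' clocks are in sync, which is the open question above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SubvolState:
    """Hypothetical per-subvol view of one directory."""
    ctime: float                      # change time recorded at setxattr
    xattrs: Dict[str, bytes] = field(default_factory=dict)


def needs_heal(states: List[SubvolState]) -> bool:
    """Heal only when the change times differ between subvols."""
    return len({s.ctime for s in states}) > 1


def newest_source(states: List[SubvolState]) -> SubvolState:
    """Assumption: the subvol with the newest change time is the source."""
    return max(states, key=lambda s: s.ctime)
```

With this check the lookup path skips the heal entirely when all change times agree, instead of calling setxattr on every subvolume.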

    Please share your input on this.
    Appreciate your input.

Regards
Mohit Agrawal

_______________________________________________

Gluster-devel mailing list

Gluster-devel@xxxxxxxxxxx

http://www.gluster.org/mailman/listinfo/gluster-devel

-- 
Pranith

-- 
Raghavendra G
