Hey,
Sorry for the late reply; I missed this e-mail. With respect to identifying locking domains, we use the same logic that GlusterFS itself uses to identify them, which (if I'm not mistaken) is just a simple string comparison. Locking domains belonging to system processes (SHD/rebalance) are treated identically to any other. This is especially critical for things like DHT healing, since that locking domain is used both in userland and by SHDs (you cannot disable DHT healing).
To illustrate this, consider the case where an SHD holds a lock for a DHT heal but can't complete it because of a GFID split-brain. A user then comes along and hammers that directory trying to get a lock, and you can pretty much kiss your cluster good-bye after that :).
With this in mind, we explicitly chose not to respect system-process (SHD/rebalance) locks any more than user lock requests, since they are just as likely (if not more so) to cause a cluster to fall over (see the example above). Although this might seem unwise at first, I'd put forth that having clusters fall over catastrophically pushes far worse decisions onto operators, such as re-kicking random bricks or entire clusters in desperate attempts to free locks (in our experience the CLI is often unable to free them) or to stop runaway memory consumption as frames pile up on the bricks. To date, we haven't observed a single instance of data corruption due to this feature (and we've been looking for it!).
We've even used it on clusters that were on the verge of falling over: we enabled revocation and the entire system stabilized almost instantly (it's really like magic when you see it :) ). Hope this helps!
Richard
From: raghavendra.hg@xxxxxxxxx [raghavendra.hg@xxxxxxxxx] on behalf of Raghavendra G [raghavendra@xxxxxxxxxxx]
Sent: Tuesday, January 26, 2016 9:49 PM
To: Raghavendra Gowdappa
Cc: Richard Wareing; Gluster Devel
Subject: Re: Feature: Automagic lock-revocation for features/locks xlator (v3.7.x)

On Mon, Jan 25, 2016 at 10:39 AM, Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:
I missed this point in my previous mail. Now I remember that we can use frame->root->pid (being negative) to identify internal processes. Was this the approach you followed to identify locks from the rebalance process?
These two domains are used for locks that synchronize among and between rebalance process(es) and client(s). So, there is an equal probability that these locks might be requests from clients, and hence applications can see some file operations failing.

Raghavendra G
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel