Re: [Patch 4/7] tabled: retry conflicting locks

On 01/20/2010 03:16 PM, Pete Zaitcev wrote:
> On Wed, 20 Jan 2010 14:53:17 -0500, Jeff Garzik<jeff@xxxxxxxxxx>  wrote:
>> On 01/14/2010 11:13 PM, Pete Zaitcev wrote:
>>>
>>> This problem was with us for a while, and even with this fix our start-up
>>> is not reliable. But at least we will not be 100% guaranteed to hang as
>>> before when restarting too quickly. So although the whole area needs some
>>> serious reworking, this specific case was just too annoying to let it
>>> continue.
>>
>> This is not correct.  CLD has blocking locks.  You issue the LOCK op,
>> and will be notified when you have acquired the lock, possibly hours or
>> days later.  There is no need to retry anything...
>
> Meanwhile, there's no way to cancel an outstanding lock request
> short of blowing off the whole session. I'll switch to LOCK when
> you fix that, but currently TRYLOCK is the only way (which BTW you
> use in cldcli too).

Do you mean cancelling someone else's lock request? That is not something that meshes with the design. If you mean cancelling your own lock request, that's probably reasonable.

But the entire logic behind LOCK is central to what needs to be done: ensure one and only one session holds a lock, until the lock is released or the client dies (thus forcing the server to time out and release the dead session's locks).

If you are restarting quickly, a lock-timeout wait does not seem unreasonable.
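
To make the ownership rule concrete, here is a minimal sketch of it as a toy in-memory model. This is not CLD's server code or wire protocol: the LockTable class, its methods, the session names, the "/tabled/master" lock name and the 30-unit timeout are all hypothetical, chosen only for illustration. It shows why a quickly restarted client is refused until the dead session times out.

```python
class LockTable:
    """Toy model: one owner per lock, freed on unlock or session expiry."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.owner = {}      # lock name -> session id holding it
        self.last_seen = {}  # session id -> time of last activity

    def keepalive(self, sid, now):
        self.last_seen[sid] = now

    def expire(self, now):
        # Sessions silent for longer than the timeout are declared dead,
        # and every lock they held is released.
        dead = [s for s, t in self.last_seen.items()
                if now - t > self.session_timeout]
        for sid in dead:
            del self.last_seen[sid]
        self.owner = {n: o for n, o in self.owner.items() if o not in dead}
        return dead

    def trylock(self, sid, name, now):
        # Any request counts as activity for the calling session.
        self.keepalive(sid, now)
        self.expire(now)
        # Succeeds only if the lock is free or already held by this session.
        return self.owner.setdefault(name, sid) == sid

    def unlock(self, sid, name):
        if self.owner.get(name) == sid:
            del self.owner[name]


table = LockTable(session_timeout=30)
first = table.trylock("tabled-old", "/tabled/master", now=0)
# The old process dies without unlocking; its replacement is a new session.
early = table.trylock("tabled-new", "/tabled/master", now=5)
late = table.trylock("tabled-new", "/tabled/master", now=31)
print(first, early, late)  # True False True
```

The refusal at now=5 is the rule working as intended: the server cannot yet tell the old session is dead, so handing the lock over would risk two masters.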


> N.B. ncld continues with this approach. In fact currently it does not
> even have a method that performs a blocking lock.

That's definitely a problem, as blocking locks are pretty central to CLD's design. If you want to own a resource, you get a blocking lock. You only own the resource as long as the session is alive, and you have not released the lock yourself. If you do not immediately acquire the lock, then (1) you should not access the shared resource as master, and (2) you will be notified immediately when atomic lock acquisition occurs.

TRYLOCK is painful in the cloud because it encourages programmers, with patch #4 being a perfect example, to create racy polling-lock solutions where forward [lock] progress is not guaranteed. IOW, the lock-polling loop should be in the server, with the client being asynchronously notified of acquisition. TRYLOCK mainly exists for the less-common situation of "if (!trylock) exit(0)" type of cloud client execution.
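
As a rough illustration of keeping the wait inside the server, the sketch below queues waiters and hands the lock to the next one at release time, notifying it through a callback. It is a toy model under stated assumptions, not the CLD protocol or the libcldc API: LockServer, its method names, the session ids and the "/lock" name are hypothetical.

```python
from collections import deque


class LockServer:
    """Toy model: the server queues waiters and notifies on acquisition."""

    def __init__(self):
        self.owner = {}    # lock name -> session id holding it
        self.waiters = {}  # lock name -> deque of (session id, callback)

    def trylock(self, sid, name):
        # Non-blocking: succeed only if nobody holds the lock right now.
        if name in self.owner:
            return False
        self.owner[name] = sid
        return True

    def lock(self, sid, name, on_acquired):
        # Blocking request: grant at once if free, otherwise queue the
        # caller and invoke its callback when the lock is handed over.
        if name not in self.owner:
            self.owner[name] = sid
            on_acquired(name)
        else:
            self.waiters.setdefault(name, deque()).append((sid, on_acquired))

    def unlock(self, sid, name):
        if self.owner.get(name) != sid:
            raise ValueError("session does not hold this lock")
        del self.owner[name]
        queue = self.waiters.get(name)
        if queue:
            # Hand the lock straight to the oldest waiter; there is no
            # window in which a polling client could slip in ahead of it.
            next_sid, callback = queue.popleft()
            self.owner[name] = next_sid
            callback(name)


server = LockServer()
events = []
server.lock("a", "/lock", lambda name: events.append("a"))
server.lock("b", "/lock", lambda name: events.append("b"))
polled_before = server.trylock("c", "/lock")
server.unlock("a", "/lock")
polled_after = server.trylock("c", "/lock")
print(events, polled_before, polled_after)  # ['a', 'b'] False False
```

Session "b" makes one request and is told when it owns the lock. Session "c" has to keep calling trylock, and in this model it never wins for as long as any waiter is queued.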

NFS and other protocols in this space have repeatedly shown that polling locks is a painful, racy, byte-heavy solution for lock acquisition.

If there is a problem implementing blocking locks in the protocol or client, let me know, and we'll fix it.

	Jeff

