Apologies if this is late... Please pull from the 'server-cluster-locking-api' branch at

git://linux-nfs.org/~bfields/linux.git server-cluster-locking-api

for a series of patches which allow NFS to export the locking functionality provided by filesystems which define their own ->lock() method (cluster filesystems being the interesting case, and GFS2 the first example). There's also a little miscellaneous locks.c cleanup along the way.

This has gone through an iteration or two with linux-fsdevel and sat in -mm for a couple of weeks, and Trond has made a pass through it. We've tested it by running cthon -l on ext3, gfs2, and NFS exports of the two, in addition to doing some manual testing to ensure correct handling of conflicts across multiple servers in a gfs2 cluster.

--b.

---
 fs/fuse/file.c                |    3 +-
 fs/gfs2/locking/dlm/plock.c   |  109 +++++++++++++++--
 fs/gfs2/locking/nolock/main.c |    8 +-
 fs/gfs2/ops_file.c            |   12 +-
 fs/lockd/svc4proc.c           |    6 +-
 fs/lockd/svclock.c            |  275 ++++++++++++++++++++++++++++++++++-------
 fs/lockd/svcproc.c            |    7 +-
 fs/lockd/svcsubs.c            |    2 +-
 fs/locks.c                    |  264 +++++++++++++++----------------
 fs/nfs/file.c                 |    7 +-
 fs/nfs/nfs4proc.c             |    1 +
 fs/nfsd/nfs4state.c           |   30 +++--
 include/linux/fcntl.h         |    4 +
 include/linux/fs.h            |    9 +-
 include/linux/lockd/lockd.h   |   14 ++-
 15 files changed, 546 insertions(+), 205 deletions(-)

commit 586759f03e2e9031ac5589912a51a909ed53c30a
Author: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Date:   Tue Nov 14 16:37:25 2006 -0500

gfs2: nfs lock support for gfs2

Add NFS lock support to GFS2.

Signed-off-by: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxxxxxx>
Acked-by: Steven Whitehouse <swhiteho@xxxxxxxxxx>

commit 1a8322b2b02071b0c7ac37a28357b93e6362f13e
Author: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Date:   Tue Nov 28 16:27:06 2006 -0500

lockd: add code to handle deferred lock requests

Rewrite nlmsvc_lock() to use the asynchronous interface.
As with testlock, we answer NLM requests in nlmsvc_lock() by first looking up the block, and then using the results we find in the block if B_QUEUED is set, or calling vfs_lock_file() otherwise. If this is a new lock request and we get an -EINPROGRESS return on a non-blocking request, then we defer the request.

Also modify nlmsvc_unlock() to call the filesystem method if appropriate.

Signed-off-by: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxxxxxx>

commit f812048020282fdfa9b72a6cf539c33b6df1fd07
Author: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Date:   Tue Dec 5 23:48:10 2006 -0500

lockd: always preallocate block in nlmsvc_lock()

Normally we could skip ever having to allocate a block in the case where the client asks for a non-blocking lock, or asks for a blocking lock that succeeds immediately.

However, we're going to want to always look up a block first in order to check whether we're revisiting a deferred lock call, and to be prepared to handle the case where the filesystem returns -EINPROGRESS. In that case we want to make sure the lock we've given the filesystem is the one embedded in the block that we'll use to track the deferred request.

Signed-off-by: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxxxxxx>

commit 5ea0d75037b93baa453b4d326c6319968fe91cea
Author: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Date:   Tue Nov 28 16:27:06 2006 -0500

lockd: handle test_lock deferrals

Rewrite nlmsvc_testlock() to use the new asynchronous interface: instead of immediately doing a posix_test_lock(), we first look for a matching block. If the subsequent test_lock returns anything other than -EINPROGRESS, we then remove the block we've found and return the results. If it returns -EINPROGRESS, then we defer the lock request.

In the case where the block we find in the first step has B_QUEUED set, we bypass vfs_test_lock() entirely, instead using the block to decide how to respond:

- with nlm_lck_denied if B_TIMED_OUT is set;
- with nlm_granted if B_GOT_CALLBACK is set;
- by dropping the request if neither B_TIMED_OUT nor B_GOT_CALLBACK is set.

Signed-off-by: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxxxxxx>

commit 85f3f1b3f7a6197b51a2ab98d927517df730214c
Author: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Date:   Tue Nov 28 16:27:06 2006 -0500

lockd: pass cookie in nlmsvc_testlock

Change the NLM internal interface to pass more information for test lock; we need this to make sure the cookie information is pushed down to the place where we do request deferral, which is handled for testlock by the following patch.

Signed-off-by: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxxxxxx>

commit 0e4ac9d93515b27fd7635332d73eae3192ed5d4e
Author: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Date:   Tue Nov 28 16:26:51 2006 -0500

lockd: handle fl_grant callbacks

Add code to handle the filesystem callback when the lock is finally granted.

Signed-off-by: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxxxxxx>

commit 2b36f412ab6f2e5b64af9832b20eb7ef67d025b4
Author: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Date:   Tue Nov 28 16:26:47 2006 -0500

lockd: save lock state on deferral

We need to keep some state for a pending asynchronous lock request, so this patch adds that state to struct nlm_block.

This also adds a function which defers the request, by calling rqstp->rq_chandle.defer and storing the resulting deferred request in an nlm_block structure which we insert into lockd's global block list.

That new function isn't called yet, so it's dead code until a later patch.

Signed-off-by: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxxxxxx>

commit 2beb6614f5e36c6165b704c167d82ef3e4ceaa0c
Author: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Date:   Tue Dec 5 23:31:28 2006 -0500

locks: add fl_grant callback for asynchronous lock return

Acquiring a lock on a cluster filesystem may require communication with remote hosts, and to avoid blocking lockd or nfsd threads during such communication, we allow the results to be returned asynchronously.

When a ->lock() call needs to block, the filesystem will return -EINPROGRESS, and then later return the results with a call to the routine in the fl_grant field of the lock_manager_operations struct.

This differs from the case when ->lock() returns -EAGAIN to a blocking lock request; in that case, the filesystem calls fl_notify when the lock is granted, and the caller retries the original lock. So while fl_notify is merely a hint to the caller that it should retry, fl_grant actually communicates the final result of the lock operation (with the lock already acquired in the successful case).

Therefore fl_grant takes a lock, a status and, for the test lock case, a conflicting lock. We also allow fl_grant to return an error to the filesystem, to handle the case where the fl_grant request arrives after the lock manager has already given up waiting for it.

Signed-off-by: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxxxxxx>

commit fd85b8170dabbf021987875ef7f903791f4f181e
Author: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Date:   Tue Nov 28 16:26:41 2006 -0500

nfsd4: Convert NFSv4 to new lock interface

Convert NFSv4 to the new lock interface. We don't define any callback for now, so we're not taking advantage of the asynchronous feature; that's less critical for the multi-threaded nfsd than it is for the single-threaded lockd. But this does allow cluster filesystems to export cluster-coherent locking to NFS.
Note that it's cluster filesystems that are the issue: of the filesystems that define lock methods (nfs, cifs, etc.), most are not exportable by nfsd.

Signed-off-by: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxxxxxx>

commit 9b9d2ab4154a42ea4a119f7d3e4e0288bfe0bb79
Author: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Date:   Thu Jan 18 17:52:58 2007 -0500

locks: add lock cancel command

Lock managers need to be able to cancel pending lock requests. In the case where the exported filesystem manages its own locks, it's not sufficient just to call posix_unblock_lock(); we need to let the filesystem know what's happening too.

We do this by adding a new fcntl lock command: F_CANCELLK. Some day this might also be made available to userspace applications that could benefit from an asynchronous locking API.

Signed-off-by: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Signed-off-by: "J. Bruce Fields" <bfields@xxxxxxxxxxxxxx>

commit 150b393456e5a23513cace286a019e87151e47f0
Author: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Date:   Thu Jan 18 16:15:35 2007 -0500

locks: allow {vfs,posix}_lock_file to return conflicting lock

The NFSv4 protocol's lock operation, in the case of a conflict, returns information about the conflicting lock. It's unclear how clients can use this, so for now we're not going so far as to add a filesystem method that can return a conflicting lock, but we may as well return something in the local case when it's easy to.

Signed-off-by: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Signed-off-by: "J. Bruce Fields" <bfields@xxxxxxxxxxxxxx>

commit 7723ec9777d9832849b76475b1a21a2872a40d20
Author: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Date:   Thu Jan 18 15:08:55 2007 -0500

locks: factor out generic/filesystem switch from setlock code

Factor out the code that switches between generic and filesystem-specific lock methods; eventually we want to call this from lock managers (lockd and nfsd) too; currently they only call the generic methods.
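The switch being factored out here, together with the reason a cancel command must reach the filesystem, can be sketched as a single dispatch point: use the filesystem's own ->lock() when it has one, otherwise fall back to the generic code. This is a hypothetical user-space model, not the kernel's vfs_lock_file(); the sk_ names, the command enum (standing in for F_SETLK, F_GETLK and F_CANCELLK) and the counters are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for F_SETLK, F_GETLK and F_CANCELLK. */
enum sk_cmd { SK_SETLK, SK_GETLK, SK_CANCELLK };

struct sk_file;
typedef int (*sk_lock_fn)(struct sk_file *, enum sk_cmd);

struct sk_file {
    sk_lock_fn lock;   /* NULL means no filesystem-specific ->lock() */
    int generic_calls; /* how often the generic code ran */
    int last_cmd;      /* last command seen by the filesystem method */
};

/* Generic path (models the posix_* helpers, e.g. posix_unblock_lock()
 * for a cancel); only counts calls here. */
static int sk_generic_lock(struct sk_file *f, enum sk_cmd cmd)
{
    (void)cmd;
    f->generic_calls++;
    return 0;
}

/* Example method of a hypothetical cluster filesystem: it records the
 * command, showing that a cancel is delivered to the filesystem. */
static int sk_cluster_lock(struct sk_file *f, enum sk_cmd cmd)
{
    f->last_cmd = (int)cmd;
    return 0;
}

/* The factored-out switch: one entry point that the fcntl path and,
 * eventually, lockd and nfsd can all share. */
static int sk_lock_file(struct sk_file *f, enum sk_cmd cmd)
{
    assert(f != NULL);
    if (f->lock)
        return f->lock(f, cmd);
    return sk_generic_lock(f, cmd);
}
```

With the switch in one place, a lock manager calling sk_lock_file() gets filesystem-specific behaviour for free, including cancels that would otherwise stop at the generic layer.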
This patch does that for all the setlk code.

Signed-off-by: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Signed-off-by: "J. Bruce Fields" <bfields@xxxxxxxxxxxxxx>

commit 3ee17abd14c728d4e0ca7a991c58f2250cb091af
Author: J. Bruce Fields <bfields@xxxxxxxxxxxxxx>
Date:   Wed Feb 21 00:58:50 2007 -0500

locks: factor out generic/filesystem switch from test_lock

Factor out the code that switches between generic and filesystem-specific lock methods; eventually we want to call this from lock managers (lockd and nfsd) too; currently they only call the generic methods.

This patch does that for test_lock.

Note that this hasn't been necessary until recently, because the few filesystems that define ->lock() (nfs, cifs...) aren't exportable via NFS. However, GFS (and, in the future, other cluster filesystems) need to implement their own locking to get cluster-coherent locking, and also want to be able to export locking to NFS (lockd and NFSv4).

So we accomplish this by factoring out code such as this and exporting it for the use of lockd and nfsd.

Signed-off-by: "J. Bruce Fields" <bfields@xxxxxxxxxxxxxx>

commit 9d6a8c5c213e34c475e72b245a8eb709258e968c
Author: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Date:   Wed Feb 21 00:55:18 2007 -0500

locks: give posix_test_lock same interface as ->lock

posix_test_lock() and ->lock() do the same job but have gratuitously different interfaces. Modify posix_test_lock() so the two agree, simplifying some code in the process.

Signed-off-by: Marc Eshel <eshel@xxxxxxxxxxxxxxx>
Signed-off-by: "J. Bruce Fields" <bfields@xxxxxxxxxxxxxx>

commit 70cc6487a4e08b8698c0e2ec935fb48d10490162
Author: J. Bruce Fields <bfields@xxxxxxxxxxxxxx>
Date:   Thu Feb 22 18:48:53 2007 -0500

locks: make ->lock release private data before returning in GETLK case

The file_lock argument to ->lock() is used to return the conflicting lock when found. There's no reason for the filesystem to return any private information with this conflicting lock, but the nfsv4 client does.
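The shared test-lock convention these commits rely on, where the caller's lock structure doubles as the place a conflicting lock is returned, can be sketched as follows: on conflict the request is overwritten with the conflicting lock, otherwise its type becomes F_UNLCK. This is a simplified user-space model under stated assumptions (the sk_ names, integer owners and inclusive byte ranges are hypothetical), not locks.c itself; F_RDLCK, F_WRLCK and F_UNLCK are the standard <fcntl.h> constants.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

/* Hypothetical, simplified lock record. */
struct sk_lock {
    int owner;
    short type;           /* F_RDLCK, F_WRLCK or F_UNLCK */
    long long start, end; /* inclusive byte range */
};

/* Two locks conflict if they have different owners, overlapping
 * ranges, and at least one of them is a write lock. */
static int sk_conflict(const struct sk_lock *held, const struct sk_lock *req)
{
    if (held->owner == req->owner)
        return 0;
    if (held->end < req->start || req->end < held->start)
        return 0;
    return held->type == F_WRLCK || req->type == F_WRLCK;
}

/* Test-lock in the ->lock()-style convention: the result comes back
 * in *req itself, so callers inspect one structure either way. */
static void sk_test_lock(const struct sk_lock *held, size_t n,
                         struct sk_lock *req)
{
    assert(req != NULL);
    for (size_t i = 0; i < n; i++) {
        if (sk_conflict(&held[i], req)) {
            *req = held[i]; /* report the first conflicting lock */
            return;
        }
    }
    req->type = F_UNLCK; /* no conflict found */
}
```

Because the conflicting lock is copied into the caller's structure, any private data attached to it travels along too, which is why the GETLK case wants such data released before returning.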
Fix the nfsv4 client, and modify locks.c to stop calling fl_release_private for it in this case.

Signed-off-by: "J. Bruce Fields" <bfields@xxxxxxxxxxxxxx>
Cc: "Trond Myklebust" <Trond.Myklebust@xxxxxxxxxx>

commit c2fa1b8a6c059dd08a802545fed3badc8df2adc1
Author: J. Bruce Fields <bfields@xxxxxxxxxxxxxx>
Date:   Tue Feb 20 16:10:11 2007 -0500

locks: create posix-to-flock helper functions

Factor out a bit of messy code by creating posix-to-flock counterparts to the existing flock-to-posix helper functions.

Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Signed-off-by: "J. Bruce Fields" <bfields@xxxxxxxxxxxxxx>

commit 226a998dbf3c6f9b85f67d08a52c5a2143ed9d88
Author: J. Bruce Fields <bfields@xxxxxxxxxxxxxx>
Date:   Wed Feb 14 14:25:00 2007 -0500

locks: trivial removal of unnecessary parentheses

Remove some unnecessary parentheses.

Signed-off-by: "J. Bruce Fields" <bfields@xxxxxxxxxxxxxx>
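The flock-to-posix and posix-to-flock helpers convert between the userspace struct flock (a start plus a length, where l_len == 0 means "to end of file") and an internal inclusive [start, end] range. A minimal user-space sketch of the pair follows, assuming only SEEK_SET offsets; the sk_ names and SK_OFFSET_MAX (standing in for the kernel's OFFSET_MAX) are hypothetical, l_pid handling is omitted, and overflow checks are left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>  /* SEEK_SET */
#include <string.h>

#define SK_OFFSET_MAX INT64_MAX /* hypothetical stand-in for OFFSET_MAX */

struct sk_lock {
    short type;
    int64_t start, end; /* inclusive; end == SK_OFFSET_MAX: unbounded */
};

/* flock -> internal range (SEEK_CUR/SEEK_END omitted in this sketch). */
static int sk_flock_to_posix(const struct flock *fl, struct sk_lock *lk)
{
    int64_t start = fl->l_start;
    int64_t len = fl->l_len;
    int64_t end;

    if (fl->l_whence != SEEK_SET)
        return -EINVAL;
    if (len > 0) {
        end = start + len - 1;
    } else if (len < 0) {
        end = start - 1; /* POSIX: range is [start + len, start - 1] */
        start += len;
    } else {
        end = SK_OFFSET_MAX; /* l_len == 0: lock through end of file */
    }
    if (start < 0)
        return -EINVAL;

    lk->type = fl->l_type;
    lk->start = start;
    lk->end = end;
    return 0;
}

/* internal range -> flock: the posix-to-flock counterpart. */
static void sk_posix_to_flock(const struct sk_lock *lk, struct flock *fl)
{
    assert(lk->start >= 0 && lk->end >= lk->start);
    memset(fl, 0, sizeof *fl);
    fl->l_type = lk->type;
    fl->l_whence = SEEK_SET;
    fl->l_start = lk->start;
    /* an unbounded range maps back to l_len == 0 */
    fl->l_len = lk->end == SK_OFFSET_MAX ? 0 : lk->end - lk->start + 1;
}
```

For example, l_start = 10 with l_len = 5 becomes the range [10, 14] and converts back to the same pair, while l_len = -4 describes [6, 9].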