Hi Bruce, The current implementation of vfs_setlease/generic_setlease/etc is a bit quirky. I've been thinking it over for the past couple of days, and I think we need to refactor it to work sensibly. As you may have noticed, I've been mulling over getting rid of the BKL in fs/locks.c and the lease interface is particularly problematic. Here's one reason why: fcntl_setlease lock_kernel vfs_setlease lock_kernel generic_setlease unlock_kernel fasync_helper unlock_kernel This is perfectly legitimate for the BKL of course, but other spinlocks can't be acquired recursively in this way. At first I thought I'd just push the call to generic_setlease into __vfs_setlease and have two ways of calling it: fcntl_setlease lock_kernel __vfs_setlease fasync_helper unlock_kernel vfs_setlease lock_kernel __vfs_setlease unlock_kernel But generic_setlease allocates memory (GFP_KERNEL) under the lock, so that's bad for converting to spnlocks later. Then I thought I'd move the spinlock acquisition into generic_setlease(). But I can't do that either as fcntl_setlease() needs to hold the spinlock around the call to generic_setlease() and fasync_helper(). Then I started to wonder about the current split of functionality between fcntl_setlease, vfs_setlease and generic_setlease. The check for no other process having this file open should be common to all filesystems. And it should be honoured by NFS (it shouldn't hand out a delegation if a local user has the file open), so it needs to be in vfs_setlease(). Now I'm worrying about the mechanics of calling out to a filesystem to perform a lock. Obviously, a network filesystem is going to have to sleep waiting for a response from the fileserver, so that precludes the use of a spinlock held over the call to f_op->setlease. I'm not keen on the idea of EXPORT_SYMBOL_GPL() a spinlock -- verifying locking is quite hard enough without worrying what a filesystem might be doing with it. So I think we need to refactor the interface, and I'd like to hear your thoughts on my ideas of how to handle this. First, have clients of vfs_setlease call lease_alloc() and pass it in, rather than allocate it on the stack and have this horrible interface where we may pass back an allocated lock. This allows us to not allocate memory (hence sleep) during generic_setlease(). Second, move some of the functionality of generic_setlease() to vfs_setlease(), as mentioned above. Third, change the purpose of f_op->setlease. We can rename it if you like to catch any out of tree users. I'd propose using it like this: vfs_setlease() if (f_op->setlease()) res = f_op->setlease() if (res) return res; lock_kernel() generic_setlease() unlock_kernel() fcntl_setlease if (f_op->setlease()) res = f_op->setlease() if (res) return res; lock_kernel generic_setlease() ... fasync ... unlock_kernel So 'f_op->setlease' now means "Try to get a lease from the fileserver". We can optimise this a bit to not even call setlease if, say, we already have a read lease and we're trying to get a second read lease. But we have to record our local leases (that way they'll show up in /proc/locks). I think the combination of these three ideas gives us a sane interface to the various setlease functions, and let us convert from lock_kernel() to spin_lock() later. Have I missed something? I don't often think about the needs of cluster filesystems, so I may misunderstand how they need this operation to work. At some point, we need to revisit the logic for 'is this file open by another process' -- it's clearly buggy since it doesn't hold the inode_lock while checking i_count, so it could race against an __iget(). -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html