On Tue, 2022-11-15 at 13:43 +0800, Xiubo Li wrote: > On 15/11/2022 03:46, Jeff Layton wrote: > > On Mon, 2022-11-14 at 09:07 -0500, Jeff Layton wrote: > > > Ceph has a need to know whether a particular file has any locks set on > > > it. It's currently tracking that by a num_locks field in its > > > filp->private_data, but that's problematic as it tries to decrement this > > > field when releasing locks and that can race with the file being torn > > > down. > > > > > > Add a new vfs_file_has_locks helper that will scan the flock and posix > > > lists, and return true if any of the locks have a fl_file that matches > > > the given one. Ceph can then call this instead of doing its own > > > tracking. > > > > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > > > --- > > > fs/locks.c | 36 ++++++++++++++++++++++++++++++++++++ > > > include/linux/fs.h | 1 + > > > 2 files changed, 37 insertions(+) > > > > > > Xiubo, > > > > > > Here's what I was thinking instead of trying to track this within ceph. > > > Most inodes never have locks set, so in most cases this will be a NULL > > > pointer check. > > > > > > > > > > > I went ahead and added a slightly updated version of this this to my > > locks-next branch for now, but... > > > > Thinking about this more...I'm not sure this whole concept of what the > > ceph code is trying to do makes sense. Locks only conflict if they have > > different owners, and POSIX locks are owned by the process. Consider > > this scenario (obviously, this is not a problem with OFD locks). > > > > A process has the same file open via two different fds. It sets lock A > > from offset 0..9 via fd 1. Now, same process sets lock B from 10..19 via > > fd 2. The two locks will be merged, because they don't conflict (because > > it's the same process). > > > > Against which fd should the merged lock record be counted? > > Thanks Jeff. > > For the above example as you mentioned, from my reading of the lock code > after being merged it will always keep the old file_lock's fl_file. > > There is another case that if the Inode already has LockA and LockB: > > Lock A --> [0, 9] --> fileA > > Lock B --> [15, 20] --> fileB > > And then LockC comes: > > Lock C --> [8, 16] --> fileC > > Then the inode will only have the LockB: > > Lock B --> [0, 20] --> fileB. > > So the exiting ceph code seems buggy! > Yeah, there are a number of ways to end up with a different fl_file than you started with. > > > > Would it be better to always check for CEPH_I_ERROR_FILELOCK, even when > > the fd hasn't had any locks explicitly set on it? > > Maybe we should check whether any POSIX lock exist, if so we should > check CEPH_I_ERROR_FILELOCK always. Or we need to check it depending on > each fd ? > > It was originally added here: commit ff5d913dfc7142974eb1694d5fd6284658e46bc6 Author: Yan, Zheng <zyan@xxxxxxxxxx> Date: Thu Jul 25 20:16:45 2019 +0800 ceph: return -EIO if read/write against filp that lost file locks After mds evicts session, file locks get lost sliently. It's not safe to let programs continue to do read/write. Signed-off-by: "Yan, Zheng" <zyan@xxxxxxxxxx> Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx> Signed-off-by: Ilya Dryomov <idryomov@xxxxxxxxx> So I guess with the current code if you have the file open and set a lock on it, you'll get back EIO when you try to get caps for it, but if you never set a lock on the fd, then you wouldn't get an error. We don't reliably keep track of what fd was used to set a lock (as noted above), so we can't really do what Zheng was trying to do here. Having a file where some openers use locking and others don't is a really odd usage pattern though. Locks are like stoplights -- they only work if everyone pays attention to them. I think we should probably switch ceph_get_caps to just check whether any locks are set on the file. If there are POSIX/OFD/FLOCK locks on the file at the time, we should set CHECK_FILELOCK, regardless of what fd was used to set the lock. In practical terms, we probably want a vfs_inode_has_locks function, that just tests whether the flc_posix and flc_flock lists are empty. Maybe something like this instead? Then ceph could call this from ceph_get_caps and set CHECK_FILELOCK if it returns true. -------------8<--------------- [PATCH] filelock: new helper: vfs_inode_has_locks Ceph has a need to know whether a particular inode has any locks set on it. It's currently tracking that by a num_locks field in its filp->private_data, but that's problematic as it tries to decrement this field when releasing locks and that can race with the file being torn down. Add a new vfs_inode_has_locks helper that just returns whether any locks are currently held on the inode. Cc: Xiubo Li <xiubli@xxxxxxxxxx> Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx> Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> --- fs/locks.c | 23 +++++++++++++++++++++++ include/linux/fs.h | 1 + 2 files changed, 24 insertions(+) diff --git a/fs/locks.c b/fs/locks.c index 5876c8ff0edc..9ccf89b6c95d 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -2672,6 +2672,29 @@ int vfs_cancel_lock(struct file *filp, struct file_lock *fl) } EXPORT_SYMBOL_GPL(vfs_cancel_lock); +/** + * vfs_inode_has_locks - are any file locks held on @inode? + * @inode: inode to check for locks + * + * Return true if there are any FL_POSIX or FL_FLOCK locks currently + * set on @inode. + */ +bool vfs_inode_has_locks(struct inode *inode) +{ + struct file_lock_context *ctx; + bool ret; + + ctx = smp_load_acquire(&inode->i_flctx); + if (!ctx) + return false; + + spin_lock(&ctx->flc_lock); + ret = !list_empty(&ctx->flc_posix) || !list_empty(&ctx->flc_flock); + spin_unlock(&ctx->flc_lock); + return ret; +} +EXPORT_SYMBOL_GPL(vfs_inode_has_locks); + #ifdef CONFIG_PROC_FS #include <linux/proc_fs.h> #include <linux/seq_file.h> diff --git a/include/linux/fs.h b/include/linux/fs.h index e654435f1651..d6cb42b7e91c 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1170,6 +1170,7 @@ extern int locks_delete_block(struct file_lock *); extern int vfs_test_lock(struct file *, struct file_lock *); extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *); extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl); +bool vfs_inode_has_locks(struct inode *inode); extern int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl); extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type); extern void lease_get_mtime(struct inode *, struct timespec64 *time); -- 2.38.1