> On Nov 8, 2022, at 11:41 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > On Tue, 2022-11-08 at 14:57 +0000, Chuck Lever III wrote: >> >>> On Nov 7, 2022, at 4:55 PM, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote: >>> >>>> On Nov 7, 2022, at 5:48 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote: >>>> >>>> On Sun, 2022-11-06 at 14:02 -0500, trondmy@xxxxxxxxxx wrote: >>>>> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> >>>>> >>>>> vfs_lock_file() expects the struct file_lock to be fully initialised by >>>>> the caller. >>> >>> As a reviewer, I don't see anything in the vfs_lock_file() kdoc >>> comment that suggests this, and vfs_lock_file() itself is just >>> a wrapper around each filesystem's f_ops->lock method. That >>> expectation is a bit deeper into NFS-specific code. A few more >>> observations below. >>> >>> >>>>> Re-exported NFSv3 has been seen to Oops if the fl_file field >>>>> is NULL. >>> >>> Needs a Link: to the bug report. Which I can add. >>> >>> This will also give us a call trace we can reference, so I won't >>> add that here. >>> >>> >>>>> Fixes: aec158242b87 ("lockd: set fl_owner when unlocking files") >>>>> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> >>>>> --- >>>>> fs/lockd/svcsubs.c | 17 ++++++++++------- >>>>> 1 file changed, 10 insertions(+), 7 deletions(-) >>>>> >>>>> diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c >>>>> index e1c4617de771..3515f17eaf3f 100644 >>>>> --- a/fs/lockd/svcsubs.c >>>>> +++ b/fs/lockd/svcsubs.c >>>>> @@ -176,7 +176,7 @@ nlm_delete_file(struct nlm_file *file) >>>>> } >>>>> } >>>>> >>>>> -static int nlm_unlock_files(struct nlm_file *file, fl_owner_t owner) >>>>> +static int nlm_unlock_files(struct nlm_file *file, const struct file_lock *fl) >>>>> { >>>>> struct file_lock lock; >>>>> >>>>> @@ -184,12 +184,15 @@ static int nlm_unlock_files(struct nlm_file *file, fl_owner_t owner) >>>>> lock.fl_type = F_UNLCK; >>>>> lock.fl_start = 0; >>>>> lock.fl_end = OFFSET_MAX; >>>>> - lock.fl_owner = owner; >>>>> - if (file->f_file[O_RDONLY] && >>>>> - vfs_lock_file(file->f_file[O_RDONLY], F_SETLK, &lock, NULL)) >>>>> + lock.fl_owner = fl->fl_owner; >>>>> + lock.fl_pid = fl->fl_pid; >>>>> + lock.fl_flags = FL_POSIX; >>>>> + >>>>> + lock.fl_file = file->f_file[O_RDONLY]; >>>>> + if (lock.fl_file && vfs_lock_file(lock.fl_file, F_SETLK, &lock, NULL)) >>>>> goto out_err; >>>>> - if (file->f_file[O_WRONLY] && >>>>> - vfs_lock_file(file->f_file[O_WRONLY], F_SETLK, &lock, NULL)) >>>>> + lock.fl_file = file->f_file[O_WRONLY]; >>>>> + if (lock.fl_file && vfs_lock_file(lock.fl_file, F_SETLK, &lock, NULL)) >>>>> goto out_err; >>>>> return 0; >>>>> out_err: >>>>> @@ -226,7 +229,7 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file, >>>>> if (match(lockhost, host)) { >>>>> >>>>> spin_unlock(&flctx->flc_lock); >>>>> - if (nlm_unlock_files(file, fl->fl_owner)) >>>>> + if (nlm_unlock_files(file, fl)) >>>>> return 1; >>>>> goto again; >>>>> } >>>> >>>> Good catch. >>>> >>>> I wonder if we ought to roll an initializer function for file_locks to >>>> make it harder for callers to miss setting some fields like this? One >>>> idea: we could change vfs_lock_file to *not* take a file argument, and >>>> insist that the caller fill out fl_file when calling it? That would make >>>> it harder to screw this up. >>> >>> Commit history shows that, at least as far back as the beginning of >>> the git era, the vfs_lock_file() call site here did not initialize >>> the fl_file field. So, this code has been working without fully >>> initializing @fl for, like, forever. >>> >>> >>> Trond later says: >>>> The regression occurs in 5.16, because that was when Bruce merged his >>>> patches to enable locking when doing NFS re-exporting. >>> >>> That means the Fixes: tag above is misleading. The proposed patch >>> doesn't actually fix that commit (which went into v5.19), it simply >>> applies on that commit. >>> >>> I haven't been able to find the locking patches mentioned here. I think >>> those bear mentioning (by commit ID) in the patch description, at least. >>> If you know the commit ID, Trond, can you pass it along? >>> >>> Though I would say that, in agreement with Jeff, the true cause of this >>> issue is the awkward synopsis for vfs_lock_file(). >> >> Since Trond has re-assigned the kernel.org bug to me... I'll blather on >> a bit more. (Yesterday's patch is still queued up, I can replace it or >> move it depending on the outcome of this discussion). >> >> -> The vfs_{test,lock,cancel}_file APIs all take a file argument. Maybe >> we shouldn't remove the @filp argument from vfs_lock_file(). >> > > They all take a file_lock argument as well. @filp is redundant in all of > them. Keeping both just increases the ambiguity. I move that we drop the > explicit argument since we need to set it in the struct anyway. Sounds good to me. > We could also consider adding a @filp arguments to locks_alloc_lock and > locks_init_lock, to make it a bit more evident that it needs to be set. > >> -> The struct file_lock * argument of vfs_lock_file() is not a const. >> > > That might be tough. Even for "request" fl's we modify some fields in > them (for example, fl_wait and fl_blocked_member). fl_file should never > change though, once it has been assigned. We could potentially make that > const. > >> After auditing the call sites, I think it would be safe for vfs_lock_file() >> to explicitly overwrite the fl->fl_file field with the value of the @filp >> argument before calling f_ops->lock. At the very least, it should sanity- >> check that the two pointer values are the same, and document that as an >> API requirement. >> >> Alternatively we could cook up an NFS-specific fix... but the vfs_lock_file >> API would still look dodgy. >> > > I see no reason to do anything NFS-specific here. I'd be fine with > WARN_ONs in locks.c for now, until we decide what to do longer term. > It's possible we have some other call chains that are not setting that > field correctly. Agreed, a WARN_ON would be a good first step. > If we can audit all of the call sites and ensure that they are properly > setting fl_file in the struct, we should be able to painlessly drop the > separate @filp argument from all of those functions. The only one I found that doesn't set fl_file close to the vfs_lock_file call site is do_lock_file_wait(). > I'll toss it onto my to-do pile. I'm assuming you mean you'll do the API clean-up, and that I should keep Trond's fix in the nfsd queue. -- Chuck Lever