> On Nov 7, 2022, at 15:34, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote: > > > >> On Nov 7, 2022, at 3:22 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote: >> >> On Mon, 2022-11-07 at 18:42 +0000, Trond Myklebust wrote: >>> >>>> On Nov 7, 2022, at 09:12, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote: >>>> >>>> >>>> >>>>> On Nov 7, 2022, at 5:48 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote: >>>>> >>>>> On Sun, 2022-11-06 at 14:02 -0500, trondmy@xxxxxxxxxx wrote: >>>>>> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> >>>>>> >>>>>> vfs_lock_file() expects the struct file_lock to be fully initialised by >>>>>> the caller. Re-exported NFSv3 has been seen to Oops if the fl_file field >>>>>> is NULL. >>>>>> >>>>>> Fixes: aec158242b87 ("lockd: set fl_owner when unlocking files") >>>>>> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> >>>>>> --- >>>>>> fs/lockd/svcsubs.c | 17 ++++++++++------- >>>>>> 1 file changed, 10 insertions(+), 7 deletions(-) >>>>>> >>>>>> diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c >>>>>> index e1c4617de771..3515f17eaf3f 100644 >>>>>> --- a/fs/lockd/svcsubs.c >>>>>> +++ b/fs/lockd/svcsubs.c >>>>>> @@ -176,7 +176,7 @@ nlm_delete_file(struct nlm_file *file) >>>>>> } >>>>>> } >>>>>> >>>>>> -static int nlm_unlock_files(struct nlm_file *file, fl_owner_t owner) >>>>>> +static int nlm_unlock_files(struct nlm_file *file, const struct file_lock *fl) >>>>>> { >>>>>> struct file_lock lock; >>>>>> >>>>>> @@ -184,12 +184,15 @@ static int nlm_unlock_files(struct nlm_file *file, fl_owner_t owner) >>>>>> lock.fl_type = F_UNLCK; >>>>>> lock.fl_start = 0; >>>>>> lock.fl_end = OFFSET_MAX; >>>>>> - lock.fl_owner = owner; >>>>>> - if (file->f_file[O_RDONLY] && >>>>>> - vfs_lock_file(file->f_file[O_RDONLY], F_SETLK, &lock, NULL)) >>>>>> + lock.fl_owner = fl->fl_owner; >>>>>> + lock.fl_pid = fl->fl_pid; >>>>>> + lock.fl_flags = FL_POSIX; >>>>>> + >>>>>> + lock.fl_file = file->f_file[O_RDONLY]; >>>>>> + if (lock.fl_file && vfs_lock_file(lock.fl_file, F_SETLK, &lock, NULL)) >>>>>> goto out_err; >>>>>> - if (file->f_file[O_WRONLY] && >>>>>> - vfs_lock_file(file->f_file[O_WRONLY], F_SETLK, &lock, NULL)) >>>>>> + lock.fl_file = file->f_file[O_WRONLY]; >>>>>> + if (lock.fl_file && vfs_lock_file(lock.fl_file, F_SETLK, &lock, NULL)) >>>>>> goto out_err; >>>>>> return 0; >>>>>> out_err: >>>>>> @@ -226,7 +229,7 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file, >>>>>> if (match(lockhost, host)) { >>>>>> >>>>>> spin_unlock(&flctx->flc_lock); >>>>>> - if (nlm_unlock_files(file, fl->fl_owner)) >>>>>> + if (nlm_unlock_files(file, fl)) >>>>>> return 1; >>>>>> goto again; >>>>>> } >>>>> >>>>> Good catch. >>>>> >>>>> I wonder if we ought to roll an initializer function for file_locks to >>>>> make it harder for callers to miss setting some fields like this? One >>>>> idea: we could change vfs_lock_file to *not* take a file argument, and >>>>> insist that the caller fill out fl_file when calling it? That would make >>>>> it harder to screw this up. >>>>> >>>>> In any case, let's take this patch in the interim while we consider >>>>> whether and how to clean this up. >>>>> >>>>> Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx> >>>> >>>> Since this doesn't fix breakage in 6.1-rc, I plan to take it for 6.2. >>>> If all y'all feel the fix is more urgent than that, let me know. >>> >>> >>> It is relevant to fixing https://bugzilla.kernel.org/show_bug.cgi?id=216582 >>> No idea how urgent that is... >>> >> >> Seems like it's technically a regression then. Prior to aec158242b87, >> those locks were being ignored. Now that we actually try to unlock them, >> this causes a crash. > > The reporter can reproduce a crash back to v5.16. So, it's a regression, > but not one in v6.1-rc. I'm trying to be more strict about that to prevent > quickly backporting fixes that have bugs. > > >> I move for sending it to mainline sooner rather than later. > > I'd rather give this one more time in linux-next. The Fixes: tag will > trigger automatic backport once v6.2-rc1 closes. The fix is available > to the reporter to apply to his kernel. The regression occurs in 5.16, because that was when Bruce merged his patches to enable locking when doing NFS re-exporting. The workaround is therefore to mount the NFS filesystem with -o nolock on the re-exporting server. That said, the patch is trivially correct. _________________________________ Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx