> On Feb 2, 2024, at 7:26 PM, NeilBrown <neilb@xxxxxxx> wrote: > > On Sat, 03 Feb 2024, Chuck Lever III wrote: >> >> >>> On Feb 1, 2024, at 5:24 PM, NeilBrown <neilb@xxxxxxx> wrote: >>> >>> On Fri, 02 Feb 2024, Chuck Lever wrote: >>>> Passes pynfs, fstests, and the git regression suite. Please apply >>>> these to origin/linux-5.4.y. >>> >>> I should have mentioned this a day or two ago but I hadn't quite made >>> all the connection yet... >>> >>> The RELEASE_LOCKOWNER bug was masking a double-free bug that was fixed >>> by >>> Commit 47446d74f170 ("nfsd4: add refcount for nfsd4_blocked_lock") >>> which landed in v5.17 and wasn't marked as a bugfix, and so has not gone to >>> stable kernels. >> >> Then, instructions to stable@xxxxxxxxxxxxxxx: >> >> Do not apply the patches I just sent for 5.15, 5.10, and 5.4. I will >> apply 47446d74f170, run the tests again, and resend. >> >> >>> Any kernel earlier than v5.17 that receives the RELEASE_LOCKOWNER fix >>> also needs the nfsd4_blocked_lock fix. There is a minor follow-up fix >>> for that nfsd4_blocked_lock fix which Chuck queued yesterday. >>> >>> The problem scenario is that an nfsd4_lock() call finds a conflicting >>> lock and so has a reference to a particular nfsd4_blocked_lock. A concurrent >>> nfsd4_read_lockowner call frees all the nfsd4_blocked_locks including >>> the one held in nfsd4_lock(). nfsd4_lock then tries to free the >>> blocked_lock it has, and results in a double-free or a use-after-free. >>> >>> Before either patch is applied, the extra reference on the lock-owner >>> than nfsd4_lock holds causes nfsd4_realease_lockowner() to incorrectly >>> return an error and NOT free the blocks_lock. >>> With only the RELEASE_LOCKOWNER fix applied, the double-free happens. >> >> Our test suite currently does not exercise this use case, apparently. >> I didn't see a problem like this during testing. >> > > Our OpenQA testing found it (as did our customer :-). > Quoting from a bugzilla that unfortunately is not public (though might > not be accessible to anyone with an account) > > https://bugzilla.suse.com/show_bug.cgi?id=1219349 > > LTP test nfslock01.sh randomly fails on the latest SLE-15SP4 and > SLE-15SP5 KOTD. The failures appear only when testing NFS protocol > v4.0, other versions do not seem to be affected. The test either > gets stuck or sometimes triggers kernel oops. The contents of the > kernel backtrace vary. All archs appear to be affected. > > Does your test suite cover v4.0? pynfs covers v4.0 and v4.1. I ran fstests and the git regression suite with only NFSv4.0. > Does it include LTP ? kdevops doesn't include an LTP workflow at this time. > Thanks, > NeilBrown > > >> >>> With both patches applied the refcount on the nfsd4_blocked_lock prevents >>> the double-free. >>> >>> Kernels before 4.9 are (probably) not affected as they didn't have >>> find_or_allocate_block() which takes the second reference to a shared >>> object. But that is ancient history - those kernels are well past EOL. >>> >>> Thanks, >>> NeilBrown >>> >>> >>>> >>>> --- >>>> >>>> Chuck Lever (2): >>>> NFSD: Modernize nfsd4_release_lockowner() >>>> NFSD: Add documenting comment for nfsd4_release_lockowner() >>>> >>>> NeilBrown (1): >>>> nfsd: fix RELEASE_LOCKOWNER >>>> >>>> >>>> fs/nfsd/nfs4state.c | 65 +++++++++++++++++++++++++-------------------- >>>> 1 file changed, 36 insertions(+), 29 deletions(-) >>>> >>>> -- >>>> Chuck Lever >>>> >>>> >>>> >>> >> >> -- >> Chuck Lever -- Chuck Lever