Hi all,

Even after applying the above two patches, I have discovered a new set
of NLM 4 requests that break lockd. Unfortunately, I don't have enough
experience to suggest a fix, but would be glad to test anyone's attempt.

All requests are non-blocking.

Scenario A
==========

lock(offset=UINT64_MAX, len=100) - GRANTED

free_all() - never finishes and lockd thread is stuck busy looping

Scenario B
==========

lock(svid=1, offset=UINT64_MAX, len=100) - GRANTED

test(svid=2, offset=UINT64_MAX, len=50) - DENIED
correct, holder offset, len are (UINT64_MAX, 100)

test(svid=2, offset=75, len=10) - DENIED
wrong, because holder (offset, len) are wrong (UINT64_MAX, 100),
because the above lock overflows during comparison to (49, 50)

Scenario C
==========

lock(svid=1, offset=UINT64_MAX, len=100) - GRANTED

test(svid=2, offset=UINT64_MAX, len=50) - DENIED
correct, holder offset, len are (UINT64_MAX, 100)

unlock(svid=1, offset=UINT64_MAX, len=50) - GRANTED
weird, because it has now created a lock at
(offset=UINT64_MAX + 50, len=50)
not sure what the correct behavior should be here - FBIG error?

test(svid=2, offset=75, len=10) - DENIED
wrong, because holder offset, len are wrong (49, 50),
because the above unlock has overflowed the offset

-Jan

On Wed, Jul 20, 2022 at 4:01 PM Jan Kasiak <j.kasiak@xxxxxxxxx> wrote:
>
> Applying two commits from the Linux master branch seems to have fixed
> the problem:
>
> aec158242b87a43d83322e99bc71ab4428e5ab79
> 1197eb5906a5464dbaea24cac296dfc38499cc00
>
> -Jan
>
> On Wed, Jul 20, 2022 at 2:46 PM Jan Kasiak <j.kasiak@xxxxxxxxx> wrote:
> >
> > Hi all,
> >
> > I'm writing my own NFS client, and while trying to test it, I've come
> > across a way to get the lockd thread into an infinite loop and stop
> > accepting any new requests.
> >
> > Kernel Version: Linux ubuntu-jammy 5.15.0-41-generic
> >
> > The client is a python program, and it does not run rpcbind, NLM, etc...
> >
> > I issue an NM_LOCK (procedure 22) request with block set to false, and
> > get a GRANTED reply.
> >
> > I then issue a FREE_ALL (procedure 23) request, and the lockd thread
> > gets stuck in nlm_traverse_locks - it matches the host, calls
> > nlm_unlock_files, and then jumps to the again label, and repeats this
> > loop forever.
> >
> > It's not clear to me who is supposed to unset the host from the lock?
> > Any pointers as to why there is a jump to again?
> >
> > Thanks,
> > -Jan
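
For reference, the (49, 50) holder in scenarios B and C is what plain
unsigned 64-bit wraparound would produce. Below is a minimal Python
sketch of that arithmetic only. It is not lockd's code: the function
names are hypothetical, and the half-open overlap check is an assumed
model of range comparison, not the kernel's actual conversion or
conflict logic.

```python
UINT64_MAX = 2**64 - 1  # largest offset an unsigned 64-bit field can carry


def overflows(offset, length):
    """True when offset + length does not fit in an unsigned 64-bit value."""
    return offset + length > UINT64_MAX


def wrapped_end(offset, length):
    """Exclusive end of the range as modular 64-bit arithmetic computes it."""
    return (offset + length) & UINT64_MAX


def remainder_after_unlock(lock, unlock_len):
    """Hypothetical model: the lock left after unlocking the first
    unlock_len bytes, with the new start offset wrapping modulo 2**64."""
    offset, length = lock
    return ((offset + unlock_len) & UINT64_MAX, length - unlock_len)


def ranges_overlap(a, b):
    """Half-open interval overlap on (offset, len) pairs, with no
    wraparound handling (assumed model of the comparison)."""
    a_off, a_len = a
    b_off, b_len = b
    return a_off < b_off + b_len and b_off < a_off + a_len
```

With these definitions, overflows(UINT64_MAX, 100) is true,
remainder_after_unlock((UINT64_MAX, 100), 50) gives (49, 50), and
ranges_overlap((49, 50), (75, 10)) is true, which is consistent with
test(svid=2, offset=75, len=10) being DENIED in scenario C. A server
that rejected the request whenever overflows() holds would never reach
this state.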