Re: NLM 4 Infinite Loop Bug

Jan Kasiak <j.kasiak@xxxxxxxxx> · Tue, 26 Jul 2022 13:16:44 -0400

Hi all,

Even after applying the two patches quoted below, I have found a new set
of NLM 4 requests that break lockd.

Unfortunately, I don't have enough experience to suggest a fix, but
would be glad to test anyone's attempt.

All requests are non-blocking.

Scenario A
==========
lock(offset=UINT64_MAX, len=100) - GRANTED
free_all() - never finishes and lockd thread is stuck busy looping
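
A rough Python model of the restart-after-unlock pattern that would produce
this kind of busy loop. It is only a sketch: free_all, working_unlock and
broken_unlock are hypothetical names, and the idea that the unlock leaves the
lock in place is an assumption, not something confirmed from the lockd source.
The max_passes cap exists only so the model terminates.

```python
UINT64_MAX = (1 << 64) - 1


def free_all(locks, host, unlock, max_passes=1000):
    """Rescan from the start after every unlock attempt.

    Returns the number of passes needed, or None if the cap was reached.
    """
    passes = 0
    while passes < max_passes:
        passes += 1
        match = next((lock for lock in locks if lock["host"] == host), None)
        if match is None:
            return passes          # nothing left for this host
        unlock(locks, match)       # expected to remove the lock
    return None                    # still matching after max_passes


def working_unlock(locks, lock):
    locks.remove(lock)


def broken_unlock(locks, lock):
    # Hypothetical failure mode: the unlock reports success,
    # but the lock is still attached to the host afterwards.
    pass


held = {"host": "client", "offset": UINT64_MAX, "len": 100}
print(free_all([held], "client", working_unlock))   # 2
print(free_all([held], "client", broken_unlock))    # None
```

With a working unlock the scan ends on the second pass. If the unlock is a
no-op for this lock, every pass finds the same match again.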

Scenario B
==========
lock(svid=1, offset=UINT64_MAX, len=100) - GRANTED

test(svid=2, offset=UINT64_MAX, len=50) - DENIED
correct: the holder (offset, len) is (UINT64_MAX, 100)

test(svid=2, offset=75, len=10) - DENIED
wrong, because the reported holder (offset, len) of (UINT64_MAX, 100) is
wrong: the above lock overflows during comparison to (49, 50)
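
A small Python model of the arithmetic that would explain this false conflict.
It assumes the server stores the start as a signed 64-bit value and computes
the inclusive end as offset + len - 1 with 64-bit wraparound. That assumption
reproduces the numbers above but is not taken from the lockd source; to_s64,
lock_range and overlaps are hypothetical helper names.

```python
UINT64_MAX = (1 << 64) - 1


def to_s64(value):
    """Reinterpret the low 64 bits of value as a signed 64-bit integer."""
    value &= UINT64_MAX
    return value - (1 << 64) if value >= (1 << 63) else value


def lock_range(offset, length):
    """Inclusive (start, end) as signed 64-bit values, with wraparound."""
    return to_s64(offset), to_s64(offset + length - 1)


def overlaps(a, b):
    """True if two inclusive ranges share at least one byte."""
    return a[0] <= b[1] and b[0] <= a[1]


held = lock_range(UINT64_MAX, 100)
probe = lock_range(75, 10)
print(held)                    # (-1, 98)
print(probe)                   # (75, 84)
print(overlaps(held, probe))   # True
```

Under this model the held lock looks like it covers bytes -1 through 98, so a
test on bytes 75 through 84 is reported as conflicting.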

Scenario C
==========
lock(svid=1, offset=UINT64_MAX, len=100) - GRANTED

test(svid=2, offset=UINT64_MAX, len=50) - DENIED
correct: the holder (offset, len) is (UINT64_MAX, 100)

unlock(svid=1, offset=UINT64_MAX, len=50) - GRANTED
weird, because it has now created a lock at (offset=UINT64_MAX + 50, len=50),
which wraps around to offset 49
I'm not sure what the correct behavior should be here - an FBIG error?

test(svid=2, offset=75, len=10) - DENIED
wrong, because the reported holder (offset, len) of (49, 50) is wrong: the
above unlock has overflowed the offset
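
The wrapped offset can be reproduced in Python by masking to 64 bits. The
sketch also shows one possible range check that answers with NLM4_FBIG
(value 8 in nlm4_stats) when a range does not fit in a signed 64-bit file
offset. remainder_after_unlock and check_range are hypothetical names, and
the check is only one candidate policy, not necessarily the correct behavior.

```python
UINT64_MAX = (1 << 64) - 1
INT64_MAX = (1 << 63) - 1
NLM4_GRANTED = 0
NLM4_FBIG = 8


def remainder_after_unlock(held_off, held_len, unlock_len):
    """(offset, len) left after unlocking the first unlock_len bytes,
    with the new offset computed modulo 2**64."""
    new_off = (held_off + unlock_len) & UINT64_MAX
    return new_off, held_len - unlock_len


def check_range(offset, length):
    """Hypothetical policy: reject ranges beyond a signed 64-bit offset.

    A length of 0 means "to end of file" in NLM, so only the offset
    is checked in that case.
    """
    if offset > INT64_MAX:
        return NLM4_FBIG
    if length != 0 and offset + length - 1 > INT64_MAX:
        return NLM4_FBIG
    return NLM4_GRANTED


print(remainder_after_unlock(UINT64_MAX, 100, 50))   # (49, 50)
print(check_range(UINT64_MAX, 100))                  # 8
print(check_range(75, 10))                           # 0
```

With a check like this, the initial lock(offset=UINT64_MAX, len=100) would be
refused up front and none of the later requests would see a wrapped range.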

-Jan

On Wed, Jul 20, 2022 at 4:01 PM Jan Kasiak <j.kasiak@xxxxxxxxx> wrote:
>
> Applying two commits from the Linux master branch seems to have fixed
> the problem:
>
> aec158242b87a43d83322e99bc71ab4428e5ab79
> 1197eb5906a5464dbaea24cac296dfc38499cc00
>
> -Jan
>
> On Wed, Jul 20, 2022 at 2:46 PM Jan Kasiak <j.kasiak@xxxxxxxxx> wrote:
> >
> > Hi all,
> >
> > I'm writing my own NFS client, and while trying to test it, I've come
> > across a way to get the lockd thread into an infinite loop and stop
> > accepting any new requests.
> >
> > Kernel Version: Linux ubuntu-jammy 5.15.0-41-generic
> >
> > The client is a python program, and it does not run rpcbind, NLM, etc...
> >
> > I issue an NM_LOCK (procedure 22) request with block set to false, and
> > get a GRANTED reply.
> >
> > I then issue a FREE_ALL (procedure 23) request, and the lockd thread
> > gets stuck in nlm_traverse_locks - it matches the host, calls
> > nlm_unlock_files, and then jumps to the again label, and repeats this
> > loop forever.
> >
> > It's not clear to me who is supposed to unset the host from the lock?
> > Any pointers as to why there is a jump to again?
> >
> > Thanks,
> > -Jan