Re: [PATCH 1/1] nfslock01.sh: Don't test on NFS v3 on TCP

"NeilBrown" <neilb@xxxxxxx> · Wed, 10 May 2023 09:00:43 +1000

On Mon, 08 May 2023, Nikita Yushchenko wrote:
> >> rpcbind was obviously written in a time before namespaces were even a
> >> thought to anyone. I wonder if there is something we can do in rpcbind
> >> itself to guard against these sorts of shenanigans? Probably not, I
> >> guess...
> > 
> > Maybe Steve or Neil have some idea.
> > 
> >> Is /var shared between namespaces in this test for some particular
> >> reason?
> > 
> > I hope I got , we talk about /var/run/netns/ltp_ns, which is symlink to
> > /proc/$pid/ns/net. Or is it really about /run/rpcbind.sock vs
> > /var/run/rpcbind.sock?
> 
> The overall picture is:
> 
> - by design, filesystem namespaces and network namespaces are independent, it is pretty ok for two 
> processes to share filesystem namespace and be in different network namespaces, or vice versa.
> 
> - the code in question, while being in the kernel for ages, is breaking this basic design, by using 
> filesystem path to contact a network service,

Not just in the kernel, but also in user-space.  The kernel contacts
rpcbind to find how to talk to statd.  statd talks to rpcbind to tell it
where it can be reached.  As you say - it has been this way for
"ages".

So maybe the bug is that something creates a network namespace without
also creating a filesystem namespace.  Certainly the kernel allows this
as it should because the kernel doesn't set policy.
But using the freedom to create a setup that doesn't actually work is
foolish.  If ltp creates a network namespace without creating a matching
filesystem namespace, and expects NFSv3 to work - then that is a bug in
ltp.

Now I agree that using path names seems not ideal in this case, and it
would be a valuable enhancement to make it easy to avoid that.  But it
is an enhancement, not a bugfix.

> 
> - thus the fix is: just not do that.

Surely the fix is to do something else, not just to do nothing :-)

> 
> I consider kernel contacting rpcbind via AF_UNIX being a bug in the kernel namespace implementation. So 
> this is a rare case when a test suite (LTP) helped to find a non-obvious kernel bug. Just need to fix 
> that bug, if not yet.

There is good reason to use AF_UNIX for the kernel to contact
rpcbind.

In fact the kernel has only been using AF_UNIX to contact rpcbind for
about 12 years.

Commit 7402ab19cdd5 ("SUNRPC: Use AF_LOCAL for rpcbind upcalls")
gives some of the reasons for the change.  They are still good reasons.

Fortunately Linux provides "abstract" AF_UNIX names, which provide all
the benefits that we want of AF_UNIX, but don't depend on the
filesystem and keep the bindings private to the network namespace - the
best of both worlds.
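
As a rough illustration of what an abstract AF_UNIX name looks like from
user-space (Linux only), here is a minimal Python sketch.  The address
starts with a NUL byte, so nothing is created in the filesystem.  The
helper names and the socket name used below are made up for this
example; they are not the names the kernel or libtirpc use.

```python
import socket


def abstract_listener(name: str) -> socket.socket:
    """Listen on a Linux abstract AF_UNIX address (hypothetical helper)."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # A leading NUL byte selects the abstract namespace: the name lives
    # in the network namespace, not in any filesystem.
    srv.bind(b"\0" + name.encode())
    srv.listen(1)
    return srv


def abstract_connect(name: str) -> socket.socket:
    """Connect to an abstract AF_UNIX address (hypothetical helper)."""
    cli = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    cli.connect(b"\0" + name.encode())
    return cli
```

A listener bound this way can only be reached by processes in the same
network namespace, whatever filesystem namespace they are in.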

To fully implement this we need changes to libtirpc and to the kernel.
Not big changes, but not entirely trivial either.

> 
> Rpcbind listens both via AF_INET and via AF_UNIX, and did so for ages.
> It is even not possible to disable AF_INET listening without patching code. By stopping contacting it 
> via AF_UNIX, it is virtually impossible to break anything.

Check that commit for what can break.

For testing, you can change /usr/lib/rpcbind.sock to listen on
/run/NOT-rpcbind.sock instead of /run/rpcbind.sock.

It must listen on at least one AF_UNIX socket as you noted,
but it doesn't have to listen on one that any tool will talk to.  This
will cause all code (user-space and kernel) to fall-back on IPv6 or IPv4
to contact rpcbind.
So maybe you can work-around the bug in ltp that way.  You could even
just "rm -f /var/run/rpcbind.sock" after starting rpcbind, and before
running the test.
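
The fall-back described above can be pictured with a small Python
sketch: try the AF_UNIX path first and, if that fails, connect over
TCP to port 111 (rpcbind's well-known port).  This is only an
illustration of the idea; connect_rpcbind() is a hypothetical helper,
and the real logic in libtirpc and the kernel is more involved.

```python
import socket


def connect_rpcbind(unix_path="/run/rpcbind.sock",
                    host="localhost", port=111):
    """Try AF_UNIX first, then fall back to TCP (illustrative only)."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(unix_path)
        return s
    except OSError:
        # Socket file missing or nobody listening: give up on AF_UNIX.
        s.close()
    # create_connection() walks the getaddrinfo() results, so it will
    # try IPv6 and IPv4 addresses for the host in turn.
    return socket.create_connection((host, port))
```

With the socket file removed or renamed, the first connect() fails and
the TCP path is taken, which stays inside the caller's network
namespace.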

Meanwhile I'll post some patches with enhancements to the kernel and to
libtirpc to use an abstract AF_UNIX socket when available.

Thanks,
NeilBrown