On Mon, 08 May 2023, Nikita Yushchenko wrote:
> >> rpcbind was obviously written in a time before namespaces were even a
> >> thought to anyone. I wonder if there is something we can do in rpcbind
> >> itself to guard against these sorts of shenanigans? Probably not, I
> >> guess...
> >
> > Maybe Steve or Neil have some idea.
> >
> >> Is /var shared between namespaces in this test for some particular
> >> reason?
> >
> > I hope I got , we talk about /var/run/netns/ltp_ns, which is symlink to
> > /proc/$pid/ns/net. Or is it really about /run/rpcbind.sock vs
> > /var/run/rpcbind.sock?
>
> The overall picture is:
>
> - by design, filesystem namespaces and network namespaces are independent, it is pretty ok for two
> processes to share filesystem namespace and be in different network namespaces, or vice versa.
>
> - the code in question, while being in the kernel for ages, is breaking this basic design, by using
> filesystem path to contact a network service,

Not just in the kernel, but also in user-space. The kernel contacts
rpcbind to find out how to talk to statd. statd talks to rpcbind to tell
it where it can be reached.
As you say - it has been this way for "ages".

So maybe the bug is that something creates a network namespace without
also creating a filesystem namespace. Certainly the kernel allows this,
as it should, because the kernel doesn't set policy. But using that
freedom to create a setup that doesn't actually work is foolish.
If ltp creates a network namespace without creating a matching
filesystem namespace, and expects NFSv3 to work - then that is a bug in
ltp.

Now I agree that using path names seems not ideal in this case, and it
would be a valuable enhancement to make it easy to avoid that. But it
is an enhancement, not a bugfix.

>
> - thus the fix is: just not do that.

Surely the fix is to do something else, not just to do nothing :-)

>
> I consider kernel contacting rpcbind via AF_UNIX being a bug in the kernel namespace implementation.
> So this is a rare case when a test suite (LTP) helped to find a non-obvious kernel bug. Just need to fix
> that bug, if not yet.

There is good reason to use AF_UNIX for the kernel to contact rpcbind.
In fact the kernel has only been using AF_UNIX to contact rpcbind for
about 12 years.
Commit 7402ab19cdd5 ("SUNRPC: Use AF_LOCAL for rpcbind upcalls")
gives some of the reasons for the change. They are still good reasons.

Fortunately Linux provides "abstract" AF_UNIX names, which provide all
the benefits that we want of AF_UNIX, but don't depend on the
filesystem and keep the bindings private to the network namespace - the
best of both worlds.

To fully implement this we need changes to libtirpc and to the kernel.
Not big changes, but not entirely trivial either.

>
> Rpcbind listens both via AF_INET and via AP_UNIX, and did so for ages.
> It is even not possible to disable AF_INET listening without patching code. By stopping contacting it
> via AF_UNIX, it is virtually impossible to break anything.

Check that commit for what can break.

For testing, you can change /usr/lib/rpcbind.sock to listen on
/run/NOT-rpcbind.sock instead of /run/rpcbind.sock.
It must listen on at least one AF_UNIX socket as you noted, but it
doesn't have to listen on one that any tool will talk to.
This will cause all code (user-space and kernel) to fall back on IPv6 or
IPv4 to contact rpcbind. So maybe you can work around the bug in ltp
that way.

You could even just "rm -f /var/run/rpcbind.sock" after starting
rpcbind, and before running the test.

Meanwhile I'll post some patches with enhancements to the kernel and to
libtirpc to use abstract AF_UNIX sockets when available.

Thanks,
NeilBrown
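
For anyone unfamiliar with abstract AF_UNIX names, here is a minimal
sketch of the mechanism in Python, assuming Linux. It is only an
illustration, not what the libtirpc or kernel patches do: the name
"example-rpcbind-demo" and the helper abstract_roundtrip() are
hypothetical. An address whose first byte is NUL is abstract, so bind()
creates nothing in the filesystem and the name is only visible inside
the caller's network namespace.

```python
import os
import socket
import sys

# Hypothetical name, for illustration only. The leading NUL byte is what
# makes the address abstract; the pid suffix just avoids collisions.
DEMO_NAME = b"\0example-rpcbind-demo-" + str(os.getpid()).encode()


def abstract_roundtrip(name: bytes = DEMO_NAME) -> bytes:
    """Listen on an abstract AF_UNIX name, connect to it, pass one message."""
    if not sys.platform.startswith("linux"):
        raise OSError("abstract AF_UNIX names are Linux-specific")
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        # No path is created on disk, so there is nothing to unlink later
        # and nothing for another filesystem view to see.
        server.bind(name)
        server.listen(1)
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            # The lookup is by name within the current network namespace.
            client.connect(name)
            conn, _ = server.accept()
            with conn:
                client.sendall(b"ping")
                return conn.recv(4)


if __name__ == "__main__":
    print(abstract_roundtrip())
```

A process in a different network namespace that tries to connect to the
same name would be refused, even if it shares /run and /var/run with
the listener.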