On Tue, 2023-05-02 at 09:59 +0200, Petr Vorel wrote:
> nfs_flock (run via nfslock01.sh) is known to fail on NFS v3 [1]:
>
> not unsharing /var makes AF_UNIX socket for host's rpcbind to become
> available inside ltp_ns. Then, at NFS v3 mount time, kernel creates
> an instance of lockd for ltp_ns, and ports for that instance leak to
> host's rpcbind and overwrite ports for lockd already active for root
> namespace. This breaks nfs3 file locking.
>

Yeccchhh...that is pretty nasty. rpcbind was obviously written in a time
before namespaces were even a thought to anyone. I wonder if there is
something we can do in rpcbind itself to guard against these sorts of
shenanigans? Probably not, I guess...

Is /var shared between namespaces in this test for some particular
reason?

> Before bd512e733 ("nfs_flock: fail the test if lock/unlock ops fail")
> it run indefinitely with "unhandled error -107":
> [ 2840.099565] lockd: cannot monitor 10.0.0.2
> [ 2840.109353] lockd: cannot monitor 10.0.0.2
> [ 2843.286811] xs_tcp_setup_socket: connect returned unhandled error -107
> [ 2850.198791] xs_tcp_setup_socket: connect returned unhandled error -107
>
> bd512e733 caused an early abort (therefore only "cannot monitor 10.0.0.2"
> appears).
>
> Although there is suggestion, how to fix the problem in kernel [2]:
>
> > Maybe rpcb_create_local() shall detect that it is not in root
> > netns, and only try AF_INET connection to
> > localhost in that case.
>
> That would be simple and might be sensible. IF changing the AF_UNIX
> path to "/run/rpcbind.sock" isn't sufficient, then testing for the
> root_ns is probably the best second option.
>

Was it determined that changing the location of the socket wasn't
sufficient to fix this? FWIW, My Fedora 38 machine seems to listen on
that socket already:

    [Socket]
    ListenStream=/run/rpcbind.sock

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
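
For anyone who wants to check which socket path a given host's rpcbind
actually answers on, one option is simply to try a connect() on each
candidate. The Python sketch below is only an illustration: the helper
name unix_socket_reachable() and the candidate list are made up for this
example, and /var/run/rpcbind.sock is assumed here to be the older
location (only /run/rpcbind.sock appears in the unit file quoted above).
On systems where /var/run is a symlink to /run, both paths will
report the same result.

```python
import socket

# /run/rpcbind.sock comes from the ListenStream= line quoted in the mail.
# /var/run/rpcbind.sock is an assumption (presumed legacy location).
CANDIDATE_PATHS = ["/run/rpcbind.sock", "/var/run/rpcbind.sock"]


def unix_socket_reachable(path, timeout=1.0):
    """Return True if a stream AF_UNIX connect() to `path` succeeds."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            s.connect(path)
        except OSError:
            # Covers a missing path, a refused connection, a permission
            # error, or a timeout; all are treated as "not reachable".
            return False
        return True


if __name__ == "__main__":
    for path in CANDIDATE_PATHS:
        state = "reachable" if unix_socket_reachable(path) else "not reachable"
        print(f"{path}: {state}")
```

Since a pathname AF_UNIX socket is looked up through the filesystem
rather than through the network namespace, running this inside the test
namespace and on the host should show whether the host's socket is
visible from both, which is the situation the quoted report describes.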