Re: LTP nfslock01 test failing on NFS v3 (lockd: cannot monitor 10.0.0.2)

Nikita Yushchenko <nikita.yushchenko@xxxxxxxxxxxxx> · Wed, 19 Jan 2022 08:28:47 +0300

19.01.2022 08:26, Nikita Yushchenko wrote:
The big picture: lockd tries to be per-netns, but lockd isn't standalone. It depends on rpcbind, and 
rpcbind isn't guaranteed to be per-netns.

One can argue that it is not the kernel's job to provide per-netns rpcbind.

Still, the current situation is that, by default, doing an nfs mount from within netns B immediately 
breaks lockd serving nfs mounts exported from a different netns A. "By default" = "unless the nfsmount 
process executed in netns B is also in a different mount namespace in which RPCBIND_SOCK_PATHNAME does 
not point to the AF_UNIX socket owned by the rpcbind instance serving netns A".
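
To illustrate why the mount namespace matters here, below is a minimal userspace sketch of an AF_UNIX 
address for rpcbind. The helper fill_rpcbind_unix_addr() is hypothetical, and the path value is an 
assumption about what RPCBIND_SOCK_PATHNAME expands to. The point is only that the address carries a 
filesystem path and nothing that identifies a network namespace.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Assumed value; check net/sunrpc/rpcb_clnt.c in your tree. */
#define RPCBIND_SOCK_PATHNAME "/var/run/rpcbind.sock"

/*
 * Hypothetical helper: fill an AF_UNIX address for the given path.
 * Returns 0 on success, -1 if the path does not fit in sun_path.
 * Nothing in the resulting address refers to a network namespace;
 * which rpcbind answers depends on what the path resolves to.
 */
int fill_rpcbind_unix_addr(struct sockaddr_un *addr, const char *path)
{
    if (strlen(path) >= sizeof(addr->sun_path))
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);
    return 0;
}
```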

Although in LTP's 'nfslock01' test the "non working locking" is reproduced on the same mount that 
triggered the breakage, the breakage is not limited to that mount. After that mount operation in netns 
B, any client of nfs exports from netns A will find locking broken, including clients running on 
different physical hosts.

I'd say that using an AF_UNIX connection from lockd to rpcbind does not play well with per-netns lockd.

A solution that uses an AF_UNIX connection to rpcbind only for the lockd serving the root netns, and 
AF_INET otherwise, looks more sane.
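
A minimal sketch of that selection rule, as plain userspace C rather than kernel code. The function 
name rpcbind_family_for() and its boolean parameter are hypothetical; how the kernel would actually 
detect "root netns" is not shown.

```c
#include <stdbool.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not an existing kernel function: choose the
 * address family lockd would use to reach rpcbind. AF_UNIX is used
 * only when lockd serves the root netns; every other netns goes
 * through AF_INET, which stays inside that netns.
 */
int rpcbind_family_for(bool serves_root_netns)
{
    return serves_root_netns ? AF_UNIX : AF_INET;
}
```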

Btw, not sure (did not test) what will happen if an nfs server is similarly started in netns B.  Will 
it hijack requests addressed to the nfs server running in netns A?

No, it won't "hijack"...  because it will still listen inside netns B only.  But if ports in rpcbind get 
overwritten in a similar manner, the nfs server running in netns A will no longer be reachable.
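
The overwrite effect can be shown with a toy model of one shared rpcbind table. This is only a sketch 
under assumptions: struct registry, registry_set() and registry_getport() are made-up names, the table 
is keyed by program number alone, and the port numbers in the usage note are arbitrary. It does not 
reproduce rpcbind's real data structures or protocol.

```c
#include <stddef.h>

#define MAX_MAPPINGS 8

/* One entry of the toy table: RPC program number -> port, plus who set it. */
struct mapping {
    unsigned long prog;
    unsigned short port;
    int owner_netns;        /* illustrative tag only, 0 = netns A, 1 = netns B */
};

struct registry {
    struct mapping m[MAX_MAPPINGS];
    size_t n;
};

/*
 * Register a port for prog. An existing entry for the same program is
 * replaced, which models the "ports get overwritten" case.
 * Returns 0 on success, -1 if the table is full.
 */
int registry_set(struct registry *r, unsigned long prog,
                 unsigned short port, int owner_netns)
{
    for (size_t i = 0; i < r->n; i++) {
        if (r->m[i].prog == prog) {
            r->m[i].port = port;
            r->m[i].owner_netns = owner_netns;
            return 0;
        }
    }
    if (r->n == MAX_MAPPINGS)
        return -1;
    r->m[r->n++] = (struct mapping){ prog, port, owner_netns };
    return 0;
}

/* Look up the port for prog; returns 0 if nothing is registered. */
unsigned short registry_getport(const struct registry *r, unsigned long prog)
{
    for (size_t i = 0; i < r->n; i++)
        if (r->m[i].prog == prog)
            return r->m[i].port;
    return 0;
}
```

If a service in netns A registers program 100021 (nlockmgr) on port 40001 and a service in netns B then 
registers the same program on port 50002 in the same table, a later lookup returns 50002. Remote 
clients are then pointed at a port on which nothing listens in netns A.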