On Tue, 12 Nov 2024, Kuniyuki Iwashima wrote:
> From: "NeilBrown" <neilb@xxxxxxx>
> Date: Tue, 12 Nov 2024 10:52:34 +1100
> > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > > index 6f272013fd9b..d4330aaadc23 100644
> > > --- a/net/sunrpc/svcsock.c
> > > +++ b/net/sunrpc/svcsock.c
> > > @@ -1551,6 +1551,10 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
> > >  	newlen = error;
> > >
> > >  	if (protocol == IPPROTO_TCP) {
> > > +		__netns_tracker_free(net, &sock->sk->ns_tracker, false);
> > > +		sock->sk->sk_net_refcnt = 1;
> > > +		get_net_track(net, &sock->sk->ns_tracker, GFP_KERNEL);
> > > +		sock_inuse_add(net, 1);
> >
> > This is really ugly.  These internal details of the network layer have
> > no place in sunrpc code.  There must be a better way.
>
> I asked to do this way.  I agree this way is really ugly.  Similar
> code exists in MPTCP, SMC, CIFS, etc, so I plan to add a new API for
> this case, but this requires huge change adding a new parameter for
> ->create() prototype and the changes are not backportable.
>
> https://github.com/q2ven/linux/commit/bb8b8814a73b3f50c3fef5eaf8d30d8c1df43e7b
> https://github.com/q2ven/linux/commits/427_2
>
> After my series, we can use the following but cannot backport it to
> stable.
>
>   sock_create_net(net, family, type, protocol);
>
> e.g. commit for MPTCP
> https://github.com/q2ven/linux/commit/24a4647561272c1e67a685d8403e27eb863398cf
>
> That's why I suggested to go with the ugly way and I will clean them
> up in the next cycle.
>
> So, finally the sunrpc code will be much cleaner and the netns refcnt
> will be touched only in the core code.

This fact needs to be spelled out in the commit message:

  This is an ugly hack which can easily be backported to earlier kernels.
  A proper fix which cleans up the interfaces will follow, but will not
  be so easy to backport.

or something like that.

I would still prefer if a little helper were made available so sunrpc
could just call one function rather than adding 4 cryptic lines.  But I
won't argue that too strongly.

Thanks,
NeilBrown

> >
> > Can we pass '0' for the kern arg to __sock_create()?  That should fix
> > the refcounting issues, but might mess up security labelling.
>
> This should be avoided as it's confusing for BPF programs, LSMs, and
> LOCKDEP.
>
> >
> > Can we wait for something before we call put_net() to release the net?
> >
> > Maybe we want to split the "kern" arg to __sock_create() and have
> > "kern" which affects labeling and "refnet" which affects refcounting
> > the net.
>
> This is exactly what my series does, but again, it's not backport
> friendly.
> https://github.com/q2ven/linux/commit/413e867b4aee9e9f60f3c33fb38d2004aeb29c40
>
> >
> > I had a quick look and very nearly every caller of __sock_create()
> > outside of net/core really does want refcount.  Many callers of
> > sock_create_kern() possibly don't.
>
> Actually, since sock_create_kern() is added, we no longer need to
> export __sock_create(), so I have a patch to convert them to
> sock_create_kern().
>
> And most of TCP socket does need refcnt, but non-TCP won't.
> Also, handshake one is exception, which uses TCP but only in init_net,
> where we need not take care of netns refcnt.
>
> https://github.com/q2ven/linux/commit/b56888bbbf327d57ea25a6b97275d6b9b8ad043a
>
> >
> > So I really think this needs to be cleaned up in net/core, not in all
> > the different network clients in the kernel.
>
> Yes, will be done in the next cycle.
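
To make the "one function rather than 4 cryptic lines" idea concrete,
here is a rough userspace model of what such a helper would bundle.
It is only a sketch: struct net_model, struct sock_model,
sock_model_create_kern() and sock_model_hold_net() are hypothetical
names invented for illustration, the counters are toy stand-ins for
the netns refcount, the not-refcounted tracker and the per-netns
in-use counter, and the early return on an already-upgraded socket is
a choice made for the model, not something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct net; NOT the kernel definition. */
struct net_model {
	int refcount;   /* models the netns reference count */
	int notrack;    /* models entries in the not-refcounted tracker */
	int sock_inuse; /* models the per-netns in-use socket counter */
};

/* Toy stand-in for struct sock; NOT the kernel definition. */
struct sock_model {
	struct net_model *net;
	bool net_refcnt; /* models sk->sk_net_refcnt */
};

/* Models a kernel socket: tracked, but holding no netns reference. */
void sock_model_create_kern(struct sock_model *sk, struct net_model *net)
{
	assert(net != NULL);
	sk->net = net;
	sk->net_refcnt = false;
	net->notrack++;
}

/*
 * Hypothetical helper: the four steps from the quoted hunk, collected
 * behind one call so a caller such as sunrpc never touches the fields.
 */
void sock_model_hold_net(struct sock_model *sk)
{
	struct net_model *net = sk->net;

	assert(net != NULL);
	if (sk->net_refcnt)
		return; /* already holds a reference (model-only choice) */

	net->notrack--;        /* cf. __netns_tracker_free(net, ..., false) */
	sk->net_refcnt = true; /* cf. sk->sk_net_refcnt = 1 */
	net->refcount++;       /* cf. get_net_track(net, ..., GFP_KERNEL) */
	net->sock_inuse++;     /* cf. sock_inuse_add(net, 1) */
}
```

With something of this shape exported from the core, the hunk in
svc_create_socket() would shrink to a single call inside the
IPPROTO_TCP branch, and the refcount details would stay in one place.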