On Tue, May 3, 2022 at 6:04 PM Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote: > > On 2022/05/04 7:37, Eric Dumazet wrote: > >> I think we merged this patch too soon. > >> > >> My question is : What prevents rds_tcp_conn_path_connect(), and thus > >> rds_tcp_tune() to be called > >> after the netns refcount already reached 0 ? > >> > >> I guess we can wait for next syzbot report, but I think that get_net() > >> should be replaced > >> by maybe_get_net() > > > > Yes, syzbot was fast to trigger this exact issue: > > Does maybe_get_net() help? > > Since rds_conn_net() returns a net namespace without holding a ref, it is theoretically > possible that the net namespace returned by rds_conn_net() is already kmem_cache_free()d > if refcount dropped to 0 by the moment sk_alloc() calls sock_net_set(). Nope. RDS has an exit() handler called from cleanup_net() (struct pernet_operations)->exit() or exit_batch() : rds_tcp_exit_net() (rds_tcp_kill_sock()) This exit() handler _has_ to remove all known listeners, and definitely cancel work queues (synchronous operation) before the actual "struct net" free can happen later. > > rds_tcp_conn_path_connect() { > sock_create_kern(net = rds_conn_net(conn)) { > __sock_create(net = rds_conn_net(conn), kern = 1) { > err = pf->create(net = rds_conn_net(conn), kern = 1) { > // pf->create is either inet_create or inet6_create > sk_alloc(net = rds_conn_net(conn), kern = 1) { > sk->sk_net_refcnt = kern ? 0 : 1; > if (likely(sk->sk_net_refcnt)) { > get_net_track(net, &sk->ns_tracker, priority); > sock_inuse_add(net, 1); > } > sock_net_set(sk, net); > } > } > } > } > rds_tcp_tune() { > if (!sk->sk_net_refcnt) { > sk->sk_net_refcnt = 1; > get_net_track(net, &sk->ns_tracker, GFP_KERNEL); > sock_inuse_add(net, 1); > } > } > } > > "struct rds_connection" needs to hold a ref in order to safely allow > rds_tcp_tune() to call maybe_get_net(), which in turn makes pointless > to use maybe_get_net() from rds_tcp_tune() because "struct rds_connection" > must have a ref. Situation where we are protected by maybe_get_net() is > quite limited if long-lived object is not holding a ref. > > Hmm, can we simply use &init_net instead of rds_conn_net(conn) ? Only if you plan making RDS unavailable for non init netns.