On 16.03.2018 17:36, Kirill Tkhai wrote: > On 16.03.2018 16:53, Sowmini Varadhan wrote: >> >> Found my previous question: >> >> https://www.mail-archive.com/netdev@xxxxxxxxxxxxxxx/msg72330.html >> >> (see section about "Comments are specifically ivinted.." > > I see, thanks. > >>> This is not a problem, and rds-tcp is not the only pernet_subsys registering >>> a socket. It's OK to close it from .exit method. There are many examples, >>> let me point you to icmp_sk_ops as one of them. But it's not the only. >> >> I'm not averse to changing this to NETDEV_UNREGISTER >> as long as it works for the 2 test cases below- you >> can test it by using rds-ping from rds-tools rpm, to >> be used from/to init_net, from/to the netns against >> some external machine (i.e something not on the same >> physical host) >> >>>> For rds-tcp, we need to be able to do the right thing in both of these >>>> cases >>>> 1. modprobe -r rds-tcp (cleanup of rds-tcp state should happen in >>>> every namespace, including init_net) >>>> 2. netns delete (rds_tcp.ko should remain loaded for other namespaces) >>> >>> The same as above, every pernet_subsys does this. It's not a problem. >>> exit and exit_batch methods are called in both of the cases. >>> >>> Please, see __unregister_pernet_operations()->ops_exit_list for the details. >> >> I am familiar with ops_exit_list, but this is the sequence: >> - when the module is loaded (or netns is started) it starts a >> kernel listen socket on *.16385 >> - when you start the rds-pings above, it will create kernel >> tcp connections from/to the 16385 in the netns. And it will >> start socket keepalives for those connections. Each tcp >> connection is associated with a rds_connection >> >> As I recall, when I wrote the initial patchset, my problem >> was that in order to let the module unload make progress, >> all these sockets had to be cleaned up. But to clean up these >> sockets, net_device cleanup had to complete (should not have >> any new incoming connections to the listen endpoint on a >> non-loopback socket) so I ended up with a circular dependancy. > > Ah, I see the reasons. Please, see my proposition at the end of this letter. > >>> If we replace NETDEV_UNREGISTER_FINAL with NETDEV_UNREGISTER, the only change >>> which happens is we call rds_tcp_kill_sock() earlier. So, it may be a reason >>> of problems only if someone changes the list during the time between >>> NETDEV_UNREGISTER and NETDEV_UNREGISTER_FINAL are called for loopback. >>> But since this time noone related to this net can extend the list, >>> there is no a problem to do that. >> >> Please share your patch, I can review it and maybe help to test >> it.. >> >> As I was trying to say in my RFC, I am quite open to ways to make >> this cleanup more obvious > > How about something like this? Compile tested only. > > [PATCH]rds: Use pernet device to kill RDS sockets > > We register a new pernet device and use the fact, > that loopback device is last unregistered device. > So, on exit path, the new exit method will be called > before loopback_dev destruction. $git diff --patience gives better diff [PATCH]rds: Use pernet device to kill RDS sockets We register a new pernet device and use the fact, that loopback device is last unregistered device. So, on exit path, the new exit method will be called before loopback_dev destruction. Signed-off-by: Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> --- diff --git a/net/rds/tcp.c b/net/rds/tcp.c index eb04e7fa2467..ec37868bf2dd 100644 --- a/net/rds/tcp.c +++ b/net/rds/tcp.c @@ -493,28 +493,11 @@ static void __net_exit rds_tcp_exit_net(struct net *net) if (net != &init_net && rtn->ctl_table) kfree(rtn->ctl_table); - - /* If rds_tcp_exit_net() is called as a result of netns deletion, - * the rds_tcp_kill_sock() device notifier would already have cleaned - * up the listen socket, thus there is no work to do in this function. - * - * If rds_tcp_exit_net() is called as a result of module unload, - * i.e., due to rds_tcp_exit() -> unregister_pernet_subsys(), then - * we do need to clean up the listen socket here. - */ - if (rtn->rds_tcp_listen_sock) { - struct socket *lsock = rtn->rds_tcp_listen_sock; - - rtn->rds_tcp_listen_sock = NULL; - rds_tcp_listen_stop(lsock, &rtn->rds_tcp_accept_w); - } } static struct pernet_operations rds_tcp_net_ops = { .init = rds_tcp_init_net, .exit = rds_tcp_exit_net, - .id = &rds_tcp_netid, - .size = sizeof(struct rds_tcp_net), .async = true, }; @@ -545,6 +528,27 @@ static void rds_tcp_kill_sock(struct net *net) rds_conn_destroy(tc->t_cpath->cp_conn); } +static __net_init int rds_tcp_init_dev(struct net *net) +{ + struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); + + rtn->rds_tcp_listen_sock = NULL; + return 0; +} + +static void __net_exit rds_tcp_exit_dev(struct net *net) +{ + rds_tcp_kill_sock(net); +} + +static struct pernet_operations rds_tcp_dev_ops = { + .init = rds_tcp_init_dev, + .exit = rds_tcp_exit_dev, + .id = &rds_tcp_netid, + .size = sizeof(struct rds_tcp_net), + .async = true, +}; + void *rds_tcp_listen_sock_def_readable(struct net *net) { struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); @@ -556,29 +560,6 @@ void *rds_tcp_listen_sock_def_readable(struct net *net) return lsock->sk->sk_user_data; } -static int rds_tcp_dev_event(struct notifier_block *this, - unsigned long event, void *ptr) -{ - struct net_device *dev = netdev_notifier_info_to_dev(ptr); - - /* rds-tcp registers as a pernet subys, so the ->exit will only - * get invoked after network acitivity has quiesced. We need to - * clean up all sockets to quiesce network activity, and use - * the unregistration of the per-net loopback device as a trigger - * to start that cleanup. - */ - if (event == NETDEV_UNREGISTER_FINAL && - dev->ifindex == LOOPBACK_IFINDEX) - rds_tcp_kill_sock(dev_net(dev)); - - return NOTIFY_DONE; -} - -static struct notifier_block rds_tcp_dev_notifier = { - .notifier_call = rds_tcp_dev_event, - .priority = -10, /* must be called after other network notifiers */ -}; - /* when sysctl is used to modify some kernel socket parameters,this * function resets the RDS connections in that netns so that we can * restart with new parameters. The assumption is that such reset @@ -624,9 +605,8 @@ static void rds_tcp_exit(void) rds_tcp_set_unloading(); synchronize_rcu(); rds_info_deregister_func(RDS_INFO_TCP_SOCKETS, rds_tcp_tc_info); + unregister_pernet_device(&rds_tcp_dev_ops); unregister_pernet_subsys(&rds_tcp_net_ops); - if (unregister_netdevice_notifier(&rds_tcp_dev_notifier)) - pr_warn("could not unregister rds_tcp_dev_notifier\n"); rds_tcp_destroy_conns(); rds_trans_unregister(&rds_tcp_transport); rds_tcp_recv_exit(); @@ -650,15 +630,13 @@ static int rds_tcp_init(void) if (ret) goto out_slab; - ret = register_pernet_subsys(&rds_tcp_net_ops); + ret = register_pernet_device(&rds_tcp_dev_ops); if (ret) goto out_recv; - ret = register_netdevice_notifier(&rds_tcp_dev_notifier); - if (ret) { - pr_warn("could not register rds_tcp_dev_notifier\n"); + ret = register_pernet_subsys(&rds_tcp_net_ops); + if (ret) goto out_pernet; - } rds_trans_register(&rds_tcp_transport); @@ -667,7 +645,7 @@ static int rds_tcp_init(void) goto out; out_pernet: - unregister_pernet_subsys(&rds_tcp_net_ops); + unregister_pernet_device(&rds_tcp_dev_ops); out_recv: rds_tcp_recv_exit(); out_slab: -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html