On 16.03.2018 16:53, Sowmini Varadhan wrote: > > Found my previous question: > > https://www.mail-archive.com/netdev@xxxxxxxxxxxxxxx/msg72330.html > > (see section about "Comments are specifically ivinted.." I see, thanks. >> This is not a problem, and rds-tcp is not the only pernet_subsys registering >> a socket. It's OK to close it from .exit method. There are many examples, >> let me point you to icmp_sk_ops as one of them. But it's not the only. > > I'm not averse to changing this to NETDEV_UNREGISTER > as long as it works for the 2 test cases below- you > can test it by using rds-ping from rds-tools rpm, to > be used from/to init_net, from/to the netns against > some external machine (i.e something not on the same > physical host) > >>> For rds-tcp, we need to be able to do the right thing in both of these >>> cases >>> 1. modprobe -r rds-tcp (cleanup of rds-tcp state should happen in >>> every namespace, including init_net) >>> 2. netns delete (rds_tcp.ko should remain loaded for other namespaces) >> >> The same as above, every pernet_subsys does this. It's not a problem. >> exit and exit_batch methods are called in both of the cases. >> >> Please, see __unregister_pernet_operations()->ops_exit_list for the details. > > I am familiar with ops_exit_list, but this is the sequence: > - when the module is loaded (or netns is started) it starts a > kernel listen socket on *.16385 > - when you start the rds-pings above, it will create kernel > tcp connections from/to the 16385 in the netns. And it will > start socket keepalives for those connections. Each tcp > connection is associated with a rds_connection > > As I recall, when I wrote the initial patchset, my problem > was that in order to let the module unload make progress, > all these sockets had to be cleaned up. But to clean up these > sockets, net_device cleanup had to complete (should not have > any new incoming connections to the listen endpoint on a > non-loopback socket) so I ended up with a circular dependancy. Ah, I see the reasons. Please, see my proposition at the end of this letter. >> If we replace NETDEV_UNREGISTER_FINAL with NETDEV_UNREGISTER, the only change >> which happens is we call rds_tcp_kill_sock() earlier. So, it may be a reason >> of problems only if someone changes the list during the time between >> NETDEV_UNREGISTER and NETDEV_UNREGISTER_FINAL are called for loopback. >> But since this time noone related to this net can extend the list, >> there is no a problem to do that. > > Please share your patch, I can review it and maybe help to test > it.. > > As I was trying to say in my RFC, I am quite open to ways to make > this cleanup more obvious How about something like this? Compile tested only. [PATCH]rds: Use pernet device to kill RDS sockets We register a new pernet device and use the fact, that loopback device is last unregistered device. So, on exit path, the new exit method will be called before loopback_dev destruction. --- diff --git a/net/rds/tcp.c b/net/rds/tcp.c index eb04e7fa2467..ec37868bf2dd 100644 --- a/net/rds/tcp.c +++ b/net/rds/tcp.c @@ -493,28 +493,11 @@ static void __net_exit rds_tcp_exit_net(struct net *net) if (net != &init_net && rtn->ctl_table) kfree(rtn->ctl_table); - - /* If rds_tcp_exit_net() is called as a result of netns deletion, - * the rds_tcp_kill_sock() device notifier would already have cleaned - * up the listen socket, thus there is no work to do in this function. - * - * If rds_tcp_exit_net() is called as a result of module unload, - * i.e., due to rds_tcp_exit() -> unregister_pernet_subsys(), then - * we do need to clean up the listen socket here. - */ - if (rtn->rds_tcp_listen_sock) { - struct socket *lsock = rtn->rds_tcp_listen_sock; - - rtn->rds_tcp_listen_sock = NULL; - rds_tcp_listen_stop(lsock, &rtn->rds_tcp_accept_w); - } } static struct pernet_operations rds_tcp_net_ops = { .init = rds_tcp_init_net, .exit = rds_tcp_exit_net, - .id = &rds_tcp_netid, - .size = sizeof(struct rds_tcp_net), .async = true, }; @@ -545,40 +528,38 @@ static void rds_tcp_kill_sock(struct net *net) rds_conn_destroy(tc->t_cpath->cp_conn); } -void *rds_tcp_listen_sock_def_readable(struct net *net) +static __net_init int rds_tcp_init_dev(struct net *net) { struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); - struct socket *lsock = rtn->rds_tcp_listen_sock; - if (!lsock) - return NULL; + rtn->rds_tcp_listen_sock = NULL; + return 0; +} - return lsock->sk->sk_user_data; +static void __net_exit rds_tcp_exit_dev(struct net *net) +{ + rds_tcp_kill_sock(net); } -static int rds_tcp_dev_event(struct notifier_block *this, - unsigned long event, void *ptr) +static struct pernet_operations rds_tcp_dev_ops = { + .init = rds_tcp_init_dev, + .exit = rds_tcp_exit_dev, + .id = &rds_tcp_netid, + .size = sizeof(struct rds_tcp_net), + .async = true, +}; + +void *rds_tcp_listen_sock_def_readable(struct net *net) { - struct net_device *dev = netdev_notifier_info_to_dev(ptr); + struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); + struct socket *lsock = rtn->rds_tcp_listen_sock; - /* rds-tcp registers as a pernet subys, so the ->exit will only - * get invoked after network acitivity has quiesced. We need to - * clean up all sockets to quiesce network activity, and use - * the unregistration of the per-net loopback device as a trigger - * to start that cleanup. - */ - if (event == NETDEV_UNREGISTER_FINAL && - dev->ifindex == LOOPBACK_IFINDEX) - rds_tcp_kill_sock(dev_net(dev)); + if (!lsock) + return NULL; - return NOTIFY_DONE; + return lsock->sk->sk_user_data; } -static struct notifier_block rds_tcp_dev_notifier = { - .notifier_call = rds_tcp_dev_event, - .priority = -10, /* must be called after other network notifiers */ -}; - /* when sysctl is used to modify some kernel socket parameters,this * function resets the RDS connections in that netns so that we can * restart with new parameters. The assumption is that such reset @@ -624,9 +605,8 @@ static void rds_tcp_exit(void) rds_tcp_set_unloading(); synchronize_rcu(); rds_info_deregister_func(RDS_INFO_TCP_SOCKETS, rds_tcp_tc_info); + unregister_pernet_device(&rds_tcp_dev_ops); unregister_pernet_subsys(&rds_tcp_net_ops); - if (unregister_netdevice_notifier(&rds_tcp_dev_notifier)) - pr_warn("could not unregister rds_tcp_dev_notifier\n"); rds_tcp_destroy_conns(); rds_trans_unregister(&rds_tcp_transport); rds_tcp_recv_exit(); @@ -650,15 +630,13 @@ static int rds_tcp_init(void) if (ret) goto out_slab; - ret = register_pernet_subsys(&rds_tcp_net_ops); + ret = register_pernet_device(&rds_tcp_dev_ops); if (ret) goto out_recv; - ret = register_netdevice_notifier(&rds_tcp_dev_notifier); - if (ret) { - pr_warn("could not register rds_tcp_dev_notifier\n"); + ret = register_pernet_subsys(&rds_tcp_net_ops); + if (ret) goto out_pernet; - } rds_trans_register(&rds_tcp_transport); @@ -667,7 +645,7 @@ static int rds_tcp_init(void) goto out; out_pernet: - unregister_pernet_subsys(&rds_tcp_net_ops); + unregister_pernet_device(&rds_tcp_dev_ops); out_recv: rds_tcp_recv_exit(); out_slab: -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html