Re: [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 16.03.2018 17:36, Kirill Tkhai wrote:
> On 16.03.2018 16:53, Sowmini Varadhan wrote:
>>
>> Found my previous question:
>>
>> https://www.mail-archive.com/netdev@xxxxxxxxxxxxxxx/msg72330.html
>>
>> (see section about "Comments are specifically ivinted.."
> 
> I see, thanks.
> 
>>> This is not a problem, and rds-tcp is not the only pernet_subsys registering
>>> a socket. It's OK to close it from .exit method. There are many examples,
>>> let me point you to icmp_sk_ops as one of them. But it's not the only.
>>
>> I'm not averse to changing this to NETDEV_UNREGISTER
>> as long as it works for the 2 test cases below- you 
>> can test it by using rds-ping from rds-tools rpm, to
>> be used from/to init_net, from/to the netns  against
>> some external machine (i.e something not on the same
>> physical host)
>>
>>>> For rds-tcp, we need to be able to do the right thing in both of these
>>>> cases
>>>> 1. modprobe -r rds-tcp (cleanup of rds-tcp state should happen in
>>>>    every namespace, including init_net)
>>>> 2. netns delete (rds_tcp.ko should remain loaded for other namespaces)
>>>
>>> The same as above, every pernet_subsys does this. It's not a problem.
>>> exit and exit_batch methods are called in both of the cases.
>>>
>>> Please, see __unregister_pernet_operations()->ops_exit_list for the details.
>>
>> I am familiar with ops_exit_list, but this is the sequence:
>> - when the module is loaded (or netns is started) it starts a 
>>   kernel listen socket on *.16385
>> - when you start the rds-pings above, it will create kernel
>>   tcp connections from/to the 16385 in the netns. And it will
>>   start socket keepalives for those connections. Each tcp 
>>   connection is associated with a rds_connection
>>
>> As I recall, when I wrote the initial patchset, my problem
>> was that in order to let the module unload make progress,
>> all these sockets had to be cleaned up. But to clean up these
>> sockets, net_device cleanup had to complete (should not have
>> any new incoming connections to the listen endpoint on a 
>> non-loopback socket) so I ended up with a circular dependancy.
> 
> Ah, I see the reasons. Please, see my proposition at the end of this letter.
>  
>>> If we replace NETDEV_UNREGISTER_FINAL with NETDEV_UNREGISTER, the only change
>>> which happens is we call rds_tcp_kill_sock() earlier. So, it may be a reason
>>> of problems only if someone changes the list during the time between
>>> NETDEV_UNREGISTER and NETDEV_UNREGISTER_FINAL are called for loopback.
>>> But since this time noone related to this net can extend the list,
>>> there is no a problem to do that.
>>
>> Please share your patch, I can review it and maybe help to test
>> it..
>>
>> As I was trying to say in my RFC, I am quite open to ways to make
>> this cleanup more obvious
> 
> How about something like this? Compile tested only.
> 
> [PATCH]rds: Use pernet device to kill RDS sockets
> 
> We register a new pernet device and use the fact,
> that loopback device is last unregistered device.
> So, on exit path, the new exit method will be called
> before loopback_dev destruction.

$git diff --patience gives better diff

[PATCH]rds: Use pernet device to kill RDS sockets
    
We register a new pernet device and use the fact,
that loopback device is last unregistered device.
So, on exit path, the new exit method will be called
before loopback_dev destruction.

Signed-off-by: Kirill Tkhai <ktkhai@xxxxxxxxxxxxx>
---
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index eb04e7fa2467..ec37868bf2dd 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -493,28 +493,11 @@ static void __net_exit rds_tcp_exit_net(struct net *net)
 
 	if (net != &init_net && rtn->ctl_table)
 		kfree(rtn->ctl_table);
-
-	/* If rds_tcp_exit_net() is called as a result of netns deletion,
-	 * the rds_tcp_kill_sock() device notifier would already have cleaned
-	 * up the listen socket, thus there is no work to do in this function.
-	 *
-	 * If rds_tcp_exit_net() is called as a result of module unload,
-	 * i.e., due to rds_tcp_exit() -> unregister_pernet_subsys(), then
-	 * we do need to clean up the listen socket here.
-	 */
-	if (rtn->rds_tcp_listen_sock) {
-		struct socket *lsock = rtn->rds_tcp_listen_sock;
-
-		rtn->rds_tcp_listen_sock = NULL;
-		rds_tcp_listen_stop(lsock, &rtn->rds_tcp_accept_w);
-	}
 }
 
 static struct pernet_operations rds_tcp_net_ops = {
 	.init = rds_tcp_init_net,
 	.exit = rds_tcp_exit_net,
-	.id = &rds_tcp_netid,
-	.size = sizeof(struct rds_tcp_net),
 	.async = true,
 };
 
@@ -545,6 +528,27 @@ static void rds_tcp_kill_sock(struct net *net)
 		rds_conn_destroy(tc->t_cpath->cp_conn);
 }
 
+static __net_init int rds_tcp_init_dev(struct net *net)
+{
+	struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
+
+	rtn->rds_tcp_listen_sock = NULL;
+	return 0;
+}
+
+static void __net_exit rds_tcp_exit_dev(struct net *net)
+{
+	rds_tcp_kill_sock(net);
+}
+
+static struct pernet_operations rds_tcp_dev_ops = {
+	.init = rds_tcp_init_dev,
+	.exit = rds_tcp_exit_dev,
+	.id = &rds_tcp_netid,
+	.size = sizeof(struct rds_tcp_net),
+	.async = true,
+};
+
 void *rds_tcp_listen_sock_def_readable(struct net *net)
 {
 	struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
@@ -556,29 +560,6 @@ void *rds_tcp_listen_sock_def_readable(struct net *net)
 	return lsock->sk->sk_user_data;
 }
 
-static int rds_tcp_dev_event(struct notifier_block *this,
-			     unsigned long event, void *ptr)
-{
-	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
-
-	/* rds-tcp registers as a pernet subys, so the ->exit will only
-	 * get invoked after network acitivity has quiesced. We need to
-	 * clean up all sockets  to quiesce network activity, and use
-	 * the unregistration of the per-net loopback device as a trigger
-	 * to start that cleanup.
-	 */
-	if (event == NETDEV_UNREGISTER_FINAL &&
-	    dev->ifindex == LOOPBACK_IFINDEX)
-		rds_tcp_kill_sock(dev_net(dev));
-
-	return NOTIFY_DONE;
-}
-
-static struct notifier_block rds_tcp_dev_notifier = {
-	.notifier_call        = rds_tcp_dev_event,
-	.priority = -10, /* must be called after other network notifiers */
-};
-
 /* when sysctl is used to modify some kernel socket parameters,this
  * function  resets the RDS connections in that netns  so that we can
  * restart with new parameters.  The assumption is that such reset
@@ -624,9 +605,8 @@ static void rds_tcp_exit(void)
 	rds_tcp_set_unloading();
 	synchronize_rcu();
 	rds_info_deregister_func(RDS_INFO_TCP_SOCKETS, rds_tcp_tc_info);
+	unregister_pernet_device(&rds_tcp_dev_ops);
 	unregister_pernet_subsys(&rds_tcp_net_ops);
-	if (unregister_netdevice_notifier(&rds_tcp_dev_notifier))
-		pr_warn("could not unregister rds_tcp_dev_notifier\n");
 	rds_tcp_destroy_conns();
 	rds_trans_unregister(&rds_tcp_transport);
 	rds_tcp_recv_exit();
@@ -650,15 +630,13 @@ static int rds_tcp_init(void)
 	if (ret)
 		goto out_slab;
 
-	ret = register_pernet_subsys(&rds_tcp_net_ops);
+	ret = register_pernet_device(&rds_tcp_dev_ops);
 	if (ret)
 		goto out_recv;
 
-	ret = register_netdevice_notifier(&rds_tcp_dev_notifier);
-	if (ret) {
-		pr_warn("could not register rds_tcp_dev_notifier\n");
+	ret = register_pernet_subsys(&rds_tcp_net_ops);
+	if (ret)
 		goto out_pernet;
-	}
 
 	rds_trans_register(&rds_tcp_transport);
 
@@ -667,7 +645,7 @@ static int rds_tcp_init(void)
 	goto out;
 
 out_pernet:
-	unregister_pernet_subsys(&rds_tcp_net_ops);
+	unregister_pernet_device(&rds_tcp_dev_ops);
 out_recv:
 	rds_tcp_recv_exit();
 out_slab:

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux