Re: netns: Issues with deleting virtual interfaces during namespace cleanup

Renato Westphal <renatowestphal@xxxxxxxxx> · Sun, 27 Feb 2011 02:16:23 -0300

Hello David,

You may try the patch below (against kernel v2.6.35) and see if that helps.
It basically does what you asked for: during namespace cleanup, it moves
virtual interfaces back to the namespaces they were created in. I tested it
with veth pairs and nested network namespaces, and everything worked fine.

I think this should be the default behaviour. I would appreciate it if
someone could review or fix this patch and push it upstream.

Have a good day,
Renato.

2011/2/26 Daniel Lezcano <daniel.lezcano@xxxxxxx>

> On 02/26/2011 05:59 PM, Ward, David - 0663 - MITLL wrote:
> > (Apologies for the cross-post, but Thunderbird messed up the formatting
> > when I sent this originally, and then I realized I sent it to the wrong
> > list.)
> >
> > A patch was applied to the kernel in November 2008 that deletes virtual
> > network interfaces when network namespaces are cleaned up
> > (d0c082cea6dfb9b674b4f6e1e84025662dbd24e8). A discussion about this
> > patch took place on this list
> > (https://lists.linux-foundation.org/pipermail/containers/2008-October/013460.html),
> > where Daniel Lezcano wrote:
> >
> >  > After discussing with Benjamin, this patch means a user can no longer
> >  > manage a pool of virtual devices because they will be automatically
> >  > destroyed when the namespace exits. I don't think it is a big concern,
> >  > but just in case I am asking :)
> >
> > I currently have two use cases where this behavior is not desirable:
> >
> > 1. I use a veth pair device to connect two containers together (as
> > opposed to connecting a container to the host). To do this, I
> > create the veth pair device manually in the host with iproute2
> > ("ip link add type veth"). Then when I start each container, it
> > pulls in one of the interfaces of the veth pair device with
> > "lxc.network.type = phys". When I stop one of the containers, its
> > interface to the veth pair device is deleted instead of moved back
> > to the host, so I cannot just start the stopped container again
> > and re-establish the same link.
>
> Maybe you can rely on the lxc configuration to do that.
>
> This assumes you always create the two containers in the same order.
>
> The first one:
>
> lxc.network.type=veth
> lxc.network.veth.pair=vethX
>
> The second one:
>
> lxc.network.type=phys
> lxc.network.link=vethX
>
> The drawback is that you have to stop and start both of them.
>
>
> Otherwise, why don't you use the macvlan configuration?
>
> For both containers:
>
> lxc.network.type=macvlan
> lxc.network.macvlan.mode=bridge
> lxc.network.link=dummy0
>
>
> > 2. I start a process in the host that creates a TUN/TAP interface,
> > such as a VPN client. I pull the TUN/TAP interface into the
> > container with "lxc.network.type = phys". When the container
> > exits, the TUN/TAP interface is deleted because it is a virtual
> > interface, while the VPN client process continues to run in the
> > host. Again, I cannot just start the container again with the
> > same connection; I have to restart the VPN client.
> >
> > It makes sense that virtual network interfaces that get created inside a
> > container should be deleted when the container exits. However, I feel
> > that network interfaces from the host that get assigned to the container
> > should be returned to the host when the container exits, whether they
> > are physical or virtual.
>
> Wouldn't it make sense to add a configuration option for lxc to create such
> a device and handle the VPN client?
>
> There is the lxc.network.script.up option, from which you can launch your
> VPN client. If you add the TUN/TAP interface as a network option, lxc will
> create it for you, and once it is up, the up script is invoked to launch
> the VPN client.
>
> An lxc.network.script.down option does not exist yet, but it would be
> quite easy to add.
>
> What do you think?
>
> > Can the kernel distinguish between network interfaces that were created
> > inside the namespace, and network interfaces that were moved there?
>
> IMHO that would add more complexity to the network namespace code,
> especially for handling nested namespaces. Furthermore, it would impact
> the current design. I am not really in favor of it, as that was the
> initial behavior and it had limitations.
> _______________________________________________
> Containers mailing list
> Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
> https://lists.linux-foundation.org/mailman/listinfo/containers
>



-- 
Renato Westphal
commit 4b938c007d9a20d7ee6753083d7a9c6b1f098671
Author: Renato Westphal <rwestphal@xxxxxxxxxxxx>
Date:   Sun Feb 27 02:07:56 2011 -0300

    netns: Preserve imported virtual interfaces during namespace cleanup

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b21e405..7cce799 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1019,6 +1019,8 @@ struct net_device {
 #ifdef CONFIG_NET_NS
 	/* Network namespace this network device is inside */
 	struct net		*nd_net;
+	/* Initial network namespace of this network device */
+	struct net		*nd_init_net;
 #endif
 
 	/* mid-layer private */
diff --git a/net/core/dev.c b/net/core/dev.c
index f3a24c4..16d9bc4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5830,6 +5830,7 @@ static struct pernet_operations __net_initdata netdev_net_ops = {
 static void __net_exit default_device_exit(struct net *net)
 {
 	struct net_device *dev, *aux;
+	struct net *dest_net;
 	/*
 	 * Push all migratable network devices back to the
 	 * initial network namespace
@@ -5844,12 +5845,13 @@ static void __net_exit default_device_exit(struct net *net)
 			continue;
 
 		/* Leave virtual devices for the generic cleanup */
-		if (dev->rtnl_link_ops)
+		if (dev->rtnl_link_ops && dev->nd_net == dev->nd_init_net)
 			continue;
 
 		/* Push remaing network devices to init_net */
+		dest_net = dev->nd_init_net ? dev->nd_init_net : &init_net;
 		snprintf(fb_name, IFNAMSIZ, "dev%d", dev->ifindex);
-		err = dev_change_net_namespace(dev, &init_net, fb_name);
+		err = dev_change_net_namespace(dev, dest_net, fb_name);
 		if (err) {
 			printk(KERN_EMERG "%s: failed to move %s to init_net: %d\n",
 				__func__, dev->name, err);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 19bedd5..b2e3155 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1394,6 +1394,7 @@ struct net_device *rtnl_create_link(struct net *src_net, struct net *net,
 		goto err;
 
 	dev_net_set(dev, net);
+	dev->nd_init_net = dev_net(dev);
 	dev->rtnl_link_ops = ops;
 	dev->rtnl_link_state = RTNL_LINK_INITIALIZING;
 	dev->real_num_tx_queues = real_num_queues;