On Thu, Oct 13, 2016 at 10:49:38AM -0500, Eric W. Biederman wrote: > Andrei Vagin <avagin@xxxxxxxxxx> writes: > > > From: Andrey Vagin <avagin@xxxxxxxxxx> > > > > The operation of destroying netns is heavy and it is executed under > > net_mutex. If many namespaces are destroyed concurrently, net_mutex can > > be locked for a long time. It is impossible to create a new netns during > > this period of time. > > This may be the right approach or at least the right approach to bound > net_mutex hold times but I have to take exception to calling network > namespace cleanup heavy. > > The only particularly time consuming operation I have ever found are calls to > synchronize_rcu/sycrhonize_sched/synchronize_net. I booted the kernel with maxcpus=1, in this case these functions work very fast and the problem is there any way. Accoding to perf, we spend a lot of time in kobject_uevent: - 99.96% 0.00% kworker/u4:1 [kernel.kallsyms] [k] unregister_netdevice_many ▒ - unregister_netdevice_many ◆ - 99.95% rollback_registered_many ▒ - 99.64% netdev_unregister_kobject ▒ - 33.43% netdev_queue_update_kobjects ▒ - 33.40% kobject_put ▒ - kobject_release ▒ + 33.37% kobject_uevent ▒ + 0.03% kobject_del ▒ + 0.03% sysfs_remove_group ▒ - 33.13% net_rx_queue_update_kobjects ▒ - kobject_put ▒ - kobject_release ▒ + 33.11% kobject_uevent ▒ + 0.01% kobject_del ▒ 0.00% rx_queue_release ▒ - 33.08% device_del ▒ + 32.75% kobject_uevent ▒ + 0.17% device_remove_attrs ▒ + 0.07% dpm_sysfs_remove ▒ + 0.04% device_remove_class_symlinks ▒ + 0.01% kobject_del ▒ + 0.01% device_pm_remove ▒ + 0.01% sysfs_remove_file_ns ▒ + 0.00% klist_del ▒ + 0.00% driver_deferred_probe_del ▒ 0.00% cleanup_glue_dir.isra.14.part.15 ▒ 0.00% to_acpi_device_node ▒ 0.00% sysfs_remove_group ▒ 0.00% klist_del ▒ 0.00% device_remove_attrs ▒ + 0.26% call_netdevice_notifiers_info ▒ + 0.04% rtmsg_ifinfo_build_skb ▒ + 0.01% rtmsg_ifinfo_send ▒ 0.00% dev_uc_flush ▒ 0.00% netif_reset_xps_queues_gt Someone can listen these uevents, so we can't stop sending them without breaking backward compatibility. We can try to optimize kobject_uevent... > > Ideally we can search those out calls in the network namespace cleanup > operations and figuroue out how to eliminate those operations or how to > stack them. > > > In our days when userns allows to create network namespaces to > > unprivilaged users, it may be a real problem. > > Sorting out syncrhonize_rcu calls will be a much larger > and much more effective improvement than your patch here. > > > On my laptop (fedora 24, i5-5200U, 12GB) 1000 namespaces requires about > > 300MB of RAM and are being destroyed for 8 seconds. > > > > In this patch, a number of namespaces which can be cleaned up > > concurrently is limited by 32. net_mutex is released after handling each > > portion of net namespaces and then it is locked again to handle the next > > one. It allows other users to lock it without waiting for a long > > time. > > > > I am not sure whether we need to add a sysctl to costomize this limit. > > Let me know if you think it's required. > > We definitely don't need an extra sysctl. Thanks, Andrei > > Eric > > > > Cc: "David S. Miller" <davem@xxxxxxxxxxxxx> > > Cc: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> > > Signed-off-by: Andrei Vagin <avagin@xxxxxxxxxx> > > --- > > net/core/net_namespace.c | 12 +++++++++++- > > 1 file changed, 11 insertions(+), 1 deletion(-) > > > > diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c > > index 989434f..33dd3b7 100644 > > --- a/net/core/net_namespace.c > > +++ b/net/core/net_namespace.c > > @@ -406,10 +406,20 @@ static void cleanup_net(struct work_struct *work) > > struct net *net, *tmp; > > struct list_head net_kill_list; > > LIST_HEAD(net_exit_list); > > + int i = 0; > > > > /* Atomically snapshot the list of namespaces to cleanup */ > > spin_lock_irq(&cleanup_list_lock); > > - list_replace_init(&cleanup_list, &net_kill_list); > > + list_for_each_entry_safe(net, tmp, &cleanup_list, cleanup_list) > > + if (++i == 32) > > + break; > > + if (i == 32) { > > + list_cut_position(&net_kill_list, > > + &cleanup_list, &net->cleanup_list); > > + queue_work(netns_wq, work); > > + } else { > > + list_replace_init(&cleanup_list, &net_kill_list); > > + } > > spin_unlock_irq(&cleanup_list_lock); > > > > mutex_lock(&net_mutex); _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers