We encountered a kernel panic in our tests. We think it is because we introduced bind-mounted network namespaces: for each container we do unshare(CLONE_NEWNET), bind-mount the namespace to a path, and later configure it and setns() into it (a minimal sketch of this sequence is appended at the end of this mail). Here is the trace I get on 4.0.1:

May 26 13:37:26 minigrind kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000016
May 26 13:37:26 minigrind kernel: IP: [<ffffffff811d4683>] __detach_mounts+0x33/0x80
May 26 13:37:26 minigrind kernel: PGD 31aef9067 PUD 2b5ed8067 PMD 0
May 26 13:37:26 minigrind kernel: Oops: 0000 [#1] PREEMPT SMP
May 26 13:37:26 minigrind kernel: Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 bridge stp llc overlay ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ebtable_nat ebtab
May 26 13:37:26 minigrind kernel: CPU: 0 PID: 4078 Comm: docker Not tainted 4.0.1-gentoo #1
May 26 13:37:26 minigrind kernel: Hardware name: LENOVO 20AQ006HUS/20AQ006HUS, BIOS GJET77WW (2.27 ) 05/20/2014
May 26 13:37:26 minigrind kernel: task: ffff8802b5e39980 ti: ffff88008bfbc000 task.ti: ffff88008bfbc000
May 26 13:37:26 minigrind kernel: RIP: 0010:[<ffffffff811d4683>] [<ffffffff811d4683>] __detach_mounts+0x33/0x80
May 26 13:37:26 minigrind kernel: RSP: 0018:ffff88008bfbfe38 EFLAGS: 00010202
May 26 13:37:26 minigrind kernel: RAX: 000000000000b9b9 RBX: fffffffffffffffe RCX: 00000000000000b9
May 26 13:37:26 minigrind kernel: RDX: ffff8802b5e39980 RSI: ffffffff819a10cd RDI: 0000000000000000
May 26 13:37:26 minigrind kernel: RBP: ffff880327fbe480 R08: 0000000000000000 R09: 0000000000000000
May 26 13:37:26 minigrind kernel: R10: ffff88033e2197e0 R11: 0000000000000000 R12: ffff88007dde8a78
May 26 13:37:26 minigrind kernel: R13: ffff88007dde8ea8 R14: ffff88008bfbfea0 R15: ffff88007dde8f40
May 26 13:37:26 minigrind kernel: FS: 00007f7421b0a700(0000) GS:ffff88033e200000(0000) knlGS:0000000000000000
May 26 13:37:26 minigrind kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 26 13:37:26 minigrind kernel: CR2: 0000000000000016 CR3: 000000031702b000 CR4: 00000000001406f0
May 26 13:37:26 minigrind kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 26 13:37:26 minigrind kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 26 13:37:26 minigrind kernel: Stack:
May 26 13:37:26 minigrind kernel:  ffff880327fbe4d8 ffffffff811bfc82 00000000014007f0 00000000fffffffe
May 26 13:37:26 minigrind kernel:  ffff88031724d000 0000000000000000 ffff88008bfbfeb8 ffff88007dde8ea8
May 26 13:37:26 minigrind kernel:  00000000ffffff9c ffffffff811c4ec8 000000c20858d5f0 ffff880327fbe480
May 26 13:37:26 minigrind kernel: Call Trace:
May 26 13:37:26 minigrind kernel:  [<ffffffff811bfc82>] ? vfs_unlink+0x172/0x180
May 26 13:37:26 minigrind kernel:  [<ffffffff811c4ec8>] ? do_unlinkat+0x268/0x2d0
May 26 13:37:26 minigrind kernel:  [<ffffffff8104bdb5>] ? syscall_trace_enter_phase1+0x195/0x1a0
May 26 13:37:26 minigrind kernel:  [<ffffffff81746216>] ? int_check_syscall_exit_work+0x34/0x3d
May 26 13:37:26 minigrind kernel:  [<ffffffff81745ff6>] ? system_call_fastpath+0x16/0x1b
May 26 13:37:26 minigrind kernel: Code: 62 c3 81 e8 b0 fc 56 00 48 89 df e8 18 da ff ff 48 85 c0 48 89 c3 74 55 48 c7 c7 84 b4 c0 81 e8 a4 0f 57 00 83 05 fd 6d a3 00 01 <48> 8b 53 18 48 85 d2
May 26 13:37:26 minigrind kernel: RIP [<ffffffff811d4683>] __detach_mounts+0x33/0x80
May 26 13:37:26 minigrind kernel: RSP <ffff88008bfbfe38>
May 26 13:37:26 minigrind kernel: CR2: 0000000000000016
May 26 13:37:26 minigrind kernel: ---[ end trace 399f937a2cba4abb ]---

On 4.0.2 everything works fine for me. My colleagues hit different errors, e.g. RCU stalls, or a deadlock where new namespaces can no longer be created. I think all of these errors were fixed somewhere in 4.0.2, but I'm not sure where exactly.

The test which produces the panic (or hang) basically starts 16 containers in parallel, so it is 16 unshares + bind-mounts, and then those namespaces are unmounted (judging from the do_unlinkat frames in the trace above, the bind-mount targets are also unlinked at that point).

Also, here is some info from one of my coworkers about the deadlock:

mrjana [10:38 PM]
docker thread:
root@jenkins-prs-7:/proc/8895/task/8931# cat stack
[<ffffffff81466465>] copy_net_ns+0x75/0x150
[<ffffffff8108c3bd>] create_new_namespaces+0xfd/0x1a0
[<ffffffff8108c5ea>] unshare_nsproxy_namespaces+0x5a/0xc0
[<ffffffff8106d1c3>] SyS_unshare+0x183/0x330
[<ffffffff8156df4d>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
mrjana [10:38 PM]
This docker thread is waiting on net_mutex
mrjana [10:38 PM]
which is held by the kworker thread and is not returning:
mrjana [10:39 PM]
here’s the stack trace of the kernel thread:
mrjana [10:39 PM]
root@jenkins-prs-7:/proc# cat /proc/6/stack
[<ffffffff810aec15>] mutex_optimistic_spin+0x185/0x1e0
[<ffffffff8147d5c5>] rtnl_lock+0x15/0x20
[<ffffffff8146c7a2>] default_device_exit_batch+0x72/0x160
[<ffffffff81465a83>] ops_exit_list.isra.1+0x53/0x60
[<ffffffff81466320>] cleanup_net+0x100/0x1d0
[<ffffffff81086064>] process_one_work+0x154/0x400
[<ffffffff81086a0b>] worker_thread+0x6b/0x490
[<ffffffff8108b8fb>] kthread+0xdb/0x100
[<ffffffff8156de98>] ret_from_fork+0x58/0x90
[<ffffffffffffffff>] 0xffffffffffffffff
mrjana [10:41 PM]
If you look at the 3.18 code, this thread acquires net_mutex in cleanup_net
mrjana [10:41 PM]
but this kworker thread has never released the net_mutex
mrjana [10:41 PM]
instead it is spinning on rtnl_lock

On our CI we tried kernels 3.18, 3.19 and 4.0.1. Feel free to ask if you need any additional info or a machine where you can reproduce this easily.

Thanks!
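For reference, below is a rough, self-contained sketch of what each "container" in the test does (unshare + bind-mount + setns, then unmount and unlink), run 16 times in parallel. It is not the actual test code: NSDIR, the file names, the worker() helper and the error handling are made up for illustration; it is only meant to show the sequence of syscalls involved.

/*
 * Rough reproducer sketch (not the actual test code): each worker
 * creates a network namespace, pins it with a bind mount, setns()s
 * back into it, then unmounts and unlinks the mount point.
 *
 * Build: gcc -o netns-stress netns-stress.c
 * Run as root.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NWORKERS 16                   /* "16 containers in parallel" */
#define NSDIR "/run/netns-stress"     /* hypothetical bind-mount dir */

static int worker(int id)
{
	char path[128];
	int fd;

	snprintf(path, sizeof(path), NSDIR "/ns-%d", id);

	/* New network namespace for this "container". */
	if (unshare(CLONE_NEWNET) < 0) {
		perror("unshare(CLONE_NEWNET)");
		return 1;
	}

	/* Create the mount point and pin the namespace to it. */
	fd = open(path, O_RDONLY | O_CREAT, 0600);
	if (fd < 0) {
		perror("open mount point");
		return 1;
	}
	close(fd);
	if (mount("/proc/self/ns/net", path, NULL, MS_BIND, NULL) < 0) {
		perror("bind-mount netns");
		return 1;
	}

	/* Re-enter the namespace via the bind mount; in the real flow a
	 * different task would do this to configure interfaces. */
	fd = open(path, O_RDONLY);
	if (fd >= 0) {
		if (setns(fd, CLONE_NEWNET) < 0)
			perror("setns");
		close(fd);
	}

	/* Teardown: unmount the namespace and remove the mount point.
	 * The unlink() is the do_unlinkat()/__detach_mounts() path that
	 * shows up in the oops above. */
	if (umount2(path, MNT_DETACH) < 0)
		perror("umount2");
	if (unlink(path) < 0)
		perror("unlink");
	return 0;
}

int main(void)
{
	int i;

	mkdir(NSDIR, 0755);
	for (i = 0; i < NWORKERS; i++) {
		pid_t pid = fork();
		if (pid == 0)
			exit(worker(i));
	}
	while (wait(NULL) > 0)
		;
	return 0;
}

With 16 of these running at once, the per-namespace cleanup and the unlinking of the mount points overlap heavily, which at least seems consistent with where the traces above end up.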