On Tue, May 26, 2015 at 02:14:24PM -0700, Alexandr Morozov wrote:
> We encountered a kernel panic in our tests. We think it is because we
> introduced bind-mounted network namespaces: for each container we
> call unshare(NEWNET), bind-mount the namespace to a path, then
> configure it and setns into it. Here is the trace I get on 4.0.1:
> May 26 13:37:26 minigrind kernel: BUG: unable to handle kernel NULL
> pointer dereference at 0000000000000016
> May 26 13:37:26 minigrind kernel: IP: [<ffffffff811d4683>]
> __detach_mounts+0x33/0x80
> May 26 13:37:26 minigrind kernel: PGD 31aef9067 PUD 2b5ed8067 PMD 0
> May 26 13:37:26 minigrind kernel: Oops: 0000 [#1] PREEMPT SMP
> May 26 13:37:26 minigrind kernel: Modules linked in: ipt_MASQUERADE
> nf_nat_masquerade_ipv4 bridge stp llc overlay ip6t_REJECT
> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ebtable_nat ebtab
> May 26 13:37:26 minigrind kernel: CPU: 0 PID: 4078 Comm: docker Not
> tainted 4.0.1-gentoo #1
> May 26 13:37:26 minigrind kernel: Hardware name: LENOVO
> 20AQ006HUS/20AQ006HUS, BIOS GJET77WW (2.27 ) 05/20/2014
> May 26 13:37:26 minigrind kernel: task: ffff8802b5e39980 ti:
> ffff88008bfbc000 task.ti: ffff88008bfbc000
> May 26 13:37:26 minigrind kernel: RIP: 0010:[<ffffffff811d4683>]
> [<ffffffff811d4683>] __detach_mounts+0x33/0x80
> May 26 13:37:26 minigrind kernel: RSP: 0018:ffff88008bfbfe38 EFLAGS: 00010202
> May 26 13:37:26 minigrind kernel: RAX: 000000000000b9b9 RBX:
> fffffffffffffffe RCX: 00000000000000b9
> May 26 13:37:26 minigrind kernel: RDX: ffff8802b5e39980 RSI:
> ffffffff819a10cd RDI: 0000000000000000
> May 26 13:37:26 minigrind kernel: RBP: ffff880327fbe480 R08:
> 0000000000000000 R09: 0000000000000000
> May 26 13:37:26 minigrind kernel: R10: ffff88033e2197e0 R11:
> 0000000000000000 R12: ffff88007dde8a78
> May 26 13:37:26 minigrind kernel: R13: ffff88007dde8ea8 R14:
> ffff88008bfbfea0 R15: ffff88007dde8f40
> May 26 13:37:26 minigrind kernel: FS: 00007f7421b0a700(0000)
> GS:ffff88033e200000(0000) knlGS:0000000000000000
> May 26 13:37:26 minigrind kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> May 26 13:37:26 minigrind kernel: CR2: 0000000000000016 CR3:
> 000000031702b000 CR4: 00000000001406f0
> May 26 13:37:26 minigrind kernel: DR0: 0000000000000000 DR1:
> 0000000000000000 DR2: 0000000000000000
> May 26 13:37:26 minigrind kernel: DR3: 0000000000000000 DR6:
> 00000000fffe0ff0 DR7: 0000000000000400
> May 26 13:37:26 minigrind kernel: Stack:
> May 26 13:37:26 minigrind kernel: ffff880327fbe4d8 ffffffff811bfc82
> 00000000014007f0 00000000fffffffe
> May 26 13:37:26 minigrind kernel: ffff88031724d000 0000000000000000
> ffff88008bfbfeb8 ffff88007dde8ea8
> May 26 13:37:26 minigrind kernel: 00000000ffffff9c ffffffff811c4ec8
> 000000c20858d5f0 ffff880327fbe480
> May 26 13:37:26 minigrind kernel: Call Trace:
> May 26 13:37:26 minigrind kernel: [<ffffffff811bfc82>] ? vfs_unlink+0x172/0x180
> May 26 13:37:26 minigrind kernel: [<ffffffff811c4ec8>] ?
> do_unlinkat+0x268/0x2d0
> May 26 13:37:26 minigrind kernel: [<ffffffff8104bdb5>] ?
> syscall_trace_enter_phase1+0x195/0x1a0
> May 26 13:37:26 minigrind kernel: [<ffffffff81746216>] ?
> int_check_syscall_exit_work+0x34/0x3d
> May 26 13:37:26 minigrind kernel: [<ffffffff81745ff6>] ?
> system_call_fastpath+0x16/0x1b
> May 26 13:37:26 minigrind kernel: Code: 62 c3 81 e8 b0 fc 56 00 48 89
> df e8 18 da ff ff 48 85 c0 48 89 c3 74 55 48 c7 c7 84 b4 c0 81 e8 a4
> 0f 57 00 83 05 fd 6d a3 00 01 <48> 8b 53 18 48 85 d2
> May 26 13:37:26 minigrind kernel: RIP [<ffffffff811d4683>]
> __detach_mounts+0x33/0x80
> May 26 13:37:26 minigrind kernel: RSP <ffff88008bfbfe38>
> May 26 13:37:26 minigrind kernel: CR2: 0000000000000016
> May 26 13:37:26 minigrind kernel: ---[ end trace 399f937a2cba4abb ]---
>
> On 4.0.2 all is perfect for me.

Great! What's the problem then? :)

> My colleagues got different errors, like rcu_stall and a plain deadlock
> where you can't create new namespaces. I think all these errors were
> fixed somewhere in 4.0.2, but I'm not sure where exactly.
> The test which produces the panic (or hang) basically starts 16
> containers in parallel, so it is 16 unshares + bind mounts, followed by
> unmounting those namespaces.
> Also, here is info from one of my coworkers about the deadlock:
>
> mrjana [10:38 PM]
> docker thread:
>
> root@jenkins-prs-7:/proc/8895/task/8931# cat stack
> [<ffffffff81466465>] copy_net_ns+0x75/0x150
> [<ffffffff8108c3bd>] create_new_namespaces+0xfd/0x1a0
> [<ffffffff8108c5ea>] unshare_nsproxy_namespaces+0x5a/0xc0
> [<ffffffff8106d1c3>] SyS_unshare+0x183/0x330
> [<ffffffff8156df4d>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> mrjana [10:38 PM]
> This docker thread is waiting on net_mutex
>
> mrjana [10:38 PM]
> which is held by the kworker thread and is not returning:
>
> mrjana [10:39 PM]
> here’s the stack trace of kernel thread:
>
> mrjana [10:39 PM]
> root@jenkins-prs-7:/proc# cat /proc/6/stack
> [<ffffffff810aec15>] mutex_optimistic_spin+0x185/0x1e0
> [<ffffffff8147d5c5>] rtnl_lock+0x15/0x20
> [<ffffffff8146c7a2>] default_device_exit_batch+0x72/0x160
> [<ffffffff81465a83>] ops_exit_list.isra.1+0x53/0x60
> [<ffffffff81466320>] cleanup_net+0x100/0x1d0
> [<ffffffff81086064>] process_one_work+0x154/0x400
> [<ffffffff81086a0b>] worker_thread+0x6b/0x490
> [<ffffffff8108b8fb>] kthread+0xdb/0x100
> [<ffffffff8156de98>] ret_from_fork+0x58/0x90
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> mrjana [10:41 PM]
> If you look at 3.18 code this thread acquires net_mutex at cleanup_net
>
> mrjana [10:41 PM]
> but this kworker thread has never released the net_mutex
>
> mrjana [10:41 PM]
> instead it is spinning on rtnl_lock
>
> We tried versions 3.18, 3.19 and 4.0.1 on our CI.
>
> Feel free to ask if you need additional info or a machine where you
> can reproduce this easily.

I don't understand, 4.0.2 is working, so what is there left for us to do here?
thanks,

greg k-h
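
A side note on reading the quoted oops, not something stated in the thread: the marked bytes in the Code: line, `<48> 8b 53 18`, decode on x86-64 as a 64-bit load from RBX + 0x18 into RDX, and the register dump shows RBX = fffffffffffffffe. The small sketch below only redoes that arithmetic with values copied from the log; the variable and function names are made up for illustration.

```python
# Values copied from the quoted oops. The interpretation at the end is an
# assumption, not something confirmed anywhere in the thread.
MASK64 = (1 << 64) - 1

rbx = 0xfffffffffffffffe   # RBX from the register dump
cr2 = 0x16                 # faulting address reported in CR2
disp = 0x18                # displacement encoded in "<48> 8b 53 18"


def to_signed64(value):
    """Interpret a 64-bit register value as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


# Address the faulting load would touch, with 64-bit wraparound.
fault_addr = (rbx + disp) & MASK64

print(hex(fault_addr))     # 0x16, the same value as CR2
print(to_signed64(rbx))    # -2
```

The computed address matches CR2, so the "NULL pointer dereference" at 0x16 is the load through RBX. Since -2 is the numeric value of -ENOENT, one plausible reading is that an error-encoded pointer was dereferenced rather than a true NULL, but that is a guess from the register contents alone.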