Re: Kernel panic 3.18 - 4.0.1

Greg KH <greg@xxxxxxxxx> · Tue, 26 May 2015 14:30:29 -0700

On Tue, May 26, 2015 at 02:14:24PM -0700, Alexandr Morozov wrote:
> We encountered a kernel panic in our tests. We think it is because we
> introduced bind-mounted network namespaces: for each container we
> call unshare(CLONE_NEWNET), bind-mount the namespace to a path, then
> configure it and setns() into it. Here is the trace I get on 4.0.1:
> May 26 13:37:26 minigrind kernel: BUG: unable to handle kernel NULL
> pointer dereference at 0000000000000016
> May 26 13:37:26 minigrind kernel: IP: [<ffffffff811d4683>]
> __detach_mounts+0x33/0x80
> May 26 13:37:26 minigrind kernel: PGD 31aef9067 PUD 2b5ed8067 PMD 0
> May 26 13:37:26 minigrind kernel: Oops: 0000 [#1] PREEMPT SMP
> May 26 13:37:26 minigrind kernel: Modules linked in: ipt_MASQUERADE
> nf_nat_masquerade_ipv4 bridge stp llc overlay ip6t_REJECT
> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ebtable_nat ebtab
> May 26 13:37:26 minigrind kernel: CPU: 0 PID: 4078 Comm: docker Not
> tainted 4.0.1-gentoo #1
> May 26 13:37:26 minigrind kernel: Hardware name: LENOVO
> 20AQ006HUS/20AQ006HUS, BIOS GJET77WW (2.27 ) 05/20/2014
> May 26 13:37:26 minigrind kernel: task: ffff8802b5e39980 ti:
> ffff88008bfbc000 task.ti: ffff88008bfbc000
> May 26 13:37:26 minigrind kernel: RIP: 0010:[<ffffffff811d4683>]
> [<ffffffff811d4683>] __detach_mounts+0x33/0x80
> May 26 13:37:26 minigrind kernel: RSP: 0018:ffff88008bfbfe38  EFLAGS: 00010202
> May 26 13:37:26 minigrind kernel: RAX: 000000000000b9b9 RBX:
> fffffffffffffffe RCX: 00000000000000b9
> May 26 13:37:26 minigrind kernel: RDX: ffff8802b5e39980 RSI:
> ffffffff819a10cd RDI: 0000000000000000
> May 26 13:37:26 minigrind kernel: RBP: ffff880327fbe480 R08:
> 0000000000000000 R09: 0000000000000000
> May 26 13:37:26 minigrind kernel: R10: ffff88033e2197e0 R11:
> 0000000000000000 R12: ffff88007dde8a78
> May 26 13:37:26 minigrind kernel: R13: ffff88007dde8ea8 R14:
> ffff88008bfbfea0 R15: ffff88007dde8f40
> May 26 13:37:26 minigrind kernel: FS:  00007f7421b0a700(0000)
> GS:ffff88033e200000(0000) knlGS:0000000000000000
> May 26 13:37:26 minigrind kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> May 26 13:37:26 minigrind kernel: CR2: 0000000000000016 CR3:
> 000000031702b000 CR4: 00000000001406f0
> May 26 13:37:26 minigrind kernel: DR0: 0000000000000000 DR1:
> 0000000000000000 DR2: 0000000000000000
> May 26 13:37:26 minigrind kernel: DR3: 0000000000000000 DR6:
> 00000000fffe0ff0 DR7: 0000000000000400
> May 26 13:37:26 minigrind kernel: Stack:
> May 26 13:37:26 minigrind kernel:  ffff880327fbe4d8 ffffffff811bfc82
> 00000000014007f0 00000000fffffffe
> May 26 13:37:26 minigrind kernel:  ffff88031724d000 0000000000000000
> ffff88008bfbfeb8 ffff88007dde8ea8
> May 26 13:37:26 minigrind kernel:  00000000ffffff9c ffffffff811c4ec8
> 000000c20858d5f0 ffff880327fbe480
> May 26 13:37:26 minigrind kernel: Call Trace:
> May 26 13:37:26 minigrind kernel:  [<ffffffff811bfc82>] ? vfs_unlink+0x172/0x180
> May 26 13:37:26 minigrind kernel:  [<ffffffff811c4ec8>] ?
> do_unlinkat+0x268/0x2d0
> May 26 13:37:26 minigrind kernel:  [<ffffffff8104bdb5>] ?
> syscall_trace_enter_phase1+0x195/0x1a0
> May 26 13:37:26 minigrind kernel:  [<ffffffff81746216>] ?
> int_check_syscall_exit_work+0x34/0x3d
> May 26 13:37:26 minigrind kernel:  [<ffffffff81745ff6>] ?
> system_call_fastpath+0x16/0x1b
> May 26 13:37:26 minigrind kernel: Code: 62 c3 81 e8 b0 fc 56 00 48 89
> df e8 18 da ff ff 48 85 c0 48 89 c3 74 55 48 c7 c7 84 b4 c0 81 e8 a4
> 0f 57 00 83 05 fd 6d a3 00 01 <48> 8b 53 18 48 85 d2
> May 26 13:37:26 minigrind kernel: RIP  [<ffffffff811d4683>]
> __detach_mounts+0x33/0x80
> May 26 13:37:26 minigrind kernel:  RSP <ffff88008bfbfe38>
> May 26 13:37:26 minigrind kernel: CR2: 0000000000000016
> May 26 13:37:26 minigrind kernel: ---[ end trace 399f937a2cba4abb ]---
> 
> On 4.0.2 all is perfect for me.

Great!  What's the problem then?  :)

> My colleagues got different errors, such as rcu_stall or a deadlock
> in which no new namespaces can be created. I think all these errors
> were fixed somewhere in 4.0.2, but I'm not sure where exactly.
> The test that produces the panic (or hang) starts 16 containers in
> parallel, so it does 16 unshare + bind-mount operations and then
> unmounts those namespaces.
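
The test flow described above could be sketched roughly as follows. This
is a minimal, hypothetical reproducer, not the actual Docker test: the
helper names, the netns-<index> file naming, the lazy unmount flag, and
the use of forked processes rather than threads are all assumptions. It
needs root (CAP_SYS_ADMIN), and on an affected kernel it may panic or
hang the machine, so it should only be run on a disposable system.

```python
import ctypes
import os

CLONE_NEWNET = 0x40000000  # from <sched.h>
MS_BIND = 4096             # from <sys/mount.h>
MNT_DETACH = 2             # from <sys/mount.h>


def _libc():
    """Load libc lazily and declare argument types for the calls used."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.unshare.argtypes = [ctypes.c_int]
    libc.setns.argtypes = [ctypes.c_int, ctypes.c_int]
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_char_p, ctypes.c_ulong, ctypes.c_void_p]
    libc.umount2.argtypes = [ctypes.c_char_p, ctypes.c_int]
    return libc


def _check(ret, what):
    """Raise OSError with errno if a libc call returned non-zero."""
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, what + ": " + os.strerror(err))


def ns_mount_path(base_dir, index):
    """Path of the bind-mount target for container `index` (made-up naming)."""
    return os.path.join(base_dir, "netns-%d" % index)


def one_container(path):
    """unshare a netns, bind-mount it at `path`, setns into it, tear it down."""
    libc = _libc()
    _check(libc.unshare(CLONE_NEWNET), "unshare")
    # The bind-mount target must exist; an empty regular file is enough.
    os.close(os.open(path, os.O_RDONLY | os.O_CREAT | os.O_EXCL, 0o444))
    _check(libc.mount(b"/proc/self/ns/net", os.fsencode(path),
                      None, MS_BIND, None), "mount")
    fd = os.open(path, os.O_RDONLY)
    try:
        _check(libc.setns(fd, CLONE_NEWNET), "setns")
    finally:
        os.close(fd)
    # Lazy unmount is an assumption; the original test may use a plain umount.
    _check(libc.umount2(os.fsencode(path), MNT_DETACH), "umount2")
    os.unlink(path)


def run_parallel(base_dir, count=16):
    """Fork `count` children that each run one_container(); return exit codes."""
    os.makedirs(base_dir, exist_ok=True)
    pids = []
    for i in range(count):
        pid = os.fork()
        if pid == 0:
            try:
                one_container(ns_mount_path(base_dir, i))
                os._exit(0)
            except OSError:
                os._exit(1)
        pids.append(pid)
    return [os.waitstatus_to_exitcode(os.waitpid(p, 0)[1]) for p in pids]
```

Calling run_parallel() as root with a scratch directory of your choice
runs the 16 unshare + bind-mount + unmount cycles concurrently; an exit
code of 1 from a child means one of the system calls failed.
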
> Also, here is info from one of my coworkers about the deadlock:
> 
> mrjana [10:38 PM]
> docker thread:
> 
> root@jenkins-prs-7:/proc/8895/task/8931# cat stack
> [<ffffffff81466465>] copy_net_ns+0x75/0x150
> [<ffffffff8108c3bd>] create_new_namespaces+0xfd/0x1a0
> [<ffffffff8108c5ea>] unshare_nsproxy_namespaces+0x5a/0xc0
> [<ffffffff8106d1c3>] SyS_unshare+0x183/0x330
> [<ffffffff8156df4d>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> mrjana [10:38 PM]
> This docker thread is waiting on net_mutex
> 
> mrjana [10:38 PM]
> which is held by the kworker thread and is not returning:
> 
> mrjana [10:39 PM]
> here’s the stack trace of kernel thread:
> 
> mrjana [10:39 PM]
> root@jenkins-prs-7:/proc# cat /proc/6/stack
> [<ffffffff810aec15>] mutex_optimistic_spin+0x185/0x1e0
> [<ffffffff8147d5c5>] rtnl_lock+0x15/0x20
> [<ffffffff8146c7a2>] default_device_exit_batch+0x72/0x160
> [<ffffffff81465a83>] ops_exit_list.isra.1+0x53/0x60
> [<ffffffff81466320>] cleanup_net+0x100/0x1d0
> [<ffffffff81086064>] process_one_work+0x154/0x400
> [<ffffffff81086a0b>] worker_thread+0x6b/0x490
> [<ffffffff8108b8fb>] kthread+0xdb/0x100
> [<ffffffff8156de98>] ret_from_fork+0x58/0x90
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> mrjana [10:41 PM]
> If you look at the 3.18 code, this thread acquires net_mutex in cleanup_net
> 
> mrjana [10:41 PM]
> but this kworker thread has never released the net_mutex
> 
> mrjana [10:41 PM]
> instead it is spinning on rtnl_lock
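
The lock dependency described in these messages can be laid out as a
wait-for chain. The following is only an illustrative sketch: the
wait_chain helper and the task labels are made up for this example, and
the holder of rtnl_lock is not identified in the traces above, so it is
marked as unknown.

```python
def wait_chain(start, waits_for, held_by):
    """Follow task -> lock -> holder links until the chain ends or loops."""
    chain = [start]
    seen = {start}
    task = start
    while task in waits_for:
        lock = waits_for[task]
        chain.append(lock)
        owner = held_by.get(lock)
        if owner is None:
            # The traces do not say who holds this lock.
            chain.append("<holder unknown>")
            break
        chain.append(owner)
        if owner in seen:
            break  # cycle: a true deadlock
        seen.add(owner)
        task = owner
    return chain


# Facts taken from the two stack traces quoted above.
waits_for = {
    "docker thread 8931": "net_mutex",  # blocked in copy_net_ns
    "kworker (pid 6)": "rtnl_lock",     # spinning in rtnl_lock
}
held_by = {
    "net_mutex": "kworker (pid 6)",     # taken in cleanup_net
}

chain = wait_chain("docker thread 8931", waits_for, held_by)
print(" -> ".join(chain))
```

The chain ends at rtnl_lock, so whatever holds rtnl_lock is what keeps
both cleanup_net and every new unshare() from making progress.
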
> 
> On our CI we tried versions 3.18, 3.19, and 4.0.1.
> 
> Feel free to ask if you need additional info or a machine where you
> can reproduce this easily.

I don't understand: 4.0.2 is working, so what is left for us to do
here?

thanks,

greg k-h