Re: Kernel panic 3.18 - 4.0.1

Ah, sorry. I thought it would be possible to backport the fix to other
stable branches like 3.18 and 3.19. We use 3.19.6 and 3.18.14; 3.18
seems to be the last longterm branch on kernel.org, and we can try
3.19.8 too. Sorry if I hit the wrong mailing list for this kind of
problem, let me know. We'll try 3.19.8 in the meantime.

On Tue, May 26, 2015 at 2:30 PM, Greg KH <greg@xxxxxxxxx> wrote:
> On Tue, May 26, 2015 at 02:14:24PM -0700, Alexandr Morozov wrote:
>> We encountered a kernel panic in our tests. We think it is because we
>> introduced bind-mounted network namespaces: for each container we do
>> unshare(CLONE_NEWNET), bind-mount it to a path, then configure it and
>> setns() into it. Here is the trace I get on 4.0.1:
>> May 26 13:37:26 minigrind kernel: BUG: unable to handle kernel NULL
>> pointer dereference at 0000000000000016
>> May 26 13:37:26 minigrind kernel: IP: [<ffffffff811d4683>]
>> __detach_mounts+0x33/0x80
>> May 26 13:37:26 minigrind kernel: PGD 31aef9067 PUD 2b5ed8067 PMD 0
>> May 26 13:37:26 minigrind kernel: Oops: 0000 [#1] PREEMPT SMP
>> May 26 13:37:26 minigrind kernel: Modules linked in: ipt_MASQUERADE
>> nf_nat_masquerade_ipv4 bridge stp llc overlay ip6t_REJECT
>> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ebtable_nat ebtab
>> May 26 13:37:26 minigrind kernel: CPU: 0 PID: 4078 Comm: docker Not
>> tainted 4.0.1-gentoo #1
>> May 26 13:37:26 minigrind kernel: Hardware name: LENOVO
>> 20AQ006HUS/20AQ006HUS, BIOS GJET77WW (2.27 ) 05/20/2014
>> May 26 13:37:26 minigrind kernel: task: ffff8802b5e39980 ti:
>> ffff88008bfbc000 task.ti: ffff88008bfbc000
>> May 26 13:37:26 minigrind kernel: RIP: 0010:[<ffffffff811d4683>]
>> [<ffffffff811d4683>] __detach_mounts+0x33/0x80
>> May 26 13:37:26 minigrind kernel: RSP: 0018:ffff88008bfbfe38  EFLAGS: 00010202
>> May 26 13:37:26 minigrind kernel: RAX: 000000000000b9b9 RBX:
>> fffffffffffffffe RCX: 00000000000000b9
>> May 26 13:37:26 minigrind kernel: RDX: ffff8802b5e39980 RSI:
>> ffffffff819a10cd RDI: 0000000000000000
>> May 26 13:37:26 minigrind kernel: RBP: ffff880327fbe480 R08:
>> 0000000000000000 R09: 0000000000000000
>> May 26 13:37:26 minigrind kernel: R10: ffff88033e2197e0 R11:
>> 0000000000000000 R12: ffff88007dde8a78
>> May 26 13:37:26 minigrind kernel: R13: ffff88007dde8ea8 R14:
>> ffff88008bfbfea0 R15: ffff88007dde8f40
>> May 26 13:37:26 minigrind kernel: FS:  00007f7421b0a700(0000)
>> GS:ffff88033e200000(0000) knlGS:0000000000000000
>> May 26 13:37:26 minigrind kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
>> 0000000080050033
>> May 26 13:37:26 minigrind kernel: CR2: 0000000000000016 CR3:
>> 000000031702b000 CR4: 00000000001406f0
>> May 26 13:37:26 minigrind kernel: DR0: 0000000000000000 DR1:
>> 0000000000000000 DR2: 0000000000000000
>> May 26 13:37:26 minigrind kernel: DR3: 0000000000000000 DR6:
>> 00000000fffe0ff0 DR7: 0000000000000400
>> May 26 13:37:26 minigrind kernel: Stack:
>> May 26 13:37:26 minigrind kernel:  ffff880327fbe4d8 ffffffff811bfc82
>> 00000000014007f0 00000000fffffffe
>> May 26 13:37:26 minigrind kernel:  ffff88031724d000 0000000000000000
>> ffff88008bfbfeb8 ffff88007dde8ea8
>> May 26 13:37:26 minigrind kernel:  00000000ffffff9c ffffffff811c4ec8
>> 000000c20858d5f0 ffff880327fbe480
>> May 26 13:37:26 minigrind kernel: Call Trace:
>> May 26 13:37:26 minigrind kernel:  [<ffffffff811bfc82>] ? vfs_unlink+0x172/0x180
>> May 26 13:37:26 minigrind kernel:  [<ffffffff811c4ec8>] ?
>> do_unlinkat+0x268/0x2d0
>> May 26 13:37:26 minigrind kernel:  [<ffffffff8104bdb5>] ?
>> syscall_trace_enter_phase1+0x195/0x1a0
>> May 26 13:37:26 minigrind kernel:  [<ffffffff81746216>] ?
>> int_check_syscall_exit_work+0x34/0x3d
>> May 26 13:37:26 minigrind kernel:  [<ffffffff81745ff6>] ?
>> system_call_fastpath+0x16/0x1b
>> May 26 13:37:26 minigrind kernel: Code: 62 c3 81 e8 b0 fc 56 00 48 89
>> df e8 18 da ff ff 48 85 c0 48 89 c3 74 55 48 c7 c7 84 b4 c0 81 e8 a4
>> 0f 57 00 83 05 fd 6d a3 00 01 <48> 8b 53 18 48 85 d2
>> May 26 13:37:26 minigrind kernel: RIP  [<ffffffff811d4683>]
>> __detach_mounts+0x33/0x80
>> May 26 13:37:26 minigrind kernel:  RSP <ffff88008bfbfe38>
>> May 26 13:37:26 minigrind kernel: CR2: 0000000000000016
>> May 26 13:37:26 minigrind kernel: ---[ end trace 399f937a2cba4abb ]---
>>
>> On 4.0.2 everything works fine for me.
>
> Great!  What's the problem then?  :)
>
>> My colleagues got different errors, like RCU stalls and plain
>> deadlocks where you can't create new namespaces. I think all these
>> errors were fixed somewhere in 4.0.2, but I'm not sure where exactly.
>> The test which produces the panic (or hang) basically starts 16
>> containers in parallel, so it is 16 unshares + bind-mounts, then
>> unmounting those namespaces.
>> Also, here is info from one of my coworkers about the deadlock:
>>
>> mrjana [10:38 PM]
>> docker thread:
>>
>> root@jenkins-prs-7:/proc/8895/task/8931# cat stack
>> [<ffffffff81466465>] copy_net_ns+0x75/0x150
>> [<ffffffff8108c3bd>] create_new_namespaces+0xfd/0x1a0
>> [<ffffffff8108c5ea>] unshare_nsproxy_namespaces+0x5a/0xc0
>> [<ffffffff8106d1c3>] SyS_unshare+0x183/0x330
>> [<ffffffff8156df4d>] system_call_fastpath+0x16/0x1b
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> mrjana [10:38 PM]
>> This docker thread is waiting on net_mutex
>>
>> mrjana [10:38 PM]
>> which is held by the kworker thread and is not returning:
>>
>> mrjana [10:39 PM]
>> here’s the stack trace of kernel thread:
>>
>> mrjana [10:39 PM]
>> root@jenkins-prs-7:/proc# cat /proc/6/stack
>> [<ffffffff810aec15>] mutex_optimistic_spin+0x185/0x1e0
>> [<ffffffff8147d5c5>] rtnl_lock+0x15/0x20
>> [<ffffffff8146c7a2>] default_device_exit_batch+0x72/0x160
>> [<ffffffff81465a83>] ops_exit_list.isra.1+0x53/0x60
>> [<ffffffff81466320>] cleanup_net+0x100/0x1d0
>> [<ffffffff81086064>] process_one_work+0x154/0x400
>> [<ffffffff81086a0b>] worker_thread+0x6b/0x490
>> [<ffffffff8108b8fb>] kthread+0xdb/0x100
>> [<ffffffff8156de98>] ret_from_fork+0x58/0x90
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> mrjana [10:41 PM]
>> If you look at 3.18 code this thread acquires net_mutex at cleanup_net
>>
>> mrjana [10:41 PM]
>> but this kworker thread has never released the net_mutex
>>
>> mrjana [10:41 PM]
>> instead it is spinning on rtnl_lock
>>
>> We tried on our CI versions 3.18, 3.19 and 4.0.1.
>>
>> Feel free to ask if you need any additional info, or a machine where
>> you can reproduce this easily.
>
> I don't understand, 4.0.2 is working, so what is there left for us to do
> here?
>
> thanks,
>
> greg k-h
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



