Re: splat in stacktrace_build_id_nmi

Song Liu <songliubraving@xxxxxx> · Thu, 24 Sep 2020 06:51:36 +0000

> On Sep 23, 2020, at 7:26 PM, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
> 
> Hi Song,
> 
> the latest bpf-next has 100% reproducible splat:
> test_progs -t stacktrace_build_id_nmi

After a couple hours of building kernel and rebooting (this cannot 
run on qemu...), I realized I cannot repro this reliably. :( It 
happened once on one kernel, but not anymore, event on the same kernel. 

>From the trace, it looks like an A-A deadlock. But I think it is a 
false alert. 

> [   18.984806]
> [   18.984807] ================================
> [   18.984807] WARNING: inconsistent lock state
> [   18.984808] 5.9.0-rc6-01771-g1466de1330e1 #2967 Not tainted
> [   18.984809] --------------------------------
> [   18.984809] inconsistent {INITIAL USE} -> {IN-NMI} usage.
> [   18.984810] test_progs/1990 [HC2[2]:SC0[0]:HE0:SE1] takes:
> [   18.984810] ffffe8ffffc219c0 (&head->lock){....}-{2:2}, at:
> __pcpu_freelist_pop+0xe3/0x180
> [   18.984813] {INITIAL USE} state was registered at:
> [   18.984814]   lock_acquire+0x175/0x7c0
> [   18.984814]   _raw_spin_lock+0x2c/0x40
> [   18.984815]   __pcpu_freelist_pop+0xe3/0x180
> [   18.984815]   pcpu_freelist_pop+0x31/0x40
> [   18.984816]   htab_map_alloc+0xbbf/0xf40

Here, we access head->lock in htab_map_alloc. 

> [   18.984816]   __do_sys_bpf+0x5aa/0x3ed0
> [   18.984817]   do_syscall_64+0x2d/0x40
> [   18.984818]   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   18.984818] irq event stamp: 12
> [   18.984819] hardirqs last  enabled at (11): [<ffffffff816953d4>]
> get_page_from_freelist+0x1314/0x6190
> [   18.984820] hardirqs last disabled at (12): [<ffffffff837e527d>]
> irqentry_enter+0x1d/0x50
> [   18.984821] softirqs last  enabled at (0): [<ffffffff8111011c>]
> copy_process+0x147c/0x5c10
> [   18.984821] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [   18.984822]
> [   18.984822] other info that might help us debug this:
> [   18.984823]  Possible unsafe locking scenario:
> [   18.984823]
> [   18.984824]        CPU0
> [   18.984824]        ----
> [   18.984824]   lock(&head->lock);
> [   18.984826]   <Interrupt>
> [   18.984826]     lock(&head->lock);
> [   18.984827]
> [   18.984828]  *** DEADLOCK ***
> [   18.984828]
> [   18.984829] 2 locks held by test_progs/1990:
> [   18.984829]  #0: ffff8881f52958e8 (&mm->mmap_lock#2){++++}-{3:3},
> at: do_user_addr_fault+0x1cd/0x821
> [   18.984832]  #1: ffff8881f6e39e20 (&cpuctx_lock){-...}-{2:2}, at:
> perf_event_task_tick+0x12b/0xc90
> [   18.984835]
> [   18.984835] stack backtrace:
> [   18.984836] CPU: 0 PID: 1990 Comm: test_progs Not tainted
> 5.9.0-rc6-01771-g1466de1330e1 #2967
> [   18.984837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.11.0-2.el7 04/01/2014
> [   18.984837] Call Trace:
> [   18.984838]  <NMI>
> [   18.984838]  dump_stack+0x9a/0xd0
> [   18.984839]  lock_acquire+0x5c9/0x7c0
> [   18.984839]  ? lock_release+0x6f0/0x6f0
> [   18.984840]  ? __pcpu_freelist_pop+0xe3/0x180
> [   18.984840]  _raw_spin_lock+0x2c/0x40
> [   18.984841]  ? __pcpu_freelist_pop+0xe3/0x180
> [   18.984841]  __pcpu_freelist_pop+0xe3/0x180
> [   18.984842]  pcpu_freelist_pop+0x17/0x40
> [   18.984842]  ? lock_release+0x6f0/0x6f0
> [   18.984843]  __bpf_get_stackid+0x534/0xaf0

Then in NMI, we do __bpf_get_stackid(). So the deadlock happens when
__bpf_get_stackid() is called when NMI interrupts htab_map_alloc()
of the same map. This should never happen, because we only use 
__bpf_get_stackid() after the map is allocated. 

Without a reliable repro, it will be really hard for me to pin point 
the commit that introduced this. I guess I will need your .config 
and/or test machine for further debugging. 

Thanks,
Song

[...]