On Fri, Sep 25, 2020 at 05:07:56PM -0700, Song Liu wrote:
> Recent improvements in LOCKDEP highlighted a potential A-A deadlock with
> pcpu_freelist in NMI:
>
> ./tools/testing/selftests/bpf/test_progs -t stacktrace_build_id_nmi
>
> [   18.984807] ================================
> [   18.984807] WARNING: inconsistent lock state
> [   18.984808] 5.9.0-rc6-01771-g1466de1330e1 #2967 Not tainted
> [   18.984809] --------------------------------
> [   18.984809] inconsistent {INITIAL USE} -> {IN-NMI} usage.
> [   18.984810] test_progs/1990 [HC2[2]:SC0[0]:HE0:SE1] takes:
> [   18.984810] ffffe8ffffc219c0 (&head->lock){....}-{2:2}, at:
> __pcpu_freelist_pop+0xe3/0x180
> [   18.984813] {INITIAL USE} state was registered at:
> [   18.984814]   lock_acquire+0x175/0x7c0
> [   18.984814]   _raw_spin_lock+0x2c/0x40
> [   18.984815]   __pcpu_freelist_pop+0xe3/0x180
> [   18.984815]   pcpu_freelist_pop+0x31/0x40
> [   18.984816]   htab_map_alloc+0xbbf/0xf40
> [   18.984816]   __do_sys_bpf+0x5aa/0x3ed0
> [   18.984817]   do_syscall_64+0x2d/0x40
> [   18.984818]   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   18.984818] irq event stamp: 12
> [ ... ]
> [   18.984822] other info that might help us debug this:
> [   18.984823]  Possible unsafe locking scenario:
> [   18.984823]
> [   18.984824]        CPU0
> [   18.984824]        ----
> [   18.984824]   lock(&head->lock);
> [   18.984826]   <Interrupt>
> [   18.984826]     lock(&head->lock);
> [   18.984827]
> [   18.984828]  *** DEADLOCK ***
> [   18.984828]
> [   18.984829] 2 locks held by test_progs/1990:
> [ ... ]
> [   18.984838]  <NMI>
> [   18.984838]  dump_stack+0x9a/0xd0
> [   18.984839]  lock_acquire+0x5c9/0x7c0
> [   18.984839]  ? lock_release+0x6f0/0x6f0
> [   18.984840]  ? __pcpu_freelist_pop+0xe3/0x180
> [   18.984840]  _raw_spin_lock+0x2c/0x40
> [   18.984841]  ? __pcpu_freelist_pop+0xe3/0x180
> [   18.984841]  __pcpu_freelist_pop+0xe3/0x180
> [   18.984842]  pcpu_freelist_pop+0x17/0x40
> [   18.984842]  ? lock_release+0x6f0/0x6f0
> [   18.984843]  __bpf_get_stackid+0x534/0xaf0
> [   18.984843]  bpf_prog_1fd9e30e1438d3c5_oncpu+0x73/0x350
> [   18.984844]  bpf_overflow_handler+0x12f/0x3f0
>
> This is because pcpu_freelist_head.lock is accessed in both NMI and
> non-NMI context. Fix this issue by using raw_spin_trylock() in NMI.
>
> For systems with only one cpu, there is a trickier scenario with
> pcpu_freelist_push(): if the only pcpu_freelist_head.lock is already
> locked before NMI, raw_spin_trylock() will never succeed. Unlike,
> _pop(), where we can failover and return NULL, failing _push() will leak
> memory. Fix this issue with an extra list, pcpu_freelist.extralist. The
> extralist is primarily used to take _push() when raw_spin_trylock()
> failed on all the per cpu lists. It should be empty most of the time.

It is tricky. LGTM.

Acked-by: Martin KaFai Lau <kafai@xxxxxx>
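
For anyone following the thread, here is a minimal userspace sketch of the
trylock-plus-extralist idea described in the commit message. It is not the
patch itself: every name in it (fl_node, fl_head, freelist, NR_LISTS and the
fl_* helpers) is hypothetical, a C11 atomic_flag stands in for the raw
spinlock, and NR_LISTS = 1 models the single-CPU case. It also assumes that
a context holds at most one of these locks at a time, so the retry loop in
the NMI-style push can eventually make progress.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_LISTS 1	/* hypothetical: models a system with one CPU */

struct fl_node { struct fl_node *next; };

struct fl_head {
	atomic_flag lock;	/* stand-in for the kernel's raw spinlock */
	struct fl_node *first;
};

struct freelist {
	struct fl_head percpu[NR_LISTS];
	struct fl_head extralist;	/* fallback when all per-CPU locks are busy */
};

static void fl_init(struct freelist *s)
{
	for (int i = 0; i < NR_LISTS; i++) {
		atomic_flag_clear(&s->percpu[i].lock);
		s->percpu[i].first = NULL;
	}
	atomic_flag_clear(&s->extralist.lock);
	s->extralist.first = NULL;
}

/* Returns true if the lock was taken; never spins. */
static bool fl_trylock(struct fl_head *h)
{
	return !atomic_flag_test_and_set_explicit(&h->lock, memory_order_acquire);
}

static void fl_unlock(struct fl_head *h)
{
	atomic_flag_clear_explicit(&h->lock, memory_order_release);
}

/* Push onto one list, but only if its lock can be taken immediately. */
static bool fl_try_push(struct fl_head *h, struct fl_node *n)
{
	if (!fl_trylock(h))
		return false;
	n->next = h->first;
	h->first = n;
	fl_unlock(h);
	return true;
}

/* Pop from one list; NULL means "lock busy" or "list empty". */
static struct fl_node *fl_try_pop(struct fl_head *h)
{
	struct fl_node *n;

	if (!fl_trylock(h))
		return NULL;
	n = h->first;
	if (n)
		h->first = n->next;
	fl_unlock(h);
	return n;
}

/* NMI-style push: must not fail, so retry the lists, then the extralist. */
static void fl_push_nmi(struct freelist *s, struct fl_node *n)
{
	for (;;) {
		for (int i = 0; i < NR_LISTS; i++)
			if (fl_try_push(&s->percpu[i], n))
				return;
		if (fl_try_push(&s->extralist, n))
			return;
	}
}

/* NMI-style pop: may give up and return NULL instead of spinning. */
static struct fl_node *fl_pop_nmi(struct freelist *s)
{
	for (int i = 0; i < NR_LISTS; i++) {
		struct fl_node *n = fl_try_pop(&s->percpu[i]);
		if (n)
			return n;
	}
	return fl_try_pop(&s->extralist);
}

/* Simulates an NMI arriving while the only per-CPU lock is already held. */
static int fl_demo(void)
{
	struct freelist s;
	struct fl_node a;
	bool held, on_extra;
	struct fl_node *got;

	fl_init(&s);
	held = fl_trylock(&s.percpu[0]);	/* the "interrupted" context */
	assert(held);
	fl_push_nmi(&s, &a);			/* lands on the extralist */
	on_extra = (s.extralist.first == &a);
	got = fl_pop_nmi(&s);			/* recovered from the extralist */
	fl_unlock(&s.percpu[0]);
	return on_extra && got == &a;
}
```

In this sketch the push cannot lose the node even though the only per-CPU
lock is held for the whole "NMI", which is the leak the extralist is meant
to avoid; the pop simply returns NULL when nothing can be locked or found.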