Hou Tao <houtao@xxxxxxxxxxxxxxx> writes:

> From: Hou Tao <houtao1@xxxxxxxxxx>
>
> Multiple syzbot warnings have been reported. These warnings are mainly
> about the lock order between trie->lock and kmalloc()'s internal lock.
> See report [1] as an example:
>
>  ======================================================
>  WARNING: possible circular locking dependency detected
>  6.10.0-rc7-syzkaller-00003-g4376e966ecb7 #0 Not tainted
>  ------------------------------------------------------
>  syz.3.2069/15008 is trying to acquire lock:
>  ffff88801544e6d8 (&n->list_lock){-.-.}-{2:2}, at: get_partial_node ...
>
>  but task is already holding lock:
>  ffff88802dcc89f8 (&trie->lock){-.-.}-{2:2}, at: trie_update_elem ...
>
>  which lock already depends on the new lock.
>
>  the existing dependency chain (in reverse order) is:
>
>  -> #1 (&trie->lock){-.-.}-{2:2}:
>         __raw_spin_lock_irqsave
>         _raw_spin_lock_irqsave+0x3a/0x60
>         trie_delete_elem+0xb0/0x820
>         ___bpf_prog_run+0x3e51/0xabd0
>         __bpf_prog_run32+0xc1/0x100
>         bpf_dispatcher_nop_func
>         ......
>         bpf_trace_run2+0x231/0x590
>         __bpf_trace_contention_end+0xca/0x110
>         trace_contention_end.constprop.0+0xea/0x170
>         __pv_queued_spin_lock_slowpath+0x28e/0xcc0
>         pv_queued_spin_lock_slowpath
>         queued_spin_lock_slowpath
>         queued_spin_lock
>         do_raw_spin_lock+0x210/0x2c0
>         __raw_spin_lock_irqsave
>         _raw_spin_lock_irqsave+0x42/0x60
>         __put_partials+0xc3/0x170
>         qlink_free
>         qlist_free_all+0x4e/0x140
>         kasan_quarantine_reduce+0x192/0x1e0
>         __kasan_slab_alloc+0x69/0x90
>         kasan_slab_alloc
>         slab_post_alloc_hook
>         slab_alloc_node
>         kmem_cache_alloc_node_noprof+0x153/0x310
>         __alloc_skb+0x2b1/0x380
>         ......
>
>  -> #0 (&n->list_lock){-.-.}-{2:2}:
>         check_prev_add
>         check_prevs_add
>         validate_chain
>         __lock_acquire+0x2478/0x3b30
>         lock_acquire
>         lock_acquire+0x1b1/0x560
>         __raw_spin_lock_irqsave
>         _raw_spin_lock_irqsave+0x3a/0x60
>         get_partial_node.part.0+0x20/0x350
>         get_partial_node
>         get_partial
>         ___slab_alloc+0x65b/0x1870
>         __slab_alloc.constprop.0+0x56/0xb0
>         __slab_alloc_node
>         slab_alloc_node
>         __do_kmalloc_node
>         __kmalloc_node_noprof+0x35c/0x440
>         kmalloc_node_noprof
>         bpf_map_kmalloc_node+0x98/0x4a0
>         lpm_trie_node_alloc
>         trie_update_elem+0x1ef/0xe00
>         bpf_map_update_value+0x2c1/0x6c0
>         map_update_elem+0x623/0x910
>         __sys_bpf+0x90c/0x49a0
>         ...
>
>  other info that might help us debug this:
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&trie->lock);
>                                lock(&n->list_lock);
>                                lock(&trie->lock);
>   lock(&n->list_lock);
>
>   *** DEADLOCK ***
>
> [1]: https://syzkaller.appspot.com/bug?extid=9045c0a3d5a7f1b119f7
>
> A bpf program attached to trace_contention_end() triggers after
> acquiring &n->list_lock. The program invokes trie_delete_elem(), which
> then acquires trie->lock. However, it is possible that another
> process is invoking trie_update_elem(). trie_update_elem() will acquire
> trie->lock first, then invoke kmalloc_node(). kmalloc_node() may invoke
> get_partial_node() and try to acquire &n->list_lock (not necessarily the
> same lock object). Therefore, lockdep warns about the circular locking
> dependency.
>
> Invoking kmalloc() before acquiring trie->lock could fix the warning.
> However, since BPF programs can be invoked from any context (e.g.,
> through kprobe/tracepoint/fentry), there may still be lock ordering
> problems for internal locks in kmalloc() or trie->lock itself.
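
As far as I can tell, the trigger is nothing exotic: any program on the
contention_end tracepoint that touches an LPM trie can set up this chain.
Roughly like the sketch below (for illustration only, not the syzkaller
reproducer; the map layout, key type and section name are my assumptions):

// SPDX-License-Identifier: GPL-2.0
/* Sketch: delete from an LPM trie while lock contention is being traced,
 * so trie->lock is taken while a slab n->list_lock may already be held. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct lpm_key {
	__u32 prefixlen;
	__u32 addr;
};

struct {
	__uint(type, BPF_MAP_TYPE_LPM_TRIE);
	__uint(map_flags, BPF_F_NO_PREALLOC);	/* required for LPM tries */
	__uint(max_entries, 16);
	__type(key, struct lpm_key);
	__type(value, __u32);
} trie SEC(".maps");

SEC("tp_btf/contention_end")
int BPF_PROG(on_contention_end, void *lock, int ret)
{
	struct lpm_key key = { .prefixlen = 32, .addr = 0 };

	/* trie_delete_elem() acquires trie->lock here, possibly with a
	 * slab n->list_lock already held by the traced code, inverting
	 * the order seen from trie_update_elem() -> kmalloc_node(). */
	bpf_map_delete_elem(&trie, &key);
	return 0;
}

char _license[] SEC("license") = "GPL";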
>
> To eliminate these potential lock ordering problems with kmalloc()'s
> internal locks, replace kmalloc()/kfree()/kfree_rcu() with equivalent
> BPF memory allocator APIs that can be invoked in any context. The lock
> ordering problems with trie->lock (e.g., reentrance) will be handled
> separately.
>
> Two aspects of this change require explanation:
>
> 1. Intermediate and leaf nodes are allocated from the same allocator.
> The value size of an LPM trie is usually small, and using only one
> allocator reduces the memory overhead of the BPF memory allocator.
>
> 2. Nodes are freed before invoking spin_unlock_irqrestore(). Therefore,
> there is no need to add paired migrate_{disable|enable}() calls for
> these free operations.
>
> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>

I agree with Alexei's comments, but otherwise:

Reviewed-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>
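
For anyone following along rather than for the patch itself, my mental
model of the allocator switch is roughly the sketch below (hypothetical
names only, not the actual diff against kernel/bpf/lpm_trie.c):

#include <linux/bpf_mem_alloc.h>
#include <linux/spinlock.h>

/* Sketch of an LPM-trie-like map switched to the BPF memory allocator. */
struct trie_sketch {
	struct bpf_mem_alloc ma;	/* shared by intermediate and leaf nodes */
	spinlock_t lock;
	int max_node_size;
};

static int trie_sketch_init(struct trie_sketch *t)
{
	/* One fixed-size cache sized for the largest node keeps the
	 * allocator's memory overhead low. */
	return bpf_mem_alloc_init(&t->ma, t->max_node_size, false);
}

static void *trie_sketch_node_alloc(struct trie_sketch *t)
{
	/* Safe in any context: no slab-internal locks are taken while
	 * t->lock is held, unlike bpf_map_kmalloc_node(). */
	return bpf_mem_cache_alloc(&t->ma);
}

static void trie_sketch_node_free(struct trie_sketch *t, void *node)
{
	/* Called before spin_unlock_irqrestore(), so migration is already
	 * disabled and no migrate_{disable|enable}() pairing is needed. */
	bpf_mem_cache_free(&t->ma, node);
}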