Re: [PATCH bpf v2 6/9] bpf: Switch to bpf mem allocator for LPM trie

Hou Tao <houtao@xxxxxxxxxxxxxxx> writes:

> From: Hou Tao <houtao1@xxxxxxxxxx>
>
> Multiple syzbot warnings have been reported. These warnings are mainly
> about the lock order between trie->lock and kmalloc()'s internal lock.
> See report [1] as an example:
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.10.0-rc7-syzkaller-00003-g4376e966ecb7 #0 Not tainted
> ------------------------------------------------------
> syz.3.2069/15008 is trying to acquire lock:
> ffff88801544e6d8 (&n->list_lock){-.-.}-{2:2}, at: get_partial_node ...
>
> but task is already holding lock:
> ffff88802dcc89f8 (&trie->lock){-.-.}-{2:2}, at: trie_update_elem ...
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&trie->lock){-.-.}-{2:2}:
>        __raw_spin_lock_irqsave
>        _raw_spin_lock_irqsave+0x3a/0x60
>        trie_delete_elem+0xb0/0x820
>        ___bpf_prog_run+0x3e51/0xabd0
>        __bpf_prog_run32+0xc1/0x100
>        bpf_dispatcher_nop_func
>        ......
>        bpf_trace_run2+0x231/0x590
>        __bpf_trace_contention_end+0xca/0x110
>        trace_contention_end.constprop.0+0xea/0x170
>        __pv_queued_spin_lock_slowpath+0x28e/0xcc0
>        pv_queued_spin_lock_slowpath
>        queued_spin_lock_slowpath
>        queued_spin_lock
>        do_raw_spin_lock+0x210/0x2c0
>        __raw_spin_lock_irqsave
>        _raw_spin_lock_irqsave+0x42/0x60
>        __put_partials+0xc3/0x170
>        qlink_free
>        qlist_free_all+0x4e/0x140
>        kasan_quarantine_reduce+0x192/0x1e0
>        __kasan_slab_alloc+0x69/0x90
>        kasan_slab_alloc
>        slab_post_alloc_hook
>        slab_alloc_node
>        kmem_cache_alloc_node_noprof+0x153/0x310
>        __alloc_skb+0x2b1/0x380
>        ......
>
> -> #0 (&n->list_lock){-.-.}-{2:2}:
>        check_prev_add
>        check_prevs_add
>        validate_chain
>        __lock_acquire+0x2478/0x3b30
>        lock_acquire
>        lock_acquire+0x1b1/0x560
>        __raw_spin_lock_irqsave
>        _raw_spin_lock_irqsave+0x3a/0x60
>        get_partial_node.part.0+0x20/0x350
>        get_partial_node
>        get_partial
>        ___slab_alloc+0x65b/0x1870
>        __slab_alloc.constprop.0+0x56/0xb0
>        __slab_alloc_node
>        slab_alloc_node
>        __do_kmalloc_node
>        __kmalloc_node_noprof+0x35c/0x440
>        kmalloc_node_noprof
>        bpf_map_kmalloc_node+0x98/0x4a0
>        lpm_trie_node_alloc
>        trie_update_elem+0x1ef/0xe00
>        bpf_map_update_value+0x2c1/0x6c0
>        map_update_elem+0x623/0x910
>        __sys_bpf+0x90c/0x49a0
>        ...
>
> other info that might help us debug this:
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&trie->lock);
>                                lock(&n->list_lock);
>                                lock(&trie->lock);
>   lock(&n->list_lock);
>
>  *** DEADLOCK ***
>
> [1]: https://syzkaller.appspot.com/bug?extid=9045c0a3d5a7f1b119f7
>
> A bpf program attached to trace_contention_end() triggers after
> acquiring &n->list_lock. The program invokes trie_delete_elem(), which
> then acquires trie->lock. However, it is possible that another
> process is invoking trie_update_elem(). trie_update_elem() will acquire
> trie->lock first, then invoke kmalloc_node(). kmalloc_node() may invoke
> get_partial_node() and try to acquire &n->list_lock (not necessarily the
> same lock object). Therefore, lockdep warns about the circular locking
> dependency.
>
> Invoking kmalloc() before acquiring trie->lock could fix the warning.
> However, since BPF programs can be invoked from any context (e.g.,
> through kprobe/tracepoint/fentry), there may still be lock ordering
> problems for the internal locks in kmalloc() or for trie->lock itself.
>
> To eliminate these potential lock ordering problems with kmalloc()'s
> internal locks, replace kmalloc()/kfree()/kfree_rcu() with the
> equivalent BPF memory allocator APIs, which can be invoked in any
> context. The lock ordering problems with trie->lock (e.g., reentrance)
> will be handled separately.
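
[Not part of the patch, just for illustration: a minimal sketch of what
the allocation side could look like on top of the BPF memory allocator.
The trie->ma member, the helper names and the init placement are
assumptions made for this example; only bpf_mem_alloc_init() and
bpf_mem_cache_alloc() are the real APIs.]

#include <linux/bpf_mem_alloc.h>

/* Sketch only: "trie->ma" is an assumed new struct bpf_mem_alloc member
 * in struct lpm_trie; the actual patch may name/place it differently.
 */
static int lpm_trie_ma_init(struct lpm_trie *trie)
{
	/* A single fixed-size cache sized for leaf nodes; intermediate
	 * nodes come from the same cache (cf. point 1 below).
	 */
	size_t size = sizeof(struct lpm_trie_node) + trie->data_size +
		      trie->map.value_size;

	return bpf_mem_alloc_init(&trie->ma, size, false);
}

static struct lpm_trie_node *lpm_trie_node_alloc_ma(struct lpm_trie *trie,
						    const void *value)
{
	struct lpm_trie_node *node;

	/* Unlike bpf_map_kmalloc_node(..., GFP_NOWAIT | __GFP_NOWARN, ...),
	 * this never takes the slab allocator's internal locks, so it is
	 * usable in any context (the caller is expected to have migration
	 * disabled, e.g. via migrate_disable() or by running with IRQs off).
	 */
	node = bpf_mem_cache_alloc(&trie->ma);
	if (!node)
		return NULL;

	node->flags = 0;
	if (value)
		memcpy(node->data + trie->data_size, value,
		       trie->map.value_size);
	return node;
}
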
>
> Two aspects of this change require explanation:
>
> 1. Intermediate and leaf nodes are allocated from the same allocator.
> The value size of an LPM trie is usually small, and using only one
> allocator reduces the memory overhead of the BPF memory allocator.
>
> 2. Nodes are freed before invoking spin_unlock_irqrestore(). Therefore,
> there is no need to add paired migrate_{disable|enable}() calls for
> these free operations.
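
[Again only a hedged sketch, to spell out why point 2 holds: with
trie->lock held and IRQs disabled the CPU cannot migrate, so the
RCU-deferred free can be called directly. trie->ma and the helper name
are assumptions for the example; bpf_mem_cache_free_rcu() is the real
API.]

static void lpm_trie_node_free_ma(struct lpm_trie *trie,
				  struct lpm_trie_node *node)
{
	/* Replaces kfree_rcu(node, rcu): the node is returned to the
	 * per-CPU cache after an RCU grace period.
	 */
	bpf_mem_cache_free_rcu(&trie->ma, node);
}

/* Call-site pattern in trie_update_elem()/trie_delete_elem():
 *
 *	spin_lock_irqsave(&trie->lock, irq_flags);
 *	... unlink the node under the lock ...
 *	lpm_trie_node_free_ma(trie, node);  // freed before unlock: IRQs are
 *					    // off, so no migration and no
 *					    // migrate_{disable|enable}() pair
 *	spin_unlock_irqrestore(&trie->lock, irq_flags);
 */
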
>
> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>

I agree with Alexei's comments, but otherwise:

Reviewed-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>





