Hi,

On 11/9/2024 3:55 AM, Alexei Starovoitov wrote:
> On Wed, Nov 6, 2024 at 1:53 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>>
>>
>> On 11/6/2024 2:35 PM, Hou Tao wrote:
>>> From: Hou Tao <houtao1@xxxxxxxxxx>
>>>
>>> For htab of maps, when the map is removed from the htab, it may hold the
>>> last reference of the map. bpf_map_fd_put_ptr() will invoke
>>> bpf_map_free_id() to free the id of the removed map element. However,
>>> bpf_map_fd_put_ptr() is invoked while holding a bucket lock
>>> (raw_spin_lock_t), and bpf_map_free_id() attempts to acquire map_idr_lock
>>> (spinlock_t), triggering the following lockdep warning:
>>>
>>> =============================
>>> [ BUG: Invalid wait context ]
>>> 6.11.0-rc4+ #49 Not tainted
>>> -----------------------------
>>> test_maps/4881 is trying to lock:
>>> ffffffff84884578 (map_idr_lock){+...}-{3:3}, at: bpf_map_free_id.part.0+0x21/0x70
>>> other info that might help us debug this:
>>> context-{5:5}
>>> 2 locks held by test_maps/4881:
>>>  #0: ffffffff846caf60 (rcu_read_lock){....}-{1:3}, at: bpf_fd_htab_map_update_elem+0xf9/0x270
>>>  #1: ffff888149ced148 (&htab->lockdep_key#2){....}-{2:2}, at: htab_map_update_elem+0x178/0xa80
>>> stack backtrace:
>>> CPU: 0 UID: 0 PID: 4881 Comm: test_maps Not tainted 6.11.0-rc4+ #49
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
>>> Call Trace:
>>>  <TASK>
>>>  dump_stack_lvl+0x6e/0xb0
>>>  dump_stack+0x10/0x20
>>>  __lock_acquire+0x73e/0x36c0
>>>  lock_acquire+0x182/0x450
>>>  _raw_spin_lock_irqsave+0x43/0x70
>>>  bpf_map_free_id.part.0+0x21/0x70
>>>  bpf_map_put+0xcf/0x110
>>>  bpf_map_fd_put_ptr+0x9a/0xb0
>>>  free_htab_elem+0x69/0xe0
>>>  htab_map_update_elem+0x50f/0xa80
>>>  bpf_fd_htab_map_update_elem+0x131/0x270
>>>  htab_map_update_elem+0x50f/0xa80
>>>  bpf_fd_htab_map_update_elem+0x131/0x270
>>>  bpf_map_update_value+0x266/0x380
>>>  __sys_bpf+0x21bb/0x36b0
>>>  __x64_sys_bpf+0x45/0x60
>>>  x64_sys_call+0x1b2a/0x20d0
>>>  do_syscall_64+0x5d/0x100
>>>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>
>>> One way to fix the lockdep warning is using raw_spinlock_t for
>>> map_idr_lock as well. However, bpf_map_alloc_id() invokes
>>> idr_alloc_cyclic() after acquiring map_idr_lock, it will trigger a
>>> similar lockdep warning because the slab's lock (s->cpu_slab->lock) is
>>> still a spinlock.
>>
>> Is it OK to move the calling of bpf_map_free_id() from bpf_map_put() to
>> bpf_map_free_deferred() ? It could fix the lockdep warning for htab of
>> maps. Its downside is that the free of map id will be delayed, but I
>> think it will not make a visible effect to the user, because the refcnt
>> is already 0, trying to get the map fd through map id will return -ENOENT.
>
> I've applied the current patch, since doing free outside of bucket lock
> is a good thing to do anyway.
> With that no need to move bpf_map_free_id(), right?
> At least offload case relies on id being removed immediately.
> I don't remember what else might care.
> .

Yes. There is no need to move bpf_map_free_id() after applying the patch.

Thanks for the information about the offload map. According to the
implementation of __bpf_map_offload_destroy(), it seems that it may end up
calling bpf_map_free_id() twice, so bpf_map_free_id() frees the id
immediately and sets map->id to 0.
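
To make the "called twice" point concrete, the id release logic I have in
mind looks roughly like the sketch below (paraphrased from memory, not the
verbatim code in kernel/bpf/syscall.c):

    /* Sketch only: free the id eagerly and clear map->id under
     * map_idr_lock, so a later call for the same map is a no-op.
     */
    void bpf_map_free_id(struct bpf_map *map)
    {
            unsigned long flags;

            /* id 0 means the id was never assigned or already freed */
            if (!map->id)
                    return;

            spin_lock_irqsave(&map_idr_lock, flags);
            idr_remove(&map_idr, map->id);
            map->id = 0;
            spin_unlock_irqrestore(&map_idr_lock, flags);
    }

Because map->id is cleared under map_idr_lock, a second invocation (e.g.
from the offload teardown path) simply returns early, while the id itself
is still removed immediately, which is what the offload case relies on.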