Hi,

On 2/27/2025 10:07 AM, Alexei Starovoitov wrote:
> On Wed, Feb 26, 2025 at 5:45 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>> Hi,
>>
>> On 2/26/2025 12:15 AM, Brandon Kammerdiener wrote:
>>> On Tue, Feb 25, 2025 at 08:26:17PM +0800, Hou Tao wrote:
>>>> Hi,
>>>>
>>>> On 2/25/2025 3:13 PM, Martin KaFai Lau wrote:
>>>>> On 2/24/25 2:01 PM, Brandon Kammerdiener wrote:
>>>>>> This patch fixes an endless loop condition that can occur in
>>>>>> bpf_for_each_hash_elem, causing the core to softlock. My
>>>>>> understanding is that a combination of RCU list deletion and
>>>>>> insertion introduces the new element after the iteration cursor
>>>>>> and that there is a chance that an RCU
>>>>> new element is added to the head of the bucket, so the first thought
>>>>> is it should not extend the list beyond the current iteration point...
>>>>>
>>>>>> reader may in fact use this new element in iteration. The patch uses a
>>>>>> _safe variant of the macro which gets the next element to iterate before
>>>>>> executing the loop body for the current element. The following simple
>>>>>> BPF program can be used to reproduce the issue:
>>>>>>
>>>>>> #include "vmlinux.h"
>>>>>> #include <bpf/bpf_helpers.h>
>>>>>> #include <bpf/bpf_tracing.h>
>>>>>>
>>>>>> #define N (64)
>>>>>>
>>>>>> struct {
>>>>>>         __uint(type, BPF_MAP_TYPE_HASH);
>>>>>>         __uint(max_entries, N);
>>>>>>         __type(key, __u64);
>>>>>>         __type(value, __u64);
>>>>>> } map SEC(".maps");
>>>>>>
>>>>>> static int cb(struct bpf_map *map, __u64 *key, __u64 *value,
>>>>>>               void *arg) {
>>>>>>         bpf_map_delete_elem(map, key);
>>>>>>         bpf_map_update_elem(map, &key, &val, 0);
>>>>> I suspect what happened in this reproducer is,
>>>>> there is a bucket with more than one elem and the deleted elem gets
>>>>> immediately added back to the bucket->head.
>>>>> Something like this, '[ ]' as the current elem:
>>>>>
>>>>> 1st iteration (follow bucket->head.first): [elem_1] -> elem_2
>>>>>         delete_elem: elem_2
>>>>>         update_elem: [elem_1] -> elem_2
>>>>> 2nd iteration (follow elem_1->hash_node.next): elem_1 -> [elem_2]
>>>>>         delete_elem: elem_1
>>>>>         update_elem: [elem_2] -> elem_1
>>>>> 3rd iteration (follow elem_2->hash_node.next): elem_2 -> [elem_1]
>>>>> loop.......
>>>>>
>>>> Yes. The above procedure is exactly what the test case tries to do
>>>> (except the &key and &val typos).
>>> Yes, apologies for the typos I must have introduced when minifying the
>>> example. Should just use key and val sans the &.
>> OK
>>>>> I don't think "_safe" covers all cases though. "_safe" may solve this
>>>>> particular reproducer which is shooting itself in the foot by deleting
>>>>> and adding itself when iterating a bucket.
>>>> Yes, if the bpf prog could delete and re-add the saved next entry, there
>>>> will be a dead loop as well. It seems __htab_map_lookup_elem() may suffer
>>>> from the same problem, just as bpf_for_each_hash_elem() does. The problem
>>>> is mainly due to the immediate reuse of the deleted element. Maybe we
>>>> need to add a seqlock to the htab_elem and retry the traversal if the
>>>> seqlock is not stable.
>> The seqlock + restart traversal way doesn't work, because the seq-count
>> of each element will be increased by 2 after the re-insert, so the loop
>> will always try to restart the traversal. I have another idea: how about
>> adding a per-bucket incremental id to each element in the bucket? During
>> the traversal, any element with an id greater than the id of the current
>> element is ignored, so the newly-added element will be skipped.
>> For example, there are three elements in a bucket list:
>>
>>     head -> A[id=3] -> B[id=2] -> C[id=1]
>>
>> (1) pass A to cb
>>     current id = 3
>>     cb deletes A and inserts A again:
>>     head -> A[4] -> B[2] -> C[1]
>>
>> (2) pass B to cb
>>     current id = 2
>>     cb deletes B and inserts B again:
>>     head -> B[5] -> A[4] -> C[1]
>>
>> The id of A is greater than the current id, so it is skipped.

> This looks like unnecessary overhead that attempts
> to reinvent nulls logic.

OK.

> At present I'm not convinced that lookup_nulls_elem_raw() is broken.
> The only issue is with bpf_for_each_hash_elem(), which loops forever
> in this convoluted case, and _safe() is the right fix for it.

For lookup_nulls_elem_raw(), I think a dead loop in the lookup depends on
the exact synchronization between the lookup procedure and the
update/deletion procedure, so it will be hard to reproduce.
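
For reference, the nulls logic that already protects the lookup path works
roughly as follows: under RCU, a deleted element can be reused and re-hashed
into a different bucket while a reader is still walking the list, so the
bucket list ends in a per-bucket "nulls" marker instead of a plain NULL, and
the lookup restarts when it finishes on a marker that does not match the
bucket it started from. A sketch, paraphrased from memory of
lookup_nulls_elem_raw() in kernel/bpf/hashtab.c (the exact code may differ):

/* Paraphrase of lookup_nulls_elem_raw(); called under rcu_read_lock(),
 * so it may observe elements moving between buckets while it walks.
 */
static struct htab_elem *lookup_nulls_elem_raw(struct hlist_nulls_head *head,
                                               u32 hash, void *key,
                                               u32 key_size, u32 n_buckets)
{
        struct hlist_nulls_node *n;
        struct htab_elem *l;

again:
        hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
                if (l->hash == hash && !memcmp(&l->key, key, key_size))
                        return l;

        /* The walk ended on a nulls marker. If it is not the marker of
         * the bucket the walk started from, the element under the cursor
         * was moved to another bucket mid-walk, so restart.
         */
        if (get_nulls_value(n) != (hash & (n_buckets - 1)))
                goto again;

        return NULL;
}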
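
And to make the _safe() fix for bpf_for_each_hash_elem() concrete, here is a
minimal standalone user-space sketch of Martin's trace above. The elem, del,
add_head and cb names are illustrative stand-ins for the real htab machinery,
not the actual kernel code:

#include <stdio.h>

struct elem {
        int key;
        struct elem *next;
};

static struct elem *head;

static void del(struct elem *e)
{
        struct elem **pp = &head;

        while (*pp && *pp != e)
                pp = &(*pp)->next;
        if (*pp)
                *pp = e->next;
}

static void add_head(struct elem *e)
{
        e->next = head;
        head = e;
}

/* what the reproducer's cb does: delete + immediately re-add at head */
static void cb(struct elem *e)
{
        del(e);
        add_head(e);
}

int main(void)
{
        struct elem a = { .key = 1 }, b = { .key = 2 };
        struct elem *e, *n;
        int steps = 0;

        add_head(&b);
        add_head(&a);           /* bucket: a -> b */

        /* Unsafe walk: reads e->next *after* cb() has moved e back to
         * the head, so the cursor bounces between a and b forever;
         * capped at 10 steps so the demo terminates.
         */
        for (e = head; e && steps < 10; e = e->next) {
                cb(e);
                steps++;
        }
        printf("unsafe walk: %d steps and still going\n", steps);

        /* _safe walk: cache the successor before running the callback,
         * so the re-inserted current element cannot redirect the cursor.
         */
        steps = 0;
        for (e = head; e; e = n) {
                n = e->next;
                cb(e);
                steps++;
        }
        printf("safe walk: %d steps\n", steps);
        return 0;
}

The unsafe walk keeps chasing the next pointer of an element that cb() has
just moved back to the head, while the safe walk reads the successor first,
so this particular reproducer terminates. As noted above, the _safe pattern
can still loop if cb() deletes and re-adds the *saved next* element.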