Re: [PATCH] bpf: fix possible endless loop in BPF map iteration

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 2/26/2025 12:15 AM, Brandon Kammerdiener wrote:
> On Tue, Feb 25, 2025 at 08:26:17PM +0800, Hou Tao wrote:
>> Hi,
>>
>> On 2/25/2025 3:13 PM, Martin KaFai Lau wrote:
>>> On 2/24/25 2:01 PM, Brandon Kammerdiener wrote:
>>>> This patch fixes an endless loop condition that can occur in
>>>> bpf_for_each_hash_elem, causing the core to softlock. My
>>>> understanding is
>>>> that a combination of RCU list deletion and insertion introduces the new
>>>> element after the iteration cursor and that there is a chance that an
>>>> RCU
>>> new element is added to the head of the bucket, so the first thought
>>> is it should not extend the list beyond the current iteration point...
>>>
>>>> reader may in fact use this new element in iteration. The patch uses a
>>>> _safe variant of the macro which gets the next element to iterate before
>>>> executing the loop body for the current element. The following simple
>>>> BPF
>>>> program can be used to reproduce the issue:
>>>>
>>>>      #include "vmlinux.h"
>>>>      #include <bpf/bpf_helpers.h>
>>>>      #include <bpf/bpf_tracing.h>
>>>>
>>>>      #define N (64)
>>>>
>>>>      struct {
>>>>          __uint(type,        BPF_MAP_TYPE_HASH);
>>>>          __uint(max_entries, N);
>>>>          __type(key,         __u64);
>>>>          __type(value,       __u64);
>>>>      } map SEC(".maps");
>>>>
>>>>      static int cb(struct bpf_map *map, __u64 *key, __u64 *value,
>>>> void *arg) {
>>>>          bpf_map_delete_elem(map, key);
>>>>          bpf_map_update_elem(map, &key, &val, 0);
>>> I suspect what happened in this reproducer is,
>>> there is a bucket with more than one elem(s) and the deleted elem gets
>>> immediately added back to the bucket->head.
>>> Something like this, '[ ]' as the current elem.
>>>
>>> 1st iteration     (follow bucket->head.first): [elem_1] ->  elem_2
>>>                                   delete_elem:  elem_2
>>>                                   update_elem: [elem_1] ->  elem_2
>>> 2nd iteration (follow elem_1->hash_node.next):  elem_1  -> [elem_2]
>>>                                   delete_elem:  elem_1
>>>                                   update_elem: [elem_2] -> elem_1
>>> 3rd iteration (follow elem_2->hash_node.next):  elem_2  -> [elem_1]
>>>                   loop.......
>>>
>> Yes. The above procedure is exactly the test case tries to do (except
>> the &key and &val typos).
> Yes, apologies for the typos I must have introduced when minifying the
> example. Should just use key and val sans the &.

OK
>
>>> don't think "_safe" covers all cases though. "_safe" may solve this
>>> particular reproducer which is shooting itself in the foot by deleting
>>> and adding itself when iterating a bucket.
>> Yes, if the bpf prog could delete and re-add the saved next entry, there
>> will be dead loop as well. It seems __htab_map_lookup_elem() may suffer
>> from the same problem just as bpf_for_each_hash_elem(). The problem is
>> mainly due to the immediate reuse of deleted element. Maybe we need to
>> add a seqlock to the htab_elem and retry the traversal if the seqlock is
>> not stable.

The seqlock + restart traversal way doesn't work, because the seq-count
for each element will add 2 after the re-insert and the loop will always
try to restart the traversal. I have another idea: how about add an
per-bucket incremental id for each element in the bucket and during the
traversal, the traversal will ignore the element with id greater than
the id of current element, so it will ignore the newly-added element.
For example, there are three elements in a bucket list: head -> A [id=3]
-> B[id=2] -> C[id=1]

(1) pass A to cb
current id = 3
cb deletes A and inserts A again
head -> A[4] -> B[2] -> C[1]

(2) pass B to cb
current id=2
cb deletes B and inserts B again
head -> B[5] -> A[4] -> C[1]

the id of A is greater than current id, so it is skipped.

>>> [ btw, I don't think the test code can work as is. At least the "&key"
>>> arg of the bpf_map_update_elem looks wrong. ]
>>>
>>>>          return 0;
>>>>      }
>>>>
>>>>      SEC("uprobe//proc/self/exe:test")
>>>>      int BPF_PROG(test) {
>>>>          __u64 i;
>>>>
>>>>          bpf_for(i, 0, N) {
>>>>              bpf_map_update_elem(&map, &i, &i, 0);
>>>>          }
>>>>
>>>>          bpf_for_each_map_elem(&map, cb, NULL, 0);
>>>>
>>>>          return 0;
>>>>      }
>>>>
>>>>      char LICENSE[] SEC("license") = "GPL";
>>>>
>>>> Signed-off-by: Brandon Kammerdiener <brandon.kammerdiener@xxxxxxxxx>
>>>>
>>>> ---
>>>>   kernel/bpf/hashtab.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>>>> index 4a9eeb7aef85..43574b0495c3 100644
>>>> --- a/kernel/bpf/hashtab.c
>>>> +++ b/kernel/bpf/hashtab.c
>>>> @@ -2224,7 +2224,7 @@ static long bpf_for_each_hash_elem(struct
>>>> bpf_map *map, bpf_callback_t callback_
>>>>           b = &htab->buckets[i];
>>>>           rcu_read_lock();
>>>>           head = &b->head;
>>>> -        hlist_nulls_for_each_entry_rcu(elem, n, head, hash_node) {
>>>> +        hlist_nulls_for_each_entry_safe(elem, n, head, hash_node) {
>>>>               key = elem->key;
>>>>               if (is_percpu) {
>>>>                   /* current cpu value for percpu map */
>>>> --
>>>> 2.48.1
>>>>
>>>
>>> .
> .





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux