On Tue, Feb 25, 2025 at 08:26:17PM +0800, Hou Tao wrote:
> Hi,
>
> On 2/25/2025 3:13 PM, Martin KaFai Lau wrote:
> > On 2/24/25 2:01 PM, Brandon Kammerdiener wrote:
> >> This patch fixes an endless loop condition that can occur in
> >> bpf_for_each_hash_elem, causing the core to softlock. My understanding is
> >> that a combination of RCU list deletion and insertion introduces the new
> >> element after the iteration cursor and that there is a chance that an RCU
> >
> > new element is added to the head of the bucket, so the first thought
> > is it should not extend the list beyond the current iteration point...
> >
> >> reader may in fact use this new element in iteration. The patch uses a
> >> _safe variant of the macro which gets the next element to iterate before
> >> executing the loop body for the current element. The following simple BPF
> >> program can be used to reproduce the issue:
> >>
> >> #include "vmlinux.h"
> >> #include <bpf/bpf_helpers.h>
> >> #include <bpf/bpf_tracing.h>
> >>
> >> #define N (64)
> >>
> >> struct {
> >>     __uint(type, BPF_MAP_TYPE_HASH);
> >>     __uint(max_entries, N);
> >>     __type(key, __u64);
> >>     __type(value, __u64);
> >> } map SEC(".maps");
> >>
> >> static int cb(struct bpf_map *map, __u64 *key, __u64 *value,
> >>               void *arg) {
> >>     bpf_map_delete_elem(map, key);
> >>     bpf_map_update_elem(map, &key, &val, 0);
> >
> > I suspect what happened in this reproducer is,
> > there is a bucket with more than one elem(s) and the deleted elem gets
> > immediately added back to the bucket->head.
> > Something like this, '[ ]' as the current elem.
> >
> > 1st iteration (follow bucket->head.first): [elem_1] -> elem_2
> >     delete_elem: elem_2
> >     update_elem: [elem_1] -> elem_2
> > 2nd iteration (follow elem_1->hash_node.next): elem_1 -> [elem_2]
> >     delete_elem: elem_1
> >     update_elem: [elem_2] -> elem_1
> > 3rd iteration (follow elem_2->hash_node.next): elem_2 -> [elem_1]
> > loop.......
> >
>
> Yes. The above procedure is exactly what the test case tries to do (except
> the &key and &val typos).

Yes, apologies for the typos; I must have introduced them when minifying
the example. The calls should just use key and val without the &.
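That is, with the parameter names used in the minified cb() above, the
update call should read:

    bpf_map_update_elem(map, key, value, 0);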
>
> > don't think "_safe" covers all cases though. "_safe" may solve this
> > particular reproducer which is shooting itself in the foot by deleting
> > and adding itself when iterating a bucket.
>
> Yes, if the bpf prog could delete and re-add the saved next entry, there
> will be a dead loop as well. It seems __htab_map_lookup_elem() may suffer
> from the same problem just as bpf_for_each_hash_elem(). The problem is
> mainly due to the immediate reuse of the deleted element. Maybe we need to
> add a seqlock to the htab_elem and retry the traversal if the seqlock is
> not stable.
>
> > [ btw, I don't think the test code can work as is. At least the "&key"
> > arg of the bpf_map_update_elem looks wrong. ]
> >
> >> return 0;
> >> }
> >>
> >> SEC("uprobe//proc/self/exe:test")
> >> int BPF_PROG(test) {
> >>     __u64 i;
> >>
> >>     bpf_for(i, 0, N) {
> >>         bpf_map_update_elem(&map, &i, &i, 0);
> >>     }
> >>
> >>     bpf_for_each_map_elem(&map, cb, NULL, 0);
> >>
> >>     return 0;
> >> }
> >>
> >> char LICENSE[] SEC("license") = "GPL";
> >>
> >> Signed-off-by: Brandon Kammerdiener <brandon.kammerdiener@xxxxxxxxx>
> >>
> >> ---
> >>  kernel/bpf/hashtab.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> >> index 4a9eeb7aef85..43574b0495c3 100644
> >> --- a/kernel/bpf/hashtab.c
> >> +++ b/kernel/bpf/hashtab.c
> >> @@ -2224,7 +2224,7 @@ static long bpf_for_each_hash_elem(struct
> >> bpf_map *map, bpf_callback_t callback_
> >>  		b = &htab->buckets[i];
> >>  		rcu_read_lock();
> >>  		head = &b->head;
> >> -		hlist_nulls_for_each_entry_rcu(elem, n, head, hash_node) {
> >> +		hlist_nulls_for_each_entry_safe(elem, n, head, hash_node) {
> >>  			key = elem->key;
> >>  			if (is_percpu) {
> >>  				/* current cpu value for percpu map */
> >> --
> >> 2.48.1
> >>
> >
> >
> > .
>
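In case it helps, below is a small userspace toy model of the ping-pong
Martin sketched above (plain singly linked list, no RCU or nulls markers,
all names made up for illustration). The "_rcu"-style walk reads the
current element's next pointer only after the callback has run, so the
delete + re-add in the callback keeps feeding it elements forever; the
"_safe"-style walk saves the next pointer first and terminates for this
reproducer. It of course says nothing about the case Hou Tao mentions
where the saved next entry itself gets deleted and re-added.

/* toy_walk.c: illustration only, not kernel code */
#include <stdio.h>

struct elem {
	int id;
	struct elem *next;
};

static struct elem e1 = { .id = 1 }, e2 = { .id = 2 };
static struct elem *head;

/* what the reproducer's callback effectively does to the bucket:
 * unlink the current element, then re-insert it at the head */
static void del_and_readd(struct elem *cur)
{
	struct elem **pp = &head;

	while (*pp && *pp != cur)
		pp = &(*pp)->next;
	if (*pp)
		*pp = cur->next;
	cur->next = head;
	head = cur;
}

static void reset_bucket(void)
{
	head = &e1;
	e1.next = &e2;
	e2.next = NULL;
}

int main(void)
{
	struct elem *pos, *next;
	int steps;

	/* "_rcu" style: follow pos->next only after the callback ran */
	reset_bucket();
	for (pos = head, steps = 0; pos && steps < 10; pos = pos->next, steps++)
		del_and_readd(pos);
	printf("_rcu-style walk:  %d callbacks (capped at 10; would loop forever)\n", steps);

	/* "_safe" style: remember pos->next before running the callback */
	reset_bucket();
	for (pos = head, steps = 0; pos && steps < 10; pos = next, steps++) {
		next = pos->next;
		del_and_readd(pos);
	}
	printf("_safe-style walk: %d callbacks\n", steps);
	return 0;
}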