On Tue, Feb 25, 2025 at 08:26:17PM +0800, Hou Tao wrote:
> Hi,
>
> On 2/25/2025 3:13 PM, Martin KaFai Lau wrote:
> > On 2/24/25 2:01 PM, Brandon Kammerdiener wrote:
> >> This patch fixes an endless loop condition that can occur in
> >> bpf_for_each_hash_elem, causing the core to softlock. My understanding is
> >> that a combination of RCU list deletion and insertion introduces the new
> >> element after the iteration cursor and that there is a chance that an RCU
> >
> > new element is added to the head of the bucket, so the first thought
> > is it should not extend the list beyond the current iteration point...
> >
> >> reader may in fact use this new element in iteration. The patch uses a
> >> _safe variant of the macro which gets the next element to iterate before
> >> executing the loop body for the current element. The following simple BPF
> >> program can be used to reproduce the issue:
> >>
> >> #include "vmlinux.h"
> >> #include <bpf/bpf_helpers.h>
> >> #include <bpf/bpf_tracing.h>
> >>
> >> #define N (64)
> >>
> >> struct {
> >>     __uint(type, BPF_MAP_TYPE_HASH);
> >>     __uint(max_entries, N);
> >>     __type(key, __u64);
> >>     __type(value, __u64);
> >> } map SEC(".maps");
> >>
> >> static int cb(struct bpf_map *map, __u64 *key, __u64 *value,
> >>               void *arg) {
> >>     bpf_map_delete_elem(map, key);
> >>     bpf_map_update_elem(map, &key, &val, 0);
> >
> > I suspect what happened in this reproducer is,
> > there is a bucket with more than one elem(s) and the deleted elem gets
> > immediately added back to the bucket->head.
> > Something like this, '[ ]' as the current elem.
> >
> > 1st iteration (follow bucket->head.first): [elem_1] -> elem_2
> >     delete_elem: elem_2
> >     update_elem: [elem_1] -> elem_2
> > 2nd iteration (follow elem_1->hash_node.next): elem_1 -> [elem_2]
> >     delete_elem: elem_1
> >     update_elem: [elem_2] -> elem_1
> > 3rd iteration (follow elem_2->hash_node.next): elem_2 -> [elem_1]
> > loop.......
> >
>
> Yes. The above procedure is exactly what the test case tries to do (except
> the &key and &val typos).

Yes, apologies for the typos; I must have introduced them when minifying
the example. The calls should just use key and val without the &.
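That is, with the parameter names used in the minified cb() above, the
update call should read:

    bpf_map_update_elem(map, key, value, 0);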
>
> > don't think "_safe" covers all cases though. "_safe" may solve this
> > particular reproducer which is shooting itself in the foot by deleting
> > and adding itself when iterating a bucket.
>
> Yes, if the bpf prog could delete and re-add the saved next entry, there
> will be a dead loop as well. It seems __htab_map_lookup_elem() may suffer
> from the same problem just as bpf_for_each_hash_elem(). The problem is
> mainly due to the immediate reuse of the deleted element. Maybe we need to
> add a seqlock to the htab_elem and retry the traversal if the seqlock is
> not stable.
>
> > [ btw, I don't think the test code can work as is. At least the "&key"
> > arg of the bpf_map_update_elem looks wrong. ]
> >
> >> return 0;
> >> }
> >>
> >> SEC("uprobe//proc/self/exe:test")
> >> int BPF_PROG(test) {
> >>     __u64 i;
> >>
> >>     bpf_for(i, 0, N) {
> >>         bpf_map_update_elem(&map, &i, &i, 0);
> >>     }
> >>
> >>     bpf_for_each_map_elem(&map, cb, NULL, 0);
> >>
> >>     return 0;
> >> }
> >>
> >> char LICENSE[] SEC("license") = "GPL";
> >>
> >> Signed-off-by: Brandon Kammerdiener <brandon.kammerdiener@xxxxxxxxx>
> >>
> >> ---
> >>  kernel/bpf/hashtab.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> >> index 4a9eeb7aef85..43574b0495c3 100644
> >> --- a/kernel/bpf/hashtab.c
> >> +++ b/kernel/bpf/hashtab.c
> >> @@ -2224,7 +2224,7 @@ static long bpf_for_each_hash_elem(struct
> >> bpf_map *map, bpf_callback_t callback_
> >>  		b = &htab->buckets[i];
> >>  		rcu_read_lock();
> >>  		head = &b->head;
> >> -		hlist_nulls_for_each_entry_rcu(elem, n, head, hash_node) {
> >> +		hlist_nulls_for_each_entry_safe(elem, n, head, hash_node) {
> >>  			key = elem->key;
> >>  			if (is_percpu) {
> >>  				/* current cpu value for percpu map */
> >> --
> >> 2.48.1
> >>
> >
> >
> > .
>
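In case it helps, below is a small userspace toy model of the ping-pong
Martin sketched above (plain singly linked list, no RCU or nulls markers,
all names made up for illustration). The "_rcu"-style walk reads the
current element's next pointer only after the callback has run, so the
delete + re-add in the callback keeps feeding it elements forever; the
"_safe"-style walk saves the next pointer first and terminates for this
reproducer. It of course says nothing about the case Hou Tao mentions
where the saved next entry itself gets deleted and re-added.

/* toy_walk.c: illustration only, not kernel code */
#include <stdio.h>

struct elem {
	int id;
	struct elem *next;
};

static struct elem e1 = { .id = 1 }, e2 = { .id = 2 };
static struct elem *head;

/* what the reproducer's callback effectively does to the bucket:
 * unlink the current element, then re-insert it at the head */
static void del_and_readd(struct elem *cur)
{
	struct elem **pp = &head;

	while (*pp && *pp != cur)
		pp = &(*pp)->next;
	if (*pp)
		*pp = cur->next;
	cur->next = head;
	head = cur;
}

static void reset_bucket(void)
{
	head = &e1;
	e1.next = &e2;
	e2.next = NULL;
}

int main(void)
{
	struct elem *pos, *next;
	int steps;

	/* "_rcu" style: follow pos->next only after the callback ran */
	reset_bucket();
	for (pos = head, steps = 0; pos && steps < 10; pos = pos->next, steps++)
		del_and_readd(pos);
	printf("_rcu-style walk:  %d callbacks (capped at 10; would loop forever)\n", steps);

	/* "_safe" style: remember pos->next before running the callback */
	reset_bucket();
	for (pos = head, steps = 0; pos && steps < 10; pos = next, steps++) {
		next = pos->next;
		del_and_readd(pos);
	}
	printf("_safe-style walk: %d callbacks\n", steps);
	return 0;
}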