Hi,

On 2/25/2025 3:13 PM, Martin KaFai Lau wrote:
> On 2/24/25 2:01 PM, Brandon Kammerdiener wrote:
>> This patch fixes an endless loop condition that can occur in
>> bpf_for_each_hash_elem, causing the core to softlock. My understanding
>> is that a combination of RCU list deletion and insertion introduces the
>> new element after the iteration cursor and that there is a chance that
>> an RCU
>
> new element is added to the head of the bucket, so the first thought
> is it should not extend the list beyond the current iteration point...
>
>> reader may in fact use this new element in iteration. The patch uses a
>> _safe variant of the macro which gets the next element to iterate before
>> executing the loop body for the current element. The following simple
>> BPF program can be used to reproduce the issue:
>>
>> #include "vmlinux.h"
>> #include <bpf/bpf_helpers.h>
>> #include <bpf/bpf_tracing.h>
>>
>> #define N (64)
>>
>> struct {
>>         __uint(type, BPF_MAP_TYPE_HASH);
>>         __uint(max_entries, N);
>>         __type(key, __u64);
>>         __type(value, __u64);
>> } map SEC(".maps");
>>
>> static int cb(struct bpf_map *map, __u64 *key, __u64 *value, void *arg) {
>>         bpf_map_delete_elem(map, key);
>>         bpf_map_update_elem(map, &key, &val, 0);
>
> I suspect what happened in this reproducer is,
> there is a bucket with more than one elem(s) and the deleted elem gets
> immediately added back to the bucket->head.
> Something like this, '[ ]' as the current elem.
>
> 1st iteration (follow bucket->head.first): [elem_1] -> elem_2
>     delete_elem: elem_2
>     update_elem: [elem_1] -> elem_2
> 2nd iteration (follow elem_1->hash_node.next): elem_1 -> [elem_2]
>     delete_elem: elem_1
>     update_elem: [elem_2] -> elem_1
> 3rd iteration (follow elem_2->hash_node.next): elem_2 -> [elem_1]
> loop.......

Yes. The above procedure is exactly what the test case tries to do
(except for the &key and &val typos).

> I don't think "_safe" covers all cases though. "_safe" may solve this
> particular reproducer which is shooting itself in the foot by deleting
> and adding itself when iterating a bucket.

Yes, if the bpf prog deletes and re-adds the saved next entry, there
will be a dead loop as well. It seems __htab_map_lookup_elem() may
suffer from the same problem as bpf_for_each_hash_elem() does. The
problem is mainly due to the immediate reuse of the deleted element.
Maybe we need to add a seqlock to htab_elem and retry the traversal
if the seqlock is not stable.

> [ btw, I don't think the test code can work as is. At least the "&key"
> arg of the bpf_map_update_elem looks wrong. ]
>
>>         return 0;
>> }
>>
>> SEC("uprobe//proc/self/exe:test")
>> int BPF_PROG(test) {
>>         __u64 i;
>>
>>         bpf_for(i, 0, N) {
>>                 bpf_map_update_elem(&map, &i, &i, 0);
>>         }
>>
>>         bpf_for_each_map_elem(&map, cb, NULL, 0);
>>
>>         return 0;
>> }
>>
>> char LICENSE[] SEC("license") = "GPL";
>>
>> Signed-off-by: Brandon Kammerdiener <brandon.kammerdiener@xxxxxxxxx>
>>
>> ---
>>   kernel/bpf/hashtab.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index 4a9eeb7aef85..43574b0495c3 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -2224,7 +2224,7 @@ static long bpf_for_each_hash_elem(struct bpf_map *map, bpf_callback_t callback_
>>                 b = &htab->buckets[i];
>>                 rcu_read_lock();
>>                 head = &b->head;
>> -               hlist_nulls_for_each_entry_rcu(elem, n, head, hash_node) {
>> +               hlist_nulls_for_each_entry_safe(elem, n, head, hash_node) {
>>                         key = elem->key;
>>                         if (is_percpu) {
>>                                 /* current cpu value for percpu map */
>> --
>> 2.48.1