On Wed, Feb 26, 2025 at 5:45 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 2/26/2025 12:15 AM, Brandon Kammerdiener wrote:
> > On Tue, Feb 25, 2025 at 08:26:17PM +0800, Hou Tao wrote:
> >> Hi,
> >>
> >> On 2/25/2025 3:13 PM, Martin KaFai Lau wrote:
> >>> On 2/24/25 2:01 PM, Brandon Kammerdiener wrote:
> >>>> This patch fixes an endless loop condition that can occur in
> >>>> bpf_for_each_hash_elem, causing the core to softlock. My understanding is
> >>>> that a combination of RCU list deletion and insertion introduces the new
> >>>> element after the iteration cursor and that there is a chance that an RCU
> >>> new element is added to the head of the bucket, so the first thought
> >>> is it should not extend the list beyond the current iteration point...
> >>>
> >>>> reader may in fact use this new element in iteration. The patch uses a
> >>>> _safe variant of the macro which gets the next element to iterate before
> >>>> executing the loop body for the current element. The following simple BPF
> >>>> program can be used to reproduce the issue:
> >>>>
> >>>> #include "vmlinux.h"
> >>>> #include <bpf/bpf_helpers.h>
> >>>> #include <bpf/bpf_tracing.h>
> >>>>
> >>>> #define N (64)
> >>>>
> >>>> struct {
> >>>>     __uint(type, BPF_MAP_TYPE_HASH);
> >>>>     __uint(max_entries, N);
> >>>>     __type(key, __u64);
> >>>>     __type(value, __u64);
> >>>> } map SEC(".maps");
> >>>>
> >>>> static int cb(struct bpf_map *map, __u64 *key, __u64 *value, void *arg) {
> >>>>     bpf_map_delete_elem(map, key);
> >>>>     bpf_map_update_elem(map, &key, &val, 0);
> >>> I suspect what happened in this reproducer is,
> >>> there is a bucket with more than one elem(s) and the deleted elem gets
> >>> immediately added back to the bucket->head.
> >>> Something like this, '[ ]' as the current elem.
> >>>
> >>> 1st iteration (follow bucket->head.first): [elem_1] -> elem_2
> >>>     delete_elem: elem_2
> >>>     update_elem: [elem_1] -> elem_2
> >>> 2nd iteration (follow elem_1->hash_node.next): elem_1 -> [elem_2]
> >>>     delete_elem: elem_1
> >>>     update_elem: [elem_2] -> elem_1
> >>> 3rd iteration (follow elem_2->hash_node.next): elem_2 -> [elem_1]
> >>> loop.......
> >>>
> >> Yes. The above procedure is exactly what the test case tries to do (except
> >> the &key and &val typos).
> > Yes, apologies for the typos I must have introduced when minifying the
> > example. Should just use key and val sans the &.
>
> OK
>
> >>> I don't think "_safe" covers all cases though. "_safe" may solve this
> >>> particular reproducer which is shooting itself in the foot by deleting
> >>> and adding itself when iterating a bucket.
> >> Yes, if the bpf prog could delete and re-add the saved next entry, there
> >> would be a dead loop as well. It seems __htab_map_lookup_elem() may suffer
> >> from the same problem just as bpf_for_each_hash_elem(). The problem is
> >> mainly due to the immediate reuse of the deleted element. Maybe we need to
> >> add a seqlock to the htab_elem and retry the traversal if the seqlock is
> >> not stable.
>
> The seqlock + restart-traversal approach doesn't work, because the seq-count
> of each element will be incremented by 2 after the re-insert, so the loop will
> always try to restart the traversal. I have another idea: how about adding a
> per-bucket incremental id to each element in the bucket; during the
> traversal, the walk would ignore any element with an id greater than the id
> of the current element, so it would skip the newly-added elements.
> For example, there are three elements in a bucket list:
> head -> A[id=3] -> B[id=2] -> C[id=1]
>
> (1) pass A to cb
>     current id = 3
>     cb deletes A and inserts A again
>     head -> A[4] -> B[2] -> C[1]
>
> (2) pass B to cb
>     current id = 2
>     cb deletes B and inserts B again
>     head -> B[5] -> A[4] -> C[1]
>
>     the id of A is greater than the current id, so it is skipped.

This looks like unnecessary overhead that attempts to reinvent the nulls
logic. At present I'm not convinced that lookup_nulls_elem_raw() is broken.
The only real issue is that bpf_for_each_hash_elem() loops forever in this
convoluted case, and _safe() is the right fix for it.