On Wed, 22 Mar 2023 at 23:21, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Wed, Mar 22, 2023 at 2:39 PM Davide Miola <davide.miola99@xxxxxxxxx> wrote:
> >
> > On Wed, 22 Mar 2023 at 17:06, Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > On Wed, Mar 22, 2023 at 6:10 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> > > >
> > > > there was discussion about this some time ago:
> > > > https://lore.kernel.org/bpf/CAEf4BzZ-xe-zSjbBpKLHfQKPnTRTBMA2Eg382+_4kQoTLnj4eQ@xxxxxxxxxxxxxx/
> > > >
> > > > seems the 'active' problem andrii described fits your case as well
> > >
> > > I suspect a per-cpu recursion counter will miss more events in this case,
> > > since _any_ kprobe on that cpu will be blocked.
> > > If missing events is not an issue you probably want a per-cpu counter
> > > that is specific to your single ip_queue_xmit attach point.
> >
> > The difference between the scenario described in the linked thread
> > and mine is also the reason why I think in-bpf solutions like a
> > per-cpu guard can't work here: my programs are recursing due to irqs
> > interrupting them and invoking ip_queue_xmit, not because some helper
> > I'm using ends up calling ip_queue_xmit. Recursion can happen
> > anywhere in my programs, even before they get the chance to set a
> > flag or increment a counter in a per-cpu map, since there is no
> > atomic "bpf_map_lookup_and_increment" (or is there?)
>
> __sync_fetch_and_add() is supported. A bunch of selftests are using it.
> Or you can use bpf_spin_lock.

Sure, but I'd still have to look up the element from the map first. At
a minimum it would look something like:

SEC("fentry/ip_queue_xmit")
int BPF_PROG(entry_prog)
{
    int key = 0;
    int64_t *guard = bpf_map_lookup_elem(&per_cpu, &key);

    if (guard) {
        if (__sync_fetch_and_add(guard, 1) == 0) {
            ...
        }
    }
    return 0;
}

The program could be interrupted before it reaches __sync_fetch_and_add
(I just tested this and it does not solve the problem).
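
For reference, the other half of the guard pattern being discussed would
presumably look something like the sketch below: a per-CPU array backing
the counter plus an fexit counterpart that releases it. The includes, the
map layout, and the exit_prog name are illustrative assumptions, not taken
from the thread, and the lookup-then-add sequence is of course subject to
the same interrupt race described above.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* Per-CPU counter used as the recursion guard (map name assumed from the
 * snippet above). */
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, int);
    __type(value, __s64);
} per_cpu SEC(".maps");

SEC("fexit/ip_queue_xmit")
int BPF_PROG(exit_prog)
{
    int key = 0;
    __s64 *guard = bpf_map_lookup_elem(&per_cpu, &key);

    if (guard)
        /* Release the guard taken in entry_prog; an irq can still land
         * between the lookup and the atomic add. */
        __sync_fetch_and_add(guard, -1);
    return 0;
}

Because the lookup and the atomic add are separate steps, an irq arriving
between them can re-enter ip_queue_xmit before the guard takes effect,
which is exactly the failure mode reported above.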