Re: Behavior of pinned perf event array

Daniel Borkmann <daniel@xxxxxxxxxxxxx> · Thu, 24 Sep 2020 10:43:34 +0200

On 9/23/20 6:21 PM, Song Liu wrote:
On Sep 14, 2020, at 3:59 PM, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
On Fri, Sep 11, 2020 at 1:36 PM Song Liu <songliubraving@xxxxxx> wrote:
[...]
Daniel, are you aware of any use cases that do rely on such a behavior
of PERV_EVENT_ARRAY?

For me this auto-removal of elements on closing *one of a few*
PERF_EVENT_ARRAY FDs (original one, but still just one of a few active
ones) was extremely surprising. It doesn't follow what we do for any
other BPF map, as far as I can tell. E.g., think about
BPF_MAP_TYPE_PROG_ARRAY. If we pin it in BPF FS and close FD, it won't
auto-remove all the tail call programs, right? There is exactly the
same concern with not auto-releasing bpf_progs, just like with
perf_event. But it's not accidental, if you are pinning a BPF map, you
know what you are doing (at least we have to assume so :).

So instead of adding an extra option, shouldn't we just fix this
behavior instead and make it the same across all BPF maps that hold
kernel resources?

Could you please share your thoughts on this? I personally don't have
strong preference one way (add a flag) or the other (change the default
behavior). But I think we need to agree on the direction to go.

My preference would be to have an opt-in flag, we do rely on the auto-removal
of the perf event map entry on client close in Cilium at least, e.g. a monitor
application can insert itself into the map to start receiving events from
the BPF datapath, and upon exit (no matter whether graceful or not) we don't
consume any more cycles in the data path than necessary for events, and
from the __bpf_perf_event_output() we bail out right after checking the
READ_ONCE(array->ptrs[index]) instead of pushing data that later on no-one
picks up.