On Thu, 20 Aug 2020, Andrii Nakryiko wrote:

> Add a set of APIs to the perf_buffer manager to allow applications to
> integrate perf buffer polling into existing epoll-based infrastructure. One
> example is applications already using libevent and wanting to plug in
> perf_buffer polling, instead of relying on perf_buffer__poll() and wasting
> an extra thread to do it. But perf_buffer is still extremely useful to set
> up and consume perf buffer rings even for such use cases.
>
> So to accommodate such new use cases, add three new APIs:
>   - perf_buffer__buffer_cnt() returns the number of per-CPU buffers
>     maintained by a given instance of the perf_buffer manager;
>   - perf_buffer__buffer_fd() returns the FD of the perf_event corresponding
>     to a specified per-CPU buffer; this FD can then be polled independently;
>   - perf_buffer__consume_buffer() consumes data from a single per-CPU
>     buffer, identified by its slot index.
>
> These APIs allow for great flexibility, but do not sacrifice the general
> usability of perf_buffer.
>

This is great! If I understand correctly, you're supporting retrieval of the
individual per-CPU buffer FDs so that they can ultimately be inserted into
another epoll()ed FD.

I've been exploring another possibility - hierarchical epoll, where the
top-level perf_buffer epoll_fd field is used rather than the individual
per-CPU buffers. In that context, would an interface to return the
perf_buffer epoll_fd make sense too? i.e.

	int perf_buffer__fd(const struct perf_buffer *pb);

When events occur on that FD, it seems we can simply call
perf_buffer__poll(pb, ...) to handle them. That approach _appears_ to work,
though I do see occasional event loss. Is that method legit too, or am I
missing something?
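For reference, here is roughly what that hierarchical setup looks like on my
side (a rough, untested sketch; perf_buffer__fd() is the hypothetical
accessor suggested above, and error handling is elided):

#include <sys/epoll.h>
#include <bpf/libbpf.h>

/* perf_buffer__fd() is the hypothetical accessor floated above; the idea
 * is that it would just return the perf_buffer's internal epoll_fd.
 */
static int attach_perf_buffer(int outer_epfd, struct perf_buffer *pb)
{
	struct epoll_event ev = {
		.events = EPOLLIN,
		.data.ptr = pb,
	};

	/* register perf_buffer's own epoll FD with the app's event loop */
	return epoll_ctl(outer_epfd, EPOLL_CTL_ADD, perf_buffer__fd(pb), &ev);
}

static int on_ready(struct epoll_event *ev)
{
	struct perf_buffer *pb = ev->data.ptr;

	/* inner epoll FD became readable; use a zero timeout so the outer
	 * event loop never blocks here
	 */
	return perf_buffer__poll(pb, 0);
}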
> Also exercise and check new APIs in perf_buffer selftest.
>
> Signed-off-by: Andrii Nakryiko <andriin@xxxxxx>

A few questions about the test below, but

Reviewed-by: Alan Maguire <alan.maguire@xxxxxxxxxx>

> ---
>  tools/lib/bpf/libbpf.c                             | 51 ++++++++++++++-
>  tools/lib/bpf/libbpf.h                             |  3 +
>  tools/lib/bpf/libbpf.map                           |  7 +++
>  .../selftests/bpf/prog_tests/perf_buffer.c         | 62 +++++++++++++++----
>  4 files changed, 111 insertions(+), 12 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 0bc1fd813408..a6359d49aa9d 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -9390,6 +9390,55 @@ int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms)
>  	return cnt < 0 ? -errno : cnt;
>  }
>  
> +/* Return number of PERF_EVENT_ARRAY map slots set up by this perf_buffer
> + * manager.
> + */
> +size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb)
> +{
> +	return pb->cpu_cnt;
> +}
> +
> +/*
> + * Return perf_event FD of a ring buffer in *buf_idx* slot of
> + * PERF_EVENT_ARRAY BPF map. This FD can be polled for new data using
> + * select()/poll()/epoll() Linux syscalls.
> + */
> +int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx)
> +{
> +	struct perf_cpu_buf *cpu_buf;
> +
> +	if (buf_idx >= pb->cpu_cnt)
> +		return -EINVAL;
> +
> +	cpu_buf = pb->cpu_bufs[buf_idx];
> +	if (!cpu_buf)
> +		return -ENOENT;
> +
> +	return cpu_buf->fd;
> +}
> +
> +/*
> + * Consume data from perf ring buffer corresponding to slot *buf_idx* in
> + * PERF_EVENT_ARRAY BPF map without waiting/polling. If there is no data to
> + * consume, do nothing and return success.
> + * Returns:
> + *   - 0 on success;
> + *   - <0 on failure.
> + */
> +int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx)
> +{
> +	struct perf_cpu_buf *cpu_buf;
> +
> +	if (buf_idx >= pb->cpu_cnt)
> +		return -EINVAL;
> +
> +	cpu_buf = pb->cpu_bufs[buf_idx];
> +	if (!cpu_buf)
> +		return -ENOENT;
> +
> +	return perf_buffer__process_records(pb, cpu_buf);
> +}
> +
>  int perf_buffer__consume(struct perf_buffer *pb)
>  {
>  	int i, err;
> @@ -9402,7 +9451,7 @@ int perf_buffer__consume(struct perf_buffer *pb)
>  
>  		err = perf_buffer__process_records(pb, cpu_buf);
>  		if (err) {
> -			pr_warn("error while processing records: %d\n", err);
> +			pr_warn("perf_buffer: failed to process records in buffer #%d: %d\n", i, err);
>  			return err;
>  		}
>  	}
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 5ecb4069a9f0..15e02dcda2c7 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -590,6 +590,9 @@ perf_buffer__new_raw(int map_fd, size_t page_cnt,
>  LIBBPF_API void perf_buffer__free(struct perf_buffer *pb);
>  LIBBPF_API int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms);
>  LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb);
> +LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx);
> +LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb);
> +LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx);
>  
>  typedef enum bpf_perf_event_ret
>  	(*bpf_perf_event_print_t)(struct perf_event_header *hdr,
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index e35bd6cdbdbf..77466958310a 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -299,3 +299,10 @@ LIBBPF_0.1.0 {
>  		btf__set_fd;
>  		btf__set_pointer_size;
>  } LIBBPF_0.0.9;
> +
> +LIBBPF_0.2.0 {
> +	global:
> +		perf_buffer__buffer_cnt;
> +		perf_buffer__buffer_fd;
> +		perf_buffer__consume_buffer;
> +} LIBBPF_0.1.0;
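The libbpf changes look good to me. FWIW, the three APIs compose nicely for
the epoll-based integration the commit message describes; roughly like this
(an untested sketch; epfd creation, teardown and the sample callback wiring
are elided):

#include <errno.h>
#include <sys/epoll.h>
#include <bpf/libbpf.h>

static int add_buffers(int epfd, struct perf_buffer *pb)
{
	size_t i, n = perf_buffer__buffer_cnt(pb);

	for (i = 0; i < n; i++) {
		struct epoll_event ev = { .events = EPOLLIN };
		int fd = perf_buffer__buffer_fd(pb, i);

		if (fd == -ENOENT)
			continue; /* no ring for this slot, e.g. offline CPU */
		if (fd < 0)
			return fd;

		ev.data.u64 = i; /* remember slot index for dispatch */
		if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
			return -errno;
	}

	return 0;
}

static int drain_ready(int epfd, struct perf_buffer *pb)
{
	struct epoll_event evs[8];
	int i, n, err;

	n = epoll_wait(epfd, evs, 8, -1);
	if (n < 0)
		return -errno;

	for (i = 0; i < n; i++) {
		/* consume only the ring that actually became readable */
		err = perf_buffer__consume_buffer(pb, evs[i].data.u64);
		if (err)
			return err;
	}

	return 0;
}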
Disable instrumentation with > * no_sanitize_address attribute > @@ -24,13 +26,31 @@ static void on_sample(void *ctx, int cpu, void *data, __u32 size) > CPU_SET(cpu, cpu_seen); > } > > +int trigger_on_cpu(int cpu) > +{ > + cpu_set_t cpu_set; > + int err; > + > + CPU_ZERO(&cpu_set); > + CPU_SET(cpu, &cpu_set); > + > + err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set); > + if (err && CHECK(err, "set_affinity", "cpu #%d, err %d\n", cpu, err)) > + return err; > + > + usleep(1); > + > + return 0; > +} > + > void test_perf_buffer(void) > { > - int err, on_len, nr_on_cpus = 0, nr_cpus, i, duration = 0; > + int err, on_len, nr_on_cpus = 0, nr_cpus, i; > struct perf_buffer_opts pb_opts = {}; > struct test_perf_buffer *skel; > - cpu_set_t cpu_set, cpu_seen; > + cpu_set_t cpu_seen; > struct perf_buffer *pb; > + int last_fd = -1, fd; > bool *online; > > nr_cpus = libbpf_num_possible_cpus(); > @@ -71,16 +91,8 @@ void test_perf_buffer(void) > continue; > } > > - CPU_ZERO(&cpu_set); > - CPU_SET(i, &cpu_set); > - > - err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), > - &cpu_set); > - if (err && CHECK(err, "set_affinity", "cpu #%d, err %d\n", > - i, err)) > + if (trigger_on_cpu(i)) > goto out_close; > - > - usleep(1); > } > > /* read perf buffer */ > @@ -92,6 +104,34 @@ void test_perf_buffer(void) > "expect %d, seen %d\n", nr_on_cpus, CPU_COUNT(&cpu_seen))) > goto out_free_pb; > > + if (CHECK(perf_buffer__buffer_cnt(pb) != nr_cpus, "buf_cnt", > + "got %zu, expected %d\n", perf_buffer__buffer_cnt(pb), nr_cpus)) > + goto out_close; > + > + for (i = 0; i < nr_cpus; i++) { > + if (i >= on_len || !online[i]) > + continue; > + > + fd = perf_buffer__buffer_fd(pb, i); > + CHECK(last_fd == fd, "fd_check", "last fd %d == fd %d\n", last_fd, fd); > + last_fd = fd; > + I'm not sure why you're testing this way - shouldn't it just be a verification of whether we get an unexpected error code rather than a valid fd? > + err = perf_buffer__consume_buffer(pb, i); > + if (CHECK(err, "drain_buf", "cpu %d, err %d\n", i, err)) > + goto out_close; > + I think I'm a bit lost in what processes what here. The first perf_buffer__poll() should handle the processing of the records associated with the first set of per-cpu triggering I think. Is the above perf_buffer__consume_buffer() checking the "no data, return success" case? If that's right should we do something to explicitly check it indeed was a no-op, like CHECK()ing CPU_ISSET(i, &cpu_seen) to ensure the on_sample() handler wasn't called? The "drain_buf" description makes me think I'm misreading this and we're draining additional events, so I wanted to check what's going on here to make sure. Thanks! Alan