On 2023/12/28 02:02, Alexei Starovoitov wrote:
On Wed, Dec 27, 2023 at 2:01 AM Philo Lu <lulie@xxxxxxxxxxxxxxxxx> wrote:
The patch set introduces a new type of map, BPF_MAP_TYPE_RELAY, based on
the relay interface [0]. It provides a way for persistent and overwritable
data transfer.
As stated in [0], relay is an efficient method for log and data transfer.
And the interface is simple enough that we can implement and use this
type of map with the current map interfaces. Besides, we need a kfunc
bpf_relay_output to output data to user space, similar to bpf_ringbuf_output.
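
To make the proposed API concrete, a minimal BPF-side sketch is below. The
map attributes (sub-buffer size/count) and the exact bpf_relay_output
signature are assumptions modeled on bpf_ringbuf_output, not necessarily
what this patch set defines:

/* Sketch only: the map attributes and the kfunc signature below are
 * assumptions mirroring bpf_ringbuf_output(); see the patches for the
 * real definitions.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_RELAY);
	__uint(max_entries, 8);		/* assumed: number of sub-buffers */
	__uint(value_size, 4096);	/* assumed: size of each sub-buffer */
} my_relay SEC(".maps");

/* assumed kfunc declaration, mirroring bpf_ringbuf_output() */
extern int bpf_relay_output(struct bpf_map *map, void *data,
			    __u64 data__sz, __u32 flags) __ksym;

struct event {
	__u64 ts;
	__u32 pid;
};

SEC("tracepoint/sched/sched_switch")
int trace_switch(void *ctx)
{
	struct event e = {
		.ts  = bpf_ktime_get_ns(),
		.pid = bpf_get_current_pid_tgid() >> 32,
	};

	/* keep writing; no consumer has to be attached (overwrite mode) */
	bpf_relay_output((struct bpf_map *)&my_relay, &e, sizeof(e), 0);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
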
We need this map because currently neither ringbuf nor perfbuf satisfies
the requirements of relatively long-term, continuous tracing, where the bpf
program keeps writing into the buffer without any bundled reader and the
buffer supports overwriting. Users just run the bpf program to collect
data and read it whenever they need to. The detailed discussion can be
found at [1].
Hold on.
Earlier I mistakenly assumed that this relayfs is a multi producer
buffer instead of per-cpu.
Since it's actually per-cpu I see no need to introduce another per-cpu
ring buffer. We already have a perf_event buffer.
I think the relay map and perfbuf don't conflict with each other, and the
relay map could be a better choice in some use cases (e.g., continuous
tracing). In our application, we output the tracing records as strings into
relay files, and users just read them through `cat` without any extra
consumer process, which seems impossible to implement even with a pinnable
perfbuf.
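
For the string-record case, the BPF side could look roughly like this
(reusing the hypothetical my_relay map and bpf_relay_output declaration
from the sketch above; bpf_snprintf is an existing helper):

/* Sketch: emit human-readable lines so userspace can simply cat the
 * relay file; the relay map/kfunc pieces are the same assumptions as
 * in the earlier sketch.
 */
SEC("tracepoint/syscalls/sys_enter_execve")
int trace_exec(void *ctx)
{
	char line[64];
	__u64 args[1] = { bpf_get_current_pid_tgid() >> 32 };
	long len;

	/* bpf_snprintf() returns the formatted length including the
	 * trailing NUL; drop the NUL when emitting the record
	 */
	len = bpf_snprintf(line, sizeof(line), "exec pid=%llu\n",
			   args, sizeof(args));
	if (len > 1 && len <= sizeof(line))
		bpf_relay_output((struct bpf_map *)&my_relay, line,
				 len - 1, 0);
	return 0;
}
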
Specifically, the advantages of the relay map are summarized as follows:
(1) Read at any time without an extra consumer process: As discussed
before, with the relay map, bpf programs can keep writing into the buffer
and users can read at any time.
(2) Custom data format: Unlike perfbuf, which handles data entry by entry
(i.e., per event), the data format of relay is entirely up to users. It
could be a simple string, or a binary struct with a header, which gives
users high flexibility.
(3) Better performance: Due to its simple design, relay outperforms
perfbuf in the current bench_ringbufs (I added a relay map case to
`tools/testing/selftests/bpf/benchs/bench_ringbufs.c` without other
changes). Note that relay outputs data directly without notification,
and the consumer can get a batch of samples with a single read(); a rough
userspace sketch of such a consumer follows the benchmark numbers below.
Single-producer, parallel producer, sampled notification
========================================================
relaymap            51.652 ± 0.007M/s (drops 0.000 ± 0.000M/s)
rb-libbpf           22.773 ± 0.015M/s (drops 0.000 ± 0.000M/s)
rb-custom           23.782 ± 0.004M/s (drops 0.000 ± 0.000M/s)
pb-libbpf           18.506 ± 0.007M/s (drops 0.000 ± 0.000M/s)
pb-custom           19.503 ± 0.007M/s (drops 0.000 ± 0.000M/s)
Single-producer, back-to-back mode
==================================
relaymap            44.771 ± 0.014M/s (drops 0.000 ± 0.000M/s)
rb-libbpf           25.091 ± 0.013M/s (drops 0.000 ± 0.000M/s)
rb-libbpf-sampled   24.779 ± 0.018M/s (drops 0.000 ± 0.000M/s)
rb-custom           27.784 ± 0.012M/s (drops 0.000 ± 0.000M/s)
rb-custom-sampled   27.414 ± 0.017M/s (drops 0.000 ± 0.000M/s)
pb-libbpf            1.409 ± 0.000M/s (drops 0.000 ± 0.000M/s)
pb-libbpf-sampled   18.467 ± 0.005M/s (drops 0.000 ± 0.000M/s)
pb-custom            1.415 ± 0.000M/s (drops 0.000 ± 0.000M/s)
pb-custom-sampled   19.913 ± 0.007M/s (drops 0.000 ± 0.000M/s)
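
As a rough illustration of that consumer side, here is a userspace sketch
that read()s a batch of records with a small custom header from one per-cpu
relay file. The debugfs path and the record layout are assumptions for
illustration, not something this patch set defines:

/* Sketch of a userspace consumer: one read() returns a batch of
 * records; the file path and the per-record header are assumptions.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

struct rec_hdr {		/* assumed custom per-record header */
	uint32_t len;		/* payload length in bytes */
	uint32_t type;
};

int main(void)
{
	/* assumed path: one relay file per CPU created for the map */
	int fd = open("/sys/kernel/debug/my_relay/cpu0", O_RDONLY);
	char buf[64 * 1024];
	ssize_t n;

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* no per-event notification: each read() drains a batch */
	while ((n = read(fd, buf, sizeof(buf))) > 0) {
		size_t off = 0;

		while (off + sizeof(struct rec_hdr) <= (size_t)n) {
			struct rec_hdr hdr;

			memcpy(&hdr, buf + off, sizeof(hdr));
			off += sizeof(hdr);
			if (off + hdr.len > (size_t)n)
				break;	/* partial record at the end */
			printf("record type=%u len=%u\n", hdr.type, hdr.len);
			off += hdr.len;
		}
	}
	close(fd);
	return 0;
}
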
Thanks.
Earlier you said:
"I can use BPF_F_PRESERVE_ELEMS flag to keep the perf_events, but I do
not know how to get the buffer again in a new process."
Looks like the issue is the lack of a map_fd_sys_lookup_elem callback?
Let's solve that latter part.
A perf_event_array map should be pinnable like any other map,
so there is a way to get an FD to the map in a new process.
What's missing is a way to get an FD to the perf event itself.
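
For concreteness, a minimal userspace sketch of that gap, using libbpf's
low-level wrappers (the pin path and key layout are assumptions):

/* Sketch: re-acquiring the pinned perf_event_array in a new process
 * already works; what is missing is a way to turn an element back
 * into a perf event FD.
 */
#include <bpf/bpf.h>
#include <stdio.h>

int main(void)
{
	/* works today, assuming the first process pinned the map with
	 * bpf_obj_pin(map_fd, "/sys/fs/bpf/my_perf_array")
	 */
	int map_fd = bpf_obj_get("/sys/fs/bpf/my_perf_array");
	__u32 cpu = 0;
	__u32 value;

	if (map_fd < 0) {
		perror("bpf_obj_get");
		return 1;
	}

	/* the missing piece: perf_event_array has no
	 * map_fd_sys_lookup_elem, so this cannot hand back a perf event
	 * FD to mmap() the buffer from
	 */
	if (bpf_map_lookup_elem(map_fd, &cpu, &value) < 0)
		perror("bpf_map_lookup_elem");

	return 0;
}
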