Re: [PATCH bpf-next v2 1/1] libbpf: perfbuf: allow raw access to buffers

On Mon, Jul 11, 2022 at 9:19 PM Jon Doron <arilou@xxxxxxxxx> wrote:
>
>
>
> On Tue, Jul 12, 2022, 07:01 Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>>
>> On Sun, Jul 10, 2022 at 10:07 AM Jon Doron <arilou@xxxxxxxxx> wrote:
>> >
>> >
>> > On Sun, Jul 10, 2022, 18:16 Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
>> >>
>> >> On Sat, Jul 9, 2022 at 10:43 PM Jon Doron <arilou@xxxxxxxxx> wrote:
>> >> >
>> >> > I was referring to the following:
>> >> > https://github.com/libbpf/libbpf-rs/blob/master/libbpf-rs/src/perf_buffer.rs
>> >>
>> >> How does your patch help libbpf-rs?
>> >>
>> >> Please don't top post.
>> >
>> >
>> > You will be able to implement a custom perf buffer consumer, as it already has good bindings with libbpf-sys which is built from the C headers
>> >
>> > Sorry for the top posting I'm not home and replying from my phone
>> >
>>
>> I can see us exposing per-CPU buffers for (very) advanced users, something like:
>>
>> int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void
>> **buf, size_t buf_sz);
>
>
> Not sure I'm fully following what this API does, you will get a pointer to a message in the ring buffer?
> If so how do you consume without setting up a new tail?
>
> Or do you get a full copy of the current ring buffer? That would mean you have to alloc and copy, which might hurt performance, but in that case you no longer need a set-tail or drain function.

No, it returns a pointer to mmap()'ed per-CPU buffer memory, including
its header page which contains head/tail positions. As I said, it's
for an advanced user, you need to know the layout and how to consume
data.
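
To make "know the layout and how to consume data" concrete, here is a
rough, self-contained sketch of such a consumer loop. It is only a model:
struct ring_hdr, struct rec_hdr, drain_ring() and sum_sizes() are made-up
names for illustration, not libbpf or kernel API. Real code would read
data_head/data_tail from struct perf_event_mmap_page in
<linux/perf_event.h> and would take the data area offset and size from
that page. The __atomic builtins assume GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the header page; only the two positions. */
struct ring_hdr {
    uint64_t data_head; /* advanced by the producer (the kernel) */
    uint64_t data_tail; /* advanced by the consumer (user space) */
};

/* Mirrors the field layout of struct perf_event_header. */
struct rec_hdr {
    uint32_t type;
    uint16_t misc;
    uint16_t size; /* whole record size, header included */
};

typedef void (*rec_fn)(const struct rec_hdr *rec, void *ctx);

/* Copy len bytes starting at off, wrapping at the end of the data area. */
static void copy_wrapped(uint8_t *dst, const uint8_t *data, size_t data_sz,
                         size_t off, size_t len)
{
    size_t first = data_sz - off;

    if (first > len)
        first = len;
    memcpy(dst, data + off, first);
    memcpy(dst + first, data, len - first);
}

/* Example callback (hypothetical): sums record sizes into *(size_t *)ctx. */
static void sum_sizes(const struct rec_hdr *rec, void *ctx)
{
    *(size_t *)ctx += rec->size;
}

/*
 * Walk all records between tail and head, call fn on each one, then
 * publish the new tail. data and scratch must be 8-byte aligned.
 * Returns the number of records consumed.
 */
static size_t drain_ring(struct ring_hdr *hdr, const uint8_t *data,
                         size_t data_sz, uint8_t *scratch, size_t scratch_sz,
                         rec_fn fn, void *ctx)
{
    /* The offset masking below needs a power-of-two data area. */
    assert(data_sz && (data_sz & (data_sz - 1)) == 0);

    /* Acquire: record bytes must be visible once head has been read. */
    uint64_t head = __atomic_load_n(&hdr->data_head, __ATOMIC_ACQUIRE);
    uint64_t tail = hdr->data_tail;
    size_t n = 0;

    while (tail != head) {
        size_t off = tail & (data_sz - 1);
        const struct rec_hdr *rec;
        struct rec_hdr h;

        copy_wrapped((uint8_t *)&h, data, data_sz, off, sizeof(h));
        if (h.size < sizeof(h) || h.size > head - tail)
            break; /* malformed or truncated record, stop here */

        if (off + h.size <= data_sz) {
            rec = (const struct rec_hdr *)(data + off); /* contiguous */
        } else {
            if (h.size > scratch_sz)
                break; /* wrapped record does not fit in scratch */
            copy_wrapped(scratch, data, data_sz, off, h.size);
            rec = (const struct rec_hdr *)scratch;
        }

        fn(rec, ctx);
        tail += h.size;
        n++;
    }

    /* Release: finish all reads before handing the space back. */
    __atomic_store_n(&hdr->data_tail, tail, __ATOMIC_RELEASE);
    return n;
}
```

The caller owns the scratch buffer, so the wrap-around case does not
force a malloc() on the consumer; a record that would not fit simply
stops the walk and leaves the tail where it was.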

>
> Also, regardless of whether this patchset is approved, it would probably be nice to have something like:
> int perf_buffer__state(struct perf_buffer *pb, int buf_idx, size_t *free_space, size_t *used_space);
>
> Cheers,
> --Jon.
>
>>
>> Then in combination with perf_buffer__buffer_fd() you can implement
>> your own polling and processing. So you just use libbpf logic to setup
>> buffers, but then don't call perf_buffer__poll() at all and read
>> records and update tail on your own.
>>
>> But this combination of perf_buffer__raw_ring_buf() and
>> perf_buffer__set_ring_buf_tail() seems like a bad API, sorry.
>>
>>
>> >>
>> >> > Thanks,
>> >> > -- Jon.
>> >> >
>> >> > On Sun, Jul 10, 2022, 08:23 Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
>> >> >>
>> >> >> On Fri, Jul 8, 2022 at 7:54 PM Jon Doron <arilou@xxxxxxxxx> wrote:
>> >> >> >
>> >> >> > On 08/07/2022, Andrii Nakryiko wrote:
>> >> >> > >On Thu, Jul 7, 2022 at 11:04 PM Jon Doron <arilou@xxxxxxxxx> wrote:
>> >> >> > >>
>> >> >> > >> From: Jon Doron <jond@xxxxxx>
>> >> >> > >>
>> >> >> > >> Add support for writing a custom event reader, by exposing the ring
>> >> >> > >> buffer state, and allowing its tail to be set.
>> >> >> > >>
>> >> >> > >> A few simple examples where this is needed:
>> >> >> > >> 1. perf_event_read_simple is allocating using malloc, perhaps you want
>> >> >> > >>    to handle the wrap-around in some other way.
>> >> >> > >> 2. Since perf buf is per-cpu then the order of the events is not
>> >> >> > >>    guaranteed, for example:
>> >> >> > >>    Given 3 events where each event has a timestamp t0 < t1 < t2,
>> >> >> > >>    and the events are spread on more than 1 CPU, then we can end
>> >> >> > >>    up with the following state in the ring buf:
>> >> >> > >>    CPU[0] => [t0, t2]
>> >> >> > >>    CPU[1] => [t1]
>> >> >> > >>    When you consume the events from CPU[0], you could know there is
>> >> >> > >>    a t1 missing, (assuming there are no drops, and your event data
>> >> >> > >>    contains a sequential index).
>> >> >> > >>    So now one can simply do the following, for CPU[0], you can store
>> >> >> > >>    the address of t0 and t2 in an array (without moving the tail, so
>> >> >> > >>    the data is not lost) then move on to CPU[1] and store the
>> >> >> > >>    address of t1 in the same array.
>> >> >> > >>    So you end up with something like:
>> >> >> > >>    void *arr[] = { &t0, &t1, &t2 }, now you can consume it in order
>> >> >> > >>    and move the tails as you go.
>> >> >> > >> 3. Assuming there are multiple CPUs and we want to start draining the
>> >> >> > >>    messages from them, then we can pick which one to start with
>> >> >> > >>    according to the remaining free space in the ring buffer.
>> >> >> > >>
>> >> >> > >
>> >> >> > >All the above use cases are sufficiently advanced that you as such an
>> >> >> > >advanced user should be able to write your own perfbuf consumer code.
>> >> >> > >There isn't a lot of code to set everything up, but then you get full
>> >> >> > >control over all the details.
>> >> >> > >
>> >> >> > >I don't see this API as generally useful, it feels way too low-level
>> >> >> > >and special for inclusion in libbpf.
>> >> >> > >
>> >> >> >
>> >> >> > Hi Andrii,
>> >> >> >
>> >> >> > I understand, but I was still hoping you will be willing to expose this
>> >> >> > API.
>> >> >> > libbpf has very simple and nice binding to Rust and other languages,
>> >> >> > implementing one of those use cases in the bindings can make things much
>> >> >> > simpler than using some libc or syscall APIs, instead of enjoying all
>> >> >> > the simplicity that you get for free in libbpf.
>> >> >> >
>> >> >> > Hope you will be willing to reconsider :)
>> >> >>
>> >> >> The discussion would have been different if you mentioned that
>> >> >> motivation in the commit logs.
>> >> >> Please provide links to "Rust and other languages" code that
>> >> >> uses this api.
