Re: perf event and data_sz

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Aug 27, 2020 at 3:02 AM Borna Cafuk <borna.cafuk@xxxxxxxxxx> wrote:
>
> On Wed, Aug 26, 2020 at 8:45 PM Yonghong Song <yhs@xxxxxx> wrote:
> >
> > On 8/26/20 7:11 AM, Borna Cafuk wrote:
> > > Hi everyone,
> > >
> > > When examining BPF programs that use perf buffers, I have noticed that
> > > the `data_sz` parameter in the sample callback has a different value than
> > > the size passed to `bpf_perf_event_output`.
> > >
> > > Raw samples are padded in `perf_prepare_sample` so their size is a multiple of
> > > `sizeof(u64)` (see [0]). The sample includes the size header, a `u32`.
> > > The size stored in `raw->size` is then size of the sample, minus the 4 bytes for
> > > the size header. This size, however, includes both the data and the padding.
> > >
> > > What I find confusing is that this size including the padding is eventually
> > > passed as `data_sz` to the callback in the userspace, instead of
> > > the size of the data that was passed as an argument to `bpf_perf_event_output`.
> > >
> > > Is this intended behaviour?
> >
> >  From the kernel source code, yes, this is expected behavior. What you
> > described below matches what the kernel did. So raw->size = 68 is expected.
>
> I understand why this size that is stored in `raw->size` is needed.
> What I don't see is how the value is of any use in the callback.
>
> >
> > >
> > > I have a use-case for getting only the size of the data in the
> > > userspace, could this be done?
> >
> > In this case, since we know the kernel writes one record at a time,
> > you check the size, it is 68 more than 62, you just read 62 bytes
> > as your real data, ignore the rest as the padding. Does this work?
>
> The `data_sz` parameter seems a little pointless, then. What is its purpose
> in the callback if it doesn't accurately represent the size of the data?
>
> >
> > bcc callback passed the the buffer with raw->size to application.
> > But applications are expected to know what the record layout is...
>
> I'm afraid that wouldn't work for the use-case, our application should be able
> to read the raw data without having to know the record layout. It has to be
> generic, we handle interpreting the records elsewhere and at another time.

I agree it's confusing and less useful (not exactly useless, though),
but seems like PERF_SAMPLE_RAW doesn't store neither original size nor
(equivalently) the added padding size, so libbpf has nothing to go off
of, unfortunately.

I can only offer two suggestions:
1. Make sure samples you are emitting are 8 byte aligned, in which
case data_sz will always match actual data size. If you need different
sizes for different structs, you'd have to artificially add extra
bytes, though.
2. Consider using BPF ringbuf, it's data_sz is exact and doesn't
include padding (you need 5.8+ kernel).

>
> >
> > >
> > > To demonstrate, I have prepared a minimal example by modifying
> > > BCC's filelife example. It uses a kprobe on vfs_unlink to print some sizes
> > > every time a file is unlinked. The sizes are:
> > >   * the `sizeof(struct event)` measured in the userspace program,
> > >   * the `sizeof(struct event)` measured in the BPF program, and
> > >   * the `data_sz` parameter.
> > >
> > > The first two are 62, as expected, but `data_sz` is 68.
> > > The 62 bytes of the struct and the 4 bytes of the sample header make 66 bytes.
> > > This is rounded up to the first multiple of 8, which is 72.
> > > The 4 bytes for the size header are then subtracted,
> > > and 68 is written as the data size.
> > >
> > > Any input is much appreciated,
> > >
> > > Best regards,
> > > Borna Cafuk
> > >
> > >
> > > [0] https://github.com/torvalds/linux/blob/6a9dc5fd6170d0a41c8a14eb19e63d94bea5705a/kernel/events/core.c#L7035
> > >
> > >
> > > example.h

[...]



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux