Re: BPF ring buffer variable-length data appending

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 7, 2021 at 2:55 PM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
>
> On Thu, 7 Jan 2021, Andrii Nakryiko wrote:
>
> > We discussed this topic today at office hour. As I mentioned, I don't
> > know the ideal solution, but here is something that has enough
> > flexibility for real-world uses, while giving the performance and
> > convenience of reserve/commit API. Ignore naming, we can bikeshed that
> > later.
> >
> > So what we can do is introduce a new bpf_ringbuf_reserve() variant:
> >
> > bpf_ringbuf_reserve_extra(void *ringbuf, __u64 size, __u64 flags, void
> > *extra, __u64 extra_sz);
> >
> > The idea is that we reserve a fixed size amount of data that can be
> > used like it is today for filling a fixed-sized metadata/sample
> > directly. But the real size of the reserved sample is (size +
> > extra_sz), and bpf_ringbuf_reserve_extra() helper will bpf_probe_read
> > (kernel or user, depending on flags) data from extra and put it right
> > after the fixed-size part.
> >
> > So the use would be something like:
> >
> > struct my_meta *m = bpf_ringbuf_reserve_extra(&rb, sizeof(*m),
> > BPF_RB_PROBE_USER, env_vars, 1024);
> >
> > if (!m)
> >     /* too bad, either probe_read_user failed or ringbuf is full */
> >     return 1;
> >
> > m->my_field1 = 123;
> > m->my_field2 = 321;
> >
> >
> > So the main problem with this is that when probe_read fails, we fail
> > reservation completely(internally we'd just discard ringbuf sample).
> > Is that OK? Or is it better to still reserve fixed-sized part and
> > zero-out the variable-length part? We are combining two separate
> > operations into a single API, so error handling is more convoluted.
> >
> >
> > Now, the main use case requested was to be able to fetch an array of
> > zero-terminated strings. I honestly don't think it's possible to
> > implement this efficiently without two copies of string data. Mostly
> > because to just determine the size of the string you have to read it
> > one extra time. And you'd probably want to copy string data into some
> > controlled storage first, so that you don't end up reading it once
> > successfully and then failing to read it on the second try. Next, when
> > you have multiple strings, how do you deal with partial failures? It's
> > even worse in terms of error handling and error propagation than the
> > fixed extra size variant described above.
> >
> > Ignoring all that, let's say we'd implement
> > bpf_ringbuf_reserve_extra_strs() helper, that would somehow be copying
> > multiple zero-terminated strings after the fixed-size prefix. Think
> > about implementation. Just to determine the total size of the ringbuf
> > sample, you'd need to read all strings once, and probably also copy
> > them locally.  Then you'd reserve a ringbuf sample and copy all that
> > for the second time. So it's as inefficient as a BPF program
> > constructing a single block of memory by reading all such strings
> > manually into a per-CPU array and then using the above
> > bpf_ringbuf_reserve_extra().
> >
>
> I ran into a variation of this problem with the ksnoop tool [1]. I'd hoped
> to use ringbuf, but the problem I had was I needed to store a series of N
> strings into a buffer, and for each I needed to specify a maximum size (for
> bpf_snprintf_btf()).  However it was entirely possible that the
> strings would be a lot smaller, and they'd be copied one after the other,
> so while I needed to reserve a buffer for those N strings of
> (N * MAX_STRINGSIZE) size as the worst case scenario, it would likely be a
> lot smaller (the sum of the lengths of the N strings plus null
> termination), so there was no need to commit the unused space.  I ended up
> using a BPF map-derived string buffer and perf events to send the events
> instead (the code for this is ksnoop.bpf.c in [1]).

Yes, you could have used ringbuf as well, just replace
bpf_perf_event_output() with bpf_ringbuf_output().

>
> So all of this is to say that I'm assuming along with the reserve_extra()
> API, there'd need to be some sort of bpf_ringbuf_submit_extra(ringbuf,
> flags, extra_size) which specifies how much of the extra space was used?
> If that's the case I think this approach makes ringbuf usable for my
> scenario; the string buffer would effectively all be considered extra
> space, and we'd just submit what was used.

Not really, there is no need for bpf_ringbuf_submit_extra(), normal
submit will do. See definition of bpf_ringbuf_reserve_extra(), it
already expects the caller to specify how much extra (extra_sz) bytes
to append after the fixed-size part. Why would it return this again to
you?

>
> However I _think_ you were suggesting above combining the probe read and
> the extra reservation as one operation; in my case that wouldn't work
> because the strings were written directly by a helper (bpf_snprintf_btf())
> into the target buffer. It's probably an oddball situation of course, but
> thought I'd better mention it just in case. Thanks!

Not sure why that wouldn't work. You'd use your struct trace's buffer
and correct size of accumulated strings as the last two params to
bpf_ringbuf_reserve_extra(). I have a feeling that we are not on the
same page yet :)

>
> Alan
>
> [1] https://lore.kernel.org/bpf/1609773991-10509-1-git-send-email-alan.maguire@xxxxxxxxxx/



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux