Re: BPF ring buffer variable-length data appending

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 7 Jan 2021, Andrii Nakryiko wrote:

> We discussed this topic today at office hour. As I mentioned, I don't
> know the ideal solution, but here is something that has enough
> flexibility for real-world uses, while giving the performance and
> convenience of reserve/commit API. Ignore naming, we can bikeshed that
> later.
> 
> So what we can do is introduce a new bpf_ringbuf_reserve() variant:
> 
> bpf_ringbuf_reserve_extra(void *ringbuf, __u64 size, __u64 flags, void
> *extra, __u64 extra_sz);
> 
> The idea is that we reserve a fixed size amount of data that can be
> used like it is today for filling a fixed-sized metadata/sample
> directly. But the real size of the reserved sample is (size +
> extra_sz), and bpf_ringbuf_reserve_extra() helper will bpf_probe_read
> (kernel or user, depending on flags) data from extra and put it right
> after the fixed-size part.
> 
> So the use would be something like:
> 
> struct my_meta *m = bpf_ringbuf_reserve_extra(&rb, sizeof(*m),
> BPF_RB_PROBE_USER, env_vars, 1024);
> 
> if (!m)
>     /* too bad, either probe_read_user failed or ringbuf is full */
>     return 1;
> 
> m->my_field1 = 123;
> m->my_field2 = 321;
> 
> 
> So the main problem with this is that when probe_read fails, we fail
> reservation completely(internally we'd just discard ringbuf sample).
> Is that OK? Or is it better to still reserve fixed-sized part and
> zero-out the variable-length part? We are combining two separate
> operations into a single API, so error handling is more convoluted.
> 
> 
> Now, the main use case requested was to be able to fetch an array of
> zero-terminated strings. I honestly don't think it's possible to
> implement this efficiently without two copies of string data. Mostly
> because to just determine the size of the string you have to read it
> one extra time. And you'd probably want to copy string data into some
> controlled storage first, so that you don't end up reading it once
> successfully and then failing to read it on the second try. Next, when
> you have multiple strings, how do you deal with partial failures? It's
> even worse in terms of error handling and error propagation than the
> fixed extra size variant described above.
> 
> Ignoring all that, let's say we'd implement
> bpf_ringbuf_reserve_extra_strs() helper, that would somehow be copying
> multiple zero-terminated strings after the fixed-size prefix. Think
> about implementation. Just to determine the total size of the ringbuf
> sample, you'd need to read all strings once, and probably also copy
> them locally.  Then you'd reserve a ringbuf sample and copy all that
> for the second time. So it's as inefficient as a BPF program
> constructing a single block of memory by reading all such strings
> manually into a per-CPU array and then using the above
> bpf_ringbuf_reserve_extra().
> 

I ran into a variation of this problem with the ksnoop tool [1]. I'd hoped 
to use ringbuf, but the problem I had was I needed to store a series of N 
strings into a buffer, and for each I needed to specify a maximum size (for 
bpf_snprintf_btf()).  However it was entirely possible that the 
strings would be a lot smaller, and they'd be copied one after the other, 
so while I needed to reserve a buffer for those N strings of
(N * MAX_STRINGSIZE) size as the worst case scenario, it would likely be a 
lot smaller (the sum of the lengths of the N strings plus null 
termination), so there was no need to commit the unused space.  I ended up 
using a BPF map-derived string buffer and perf events to send the events 
instead (the code for this is ksnoop.bpf.c in [1]).

So all of this is to say that I'm assuming along with the reserve_extra() 
API, there'd need to be some sort of bpf_ringbuf_submit_extra(ringbuf, 
flags, extra_size) which specifies how much of the extra space was used? 
If that's the case I think this approach makes ringbuf usable for my 
scenario; the string buffer would effectively all be considered extra 
space, and we'd just submit what was used.

However I _think_ you were suggesting above combining the probe read and 
the extra reservation as one operation; in my case that wouldn't work 
because the strings were written directly by a helper (bpf_snprintf_btf())
into the target buffer. It's probably an oddball situation of course, but 
thought I'd better mention it just in case. Thanks!

Alan

[1] https://lore.kernel.org/bpf/1609773991-10509-1-git-send-email-alan.maguire@xxxxxxxxxx/



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux