On Wed, May 13, 2020 at 11:08:46PM -0700, Andrii Nakryiko wrote: > On Wed, May 13, 2020 at 3:49 PM Jonathan Lemon <jonathan.lemon@xxxxxxxxx> wrote: > > > > On Wed, May 13, 2020 at 12:25:26PM -0700, Andrii Nakryiko wrote: > > > Implement a new BPF ring buffer, as presented at BPF virtual conference ([0]). > > > It presents an alternative to perf buffer, following its semantics closely, > > > but allowing sharing same instance of ring buffer across multiple CPUs > > > efficiently. > > > > > > Most patches have extensive commentary explaining various aspects, so I'll > > > keep cover letter short. Overall structure of the patch set: > > > - patch #1 adds BPF ring buffer implementation to kernel and necessary > > > verifier support; > > > - patch #2 adds litmus tests validating all the memory orderings and locking > > > is correct; > > > - patch #3 is an optional patch that generalizes verifier's reference tracking > > > machinery to capture type of reference; > > > - patch #4 adds libbpf consumer implementation for BPF ringbuf; > > > - path #5 adds selftest, both for single BPF ring buf use case, as well as > > > using it with array/hash of maps; > > > - patch #6 adds extensive benchmarks and provide some analysis in commit > > > message, it build upon selftests/bpf's bench runner. > > > > > > [0] https://docs.google.com/presentation/d/18ITdg77Bj6YDOH2LghxrnFxiPWe0fAqcmJY95t_qr0w > > > > > > Cc: Paul E. McKenney <paulmck@xxxxxxxxxx> > > > Cc: Jonathan Lemon <jonathan.lemon@xxxxxxxxx> > > > > Looks very nice! A few random questions: > > > > 1) Why not use a structure for the header, instead of 2 32bit ints? > > hm... no reason, just never occurred to me it's necessary :) It might be clearer to do this. Something like: struct ringbuf_record { union { struct { u32 size:30; bool busy:1; bool discard:1; }; u32 word1; }; union { u32 pgoff; u32 word2; }; }; While perhaps a bit overkill, makes it clear what is going on. > > 2) Would it make sense to reserve X bytes, but only commit Y? > > the offset field could be used to write the record length. > > > > E.g.: > > reserve 512 bytes [BUSYBIT | 512][PG OFFSET] > > commit 400 bytes [ 512 ] [ 400 ] > > It could be done, though I had tentative plans to use those second 4 > bytes for something useful eventually. > > But what's the use case? From ring buffer's perspective, X bytes were > reserved and are gone already and subsequent writers might have > already advanced producer counter with the assumption that all X bytes > are going to be used. So there are no space savings, even if record is > discarded or only portion of it is submitted. I can only see a bit of > added convenience for an application, because it doesn't have to track > amount of actual data in its record. But this doesn't seem to be a > common case either, so not sure how it's worth supporting... Is there > a particular case where this is extremely useful and extra 4 bytes in > record payload is too much? Not off the top of my head - it was just the first thing that came to mind when reading about the commit/discard paradigm. I was thinking about variable records, where the maximum is reserved, but less data is written. But there's no particular reason for the ringbuffer to track this either, it could be part of the application framing. > > 3) Why have 2 separate pages for producer/consumer, instead of > > just aligning to a smp cache line (or even 1/2 page?) > > Access rights restrictions. Consumer page is readable/writable, > producer page is read-only for user-space. If user-space had ability > to write producer position, it could wreck a huge havoc for the > ringbuf algorithm. Ah, thanks, that makes sense. Might want to add a comment to that effect, as it's different from other implementations. -- Jonathan