On Wed, May 13, 2020 at 3:49 PM Jonathan Lemon <jonathan.lemon@xxxxxxxxx> wrote: > > On Wed, May 13, 2020 at 12:25:26PM -0700, Andrii Nakryiko wrote: > > Implement a new BPF ring buffer, as presented at BPF virtual conference ([0]). > > It presents an alternative to perf buffer, following its semantics closely, > > but allowing sharing same instance of ring buffer across multiple CPUs > > efficiently. > > > > Most patches have extensive commentary explaining various aspects, so I'll > > keep cover letter short. Overall structure of the patch set: > > - patch #1 adds BPF ring buffer implementation to kernel and necessary > > verifier support; > > - patch #2 adds litmus tests validating all the memory orderings and locking > > is correct; > > - patch #3 is an optional patch that generalizes verifier's reference tracking > > machinery to capture type of reference; > > - patch #4 adds libbpf consumer implementation for BPF ringbuf; > > - path #5 adds selftest, both for single BPF ring buf use case, as well as > > using it with array/hash of maps; > > - patch #6 adds extensive benchmarks and provide some analysis in commit > > message, it build upon selftests/bpf's bench runner. > > > > [0] https://docs.google.com/presentation/d/18ITdg77Bj6YDOH2LghxrnFxiPWe0fAqcmJY95t_qr0w > > > > Cc: Paul E. McKenney <paulmck@xxxxxxxxxx> > > Cc: Jonathan Lemon <jonathan.lemon@xxxxxxxxx> > > Looks very nice! A few random questions: > > 1) Why not use a structure for the header, instead of 2 32bit ints? hm... no reason, just never occurred to me it's necessary :) > > 2) Would it make sense to reserve X bytes, but only commit Y? > the offset field could be used to write the record length. > > E.g.: > reserve 512 bytes [BUSYBIT | 512][PG OFFSET] > commit 400 bytes [ 512 ] [ 400 ] It could be done, though I had tentative plans to use those second 4 bytes for something useful eventually. But what's the use case? From ring buffer's perspective, X bytes were reserved and are gone already and subsequent writers might have already advanced producer counter with the assumption that all X bytes are going to be used. So there are no space savings, even if record is discarded or only portion of it is submitted. I can only see a bit of added convenience for an application, because it doesn't have to track amount of actual data in its record. But this doesn't seem to be a common case either, so not sure how it's worth supporting... Is there a particular case where this is extremely useful and extra 4 bytes in record payload is too much? > > 3) Why have 2 separate pages for producer/consumer, instead of > just aligning to a smp cache line (or even 1/2 page?) Access rights restrictions. Consumer page is readable/writable, producer page is read-only for user-space. If user-space had ability to write producer position, it could wreck a huge havoc for the ringbuf algorithm. > > 4) The XOR of busybit makes me wonder if there is anything that > prevents the system from calling commit twice? Yes, verifier checks this and will reject such BPF program. > -- > Jonathan