On Thu, May 14, 2020 at 1:39 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> Jakub Kicinski <kuba@xxxxxxxxxx> writes:
>
> > On Wed, 13 May 2020 12:25:27 -0700 Andrii Nakryiko wrote:
> >> One interesting implementation bit, that significantly simplifies (and thus
> >> speeds up as well) implementation of both producers and consumers is how data
> >> area is mapped twice contiguously back-to-back in the virtual memory. This
> >> allows to not take any special measures for samples that have to wrap around
> >> at the end of the circular buffer data area, because the next page after the
> >> last data page would be first data page again, and thus the sample will still
> >> appear completely contiguous in virtual memory. See comment and a simple ASCII
> >> diagram showing this visually in bpf_ringbuf_area_alloc().
> >
> > Out of curiosity - is this 100% okay to do in the kernel and user space
> > these days? Is this bit part of the uAPI in case we need to back out of
> > it?
> >
> > In the olden days virtually mapped/tagged caches could get confused
> > seeing the same physical memory have two active virtual mappings, or
> > at least that's what I've been told in school :)
>
> Yes, caching the same thing twice causes coherency problems.
>
> VIVT can be found in ARMv5, MIPS, NDS32 and Unicore32.
>
> > Checking with Paul - he says that could have been the case for Itanium
> > and PA-RISC CPUs.
>
> Itanium: PIPT L1/L2.
> PA-RISC: VIPT L1 and PIPT L2
>
> Thanks,
>

Jakub, thanks for bringing this up.

Thomas, Paul, what kind of problems are we talking about here? What
are the possible problems in practice? Just for context: all the
metadata (record header) that is written/read under lock and with
smp_store_release/smp_load_acquire is written through one set of page
mappings (the first one). Only some of the sample payload might go into
the second set of mapped pages. Does this mean that user-space might
read some old payloads in such a case?
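
To make the layout under discussion concrete, here is a minimal
user-space sketch of a back-to-back double mapping. It is an
illustration only, not code from the patch set: map_twice() is a
hypothetical helper, and the unlinked temp file under /tmp is just an
assumed stand-in for whatever actually backs the buffer.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the same `size` bytes twice, back-to-back.
 * `size` must be a non-zero multiple of the page size.
 * Returns the base of the 2 * size window, or NULL on failure.
 */
static void *map_twice(size_t size)
{
	long page = sysconf(_SC_PAGESIZE);
	char path[] = "/tmp/ringbuf-demo-XXXXXX";
	char *base;
	int fd;

	assert(page > 0 && size > 0 && size % (size_t)page == 0);

	/* Backing object: an unlinked temp file (assumption for the demo). */
	fd = mkstemp(path);
	if (fd < 0)
		return NULL;
	unlink(path);
	if (ftruncate(fd, (off_t)size) < 0) {
		close(fd);
		return NULL;
	}

	/* Reserve 2 * size of contiguous virtual address space. */
	base = mmap(NULL, 2 * size, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED) {
		close(fd);
		return NULL;
	}

	/* Overlay both halves with the same file pages via MAP_FIXED. */
	if (mmap(base, size, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED ||
	    mmap(base + size, size, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED) {
		munmap(base, 2 * size);
		close(fd);
		return NULL;
	}

	close(fd);	/* the mappings keep the pages alive */
	return base;
}
```

With such a window, a write that starts a few bytes before base + size
and runs past it lands at the start of the first mapping as well,
which is exactly the aliasing the question above is about.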

I could work around that in user-space by mmap()'ing the same range
twice, one mapping right after the other (the second mmap() would use
the MAP_FIXED flag, of course). So that's not a big deal.

But on the kernel side it's a crucial property, because it allows BPF
programs to work with data under the assumption that all data is
linearly mapped. If we can't do that, the reserve() API is impossible
to implement. So in that case, I'd rather enable the BPF ring buffer
only on platforms that won't have these problems, instead of removing
the reserve/commit API altogether.

Well, another way is to just "discard" the remaining space at the end
if it's not sufficient for an entire record. That's doable: there will
always be at least 8 bytes available for a record header, so that's not
a problem in that regard. But I would appreciate it if you could help
me understand the full implications of caching physical memory twice.

Also, just for my education: with VIVT caches, if a user-space
application mmap()'s the same region of memory twice (without
MAP_FIXED), wouldn't that cause similar problems? Can't this happen
today with the mmap() API? Why is that not a problem?

>         tglx
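
For reference, a rough sketch of the "discard the tail" alternative
mentioned above. It assumes 8-byte-aligned positions, a power-of-two
data area and an 8-byte record header; struct placement and
place_record() are hypothetical names, and the free-space check
against the consumer position is left out.

```c
#include <assert.h>
#include <stdint.h>

#define HDR_SZ 8u	/* record header size, as mentioned above */

/* Hypothetical result of placing one record without wrapping. */
struct placement {
	int has_pad;		/* 1 if a "discard" header is written at the old position */
	uint32_t pad_len;	/* bytes skipped after that header (may be 0) */
	uint64_t rec_pos;	/* logical position of the real record header */
	uint64_t new_prod;	/* producer position after the record */
};

static uint32_t round_up8(uint32_t n)
{
	return (n + 7u) & ~7u;
}

/*
 * prod: monotonically growing producer position (8-byte aligned)
 * size: data area size in bytes (power of two)
 * len:  payload length of the record being reserved
 */
static struct placement place_record(uint64_t prod, uint64_t size, uint32_t len)
{
	struct placement p = { 0, 0, 0, 0 };
	uint64_t total = (uint64_t)HDR_SZ + round_up8(len);
	uint64_t tail = size - (prod & (size - 1));	/* bytes left before the end */

	assert((size & (size - 1)) == 0 && prod % 8 == 0 && total <= size);

	if (total > tail) {
		/* Alignment guarantees tail >= 8, so a header always fits. */
		p.has_pad = 1;
		p.pad_len = (uint32_t)(tail - HDR_SZ);
		prod += tail;	/* skip to the start of the data area */
	}
	p.rec_pos = prod;
	p.new_prod = prod + total;
	return p;
}
```

For example, with size = 64, prod = 48 and len = 16, the record needs
24 bytes but only 16 remain, so a discard header covering 8 bytes goes
at 48 and the real record starts at logical position 64, i.e. offset 0.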