Hi,

Thanks. Both motivators look very interesting to me:

On Sun, 17 May 2020 at 21:58, Andrii Nakryiko <andriin@xxxxxx> wrote:
[...]
> +Motivation
> +----------
> +There are two distinctive motivators for this work, which are not satisfied by
> +existing perf buffer, which prompted creation of a new ring buffer
> +implementation.
> +  - more efficient memory utilization by sharing ring buffer across CPUs;

I have a use case with traceloop
(https://github.com/kinvolk/traceloop) where I use one
BPF_MAP_TYPE_PERF_EVENT_ARRAY per container, so when the number of
containers times the number of CPUs is high, it can use a lot of
memory.

> +  - preserving ordering of events that happen sequentially in time, even
> +    across multiple CPUs (e.g., fork/exec/exit events for a task).

I had trouble keeping track of TCP connections: when the tcp-connect
and tcp-close events can be on different CPUs, it is difficult to get
the correct order.

[...]
> +There are a bunch of similarities between perf buffer
> +(BPF_MAP_TYPE_PERF_EVENT_ARRAY) and new BPF ring buffer semantics:
> +  - variable-length records;
> +  - if there is no more space left in ring buffer, reservation fails, no
> +    blocking;
[...]

BPF_MAP_TYPE_PERF_EVENT_ARRAY can be set as both 'overwritable' and
'backward': if there is no more space left in the ring buffer, it then
overwrites the old events. For that, the buffer needs to be mapped
with mmap(...PROT_READ) instead of mmap(...PROT_READ | PROT_WRITE),
and the write_backward flag needs to be set. See details in commit
9ecda41acb97 ("perf/core: Add ::write_backward attribute to perf
event"):

    struct perf_event_attr attr = {0,};
    attr.write_backward = 1; /* backward */
    fd = perf_event_open_map(&attr, ...);
    base = mmap(fd, 0, size, PROT_READ /* overwritable */, MAP_SHARED);

I use overwritable and backward ring buffers in traceloop: buffers are
continuously overwritten and are usually not read, except when a user
explicitly asks for it (e.g. to inspect the last few events of an
application after a crash). If BPF_MAP_TYPE_RINGBUF implements the
same features, then I would be able to switch and use less memory.

Do you think it will be possible to implement that in
BPF_MAP_TYPE_RINGBUF?

Cheers,
Alban
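
For illustration only, the overwritable and backward behaviour
described above can be modelled in plain user-space C. This is a
simplified sketch, not the perf implementation: struct ring, struct
rec_hdr, ring_write() and ring_read_newest_first() are hypothetical
names, the 32-byte size is arbitrary, and there is no concurrency or
mmap involved. It only shows why writing backward lets a reader start
at the head and walk from the newest record to older ones, stopping at
the first record whose tail has been overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 32u /* hypothetical size, must be a power of two */

/* Hypothetical record header: total record size, header included. */
struct rec_hdr {
    uint16_t size;
};

struct ring {
    uint8_t  data[RING_SIZE];
    uint64_t head; /* moves downward on every write (backward mode) */
};

/* Copy bytes into the ring, wrapping around at the end of the buffer. */
static void ring_copy_in(struct ring *r, uint64_t pos, const void *src, size_t len)
{
    const uint8_t *s = src;
    for (size_t i = 0; i < len; i++)
        r->data[(pos + i) & (RING_SIZE - 1)] = s[i];
}

/* Copy bytes out of the ring, wrapping around at the end of the buffer. */
static void ring_copy_out(const struct ring *r, uint64_t pos, void *dst, size_t len)
{
    uint8_t *d = dst;
    for (size_t i = 0; i < len; i++)
        d[i] = r->data[(pos + i) & (RING_SIZE - 1)];
}

/* Write one record below the current head. Old data is silently
 * overwritten; the writer never blocks and never checks a reader. */
static int ring_write(struct ring *r, const void *payload, size_t len)
{
    struct rec_hdr hdr;

    if (sizeof(hdr) + len > RING_SIZE)
        return -1; /* record can never fit */
    hdr.size = (uint16_t)(sizeof(hdr) + len);
    r->head -= hdr.size;
    ring_copy_in(r, r->head, &hdr, sizeof(hdr));
    ring_copy_in(r, r->head + sizeof(hdr), payload, len);
    return 0;
}

/* Walk from the head toward older records. Payloads are copied into
 * out[i] as NUL-terminated strings, newest first. Returns the number
 * of intact records found (at most max). */
static int ring_read_newest_first(const struct ring *r, char out[][RING_SIZE], int max)
{
    uint64_t pos = r->head;
    size_t consumed = 0;
    int n = 0;

    while (n < max && consumed + sizeof(struct rec_hdr) <= RING_SIZE) {
        struct rec_hdr hdr;
        size_t len;

        ring_copy_out(r, pos, &hdr, sizeof(hdr));
        /* Stop on unused space or on a record whose tail was overwritten. */
        if (hdr.size < sizeof(hdr) || consumed + hdr.size > RING_SIZE)
            break;
        len = hdr.size - sizeof(hdr);
        ring_copy_out(r, pos + sizeof(hdr), out[n], len);
        out[n][len] = '\0';
        pos += hdr.size;
        consumed += hdr.size;
        n++;
    }
    return n;
}
```

In this model, writing "connect", "accept" and "close" and then
reading returns them newest first. A further write that no longer fits
in the free space overwrites the tail of the oldest record, and the
reader then stops before that damaged record instead of returning it.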