On Fri, 5 Jan 2024 18:23:57 +0000 Vincent Donnefort <vdonnefort@xxxxxxxxxx> wrote: > On Fri, Jan 05, 2024 at 12:31:11PM -0500, Steven Rostedt wrote: > > On Fri, 5 Jan 2024 14:25:15 +0000 > > Vincent Donnefort <vdonnefort@xxxxxxxxxx> wrote: > > > > > > > Maybe this ioctl should be called regardless if events are found on the current > > > > > reader page. This would at least update the reader->read field and make sure > > > > > subsequent readers are not getting the same events we already had here? > > > > > > > > If we call the ioctl() before we are finished reading events, the events on > > > > the reader page will be discarded and added back to the writer buffer if > > > > the writer is not on the reader page. > > > > > > Hum, I am a bit confused here. If the writer is not on the reader page, then > > > there's no way a writer could overwrite these events while we access them? > > > > Maybe I'm confused. Where do you want to call the ioctl again? If the > > writer is off the reader page and we call the ioctl() where no new events > > were added since our last read, the ioctl() will advance the reader page, > > will it not? That is, any events that were on the previous reader page that > > we did not read yet is lost. > > I meant like the snippet you shared below. > > If on a reader page, there are "unread" events, the ioctl will simply return the > same reader page, as long as there is something to read. It is then safe to call > the ioctl unconditionally. There's only unread events the first time we read the cpubuffer after mapping. Then every ioctl() will give a new page if the writer is not currently on it. int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu) { struct ring_buffer_per_cpu *cpu_buffer; unsigned long reader_size; unsigned long flags; cpu_buffer = rb_get_mapped_buffer(buffer, cpu); if (IS_ERR(cpu_buffer)) return (int)PTR_ERR(cpu_buffer); raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); consume: if (rb_per_cpu_empty(cpu_buffer)) goto out; reader_size = rb_page_size(cpu_buffer->reader_page); /* * There are data to be read on the current reader page, we can * return to the caller. But before that, we assume the latter will read * everything. Let's update the kernel reader accordingly. */ if (cpu_buffer->reader_page->read < reader_size) { while (cpu_buffer->reader_page->read < reader_size) rb_advance_reader(cpu_buffer); // This updates the reader_page to the size of the page. // This means that the next time this is called, it will swap the page. goto out; } if (WARN_ON(!rb_get_reader_page(cpu_buffer))) goto out; goto consume; out: raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); rb_put_mapped_buffer(cpu_buffer); return 0; } > > > > > The trace_mmap_load_subbuf() is called to update the reader page. If for > > some reason the kbuf hasn't read all the data and calls the > > tracefs_cpu_read_buf() again, it should still return the same kbuf, but > > updated. > > > > kbuf = tracefs_cpu_read_buf() { > > ret = trace_mmap_load_subbuf(tcpu->mapping, tcpu->kbuf) { > > if (data != kbuffer_subbuffer(kbuf)) { > > kbuffer_load_subbuffer(kbuf, data); > > return 1; > > } > > return ret > 0 ? tcpu->kbuf : NULL; > > } > > > > record.data = kbuffer_read_event(kbuf, &record.ts); > > kbuffer_next_event(kbuf, NULL); > > > > kbuf = tracefs_cpu_read_buf() { > > ret = trace_mmap_load_subbuf(tcpu->mapping, tcpu->kbuf) { > > kbuffer_refresh(kbuf); > > /* Are there still events to read? */ > > if (kbuffer_curr_size(kbuf)) > > return 1; > > > > The above should return because the application has not read all of the > > kbuf yet. This is perfectly fine for the application to do this. > > > > If we don't return here, but instead do: > > > > ioctl(tmap->fd, TRACE_MMAP_IOCTL_GET_READER); > > > > The buffer that kbuf points to will be swapped out with a new page. And the > > data on the buffer is lost. > > It shouldn't, the same reader page will be returned, only the reader_page->read > pointer would be updated. But after the first read of the page, the reader_page->read is at the end, and will swap. > > > > > Hmm, but looking at how the code currently works without mapping, it looks > > like it would do the read again anyway assuming that the kbuf was > > completely read before reading again. Perhaps it would make the logic > > simpler to assume that the buffer was completely read before this gets > > called again. That is, we could have it do: > > > > if (data != kbuffer_subbuffer(kbuf)) { > > kbuffer_load_subbuffer(kbuf, data); > > return 1; > > } > > > > if (ioctl(tmap->fd, TRACE_MMAP_IOCTL_GET_READER) < 0) > > return -1; > > > > if (id != tmap->map->reader.id) { > > /* New page, just reload it */ > > data = tmap->data + tmap->map->subbuf_size * id; > > kbuffer_load_subbuffer(kbuf, data); > > return 1; > > } > > > > /* > > * Perhaps the reader page had a write that added > > * more data. > > */ > > kbuffer_refresh(kbuf); > > > > /* Are there still events to read? */ > > return kbuffer_curr_size(kbuf) ? 1 : 0; > > > > That sounds good > > > > > > > > > > > > > > And there should only be one reader accessing the map. This is not thread > > > > safe. Once you load the subbuffer into kbuf, kbuf handles what was read. We > > > > don't want to use the reader page for that. > > > > > > I had in mind a scenario with two sequential read. In that case only the reader > > > page read could help to "save" what has been read so far. > > > > > > Currently, we have: > > > > > > <event A> > > > ./cpu-map > > > print event A > > > > > > <event B> > > > ./cpu-map > > > print event A > > > print event B > > > > The kbuffer keeps track of what was read from it, so I'm still confused by > > what you mean. > > We probably want to keep track of what has already been consumed outside of > the remits of a single libtraceevent execution. > > But now with the code snippet above, in addition to the fast-forward when the > kbuf is first loaded (kbuffer_read_buffer(kbuf, tmpbuf, tmap->map->reader.read)), > we shouldn't have that problem anymore. > > <event A> > $ ./cpu-map > print event A > > <event B> > $ ./cpu-map > print event B I think that should be on the kernel side. The first mapping should update the read meta page and then update the reader_page so that the next mapping will give something different. > > Or even > > <event A> > $ ./cpu-map > print event A > > <event B> > $ cat /sys/kernel/tracing/trace_pipe > print event B So this is a kernel change, not a library one. -- Steve