----- Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:

> ----- On Feb 24, 2021, at 11:22 AM, Michael Jeanson mjeanson@xxxxxxxxxxxx wrote:
> 
> > [ Adding Mathieu Desnoyers in CC ]
> > 
> > On 2021-02-23 21 h 16, Steven Rostedt wrote:
> >> On Thu, 18 Feb 2021 17:21:19 -0500
> >> Michael Jeanson <mjeanson@xxxxxxxxxxxx> wrote:
> >> 
> >>> This series only implements the tracepoint infrastructure required to
> >>> allow tracers to handle page faults. Modifying each tracer to handle
> >>> those page faults would be a next step after we all agree on this piece
> >>> of instrumentation infrastructure.
> >> 
> >> I started taking a quick look at this, and came up with the question: how
> >> do you allow preemption when dealing with per-cpu buffers or storage to
> >> record the data?
> >> 
> >> That is, perf, bpf and ftrace are all using some kind of per-cpu data,
> >> and this is the reason for the need to disable preemption. What's the
> >> solution that LTTng is using for this? I know it has per-cpu buffers
> >> too, but does it have some kind of "per task" buffer that is being used
> >> to extract the data that can fault?
> 
> As a prototype solution, what I've done currently is to copy the user-space
> data into a kmalloc'd buffer in a preparation step before disabling
> preemption and copying the data over into the per-cpu buffers. It works,
> but I think we should be able to do it without the needless copy.
> 
> What I have in mind as an efficient solution (not implemented yet) for the
> LTTng kernel tracer goes as follows:
> 
> #define COMMIT_LOCAL  0
> #define COMMIT_REMOTE 1
> 
> - faultable probe is called from a system call tracepoint
>   [ preemption/blocking/migration is allowed ]
> - probe code calculates the length which needs to be reserved to store the
>   event (e.g. user strlen),
> 
> - preempt disable -> [ preemption/blocking/migration is not allowed from here ]
> - reserve_cpu = smp_processor_id()
> - reserve space in the ring buffer for reserve_cpu
>   [ from that point on, we have _exclusive_ access to write into the ring
>     buffer "slot" from any cpu until we commit ]
> - preempt enable -> [ preemption/blocking/migration is allowed from here ]
> 
> - copy data from user-space to the ring buffer "slot",
> 
> - preempt disable -> [ preemption/blocking/migration is not allowed from here ]
>   commit_cpu = smp_processor_id()
>   if (commit_cpu == reserve_cpu)
>     use local_add to increment buf[commit_cpu].subbuffer[current].commit_count[COMMIT_LOCAL]
>   else
>     use atomic_add to increment buf[commit_cpu].subbuffer[current].commit_count[COMMIT_REMOTE]

The line above should increment reserve_cpu's buffer commit count, of course.
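
To make that concrete, here is a rough, untested sketch of the commit path in
kernel-style C. The struct layout and the helper names (subbuf_commit,
commit_event, commit_count) are made up for illustration; this is not the
actual LTTng ring buffer code:

#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/smp.h>
#include <linux/atomic.h>
#include <asm/local.h>

/* Hypothetical layout: two free-running commit counters per sub-buffer. */
struct subbuf_commit {
	local_t local;			/* COMMIT_LOCAL: owner cpu only */
	atomic_long_t remote;		/* COMMIT_REMOTE: after migration */
};

/* Commit path: always increments reserve_cpu's counters. */
static void commit_event(struct subbuf_commit __percpu *commit,
			 int reserve_cpu, long len)
{
	struct subbuf_commit *sc = per_cpu_ptr(commit, reserve_cpu);

	preempt_disable();
	if (smp_processor_id() == reserve_cpu)
		/* Fast path: cheap non-atomic per-cpu add. */
		local_add(len, &sc->local);
	else
		/* Migrated between reserve and commit: pay for the atomic. */
		atomic_long_add(len, &sc->remote);
	preempt_enable();
}

/* Readers always sum the two free-running counters. */
static long commit_count(struct subbuf_commit __percpu *commit, int cpu)
{
	struct subbuf_commit *sc = per_cpu_ptr(commit, cpu);

	return local_read(&sc->local) + atomic_long_read(&sc->remote);
}

The point being that the common case (commit on the same cpu that reserved)
never pays for an atomic read-modify-write, and a commit after migration only
costs one atomic_long_add on the reserving cpu's counter.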

Thanks,

Mathieu

> - preempt enable -> [ preemption/blocking/migration is allowed from here ]
> 
> Given that LTTng uses per-buffer/per-sub-buffer commit counters as simple
> free-running accumulators, the trick here is to use two commit counters
> rather than a single one for each sub-buffer. Whenever we need to read a
> commit count value, we always sum the LOCAL and REMOTE counters.
> 
> This allows dealing with migration between reserve and commit without
> requiring the overhead of an atomic operation on the fast path (the LOCAL
> case).
> 
> I had to design this kind of dual-counter trick in the context of
> user-space use of restartable sequences. It looks like it may have a role
> to play in the kernel as well. :)
> 
> Or am I missing something important that would not survive real life?
> 
> Thanks,
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com