* Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> > One gloriously ugly hack would be to delay the userspace unwind to
> > return-to-userspace, at which point we have a schedulable context and can take
> > faults.

I don't think it's ugly, and it has various advantages:

> > Of course, then you have to somehow identify this later unwind sample with all
> > relevant prior samples and stitch the whole thing back together, but that
> > should be doable.
> >
> > In fact, it would not be at all hard to do, just queue a task_work from the
> > NMI and have that do the EH based unwind.

This would have a couple of advantages:

 - as you mention, being able to fault in debug info and generally do
   IO/scheduling,

 - profiling overhead would be accounted to the task context that generates it,
   not the NMI context,

 - there would be a natural batching/coalescing optimization if multiple events
   hit the same system call: the user-space backtrace would only have to be
   looked up once for all samples that got collected.

This could be done by separating the user-space backtrace into a separate event,
and perf tooling would then apply the same user-space backtrace to all prior
kernel samples.

I.e. the ring-buffer would have trace entries like:

 [ kernel sample #1, with kernel backtrace #1 ]
 [ kernel sample #2, with kernel backtrace #2 ]
 [ kernel sample #3, with kernel backtrace #3 ]
 [ user-space backtrace #1 at syscall return ]
 ...

Note how the three kernel samples didn't have to do any user-space unwinding at
all, so the user-space unwinding overhead got reduced by a factor of 3.

Tooling would know that 'user-space backtrace #1' applies to the previous three
kernel samples.

Or so?

Thanks,

	Ingo
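
For illustration only, here is a rough sketch of how the tooling-side stitching could look. This is not perf code: the KernelSample and UserBacktrace record types, the stitch() helper and the idea of matching records by thread id are all assumptions made up for this example. The only thing taken from the proposal is the ordering, i.e. a deferred user-space backtrace event applies to the kernel samples that precede it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class KernelSample:
    """Hypothetical record: an NMI-time sample carrying only a kernel backtrace."""
    tid: int
    kernel_stack: List[str]
    user_stack: Optional[List[str]] = None  # filled in later by stitch()


@dataclass
class UserBacktrace:
    """Hypothetical record: the deferred user-space unwind, emitted at syscall return."""
    tid: int
    user_stack: List[str]


def stitch(events: List[Union[KernelSample, UserBacktrace]]) -> List[KernelSample]:
    """Attach each user-space backtrace to the prior kernel samples of the same thread.

    Assumption: records are matched by tid and ring-buffer order.
    """
    pending: Dict[int, List[KernelSample]] = {}  # tid -> samples awaiting a user backtrace
    samples: List[KernelSample] = []
    for ev in events:
        if isinstance(ev, KernelSample):
            pending.setdefault(ev.tid, []).append(ev)
            samples.append(ev)
        else:
            # One user-space unwind covers every kernel sample queued before it.
            for sample in pending.pop(ev.tid, []):
                sample.user_stack = ev.user_stack
    return samples


# The ring-buffer layout from the example above: three kernel samples, one unwind.
ring_buffer = [
    KernelSample(tid=1, kernel_stack=["kernel backtrace #1"]),
    KernelSample(tid=1, kernel_stack=["kernel backtrace #2"]),
    KernelSample(tid=1, kernel_stack=["kernel backtrace #3"]),
    UserBacktrace(tid=1, user_stack=["user-space backtrace #1"]),
]
stitched = stitch(ring_buffer)
```

Samples that never see a following user-space backtrace (say the buffer was truncated) simply keep user_stack as None in this sketch; a real tool would have to decide how to report those.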