On Thu, Jul 13, 2017 at 11:19:11AM +0200, Ingo Molnar wrote:
>
> * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> > > One gloriously ugly hack would be to delay the userspace unwind to
> > > return-to-userspace, at which point we have a schedulable context and can take
> > > faults.
>
> I don't think it's ugly, and it has various advantages:
>
> > > Of course, then you have to somehow identify this later unwind sample with all
> > > relevant prior samples and stitch the whole thing back together, but that
> > > should be doable.
> > >
> > > In fact, it would not be at all hard to do, just queue a task_work from the
> > > NMI and have that do the EH based unwind.
>
> This would have a couple of advantages:
>
>  - as you mention, being able to fault in debug info and generally do
>    IO/scheduling,
>
>  - profiling overhead would be accounted to the task context that generates it,
>    not the NMI context,
>
>  - there would be a natural batching/coalescing optimization if multiple events
>    hit the same system call: the user-space backtrace would only have to be looked
>    up once for all samples that got collected.
>
> This could be done by separating the user-space backtrace into a separate event,
> and perf tooling would then apply the same user-space backtrace to all prior
> kernel samples.
>
> I.e. the ring-buffer would have trace entries like:
>
>  [ kernel sample #1, with kernel backtrace #1 ]
>  [ kernel sample #2, with kernel backtrace #2 ]
>  [ kernel sample #3, with kernel backtrace #3 ]
>  [ user-space backtrace #1 at syscall return ]
>  ...
>
> Note how the three kernel samples didn't have to do any user-space unwinding at
> all, so the user-space unwinding overhead got reduced by a factor of 3.
>
> Tooling would know that 'user-space backtrace #1' applies to the previous three
> kernel samples.
>
> Or so?

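For illustration only, the tooling-side stitching described above could
look roughly like the sketch below. It is not perf code: the record types
(KernelSample, UserBacktrace), the stitch() helper, and the idea of keying
pending samples by thread id are all assumptions made up for this example.
The quoted proposal only says that a later user-space backtrace applies to
the prior kernel samples.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class KernelSample:
    """Hypothetical record: a sample taken in NMI context, kernel stack only."""
    tid: int
    kernel_stack: List[str]
    user_stack: Optional[List[str]] = None  # filled in later by stitch()


@dataclass
class UserBacktrace:
    """Hypothetical record: user-space unwind emitted at return-to-userspace."""
    tid: int
    user_stack: List[str]


Event = Union[KernelSample, UserBacktrace]


def stitch(events: List[Event]) -> List[KernelSample]:
    """Attach each user-space backtrace to the prior kernel samples of the
    same thread that have not yet received one (assumed matching rule)."""
    pending: Dict[int, List[KernelSample]] = {}
    samples: List[KernelSample] = []
    for ev in events:
        if isinstance(ev, KernelSample):
            pending.setdefault(ev.tid, []).append(ev)
            samples.append(ev)
        else:
            # One unwind serves every sample collected since the last one.
            for sample in pending.pop(ev.tid, []):
                sample.user_stack = ev.user_stack
    return samples


if __name__ == "__main__":
    ring_buffer = [
        KernelSample(1, ["sys_read", "vfs_read"]),
        KernelSample(1, ["sys_read", "vfs_read", "ext4_file_read_iter"]),
        KernelSample(1, ["sys_read"]),
        UserBacktrace(1, ["main", "load_config", "read"]),
    ]
    for s in stitch(ring_buffer):
        print(s.kernel_stack, "<-", s.user_stack)
```

Samples that never see a matching user-space backtrace (for example if
the trace ends first) simply keep user_stack as None in this sketch; how
real tooling should treat those is left open here.
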
BTW, while we're throwing out ideas for this, here's another idea, though
it's almost certainly not a good one :-)

For user space stack unwinding, the kernel could emulate what the kernel
'guess' unwinder does by scanning the user space stack and returning all
the text addresses it finds.  The results wouldn't be 100% accurate, but
they could end up being useful over time.

-- 
Josh
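
To make the 'guess' idea concrete, here is a minimal user-space sketch of
the scan, not kernel code. The function name guess_unwind(), the
little-endian word layout, and passing executable mappings as
(start, end) pairs are assumptions for the example only.

```python
import struct
from typing import List, Sequence, Tuple


def guess_unwind(stack: bytes,
                 text_ranges: Sequence[Tuple[int, int]],
                 word_size: int = 8) -> List[int]:
    """Scan a raw stack snapshot word by word and return every value that
    falls inside a known text (executable) range.

    Assumes little-endian words. Like any guess-style unwind, stale return
    addresses and function pointers stored on the stack are reported too,
    so the result may contain false positives.
    """
    fmt = "<Q" if word_size == 8 else "<I"
    hits: List[int] = []
    for off in range(0, len(stack) - word_size + 1, word_size):
        (value,) = struct.unpack_from(fmt, stack, off)
        if any(start <= value < end for start, end in text_ranges):
            hits.append(value)
    return hits


if __name__ == "__main__":
    # Fake stack: two plausible text addresses mixed with data words.
    snapshot = struct.pack("<4Q", 0x400123, 0x7FFD0000, 0x400456, 0x1)
    print([hex(a) for a in guess_unwind(snapshot, [(0x400000, 0x401000)])])
```

The scan is linear in the stack size and needs no debug info, which is
the appeal; the cost is that nothing distinguishes a live return address
from a leftover one.
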