On Mon, Jan 16, 2023 at 3:30 PM Daniel Colascione <dancol@xxxxxxxxxx> wrote:
>
> Frame pointers also have the disadvantage of working only with AOT-compiled languages for which a trace analysis tool can associate an instruction pointer with a semantically-relevant bit of code. If you try to use frame pointers to profile a Python program, all you're going to get is a profile of the interpreter. It seems like the debate is between those who want observability (via frame pointers) and those who want the performance benefits of -fomit-frame-pointer.
>
> There's a third way.
>
> See, both pro-FP and anti-FP camps think that it's the kernel that has to do the unwinding unless we copy whole stacks into traces. Why should that be? As mentioned in [1], instead of finding a way to have the kernel unwind user programs, we can create a protocol through which the kernel can ask usermode to unwind itself. It could work like this:
>
> 1) backtrace requested in the kernel (e.g. in response to a perf counter overflow)
>
> 2) kernel unwinds itself to the userspace boundary the usual way
>
> 3) kernel forms a nonce (e.g. by incrementing a 64-bit counter)
>
> 4) kernel logs a stack trace the usual way (e.g. to the ftrace ring buffer), but with the final frame referring to the nonce created in the previous step
>
> 5) kernel queues a signal (one userspace has explicitly opted into via a new prctl()); the siginfo_t structure encodes (e.g. via si_status and si_value) the nonce
>
> 6) kernel eventually returns to userspace; queued signal handler gains control
>
> 7) signal handler unwinds the calling thread however it wants (and can sleep and take page faults if needed)
>
> 8) signal handler logs the result of its unwind, along with the nonce, to the system log (e.g. via a new system call, a sysfs write, an io_uring submission, etc.)
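To make sure I follow the flow in steps 1-8, here is a toy model of it in Python. It is only a sketch under my own assumptions: nothing in it is a real kernel or libc interface. The class TraceSim, its methods, and the frame labels are all hypothetical names, the "signal" is just a dictionary entry, and the two logs are plain in-memory containers. It also folds in the idea of reusing an already-pending nonce for a thread, and a small post-processing join of kernel and user stacks by nonce.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class TraceSim:
    """Toy model of the proposed protocol; nothing here is a real kernel API."""
    next_nonce: count = field(default_factory=lambda: count(1))  # step 3 counter
    pending: dict = field(default_factory=dict)     # tid -> nonce awaiting a user unwind
    kernel_log: list = field(default_factory=list)  # (nonce, kernel frames)
    user_log: dict = field(default_factory=dict)    # nonce -> user frames

    def kernel_sample(self, tid, kernel_frames):
        """Steps 1-5: log the kernel stack, tagged with a nonce, and queue a 'signal'."""
        nonce = self.pending.get(tid)
        if nonce is None:              # no unwind pending for this thread yet
            nonce = next(self.next_nonce)
            self.pending[tid] = nonce  # stands in for queuing the opted-in signal
        self.kernel_log.append((nonce, list(kernel_frames)))
        return nonce

    def return_to_user(self, tid, unwinder):
        """Steps 6-8: the 'signal handler' unwinds the thread and logs the result."""
        nonce = self.pending.pop(tid, None)
        if nonce is not None:
            self.user_log[nonce] = list(unwinder(tid))

    def stitch(self):
        """Post-processing: append the user stack that shares each kernel stack's nonce."""
        return [frames + self.user_log.get(nonce, ["<user stack missing>"])
                for nonce, frames in self.kernel_log]


def toy_unwinder(tid):
    # Hypothetical unwinder: a real one could walk frame pointers, DWARF,
    # or interpreter frames. Here it just returns fixed labels.
    return ["py:handle_request", "py:main"]


sim = TraceSim()
# Two kernel-side events on the same thread before it returns to userspace.
sim.kernel_sample(tid=7, kernel_frames=["k:perf_irq", "k:syscall_entry"])
sim.kernel_sample(tid=7, kernel_frames=["k:sched_switch", "k:syscall_entry"])
sim.return_to_user(7, toy_unwinder)  # one user unwind serves both events
stacks = sim.stitch()
for stack in stacks:
    print(" <- ".join(stack))
```

In this model both kernel samples share one nonce, so the thread is unwound once and the same user "tail" is attached to both kernel stacks during post-processing.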
>
> Post-processing tools can associate kernel stacks with user stacks tagged with the corresponding nonces and reconstitute the full stacks in effect at the time of each logged event.
>
> We can avoid duplicating unwinds too: at step #3, if the kernel finds that the current thread already has an unwind pending, it can use the already-pending nonce instead of making a new one and queuing a signal: many kernel stacks can end with the same user stack "tail".
>
> One nice property of this scheme is that the userspace unwinding isn't limited to native code. Libc could arbitrate unwinding across an arbitrary number of managed runtime environments in the context of a single process: the system could be smart enough to know that instead of unwinding through, e.g. Python interpreter frames, the unwinder (which is normal userspace code, pluggable via DSO!) could traverse and log *Python* stack frames instead, with meaningful function names. And if you happened to have, say, a JavaScript runtime in the same process, both JavaScript and Python could participate in the semantic unwinding process.
>
> A pluggable userspace unwind mechanism would have zero cost in the case that we're not recording stack frames. On top of that, a pluggable userspace unwinder *could* be written to traverse frame pointers just as the kernel unwinder does today, if userspace thinks that's the best option. Without breaking kernel ABI, that userspace unwinder could use DWARF, or ORC, or any other userspace unwinding approach. It's future-proof.
>
> In other words, the choice between frame pointers and no frame pointers is a false dichotomy. There's a better approach. The Linux ecosystem as a whole would be better off building something like the pluggable userspace asynchronous unwinding infrastructure described above.
>
> [1] https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx/message/646XXHGEGOKO465LQKWCPPPAZBSW5NWO/

This sounds great, but how is it going to get made?
And is the kernel amenable to this in the first place?

--
真実はいつも一つ!/ Always, there's only one truth!
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx