> I strongly prefer the latter approach. I believe the unwinder
> executes in NMI context, meaning that it must not block and must finish
> executing in a bounded amount of time. Furthermore, any oops becomes
> an immediate kernel panic. The eBPF verifier can trivially guarantee
> that the unwinder satisfies the properties needed here. For security
> reasons, submitting eBPF programs is a privileged operation, but some
> programs could be compiled into the kernel and thus considered trusted.
> Such programs could be used without any special privileges.
>
> The key advantage of this approach is that privileged user-mode
> profiling tools, such as sysprof, can submit their own eBPF unwinders.
> This means that the kernel does not need to support whatever unwind
> info format userspace uses. One could use DWARF, ORC, or any other
> format one wishes.

BPF programs do not have access to arbitrary ELF sections AFAIK. Every eBPF
unwinder that I've found is implemented by preprocessing the unwind format
in userspace and storing the result in BPF maps so that it can be accessed
from the BPF program.
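
To make the preprocessing idea concrete, here is a minimal sketch of it, modelled in plain Python rather than as a real BPF program. The row layout (`pc`, `cfa_offset`, `ra_offset`), the `MAX_STEPS` bound, and the function names are all hypothetical and not taken from any existing unwinder; the sorted list stands in for a BPF map that userspace would fill in.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple

class UnwindRow(NamedTuple):
    pc: int          # first instruction address this row applies to
    cfa_offset: int  # assumed rule: CFA = SP + cfa_offset
    ra_offset: int   # assumed rule: return address stored at CFA + ra_offset

# Fixed iteration bound, mirroring the bounded-execution requirement
# mentioned in the quoted text. 32 halvings cover any table this sketch uses.
MAX_STEPS = 32

def build_table(rows: List[UnwindRow]) -> List[UnwindRow]:
    """Userspace preprocessing: sort rows by pc so lookups can binary-search."""
    return sorted(rows, key=lambda r: r.pc)

def lookup(table: List[UnwindRow], pc: int) -> Optional[UnwindRow]:
    """Return the row with the largest start pc <= pc, or None if none exists."""
    lo, hi = 0, len(table)
    for _ in range(MAX_STEPS):
        if lo >= hi:
            break
        mid = (lo + hi) // 2
        if table[mid].pc <= pc:
            lo = mid + 1
        else:
            hi = mid
    return table[lo - 1] if lo > 0 else None

def unwind_step(table: List[UnwindRow], pc: int, sp: int,
                read_word: Callable[[int], int]) -> Optional[Tuple[int, int]]:
    """Unwind one frame: return (caller_pc, caller_sp), or None if pc is unknown."""
    row = lookup(table, pc)
    if row is None:
        return None
    cfa = sp + row.cfa_offset
    return read_word(cfa + row.ra_offset), cfa
```

The point of the sketch is only that the table has to exist, already sorted, before the first sample for that binary arrives.
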

Effectively, this means that every program that wants to do unwinding in BPF
has to do this preprocessing and store all the required information in BPF
maps. When you don't know in advance which program you will be requesting a
stack trace for, userspace has to provide this information for every program
that might run on the system. While this might work for dedicated long-running
system profiling daemons, it is not an option for software such as perf or
bpftrace, since it would drastically increase their startup time as well as
their overall resource usage.

Cheers,
Daan

________________________________________
From: Demi Marie Obenour <demiobenour@xxxxxxxxx>
Sent: 09 July 2022 04:02
To: devel@xxxxxxxxxxxxxxxxxxxxxxx
Subject: Re: F37 proposal: Add -fno-omit-frame-pointer to default compilation flags (System-Wide Change proposal)

On 7/8/22 20:18, Christian Hergert wrote:
>> That is the problem right here: .eh_frame-based unwinding is too slow, so it has to be
>> done offline in userspace. What about instead adding ORC information to userspace? That
>> would be much faster to use.
>
> I'm not familiar with ORC, but there are a few things that initially come to
> mind in looking towards such a solution.
>
> First, are there any examples of perf being able to reference ORC data coming
> from user-space or is it currently limited to PERF_CONTEXT_KERNEL? For
> system-wide profiling, we still require that the kernel can do high-velocity
> unwinding across address contexts.

Why does the unwinding need to happen in the kernel? The kernel can
already asynchronously invoke userspace code in the form of signal handlers.
Is the problem that it is necessary to collect profiling information in the
middle of a system call, where another syscall would see inconsistent (and
potentially exploitable) kernel state?

> My (limited) understanding of ORC is that the result produced by objtool gets
> you a series of unwind tables, but those tables require further processing by
> the kernel at boot.
>
> Again, I have limited understanding, but wouldn't something need to
> be processed as part of spawning and loading executable pages? There are both
> .orc_unwind and .orc_unwind_ip sections, both of which need to be sorted. I
> don't know what layer would be responsible for that, or how it adapts to
> dlopen(), double-mapping pages like libffi, etc... but I'm sure people will
> have opinions about it.

Ouch. That is a serious problem for a number of reasons, not least of
which is security. Having the kernel parse even more complex untrusted
input in C is a horrible idea. I can think of at least two better options:

1. Wait for Rust support to be merged, and write the unwinder in Rust.
2. Implement the unwinder as an eBPF program.

I strongly prefer the latter approach. I believe the unwinder
executes in NMI context, meaning that it must not block and must finish
executing in a bounded amount of time. Furthermore, any oops becomes
an immediate kernel panic. The eBPF verifier can trivially guarantee
that the unwinder satisfies the properties needed here. For security
reasons, submitting eBPF programs is a privileged operation, but some
programs could be compiled into the kernel and thus considered trusted.
Such programs could be used without any special privileges.

The key advantage of this approach is that privileged user-mode
profiling tools, such as sysprof, can submit their own eBPF unwinders.
This means that the kernel does not need to support whatever unwind
info format userspace uses. One could use DWARF, ORC, or any other
format one wishes.

Christian, would this be sufficient for your needs?
--
Sincerely,
Demi Marie Obenour (she/her/hers)
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure