Re: F37 proposal: Add -fno-omit-frame-pointer to default compilation flags (System-Wide Change proposal)

Daan De Meyer via devel <devel@xxxxxxxxxxxxxxxxxxxxxxx> · Sun, 10 Jul 2022 08:15:25 +0000

> I strongly prefer the latter approach.  I believe the unwinder
> executes in NMI context, meaning that it must not block and must finish
> executing in a bounded amount of time.  Furthermore, any oops becomes
> an immediate kernel panic.  The eBPF verifier can trivially guarantee
> that the unwinder satisfies the properties needed here.  For security
> reasons, submitting eBPF programs is a privileged operation, but some
> programs could be compiled into the kernel and thus considered trusted.
> Such programs could be used without any special privileges.
>
> The key advantage of this approach is that privileged user-mode
> profiling tools, such as sysprof, can submit their own eBPF unwinders.
> This means that the kernel does not need to support whatever unwind
> info format userspace uses.  One could use DWARF, ORC, or any other
> format one wishes.

BPF programs do not have access to arbitrary ELF sections AFAIK. Every eBPF
unwinder that I've found works by preprocessing the unwind information in
userspace and storing the result in BPF maps, where the BPF program can
read it.

In practice, this means that every program that wants to unwind in BPF has to
do this preprocessing and store all the required information in BPF maps. When
you don't know in advance which program you'll be requesting a stack trace for,
userspace has to provide this information for every program that might run on
the system. While this might work for dedicated long-running system profiling
daemons, it is not an option for software such as perf or bpftrace, since it
would drastically increase their startup time as well as their overall resource
usage.
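
To make the cost concrete, here is a rough sketch in plain C (not real BPF
code) of the kind of lookup such an unwinder does at sample time. The
struct unwind_row layout, the lookup_row name and the MAX_STEPS bound are
all made up for illustration; real unwinders use their own formats. The
point is that a sorted table like this has to be built in userspace and
loaded into a map for every binary before any sample for it can be unwound.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compact unwind row, produced in userspace from the
 * binary's unwind info and then copied into a BPF map. */
struct unwind_row {
    uint64_t pc;      /* first address covered by this row */
    int32_t  cfa_off; /* CFA = SP + cfa_off (illustrative rule only) */
    int32_t  ra_off;  /* return address stored at CFA + ra_off */
};

/* Fixed iteration cap so the loop is visibly bounded; 32 halvings are
 * enough for any table with fewer than 2^32 rows. */
#define MAX_STEPS 32

/* Return the last row whose pc is <= the sampled pc, or NULL if the
 * sampled pc lies below the first row. Rows must be sorted by pc. */
static const struct unwind_row *
lookup_row(const struct unwind_row *tbl, size_t n, uint64_t pc)
{
    const struct unwind_row *found = NULL;
    size_t lo = 0, hi = n;

    for (int i = 0; i < MAX_STEPS && lo < hi; i++) {
        size_t mid = lo + (hi - lo) / 2;

        if (tbl[mid].pc <= pc) {
            found = &tbl[mid];
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return found;
}
```

The lookup itself is cheap. What hurts is producing and storing the table
for every mapped executable and shared library up front.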

Cheers,
Daan

________________________________________
From: Demi Marie Obenour <demiobenour@xxxxxxxxx>
Sent: 09 July 2022 04:02
To: devel@xxxxxxxxxxxxxxxxxxxxxxx
Subject: Re: F37 proposal: Add -fno-omit-frame-pointer to default compilation flags (System-Wide Change proposal)

On 7/8/22 20:18, Christian Hergert wrote:
>> That is the problem right here: .eh_frame-based unwinding is too slow, so it has to be
>> done offline in userspace.  What about instead adding ORC information to userspace?  That
>> would be much faster to use.
>
> I'm not familiar with ORC, but there are a few things that initially come to
> mind in looking towards such a solution.
>
> First, are there any examples of perf being able to reference ORC data coming
> from user-space or is it currently limited to PERF_CONTEXT_KERNEL? For
> system-wide profiling, we still require that the kernel can do high-velocity
> unwinding across address contexts.

Why does the unwinding need to happen in the kernel?  The kernel can
already asynchronously invoke userspace code in the form of signal
handlers.  Is the problem that it is necessary to collect profiling
information in the middle of a system call, where another syscall
would see inconsistent (and potentially exploitable) kernel state?

> My (limited) understanding of ORC is that the result produced by objtool gets
> you a series of unwind tables, but those tables require further processing by
> the kernel at boot.
>
> Again, I have limited understanding, but wouldn't something need to
> be processed as part of spawning and loading executable pages? There are both
> .orc_unwind and .orc_unwind_ip sections, both of which need to be sorted. I
> don't know what layer would be responsible for that, or how it adapts to
> dlopen(), double-mapping pages like libffi, etc... but I'm sure people will
> have opinions about it.

Ouch.  That is a serious problem for a number of reasons, not least
of which is security.  Having the kernel parse even more complex
untrusted input in C is a horrible idea.

I can think of at least two better options:

1. Wait for Rust support to be merged, and write the unwinder in Rust.
2. Implement the unwinder as an eBPF program.

I strongly prefer the latter approach.  I believe the unwinder
executes in NMI context, meaning that it must not block and must finish
executing in a bounded amount of time.  Furthermore, any oops becomes
an immediate kernel panic.  The eBPF verifier can trivially guarantee
that the unwinder satisfies the properties needed here.  For security
reasons, submitting eBPF programs is a privileged operation, but some
programs could be compiled into the kernel and thus considered trusted.
Such programs could be used without any special privileges.
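
As a rough illustration of what "bounded and non-faulting" looks like, here
is a sketch in plain C (not actual eBPF) of a frame-pointer walk with a hard
cap on the number of frames and a memory read that fails instead of
faulting. Everything here is assumed for the example: struct fake_mem and
read_u64 stand in for a fallible user-memory read, MAX_FRAMES is an
arbitrary cap, and the frame layout (saved frame pointer at fp, return
address at fp + 8) is the usual x86-64 convention with frame pointers
enabled.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_FRAMES 64 /* arbitrary hard cap on unwind depth */

/* Hypothetical stand-in for user memory: a window of 8-byte words
 * starting at address `base`. */
struct fake_mem {
    uint64_t        base;
    const uint64_t *words;
    size_t          nwords;
};

/* Fallible read: returns -1 rather than faulting on a bad address. */
static int read_u64(const struct fake_mem *m, uint64_t addr, uint64_t *out)
{
    if (addr < m->base || (addr - m->base) % 8 != 0)
        return -1;
    size_t idx = (size_t)((addr - m->base) / 8);
    if (idx >= m->nwords)
        return -1;
    *out = m->words[idx];
    return 0;
}

/* Walk the frame-pointer chain starting at fp and store up to `max`
 * return addresses in pcs. Returns the number of frames recorded.
 * The loop runs at most MAX_FRAMES times no matter what the stack
 * contains. */
static size_t walk_frames(const struct fake_mem *m, uint64_t fp,
                          uint64_t *pcs, size_t max)
{
    size_t n = 0;

    if (max > MAX_FRAMES)
        max = MAX_FRAMES;
    while (n < max) {
        uint64_t next_fp, ret;

        if (read_u64(m, fp, &next_fp) || read_u64(m, fp + 8, &ret))
            break; /* unreadable frame: stop instead of faulting */
        pcs[n++] = ret;
        if (next_fp <= fp)
            break; /* callers live at higher addresses; reject loops */
        fp = next_fp;
    }
    return n;
}
```

A real eBPF program would use the kernel's helper for reading user memory
and would have its loop bound checked by the verifier, but the shape of the
code would be similar.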

The key advantage of this approach is that privileged user-mode
profiling tools, such as sysprof, can submit their own eBPF unwinders.
This means that the kernel does not need to support whatever unwind
info format userspace uses.  One could use DWARF, ORC, or any other
format one wishes.

Christian, would this be sufficient for your needs?

--
Sincerely,
Demi Marie Obenour (she/her/hers)

_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
