Re: Yet another unwinding approach

Daniel Colascione <dancol@xxxxxxxxxx> · Wed, 18 Jan 2023 11:01:15 -0500

Florian Weimer <fweimer@xxxxxxxxxx> writes:

> * Daniel Colascione:
>
>> See, both pro-FP and anti-FP camps think that it's the kernel that has
>> to do the unwinding unless we copy whole stacks into traces.
>
> Well, I think we should explore hardware-assisted backtraces (shadow
> stacks), which hopefully are going to get merged in Linux 6.2.

Shadow call stacks are fine, but they do nothing to help us profile
managed code. We have to think bigger than AOT-compiled
C++/Rust/C/etc. machine code.

>> Why should that be? As mentioned in [1], instead of finding a way to
>> have the kernel unwind user programs, we can create a protocol through
>> which the kernel can ask usermode to unwind itself. It could work like
>> this:
>
> If the unwind information is incomplete, this …
>
>> 7) signal handler unwinds the calling thread however it wants (and can
>> sleep and take page faults if needed)
>
> … might encounter segmentation faults and terminate the process.  So
> far, incorrect unwind information (whether it's a clobbered frame
> pointer, or missing DWARF information about clobbered registers) is not
> a problem, but it's kind of hard to validate this information from
> within the process itself.  Maybe we'd have to add a magic memcpy first
> to the vDSO, which the kernel recognizes based on the code addresses,
> and suppresses sending the signal for it.

Aren't we over-thinking this? We can just terminate the trace if we're
missing reliable (DWARF/ORC/etc.) unwind information for a frame. Why
would we expect a segfault and try to recover if we're unwinding using
debug information? That debug information would have to be wrong for
us to segfault.

Granted, if we're trying to traverse frame pointers, that's a different
story, but...

> Maybe we'd have to add a magic memcpy first
> to the vDSO, which the kernel recognizes based on the code addresses,
> and suppresses sending the signal for it.

...if we implemented my 2018 proposal for shared signal handlers, we
could arrange that "magic memcpy" in userspace without the kernel's
help: libc's SIGSEGV signal handler (which, under my proposal, could
coexist with other SIGSEGV handlers in the same process) could
recognize and ignore faults in that "magic memcpy" function on its own
by looking at si_addr. Why do in the kernel what we can do in
userspace?
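
To make the userspace "magic memcpy" idea concrete, here is a minimal
sketch in C. It rests on several assumptions. All names
(guarded_copy, install_guarded_segv_handler, and so on) are
hypothetical, not an existing libc API. It installs one ordinary
process-wide handler with sigaction() rather than the shared-handler
mechanism discussed above. And instead of matching code addresses, it
swaps in a simpler technique: a per-thread flag plus
sigsetjmp/siglongjmp. For SIGSEGV, si_addr reports the faulting memory
address; the faulting program counter lives in the architecture-specific
ucontext, which this sketch avoids for portability.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-thread state: where to resume, and whether the
 * current thread is inside the guarded copy right now. */
static _Thread_local sigjmp_buf guarded_env;
static _Thread_local volatile sig_atomic_t in_guarded_copy;

static void guarded_segv_handler(int sig, siginfo_t *info, void *uctx)
{
    (void)info; /* info->si_addr would hold the faulting address */
    (void)uctx; /* the faulting PC would be found in here */

    if (in_guarded_copy) {
        /* The fault came from our guarded copy: abandon it. */
        in_guarded_copy = 0;
        siglongjmp(guarded_env, 1);
    }

    /* Not our fault: restore the default action and return, so the
     * faulting instruction re-executes and terminates the process. */
    signal(sig, SIG_DFL);
}

void install_guarded_segv_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = guarded_segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGSEGV, &sa, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Copy n bytes from src to dst. Returns 0 on success, or -1 if
 * reading src faulted, instead of killing the process. */
int guarded_copy(void *dst, const void *src, size_t n)
{
    /* savemask=1 so that SIGSEGV is unblocked again after the jump. */
    if (sigsetjmp(guarded_env, 1) != 0)
        return -1;

    in_guarded_copy = 1;
    const volatile unsigned char *s = src;
    unsigned char *d = dst;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    in_guarded_copy = 0;
    return 0;
}
```

An unwinder running in the signal handler from step 7 could read stack
memory through guarded_copy() and end the trace on a -1 return rather
than crash. A version that truly recognizes the copy routine by code
address would compare the saved program counter against the routine's
address range, which needs per-architecture ucontext handling.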
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx