On Thu, Mar 31, 2022 at 11:54 AM Mark Rutland <mark.rutland@xxxxxxx> wrote: > > That is an impressive number. TBH, I'm shocked that this has *that* much of an > improvement, and I suspect this means we're doing something unnecssarily > expensive in the regular unwinder. > > I've given some specific comments on patches, but a a high-level, I don't want > to add yet another unwind mechanism. For maintenance and correctness reasons, > we've spent the last few years consolidating various unwinders, which this > unfortunately goes against. > > I see that there are number of cases this unwinder will fall afoul of (e.g. > kretprobes and ftrace graph trampolines), and making those work correctly will > require changes elsewhere (e.g. as we rely upon a snapshot of the FP to > disambiguate cases today). Do I understand correctly that kretprobes and ftrace modify frames saved on SCS? So, if either is enabled, SCS frames might contain their addresses instead of actual PCs? If so, this is good enough for our use case. Having kretprobes or ftrace enabled is an unusual setting and there's no requirement to support it. The goal is to have stack trace collection working in most cases during a normal usage of an Android device. Being not feature-complete and not covering all possible peculiar cases is fine. > I'm also very much not keen on having to stash things in the entry assembly for > this distinct unwinder. I'll drop these changes, I'll respond on that patch. > Going forward, I'm also planning on making changes to the way we unwind across > exception boundaries (e.g. to report the LR and FP), and as that depends on > finding the pt_regs based on the FP, that's not going to work with SCS. > > So at a high level, I don't want to add an SCS based unwinder. > > However, I'm very much open to how we could improve the standard unwinder to be > faster, which would be more generally beneficial. I can see that there are some > things we could reasonably do with simple refactoring. The intention of adding an SCS-based unwinder it to use in production together with MTE-based KASAN. Thus, it needs to be as fast as possible. I doubt even a very optimized FP-based unwinder will compare with a simple loop over SCS frames. It seems a pity to let the kernel maintain the current call trace via SCS and then not to use it to collect stack traces. Would it be acceptable if we keep the SCS unwinder code in mm/kasan and not integrate with the common stacktrace machanisms? Thanks!