On Mon, Sep 27, 2021 at 04:07:51PM +0000, Sean Christopherson wrote:
> I was asking about the exact location to confirm that the explosion is indeed
> from exception fixup, which is the "unwinder scenario get confused" I was thinking
> of.  Based on the disassembly from syzbot, that does indeed appear to be the case
> here, i.e. this
>
>   2a:   4c 8b 21                mov    (%rcx),%r12
>
> is from exception fixup from somewhere in __d_lookup (can't tell exactly what
> it's from, maybe KASAN?).
>
> > Is there more info on this "the unwinder gets confused"?  Bug filed
> > somewhere or an email thread?  Is it on anybody's radar?
>
> I don't know if there's a bug report or if this is on anyone's radar.  The issue
> I've encountered in the past, and what I'm pretty sure is being hit here, is that
> the ORC unwinder doesn't play nice with out-of-line fixup code, presumably because
> there are no tables for the fixup.  I believe kvm_fastop_exception() gets blamed
> because it's the first label that's found when searching back through the tables.

The ORC unwinder actually knows about .fixup, and unwinding through the
.fixup code worked here, as evidenced by the entire stacktrace getting
printed.  Otherwise there would have been a bunch of question marks in
the stack trace.

The problem reported here -- falsely printing kvm_fastop_exception --
is actually in the arch-independent printing of symbol names, done by
__sprint_symbol().  Most .fixup code fragments are anonymous, in the
sense that they don't have symbols associated with them.  For x86, here
are the only defined symbols in .fixup:

  ffffffff81e02408 T kvm_fastop_exception
  ffffffff81e02728 t .E_read_words
  ffffffff81e0272b t .E_leading_bytes
  ffffffff81e0272d t .E_trailing_bytes
  ffffffff81e02734 t .E_write_words
  ffffffff81e02740 t .E_copy

There's a lot of anonymous .fixup code which happens to be placed in the
gap between "kvm_fastop_exception" and ".E_read_words".
The kernel
symbol printing code will go backwards from the given address and will
print the first symbol it finds.  So any anonymous code in that gap will
falsely be reported as kvm_fastop_exception().

I'm thinking the ideal way to fix this would be getting rid of the
.fixup section altogether, and instead place a function's corresponding
fixup code in a cold part of the original function, with the help of
asm_goto and cold label attributes.

That way, the original faulting function would be printed instead of an
obscure reference to an anonymous .fixup code fragment.  It would have
other benefits as well.  For example, not breaking livepatch...

I'll try to play around with it.

-- 
Josh