On Wed, May 13, 2020 at 5:11 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, May 13, 2020 at 4:36 PM Borislav Petkov <bp@xxxxxxx> wrote:
> >
> > Looking at them, they do have an mb() too so how about this then
> > instead?
> >
> > #define prevent_tail_call_optimization() mb()
>
> Yeah, I think a full mb() is likely safe, because that's pretty much
> always going to be a real instruction with real semantics, and no
> amount of link-time optimizations can move it around a call
> instruction.

Are you sure LTO treats empty asm statements differently than full
memory barriers in regards to preventing tail calls? (I'll take your
word for it, I don't actually know, but seeing an example of real code
run through a production compiler is much much more convincing).

The TL;DR of the very long thread is that
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94722 is a proper fix, on
the GCC side. Adding arbitrary empty asm statements to work around
it? Hacks. Full memory barriers? Hacks.

I'm happy that GCC does an optimization that Clang does not. At the
same time, it sucks to pay a penalty for a bug we don't trigger. This
is the same reason why `asm_volatile_goto` expands differently between
GCC and Clang (and why I tried to undo that like a year ago).

If Clang realizes the same optimization GCC is doing here (related to
tailcalls) tomorrow, well we already support
__attribute__((no_stack_protector)) which can be added to the callees
we don't want tail called in this case (i.e. allowing tail calls). I
should send a patch adding that to
include/linux/compiler_attributes.h and annotate the callees in
question, before we forget about this issue.

Sprinkling empty asm statements or full memory barriers should be
treated with the same hesitancy as adding sleep()s to "work around"
concurrency bugs. Red flag.

And LTO is fun; we've been shipping it in Android for years (and need
to attempt upstreaming again).
Just today we found an ODR violation in one of the most important
symbols in the kernel. Will be sending a patch for that tomorrow.

>
> I could imagine some completely UP in-order CPU that doesn't need to
> serialize with anything at all, and even "mb()" might be empty. I
> think you can compile old ARM kernels for that. But realistically I
> think we can ignore them at least for now - I'm not sure the link-time
> optimization will even do things like that tailcall conversion, and
> I'm not convinced that old pre-ARMv7 systems will be relevant by the
> time (if) it ever does.
>
>              Linus

-- 
Thanks,
~Nick Desaulniers
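
For readers skimming the thread, the two approaches compared above can be
sketched roughly as follows. This is a userspace illustration only, not
kernel code: every demo_* name is hypothetical, __sync_synchronize() is
an assumed stand-in for the kernel's architecture-specific mb(), and the
attribute is applied only when the compiler reports support for it.

```c
/*
 * Userspace sketch only. The demo_* names are hypothetical, and
 * __sync_synchronize() (a GCC/Clang builtin full barrier) is an assumed
 * stand-in for the kernel's architecture-specific mb().
 */
#define prevent_tail_call_optimization() __sync_synchronize()

/* Assumption: apply no_stack_protector only where the compiler has it. */
#if defined(__has_attribute)
# if __has_attribute(no_stack_protector)
#  define demo_no_stack_protector __attribute__((no_stack_protector))
# endif
#endif
#ifndef demo_no_stack_protector
# define demo_no_stack_protector /* unsupported: expands to nothing */
#endif

static int demo_call_count;

/* Hypothetical callee standing in for the final call of a function. */
static void __attribute__((noinline)) demo_callee(void)
{
	demo_call_count++;
}

/*
 * Workaround style: the barrier follows the call, so the call is no
 * longer in tail position and cannot be emitted as a plain jump.
 */
void demo_with_barrier(void)
{
	demo_callee();
	prevent_tail_call_optimization();
}

/*
 * Attribute style: no barrier is needed; the function is simply built
 * without a stack protector, so a tail call here is harmless.
 */
demo_no_stack_protector void demo_with_attribute(void)
{
	demo_callee();
}

/* Accessor so a caller can check that both variants ran the callee. */
int demo_calls(void)
{
	return demo_call_count;
}
```

Building this at -O2 and comparing the assembly of the two functions is
one way to check whether a given compiler turns the final call into a
jump.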