On Mon, Mar 7, 2016 at 7:30 PM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > On Mon, Mar 7, 2016 at 10:07 AM, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote: >> >> Of course, there are other ways to save a single flag value (such as >> setz). It's up to the compiler developers to decide what they think is >> best. > > Using 'setcc' to save eflags somewhere is definitely the right thing to do. > > Using pushf/popf in generated code is completely insane (unless done > very localized in a controlled area). > > It is, in fact, insane and wrong even in user space, since eflags does > contain bits that user space itself might be modifying. > > In fact, even IF may be modified with iopl 3 (thing old X server > setups), but ignoring that flag entirely, you have AC that acts in > very similar ways (system-wide alignment control) that user space > might be using to make sure it doesn't have unaligned accesses. > > It's rare, yes. But still - this isn't really limited to just the kernel. > > But perhaps more importantly, I suspect using pushf/popf isn't just > semantically the wrong thing to do, it's just plain stupid. It's > likely slower than the obvious 'setcc' model. Agner Fog's table shows > it "popf" as being 25-30 uops on several microarchitectures. Looks > like it's often microcode. > > Now, pushf/popf may well be fairly cheap on *some* uarchitectures, but > it really sounds like a bad idea to use it when not absolutely > required. And that is completely independent of the fact that is > screws up the IF bit. > > But yeah, for the kernel we at a minimum need a way to disable that > code generation, even if the clang guys might have some insane reason > to keep it for other cases. > I am testing my new llvm-toolchain v3.8.1 and a pending x86/hweight fix [1] encouraged me to look at this again. In [2] I found a simple test program from Michael Hordijk. ( See thread "[LLVMdev] optimizer clobber EFLAGS". ) This is what I see... $ objdump -S clang-eflag.o clang-eflag.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 <bar>: 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: 53 push %rbx 5: 50 push %rax 6: e8 00 00 00 00 callq b <bar+0xb> b: ff 0d 00 00 00 00 decl 0x0(%rip) # 11 <bar+0x11> 11: 9c pushfq 12: 5b pop %rbx 13: e8 00 00 00 00 callq 18 <bar+0x18> 18: b8 01 00 00 00 mov $0x1,%eax 1d: 53 push %rbx 1e: 9d popfq 1f: 75 07 jne 28 <bar+0x28> 21: e8 00 00 00 00 callq 26 <bar+0x26> 26: 31 c0 xor %eax,%eax 28: 48 83 c4 08 add $0x8,%rsp 2c: 5b pop %rbx 2d: 5d pop %rbp 2e: c3 retq So, the issue is still alive. What do you mean by "for the kernel we at a minimum need a way to disable that code generation"? Can this be fixed in the Linux-kernel? I asked parallelly the people involved in [2] if there are any news on that. - Sedat - [1] http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=f5967101e9de12addcda4510dfbac66d7c5779c3 [2] http://lists.llvm.org/pipermail/llvm-dev/2015-July/088766.html
// Using Clang/LLVM 3.6.0 we are observing a case where the optimizations // are clobbering EFLAGS on x86_64. This is inconvenient when the status of // bit 9 (IF), which controls interrupts, changes. // // Here's a simple test program. Assume that the external function foo() // modifies the IF bit in EFLAGS. // // And it's compiled using the following command line: // // $ clang -O2 -c -o clang-eflag.o clang-eflag.c // // Check objdump output using the following command line: // // $ objdump -S clang-eflag.o // // For more details see "[LLVMdev] optimizer clobber EFLAGS" // <http://lists.llvm.org/pipermail/llvm-dev/2015-July/088766.html> #include <stdlib.h> #include <stdbool.h> void foo(void); int a; int bar(void) { foo(); bool const zero = a -= 1; asm volatile ("" : : : "cc"); foo(); if (zero) { return EXIT_FAILURE; } foo(); return EXIT_SUCCESS; }
Attachment:
clang-eflag.o
Description: application/object