On Tue, Jan 17, 2023 at 10:34 AM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Tue, Jan 17, 2023 at 10:26 AM Nick Desaulniers > <ndesaulniers@xxxxxxxxxx> wrote: > > > > On Tue, Jan 17, 2023 at 9:29 AM Linus Torvalds > > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > > > Side note: that's not something new or unusual. It's been the case > > > since I started testing clang - we have several code-paths where we > > > use "unlikely()" to try to get very unlikely cases to be out-of-line, > > > and clang just mostly ignores it, or treats it as a very weak hint. I > > > think the only way to get clang to treat it as a *strong* hint is to > > > use PGO. > > > > I'd be surprised if that were intentional or by design. > > > > Do you guys have a bug report we could look at? > > Heh. I actually sent you an example long ago. Let me go fish it out of > my mail archives and quote some of it below so that you can find it in > yours.. > > Linus > > [ Time passes. Found this email to you and Bill Wendling from Feb 16, > 2020, Message-ID > CAHk-=wigVshsByCMjkUiZyQSR5N5zi2aAeQc+VJCzQV=nm8E7g@xxxxxxxxxxxxxx ]: > > Anyway, I'm looking at clang code generation, and comparing it with > gcc on one of my "this has been optimized to hell and back" functions: > __d_lookup_rcu(). > > It looks like clang does frame pointers, and ignores our > likely/unlikely annotations. > > So this code: > > if (unlikely(parent->d_flags & DCACHE_OP_COMPARE)) { > int tlen; > const char *tname; > ...... > > doesn't actually jump out of line, but instead generates the unlikely > case as the fallthrough: > > testb $2, (%r12) > je .LBB50_9 > ... unlikely code goes here... Perhaps that was compiler version or config specific? $ make LLVM=1 -j128 defconfig fs/dcache.o $ llvm-objdump -d --no-show-raw-insn --disassemble-symbols=__d_lookup_rcu fs/dcache.o 0000000000003210 <__d_lookup_rcu>: 3210: endbr64 3214: pushq %rbp 3215: pushq %r15 3217: pushq %r14 3219: pushq %r12 321b: pushq %rbx 321c: testb $0x2, (%rdi) 321f: jne 0x32d7 <__d_lookup_rcu+0xc7> ... 32d7: popq %rbx 32d8: popq %r12 32da: popq %r14 32dc: popq %r15 32de: popq %rbp 32df: jmp 0x3300 <__d_lookup_rcu_op_compare> That looks like what you want, yeah? Your original report was from nearly 3 years ago; could have fixed a few instances of branch weights not getting propagated since then. What's going on in this case in this thread? I know we don't support hot/cold attributes on labels yet, but if static_branch_likely (or friends) is being used, we assign the indirect branches a 0% likeliness/branch-weight. > > and then the likely case ends up having unfortunate reloads inside the > hot loop. Possibly because it has one fewer free registers than gcc > because of the frame pointer. > > I didn't look into _why_ clang generates frame pointers but gcc > doesn't. It may be just a compiler default, I think we don't end up > explicitly asking either way. > > [ And then Bill replied with this ] > > It's not a no-op. We add branch probabilities to the IR, whether > they're honored or not depends on the transformation. But they > *should* be honored when available. I've seen in the past that instead > of moving unlikely blocks out of the way (like gcc, which moves them > below the function's "ret" instruction), LLVM will do something like > this: > > <normal code> > <jmp to loop conditional test> > <unlikely code> > <more likely code> > <loop conditional test> > <...> > > I.e. the loop is rotated and the unlikely code is first and the hotter > code is closer together but between the unlikely and conditional test. > Could this be what's going on? Otherwise, maybe clang decided that > it's not beneficial to move the code out-of-line because the benefit > was minimal? (I'm guessing here.) -- Thanks, ~Nick Desaulniers