Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jan 17, 2023 at 10:34 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, Jan 17, 2023 at 10:26 AM Nick Desaulniers
> <ndesaulniers@xxxxxxxxxx> wrote:
> >
> > On Tue, Jan 17, 2023 at 9:29 AM Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > Side note: that's not something new or unusual. It's been the case
> > > since I started testing clang - we have several code-paths where we
> > > use "unlikely()" to try to get very unlikely cases to be out-of-line,
> > > and clang just mostly ignores it, or treats it as a very weak hint. I
> > > think the only way to get clang to treat it as a *strong* hint is to
> > > use PGO.
> >
> > I'd be surprised if that were intentional or by design.
> >
> > Do you guys have a bug report we could look at?
>
> Heh. I actually sent you an example long ago. Let me go fish it out of
> my mail archives and quote some of it below so that you can find it in
> yours..
>
>               Linus
>
> [ Time passes. Found this email to you and Bill Wendling from Feb 16,
> 2020, Message-ID
> CAHk-=wigVshsByCMjkUiZyQSR5N5zi2aAeQc+VJCzQV=nm8E7g@xxxxxxxxxxxxxx ]:
>
>   Anyway, I'm looking at clang code generation, and comparing it with
>   gcc on one of my "this has been optimized to hell and back" functions:
>   __d_lookup_rcu().
>
>   It looks like clang does frame pointers, and ignores our
>   likely/unlikely annotations.
>
>   So this code:
>
>                 if (unlikely(parent->d_flags & DCACHE_OP_COMPARE)) {
>                         int tlen;
>                         const char *tname;
>                         ......
>
>   doesn't actually jump out of line, but instead generates the unlikely
>   case as the fallthrough:
>
>         testb   $2, (%r12)
>         je      .LBB50_9
>         ... unlikely code goes here...


Perhaps that was compiler version or config specific?

$ make LLVM=1 -j128 defconfig fs/dcache.o
$ llvm-objdump -d  --no-show-raw-insn
--disassemble-symbols=__d_lookup_rcu fs/dcache.o
0000000000003210 <__d_lookup_rcu>:
    3210:      endbr64
    3214:      pushq %rbp
    3215:      pushq %r15
    3217:      pushq %r14
    3219:      pushq %r12
    321b:      pushq %rbx
    321c:      testb $0x2, (%rdi)
    321f:      jne 0x32d7 <__d_lookup_rcu+0xc7>
...
    32d7:      popq %rbx
    32d8:      popq %r12
    32da:      popq %r14
    32dc:      popq %r15
    32de:      popq %rbp
    32df:      jmp 0x3300 <__d_lookup_rcu_op_compare>

That looks like what you want, yeah?

Your original report was from nearly 3 years ago; could have fixed a
few instances of branch weights not getting propagated since then.

What's going on in this case in this thread? I know we don't support
hot/cold attributes on labels yet, but if static_branch_likely (or
friends) is being used, we assign the indirect branches a 0%
likeliness/branch-weight.

>
>   and then the likely case ends up having unfortunate reloads inside the
>   hot loop. Possibly because it has one fewer free registers than gcc
>   because of the frame pointer.
>
>   I didn't look into _why_ clang generates frame pointers but gcc
>   doesn't. It may be just a compiler default, I think we don't end up
>   explicitly asking either way.
>
> [ And then Bill replied with this ]
>
>   It's not a no-op. We add branch probabilities to the IR, whether
>   they're honored or not depends on the transformation. But they
>   *should* be honored when available. I've seen in the past that instead
>   of moving unlikely blocks out of the way (like gcc, which moves them
>   below the function's "ret" instruction), LLVM will do something like
>   this:
>
>     <normal code>
>     <jmp to loop conditional test>
>         <unlikely code>
>         <more likely code>
>     <loop conditional test>
>     <...>
>
>   I.e. the loop is rotated and the unlikely code is first and the hotter
>   code is closer together but between the unlikely and conditional test.
>   Could this be what's going on? Otherwise, maybe clang decided that
>   it's not beneficial to move the code out-of-line because the benefit
>   was minimal? (I'm guessing here.)



-- 
Thanks,
~Nick Desaulniers




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux