Re: [PATCH] alpha: simplify and optimize sched_find_first_bit

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Apr 8, 2010 at 3:15 PM, Richard Henderson <rth@xxxxxxxxxxx> wrote:
> On 04/08/2010 11:34 AM, mattst88@xxxxxxxxx wrote:
>> +     asm(
>> +             "cmoveq %0,64,%1        # ofs = (b[0] ? ofs : 64);\n"
>> +             "cmoveq %0,%2,%0        # temp = (b[0] ? b[0] : b[1]);\n"
>> +             "cttz   %0,%0           # output = cttz(temp);\n "
>> +             : "=r" (output), "=r" (ofs)
>> +             : "r" (b[1]), "0" (b[0]), "1" (0)
>
> I must say I'd also prefer a comment like
>
>        /* This is equivalent to
>                ofs = (b[0] ? 0 : 64);
>                tmp = (b[0] ? b[0] : b[1]);
>           but is a bit faster than what GCC would produce on its own.  */
>        asm("cmoveq %0,64,%1\n\tcmoveq %0,%2,%0"
>            : "=r"(output), "=r"(ofs)
>            : "r"(b[1]), "0"(b[0]), "1"(0));
>
> ... except that I can't see that it is, at least for mainline gcc.
>
> [anchor:~] cat z.c
> long foo(const unsigned long *b)
> {
>  unsigned long b0, b1, ofs, tmp;
>
>  b0 = b[0];
>  b1 = b[1];
>  ofs = (b0 ? 0 : 64);
>  tmp = (b0 ? b0 : b1);
>
>  /* tmp = __ffs(tmp); -- elided for clarity wrt ev5 vs ev67 */
>  return tmp + ofs;
> }
>
> -mcpu=ev5 -Os (to avoid nop padding):
>        ldq $2,0($16)
>        ldq $1,8($16)
>        lda $0,64($31)
>        cmovne $2,0,$0
>        cmovne $2,$2,$1
>        addq $0,$1,$0
>
> -mcpu=ev6 -Os:
>        ldq $0,0($16)
>        ldq $1,8($16)
>        cmovne $0,$0,$1
>        cmpeq $0,0,$0
>        sll $0,6,$0
>        addq $0,$1,$0
>
> I seem to recall that cmov is slightly more expensive on ev6,
> so gcc doesn't prefer it and came up with an equivalent using
> cmpeq+sll.
>
> If some previous version of gcc isn't so smart, I'm ok with
> continuing to use the asm.
>
>
> r~
>

So with your test program, the code generation results are:

4.3.4 -Os: good
4.3.4 -O1: bad
4.3.4 -O2: good
4.3.4 -O3: good

4.4.3 -Os: bad
4.4.3 -O1: bad
4.4.3 -O2: bad
4.4.3 -O3: good

4.5.0 -Os: good
4.5.0 -O1: bad
4.5.0 -O2: good
4.5.0 -O3: good

 o -O3 is produces good code in all versions.
 o -O1 is bad in all versions.
 o -Os and -O2 regressed from 4.3.4 to 4.4.3,
   but are back to 4.3.4 quality as of 4.5.0.

All produced cmov instructions just as you said.

My patch doesn't help any of the bad cases and even causes some that
were good to produce worse code, so it's not useful. Does any of this
look like it should warrant a gcc bug report, Richard?

I'll send a patch just to update sched_find_first_bit to search just
the first 100-bits.

Thanks!
Matt

Attachment: test
Description: Binary data


[Index of Archives]     [Netdev]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux