Re: [PATCH] Improve atomic.h implementation robustness

"Maciej W. Rozycki" <macro@xxxxxxxxxxxxxx> · Wed, 1 Dec 2004 21:50:45 +0000 (GMT)

On Wed, 1 Dec 2004, Ralf Baechle wrote:

> this problem here is specific to inline assembler.  The splitlock code for
> a reasonable CPU is:
> 
> static __inline__ void atomic_add(int i, atomic_t * v)
> {
>         unsigned long temp;
> 
>         __asm__ __volatile__(
>         "1:     ll      %0, %1          # atomic_add            \n"
>         "       addu    %0, %2                                  \n"
>         "       sc      %0, %1                                  \n"
>         "       beqz    %0, 1b                                  \n"
>         : "=&r" (temp), "=m" (v->counter)
>         : "Ir" (i), "m" (v->counter));
> }
> 
> For the average atomic op the generated code is going to look roughly like:
> 
> 80100634:       lui     a0,0x802c
> 80100638:       ll      a0,-24160(a0)
> 8010063c:       addu    a0,a0,v0
> 80100640:       lui     at,0x802c
> 80100644:       addu    at,at,v1
> 80100648:       sc      a0,-24160(at)
> 8010064c:       beqz    a0,80100634 <init+0x194>
> 80100650:       nop
> 
> It's significantly worse for 64-bit due to the excessive code sequence
> generated for loading a 64-bit address.  One outside CKSEGx that is.

 Only for old compilers.  For current (>= 3.4) ones you can use the "R"
constraint and get exactly what you need.  Rewriting inline asms to use
"R" for GCC >= 3.4 has actually been on my to-do list for some time,
predating even the current working implementation.
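
 A minimal sketch of what such a rewrite might look like, assuming the
same atomic_t layout as in the quoted code.  Only the memory constraints
change from "m" to "R"; the non-MIPS fallback is not part of any proposal
and is there purely so the example builds on other hosts.  The MIPS path
has not been verified against any particular compiler version.

```c
typedef struct { volatile int counter; } atomic_t;

/*
 * Sketch only.  On MIPS the "R" constraint asks GCC for an address
 * usable in a single (non-macro) load or store, so ll and sc should
 * each assemble to one instruction with a base register and offset.
 */
static inline void atomic_add(int i, atomic_t *v)
{
#if defined(__mips__)
	unsigned long temp;

	__asm__ __volatile__(
	"1:	ll	%0, %1		# atomic_add	\n"
	"	addu	%0, %2				\n"
	"	sc	%0, %1				\n"
	"	beqz	%0, 1b				\n"
	: "=&r" (temp), "=R" (v->counter)
	: "Ir" (i), "R" (v->counter));
#else
	/* Assumption: fallback for non-MIPS hosts, illustration only. */
	__sync_fetch_and_add(&v->counter, i);
#endif
}
```

 Callers are unchanged, e.g. atomic_add(1, &v) for some atomic_t v.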

> On 32-bit Thiemo's patch would cut that down to something like:
> 
> 80100630:       lui     t0,0x802c
> 80100634:       addiu   t0,t0,-24160
> 80100638:       ll      a0,0(t0)
> 8010063c:       addu    a0,a0,v0
> 80100648:       sc      a0,0(t0)
> 8010064c:       beqz    a0,80100638 <init+0x194>
> 80100650:       nop

 Plus it clobbers memory requiring a writeback and a refetch of all
unrelated variables that have happened to be cached in registers.
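
 To illustrate the cost, here is a small sketch; the function and
variable names are made up for the example and the asm body is left
empty so that it builds on any GCC target.  Both functions return the
same value; the difference is only in what the compiler is allowed to
keep in registers across the asm, which is best seen in the generated
code at -O2.

```c
/* Hypothetical variable standing in for "unrelated" cached state. */
int unrelated = 7;

/* Only *counter is named as an operand, so the compiler is free to
 * keep `unrelated` in a register across the asm statement. */
static inline int touch_operand_only(int *counter)
{
	int before = unrelated;

	__asm__ __volatile__("" : "+m" (*counter));
	return before + unrelated;	/* may reuse the cached value */
}

/* The "memory" clobber tells the compiler that any memory may have
 * changed, so `unrelated` has to be fetched again after the asm. */
static inline int touch_with_clobber(int *counter)
{
	int before = unrelated;

	__asm__ __volatile__("" : "+m" (*counter) : : "memory");
	return before + unrelated;	/* has to be refetched */
}
```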

> On 64-bit the savings would be even more significant.  But what we actually
> want would be using the "o" constraint.  Which, at least on the
> compilers where I've tried it, didn't produce code any different from "m".

 No surprise as the "o" constraint doesn't mean anything particular for
MIPS.  All addresses are offsettable -- there is no addressing mode that
would preclude it, so "o" is exactly the same as "m".

> The expected code would be something like:
> 
> 80100634:       lui     t0,0x802c
> 80100638:       ll      a0,-24160(t0)
> 8010063c:       addu    a0,a0,v0
> 80100648:       sc      a0,-24160(t0)
> 8010064c:       beqz    a0,80100634 <init+0x194>
> 80100650:       nop
> 
> So another instruction less.

 That's exactly what's emitted with "R".  Should I accelerate my work on
it?  It's nothing that would require a lot of effort -- it's more boring 
than challenging.

  Maciej