On Wed, 1 Dec 2004, Ralf Baechle wrote:

> this problem here is specific to inline assembler.  The splitlock code for
> a reasonable CPU is:
>
> static __inline__ void atomic_add(int i, atomic_t * v)
> {
>         unsigned long temp;
>
>         __asm__ __volatile__(
>         "1:     ll      %0, %1          # atomic_add    \n"
>         "       addu    %0, %2                          \n"
>         "       sc      %0, %1                          \n"
>         "       beqz    %0, 1b                          \n"
>         : "=&r" (temp), "=m" (v->counter)
>         : "Ir" (i), "m" (v->counter));
> }
>
> For the average atomic op generated code is going to look about like:
>
> 80100634:       lui     a0,0x802c
> 80100638:       ll      a0,-24160(a0)
> 8010063c:       addu    a0,a0,v0
> 80100640:       lui     at,0x802c
> 80100644:       addu    at,at,v1
> 80100648:       sc      a0,-24160(at)
> 8010064c:       beqz    a0,80100634 <init+0x194>
> 80100650:       nop
>
> It's significantly worse for 64-bit due to the excessive code sequence
> generated for loading a 64-bit address.  One outside CKSEGx that is.

 Only for old compilers.  For current (>= 3.4) ones you can use the "R"
constraint and get exactly what you need.  Rewriting inline asms to use
"R" for GCC >= 3.4 has actually been on my to-do list for some time;
predating the current working implementation even.

> On 32-bit Thiemo's patch would cut that down to something like:
>
> 80100630:       lui     t0,0x802c
> 80100634:       addiu   t0,t0,-24160
> 80100638:       ll      a0,0(t0)
> 8010063c:       addu    a0,a0,v0
> 80100648:       sc      a0,0(t0)
> 8010064c:       beqz    a0,80100638 <init+0x194>
> 80100650:       nop

 Plus it clobbers memory, requiring a writeback and a refetch of all
unrelated variables that happen to be cached in registers.

> On 64-bit the savings would be even more significant.  But what we actually
> want would be using the "o" constraint.  Which just at least on the
> compilers where I've tried it, didn't produce code any different from "m".

 No surprise, as the "o" constraint doesn't mean anything particular for
MIPS.  All addresses are offsettable -- there is no addressing mode that
would preclude it, so "o" is exactly the same as "m".

> The expected code would be something like:
>
> 80100634:       lui     t0,0x802c
> 80100638:       ll      a0,-24160(t0)
> 8010063c:       addu    a0,a0,v0
> 80100648:       sc      a0,-24160(t0)
> 8010064c:       beqz    a0,80100634 <init+0x194>
> 80100650:       nop
>
> So another instruction less.

 That's exactly what's emitted with "R".  Should I accelerate my work on
it?  It's nothing that would require a lot of effort -- it's more boring
than challenging.

  Maciej
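
 For illustration only, here is a rough sketch of how the quoted
atomic_add() might look with "R" in place of "m".  It is not a tested
patch: the atomic_t stand-in, the non-MIPS fallback and the
atomic_add_selftest() helper are assumptions added so the fragment
builds on its own, and the "=R"/"R" pairing simply mirrors the
"=m"/"m" pairing of the original.

```c
#include <assert.h>

/* Stand-in for the kernel's atomic_t so the sketch is self-contained. */
typedef struct { volatile int counter; } atomic_t;

static inline void atomic_add(int i, atomic_t *v)
{
#if defined(__mips__)
	unsigned long temp;

	/*
	 * "R" asks GCC (>= 3.4 assumed) for an address usable by a single
	 * non-macro load or store, so %1 is printed as offset(base) and
	 * ll/sc need no extra address arithmetic from the assembler.
	 */
	__asm__ __volatile__(
	"1:	ll	%0, %1		# atomic_add	\n"
	"	addu	%0, %2				\n"
	"	sc	%0, %1				\n"
	"	beqz	%0, 1b				\n"
	: "=&r" (temp), "=R" (v->counter)
	: "Ir" (i), "R" (v->counter));
#else
	/* Assumption: compiler builtin so the sketch builds on other hosts. */
	__atomic_fetch_add(&v->counter, i, __ATOMIC_RELAXED);
#endif
}

/* Hypothetical self-check, not part of any kernel interface. */
static inline void atomic_add_selftest(void)
{
	atomic_t v = { 40 };

	atomic_add(2, &v);
	assert(v.counter == 42);
}
```

 Whether the result matches the expected listing is best confirmed by
disassembling the object file, as the exact output depends on the
compiler version and flags.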