On Tue, 13 Nov 2012, Jon Masters wrote: > Hi everyone, > > I wrote the following little summary of ARM atomics: > > http://www.jonmasters.org/blog/2012/11/13/arm-atomic-operations/ > > It's not terribly comprehensive, but you could use it to understand what > I did in OpenMPI, and how to poke at other packages. There are several serious mistakes in your assembly example. |START_FUNC(opal_atomic_add_32) | push {r4-r7} Why r4 to r7? You're using only r4 below. | mov r4, r1 | mov r2, r0 | mov r12, lr r12 is not a good choice since it is allowed for callees to trash it. Didn't r12 almost kill you in your sleep already anyway? :-) A good thing to know is that r12 is also known as ip (which doesn't mean "instruction pointer" on ARM). And the __kuser_cmpxchg is documented as clobbering ip aka r12. The only reason why this might have worked for you is because the current implementation does not actually clobber r12, but that is not guaranteed to ever stay that way, especially when the documentation is explicit about it. The best would have been to use "push {r4, lr}" initially. | ldr r3, REFLSYM(5) | LSYM(4) | ldr r0, [r2] | add r1, r0, r4 | blx r3 | cmp r0, #0 The __kuser_cmpxchg documentation also defines the C flag as being set on success. So it is possible to optimize this slightly by omiting the cmp here. | movne r0, #1 Why? You are going to reload the value to increment into r0 after the following instruction anyway. | bne REFLSYM(4) Calling into __kuser_cmpxchg also clobbers r3 so your "blx r3" will branch into lalaland the second time around. | mov lr, r12 | pop {r4-r7} | bx lr | LSYM(5) | .align 2 | .word -61504 You put the label before the .align statement. If ever some padding is inserted to satisfy the requested alignment, the label will refer to the unaligned padding value and not to the desired value. Also, what is the point of using -61504 when 0xffff0fc0 is much less obscur? |END_FUNC(opal_atomic_add_32) So... I'd suggest this version instead: START_FUNC(opal_atomic_add_32) push {r4, lr} mov r4, r1 mov r2, r0 LSYM(4) ldr r0, [r2] ldr r3, REFLSYM(5) add r1, r0, r4 blx r3 bcc REFLSYM(4) pop {r4, lr} bx lr .align 2 LSYM(5) .word 0xffff0fc0 END_FUNC(opal_atomic_add_32) Nicolas _______________________________________________ arm mailing list arm@xxxxxxxxxxxxxxxxxxxxxxx https://admin.fedoraproject.org/mailman/listinfo/arm