On 11/13/2012 01:24 AM, Nicolas Pitre wrote: > There are several serious mistakes in your assembly example. Thanks for the feedback! Appreciated. Comments below... > |START_FUNC(opal_atomic_add_32) > | push {r4-r7} > > Why r4 to r7? You're using only r4 below. Ah. Because in the rest of the OpenMPI code they're doing similar on function entry (saving all 4 of the variable registers). You're right that I only use one of those registers so I could have not bothered to save all 4 of them. Similar reasoning applies to the way I referred to the kernel address for the helpers using a negative number, but you're right that I can fix that too. So let's do that while doing the below. > | mov r4, r1 > | mov r2, r0 > | mov r12, lr > > r12 is not a good choice since it is allowed for callees to trash > it. Didn't r12 almost kill you in your sleep already anyway? :-) In this particular case it's not with the current kernel implementation, but you're right. I even knew that, and yet did it anyway, which was bad, sorry. I'll fix that and send an updated patch out, based on your example, which is more efficient. > Calling into __kuser_cmpxchg also clobbers r3 so your "blx r3" will > branch into lalaland the second time around. This and the r12 use are the ones that bother me the most as they're actual bugs. This will teach me to do this at 4am. > So... I'd suggest this version instead: > > START_FUNC(opal_atomic_add_32) > push {r4, lr} > mov r4, r1 > mov r2, r0 > LSYM(4) > ldr r0, [r2] > ldr r3, REFLSYM(5) > add r1, r0, r4 > blx r3 > bcc REFLSYM(4) > pop {r4, lr} > bx lr > .align 2 > LSYM(5) > .word 0xffff0fc0 > END_FUNC(opal_atomic_add_32) Thanks. I'll fix the writeup, and send out an updated patch. Jon. _______________________________________________ arm mailing list arm@xxxxxxxxxxxxxxxxxxxxxxx https://admin.fedoraproject.org/mailman/listinfo/arm