> The question is "Are you OK with the existing ABI?" :-)

No.  As I understand it, r2 doesn't need to be clobbered because glibc
doesn't currently clobber it.  So, using it in the LWS code would cause
an ABI break.  That's one register back to userspace.

I want to keep r19 and r27 for userspace so the PIC register doesn't
have to be saved and restored in the asm (linux-atomic.c is compiled as
PIC code).  You can have r29.  That leaves three free registers for the
LWS code: r22, r23 and r29.

The LWS ABI has r1, r20-r26 and r28-r31.  Userspace has two
call-clobbered registers free across the asm in PIC code, and three in
non-PIC code.  That's enough to efficiently perform the error
comparisons.

The asm would be more efficient if the registers used for lws_mem,
lws_old and lws_new were not written to.  This occurs only for the call
in the 32-bit runtime with a 64-bit kernel.  As it stands, the lws_mem,
lws_old and lws_new arguments get reloaded every time around the EAGAIN
loop.

This is the crucial code in the compare and swap:

	/* The load and store could fail */
1:	ldw	0(%sr3,%r26), %r28
	sub,<>	%r28, %r25, %r0
2:	stw	%r24, 0(%sr3,%r26)

The sub,<> instruction uses a 32-bit compare/subtract condition, so the
clipping of r25 isn't necessary.  Similarly, the stw instruction ignores
the most significant 32 bits of r24.  The value in r26 needs clipping,
but you have three free registers, and it looks like r1 is also free at
this point in the code.  You can deposit the least significant 32 bits
of r26 into a field of zeros in another register in one instruction.

It looks like lws_compare_and_swap64 and lws_compare_and_swap32 become
more or less functionally identical.  The above would become something
like:

#ifdef CONFIG_64BIT
	depd,z	%r26,63,32,%r1
1:	ldw	0(%sr3,%r1), %r28
	sub,<>	%r28, %r25, %r0
2:	stw	%r24, 0(%sr3,%r1)
#else
1:	ldw	0(%sr3,%r26), %r28
	sub,<>	%r28, %r25, %r0
2:	stw	%r24, 0(%sr3,%r26)
#endif

The argument clipping in the current code would be removed.
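To make the point about which arguments need clipping concrete, here is
a rough C model of the sequence.  It is only a sketch under stated
assumptions: the names clip_low32, equal_low32 and model_cas32 are made
up for illustration, "memory" is a plain array indexed by word rather
than a byte address in %sr3 space, and the faulting and EAGAIN paths are
not modelled at all.

```c
#include <stdint.h>

/* Hypothetical model of "depd,z %r26,63,32,%r1": keep the least
   significant 32 bits of a 64-bit register, zero everything else. */
static uint64_t clip_low32(uint64_t reg)
{
    return reg & UINT64_C(0xffffffff);
}

/* Hypothetical model of the 32-bit condition on "sub,<>": only the
   low 32 bits of the two registers decide whether they are equal. */
static int equal_low32(uint64_t a, uint64_t b)
{
    return (uint32_t)a == (uint32_t)b;
}

/* Hypothetical model of the ldw / sub,<> / stw sequence.  Only the
   address register is clipped; r25 (old) and r24 (new) are used as
   they arrive.  Assumption: r26 is treated as a word index into
   "memory" to keep the sketch self-contained.  Returns the value
   that would be left in r28. */
static uint32_t model_cas32(uint32_t *memory, uint64_t r26,
                            uint64_t r25, uint64_t r24)
{
    uint64_t r1 = clip_low32(r26);      /* depd,z %r26,63,32,%r1 */
    uint64_t r28 = memory[r1];          /* ldw zero-extends the word */

    if (equal_low32(r28, r25))          /* sub,<> nullifies stw on mismatch */
        memory[r1] = (uint32_t)r24;     /* stw writes the low 32 bits only */

    return (uint32_t)r28;
}
```

Calling model_cas32 with garbage in the upper halves of all three
register arguments gives the same result as calling it with clean
values, which is the property the asm relies on.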
With the argument clipping removed, the branch to lws_compare_and_swap
can be eliminated in the 64-bit path.  It's my impression that the
tightness of the loop for the compare/exchange operation is important.

Dave
--
J. David Anglin                                  dave.anglin@xxxxxxxxxxxxxx
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)