Re: On ARM atomics

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 13 Nov 2012, Jon Masters wrote:

> Hi everyone,
> 
> I wrote the following little summary of ARM atomics:
> 
> http://www.jonmasters.org/blog/2012/11/13/arm-atomic-operations/
> 
> It's not terribly comprehensive, but you could use it to understand what
> I did in OpenMPI, and how to poke at other packages.

There are several serious mistakes in your assembly example.

|START_FUNC(opal_atomic_add_32)
|       push    {r4-r7}

Why r4 to r7?  You're using only r4 below.

|       mov     r4, r1
|       mov     r2, r0
|       mov     r12, lr

r12 is not a good choice since it is allowed for callees to trash 
it.  Didn't r12 almost kill you in your sleep already anyway?  :-)

A good thing to know is that r12 is also known as ip (which doesn't mean 
"instruction pointer" on ARM).  And the __kuser_cmpxchg is documented as 
clobbering ip aka r12.  The only reason why this might have worked for 
you is because the current implementation does not actually clobber r12, 
but that is not guaranteed to ever stay that way, especially when the 
documentation is explicit about it.

The best would have been to use "push {r4, lr}" initially.

|       ldr     r3, REFLSYM(5)
|       LSYM(4)
|       ldr     r0, [r2]
|       add     r1, r0, r4
|       blx     r3
|       cmp     r0, #0

The __kuser_cmpxchg documentation also defines the C flag as being set 
on success.  So it is possible to optimize this slightly by omiting the 
cmp here.

|       movne   r0, #1

Why?  You are going to reload the value to increment into r0 after the 
following instruction anyway.

|       bne     REFLSYM(4)

Calling into __kuser_cmpxchg also clobbers r3 so your "blx r3" will 
branch into lalaland the second time around.

|       mov     lr, r12
|       pop     {r4-r7}
|       bx      lr
|       LSYM(5)
|       .align  2
|       .word   -61504

You put the label before the .align statement.  If ever some padding is 
inserted to satisfy the requested alignment, the label will refer to the 
unaligned padding value and not to the desired value.

Also, what is the point of using -61504 when 0xffff0fc0 is much less 
obscur?

|END_FUNC(opal_atomic_add_32)

So... I'd suggest this version instead:

START_FUNC(opal_atomic_add_32)
        push    {r4, lr}
        mov     r4, r1
        mov     r2, r0
        LSYM(4)
        ldr     r0, [r2]
        ldr     r3, REFLSYM(5)
        add     r1, r0, r4
        blx     r3
	bcc     REFLSYM(4)
	pop     {r4, lr}
        bx      lr
        .align  2
        LSYM(5)
        .word   0xffff0fc0
END_FUNC(opal_atomic_add_32)


Nicolas
_______________________________________________
arm mailing list
arm@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/arm



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux ARM (Vger)]     [Linux ARM]     [ARM Kernel]     [Fedora User Discussion]     [Older Fedora Users Discussion]     [Fedora Advisory Board]     [Fedora Security]     [Fedora Maintainers]     [Fedora Devel Java]     [Fedora Legacy]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Mentors]     [Fedora Package Announce]     [Fedora Package Review]     [Fedora Music]     [Fedora Packaging]     [Centos]     [Fedora SELinux]     [Coolkey]     [Yum Users]     [Tux]     [Yosemite News]     [Linux Apps]     [KDE Users]     [Fedora Tools]     [Fedora Art]     [Fedora Docs]     [Asterisk PBX]

Powered by Linux