Atomic operations on ARM

lepbtetfmvbz@xxxxxxxxxxxxx (Jyri Sarha) · Fri, 1 Feb 2008 15:22:14 +0200 (EET)

Hi,

I have written inline assembler implementations of the pa_atomic
operations for ARM6 and above. For compatibility with older
ARMs I have also written versions that use the ARM Linux kernel helper
functions (see http://0pointer.de/blog#atomic-rt).

Both implementations run almost perfectly now. However, there is one
thing to note about the compare-and-exchange implementations for
ARM6 and above. The semantics of the usual ARM ldrex-strexeq instruction
sequence are not identical to those of the x86 implementations of the same
thing, i.e. the exchange is not totally atomic after all.

The strexeq instruction has two conditions: equality with the old value
and exclusiveness of the operation (i.e. whether the value in memory was
touched between the ldrex and the strexeq). The operation fails if either of
these conditions fails, and the value in memory is then left unchanged. So it
is possible that the old-value condition is met but the exclusiveness
condition fails, even though the value now in memory would still meet the
old-value condition.

The above also applies to the kernel helper implementation of atomic
compare and exchange for ARM6 and above.
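
The failure mode described above can be illustrated in portable C11, where
atomic_compare_exchange_weak is allowed to fail spuriously in much the same
way. The following is only a sketch under that assumption; strong_cmpxchg is
a hypothetical name and not part of PulseAudio. It retries only when the
value seen in memory still equals the expected old value:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper, not PulseAudio code: builds a compare-and-exchange
 * that never fails spuriously out of one that may. */
static bool strong_cmpxchg(atomic_int *a, int old_i, int new_i) {
    int expected;

    do {
        expected = old_i;
        /* The weak variant may fail even if *a == old_i. */
        if (atomic_compare_exchange_weak(a, &expected, new_i))
            return true;
        /* On failure, expected now holds the value observed in memory. */
    } while (expected == old_i); /* spurious failure, so try again */

    return false; /* the value really differed from old_i */
}
```

With this shape the caller sees a failure only when the stored value was
genuinely different from old_i.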

Because of the above problem (I suspect), this assertion in pulsecore/async.c
sometimes fails under heavy load:
    /* Guaranteed to succeed if we only have a single reader */
    pa_assert_se(pa_atomic_ptr_cmpxchg(&cells[idx], ret, NULL));

The assertion failure has happened with both the kernel helper and the inline
asm versions (they are identical in an ARM6 environment anyway). The failures
are not very common, though.

The atomic compare and exchange can also be written so that it retries
the operation if the exclusiveness condition fails but the equality condition
was met, which would resemble real atomicity more closely. The inline
assembler version would then look like this:

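static inline int pa_atomic_cmpxchg(pa_atomic_t *a, int old_i, int new_i) {
    unsigned long not_equal, not_exclusive;

    pa_memory_barrier();
    do {
        /* %0 = not_exclusive, %1 = not_equal, %2 = &a->value,
         * %3 = old_i, %4 = new_i */
        __asm__ __volatile__("@ pa_atomic_cmpxchg\n"
                             "1: ldrex  %0, [%2]\n"      /* exclusive load of the current value */
                             "   subs   %0, %0, %3\n"    /* %0 = current - old_i, sets Z if equal */
                             "   mov    %1, %0\n"        /* keep the difference as not_equal */
                             "   strexeq %0, %4, [%2]\n" /* if equal: try the store, %0 = 0 on success */
                             : "=&r" (not_exclusive), "=&r" (not_equal)
                             : "r" (&a->value), "Ir" (old_i), "r" (new_i)
                             : "cc");
        /* Retry only if the values were equal but exclusiveness was lost. */
    } while(not_exclusive && !not_equal);
    pa_memory_barrier();

    return !not_equal;
}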

A similar kind of external loop can also be added around the kernel helper
function, but if the kernel helper in fact makes a system call, the loop is
unnecessary. I wonder if all this is worth the trouble.

So what should be done?

1. Change the above line in pulsecore/async.c to use pa_atomic_store
instead, and check whether there are other similar places.

2. Add loops like the one above to the ARM-specific implementations of
atomic compare and exchange.
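
For illustration, option 1 might look roughly like this when written with
C11 atomics. This is only a sketch under the single-reader assumption stated
in the comment in pulsecore/async.c; cell_take is a hypothetical name and
this is not the actual PulseAudio code:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical single-reader "take" of a cell. It assumes that only this
 * one reader ever resets a non-NULL cell to NULL, so a plain store is
 * enough and no compare-and-exchange is needed. */
static void *cell_take(_Atomic(void *) *cell) {
    void *ret = atomic_load(cell);

    if (ret != NULL)
        atomic_store(cell, NULL); /* cannot fail, unlike a cmpxchg */

    return ret;
}
```

A plain store has no failure case, so nothing is left to assert on.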

Anyway, I'll produce a proper ARM atomic ops patch as soon as I am happy
with it. However, it may take a while because I am still learning
the autoconf magic and I have some other tasks I should take care of too.

Cheers,
       Jyri

// Jyri Sarha -- my.name at nokia.com