Re: Atomic accesses on ARM microcontrollers

Hi,

Thanks for trying to help here, though I think perhaps we are talking
slightly at cross-purposes.

On 09/10/2020 23:35, Toby Douglass wrote:
> On 09/10/2020 20:28, David Brown wrote:
> 
> Hi, David.
> 
> I would like - but cannot - reply to the list, as their email server
> does not handle encrypted email.

I've put the help list on the cc to my reply - I assume that's okay for
you.  (Your email to me was not encrypted, unless I am missing something.)

> 
>> I work primarily with microcontrollers, with 32-bit ARM Cortex-M devices
>> being the most common these days.  I've been trying out atomics in gcc,
>> and I find it badly lacking.
> 
> The 4.1.2 atomics or the later, replacement API?
> 

I am not sure what you mean here, or what "4.1.2" refers to - it doesn't
match either the gcc manual or the C standards as far as I can see.

>> (I've tried C11 <stdatomic.h>, C++11
>> <atomic>, and the gcc builtins - they all generate the same results,
>> which is to be expected.)  I'm concentrating on plain loads and stores
>> at the moment, not other atomic operations.
> 
> Now, it's been about two years since I was working on this stuff, so I
> may well be wrong, but I recall there's no such thing as an actual,
> simple, atomic load or store.
> 
> You can issue a load, or a store, and you can control the order in which
> events occur around it, and you can also force the load or store to
> complete by issuing a later operation which forces the load or store to
> be completed - so there's not an actual, direct, "atomic load" or
> "atomic store".

Yes, I know that atomics are used like this to correlate operations
between different threads and ensure specific orders.  And they are
vital for that purpose.

However, "atomic" also has a simpler, more fundamental and clearer
meaning with a wider applicability - it means an operation that cannot
be divided (or at least, cannot be /observed/ to be divided).  This is
the meaning that is important to me here.  And yes, you /can/ describe
this in terms of loads and stores without any reference to ordering or
other aspects.  What it means is that if thread A stores a value in the
atomic variable ax, and thread B attempts to read the value in ax, then
B will read either the entire old value before the write, or the entire
new value after the write - it will never read an inconsistent partial
write.
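
In C11 terms, this "indivisible but unordered" access is roughly an atomic load or store with memory_order_relaxed. Below is a minimal sketch. The variable ax comes from the description above, and the function names ax_store and ax_load are invented for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative shared variable, written by one context and read by
   another (e.g. main code and an interrupt handler). */
static _Atomic uint32_t ax;

/* Store: memory_order_relaxed asks only for indivisibility, with no
   ordering guarantees relative to other variables. */
void ax_store(uint32_t v)
{
    atomic_store_explicit(&ax, v, memory_order_relaxed);
}

/* Load: returns either the entire old value or the entire new value,
   never a mix of the two. */
uint32_t ax_load(void)
{
    return atomic_load_explicit(&ax, memory_order_relaxed);
}
```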

Other atomic operations require atomic read-modify-write semantics, or
require ordering of operations on different objects.  But for many uses,
simple atomic loads and stores are enough.

> 
>> These microcontrollers are all single core, so memory ordering does not
>> matter.
> 
> I am not sure this is true.  A single thread must make the world appear
> as if events occur in the order specified in the source code, but I bet
> you this is already not true for interrupts.
> 

It is true even for interrupts.

In any single processor core, regardless of any re-ordering done by the
cpu, the operations will be carried out logically in the order they are
given.  Any write operation followed by a read operation (to the same
address) will result in the read giving the value written.

This is not necessarily true for different cores (including virtual
cores on SMT systems) - ensuring that each core has a synchronised view
of the other cores' write buffers, instruction re-ordering, etc., would
severely limit performance.  That's why you need memory ordering atomics
on multi-core systems, but not on single-core systems.

(Even on a single core, there can be other memory masters such as DMA
that complicate orderings - but that's a different matter, and handled
in a different manner.  C11/C++11 atomics are neither necessary nor
sufficient for non-cpu memory masters.)

Interrupts, with few exceptions, come either before or after an
instruction has executed.  (Some cpus support interruptible and
resumable instructions - for the Cortex M, that applies to load/store
multiple registers.  Some support restartable instructions - for the
Cortex M, that includes division and load/store double register.)  The
observable behaviour of an interrupt is basically like inserting a "call
to subroutine" instruction in the middle of the normal logical
instruction stream.
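
A small simulation shows what this means for a type wider than the core's native accesses. This is plain host-side C, not Cortex-M code. A 64-bit value is kept as two 32-bit words, and the "interrupt" is an explicit call placed between the two loads, which mirrors the subroutine-call view of interrupts described above. All names are hypothetical.

```c
#include <stdint.h>

/* Simulation only: a 64-bit value held as two 32-bit words, as on a
   32-bit core that reads it with two separate loads. */
static volatile uint32_t shared_lo, shared_hi;

/* Stands in for an interrupt handler that writes a new value. */
static void fake_isr(uint64_t v)
{
    shared_lo = (uint32_t)v;
    shared_hi = (uint32_t)(v >> 32);
}

/* Reads the value with two loads.  If interrupt_between is non-zero,
   the "interrupt" runs after the low word has been read, like a
   subroutine call inserted into the instruction stream. */
uint64_t split_read(int interrupt_between, uint64_t isr_value)
{
    uint32_t lo = shared_lo;
    if (interrupt_between)
        fake_isr(isr_value);
    uint32_t hi = shared_hi;
    return ((uint64_t)hi << 32) | lo;
}

/* Sets the starting value (outside of any simulated read). */
void set_shared(uint64_t v)
{
    fake_isr(v);
}
```

Suppose the old value is 0x00000000FFFFFFFF and the "interrupt" writes 0x0000000100000000. The reader then gets 0x00000001FFFFFFFF, which is neither the old value nor the new one.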


>> For 8-bit, 16-bit and 32-bit types, atomic accesses are just simple
>> loads and stores.  These are generated fine.
> 
> I wonder if they really are.

They are.
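
For sizes the core can load or store in one instruction, a relaxed atomic access should need nothing beyond the plain access. The sketch below has a relaxed 32-bit load and a runtime query of what the implementation reports as lock-free. The function names are illustrative. The query is a useful cross-check, but lock-freedom covers all operations on the object, including read-modify-write. So on cores without exclusive-access instructions it may report false even though plain loads and stores are single instructions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Relaxed 32-bit load.  On a 32-bit Cortex-M, GCC is expected to emit
   a single LDR here, with no barrier and no library call. */
uint32_t load32(_Atomic uint32_t *p)
{
    return atomic_load_explicit(p, memory_order_relaxed);
}

/* Reports whether 8-, 16- and 32-bit atomic objects are lock-free on
   this target (for all operations, not only loads and stores). */
bool small_atomics_lock_free(void)
{
    static _Atomic uint8_t  a8;
    static _Atomic uint16_t a16;
    static _Atomic uint32_t a32;
    return atomic_is_lock_free(&a8) &&
           atomic_is_lock_free(&a16) &&
           atomic_is_lock_free(&a32);
}
```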

>  It may be for example they can be
> re-ordered with regard to each other, and this is not being prevented. 

Do you mean the kind of re-ordering the compiler does for code?  That is
not in question here - at least, not to me.  I know what kinds of
reorders are done, and how to prevent them if necessary.  (On a single
core, "volatile" is all you need - though there are more efficient ways.
 One of the reasons for wanting to use C11/C++11 atomics is to be able
to control order as I want.)  But as I said earlier, I am concerned here
primarily with the atomicity of the accesses, not their order.
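
The parenthetical above says volatile is enough on a single core but that more efficient options exist. The sketch below shows one such option. It is an assumption and not necessarily what the author had in mind. It combines relaxed atomics with atomic_signal_fence. In C11, that fence orders accesses only against a signal handler running in the same thread, and GCC normally implements it as a compiler-only barrier with no hardware barrier instruction. Treating an interrupt handler like a signal handler is also an assumption, and it is reasonable only for the single-core case discussed here. The names payload, ready, publish and try_consume are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical data handed from main code to an interrupt handler. */
static uint32_t payload;
static _Atomic uint32_t ready;

/* Main code: write the payload, then set the flag.  The signal fence
   stops the compiler from moving the payload write past the flag. */
void publish(uint32_t v)
{
    payload = v;
    atomic_signal_fence(memory_order_release);
    atomic_store_explicit(&ready, 1u, memory_order_relaxed);
}

/* Handler side (called directly here for illustration): returns 1 and
   the payload if one was published, otherwise 0. */
int try_consume(uint32_t *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    atomic_signal_fence(memory_order_acquire);
    *out = payload;
    atomic_store_explicit(&ready, 0u, memory_order_relaxed);
    return 1;
}
```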

And while the cpu and memory system can include store buffers,
caches, etc., that can affect the order of data hitting the memory,
these are not an issue in a single core system.  (They /are/ important
for multi-core systems.)

> Also, I still don't quite think there *are* atomic loads/stores as such
> - although having said that I'm now remembering the LOCK prefix on
> Intel, which might be usable with a load.  That would then lock the
> cache line and load - but, ah yes, it doesn't *mean* anything to
> atomically load.  The very next micro-second your value could be replaced
> by a new write.

Replacing values is not an issue.  The important part is the atomicity
of the action.  When thread A reads variable ax, it doesn't matter if
thread B (or an interrupt, or whatever) has changed ax just before the
read, or just after the read - it matters that it cannot change it
/during/ the read.  The key is /consistent/ values, not most up-to-date
values.
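
The sketch below illustrates "consistent, not most up-to-date" with a technique different from the interrupt-masking and LDRD approaches in this thread. The reader re-reads a multi-word value until nothing changed in between, using a sequence counter that the writer bumps after each update. This is similar in spirit to Linux seqlocks. The sketch assumes a single core where the writer runs only in an interrupt handler, so main code can never observe the writer half-way through. All names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical 64-bit value, written only by an interrupt handler. */
static volatile uint32_t val_lo, val_hi;
static volatile uint32_t seq;     /* incremented after every update */

/* Writer (interrupt context).  On a single core, main code cannot run
   in the middle of it, so the writer needs no protection of its own. */
void isr_update(uint64_t v)
{
    val_lo = (uint32_t)v;
    val_hi = (uint32_t)(v >> 32);
    seq = seq + 1;
}

/* Reader (main context).  If an update happened while the two halves
   were being read, the sequence number changed, so read again.  The
   result may already be stale when used, but it is never torn. */
uint64_t main_read(void)
{
    uint32_t before, after, lo, hi;
    do {
        before = seq;
        lo = val_lo;
        hi = val_hi;
        after = seq;
    } while (before != after);
    return ((uint64_t)hi << 32) | lo;
}
```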

> 
>> But for 64-bit and above, there are library calls to a compiler-provided
>> library.
> 
> Oh ho ho ho yes.  This is why I had to roll my own.  When the processor
> doesn't do what the API offers, rather than say no, a *NON LOCK FREE
> ALTERNATIVE IS USED* - and this is WRONG.
> 
>> For larger types, the situation is far, far worse.  Not only is the
>> library code inefficient on these devices (disabling and re-enabling
>> global interrupts is the optimal solution in most cases, with load/store
>> with reservation being a second option), but it is /wrong/.  The library
>> uses spin locks (AFAICS) - on a single core system, that generally means
>> deadlocking the processor.  That is worse than useless.
>>
>> Is there any way I can replace this library with my own code here, while
>> still using the language atomics?
> 
> Sounds terrifying.
> 
> Have a look here;
> 
> https://www.liblfds.org
> 
> Download the latest version, and have a look at the atomic abstraction
> header for ARM32.  It may have what you need.

I had a look through the github sources, but could not find anything
relevant.  But obviously that library has a lot more code and features
than I am looking for.

To be clear here, I am not looking for lock-free data structures.  I am
looking for simple atomic accesses.  And I am happy to implement these
myself.  For 64-bit types, it's little more than a single line of inline
assembly (and even that is only to guarantee the code that the compiler
is likely to generate automatically, given the right source code).  For
bigger types, it's load/store with reservation instructions or disabling
and enabling interrupts.
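
The sketch below shows the approach described above. It is not the author's actual code, and it assumes GCC, a little-endian Cortex-M target, and GCC's architecture macros. It uses a single LDRD/STRD for 64-bit values where the core has them. This relies on the premise that an interrupt handler on the same core cannot observe a half-completed LDRD/STRD, which is worth confirming in the core's Technical Reference Manual. Larger types, and cores without LDRD, fall back to masking interrupts via PRIMASK. On a non-ARM host the masking functions become compiler barriers only, so the code can be compiled and tested there. The names irq_save, irq_restore, load64, store64, u128_words, store_big and load_big are hypothetical.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__) || \
    defined(__ARM_ARCH_8M_MAIN__)
#define HAVE_LDRD 1                       /* Thumb-2 cores with LDRD/STRD */
#endif
#if defined(HAVE_LDRD) || defined(__ARM_ARCH_6M__) || \
    defined(__ARM_ARCH_8M_BASE__)
#define IS_CORTEX_M 1
#endif

/* Save PRIMASK and disable interrupts; returns the old PRIMASK. */
static inline uint32_t irq_save(void)
{
#ifdef IS_CORTEX_M
    uint32_t primask;
    __asm__ volatile ("mrs %0, primask\n\tcpsid i"
                      : "=r"(primask) :: "memory");
    return primask;
#else
    __asm__ volatile ("" ::: "memory");   /* host stub: compiler barrier only */
    return 0;
#endif
}

/* Restore PRIMASK; interrupts are re-enabled only if they were enabled
   when irq_save() was called. */
static inline void irq_restore(uint32_t primask)
{
#ifdef IS_CORTEX_M
    __asm__ volatile ("msr primask, %0" :: "r"(primask) : "memory");
#else
    (void)primask;
    __asm__ volatile ("" ::: "memory");   /* host stub: compiler barrier only */
#endif
}

/* 64-bit load: one LDRD where available (assumption: not observable
   half-done by a handler on this core), otherwise masked interrupts. */
static inline uint64_t load64(const uint64_t *p)
{
#ifdef HAVE_LDRD
    uint64_t v;
    __asm__ volatile ("ldrd %Q0, %R0, [%1]" : "=&r"(v) : "r"(p) : "memory");
    return v;
#else
    uint32_t s = irq_save();
    uint64_t v = *p;
    irq_restore(s);
    return v;
#endif
}

/* 64-bit store: one STRD where available, otherwise masked interrupts. */
static inline void store64(uint64_t *p, uint64_t v)
{
#ifdef HAVE_LDRD
    __asm__ volatile ("strd %Q1, %R1, [%0]" :: "r"(p), "r"(v) : "memory");
#else
    uint32_t s = irq_save();
    *p = v;
    irq_restore(s);
#endif
}

/* Larger types: a hypothetical 128-bit value, copied with interrupts
   masked for the duration of the copy. */
typedef struct { uint32_t w[4]; } u128_words;

static inline void store_big(u128_words *dst, const u128_words *src)
{
    uint32_t s = irq_save();
    *dst = *src;
    irq_restore(s);
}

static inline u128_words load_big(const u128_words *src)
{
    uint32_t s = irq_save();
    u128_words v = *src;
    irq_restore(s);
    return v;
}
```

Saving and restoring PRIMASK, rather than unconditionally re-enabling interrupts, keeps these helpers safe to call from code that already has interrupts disabled.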

Thanks,

David


