Re: Atomic accesses on ARM microcontrollers

David Brown <david.brown@xxxxxxxxxxxx> · Sun, 11 Oct 2020 14:16:07 +0200

On 10/10/2020 22:05, Toby Douglass wrote:
> On 10/10/2020 21:43, David Brown wrote:
>> On 09/10/2020 23:35, Toby Douglass wrote:
>>> On 09/10/2020 20:28, David Brown wrote:
> 
>>> I would like - but cannot - reply to the list, as their email server
>>> does not handle encrypted email.
>>
>> I've put the help list on the cc to my reply - I assume that's okay for
>> you.
> 
> Yes.
> 
>> (Your email to me was not encrypted, unless I am missing something.)
> 
> I mean TLS for SMTP, as opposed to say PGP.
> 

Ah, you have your own mail server that sends directly to the receiving
server?  I always set up my mail servers to send via my ISP's server (a
"smarthost" in Debian setup terms).  That makes this kind of thing an SEP.

>>>> I work primarily with microcontrollers, with 32-bit ARM Cortex-M
>>>> devices
>>>> being the most common these days.  I've been trying out atomics in gcc,
>>>> and I find it badly lacking.
>>>
>>> The 4.1.2 atomics or the later, replacement API?
>>
>> I am not sure what you mean here, or what "4.1.2" refers to - it doesn't
>> match either the gcc manual or the C standards as far as I can see.
> 
> GCC introduced its first API for atomics in version 4.1.2, these guys;
> 

Jonathan Wakely explained the reference.  I've read the manuals for a
/lot/ of gcc versions over the years, but I don't have all the details
in my head!

> https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
> 
> Then in a later version, which I can't remember offhand, a second and
> much evolved version of the API was introduced.
> 

Yes.

>> However, "atomic" also has a simpler, more fundamental and clearer
>> meaning with a wider applicability - it means an operation that cannot
>> be divided (or at least, cannot be /observed/ to be divided).  This is
>> the meaning that is important to me here.
> 
> Ah and you mentioned atomically writing larger objects, so we're past
> just caring about say word tearing.
> 

Yes.  Sizes up to 4 bytes can be accessed atomically on this processor
using "normal" operations, and 8 byte accesses are atomic if specific
instructions are used.  (gcc generates these for non-atomic accesses.)
I am hoping to be able to put together a solution for using standard
C11/C++11 atomic types of any size and know that these actually work
correctly.  It is not essential - I can make my own types, functions,
etc., and use them as needed.  But it would be nice and convenient to be
able to use the standard types and functions.

From Jonathan's replies, it seems I can simply make my own libatomic
implementations and use them.

>> What it means is that if thread A stores a value in the
>> atomic variable ax, and thread B attempts to read the value in ax, then
>> B will read either the entire old value before the write, or the entire
>> new value after the write - it will never read an inconsistent partial
>> write.
> 
> I could be wrong, but I think the only way you can do this with atomics
> is copy-on-write.  Make a new copy of the data, and use an atomic to
> flip a pointer, so the readers move atomically from the old version to
> the new version.
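
For reference, the pointer-flip idea quoted above could be sketched
roughly as follows.  This is only an illustration under stated
assumptions: one writer running in main-line code, readers running in
interrupts that finish before the writer resumes, and two static buffers
instead of real allocation.  The names sample_t, publish and snapshot
are made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical multi-word object, too big for a single load or store. */
typedef struct {
    uint32_t seconds;
    uint32_t ticks;
    uint32_t flags;
} sample_t;

static sample_t buffers[2];
/* Pointer to the buffer readers should use; pointer-sized, so the
   flip itself is a single access. */
static sample_t *_Atomic current = &buffers[0];

/* Writer (assumed: main-line code only, never re-entered).
   Fill the buffer nobody is pointing at, then flip the pointer. */
void publish(const sample_t *next)
{
    assert(next != NULL);
    sample_t *cur = atomic_load_explicit(&current, memory_order_relaxed);
    sample_t *spare = (cur == &buffers[0]) ? &buffers[1] : &buffers[0];
    *spare = *next;
    atomic_store_explicit(&current, spare, memory_order_release);
}

/* Reader (assumed: an interrupt that runs to completion before the
   writer continues).  It sees the old or the new object, never a mix. */
sample_t snapshot(void)
{
    const sample_t *p = atomic_load_explicit(&current, memory_order_acquire);
    return *p;
}
```

With the roles reversed (writer in the interrupt, reader in main-line
code), two updates during one read would overwrite the buffer being
read, so this two-buffer version would not be enough.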

I've been thinking a bit more about this, inspired by your post here.
And I believe you are correct - neither ldrex/strex nor load/store
double register is sufficient for 64-bit atomic accesses on the 32-bit
ARM, even for plain reads and writes.  That's annoying - I had thought
the double register read/writes were enough.  But if the store double
register is interruptible with a restart (and I can't find official
documentation on the matter for the Cortex-M7), then an interrupted
store could lead to an inconsistent read by the interrupting code.

I guess I am back to the good old "disable interrupts" solution so
popular in the microcontroller world.  That always works.
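
A minimal sketch of what that could look like for a 64-bit value,
assuming a Cortex-M target where PRIMASK is used to mask interrupts.
The names irq_save, irq_restore, load_u64 and store_u64 are invented
for this example, and the #else branch is only a stand-in so the code
also compiles on a non-ARM host.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
/* Cortex-M: read PRIMASK, then mask interrupts with "cpsid i". */
static inline uint32_t irq_save(void)
{
    uint32_t primask;
    __asm__ volatile ("mrs %0, primask\n\t"
                      "cpsid i"
                      : "=r" (primask) : : "memory");
    return primask;
}

/* Put PRIMASK back to what irq_save() returned, so nesting works. */
static inline void irq_restore(uint32_t primask)
{
    __asm__ volatile ("msr primask, %0" : : "r" (primask) : "memory");
}
#else
/* Stand-in for other targets (assumption: host testing only).
   A plain variable plays the role of PRIMASK. */
static uint32_t fake_primask;

static inline uint32_t irq_save(void)
{
    uint32_t old = fake_primask;
    fake_primask = 1;
    return old;
}

static inline void irq_restore(uint32_t primask)
{
    assert(fake_primask == 1);   /* a restore must pair with a save */
    fake_primask = primask;
}
#endif

/* Read all 64 bits with no maskable interrupt able to run in between. */
uint64_t load_u64(const volatile uint64_t *p)
{
    uint32_t saved = irq_save();
    uint64_t value = *p;
    irq_restore(saved);
    return value;
}

/* Write all 64 bits with no maskable interrupt able to run in between. */
void store_u64(volatile uint64_t *p, uint64_t value)
{
    uint32_t saved = irq_save();
    *p = value;
    irq_restore(saved);
}
```

Saving and restoring the old mask, rather than blindly re-enabling,
keeps the helpers safe to call with interrupts already disabled.  Note
that PRIMASK does not mask NMI or fault handlers, so this only covers
ordinary interrupts.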

> 
>>>> These microcontrollers are all single core, so memory ordering does not
>>>> matter.
>>>
>>> I am not sure this is true.  A single thread must make the world appear
>>> as if events occur in the order specified in the source code, but I bet
>>> you this is already not true for interrupts.
>>
>> It is true even for interrupts.
> 
> [snip]
> 
> Thank you for the insights.  I've done hardly any bare-metal work, so I'm
> not familiar with the actual practicalities of interrupts and their
> effect in these matters.
> 
>>>    It may be for example they can be
>>> re-ordered with regard to each other, and this is not being prevented.
>>
>> Do you mean the kind of re-ordering the compiler does for code?
> 
> I was thinking here of the processor.
> 
>> That is
>> not in question here - at least, not to me.  I know what kinds of
>> reorders are done, and how to prevent them if necessary.  (On a single
>> core, "volatile" is all you need - though there are more efficient ways.)
> 
> I'm not sure about that.  I'd need to revisit the subject though to
> rebuild my knowledge, so I can't make any assertion here - only that I
> know I don't know one way or the other.
> 

One thing we can all be sure about - this stuff is difficult, it needs a
/lot/ of thought, and the documentation is often poor on the critical
details.
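
As an illustration of the compiler-reordering point on a single core,
here is one possible alternative to making everything volatile: a
compiler-only fence from C11, atomic_signal_fence, which orders accesses
between code and a handler running on the same core without emitting a
hardware barrier.  This is a sketch, not necessarily the technique meant
earlier in the thread.  The single-slot mailbox and the names
post_message and take_message are assumptions made for the example, with
main-line code posting and an interrupt handler taking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t message[4];        /* plain, non-volatile payload */
static atomic_bool message_ready;  /* set by the poster, cleared by the taker */

/* Main-line side: fill the payload, then raise the flag. */
void post_message(const uint32_t src[4])
{
    /* Single slot: the previous message must already have been taken. */
    assert(!atomic_load_explicit(&message_ready, memory_order_relaxed));
    for (int i = 0; i < 4; i++) {
        message[i] = src[i];
    }
    /* The compiler may not move the payload stores below this point. */
    atomic_signal_fence(memory_order_release);
    atomic_store_explicit(&message_ready, true, memory_order_relaxed);
}

/* Handler side: returns true and copies the payload if one is waiting. */
bool take_message(uint32_t dst[4])
{
    if (!atomic_load_explicit(&message_ready, memory_order_relaxed)) {
        return false;
    }
    /* The compiler may not move the payload loads above this point. */
    atomic_signal_fence(memory_order_acquire);
    for (int i = 0; i < 4; i++) {
        dst[i] = message[i];
    }
    atomic_store_explicit(&message_ready, false, memory_order_relaxed);
    return true;
}
```

Only the flag is accessed atomically; the payload stays ordinary memory
that the compiler can still optimise on either side of the fences.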

>> And while the cpu and memory system can include write store buffers,
>> caches, etc., that can affect the order of data hitting the memory,
>> these are not an issue in a single core system.  (They /are/ important
>> for multi-core systems.)
> 
> Yes, I think so too, but to be clear we mean single physical and single
> logical core; no hyperthreading.
> 

Yes, absolutely.

>>> Also, I still don't quite think there *are* atomic loads/stores as such
>>> - although having said that I'm now remembering the LOCK prefix on
>>> Intel, which might be usable with a load.  That would then lock the
>>> cache line and load - but, ah yes, it doesn't *mean* anything to
>>> atomically load.  The very next microsecond your value could be replaced
>>> by a new write.
>>
>> Replacing values is not an issue.  The important part is the atomicity
>> of the action.  When thread A reads variable ax, it doesn't matter if
>> thread B (or an interrupt, or whatever) has changed ax just before the
>> read, or just after the read - it matters that it cannot change it
>> /during/ the read.  The key is /consistent/ values, not most up-to-date
>> values.
> 
> Yes.  I can see this from your earlier explanation regarding what you're
> looking for with atomic writes.
> 
>> I had a look through the github sources, but could not find anything
>> relevant.  But obviously that library has a lot more code and features
>> than I am looking for.
> 
> I was only thinking of a single header file which contains the atomics
> for ARM32.  However, it's not useful to you for what you're looking for
> with atomic writes.
> 

Thank you anyway - and thank you for making me think a little more,
correcting a mistake I made!