The 01/04/2022 15:12, Florian Weimer via Libc-alpha wrote:
> I've tried to implement a software transactional memory algorithm
>
> The characteristics are:
>
> * Single writer, multiple readers.
>
> * Two copies of the data.
>
> * A 64-bit version counter tracks modifications.  The lowest bit of the
>   counter tells the reader which copy of the data to read.
>
> * The writer increments the counter by 2, modifies the STM-protected
>   data, and increments the counter by 1.
>
> * The reader loads the counter, reads the STM-protected data, loads the
>   counter for a second time, and retries if the counter does not match.
>
> I've attached a model implementation.  The glibc implementation has a
> wrapper around the counter updates to support 32-bit implementations as
> well.  In both implementations, the writer uses release MO stores for
> the version update, and the reader uses acquire MO loads.  The stores
> and loads of the STM data itself are unordered (not even volatile).
>
> It turns out that this does not work: In the reader, loads of the
> STM-protected data can be reordered past the final acquire MO load of
> the STM version.  As a result, the reader can observe incoherent data.
> In practice, I only saw this on powerpc64le, where the *compiler*
> performed the reordering:
>
>   _dl_find_object miscompilation on powerpc64le
>   <https://sourceware.org/bugzilla/show_bug.cgi?id=28745>
>
> Empirically, my stm.c model does not exhibit this behavior.
>
> To fix this, I think it is sufficient to add a release fence just before
> the second version load in the reader.  However, from a C memory model
> perspective, I don't quite see what this fence would synchronize
> with. 8-/  And of course once there is one concurrency bug, there might
> be others as well.  Do I need to change the writer to use
> acquire-release MO for the version updates?
>
> I think there should be a canned recipe for this scenario (single
> writer, two data copies), but I couldn't find one.
this reminds me of the discussion in sections 4 and 5 of

https://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf

there are no two data copies there, but i think the reasoning about
the synchronization may apply.
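
to make that concrete, here is a rough C11 sketch of how i would
carry the fence-based seqlock reasoning over to the two-copy scheme
described above. it is not the glibc code and not your stm.c: all
names (stm_data, stm_version, stm_copies, stm_write, stm_try_read,
stm_read) and the two-word payload are made up for illustration.
the assumptions are that the data accesses become relaxed atomics
(so there is no data race in the C model), that the reader uses an
*acquire* fence before the second version load (not a release
fence), and that the writer has a release fence between the first
counter update and the data stores, which is what the reader's
fence would synchronize with.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical payload: two words that must be observed together. */
struct stm_data {
  _Atomic uint64_t a;
  _Atomic uint64_t b;
};

/* The lowest bit of the version selects the copy readers should use. */
static _Atomic uint64_t stm_version;
static struct stm_data stm_copies[2];

/* Must only be called by the single writer. */
static void stm_write(uint64_t a, uint64_t b)
{
  uint64_t v = atomic_load_explicit(&stm_version, memory_order_relaxed);
  struct stm_data *d = &stm_copies[(v & 1) ^ 1];   /* inactive copy */

  /* Step 1: bump by 2 (low bit unchanged).  Assumption: the release
     fence orders this store before the data stores below.  */
  atomic_store_explicit(&stm_version, v + 2, memory_order_relaxed);
  atomic_thread_fence(memory_order_release);

  /* Step 2: update the inactive copy with relaxed atomic stores. */
  atomic_store_explicit(&d->a, a, memory_order_relaxed);
  atomic_store_explicit(&d->b, b, memory_order_relaxed);

  /* Step 3: bump by 1, flipping the low bit and publishing the copy. */
  atomic_store_explicit(&stm_version, v + 3, memory_order_release);
}

/* One read attempt; returns false if a concurrent update may have
   interfered and the caller should retry.  */
static bool stm_try_read(uint64_t *a, uint64_t *b)
{
  uint64_t v0 = atomic_load_explicit(&stm_version, memory_order_acquire);
  struct stm_data *d = &stm_copies[v0 & 1];

  uint64_t ta = atomic_load_explicit(&d->a, memory_order_relaxed);
  uint64_t tb = atomic_load_explicit(&d->b, memory_order_relaxed);

  /* Keeps the data loads above from moving past the version load
     below; it pairs with the release fence in stm_write.  */
  atomic_thread_fence(memory_order_acquire);
  uint64_t v1 = atomic_load_explicit(&stm_version, memory_order_relaxed);
  if (v0 != v1)
    return false;

  *a = ta;
  *b = tb;
  return true;
}

/* Retry until a consistent snapshot is obtained. */
static void stm_read(uint64_t *a, uint64_t *b)
{
  while (!stm_try_read(a, b))
    ;
}
```

the idea, as far as i understand it: if a reader's relaxed data load
sees a value stored after the writer's release fence, then the
reader's acquire fence synchronizes with that release fence, so the
second version load has to observe the +2 update (or something
later) and the attempt is retried. whether relaxed atomic accesses
to the data are acceptable for _dl_find_object is a separate
question.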