Re: [PATCH 00/13] [RFC] Rust support

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Mon, 19 Apr 2021 11:02:12 +0200

On 19/04/21 10:26, Peter Zijlstra wrote:
On Mon, Apr 19, 2021 at 09:53:06AM +0200, Paolo Bonzini wrote:
On 19/04/21 09:32, Peter Zijlstra wrote:
On Sat, Apr 17, 2021 at 04:51:58PM +0200, Paolo Bonzini wrote:
On 16/04/21 09:09, Peter Zijlstra wrote:
Well, the obvious example would be seqlocks. C11 can't do them

Sure it can.  C11 requires annotating with (the equivalent of) READ_ONCE all
reads of seqlock-protected fields, but the memory model supports seqlocks
just fine.

How does that help?

IIRC there are two problems, one on each side of the lock. On the write
side we have:

	seq++;
	smp_wmb();
	X = r;
	Y = r;
	smp_wmb();
	seq++;

Which C11 simply cannot do right because it doesn't have wmb.

It has atomic_thread_fence(memory_order_release), and
atomic_thread_fence(memory_order_acquire) on the read side.

https://godbolt.org/z/85xoPxeE5

void writer(void)
{
     atomic_store_explicit(&seq, seq+1, memory_order_relaxed);
     atomic_thread_fence(memory_order_acquire);

This needs to be memory_order_release.  The only change in the resulting 
assembly is that "dmb ishld" becomes "dmb ish", which is not as good as 
the "dmb ishst" you get from smp_wmb() but not buggy either.
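
For reference, a minimal sketch of the writer with that fix applied. It 
assumes a single writer (or an external lock serializing writers), and it 
declares X and Y as atomics accessed with memory_order_relaxed, which is the 
C11 spelling of WRITE_ONCE; the function name and parameters are made up for 
illustration and are not in the godbolt snippet.

```c
#include <stdatomic.h>

/* Illustrative seqlock-protected state; the names follow the thread. */
static atomic_uint seq;
static atomic_int X, Y;

/* Hypothetical helper; assumes writers are already serialized. */
void seq_write(int x, int y)
{
    unsigned s = atomic_load_explicit(&seq, memory_order_relaxed);

    /* Make seq odd: an update is in progress. */
    atomic_store_explicit(&seq, s + 1, memory_order_relaxed);

    /* Release fence (not acquire): orders the seq store above
     * before the data stores below. */
    atomic_thread_fence(memory_order_release);

    atomic_store_explicit(&X, x, memory_order_relaxed);
    atomic_store_explicit(&Y, y, memory_order_relaxed);

    /* Store-release: the data stores are ordered before seq
     * becomes even again. */
    atomic_store_explicit(&seq, s + 2, memory_order_release);
}
```

The release fence is stronger than smp_wmb() because it also orders earlier 
loads against the later stores, which is why it costs a full "dmb ish" on 
arm64 rather than "dmb ishst".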

The read side can use "dmb ishld" so it gets the same code as Linux.
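
A matching sketch of the read side, under the same assumptions (X and Y 
are atomics read with memory_order_relaxed, the C11 equivalent of 
READ_ONCE). The function name and out-parameters are hypothetical; the two 
acquire fences stand where Linux would have smp_rmb().

```c
#include <stdatomic.h>

/* Illustrative seqlock-protected state; the names follow the thread. */
static atomic_uint seq;
static atomic_int X, Y;

/* Hypothetical helper: retries until it observes a consistent pair. */
void seq_read(int *x, int *y)
{
    unsigned s0, s1;

    do {
        s0 = atomic_load_explicit(&seq, memory_order_relaxed);
        /* Acquire fence: orders the seq load above before the
         * data loads below. */
        atomic_thread_fence(memory_order_acquire);

        *x = atomic_load_explicit(&X, memory_order_relaxed);
        *y = atomic_load_explicit(&Y, memory_order_relaxed);

        /* Acquire fence: orders the data loads above before the
         * seq re-read below. */
        atomic_thread_fence(memory_order_acquire);
        s1 = atomic_load_explicit(&seq, memory_order_relaxed);

        /* Retry if a write was in progress (odd) or completed
         * in between (changed). */
    } while ((s0 & 1) || s0 != s1);
}
```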

LWN needs a "C11 memory model for kernel folks" article.  In the 
meantime there is 
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0124r4.html 
which is the opposite (Linux kernel memory model for C11 folks).

Paolo

     X = 1;
     Y = 2;

     atomic_store_explicit(&seq, seq+1, memory_order_release);
}

gives:

writer:
         adrp    x1, .LANCHOR0
         add     x0, x1, :lo12:.LANCHOR0
         ldr     w2, [x1, #:lo12:.LANCHOR0]
         add     w2, w2, 1
         str     w2, [x0]
         dmb     ishld
         ldr     w1, [x1, #:lo12:.LANCHOR0]
         mov     w3, 1
         mov     w2, 2
         stp     w3, w2, [x0, 4]
         add     w1, w1, w3
         stlr    w1, [x0]
         ret

Which, afaict, is completely buggered. What it seems to be doing is
turning the seq load into a load-acquire, but what we really need is to
make sure the seq store (increment) is ordered before the other stores.