Re: [WIP 0/3] Memory model and atomic API in Rust

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Sun, 24 Mar 2024 11:22:41 -0400

On Sat, Mar 23, 2024 at 05:40:23PM -0400, comex wrote:
> That may be true, but the LLVM issue you cited isn’t a good example.  
> In that issue, the function being miscompiled doesn’t actually use any 
> barriers or atomics itself; only the scaffolding around it does.  The 
> same issue would happen even if the scaffolding used LKMM atomics.
> 
> For anyone curious: The problematic optimization involves an 
> allocation (‘p’) that is initially private to the function, but is 
> returned at the end of the function.  LLVM moves a non-atomic store to 
> that allocation across an external function call (to ‘foo’).  This 
> reordering would be blatantly invalid if any other code could observe 
> the contents of the allocation, but is valid if the allocation is 
> private to the function.  LLVM assumes the latter: after all, the 
> pointer to it hasn’t escaped.  Yet.  Except that in a weak memory 
> model, the escape can ‘time travel’...

It's hard to understand exactly what you mean, but consider the 
following example:

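#include <linux/compiler.h>	/* READ_ONCE(), WRITE_ONCE() */
#include <linux/slab.h>		/* kzalloc() */
#include <asm/barrier.h>	/* smp_store_release(), smp_load_acquire() */

int *globalptr;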
int x;

void foo(void);

int *f() {
	int *p = kzalloc(sizeof(int), GFP_KERNEL);

	L1: *p = 1;
	L2: foo();
	return p;
}

void foo() {
	smp_store_release(&x, 2);
}

void thread0() {
	WRITE_ONCE(globalptr, f());
}

void thread1() {
	int m, n;
	int *q;

	m = smp_load_acquire(&x);
	q = READ_ONCE(globalptr);
	if (m && q)
		n = *q;
}

(If you like, pretend each of these function definitions lives in a 
different source file -- it doesn't matter.)

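With no optimization, whenever thread1() reads *q it will always obtain 
1.  It reads *q only if m is nonzero, that is, only if its load-acquire 
saw the value written by the store-release in foo(), and the 
release/acquire pair then guarantees that the earlier store at L1 is 
visible.  But if the compiler swaps L1 and L2 in f() then this is not 
guaranteed.  On a weakly ordered architecture, thread1() could then get 
0 (the value left by kzalloc()) from *q.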
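For anyone who wants to play with this outside the kernel, here is a rough 
userspace C11 analogue of the example above.  It is only a sketch under 
assumptions: C11 release/acquire atomics stand in for smp_store_release() 
and smp_load_acquire(), relaxed atomics stand in for WRITE_ONCE() and 
READ_ONCE(), calloc() stands in for kzalloc(), and run_trial() is a 
hypothetical test harness.  Under the C11 model a correct compiler must 
keep the plain store to *p ahead of the release store in foo(), so a trial 
can only report 1, or -1 when thread1() skipped the read.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Userspace stand-ins (assumption, not kernel code):
 *   smp_store_release()/smp_load_acquire() -> C11 release/acquire atomics
 *   WRITE_ONCE()/READ_ONCE()               -> C11 relaxed atomics
 *   kzalloc()                              -> calloc()
 */
static _Atomic(int *) globalptr;
static atomic_int x;

static void foo(void)
{
	atomic_store_explicit(&x, 2, memory_order_release);
}

static int *f(void)
{
	int *p = calloc(1, sizeof(int));	/* zero-filled, like kzalloc() */

	if (!p)
		abort();
	*p = 1;		/* L1: plain store to a not-yet-escaped allocation */
	foo();		/* L2: release store; must not be moved above L1 */
	return p;
}

static void *thread0(void *arg)
{
	(void)arg;
	atomic_store_explicit(&globalptr, f(), memory_order_relaxed);
	return NULL;
}

static void *thread1(void *arg)
{
	int *result = arg;
	int m = atomic_load_explicit(&x, memory_order_acquire);
	int *q = atomic_load_explicit(&globalptr, memory_order_relaxed);

	*result = -1;		/* -1 means "did not dereference q" */
	if (m && q)
		*result = *q;	/* acquire of x synchronizes with foo(): reads 1 */
	return NULL;
}

/* Hypothetical harness: run one trial, return what thread1() read from *q. */
int run_trial(void)
{
	pthread_t t0, t1;
	int result;

	atomic_store(&x, 0);
	atomic_store(&globalptr, NULL);
	pthread_create(&t0, NULL, thread0, NULL);
	pthread_create(&t1, NULL, thread1, &result);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	free(atomic_load(&globalptr));

	/* A 0 here would mean the store at L1 was not ordered before L2. */
	assert(result == 1 || result == -1);
	return result;
}
```

Running run_trial() many times on real hardware can never prove that a 
compiler is correct; it only exercises the guarantee described above.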

I don't know if this is what you meant by "in a weak memory model, the 
escape can ‘time travel’".  Regardless, it seems very clear that any 
compiler which swaps L1 and L2 in f() has a genuine bug.

Alan Stern