Re: [RFC PATCH 1/4] hazptr: Add initial implementation of hazard pointers

Boqun Feng <boqun.feng@xxxxxxxxx> · Thu, 26 Sep 2024 21:38:36 -0700

On Fri, Sep 27, 2024 at 03:20:40AM +0200, Mathieu Desnoyers wrote:
> On 2024-09-26 18:12, Linus Torvalds wrote:
> > On Thu, 26 Sept 2024 at 08:54, Jonas Oberhauser
> > <jonas.oberhauser@xxxxxxxxxxxxxxx> wrote:
> > > 
> > > No, the issue introduced by the compiler optimization (or by your
> > > original patch) is that the CPU can speculatively load from the first
> > > pointer as soon as it has completed the load of that pointer:
> > 
> > You mean the compiler can do it. The inline asm has no impact on what
> > the CPU does. The conditional isn't a barrier for the actual hardware.
> > But once the compiler doesn't try to do it, the data dependency on the
> > address does end up being an ordering constraint on the hardware too
> > (I'm happy to say that I haven't heard from the crazies that want
> > value prediction in a long time).
> > 
> > Just use a barrier.  Or make sure to use the proper ordered memory
> > accesses when possible. Don't use an inline asm for the compare - we
> > don't even have anything insane like that as a portable helper, and we
> > shouldn't have it.
> 
> How does the compiler barrier help in any way here ?
> 
> I am concerned about the compiler SSA GVN (Global Value Numbering)
> optimizations, and I don't think a compiler barrier solves anything.
> (or I'm missing something obvious)

I think you're right, a compiler barrier doesn't help here:

	head = READ_ONCE(p);
	smp_mb();
	WRITE_ONCE(*slot, head);

	ptr = READ_ONCE(p);
	if (ptr != head) {
		...
	} else {
		barrier();
		return ptr;
	}

compilers can replace 'ptr' with 'head' because of the equality, and
even putting barrier() here cannot prevent compiler to rewrite the else
branch into:

	else {
		barrier();
		return head;
	}

because that's just using a different register, unrelated to memory
accesses.

Jonas, am I missing something subtle? Or this is different than what you
proposed?

Regards,
Boqun

> 
> I was concerned about the suggestion from Jonas to use "node2"
> rather than "node" after the equality check as a way to ensure
> the intended register is used to return the pointer, because after
> the SSA GVN optimization pass, AFAIU this won't help in any way.
> I have a set of examples below that show gcc use the result of the
> first load, and clang use the result of the second load (on
> both x86-64 and aarch64). Likewise when a load-acquire is used as
> second load, which I find odd. Hopefully mixing this optimization
> from gcc with speculation still abide by the memory model.
> 
> Only the asm goto approach ensures that gcc uses the result from
> the second load.
> 
[...]