On 2024-09-27 18:44, Linus Torvalds wrote:
On Thu, 26 Sept 2024 at 18:38, Boqun Feng <boqun.feng@xxxxxxxxx> wrote:
Note that ADDRESS_EQ() only hide first parameter, so this should be ADDRESS_EQ(b, a).
Yeah, please stop making things unnecessarily complicated.
Just use a barrier(). Please. Stop these stupid games until you can
show why it matters.
The barrier() is ineffective at fixing the issue.
It does not prevent the compiler CSE from losing the
address dependency:
int fct_2_volatile_barriers(void)
{
int *a, *b;
do {
a = READ_ONCE(p);
asm volatile ("" : : : "memory");
b = READ_ONCE(p);
} while (a != b);
asm volatile ("" : : : "memory"); <----- where you would like your barrier
return *b;
}
With gcc 14.2 (arm64):
fct_2_volatile_barriers:
adrp x0, .LANCHOR0
add x0, x0, :lo12:.LANCHOR0
.L2:
ldr x1, [x0] <------ x1 populated by first load.
ldr x2, [x0]
cmp x1, x2
bne .L2
ldr w0, [x1] <------ x1 is used for access which should depend on b.
ret
On weakly-ordered architectures, this lets CPU speculation
use the result from the first load to speculate
"ldr w0, [x1]" before "ldr x2, [x0]".
Based on the RCU documentation, the control dependency
does not prevent the CPU from speculating loads.
There are a few ways to fix this:
- Compile everything with -fno-cse-follow-jumps.
- Make the compiler unaware of the relationship between the
address equality and address-dependent use of b. This can
be done either using ADDRESS_EQ() or asm goto.
I prefer ADDRESS_EQ() because it is arch-agnostic. I don't
care that it adds one more register movement instruction.
And by "why it matters" I mean "major difference in code generation",
not some "it uses one more register and has to spill" kind of small
detail.
Why it matters is because gcc generates code that does not
preserve address dependency of the second READ_ONCE().
At this point, I'm not even convinced the whole hazard pointer
approach makes sense. And you're not helping by making it more
complicated than it needs to be.
I'm preparing a small series that aims to show how a minimal
hazard pointer implementation can help improve common scenarios:
- Guarantee object existence on pointer dereference to increment
a reference count:
- replace locking used for that purpose in drivers (e.g. usb),
- replace RCU + inc_not_zero pattern,
- rtmutex: I suspect we can improve situations where locks need
to be taken in reverse dependency chain order by guaranteeing
existence of first and second locks in traversal order,
allowing them to be locked in the correct order (which is
reverse from traversal order) rather than try-lock+retry on
nested lock. This can be done with hazard pointers without
requiring object reclaim to be delayed by an RCU grace period.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com