On 2024-09-27 06:28, Boqun Feng wrote:
On Fri, Sep 27, 2024 at 09:37:50AM +0800, Boqun Feng wrote:
On Fri, Sep 27, 2024, at 9:30 AM, Mathieu Desnoyers wrote:
On 2024-09-27 02:01, Boqun Feng wrote:
#define ADDRESS_EQ(var, expr) \
({ \
bool _____cmp_res = (unsigned long)(var) == (unsigned long)(expr); \
\
OPTIMIZER_HIDE_VAR(var); \
_____cmp_res; \
})
If the goal is to ensure gcc uses the register populated by the
second, I'm afraid it does not work. AFAIU, "hiding" the dependency
chain does not prevent the SSA GVN optimization from combining the
registers as being one and choosing one arbitrary source. "hiding"
the dependency chain before or after the comparison won't help here.
Note it's not hiding the dependency, rather the equality:
after OPTIMIZER_HIDE_VAR(var), the compiler doesn't know whether 'var' is
equal to 'expr' anymore, because OPTIMIZER_HIDE_VAR(var) uses "=r"(var)
to indicate the output is overwritten. So when 'var' is referred to later,
the compiler cannot use the register holding the 'expr' value, or any
other register that has the same value, because 'var' may have a
different value from the compiler's POV.
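To make the "hiding the equality" point concrete, here is a minimal
userspace sketch. The OPTIMIZER_HIDE_VAR() definition below is an
assumption modeled on the "=r"(var) form described above (output
overwritten, input tied to the same register), and load_if_same(), x
and y are hypothetical names used only for illustration:

```c
#include <stdbool.h>

/*
 * Assumed definition: the empty asm claims to overwrite 'var' ("=r") while
 * taking the old value as input in the same register ("0"). Nothing changes
 * at runtime, but the compiler can no longer assume anything about the value.
 */
#define OPTIMIZER_HIDE_VAR(var) \
	__asm__ ("" : "=r" (var) : "0" (var))

/* Compare first, then hide 'var' so the known equality is forgotten. */
#define ADDRESS_EQ(var, expr) \
({ \
	bool _____cmp_res = (unsigned long)(var) == (unsigned long)(expr); \
	\
	OPTIMIZER_HIDE_VAR(var); \
	_____cmp_res; \
})

static int x = 42;	/* hypothetical test data */
static int y = 7;

/*
 * Hypothetical helper: dereference 'b' only when it equals 'a'. Because 'b'
 * is the hidden parameter, the compiler should not substitute 'a' for it in
 * the dereference below.
 */
static int load_if_same(int *a, int *b)
{
	if (ADDRESS_EQ(b, a))
		return *b;
	return -1;
}
```

The runtime behaviour is the same as a plain pointer comparison; the
difference should only be visible in which register the generated code
uses for the dereference.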
int fct_hide_var_compare(void)
{
int *a, *b;
do {
a = READ_ONCE(p);
asm volatile ("" : : : "memory");
b = READ_ONCE(p);
} while (!ADDRESS_EQ(a, b));
Note that ADDRESS_EQ() only hides the first parameter, so this should be ADDRESS_EQ(b, a).
I replaced ADDRESS_EQ(a, b) with ADDRESS_EQ(b, a), and the compile
result shows it can prevent the issue:
I see, yes. It prevents the issue by making the compiler create
a copy of the value "modified" by the asm before doing the equality
comparison.
This means the compiler cannot derive the value for b from the first
load when b is used after the equality comparison.
The only downside of OPTIMIZER_HIDE_VAR() is that it adds an extra
"mov" instruction to move the content across registers. I don't think
it matters performance-wise though, so that solution is appealing
because it is arch-agnostic.
One small improvement over your proposed solution would be to apply
OPTIMIZER_HIDE_VAR() on both inputs. Because this is not a volatile
asm, it is simply optimized away if var1 or var2 is unused following
the equality comparison. It is more convenient to prevent either of
the compared addresses from being replaced by the other, rather than
providing the guarantee for only a single parameter:
#define OPTIMIZER_HIDE_VAR(var) \
__asm__ ("" : "+r" (var))
#define ADDRESS_EQ(var1, var2) \
({ \
bool _____cmp_res = (var1) == (var2); \
\
OPTIMIZER_HIDE_VAR(var1); \
OPTIMIZER_HIDE_VAR(var2); \
_____cmp_res; \
})
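For reference, a self-contained userspace sketch of the retry loop
using the two-sided macro. READ_ONCE() here is a simplified stand-in
for the kernel macro, 'value' and the initialization of 'p' are
hypothetical, and the final "return *b" is an assumption inferred from
the trailing load in the compiler output:

```c
#include <stdbool.h>

/* Simplified stand-in for the kernel's READ_ONCE(): one volatile load. */
#define READ_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* "+r" marks 'var' as both read and written by the (empty) asm. */
#define OPTIMIZER_HIDE_VAR(var) \
	__asm__ ("" : "+r" (var))

/* Compare, then hide both operands so neither can replace the other. */
#define ADDRESS_EQ(var1, var2) \
({ \
	bool _____cmp_res = (var1) == (var2); \
	\
	OPTIMIZER_HIDE_VAR(var1); \
	OPTIMIZER_HIDE_VAR(var2); \
	_____cmp_res; \
})

static int value = 42;	/* hypothetical target object */
static int *p = &value;	/* shared pointer, loaded twice below */

static int fct_hide_var_compare(void)
{
	int *a, *b;

	do {
		a = READ_ONCE(p);
		__asm__ __volatile__ ("" : : : "memory");	/* compiler barrier */
		b = READ_ONCE(p);
	} while (!ADDRESS_EQ(a, b));

	/* Assumed use of 'b' after the comparison. */
	return *b;
}
```

With both operands hidden, the argument order no longer matters, so
ADDRESS_EQ(a, b) and ADDRESS_EQ(b, a) should give the same guarantee.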
Thanks,
Mathieu
gcc 14.2 x86-64:
fct_hide_var_compare:
.L2:
mov rcx, QWORD PTR p[rip]
mov rdx, QWORD PTR p[rip]
mov rax, rdx
cmp rcx, rdx
jne .L2
mov eax, DWORD PTR [rax]
ret
gcc 14.2.0 ARM64:
fct_hide_var_compare:
adrp x2, p
add x2, x2, :lo12:p
.L2:
ldr x3, [x2]
ldr x1, [x2]
mov x0, x1
cmp x3, x1
bne .L2
ldr w0, [x0]
ret
Link to godbolt:
https://godbolt.org/z/a7jsfzjxY
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com