On Wed, 19 Feb 2025 at 04:46, Maciej W. Rozycki <macro@xxxxxxxxxxx> wrote:
>
> 1. A trapping unaligned LDx_L operation results in the pair of adjacent
>    aligned whole data quantities spanned being read and stored for the
>    reference with a subsequent STx_C operation, along with the width of
>    the data accessed and its virtual address, and the task referring or
>    NULL if the kernel.  The validity marker is set.

So I have a couple of comments. I don't care deeply because I do think
alpha is dead, but for completeness:

 (a) I don't think the task checking is correct.

You're only checking the task pointer, so what can happen is that a
task exits and another starts up with the same task pointer value, and
it all matches across one task doing a ld_l and another doing a st_c.

Does this matter? No. You'd literally have to *try* to create that
situation with identical mis-aligned addresses and data contents, and
an exit after a 'ld_l', and doing a 'st_c' in the new task without the
matching ld_l.

So I suspect this works in practice, but it's still worth mentioning.

 (b) this is not truly atomic wrt concurrent aligned non-trapping
operations to the same words. Or in fact to concurrent trapping ones,
since you end up inevitably releasing the spinlock before the final
stc emulation.

I think this is fundamental and non-fixable, because the stc is done
as two operations, and the first can succeed with the second failing
(or they can both succeed, just interleaved with other accesses).

Again, I don't think we care, and it works in practice, but it does
mean that I *really* think that:

 (c) you should not handle the kernel case at all. If the kernel does
an unaligned ld_l/st_c, that's a fundamental kernel bug. Don't emulate
it. Particularly when the emulation fundamentally is not truly atomic
wrt other accesses.

Finally:

 (d) I think you're doing a few too many inline asms by hand, and
you're masking the results too much.

On the read-side emulation, why do you do that

+	"1:	ldl	%3,0(%5)\n"
+	"2:	ldl	%4,4(%5)\n"
+	"	srl	%3,%6,%1\n"
+	"	sll	%4,%7,%2\n"
+	"	zapnot	%1,15,%1\n"
+	"	zapnot	%2,15,%2\n"

at all? Just do two aligned loads, and don't mask the bytes around
them.

A *real* ldl/stc will fail not just when the exact bytes are
different, but when somebody has touched the same cacheline. So if the
aligned values have changed, you should fail the stc even if the
change was in other bytes.

And doing two aligned loads doesn't need any inline asm at all.

On the st_c side, I think you're repeating the same inline asm twice,
and should have a single helper.

Is this a NAK for the patch? No. But I do think it should be massaged
a bit.

              Linus
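
PS. To make (d) concrete, the read side could be something like the
below. A sketch only, completely untested, and the 'llsc' structure
and all the names here are made up:

struct llsc {
	unsigned long vaddr;		/* the unaligned virtual address */
	struct task_struct *task;	/* task that took the ld_l trap */
	int data[2];			/* the two whole aligned words, unmasked */
	int width;
	bool valid;
};

static int record_unaligned_ldl_l(struct llsc *llsc, unsigned long va,
				  int width)
{
	int __user *p = (int __user *)(va & ~3UL);

	/*
	 * Two plain aligned user loads (get_user() does all the fault
	 * handling) - no shifting, no masking: if anything in either
	 * word changes, the emulated st_c should fail.
	 */
	if (get_user(llsc->data[0], p) || get_user(llsc->data[1], p + 1))
		return -EFAULT;

	llsc->vaddr = va;
	llsc->width = width;
	llsc->task = current;
	llsc->valid = true;
	return 0;
}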
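
And the st_c side as a single helper, called once per aligned word
instead of writing the same asm out twice. Equally untested, and it
assumes the EXC() fixup macro the existing unaligned handlers in
traps.c use ('int' arguments on purpose, so the comparison sees the
same sign-extension that ldl_l does):

/*
 * Returns the stl_c status: 1 if the aligned word still matched 'old'
 * and the conditional store went through, 0 if it didn't. *error is
 * 0 or -EFAULT.
 */
static long stl_c_word(int __user *va, int old, int new, long *error)
{
	long err;
	long status;

	__asm__ __volatile__(
	"1:	ldl_l	%1,0(%4)\n"	/* re-read the aligned word, locked */
	"	cmpeq	%1,%2,%1\n"	/* still what the ld_l trap saved? */
	"	beq	%1,3f\n"	/* no - fail the emulated st_c */
	"	mov	%3,%1\n"
	"2:	stl_c	%1,0(%4)\n"	/* 1 on success, 0 if the lock was lost */
	"3:\n"
	EXC(1b,3b,%1,%0)
	EXC(2b,3b,%1,%0)
	: "=r"(err), "=&r"(status)
	: "r"(old), "r"(new), "r"(va), "0"(0)
	: "memory");

	*error = err;
	return status;
}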