On Wed, 19 Feb 2025 at 04:46, Maciej W. Rozycki <macro@xxxxxxxxxxx> wrote:
>
> 1. A trapping unaligned LDx_L operation results in the pair of adjacent
>    aligned whole data quantities spanned being read and stored for the
>    reference with a subsequent STx_C operation, along with the width of
>    the data accessed and its virtual address, and the task referring or
>    NULL if the kernel.  The validity marker is set.

So I have a couple of comments. I don't care deeply because I do think
alpha is dead, but for completeness:

 (a) I don't think the task checking is correct.

You're only checking the task pointer, so what can happen is that a
task exits and another starts up with the same task pointer value, and
it all matches across one task doing a ld_l and another doing a st_c.

Does this matter? No. You'd literally have to *try* to create that
situation with identical mis-aligned addresses and data contents, and
an exit after a 'ld_l', and doing a 'st_c' in the new task without the
matching ld_l.

So I suspect this works in practice, but it's still worth mentioning.

 (b) this is not truly atomic wrt concurrent aligned non-trapping
operations to the same words. Or in fact to concurrent trapping ones,
since you end up inevitably releasing the spinlock before the final
stc emulation.

I think this is fundamental and non-fixable, because the stc is done
as two operations, and the first can succeed with the second failing
(or they can both succeed, just interleaved with other accesses).

Again, I don't think we care, and it works in practice, but it does
mean that I *really* think that:

 (c) you should not handle the kernel case at all. If the kernel does
an unaligned ld_l/st_c, that's a fundamental kernel bug. Don't emulate
it. Particularly when the emulation fundamentally is not truly atomic
wrt other accesses.

Finally:

 (d) I think you're doing a few too many inline asms by hand, and
you're masking the results too much.

On the read-side emulation, why do you do that

+	"1:	ldl	%3,0(%5)\n"
+	"2:	ldl	%4,4(%5)\n"
+	"	srl	%3,%6,%1\n"
+	"	sll	%4,%7,%2\n"
+	"	zapnot	%1,15,%1\n"
+	"	zapnot	%2,15,%2\n"

at all? Just do two aligned loads, and don't mask the bytes around
them.

A *real* ldl/stc will fail not just when the exact bytes are
different, but when somebody has touched the same cacheline. So if the
aligned values have changed, you should fail the stc even if the
change was in other bytes.

And doing two aligned loads doesn't need any inline asm at all.

On the st_c side, I think you're repeating the same inline asm twice,
and should have a single helper.

Is this a NAK for the patch? No. But I do think it should be massaged
a bit.

              Linus
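
PS. To make (d) concrete, the read side could be something like the
below. A sketch only, completely untested, and the 'llsc' structure
and all the names here are made up:

struct llsc {
	unsigned long vaddr;		/* the unaligned virtual address */
	struct task_struct *task;	/* task that took the ld_l trap */
	int data[2];			/* the two whole aligned words, unmasked */
	int width;
	bool valid;
};

static int record_unaligned_ldl_l(struct llsc *llsc, unsigned long va,
				  int width)
{
	int __user *p = (int __user *)(va & ~3UL);

	/*
	 * Two plain aligned user loads (get_user() does all the fault
	 * handling) - no shifting, no masking: if anything in either
	 * word changes, the emulated st_c should fail.
	 */
	if (get_user(llsc->data[0], p) || get_user(llsc->data[1], p + 1))
		return -EFAULT;

	llsc->vaddr = va;
	llsc->width = width;
	llsc->task = current;
	llsc->valid = true;
	return 0;
}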
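
And the st_c side as a single helper, called once per aligned word
instead of writing the same asm out twice. Equally untested, and it
assumes the EXC() fixup macro the existing unaligned handlers in
traps.c use ('int' arguments on purpose, so the comparison sees the
same sign-extension that ldl_l does):

/*
 * Returns the stl_c status: 1 if the aligned word still matched 'old'
 * and the conditional store went through, 0 if it didn't. *error is
 * 0 or -EFAULT.
 */
static long stl_c_word(int __user *va, int old, int new, long *error)
{
	long err;
	long status;

	__asm__ __volatile__(
	"1:	ldl_l	%1,0(%4)\n"	/* re-read the aligned word, locked */
	"	cmpeq	%1,%2,%1\n"	/* still what the ld_l trap saved? */
	"	beq	%1,3f\n"	/* no - fail the emulated st_c */
	"	mov	%3,%1\n"
	"2:	stl_c	%1,0(%4)\n"	/* 1 on success, 0 if the lock was lost */
	"3:\n"
	EXC(1b,3b,%1,%0)
	EXC(2b,3b,%1,%0)
	: "=r"(err), "=&r"(status)
	: "r"(old), "r"(new), "r"(va), "0"(0)
	: "memory");

	*error = err;
	return status;
}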