On Fri, Jul 21, 2023 at 01:43:06PM +1200, Michael Schmitz wrote:
Ah, it's not supposed to be cleared. The way this works is that bit 0
is the lock bit; if someone's waiting on the folio, they set bit 7. If
bit 7 is set when we clear bit 0, we look on the wait queue. If there's
nobody on the wait queue, we clear bit 7.
Right, that's what I meant to say. I'd only seen cases where bit 0 had been
set and was cleared. This isn't an actual production system of sorts, just
an ARAnyM instance I can fire up quickly to see patched kernels crash
horribly.
Well, I appreciate the testing!
This is what I have tests running on right now:
static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
volatile unsigned long *p)
{
unsigned char *cp = (unsigned char *) p;
char result;
char mask = 1 << nr; /* nr guaranteed to be < 7 */
__asm__ __volatile__ ("eori.b %1, %2; smi %0"
: "=d" (result)
: "i" (mask), "o" (*(cp+3))
: "memory");
return result;
}
I thought it a little odd to use an unsigned char when we're testing
to see if it's negative, so I went with this:
static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
volatile unsigned long *p)
{
char result;
char mask = 1 << nr; /* nr guaranteed to be < 7 */
char *cp = (char *)p + 3; /* m68k is big-endian */
__asm__ __volatile__ ("eori.b %1, %2; smi %0"
: "=d" (result)
: "i" (mask), "o" (*cp)
: "memory");
return result;
}
I'm sure you can do all the casting to char and increment by 3 in the asm
argument...
I'd rather not. I looked at doing the offset by three inside the asm,
but it seems like gcc is smart enough to do that without help:
000006e0 <folio_unlock>:
6e0: 206f 0004 moveal %sp@(4),%a0
6e4: 0a28 0001 0003 eorib #1,%a0@(3)
6ea: 5bc0 smi %d0
6ec: 4a00 tstb %d0
6ee: 670a beqs 6fa <folio_unlock+0x1a>
6f0: 42a7 clrl %sp@-
6f2: 2f08 movel %a0,%sp@-
6f4: 4eba fcec jsr %pc@(3e2 <folio_wake_bit>)
6f8: 508f addql #8,%sp
6fa: 4e75 rts
You'll note the smi/tstb pair are unnecessary. It could simply BPL to
the RTS instruction, but we can't tell GCC that because we don't have
the __GCC_ASM_FLAG_OUTPUTS__ feature.
By the way, before this optimisation, it was this:
000006fc <folio_unlock>:
6fc: 206f 0004 moveal %sp@(4),%a0
700: 08a8 0000 0003 bclr #0,%a0@(3)
706: 2010 movel %a0@,%d0
708: 4a00 tstb %d0
70a: 6c0a bges 716 <folio_unlock+0x1a>
70c: 42a7 clrl %sp@-
70e: 2f08 movel %a0,%sp@-
710: 4eba fcd0 jsr %pc@(3e2 <folio_wake_bit>)
714: 508f addql #8,%sp
716: 4e75 rts
which is the same number of instructions, but one more memory reference.
It's a read-after-write hazard, but I don't know if that affects any
m68k implementation; my impression is that even on an '060 there aren't
any real performance implications. Kudos to gcc for figuring out that
testing bit 7 can be done with the tstb instruction.
If there's a simple way to exercise this code path using standard Unix tools
(or stress-ng which I ought to have somewhere), drop me a hint.
Oh, it's so common to have a waiter on a folio unlock that just making
it to the login prompt is enough to declare comfidently that this works.
CPU implementations with memory barriers and such fanciness are a little
harder to be confident in, but this looks good to me. I generally run
xfstests, but that's just because I have it all set up and ready to go.
I'll drop your Tested-by on this if that's OK? If you want a
Co-developed-by credit, that's fine with me too!