Re: clear_bit_unlock_is_negative_byte

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Matthew,

Am 22.07.2023 um 05:03 schrieb Matthew Wilcox:
I thought it a little odd to use an unsigned char when we're testing
to see if it's negative, so I went with this:

static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
                volatile unsigned long *p)
{
        char result;
        char mask = 1 << nr;            /* nr guaranteed to be < 7 */
        char *cp = (char *)p + 3;       /* m68k is big-endian */

        __asm__ __volatile__ ("eori.b %1, %2; smi %0"
                : "=d" (result)
                : "i" (mask), "o" (*cp)
                : "memory");
        return result;
}

I don't think signedness make a difference here, as we don't leave anything about the test to the compiler.

I'm sure you can do all the casting to char and increment by 3 in the asm
argument...

I'd rather not.  I looked at doing the offset by three inside the asm,
but it seems like gcc is smart enough to do that without help:

000006e0 <folio_unlock>:
     6e0:       206f 0004       moveal %sp@(4),%a0
     6e4:       0a28 0001 0003  eorib #1,%a0@(3)
     6ea:       5bc0            smi %d0
     6ec:       4a00            tstb %d0
     6ee:       670a            beqs 6fa <folio_unlock+0x1a>
     6f0:       42a7            clrl %sp@-
     6f2:       2f08            movel %a0,%sp@-
     6f4:       4eba fcec       jsr %pc@(3e2 <folio_wake_bit>)
     6f8:       508f            addql #8,%sp
     6fa:       4e75            rts

You'll note the smi/tstb pair are unnecessary.  It could simply BPL to
the RTS instruction, but we can't tell GCC that because we don't have
the __GCC_ASM_FLAG_OUTPUTS__ feature.

I see. __GCC_ASM_FLAG_OUTPUTS__ appears to be rare enough to assume m68k won't get to use it this decade.


By the way, before this optimisation, it was this:

000006fc <folio_unlock>:
     6fc:       206f 0004       moveal %sp@(4),%a0
     700:       08a8 0000 0003  bclr #0,%a0@(3)
     706:       2010            movel %a0@,%d0
     708:       4a00            tstb %d0
     70a:       6c0a            bges 716 <folio_unlock+0x1a>
     70c:       42a7            clrl %sp@-
     70e:       2f08            movel %a0,%sp@-
     710:       4eba fcd0       jsr %pc@(3e2 <folio_wake_bit>)
     714:       508f            addql #8,%sp
     716:       4e75            rts

which is the same number of instructions, but one more memory reference.

Seeing as a0 was just loaded from memory above, and should be in the cache, I doubt there's much performance difference from the additional memory access here. Haven't looked at how many clock cycles each of these instructions take.

It's a read-after-write hazard, but I don't know if that affects any
m68k implementation; my impression is that even on an '060 there aren't

Unless something modifies the memory location pointed to by a0 (during preemption of any kind), I can't see a race here (a0 is saved/restored during interrupts, syscalls and such).
We haven't seen any races here in the past with this code AFAICS.

any real performance implications.  Kudos to gcc for figuring out that
testing bit 7 can be done with the tstb instruction.

If there's a simple way to exercise this code path using standard Unix tools
(or stress-ng which I ought to have somewhere), drop me a hint.

Oh, it's so common to have a waiter on a folio unlock that just making
it to the login prompt is enough to declare comfidently that this works.
CPU implementations with memory barriers and such fanciness are a little
harder to be confident in, but this looks good to me.  I generally run
xfstests, but that's just because I have it all set up and ready to go.

Should be all good then.

I'll drop your Tested-by on this if that's OK?  If you want a
Co-developed-by credit, that's fine with me too!

Tested-by is fine, thanks. I wouldn't want anyone to assume I'm familiar with folios and the like...

Cheers,

	Michael





[Index of Archives]     [Video for Linux]     [Yosemite News]     [Linux S/390]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux