On 12/23/21 20:17, John David Anglin wrote: > On 2021-12-23 1:47 p.m., Helge Deller wrote: >> ... >>> In order to do this, we need a mechanism to trigger COW breaks outside the >>> critical region. Fortunately, parisc has the "stbys,e" instruction. When >>> the leftmost byte of a word is addressed, this instruction triggers all >>> the exceptions of a normal store but it does not write to memory. Thus, >>> we can use it to trigger COW breaks outside the critical region without >>> modifying the data that is to be updated atomically. >> ... >>> diff --git a/arch/parisc/include/asm/futex.h b/arch/parisc/include/asm/futex.h >>> index 9cd4dd6e63ad..8f97db995b04 100644 >>> --- a/arch/parisc/include/asm/futex.h >>> +++ b/arch/parisc/include/asm/futex.h >> ... >>> +static inline bool _futex_force_interruptions(unsigned long ua) >>> +{ >>> + bool result; >>> + >>> + __asm__ __volatile__( >>> + "1:\tldw 0(%1), %0\n" >>> + "2:\tstbys,e %%r0, 0(%1)\n" >>> + "\tcomclr,= %%r0, %%r0, %0\n" >>> + "3:\tldi 1, %0\n" >>> + ASM_EXCEPTIONTABLE_ENTRY(1b, 3b) >>> + ASM_EXCEPTIONTABLE_ENTRY(2b, 3b) >>> + : "=&r" (result) : "r" (ua) : "memory" >>> + ); >>> + return result; >> I wonder if we can get rid of the comclr,= instruction by using >> ASM_EXCEPTIONTABLE_ENTRY_EFAULT instead of ASM_EXCEPTIONTABLE_ENTRY, >> e.g.: >> >> diff --git a/arch/parisc/include/asm/futex.h b/arch/parisc/include/asm/futex.h >> index 8f97db995b04..ea052f013865 100644 >> --- a/arch/parisc/include/asm/futex.h >> +++ b/arch/parisc/include/asm/futex.h >> @@ -21,20 +21,21 @@ static inline unsigned long _futex_hash_index(unsigned long ua) >> * if load and store fault. >> */ >> >> -static inline bool _futex_force_interruptions(unsigned long ua) >> +static inline unsigned long _futex_force_interruptions(unsigned long ua) >> { >> - bool result; >> + register unsigned long error __asm__ ("r8") = 0; >> + register unsigned long temp; >> >> __asm__ __volatile__( >> - "1:\tldw 0(%1), %0\n" >> - "2:\tstbys,e %%r0, 0(%1)\n" >> - "\tcomclr,= %%r0, %%r0, %0\n" >> - "3:\tldi 1, %0\n" >> - ASM_EXCEPTIONTABLE_ENTRY(1b, 3b) >> - ASM_EXCEPTIONTABLE_ENTRY(2b, 3b) >> - : "=&r" (result) : "r" (ua) : "memory" >> + "1:\tldw 0(%2), %0\n" >> + "2:\tstbys,e %%r0, 0(%2)\n" >> + "3:\n" >> + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b) >> + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b) >> + : "=r" (temp), "=r" (error) >> + : "r" (ua), "1" (error) : "memory" >> ); >> - return result; >> + return error; >> } > I don't think this is a win. > > 1) Register %r8 needs to get loaded with 0. So, that's one instruction. True. On the other hand you don't need the "ldi 1,%0" then, which brings parity. > 2) Register %r8 is a caller saves register. Using it will cause gcc to save and restore it from stack. This may > cause a stack frame to be created when one isn't needed. The save and restore instructions are more > expensive, particularly if they cause a TLB miss. Because of this reason, wouln't it make sense to switch the uaccess functions away from r8 too, and use another temp register in both? > Note that the comclr both clears result and nullifies the following ldi instruction, so it is not executed in the fast path. Yes, but when emulating with qemu, it generates comparism and jumps, while the loading of r8 (see 1)), is trivial. If we change the uaccess functions away from r8, then we can drop comclr (and save one instruction). Helge