On 12/23/21 21:14, John David Anglin wrote: > On 2021-12-23 2:31 p.m., Helge Deller wrote: >> On 12/23/21 20:17, John David Anglin wrote: >>> On 2021-12-23 1:47 p.m., Helge Deller wrote: >>>> ... >>>>> In order to do this, we need a mechanism to trigger COW breaks outside the >>>>> critical region. Fortunately, parisc has the "stbys,e" instruction. When >>>>> the leftmost byte of a word is addressed, this instruction triggers all >>>>> the exceptions of a normal store but it does not write to memory. Thus, >>>>> we can use it to trigger COW breaks outside the critical region without >>>>> modifying the data that is to be updated atomically. >>>> ... >>>>> diff --git a/arch/parisc/include/asm/futex.h b/arch/parisc/include/asm/futex.h >>>>> index 9cd4dd6e63ad..8f97db995b04 100644 >>>>> --- a/arch/parisc/include/asm/futex.h >>>>> +++ b/arch/parisc/include/asm/futex.h >>>> ... >>>>> +static inline bool _futex_force_interruptions(unsigned long ua) >>>>> +{ >>>>> + bool result; >>>>> + >>>>> + __asm__ __volatile__( >>>>> + "1:\tldw 0(%1), %0\n" >>>>> + "2:\tstbys,e %%r0, 0(%1)\n" >>>>> + "\tcomclr,= %%r0, %%r0, %0\n" >>>>> + "3:\tldi 1, %0\n" >>>>> + ASM_EXCEPTIONTABLE_ENTRY(1b, 3b) >>>>> + ASM_EXCEPTIONTABLE_ENTRY(2b, 3b) >>>>> + : "=&r" (result) : "r" (ua) : "memory" >>>>> + ); >>>>> + return result; >>>> I wonder if we can get rid of the comclr,= instruction by using >>>> ASM_EXCEPTIONTABLE_ENTRY_EFAULT instead of ASM_EXCEPTIONTABLE_ENTRY, >>>> e.g.: >>>> >>>> diff --git a/arch/parisc/include/asm/futex.h b/arch/parisc/include/asm/futex.h >>>> index 8f97db995b04..ea052f013865 100644 >>>> --- a/arch/parisc/include/asm/futex.h >>>> +++ b/arch/parisc/include/asm/futex.h >>>> @@ -21,20 +21,21 @@ static inline unsigned long _futex_hash_index(unsigned long ua) >>>> * if load and store fault. >>>> */ >>>> >>>> -static inline bool _futex_force_interruptions(unsigned long ua) >>>> +static inline unsigned long _futex_force_interruptions(unsigned long ua) >>>> { >>>> - bool result; >>>> + register unsigned long error __asm__ ("r8") = 0; >>>> + register unsigned long temp; >>>> >>>> __asm__ __volatile__( >>>> - "1:\tldw 0(%1), %0\n" >>>> - "2:\tstbys,e %%r0, 0(%1)\n" >>>> - "\tcomclr,= %%r0, %%r0, %0\n" >>>> - "3:\tldi 1, %0\n" >>>> - ASM_EXCEPTIONTABLE_ENTRY(1b, 3b) >>>> - ASM_EXCEPTIONTABLE_ENTRY(2b, 3b) >>>> - : "=&r" (result) : "r" (ua) : "memory" >>>> + "1:\tldw 0(%2), %0\n" >>>> + "2:\tstbys,e %%r0, 0(%2)\n" >>>> + "3:\n" >>>> + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b) >>>> + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b) >>>> + : "=r" (temp), "=r" (error) >>>> + : "r" (ua), "1" (error) : "memory" >>>> ); >>>> - return result; >>>> + return error; >>>> } >>> I don't think this is a win. >>> >>> 1) Register %r8 needs to get loaded with 0. So, that's one instruction. >> True. On the other hand you don't need the "ldi 1,%0" then, which brings parity. >> >>> 2) Register %r8 is a caller saves register. Using it will cause gcc to save and restore it from stack. This may >>> cause a stack frame to be created when one isn't needed. The save and restore instructions are more >>> expensive, particularly if they cause a TLB miss. >> Because of this reason, wouln't it make sense to switch the uaccess functions away from r8 too, >> and use another temp register in both? > Yes. I think it would be best to use %r28. Then, error will be in correct register for return. r29 seems to generate smaller code than r28, so I choosed r29. > The variables temp and error can combined. Then I need the copy %r0,%error in asm code. It's no option to use ldw with target register %r0 ? >>> Note that the comclr both clears result and nullifies the following ldi instruction, so it is not executed in the fast path. >> Yes, but when emulating with qemu, it generates comparism and jumps, while the loading of r8 (see 1)), is trivial. > In this case, the comparison is always true. Does qemu detect that? Doesn't seem so. I think it's quite uncommon to use it that way... :-) > The comclr is equivalent to setting target register to 0 and a branch to .+8. >> >> If we change the uaccess functions away from r8, then we can drop comclr (and save one instruction). > If we can use %r28, we should be able to save one instruction. Yes. > It's slightly less work for me if you change the uaccess and update the asm at the same time. Sure. I'll clean it up and commit to for-next. Helge