Re: [PATCH v2] parisc: Rewrite light-weight syscall and futex code

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/23/21 21:14, John David Anglin wrote:
> On 2021-12-23 2:31 p.m., Helge Deller wrote:
>> On 12/23/21 20:17, John David Anglin wrote:
>>> On 2021-12-23 1:47 p.m., Helge Deller wrote:
>>>> ...
>>>>> In order to do this, we need a mechanism to trigger COW breaks outside the
>>>>> critical region. Fortunately, parisc has the "stbys,e" instruction. When
>>>>> the leftmost byte of a word is addressed, this instruction triggers all
>>>>> the exceptions of a normal store but it does not write to memory. Thus,
>>>>> we can use it to trigger COW breaks outside the critical region without
>>>>> modifying the data that is to be updated atomically.
>>>> ...
>>>>> diff --git a/arch/parisc/include/asm/futex.h b/arch/parisc/include/asm/futex.h
>>>>> index 9cd4dd6e63ad..8f97db995b04 100644
>>>>> --- a/arch/parisc/include/asm/futex.h
>>>>> +++ b/arch/parisc/include/asm/futex.h
>>>> ...
>>>>> +static inline bool _futex_force_interruptions(unsigned long ua)
>>>>> +{
>>>>> +    bool result;
>>>>> +
>>>>> +    __asm__ __volatile__(
>>>>> +        "1:\tldw 0(%1), %0\n"
>>>>> +        "2:\tstbys,e %%r0, 0(%1)\n"
>>>>> +        "\tcomclr,= %%r0, %%r0, %0\n"
>>>>> +        "3:\tldi 1, %0\n"
>>>>> +        ASM_EXCEPTIONTABLE_ENTRY(1b, 3b)
>>>>> +        ASM_EXCEPTIONTABLE_ENTRY(2b, 3b)
>>>>> +        : "=&r" (result) : "r" (ua) : "memory"
>>>>> +    );
>>>>> +    return result;
>>>> I wonder if we can get rid of the comclr,= instruction by using
>>>> ASM_EXCEPTIONTABLE_ENTRY_EFAULT instead of ASM_EXCEPTIONTABLE_ENTRY,
>>>> e.g.:
>>>>
>>>> diff --git a/arch/parisc/include/asm/futex.h b/arch/parisc/include/asm/futex.h
>>>> index 8f97db995b04..ea052f013865 100644
>>>> --- a/arch/parisc/include/asm/futex.h
>>>> +++ b/arch/parisc/include/asm/futex.h
>>>> @@ -21,20 +21,21 @@ static inline unsigned long _futex_hash_index(unsigned long ua)
>>>>     * if load and store fault.
>>>>     */
>>>>
>>>> -static inline bool _futex_force_interruptions(unsigned long ua)
>>>> +static inline unsigned long _futex_force_interruptions(unsigned long ua)
>>>>    {
>>>> -    bool result;
>>>> +    register unsigned long error __asm__ ("r8") = 0;
>>>> +    register unsigned long temp;
>>>>
>>>>        __asm__ __volatile__(
>>>> -        "1:\tldw 0(%1), %0\n"
>>>> -        "2:\tstbys,e %%r0, 0(%1)\n"
>>>> -        "\tcomclr,= %%r0, %%r0, %0\n"
>>>> -        "3:\tldi 1, %0\n"
>>>> -        ASM_EXCEPTIONTABLE_ENTRY(1b, 3b)
>>>> -        ASM_EXCEPTIONTABLE_ENTRY(2b, 3b)
>>>> -        : "=&r" (result) : "r" (ua) : "memory"
>>>> +        "1:\tldw 0(%2), %0\n"
>>>> +        "2:\tstbys,e %%r0, 0(%2)\n"
>>>> +        "3:\n"
>>>> +        ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b)
>>>> +        ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b)
>>>> +        : "=r" (temp), "=r" (error)
>>>> +        : "r" (ua), "1" (error) : "memory"
>>>>        );
>>>> -    return result;
>>>> +    return error;
>>>>    }
>>> I don't think this is a win.
>>>
>>> 1) Register %r8 needs to get loaded with 0. So, that's one instruction.
>> True. On the other hand you don't need the "ldi 1,%0" then, which brings parity.
>>
>>> 2) Register %r8 is a caller saves register. Using it will cause gcc to save and restore it from stack. This may
>>> cause a stack frame to be created when one isn't needed. The save and restore instructions are more
>>> expensive, particularly if they cause a TLB miss.
>> Because of this reason, wouln't it make sense to switch the uaccess functions away from r8 too,
>> and use another temp register in both?
> Yes. I think it would be best to use %r28.  Then, error will be in correct register for return.

r29 seems to generate smaller code than r28, so I choosed r29.

> The variables temp and error can combined.

Then I need the copy %r0,%error in asm code.
It's no option to use ldw with target register %r0 ?

>>> Note that the comclr both clears result and nullifies the following ldi instruction, so it is not executed in the fast path.
>> Yes, but when emulating with qemu, it generates comparism and jumps, while the loading of r8 (see 1)), is trivial.
> In this case, the comparison is always true. Does qemu detect that?

Doesn't seem so. I think it's quite uncommon to use it that way... :-)

> The comclr is equivalent to setting target register to 0 and a branch to .+8.
>>
>> If we change the uaccess functions away from r8, then we can drop comclr (and save one instruction).
> If we can use %r28, we should be able to save one instruction.

Yes.

> It's slightly less work for me if you change the uaccess and update the asm at the same time.

Sure. I'll clean it up and commit to for-next.

Helge




[Index of Archives]     [Linux SoC]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux