Re: [PATCH v2] parisc: Rewrite light-weight syscall and futex code

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/23/21 20:31, Helge Deller wrote:
> On 12/23/21 20:17, John David Anglin wrote:
>> On 2021-12-23 1:47 p.m., Helge Deller wrote:
>>> ...
>>>> In order to do this, we need a mechanism to trigger COW breaks outside the
>>>> critical region. Fortunately, parisc has the "stbys,e" instruction. When
>>>> the leftmost byte of a word is addressed, this instruction triggers all
>>>> the exceptions of a normal store but it does not write to memory. Thus,
>>>> we can use it to trigger COW breaks outside the critical region without
>>>> modifying the data that is to be updated atomically.
>>> ...
>>>> diff --git a/arch/parisc/include/asm/futex.h b/arch/parisc/include/asm/futex.h
>>>> index 9cd4dd6e63ad..8f97db995b04 100644
>>>> --- a/arch/parisc/include/asm/futex.h
>>>> +++ b/arch/parisc/include/asm/futex.h
>>> ...
>>>> +static inline bool _futex_force_interruptions(unsigned long ua)
>>>> +{
>>>> +    bool result;
>>>> +
>>>> +    __asm__ __volatile__(
>>>> +        "1:\tldw 0(%1), %0\n"
>>>> +        "2:\tstbys,e %%r0, 0(%1)\n"
>>>> +        "\tcomclr,= %%r0, %%r0, %0\n"
>>>> +        "3:\tldi 1, %0\n"
>>>> +        ASM_EXCEPTIONTABLE_ENTRY(1b, 3b)
>>>> +        ASM_EXCEPTIONTABLE_ENTRY(2b, 3b)
>>>> +        : "=&r" (result) : "r" (ua) : "memory"
>>>> +    );
>>>> +    return result;
>>> I wonder if we can get rid of the comclr,= instruction by using
>>> ASM_EXCEPTIONTABLE_ENTRY_EFAULT instead of ASM_EXCEPTIONTABLE_ENTRY,
>>> e.g.:
>>>
>>> diff --git a/arch/parisc/include/asm/futex.h b/arch/parisc/include/asm/futex.h
>>> index 8f97db995b04..ea052f013865 100644
>>> --- a/arch/parisc/include/asm/futex.h
>>> +++ b/arch/parisc/include/asm/futex.h
>>> @@ -21,20 +21,21 @@ static inline unsigned long _futex_hash_index(unsigned long ua)
>>>    * if load and store fault.
>>>    */
>>>
>>> -static inline bool _futex_force_interruptions(unsigned long ua)
>>> +static inline unsigned long _futex_force_interruptions(unsigned long ua)
>>>   {
>>> -    bool result;
>>> +    register unsigned long error __asm__ ("r8") = 0;
>>> +    register unsigned long temp;
>>>
>>>       __asm__ __volatile__(
>>> -        "1:\tldw 0(%1), %0\n"
>>> -        "2:\tstbys,e %%r0, 0(%1)\n"
>>> -        "\tcomclr,= %%r0, %%r0, %0\n"
>>> -        "3:\tldi 1, %0\n"
>>> -        ASM_EXCEPTIONTABLE_ENTRY(1b, 3b)
>>> -        ASM_EXCEPTIONTABLE_ENTRY(2b, 3b)
>>> -        : "=&r" (result) : "r" (ua) : "memory"
>>> +        "1:\tldw 0(%2), %0\n"
>>> +        "2:\tstbys,e %%r0, 0(%2)\n"
>>> +        "3:\n"
>>> +        ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b)
>>> +        ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b)
>>> +        : "=r" (temp), "=r" (error)
>>> +        : "r" (ua), "1" (error) : "memory"
>>>       );
>>> -    return result;
>>> +    return error;
>>>   }
>> I don't think this is a win.
>>
>> 1) Register %r8 needs to get loaded with 0. So, that's one instruction.
>
> True. On the other hand you don't need the "ldi 1,%0" then, which brings parity.
>
>> 2) Register %r8 is a caller saves register. Using it will cause gcc to save and restore it from stack. This may
>> cause a stack frame to be created when one isn't needed. The save and restore instructions are more
>> expensive, particularly if they cause a TLB miss.
>
> Because of this reason, wouln't it make sense to switch the uaccess functions away from r8 too,
> and use another temp register in both?
>
>> Note that the comclr both clears result and nullifies the following ldi instruction, so it is not executed in the fast path.
>
> Yes, but when emulating with qemu, it generates comparism and jumps, while the loading of r8 (see 1)), is trivial.
>
> If we change the uaccess functions away from r8, then we can drop comclr (and save one instruction).

Independend of your patch I think we should go away in uacces from r8.
I did some testing (on 32bit kernel) and r29 seems good.
Opinion?

When changing from r8 to r22:
# scripts/bloat-o-meter vmlinux1 vmlinux
add/remove: 0/0 grow/shrink: 23/85 up/down: 216/-964 (-748)
...

and when changing from r8 to r29:
# scripts/bloat-o-meter vmlinux1 vmlinux
add/remove: 0/0 grow/shrink: 23/86 up/down: 228/-980 (-752)
Function                                     old     new   delta
tty_mode_ioctl                              1468    1512     +44
fat_generic_ioctl                           1428    1464     +36
__receive_fd                                 260     276     +16
vt_do_kdsk_ioctl                             880     892     +12
futex_requeue                               2972    2984     +12
do_tcp_getsockopt.constprop                 4724    4736     +12
blkdev_ioctl                                2644    2656     +12
vt_do_diacrit                               1000    1008      +8
tty_jobctrl_ioctl                           1060    1068      +8
tioclinux                                    700     708      +8
sg_read                                     1256    1264      +8
scsi_ioctl                                  2004    2012      +8
uart_ioctl                                  1468    1472      +4
sys_sendfile64                               204     208      +4
sys_sendfile                                 188     192      +4
sys_rt_sigreturn                             288     292      +4
st_ioctl                                    5436    5440      +4
rtc_dev_read                                 412     416      +4
ipv6_get_msfilter                            252     256      +4
ip_get_mcast_msfilter                        196     200      +4
futex_unlock_pi                              756     760      +4
fillonedir                                   276     280      +4
dev_ifconf                                   256     260      +4
vt_ioctl                                    4084    4080      -4
sg_ioctl                                    2120    2116      -4
n_tty_ioctl                                  316     312      -4
handle_futex_death.part                      404     400      -4
do_ipv6_getsockopt.constprop                2068    2064      -4
count.constprop                              248     244      -4
autofs_root_ioctl                            576     572      -4
write_sysrq_trigger                           96      88      -8
write_port                                   220     212      -8
sys_utime32                                  132     124      -8
sys_time32                                    80      72      -8
sys_settimeofday                             208     200      -8
sys_pselect6_time32                          320     312      -8
sys_pselect6                                 320     312      -8
sys_name_to_handle_at                        488     480      -8
sys_msgsnd                                    80      72      -8
sys_ioctl                                   2396    2388      -8
sys_io_submit                               2944    2936      -8
sys_gettimeofday                             160     152      -8
sys_getresgid                                124     116      -8
sys_getgroups                                140     132      -8
sys_getdents64                               288     280      -8
sys_getdents                                 280     272      -8
sys_getcpu                                   112     104      -8
sock_getsockopt                             2476    2468      -8
serport_ldisc_ioctl                           88      80      -8
schedule_tail                                100      92      -8
raw_getsockopt                               196     188      -8
proc_bus_pci_write                           604     596      -8
proc_bus_pci_read                            624     616      -8
pipe_ioctl                                   184     176      -8
move_addr_to_user                            156     148      -8
mmc_ioctl_cdrom_last_written                  84      76      -8
mm_release                                   180     172      -8
load_elf_binary                             4336    4328      -8
ksys_msgsnd                                   80      72      -8
kpagecount_read                              552     544      -8
kpagecgroup_read                             424     416      -8
kernel_wait4                                 304     296      -8
ip_mc_msfget                                 472     464      -8
generic_ptrace_peekdata                      100      92      -8
futex_wait_setup                             248     240      -8
futex_get_value_locked                        92      84      -8
fat_dir_ioctl                                340     332      -8
do_signal.part                               332     324      -8
do_signal                                   1144    1136      -8
do_recvmmsg                                  632     624      -8
do_msg_fill                                  108     100      -8
do_compat_futimesat                          204     196      -8
check_syscallno_in_delay_branch.constprop     172     164      -8
autofs_expire_multi                           76      68      -8
__sys_socketpair                             560     552      -8
__sys_sendmmsg                               396     388      -8
__save_altstack                               92      84      -8
udp_ioctl                                    168     156     -12
tcp_ioctl                                    512     500     -12
sys_waitid                                   312     300     -12
sys_setgroups                                484     472     -12
sys_getresuid                                140     128     -12
sys_get_robust_list                          220     208     -12
sys_capset                                   460     448     -12
sys_capget                                   344     332     -12
pty_set_lock                                 224     212     -12
futex_cmpxchg_value_locked                   216     204     -12
fpr_set                                      240     228     -12
fault_in_writeable                           176     164     -12
fault_in_readable                            188     176     -12
copy_to_kernel_nofault                       352     340     -12
copy_from_kernel_nofault                     400     388     -12
strncpy_from_kernel_nofault                  284     268     -16
put_cmsg                                     340     324     -16
packet_ioctl                                 344     328     -16
check_zeroed_user                            220     204     -16
cap_validate_magic                           336     320     -16
strncpy_from_user                            484     464     -20
random_ioctl                                 532     512     -20
do_epoll_wait                               1704    1684     -20
vt_do_kdskled                                632     608     -24
unix_ioctl                                   484     460     -24
read_port                                    208     184     -24
ns_ioctl                                     260     236     -24
ata_sas_scsi_ioctl                           704     680     -24
pty_bsd_ioctl                                288     260     -28
arch_ptrace                                  524     496     -28
pty_unix98_ioctl                             312     280     -32
copy_process                                6364    6316     -48
Total: Before=8913722, After=8912970, chg -0.01%




[Index of Archives]     [Linux SoC]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux