On 12/23/21 20:31, Helge Deller wrote: > On 12/23/21 20:17, John David Anglin wrote: >> On 2021-12-23 1:47 p.m., Helge Deller wrote: >>> ... >>>> In order to do this, we need a mechanism to trigger COW breaks outside the >>>> critical region. Fortunately, parisc has the "stbys,e" instruction. When >>>> the leftmost byte of a word is addressed, this instruction triggers all >>>> the exceptions of a normal store but it does not write to memory. Thus, >>>> we can use it to trigger COW breaks outside the critical region without >>>> modifying the data that is to be updated atomically. >>> ... >>>> diff --git a/arch/parisc/include/asm/futex.h b/arch/parisc/include/asm/futex.h >>>> index 9cd4dd6e63ad..8f97db995b04 100644 >>>> --- a/arch/parisc/include/asm/futex.h >>>> +++ b/arch/parisc/include/asm/futex.h >>> ... >>>> +static inline bool _futex_force_interruptions(unsigned long ua) >>>> +{ >>>> + bool result; >>>> + >>>> + __asm__ __volatile__( >>>> + "1:\tldw 0(%1), %0\n" >>>> + "2:\tstbys,e %%r0, 0(%1)\n" >>>> + "\tcomclr,= %%r0, %%r0, %0\n" >>>> + "3:\tldi 1, %0\n" >>>> + ASM_EXCEPTIONTABLE_ENTRY(1b, 3b) >>>> + ASM_EXCEPTIONTABLE_ENTRY(2b, 3b) >>>> + : "=&r" (result) : "r" (ua) : "memory" >>>> + ); >>>> + return result; >>> I wonder if we can get rid of the comclr,= instruction by using >>> ASM_EXCEPTIONTABLE_ENTRY_EFAULT instead of ASM_EXCEPTIONTABLE_ENTRY, >>> e.g.: >>> >>> diff --git a/arch/parisc/include/asm/futex.h b/arch/parisc/include/asm/futex.h >>> index 8f97db995b04..ea052f013865 100644 >>> --- a/arch/parisc/include/asm/futex.h >>> +++ b/arch/parisc/include/asm/futex.h >>> @@ -21,20 +21,21 @@ static inline unsigned long _futex_hash_index(unsigned long ua) >>> * if load and store fault. >>> */ >>> >>> -static inline bool _futex_force_interruptions(unsigned long ua) >>> +static inline unsigned long _futex_force_interruptions(unsigned long ua) >>> { >>> - bool result; >>> + register unsigned long error __asm__ ("r8") = 0; >>> + register unsigned long temp; >>> >>> __asm__ __volatile__( >>> - "1:\tldw 0(%1), %0\n" >>> - "2:\tstbys,e %%r0, 0(%1)\n" >>> - "\tcomclr,= %%r0, %%r0, %0\n" >>> - "3:\tldi 1, %0\n" >>> - ASM_EXCEPTIONTABLE_ENTRY(1b, 3b) >>> - ASM_EXCEPTIONTABLE_ENTRY(2b, 3b) >>> - : "=&r" (result) : "r" (ua) : "memory" >>> + "1:\tldw 0(%2), %0\n" >>> + "2:\tstbys,e %%r0, 0(%2)\n" >>> + "3:\n" >>> + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b) >>> + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b) >>> + : "=r" (temp), "=r" (error) >>> + : "r" (ua), "1" (error) : "memory" >>> ); >>> - return result; >>> + return error; >>> } >> I don't think this is a win. >> >> 1) Register %r8 needs to get loaded with 0. So, that's one instruction. > > True. On the other hand you don't need the "ldi 1,%0" then, which brings parity. > >> 2) Register %r8 is a caller saves register. Using it will cause gcc to save and restore it from stack. This may >> cause a stack frame to be created when one isn't needed. The save and restore instructions are more >> expensive, particularly if they cause a TLB miss. > > Because of this reason, wouln't it make sense to switch the uaccess functions away from r8 too, > and use another temp register in both? > >> Note that the comclr both clears result and nullifies the following ldi instruction, so it is not executed in the fast path. > > Yes, but when emulating with qemu, it generates comparism and jumps, while the loading of r8 (see 1)), is trivial. > > If we change the uaccess functions away from r8, then we can drop comclr (and save one instruction). Independend of your patch I think we should go away in uacces from r8. I did some testing (on 32bit kernel) and r29 seems good. Opinion? When changing from r8 to r22: # scripts/bloat-o-meter vmlinux1 vmlinux add/remove: 0/0 grow/shrink: 23/85 up/down: 216/-964 (-748) ... and when changing from r8 to r29: # scripts/bloat-o-meter vmlinux1 vmlinux add/remove: 0/0 grow/shrink: 23/86 up/down: 228/-980 (-752) Function old new delta tty_mode_ioctl 1468 1512 +44 fat_generic_ioctl 1428 1464 +36 __receive_fd 260 276 +16 vt_do_kdsk_ioctl 880 892 +12 futex_requeue 2972 2984 +12 do_tcp_getsockopt.constprop 4724 4736 +12 blkdev_ioctl 2644 2656 +12 vt_do_diacrit 1000 1008 +8 tty_jobctrl_ioctl 1060 1068 +8 tioclinux 700 708 +8 sg_read 1256 1264 +8 scsi_ioctl 2004 2012 +8 uart_ioctl 1468 1472 +4 sys_sendfile64 204 208 +4 sys_sendfile 188 192 +4 sys_rt_sigreturn 288 292 +4 st_ioctl 5436 5440 +4 rtc_dev_read 412 416 +4 ipv6_get_msfilter 252 256 +4 ip_get_mcast_msfilter 196 200 +4 futex_unlock_pi 756 760 +4 fillonedir 276 280 +4 dev_ifconf 256 260 +4 vt_ioctl 4084 4080 -4 sg_ioctl 2120 2116 -4 n_tty_ioctl 316 312 -4 handle_futex_death.part 404 400 -4 do_ipv6_getsockopt.constprop 2068 2064 -4 count.constprop 248 244 -4 autofs_root_ioctl 576 572 -4 write_sysrq_trigger 96 88 -8 write_port 220 212 -8 sys_utime32 132 124 -8 sys_time32 80 72 -8 sys_settimeofday 208 200 -8 sys_pselect6_time32 320 312 -8 sys_pselect6 320 312 -8 sys_name_to_handle_at 488 480 -8 sys_msgsnd 80 72 -8 sys_ioctl 2396 2388 -8 sys_io_submit 2944 2936 -8 sys_gettimeofday 160 152 -8 sys_getresgid 124 116 -8 sys_getgroups 140 132 -8 sys_getdents64 288 280 -8 sys_getdents 280 272 -8 sys_getcpu 112 104 -8 sock_getsockopt 2476 2468 -8 serport_ldisc_ioctl 88 80 -8 schedule_tail 100 92 -8 raw_getsockopt 196 188 -8 proc_bus_pci_write 604 596 -8 proc_bus_pci_read 624 616 -8 pipe_ioctl 184 176 -8 move_addr_to_user 156 148 -8 mmc_ioctl_cdrom_last_written 84 76 -8 mm_release 180 172 -8 load_elf_binary 4336 4328 -8 ksys_msgsnd 80 72 -8 kpagecount_read 552 544 -8 kpagecgroup_read 424 416 -8 kernel_wait4 304 296 -8 ip_mc_msfget 472 464 -8 generic_ptrace_peekdata 100 92 -8 futex_wait_setup 248 240 -8 futex_get_value_locked 92 84 -8 fat_dir_ioctl 340 332 -8 do_signal.part 332 324 -8 do_signal 1144 1136 -8 do_recvmmsg 632 624 -8 do_msg_fill 108 100 -8 do_compat_futimesat 204 196 -8 check_syscallno_in_delay_branch.constprop 172 164 -8 autofs_expire_multi 76 68 -8 __sys_socketpair 560 552 -8 __sys_sendmmsg 396 388 -8 __save_altstack 92 84 -8 udp_ioctl 168 156 -12 tcp_ioctl 512 500 -12 sys_waitid 312 300 -12 sys_setgroups 484 472 -12 sys_getresuid 140 128 -12 sys_get_robust_list 220 208 -12 sys_capset 460 448 -12 sys_capget 344 332 -12 pty_set_lock 224 212 -12 futex_cmpxchg_value_locked 216 204 -12 fpr_set 240 228 -12 fault_in_writeable 176 164 -12 fault_in_readable 188 176 -12 copy_to_kernel_nofault 352 340 -12 copy_from_kernel_nofault 400 388 -12 strncpy_from_kernel_nofault 284 268 -16 put_cmsg 340 324 -16 packet_ioctl 344 328 -16 check_zeroed_user 220 204 -16 cap_validate_magic 336 320 -16 strncpy_from_user 484 464 -20 random_ioctl 532 512 -20 do_epoll_wait 1704 1684 -20 vt_do_kdskled 632 608 -24 unix_ioctl 484 460 -24 read_port 208 184 -24 ns_ioctl 260 236 -24 ata_sas_scsi_ioctl 704 680 -24 pty_bsd_ioctl 288 260 -28 arch_ptrace 524 496 -28 pty_unix98_ioctl 312 280 -32 copy_process 6364 6316 -48 Total: Before=8913722, After=8912970, chg -0.01%