Hi Christoph,

On 27/08/2020 at 17:00, Christoph Hellwig wrote:
> Hi all,
>
> this series removes the last set_fs() used to force a kernel address
> space for the uaccess code in the kernel read/write/splice code, and then
> stops implementing the address space overrides entirely for x86 and
> powerpc.
>
> The file system part has been posted a few times, and the read/write side
> has been pretty much unchanged. For splice this series drops the
> conversion of the seq_file and sysctl code to the iter ops, and thus loses
> the splice support for them. The reason for that is that it caused a lot
> of churn for not much use - splice for these small files really isn't much
> of a win, even if existing userspace uses it. All callers I found do the
> proper fallback, but if this turns out to be an issue the conversion can
> be resurrected.
>
> Besides x86 and powerpc I plan to eventually convert all other
> architectures, although this will be a slow process, starting with the
> easier ones once the infrastructure is merged. The process to convert
> architectures is roughly:
>
>  (1) ensure there is no set_fs(KERNEL_DS) left in arch specific code
>  (2) implement __get_kernel_nofault and __put_kernel_nofault
>  (3) remove the arch specific address limitation functionality
>
> Changes since v1:
>  - drop the patch to remove the non-iter ops for /dev/zero and
>    /dev/null as they caused a performance regression
>  - don't enable user access in __get_kernel on powerpc
>  - xfail the set_fs() based lkdtm tests
>
> Diffstat:

I'm still sceptical about the results I get.

With 5.9-rc2:

root@vgoippro:~# time dd if=/dev/zero of=/dev/null count=1M
1048576+0 records in
1048576+0 records out
536870912 bytes (512.0MB) copied, 5.585880 seconds, 91.7MB/s
real    0m 5.59s
user    0m 1.40s
sys     0m 4.19s

With your series:

root@vgoippro:/tmp# time dd if=/dev/zero of=/dev/null count=1M
1048576+0 records in
1048576+0 records out
536870912 bytes (512.0MB) copied, 7.780540 seconds, 65.8MB/s
real    0m 7.79s
user    0m 2.12s
sys     0m 5.66s

Top of perf report of a standard perf record:

With 5.9-rc2:

    20.31%  dd  [kernel.kallsyms]  [k] __arch_clear_user
     8.37%  dd  [kernel.kallsyms]  [k] transfer_to_syscall
     7.37%  dd  [kernel.kallsyms]  [k] __fsnotify_parent
     6.95%  dd  [kernel.kallsyms]  [k] iov_iter_zero
     5.72%  dd  [kernel.kallsyms]  [k] new_sync_read
     4.87%  dd  [kernel.kallsyms]  [k] vfs_write
     4.47%  dd  [kernel.kallsyms]  [k] vfs_read
     3.07%  dd  [kernel.kallsyms]  [k] ksys_write
     2.77%  dd  [kernel.kallsyms]  [k] ksys_read
     2.65%  dd  [kernel.kallsyms]  [k] __fget_light
     2.37%  dd  [kernel.kallsyms]  [k] __fdget_pos
     2.35%  dd  [kernel.kallsyms]  [k] memset
     1.53%  dd  [kernel.kallsyms]  [k] rw_verify_area
     1.52%  dd  [kernel.kallsyms]  [k] read_iter_zero

With your series:

    19.60%  dd  [kernel.kallsyms]  [k] __arch_clear_user
    10.92%  dd  [kernel.kallsyms]  [k] iov_iter_zero
     9.50%  dd  [kernel.kallsyms]  [k] vfs_write
     8.97%  dd  [kernel.kallsyms]  [k] __fsnotify_parent
     5.46%  dd  [kernel.kallsyms]  [k] transfer_to_syscall
     5.42%  dd  [kernel.kallsyms]  [k] vfs_read
     3.58%  dd  [kernel.kallsyms]  [k] ksys_read
     2.84%  dd  [kernel.kallsyms]  [k] read_iter_zero
     2.24%  dd  [kernel.kallsyms]  [k] ksys_write
     1.80%  dd  [kernel.kallsyms]  [k] __fget_light
     1.34%  dd  [kernel.kallsyms]  [k] __fdget_pos
     0.91%  dd  [kernel.kallsyms]  [k] memset
     0.91%  dd  [kernel.kallsyms]  [k] rw_verify_area

Christophe
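
As a rough cross-check of the two dd runs, here is a minimal sketch that recomputes the throughput and the relative slowdown from the byte count and elapsed times reported in the output. The function and variable names are made up for illustration. It assumes "MB" in the dd output means 2^20 bytes, since 536870912 bytes is shown as 512.0MB.

```python
# Figures copied from the two dd runs; nothing here is measured anew.
BYTES_COPIED = 536870912      # same for both runs
BASELINE_SECONDS = 5.585880   # 5.9-rc2
SERIES_SECONDS = 7.780540     # with the series applied


def throughput_mib_s(nbytes: int, seconds: float) -> float:
    """Return throughput in units of 2^20 bytes per second."""
    return nbytes / seconds / (1024 * 1024)


baseline = throughput_mib_s(BYTES_COPIED, BASELINE_SECONDS)
series = throughput_mib_s(BYTES_COPIED, SERIES_SECONDS)

# Relative increase in elapsed time for the same amount of data.
slowdown_pct = (SERIES_SECONDS / BASELINE_SECONDS - 1.0) * 100.0

print(f"5.9-rc2:     {baseline:.1f} MB/s")
print(f"with series: {series:.1f} MB/s")
print(f"elapsed time increase: {slowdown_pct:.1f}%")
```

The elapsed time grows by roughly 39% for the same 512MB, which is consistent with the drop from 91.7MB/s to 65.8MB/s reported by dd.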