Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v2

Hi Christoph,

On 27/08/2020 at 17:00, Christoph Hellwig wrote:
Hi all,

this series removes the last set_fs() used to force a kernel address
space for the uaccess code in the kernel read/write/splice code, and then
stops implementing the address space overrides entirely for x86 and
powerpc.

The file system part has been posted a few times, and the read/write side
has been pretty much unchanged.  For splice this series drops the
conversion of the seq_file and sysctl code to the iter ops, and thus loses
the splice support for them.  The reason for that is that it caused a lot
of churn for not much use - splice for these small files really isn't much
of a win, even if existing userspace uses it.  All callers I found do the
proper fallback, but if this turns out to be an issue the conversion can
be resurrected.

Besides x86 and powerpc I plan to eventually convert all other
architectures, although this will be a slow process, starting with the
easier ones once the infrastructure is merged.  The process to convert
architectures is roughly:

  (1) ensure there is no set_fs(KERNEL_DS) left in arch specific code
  (2) implement __get_kernel_nofault and __put_kernel_nofault
  (3) remove the arch specific address limitation functionality

Changes since v1:
  - drop the patch to remove the non-iter ops for /dev/zero and
    /dev/null as they caused a performance regression
  - don't enable user access in __get_kernel on powerpc
  - xfail the set_fs() based lkdtm tests

Diffstat:



I'm still sceptical about the results I get.

With 5.9-rc2:

root@vgoippro:~# time dd if=/dev/zero of=/dev/null count=1M
1048576+0 records in
1048576+0 records out
536870912 bytes (512.0MB) copied, 5.585880 seconds, 91.7MB/s
real    0m 5.59s
user    0m 1.40s
sys     0m 4.19s


With your series:

root@vgoippro:/tmp# time dd if=/dev/zero of=/dev/null count=1M
1048576+0 records in
1048576+0 records out
536870912 bytes (512.0MB) copied, 7.780540 seconds, 65.8MB/s
real    0m 7.79s
user    0m 2.12s
sys     0m 5.66s
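
For reference, the two runs can be put side by side with a little arithmetic. This is only a sketch: the figures are copied from the dd output above, the 512-byte record size is inferred from the byte and record counts, and the helper names are made up for illustration.

```python
# Figures copied from the two dd runs above.
RECORDS = 1048576          # "1048576+0 records in/out" (count=1M)
TOTAL_BYTES = 536870912    # as reported by dd, i.e. 512 bytes per record
MIB = 1024 * 1024

BASELINE_S = 5.585880      # 5.9-rc2
SERIES_S = 7.780540        # with the series applied


def throughput_mib_s(seconds):
    """Throughput in MiB/s, the unit dd prints as 'MB/s' here."""
    return TOTAL_BYTES / MIB / seconds


def usec_per_record(seconds):
    """Average time for one read() + write() pair, in microseconds."""
    return seconds / RECORDS * 1e6


# Extra wall-clock time with the series, relative to the baseline.
slowdown_pct = (SERIES_S / BASELINE_S - 1.0) * 100.0

for name, secs in (("5.9-rc2", BASELINE_S), ("series", SERIES_S)):
    print(f"{name:8s} {throughput_mib_s(secs):6.1f} MiB/s "
          f"{usec_per_record(secs):6.2f} us/record")
print(f"slowdown {slowdown_pct:.1f}%")
```

The throughput values match what dd printed (91.7 and 65.8), and the run with the series takes roughly 39% longer. Since each record is one small read() and one small write(), the difference is per-syscall overhead rather than copy bandwidth.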




Top of perf report of a standard perf record:

With 5.9-rc2:

    20.31%  dd       [kernel.kallsyms]  [k] __arch_clear_user
     8.37%  dd       [kernel.kallsyms]  [k] transfer_to_syscall
     7.37%  dd       [kernel.kallsyms]  [k] __fsnotify_parent
     6.95%  dd       [kernel.kallsyms]  [k] iov_iter_zero
     5.72%  dd       [kernel.kallsyms]  [k] new_sync_read
     4.87%  dd       [kernel.kallsyms]  [k] vfs_write
     4.47%  dd       [kernel.kallsyms]  [k] vfs_read
     3.07%  dd       [kernel.kallsyms]  [k] ksys_write
     2.77%  dd       [kernel.kallsyms]  [k] ksys_read
     2.65%  dd       [kernel.kallsyms]  [k] __fget_light
     2.37%  dd       [kernel.kallsyms]  [k] __fdget_pos
     2.35%  dd       [kernel.kallsyms]  [k] memset
     1.53%  dd       [kernel.kallsyms]  [k] rw_verify_area
     1.52%  dd       [kernel.kallsyms]  [k] read_iter_zero

With your series:

    19.60%  dd       [kernel.kallsyms]  [k] __arch_clear_user
    10.92%  dd       [kernel.kallsyms]  [k] iov_iter_zero
     9.50%  dd       [kernel.kallsyms]  [k] vfs_write
     8.97%  dd       [kernel.kallsyms]  [k] __fsnotify_parent
     5.46%  dd       [kernel.kallsyms]  [k] transfer_to_syscall
     5.42%  dd       [kernel.kallsyms]  [k] vfs_read
     3.58%  dd       [kernel.kallsyms]  [k] ksys_read
     2.84%  dd       [kernel.kallsyms]  [k] read_iter_zero
     2.24%  dd       [kernel.kallsyms]  [k] ksys_write
     1.80%  dd       [kernel.kallsyms]  [k] __fget_light
     1.34%  dd       [kernel.kallsyms]  [k] __fdget_pos
     0.91%  dd       [kernel.kallsyms]  [k] memset
     0.91%  dd       [kernel.kallsyms]  [k] rw_verify_area
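
The percentages in the two reports are relative to runs of different length, so they are not directly comparable. A rough way to compare them is to scale each share by the run time. This is only a sketch under loud assumptions: that the perf runs took about as long as the timed dd runs above (5.59s and 7.79s), and that the percentages are shares of all samples in the run. The function name is made up for illustration.

```python
# Percentages copied from the two perf reports above (top entries only).
BASELINE_PCT = {
    "__arch_clear_user": 20.31, "transfer_to_syscall": 8.37,
    "__fsnotify_parent": 7.37, "iov_iter_zero": 6.95,
    "new_sync_read": 5.72, "vfs_write": 4.87, "vfs_read": 4.47,
    "ksys_write": 3.07, "ksys_read": 2.77, "__fget_light": 2.65,
    "__fdget_pos": 2.37, "memset": 2.35, "rw_verify_area": 1.53,
    "read_iter_zero": 1.52,
}
SERIES_PCT = {
    "__arch_clear_user": 19.60, "iov_iter_zero": 10.92, "vfs_write": 9.50,
    "__fsnotify_parent": 8.97, "transfer_to_syscall": 5.46,
    "vfs_read": 5.42, "ksys_read": 3.58, "read_iter_zero": 2.84,
    "ksys_write": 2.24, "__fget_light": 1.80, "__fdget_pos": 1.34,
    "memset": 0.91, "rw_verify_area": 0.91,
}

# Assumption: the profiled runs lasted as long as the timed dd runs.
BASELINE_WALL_S = 5.59
SERIES_WALL_S = 7.79


def estimated_seconds(report_pct, wall_s):
    """Scale each symbol's sample share to an estimated time in seconds."""
    return {sym: pct / 100.0 * wall_s for sym, pct in report_pct.items()}


baseline_est = estimated_seconds(BASELINE_PCT, BASELINE_WALL_S)
series_est = estimated_seconds(SERIES_PCT, SERIES_WALL_S)

for sym in sorted(BASELINE_PCT.keys() | SERIES_PCT.keys()):
    before = baseline_est.get(sym)
    after = series_est.get(sym)
    # A symbol missing from one excerpt is unknown, not zero.
    b = f"{before:6.3f}s" if before is not None else "   n/a "
    a = f"{after:6.3f}s" if after is not None else "   n/a "
    print(f"{sym:22s} {b} -> {a}")
```

A symbol that shows up in only one list (new_sync_read here) is printed as n/a, because the excerpts only contain the top of each report and absence does not prove the function was not sampled.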

Christophe


