Linus,

please pull the following changes since commit
0c8efd610b58cb23cefdfa12015799079aef94ae:

  Linux 4.16-rc5 (2018-03-11 17:25:09 -0700)

which are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git syscalls-next

up to commit c9a211951c7c79cfb5de888d7d9550872868b086:

  bpf: whitelist all syscalls for error injection (2018-04-02 20:16:21 +0200)

to remove all in-kernel calls to syscalls except from arch/.

Since the last time I sent the patches out for review,[*] I have solely
added a few more ACKs. Jon Corbet raised the question whether the
documentation really should go to Documentation/process/adding-syscalls.rst
and not to Documentation/process/coding-style.rst (even though, as he said,
that isn't quite right either). As most of the existing instances where
syscalls were called in the kernel were

 (1) common codepaths for old and new syscalls,
 (2) common codepaths for native and compat syscalls, and
 (3) syscall multiplexers like sys_ipc(),

I have kept it at the former location for the time being, but will be
happy to submit a follow-up patch to move the documentation bits to a
different file.

[*] lkml.kernel.org/r/20180329112426.23043-1-linux@xxxxxxxxxxxxxxxxxxxx

All these patches have been in -next, but got rebased a few minutes ago
to include another ACK in patch 2/109 (no code changes). There were/are
a few trivial conflicts against the net, sparc and vfs trees, but not
(yet) against what is in your tree up to commit
86bbbebac1933e6e95e8234c4f7d220c5ddd38bc.

Thanks,
	Dominik

----------------------------------------------------------------

System calls are interaction points between userspace and the kernel.
Therefore, system call functions such as sys_xyzzy() or
compat_sys_xyzzy() should only be called from userspace via the syscall
table, but not from elsewhere in the kernel.

At least on 64-bit x86, it will likely be a hard requirement from v4.17
onwards to not call system call functions in the kernel: It is better to
use a different calling convention for system calls there, where
struct pt_regs is decoded on-the-fly in a syscall wrapper which then
hands processing over to the actual syscall function. This means that
only those parameters which are actually needed for a specific syscall
are passed on during syscall entry, instead of filling in six CPU
registers with random user space content all the time (which may cause
serious trouble down the call chain). Those x86-specific patches will be
pushed through the x86 tree in the near future.

Moreover, rules on how data may be accessed may differ between kernel
data and user data. This is another reason why calling sys_xyzzy() is
generally a bad idea, and -- at most -- acceptable in arch-specific code.

This patchset removes all in-kernel calls to syscall functions, with the
exception of arch/. On top of this, it cleans up the three places where
many syscalls are referenced or prototyped, namely kernel/sys_ni.c,
include/linux/syscalls.h and include/linux/compat.h.

First goes a patch which defines the goal and explains the rationale:

    syscalls: define and explain goal to not call syscalls in the kernel

A few codepaths can trivially be converted to existing in-kernel
interfaces:

    kernel: use kernel_wait4() instead of sys_wait4()
    kernel: open-code sys_rt_sigpending() in sys_sigpending()
    kexec: call do_kexec_load() in compat syscall directly
    mm: use do_futex() instead of sys_futex() in mm_release()
    x86: use _do_fork() in compat_sys_x86_clone()
    x86: remove compat_sys_x86_waitpid()

Then follow many patches which each affect only a specific subsystem,
and replace sys_*() with internal helpers named __sys_*() or do_sys_*().
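
To make the wrapper/helper split described above a bit more concrete,
here is a rough userspace sketch. It is not code from this series: the
names fake_pt_regs, sys_xyzzy(), do_sys_xyzzy() and
kernel_user_of_xyzzy() are made up for illustration, and the register
layout is only assumed to resemble what a 64-bit x86 entry path might
save. The point is merely that the entry stub decodes the registers it
needs and hands typed arguments to a helper, while other kernel code
calls the helper and never the syscall function itself.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the saved user registers (not the real struct pt_regs). */
struct fake_pt_regs {
	unsigned long di, si, dx, r10, r8, r9;
};

/*
 * Internal helper with typed arguments. This is what other kernel code
 * would call directly (illustrative only, not an actual kernel function).
 */
long do_sys_xyzzy(int fd, size_t len)
{
	if (fd < 0)
		return -EBADF;
	return (long)len;
}

/*
 * Syscall entry stub: decodes only the two registers xyzzy needs and
 * ignores whatever user space left in the remaining four.
 */
long sys_xyzzy(const struct fake_pt_regs *regs)
{
	return do_sys_xyzzy((int)regs->di, (size_t)regs->si);
}

/* An in-kernel user: goes through the helper, never through sys_xyzzy(). */
long kernel_user_of_xyzzy(void)
{
	return do_sys_xyzzy(3, 128);
}
```

With such a split, sys_xyzzy() is only ever reached through the syscall
table, so its calling convention can change without touching any
in-kernel caller.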

Let's start with net/:

    net: socket: add __sys_recvfrom() helper; remove in-kernel call to syscall
    net: socket: add __sys_sendto() helper; remove in-kernel call to syscall
    net: socket: add __sys_accept4() helper; remove in-kernel call to syscall
    net: socket: add __sys_socket() helper; remove in-kernel call to syscall
    net: socket: add __sys_bind() helper; remove in-kernel call to syscall
    net: socket: add __sys_connect() helper; remove in-kernel call to syscall
    net: socket: add __sys_listen() helper; remove in-kernel call to syscall
    net: socket: add __sys_getsockname() helper; remove in-kernel call to syscall
    net: socket: add __sys_getpeername() helper; remove in-kernel call to syscall
    net: socket: add __sys_socketpair() helper; remove in-kernel call to syscall
    net: socket: add __sys_shutdown() helper; remove in-kernel call to syscall
    net: socket: add __sys_setsockopt() helper; remove in-kernel call to syscall
    net: socket: add __sys_getsockopt() helper; remove in-kernel call to syscall
    net: socket: add do_sys_recvmmsg() helper; remove in-kernel call to syscall
    net: socket: move check for forbid_cmsg_compat to __sys_...msg()
    net: socket: replace calls to sys_send() with __sys_sendto()
    net: socket: replace call to sys_recv() with __sys_recvfrom()
    net: socket: add __compat_sys_recvfrom() helper; remove in-kernel call to compat syscall
    net: socket: add __compat_sys_setsockopt() helper; remove in-kernel call to compat syscall
    net: socket: add __compat_sys_getsockopt() helper; remove in-kernel call to compat syscall
    net: socket: add __compat_sys_recvmmsg() helper; remove in-kernel call to compat syscall
    net: socket: add __compat_sys_...msg() helpers; remove in-kernel calls to compat syscalls

The changes in ipc/ are limited to this specific subsystem. The wrappers
are named ksys_*() to denote that these functions are meant as a drop-in
replacement for the syscalls.

    ipc: add semtimedop syscall/compat_syscall wrappers
    ipc: add semget syscall wrapper
    ipc: add semctl syscall/compat_syscall wrappers
    ipc: add msgget syscall wrapper
    ipc: add shmget syscall wrapper
    ipc: add shmdt syscall wrapper
    ipc: add shmctl syscall/compat_syscall wrappers
    ipc: add msgctl syscall/compat_syscall wrappers
    ipc: add msgrcv syscall/compat_syscall wrappers
    ipc: add msgsnd syscall/compat_syscall wrappers

A few mindless conversions in kernel/ and mm/:

    kernel: add do_getpgid() helper; remove internal call to sys_getpgid()
    kernel: add do_compat_sigaltstack() helper; remove in-kernel call to compat syscall
    kernel: provide ksys_*() wrappers for syscalls called by kernel/uid16.c
    sched: add do_sched_yield() helper; remove in-kernel call to sched_yield()
    mm: add kernel_migrate_pages() helper, move compat syscall to mm/mempolicy.c
    mm: add kernel_move_pages() helper, move compat syscall to mm/migrate.c
    mm: add kernel_mbind() helper; remove in-kernel call to syscall
    mm: add kernel_[sg]et_mempolicy() helpers; remove in-kernel calls to syscalls

Then, let's handle those instances internal to fs/ which call syscalls:

    fs: add do_readlinkat() helper; remove internal call to sys_readlinkat()
    fs: add do_pipe2() helper; remove internal call to sys_pipe2()
    fs: add do_renameat2() helper; remove internal call to sys_renameat2()
    fs: add do_futimesat() helper; remove internal call to sys_futimesat()
    fs: add do_epoll_*() helpers; remove internal calls to sys_epoll_*()
    fs: add do_signalfd4() helper; remove internal calls to sys_signalfd4()
    fs: add do_eventfd() helper; remove internal call to sys_eventfd()
    fs: add do_lookup_dcookie() helper; remove in-kernel call to syscall
    fs: add do_vmsplice() helper; remove in-kernel call to syscall
    fs: add kern_select() helper; remove in-kernel call to sys_select()
    fs: add do_compat_fcntl64() helper; remove in-kernel call to compat syscall
    fs: add do_compat_select() helper; remove in-kernel call to compat syscall
    fs: add do_compat_signalfd4() helper; remove in-kernel call to compat syscall
    fs: add do_compat_futimesat() helper; remove in-kernel call to compat syscall
    inotify: add do_inotify_init() helper; remove in-kernel call to syscall
    fanotify: add do_fanotify_mark() helper; remove in-kernel call to syscall
    fs/quota: add kernel_quotactl() helper; remove in-kernel call to syscall
    fs/quota: use COMPAT_SYSCALL_DEFINE for sys32_quotactl()

Several fs- and some mm-related syscalls are called in initramfs, initrd
and init, devtmpfs, and pm code. While at least many of these instances
should be converted to use proper in-kernel VFS interfaces in future,
convert them mindlessly to ksys_*() helpers or wrappers for now.

    fs: add ksys_mount() helper; remove in-kernel calls to sys_mount()
    fs: add ksys_umount() helper; remove in-kernel call to sys_umount()
    fs: add ksys_dup{,3}() helper; remove in-kernel calls to sys_dup{,3}()
    fs: add ksys_chroot() helper; remove-in kernel calls to sys_chroot()
    fs: add ksys_write() helper; remove in-kernel calls to sys_write()
    fs: add ksys_chdir() helper; remove in-kernel calls to sys_chdir()
    fs: add ksys_unlink() wrapper; remove in-kernel calls to sys_unlink()
    hostfs: rename do_rmdir() to hostfs_do_rmdir()
    fs: add ksys_rmdir() wrapper; remove in-kernel calls to sys_rmdir()
    fs: add do_mkdirat() helper and ksys_mkdir() wrapper; remove in-kernel calls to syscall
    fs: add do_symlinkat() helper and ksys_symlink() wrapper; remove in-kernel calls to syscall
    fs: add do_mknodat() helper and ksys_mknod() wrapper; remove in-kernel calls to syscall
    fs: add do_linkat() helper and ksys_link() wrapper; remove in-kernel calls to syscall
    fs: add ksys_fchmod() and do_fchmodat() helpers and ksys_chmod() wrapper; remove in-kernel calls to syscall
    fs: add do_faccessat() helper and ksys_access() wrapper; remove in-kernel calls to syscall
    fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown() wrappers
    fs: add ksys_ftruncate() wrapper; remove in-kernel calls to sys_ftruncate()
    fs: add ksys_close() wrapper; remove in-kernel calls to sys_close()
    fs: add ksys_open() wrapper; remove in-kernel calls to sys_open()
    fs: add ksys_getdents64() helper; remove in-kernel calls to sys_getdents64()
    fs: add ksys_ioctl() helper; remove in-kernel calls to sys_ioctl()
    fs: add ksys_lseek() helper; remove in-kernel calls to sys_lseek()
    fs: add ksys_read() helper; remove in-kernel calls to sys_read()
    fs: add ksys_sync() helper; remove in-kernel calls to sys_sync()
    kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare()
    kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid()

To reach the goal to get rid of all in-kernel calls to syscalls for x86,
we need to handle a few further syscalls called from compat syscalls in
x86 and (mostly) from other architectures. Those could be made generic
making use of Al Viro's macro trickery. For v4.17, I'd suggest to keep
it simple:

    fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall
    fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate()
    fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls
    fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate()
    mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()
    mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()
    mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()
    x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()

Then, throw in two fixes for x86:

    x86: fix sys_sigreturn() return type to be long, not unsigned long
    x86/sigreturn: use SYSCALL_DEFINE0 (by Michael Tautschnig)

...
and clean up the three places where many syscalls are referenced or
prototyped (kernel/sys_ni.c, include/linux/syscalls.h and
include/linux/compat.h):

    kexec: move sys_kexec_load() prototype to syscalls.h
    syscalls: sort syscall prototypes in include/linux/syscalls.h
    net: remove compat_sys_*() prototypes from net/compat.h
    syscalls: sort syscall prototypes in include/linux/compat.h
    syscalls/x86: auto-create compat_sys_*() prototypes
    kernel/sys_ni: sort cond_syscall() entries
    kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions

Last but not least, add a patch by Howard McLauchlan to whitelist all
syscalls for error injection:

    bpf: whitelist all syscalls for error injection (by Howard McLauchlan)

----------------------------------------------------------------
Dominik Brodowski (107):
      syscalls: define and explain goal to not call syscalls in the kernel
      kernel: use kernel_wait4() instead of sys_wait4()
      kernel: open-code sys_rt_sigpending() in sys_sigpending()
      kexec: call do_kexec_load() in compat syscall directly
      mm: use do_futex() instead of sys_futex() in mm_release()
      x86: use _do_fork() in compat_sys_x86_clone()
      x86: remove compat_sys_x86_waitpid()
      net: socket: add __sys_recvfrom() helper; remove in-kernel call to syscall
      net: socket: add __sys_sendto() helper; remove in-kernel call to syscall
      net: socket: add __sys_accept4() helper; remove in-kernel call to syscall
      net: socket: add __sys_socket() helper; remove in-kernel call to syscall
      net: socket: add __sys_bind() helper; remove in-kernel call to syscall
      net: socket: add __sys_connect() helper; remove in-kernel call to syscall
      net: socket: add __sys_listen() helper; remove in-kernel call to syscall
      net: socket: add __sys_getsockname() helper; remove in-kernel call to syscall
      net: socket: add __sys_getpeername() helper; remove in-kernel call to syscall
      net: socket: add __sys_socketpair() helper; remove in-kernel call to syscall
      net: socket: add __sys_shutdown() helper; remove in-kernel call to syscall
      net: socket: add __sys_setsockopt() helper; remove in-kernel call to syscall
      net: socket: add __sys_getsockopt() helper; remove in-kernel call to syscall
      net: socket: add do_sys_recvmmsg() helper; remove in-kernel call to syscall
      net: socket: move check for forbid_cmsg_compat to __sys_...msg()
      net: socket: replace calls to sys_send() with __sys_sendto()
      net: socket: replace call to sys_recv() with __sys_recvfrom()
      net: socket: add __compat_sys_recvfrom() helper; remove in-kernel call to compat syscall
      net: socket: add __compat_sys_setsockopt() helper; remove in-kernel call to compat syscall
      net: socket: add __compat_sys_getsockopt() helper; remove in-kernel call to compat syscall
      net: socket: add __compat_sys_recvmmsg() helper; remove in-kernel call to compat syscall
      net: socket: add __compat_sys_...msg() helpers; remove in-kernel calls to compat syscalls
      ipc: add semtimedop syscall/compat_syscall wrappers
      ipc: add semget syscall wrapper
      ipc: add semctl syscall/compat_syscall wrappers
      ipc: add msgget syscall wrapper
      ipc: add shmget syscall wrapper
      ipc: add shmdt syscall wrapper
      ipc: add shmctl syscall/compat_syscall wrappers
      ipc: add msgctl syscall/compat_syscall wrappers
      ipc: add msgrcv syscall/compat_syscall wrappers
      ipc: add msgsnd syscall/compat_syscall wrappers
      kernel: add do_getpgid() helper; remove internal call to sys_getpgid()
      kernel: add do_compat_sigaltstack() helper; remove in-kernel call to compat syscall
      kernel: provide ksys_*() wrappers for syscalls called by kernel/uid16.c
      sched: add do_sched_yield() helper; remove in-kernel call to sched_yield()
      mm: add kernel_migrate_pages() helper, move compat syscall to mm/mempolicy.c
      mm: add kernel_move_pages() helper, move compat syscall to mm/migrate.c
      mm: add kernel_mbind() helper; remove in-kernel call to syscall
      mm: add kernel_[sg]et_mempolicy() helpers; remove in-kernel calls to syscalls
      fs: add do_readlinkat() helper; remove internal call to sys_readlinkat()
      fs: add do_pipe2() helper; remove internal call to sys_pipe2()
      fs: add do_renameat2() helper; remove internal call to sys_renameat2()
      fs: add do_futimesat() helper; remove internal call to sys_futimesat()
      fs: add do_epoll_*() helpers; remove internal calls to sys_epoll_*()
      fs: add do_signalfd4() helper; remove internal calls to sys_signalfd4()
      fs: add do_eventfd() helper; remove internal call to sys_eventfd()
      fs: add do_lookup_dcookie() helper; remove in-kernel call to syscall
      fs: add do_vmsplice() helper; remove in-kernel call to syscall
      fs: add kern_select() helper; remove in-kernel call to sys_select()
      fs: add do_compat_fcntl64() helper; remove in-kernel call to compat syscall
      fs: add do_compat_select() helper; remove in-kernel call to compat syscall
      fs: add do_compat_signalfd4() helper; remove in-kernel call to compat syscall
      fs: add do_compat_futimesat() helper; remove in-kernel call to compat syscall
      inotify: add do_inotify_init() helper; remove in-kernel call to syscall
      fanotify: add do_fanotify_mark() helper; remove in-kernel call to syscall
      fs/quota: add kernel_quotactl() helper; remove in-kernel call to syscall
      fs/quota: use COMPAT_SYSCALL_DEFINE for sys32_quotactl()
      fs: add ksys_mount() helper; remove in-kernel calls to sys_mount()
      fs: add ksys_umount() helper; remove in-kernel call to sys_umount()
      fs: add ksys_dup{,3}() helper; remove in-kernel calls to sys_dup{,3}()
      fs: add ksys_chroot() helper; remove-in kernel calls to sys_chroot()
      fs: add ksys_write() helper; remove in-kernel calls to sys_write()
      fs: add ksys_chdir() helper; remove in-kernel calls to sys_chdir()
      fs: add ksys_unlink() wrapper; remove in-kernel calls to sys_unlink()
      hostfs: rename do_rmdir() to hostfs_do_rmdir()
      fs: add ksys_rmdir() wrapper; remove in-kernel calls to sys_rmdir()
      fs: add do_mkdirat() helper and ksys_mkdir() wrapper; remove in-kernel calls to syscall
      fs: add do_symlinkat() helper and ksys_symlink() wrapper; remove in-kernel calls to syscall
      fs: add do_mknodat() helper and ksys_mknod() wrapper; remove in-kernel calls to syscall
      fs: add do_linkat() helper and ksys_link() wrapper; remove in-kernel calls to syscall
      fs: add ksys_fchmod() and do_fchmodat() helpers and ksys_chmod() wrapper; remove in-kernel calls to syscall
      fs: add do_faccessat() helper and ksys_access() wrapper; remove in-kernel calls to syscall
      fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown() wrappers
      fs: add ksys_ftruncate() wrapper; remove in-kernel calls to sys_ftruncate()
      fs: add ksys_close() wrapper; remove in-kernel calls to sys_close()
      fs: add ksys_open() wrapper; remove in-kernel calls to sys_open()
      fs: add ksys_getdents64() helper; remove in-kernel calls to sys_getdents64()
      fs: add ksys_ioctl() helper; remove in-kernel calls to sys_ioctl()
      fs: add ksys_lseek() helper; remove in-kernel calls to sys_lseek()
      fs: add ksys_read() helper; remove in-kernel calls to sys_read()
      fs: add ksys_sync() helper; remove in-kernel calls to sys_sync()
      kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare()
      kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid()
      fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall
      fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate()
      fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls
      fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate()
      mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()
      mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()
      mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()
      x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()
      x86: fix sys_sigreturn() return type to be long, not unsigned long
      kexec: move sys_kexec_load() prototype to syscalls.h
      syscalls: sort syscall prototypes in include/linux/syscalls.h
      net: remove compat_sys_*() prototypes from net/compat.h
      syscalls: sort syscall prototypes in include/linux/compat.h
      syscalls/x86: auto-create compat_sys_*() prototypes
      kernel/sys_ni: sort cond_syscall() entries
      kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions

Howard McLauchlan (1):
      bpf: whitelist all syscalls for error injection

Tautschnig, Michael (1):
      x86/sigreturn: use SYSCALL_DEFINE0

 Documentation/process/adding-syscalls.rst | 34 +-
 arch/alpha/kernel/osf_sys.c               | 2 +-
 arch/arm/kernel/sys_arm.c                 | 2 +-
 arch/arm64/kernel/sys.c                   | 2 +-
 arch/ia64/kernel/sys_ia64.c               | 4 +-
 arch/m68k/kernel/sys_m68k.c               | 2 +-
 arch/microblaze/kernel/sys_microblaze.c   | 6 +-
 arch/mips/kernel/linux32.c                | 22 +-
 arch/mips/kernel/syscall.c                | 6 +-
 arch/parisc/kernel/sys_parisc.c           | 30 +-
 arch/powerpc/kernel/sys_ppc32.c           | 18 +-
 arch/powerpc/kernel/syscalls.c            | 6 +-
 arch/riscv/kernel/sys_riscv.c             | 4 +-
 arch/s390/kernel/compat_linux.c           | 37 +-
 arch/s390/kernel/sys_s390.c               | 2 +-
 arch/sh/kernel/sys_sh.c                   | 4 +-
 arch/sh/kernel/sys_sh32.c                 | 12 +-
 arch/sparc/kernel/setup_32.c              | 2 +-
 arch/sparc/kernel/sys_sparc32.c           | 26 +-
 arch/sparc/kernel/sys_sparc_32.c          | 6 +-
 arch/sparc/kernel/sys_sparc_64.c          | 2 +-
 arch/um/kernel/syscall.c                  | 2 +-
 arch/x86/entry/syscalls/syscall_32.tbl    | 4 +-
 arch/x86/ia32/ia32_signal.c               | 1 -
 arch/x86/ia32/sys_ia32.c                  | 50 +-
 arch/x86/include/asm/sys_ia32.h           | 67 --
 arch/x86/include/asm/syscalls.h           | 3 +-
 arch/x86/kernel/ioport.c                  | 7 +-
 arch/x86/kernel/signal.c                  | 5 +-
 arch/x86/kernel/sys_x86_64.c              | 2 +-
 arch/xtensa/kernel/syscall.c              | 2 +-
 drivers/base/devtmpfs.c                   | 11 +-
 drivers/tty/sysrq.c                       | 2 +-
 drivers/tty/vt/vt_ioctl.c                 | 6 +-
 fs/autofs4/dev-ioctl.c                    | 2 +-
 fs/binfmt_misc.c                          | 2 +-
 fs/dcookies.c                             | 11 +-
 fs/eventfd.c                              | 9 +-
 fs/eventpoll.c                            | 23 +-
 fs/fcntl.c                                | 12 +-
 fs/file.c                                 | 17 +-
 fs/hostfs/hostfs.h                        | 2 +-
 fs/hostfs/hostfs_kern.c                   | 2 +-
 fs/hostfs/hostfs_user.c                   | 2 +-
 fs/internal.h                             | 14 +
 fs/ioctl.c                                | 7 +-
 fs/namei.c                                | 61 +-
 fs/namespace.c                            | 19 +-
 fs/notify/fanotify/fanotify_user.c        | 14 +-
 fs/notify/inotify/inotify_user.c          | 9 +-
 fs/open.c                                 | 77 +-
 fs/pipe.c                                 | 9 +-
 fs/quota/compat.c                         | 13 +-
 fs/quota/quota.c                          | 10 +-
 fs/read_write.c                           | 45 +-
 fs/readdir.c                              | 11 +-
 fs/select.c                               | 29 +-
 fs/signalfd.c                             | 31 +-
 fs/splice.c                               | 12 +-
 fs/stat.c                                 | 12 +-
 fs/sync.c                                 | 19 +-
 fs/utimes.c                               | 25 +-
 include/linux/compat.h                    | 644 ++++++------
 include/linux/futex.h                     | 13 +-
 include/linux/kexec.h                     | 4 -
 include/linux/quotaops.h                  | 3 +
 include/linux/socket.h                    | 37 +-
 include/linux/syscalls.h                  | 1511 +++++++++++++++++------------
 include/net/compat.h                      | 11 -
 init/do_mounts.c                          | 26 +-
 init/do_mounts.h                          | 4 +-
 init/do_mounts_initrd.c                   | 42 +-
 init/do_mounts_md.c                       | 29 +-
 init/do_mounts_rd.c                       | 40 +-
 init/initramfs.c                          | 52 +-
 init/main.c                               | 9 +-
 init/noinitramfs.c                        | 6 +-
 ipc/msg.c                                 | 60 +-
 ipc/sem.c                                 | 44 +-
 ipc/shm.c                                 | 28 +-
 ipc/syscall.c                             | 58 +-
 ipc/util.h                                | 31 +
 kernel/compat.c                           | 55 --
 kernel/exit.c                             | 2 +-
 kernel/fork.c                             | 11 +-
 kernel/kexec.c                            | 52 +-
 kernel/pid_namespace.c                    | 6 +-
 kernel/power/hibernate.c                  | 2 +-
 kernel/power/suspend.c                    | 2 +-
 kernel/power/user.c                       | 2 +-
 kernel/sched/core.c                       | 8 +-
 kernel/signal.c                           | 29 +-
 kernel/sys.c                              | 74 +-
 kernel/sys_ni.c                           | 617 +++++++-----
 kernel/uid16.c                            | 25 +-
 kernel/uid16.h                            | 14 +
 kernel/umh.c                              | 4 +-
 mm/fadvise.c                              | 10 +-
 mm/mempolicy.c                            | 92 +-
 mm/migrate.c                              | 39 +-
 mm/mmap.c                                 | 17 +-
 mm/nommu.c                                | 17 +-
 mm/readahead.c                            | 7 +-
 net/compat.c                              | 136 ++-
 net/socket.c                              | 234 +++--
 105 files changed, 3129 insertions(+), 1868 deletions(-)
 delete mode 100644 arch/x86/include/asm/sys_ia32.h
 create mode 100644 kernel/uid16.h