There is a class of applications that use KVM to manage multiple address spaces rather than as an isolation boundary. In all other respects, they are normal processes that execute system calls, handle signals, etc. Currently, each time such a process needs to interact with the operating system, it has to switch to the host and back to the guest. Such full switches are expensive and significantly increase the overhead of system calls. The new hypercall reduces this overhead by more than a factor of two.

The new hypercall runs system calls on the host. As with native system calls, seccomp filters are executed before the system call itself.

The hypercall takes one argument, a pointer to a pt_regs structure in the host address space. This structure provides the registers needed to execute a system call according to the calling convention: arguments are passed in %rdi, %rsi, %rdx, %r10, %r8 and %r9, and the return code is stored in %rax.

The hypercall returns 0 if a system call has been executed. Otherwise, it returns an error code.

This series introduces a new capability that has to be set to enable the hypercall. The new hypercall is a backdoor for regular virtual machines, so it is disabled by default. There is another standard way to allow hypercalls, via cpuid. It has not been used because one of the common ways to manage cpuid features is to request all available features and set them all together. In this case, it is a hard requirement that the new hypercall can be enabled only intentionally.

= Background =

gVisor is one such application. It is an application kernel written in Go that implements a substantial portion of the Linux system call interface.

gVisor intercepts application system calls and acts as the guest kernel. It has a platform abstraction that implements interception of syscalls, basic context switching, and memory mapping functionality. Currently, it has two platforms: ptrace and KVM.
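
As an illustration of the register convention the hypercall expects (described before the Background section), here is a minimal user-space sketch of how a caller might fill the register block before issuing the hypercall. The names hc_regs and hc_prepare_syscall are hypothetical and are not part of this series; the real argument is a pt_regs structure, which has more fields. Placing the system call number in %rax is an assumption taken from the standard x86-64 Linux system call convention; the text above only states where the arguments and the return code live. The vmcall itself is not shown because it only makes sense from guest ring0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the subset of x86-64 pt_regs used here.
 * The real structure is defined by the kernel and has more fields.
 */
struct hc_regs {
	uint64_t rax;	/* assumed: syscall number in, return code out */
	uint64_t rdi;	/* argument 1 */
	uint64_t rsi;	/* argument 2 */
	uint64_t rdx;	/* argument 3 */
	uint64_t r10;	/* argument 4 */
	uint64_t r8;	/* argument 5 */
	uint64_t r9;	/* argument 6 */
};

/* Fill the register block for one system call with up to six arguments. */
static void hc_prepare_syscall(struct hc_regs *regs, uint64_t nr,
			       const uint64_t args[6])
{
	assert(regs != NULL && args != NULL);

	memset(regs, 0, sizeof(*regs));
	regs->rax = nr;		/* assumption: number goes in %rax */
	regs->rdi = args[0];
	regs->rsi = args[1];
	regs->rdx = args[2];
	regs->r10 = args[3];
	regs->r8  = args[4];
	regs->r9  = args[5];
}
```

A caller would prepare the block this way, pass its host address as the single hypercall argument, check that the hypercall itself returned 0, and only then read the system call's return code back from the rax field.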

The ptrace platform uses PTRACE_SYSEMU to execute user code without allowing it to perform host system calls, and it creates stub processes to manage user address spaces. This platform is primarily for testing because of its poor performance.

The other option is the KVM platform. In this case, the Sentry (the gVisor kernel) can run in guest ring0 and create and manage multiple address spaces. Its performance is much better than that of the ptrace platform, but it is still not great compared with native performance. This change optimizes the most critical part, which is the syscall overhead.

The idea of using vmcall to execute system calls isn't new. Two large users of gVisor (Google and Ant Financial) have out-of-tree code to implement such hypercalls.

In the Google kernel, we have a kvm-like subsystem designed especially for gVisor. This change is the first step of integrating it into the KVM code base and making it available to all Linux users.

Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
Cc: Wanpeng Li <wanpengli@xxxxxxxxxxx>
Cc: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
Cc: Jianfeng Tan <henry.tjf@xxxxxxxxxx>
Cc: Adin Scannell <ascannell@xxxxxxxxxx>
Cc: Konstantin Bogomolov <bogomolov@xxxxxxxxxx>
Cc: Etienne Perot <eperot@xxxxxxxxxx>

Andrei Vagin (5):
  kernel: add a new helper to execute system calls from kernel code
  kvm: add controls to enable/disable paravirtualized system calls
  KVM/x86: add a new hypercall to execute host system calls.
  selftests/kvm/x86_64: set rax before vmcall
  selftests/kvm/x86_64: add tests for KVM_HC_HOST_SYSCALL

 Documentation/virt/kvm/x86/hypercalls.rst     |  15 ++
 arch/x86/entry/common.c                       |  48 ++++++
 arch/x86/include/asm/syscall.h                |   1 +
 arch/x86/include/uapi/asm/kvm_para.h          |   2 +
 arch/x86/kvm/cpuid.c                          |  25 +++
 arch/x86/kvm/cpuid.h                          |   8 +-
 arch/x86/kvm/x86.c                            |  37 +++++
 include/uapi/linux/kvm.h                      |   1 +
 include/uapi/linux/kvm_para.h                 |   1 +
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/processor.h  |   4 +
 .../selftests/kvm/lib/x86_64/processor.c      |   2 +-
 .../kvm/x86_64/kvm_pv_syscall_test.c          | 145 ++++++++++++++++++
 14 files changed, 289 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/kvm_pv_syscall_test.c

-- 
2.37.0.rc0.161.g10f37bed90-goog