Let's introduce a new VM type that allows user space to run KVM guests
without having to enable vm.alloc_pgste for all user space processes.

As long as user space follows these simple rules with the new VM type,
everything should be fine:
- Use only mmap for memory to be used with user memory regions
- Do all such mmap calls after the VM was created.

If user space fails to obey these rules, KVM will report -EFAULT whenever
an address in that incompatible range is accessed.

Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
---
 Documentation/virtual/kvm/api.txt | 13 +++++++++++++
 arch/s390/kvm/kvm-s390.c          |  8 ++++++--
 include/uapi/linux/kvm.h          |  2 ++
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 912b7df..056f391 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -117,6 +117,16 @@ In order to create user controlled virtual machines on S390, check
 KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as
 privileged user (CAP_SYS_ADMIN).
 
+When creating a virtual machine with the flag KVM_VM_S390_LATE_MMAP
+on s390, only memory mmap'ed after VM creation is guaranteed to be usable
+in user memory regions, because special page tables that might be
+necessary for virtualization will only be created for memory mmaped
+after this call. This avoids having to enable system wide vm.alloc_pgste
+in order to create KVM guests. Whether this type of VM can be created is
+indicated by the capability "KVM_CAP_S390_LATE_MMAP". If memory used for
+user memory regions is mmap'ed before VM creation or !mmap'ed memory is used
+for user memory regions, don't use this VM type.
+
 To use hardware assisted virtualization on MIPS (VZ ASE) rather than
 the default trap & emulate implementation (which changes the virtual
 memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the
@@ -985,6 +995,9 @@ the memory region are automatically reflected into the guest. For example, an
 mmap() that affects the region will be made visible immediately. Another
 example is madvise(MADV_DROP).
 
+See KVM_CREATE_VM for special handling related to KVM_VM_S390_LATE_MMAP
+on s390x.
+
 It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl.
 The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
 allocation and is deprecated.
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 89684bb..75df5cd 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -389,6 +389,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_S390_USER_INSTR0:
 	case KVM_CAP_S390_CMMA_MIGRATION:
 	case KVM_CAP_S390_AIS:
+	case KVM_CAP_S390_LATE_MMAP:
 		r = 1;
 		break;
 	case KVM_CAP_S390_MEM_OP:
@@ -1796,6 +1797,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	int i, rc;
 	char debug_name[16];
 	static unsigned long sca_offset;
+	bool mixed_pgtables = false;
 
 	rc = -EINVAL;
 #ifdef CONFIG_KVM_S390_UCONTROL
@@ -1804,11 +1806,13 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	if ((type & KVM_VM_S390_UCONTROL) && (!capable(CAP_SYS_ADMIN)))
 		goto out_err;
 #else
-	if (type)
+	if (type & KVM_VM_S390_LATE_MMAP)
+		mixed_pgtables = true;
+	else if (type)
 		goto out_err;
 #endif
 
-	rc = s390_enable_sie(false);
+	rc = s390_enable_sie(mixed_pgtables);
 	if (rc)
 		goto out_err;
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2b8dc1c..436ba30 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -726,6 +726,7 @@ struct kvm_ppc_resize_hpt {
 
 /* machine type bits, to be used as argument to KVM_CREATE_VM */
 #define KVM_VM_S390_UCONTROL	1
+#define KVM_VM_S390_LATE_MMAP	2
 
 /* on ppc, 0 indicate default, 1 should force HV and 2 PR */
 #define KVM_VM_PPC_HV 1
@@ -925,6 +926,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_X86_GUEST_MWAIT 143
 #define KVM_CAP_ARM_USER_IRQ 144
 #define KVM_CAP_S390_CMMA_MIGRATION 145
+#define KVM_CAP_S390_LATE_MMAP 200
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.9.3