Hi Alex,

On Wed, Nov 17, 2021 at 7:37 AM Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote:
>
> Stage 2 faults triggered by the profiling buffer attempting to write to
> memory are reported by the SPE hardware by asserting a buffer management
> event interrupt. Interrupts are by their nature asynchronous, which means
> that the guest might have changed its stage 1 translation tables since the
> attempted write. SPE reports the guest virtual address that caused the data
> abort, not the IPA, which means that KVM would have to walk the guest's
> stage 1 tables to find the IPA. Using the AT instruction to walk the
> guest's tables in hardware is not an option because it doesn't report the
> IPA in the case of a stage 2 fault on a stage 1 table walk.
>
> Avoid both issues by pre-mapping the guest memory at stage 2. This is being
> done by adding a capability that allows the user to pin the memory backing
> a memslot. The same capability can be used to unlock a memslot, which
> unpins the pages associated with the memslot, but doesn't unmap the IPA
> range from stage 2; in this case, the addresses will be unmapped from stage
> 2 via the MMU notifiers when the process' address space changes.
>
> For now, the capability doesn't actually do anything other than checking
> that the usage is correct; the memory operations will be added in future
> patches.
>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@xxxxxxx>
> ---
>  Documentation/virt/kvm/api.rst   | 57 ++++++++++++++++++++++++++
>  arch/arm64/include/asm/kvm_mmu.h |  3 ++
>  arch/arm64/kvm/arm.c             | 42 ++++++++++++++++++--
>  arch/arm64/kvm/mmu.c             | 68 ++++++++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h         |  8 ++++
>  5 files changed, 174 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index aeeb071c7688..16aa59eae3d9 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6925,6 +6925,63 @@ indicated by the fd to the VM this is called on.
>  This is intended to support intra-host migration of VMs between userspace VMMs,
>  upgrading the VMM process without interrupting the guest.
>
> +7.30 KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> +----------------------------------------
> +
> +:Architectures: arm64
> +:Target: VM
> +:Parameters: flags is one of KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK or
> +             KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> +             args[0] is the slot number
> +             args[1] specifies the permissions when the memslot is locked or
> +             if all memslots should be unlocked
> +
> +The presence of this capability indicates that KVM supports locking the memory
> +associated with the memslot, and unlocking a previously locked memslot.
> +
> +The 'flags' parameter is defined as follows:
> +
> +7.30.1 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK
> +-------------------------------------------------
> +
> +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> +:Architectures: arm64
> +:Target: VM
> +:Parameters: args[0] contains the memory slot number
> +             args[1] contains the permissions for the locked memory:
> +             KVM_ARM_LOCK_MEMORY_READ (mandatory) to map it with
> +             read permissions and KVM_ARM_LOCK_MEMORY_WRITE
> +             (optional) with write permissions

Nit: Those flag names don't match the ones in the code.
(Their names in the code are KVM_ARM_LOCK_MEM_READ/KVM_ARM_LOCK_MEM_WRITE)

What is the reason why the KVM_ARM_LOCK_MEMORY_{READ,WRITE} flags need to be
specified even though the memslot already has similar flags ?
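(To make sure I understand the intended usage: I would expect a VMM to lock
a memslot with something like the sketch below. This is only my reading of
the patch, not code from the series; the helper name and the vm_fd/slot
values are made up, and the flag names are the ones this patch adds to the
UAPI header.)

  #include <err.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Pin the memory backing 'slot' and pre-map it at stage 2, read/write. */
  static void lock_memslot(int vm_fd, __u64 slot)
  {
          struct kvm_enable_cap cap = {
                  .cap = KVM_CAP_ARM_LOCK_USER_MEMORY_REGION,
                  .flags = KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK,
                  .args = { slot,
                            KVM_ARM_LOCK_MEM_READ | KVM_ARM_LOCK_MEM_WRITE },
          };

          /* KVM_ENABLE_CAP on the VM fd dispatches to
           * kvm_lock_user_memory_region_ioctl() in this patch. */
          if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
                  err(1, "KVM_ENABLE_CAP(LOCK)");
  }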
> +:Returns: 0 on success; negative error code on failure
> +
> +Enabling this capability causes the memory described by the memslot to be
> +pinned in the process address space and the corresponding stage 2 IPA range
> +mapped at stage 2. The permissions specified in args[1] apply to both
> +mappings. The memory pinned with this capability counts towards the max
> +locked memory limit for the current process.
> +
> +The capability should be enabled when no VCPUs are in the kernel executing
> +an ioctl (and in particular, KVM_RUN); otherwise the ioctl will block until
> +all VCPUs have returned. The virtual memory range described by the memslot
> +must be mapped in the userspace process without any gaps. It is considered
> +an error if write permissions are specified for a memslot which logs dirty
> +pages.
> +
> +7.30.2 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> +---------------------------------------------------
> +
> +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> +:Architectures: arm64
> +:Target: VM
> +:Parameters: args[0] contains the memory slot number
> +             args[1] optionally contains the flag KVM_ARM_UNLOCK_MEM_ALL,
> +             which unlocks all previously locked memslots.
> +:Returns: 0 on success; negative error code on failure
> +
> +Enabling this capability causes the memory pinned when locking the memslot
> +specified in args[0] to be unpinned, or, optionally, all memslots to be
> +unlocked. The IPA range is not unmapped from stage 2.
> +>>>>>>> 56641eee289e (KVM: arm64: Add lock/unlock memslot user API)

Nit: An unnecessary line (a leftover merge conflict marker).

If a memslot with read/write permission is locked read-only, and then
unlocked, can userspace expect the stage 2 mapping for the memslot to be
updated with read/write ?

Can userspace delete a memslot that is locked (without unlocking it) ?
If so, can userspace expect the corresponding range to be implicitly
unlocked ?
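(For context, the unlock sequence I have in mind is something like the
sketch below; again this is just my reading of the patch, with a made-up
helper name, reusing the UAPI names this series adds. It assumes the
behaviour described above, i.e. that stage 2 stays mapped and is torn down
later by the MMU notifiers.)

  /* Unpin all previously locked memslots. Per the commit message, the
   * stage 2 mappings are left in place and will be unmapped via the MMU
   * notifiers when the host address space changes. */
  static void unlock_all_memslots(int vm_fd)
  {
          struct kvm_enable_cap cap = {
                  .cap = KVM_CAP_ARM_LOCK_USER_MEMORY_REGION,
                  .flags = KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK,
                  /* args[0] (the slot) is ignored with UNLOCK_MEM_ALL. */
                  .args = { 0, KVM_ARM_UNLOCK_MEM_ALL },
          };

          if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
                  err(1, "KVM_ENABLE_CAP(UNLOCK)");
  }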
Thanks,
Reiji

> +
>  8. Other capabilities.
>  ======================
>
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 02d378887743..2c50734f048d 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -216,6 +216,9 @@ static inline void __invalidate_icache_guest_page(void *va, size_t size)
>  void kvm_set_way_flush(struct kvm_vcpu *vcpu);
>  void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
>
> +int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> +int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> +
>  static inline unsigned int kvm_get_vmid_bits(void)
>  {
>         int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index e9b4ad7b5c82..d49905d18cee 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -78,16 +78,43 @@ int kvm_arch_check_processor_compat(void *opaque)
>         return 0;
>  }
>
> +static int kvm_arm_lock_memslot_supported(void)
> +{
> +       return 0;
> +}
> +
> +static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
> +                                            struct kvm_enable_cap *cap)
> +{
> +       u64 slot, action_flags;
> +       u32 action;
> +
> +       if (cap->args[2] || cap->args[3])
> +               return -EINVAL;
> +
> +       slot = cap->args[0];
> +       action = cap->flags;
> +       action_flags = cap->args[1];
> +
> +       switch (action) {
> +       case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK:
> +               return kvm_mmu_lock_memslot(kvm, slot, action_flags);
> +       case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK:
> +               return kvm_mmu_unlock_memslot(kvm, slot, action_flags);
> +       default:
> +               return -EINVAL;
> +       }
> +}
> +
>  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>                             struct kvm_enable_cap *cap)
>  {
>         int r;
>
> -       if (cap->flags)
> -               return -EINVAL;
> -
>         switch (cap->cap) {
>         case KVM_CAP_ARM_NISV_TO_USER:
> +               if (cap->flags)
> +                       return -EINVAL;
>                 r = 0;
>                 kvm->arch.return_nisv_io_abort_to_user = true;
>                 break;
> @@ -101,6 +128,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>                 }
>                 mutex_unlock(&kvm->lock);
>                 break;
> +       case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
> +               if (!kvm_arm_lock_memslot_supported())
> +                       return -EINVAL;
> +               r = kvm_lock_user_memory_region_ioctl(kvm, cap);
> +               break;
>         default:
>                 r = -EINVAL;
>                 break;
> @@ -168,7 +200,6 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
>         return VM_FAULT_SIGBUS;
>  }
>
> -
>  /**
>   * kvm_arch_destroy_vm - destroy the VM data structure
>   * @kvm: pointer to the KVM struct
> @@ -276,6 +307,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>         case KVM_CAP_ARM_PTRAUTH_GENERIC:
>                 r = system_has_full_ptr_auth();
>                 break;
> +       case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
> +               r = kvm_arm_lock_memslot_supported();
> +               break;
>         default:
>                 r = 0;
>         }
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 326cdfec74a1..f65bcbc9ae69 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1296,6 +1296,74 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>         return ret;
>  }
>
> +int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> +{
> +       struct kvm_memory_slot *memslot;
> +       int ret;
> +
> +       if (slot >= KVM_MEM_SLOTS_NUM)
> +               return -EINVAL;
> +
> +       if (!(flags & KVM_ARM_LOCK_MEM_READ))
> +               return -EINVAL;
> +
> +       mutex_lock(&kvm->lock);
> +       if (!kvm_lock_all_vcpus(kvm)) {
> +               ret = -EBUSY;
> +               goto out_unlock_kvm;
> +       }
> +       mutex_lock(&kvm->slots_lock);
> +
> +       memslot = id_to_memslot(kvm_memslots(kvm), slot);
> +       if (!memslot) {
> +               ret = -EINVAL;
> +               goto out_unlock_slots;
> +       }
> +       if ((flags & KVM_ARM_LOCK_MEM_WRITE) &&
> +           ((memslot->flags & KVM_MEM_READONLY) || memslot->dirty_bitmap)) {
> +               ret = -EPERM;
> +               goto out_unlock_slots;
> +       }
> +
> +       ret = -EINVAL;
> +
> +out_unlock_slots:
> +       mutex_unlock(&kvm->slots_lock);
> +       kvm_unlock_all_vcpus(kvm);
> +out_unlock_kvm:
> +       mutex_unlock(&kvm->lock);
> +       return ret;
> +}
> +
> +int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
> +{
> +       bool unlock_all = flags & KVM_ARM_UNLOCK_MEM_ALL;
> +       struct kvm_memory_slot *memslot;
> +       int ret;
> +
> +       if (!unlock_all && slot >= KVM_MEM_SLOTS_NUM)
> +               return -EINVAL;
> +
> +       mutex_lock(&kvm->slots_lock);
> +
> +       if (unlock_all) {
> +               ret = -EINVAL;
> +               goto out_unlock_slots;
> +       }
> +
> +       memslot = id_to_memslot(kvm_memslots(kvm), slot);
> +       if (!memslot) {
> +               ret = -EINVAL;
> +               goto out_unlock_slots;
> +       }
> +
> +       ret = -EINVAL;
> +
> +out_unlock_slots:
> +       mutex_unlock(&kvm->slots_lock);
> +       return ret;
> +}
> +
>  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
>  {
>         if (!kvm->arch.mmu.pgt)
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 1daa45268de2..70c969967557 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1131,6 +1131,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
>  #define KVM_CAP_ARM_MTE 205
>  #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
> +#define KVM_CAP_ARM_LOCK_USER_MEMORY_REGION 207
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -1483,6 +1484,13 @@ struct kvm_s390_ucas_mapping {
>  #define KVM_PPC_SVM_OFF           _IO(KVMIO, 0xb3)
>  #define KVM_ARM_MTE_COPY_TAGS     _IOR(KVMIO, 0xb4, struct kvm_arm_copy_mte_tags)
>
> +/* Used by KVM_CAP_ARM_LOCK_USER_MEMORY_REGION */
> +#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK     (1 << 0)
> +#define KVM_ARM_LOCK_MEM_READ                          (1 << 0)
> +#define KVM_ARM_LOCK_MEM_WRITE                         (1 << 1)
> +#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK   (1 << 1)
> +#define KVM_ARM_UNLOCK_MEM_ALL                         (1 << 0)
> +
>  /* ioctl for vm fd */
>  #define KVM_CREATE_DEVICE         _IOWR(KVMIO, 0xe0, struct kvm_create_device)
>
> --
> 2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm