On Thu, 10 Nov 2022 01:53:26 +0000, Oliver Upton <oliver.upton@xxxxxxxxx> wrote: > > As the SMCCC (and related specifications) march towards an > 'everything and the kitchen sink' interface for interacting with a > system, it is less likely that KVM will implement every supported > feature. > > Add a capability that allows userspace to trap hypercall ranges, > allowing the VMM to mix-and-match between calls handled in userspace vs. > KVM. > > Signed-off-by: Oliver Upton <oliver.upton@xxxxxxxxx> > --- > arch/arm64/include/asm/kvm_host.h | 5 ++++ > arch/arm64/include/uapi/asm/kvm.h | 15 ++++++++++ > arch/arm64/kvm/arm.c | 10 +++++++ > arch/arm64/kvm/hypercalls.c | 48 +++++++++++++++++++++++++++++++ > include/uapi/linux/kvm.h | 1 + > 5 files changed, 79 insertions(+) > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > index e33ed7c09a28..cc3872f1900c 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -52,6 +52,9 @@ > > #define KVM_HAVE_MMU_RWLOCK > > +#define KVM_ARM_USER_HYPERCALL_FLAGS \ > + GENMASK_ULL(KVM_ARM_USER_HYPERCALL_FLAGS_COUNT - 1, 0) > + > /* > * Mode of operation configurable with kvm-arm.mode early param. > * See Documentation/admin-guide/kernel-parameters.txt for more information. > @@ -104,11 +107,13 @@ struct kvm_arch_memory_slot { > /** > * struct kvm_smccc_features: Descriptor of the hypercall services exposed to the guests > * > + * @user_trap_bmap: Bitmap of SMCCC function ranges trapped to userspace > * @std_bmap: Bitmap of standard secure service calls > * @std_hyp_bmap: Bitmap of standard hypervisor service calls > * @vendor_hyp_bmap: Bitmap of vendor specific hypervisor service calls > */ > struct kvm_smccc_features { > + unsigned long user_trap_bmap; nit: I strongly object to the word 'trap'. By definition, this is a trap. The difference here is that you *forward* something to userspace instead of implementing it in the kernel. > unsigned long std_bmap; > unsigned long std_hyp_bmap; > unsigned long vendor_hyp_bmap; > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h > index 316917b98707..07fa3f597e61 100644 > --- a/arch/arm64/include/uapi/asm/kvm.h > +++ b/arch/arm64/include/uapi/asm/kvm.h > @@ -370,6 +370,21 @@ enum { > #endif > }; > > +enum { > + KVM_ARM_USER_HYPERCALL_OWNER_ARCH = 0, > + KVM_ARM_USER_HYPERCALL_OWNER_CPU = 1, > + KVM_ARM_USER_HYPERCALL_OWNER_SIP = 2, > + KVM_ARM_USER_HYPERCALL_OWNER_OEM = 3, > + KVM_ARM_USER_HYPERCALL_OWNER_STANDARD = 4, > + KVM_ARM_USER_HYPERCALL_OWNER_STANDARD_HYP = 5, > + KVM_ARM_USER_HYPERCALL_OWNER_VENDOR_HYP = 6, > + KVM_ARM_USER_HYPERCALL_OWNER_TRUSTED_APP = 7, > + KVM_ARM_USER_HYPERCALL_OWNER_TRUSTED_OS = 8, > +#ifdef __KERNEL__ > + KVM_ARM_USER_HYPERCALL_FLAGS_COUNT, > +#endif > +}; > + > /* Device Control API: ARM VGIC */ > #define KVM_DEV_ARM_VGIC_GRP_ADDR 0 > #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS 1 > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c > index 6f0b56e7f8c7..6e8a222fc295 100644 > --- a/arch/arm64/kvm/arm.c > +++ b/arch/arm64/kvm/arm.c > @@ -100,6 +100,13 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, > r = 0; > set_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags); > break; > + case KVM_CAP_ARM_USER_HYPERCALLS: > + if (cap->args[0] & ~KVM_ARM_USER_HYPERCALL_FLAGS) > + return -EINVAL; > + > + r = 0; > + kvm->arch.smccc_feat.user_trap_bmap = cap->args[0]; > + break; > default: > r = -EINVAL; > break; > @@ -285,6 +292,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > case KVM_CAP_ARM_PTRAUTH_GENERIC: > r = system_has_full_ptr_auth(); > break; > + case KVM_CAP_ARM_USER_HYPERCALLS: > + r = KVM_ARM_USER_HYPERCALL_FLAGS; > + break; > default: > r = 0; > } > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c > index 62ce45d0d957..22a23b12201d 100644 > --- a/arch/arm64/kvm/hypercalls.c > +++ b/arch/arm64/kvm/hypercalls.c > @@ -92,6 +92,49 @@ static bool kvm_hvc_call_default_allowed(u32 func_id) > } > } > > +static bool kvm_hvc_call_user_trapped(struct kvm_vcpu *vcpu, u32 func_id) > +{ > + struct kvm *kvm = vcpu->kvm; > + unsigned long *bmap = &kvm->arch.smccc_feat.user_trap_bmap; > + > + switch (ARM_SMCCC_OWNER_NUM(func_id)) { > + case ARM_SMCCC_OWNER_ARCH: > + return test_bit(KVM_ARM_USER_HYPERCALL_OWNER_ARCH, bmap); > + case ARM_SMCCC_OWNER_CPU: > + return test_bit(KVM_ARM_USER_HYPERCALL_OWNER_CPU, bmap); > + case ARM_SMCCC_OWNER_SIP: > + return test_bit(KVM_ARM_USER_HYPERCALL_OWNER_SIP, bmap); > + case ARM_SMCCC_OWNER_OEM: > + return test_bit(KVM_ARM_USER_HYPERCALL_OWNER_OEM, bmap); > + case ARM_SMCCC_OWNER_STANDARD: > + return test_bit(KVM_ARM_USER_HYPERCALL_OWNER_STANDARD, bmap); > + case ARM_SMCCC_OWNER_STANDARD_HYP: > + return test_bit(KVM_ARM_USER_HYPERCALL_OWNER_STANDARD_HYP, bmap); > + case ARM_SMCCC_OWNER_VENDOR_HYP: > + return test_bit(KVM_ARM_USER_HYPERCALL_OWNER_VENDOR_HYP, bmap); > + case ARM_SMCCC_OWNER_TRUSTED_APP ... ARM_SMCCC_OWNER_TRUSTED_APP_END: > + return test_bit(KVM_ARM_USER_HYPERCALL_OWNER_TRUSTED_APP, bmap); > + case ARM_SMCCC_OWNER_TRUSTED_OS ... ARM_SMCCC_OWNER_TRUSTED_OS_END: > + return test_bit(KVM_ARM_USER_HYPERCALL_OWNER_TRUSTED_OS, bmap); > + default: > + return false; > + } You have multiple problems here: - the granularity is way too coarse. You want to express arbitrary ranges, and not necessarily grab a whole owner range. - you have now an overlap between ranges that are handled in the kernel (PSCI, spectre mitigations) and ranges that userspace wants to observe. Not good. If we are going down this road, this can only be done at the *function* level. And userspace must know that the kernel will refuse to forward some ranges. So obviously, this cannot be a simple bitmap. Making it a radix tree (or an xarray, which is basically the same thing) could work. And the filtering request from userspace can be similar to what we have for the PMU filters. > +} > + > +static void kvm_hvc_prepare_user_trap(struct kvm_vcpu *vcpu) > +{ > + struct kvm_run *run = vcpu->run; > + > + run->exit_reason = KVM_EXIT_HYPERCALL; > + run->hypercall.nr = smccc_get_function(vcpu); > + run->hypercall.args[0] = smccc_get_arg(vcpu, 1); > + run->hypercall.args[1] = smccc_get_arg(vcpu, 2); > + run->hypercall.args[2] = smccc_get_arg(vcpu, 3); > + run->hypercall.args[3] = smccc_get_arg(vcpu, 4); > + run->hypercall.args[4] = smccc_get_arg(vcpu, 5); > + run->hypercall.args[5] = smccc_get_arg(vcpu, 6); All of which is readily available through the ONE_REG interface. I'm mildly reluctant to expose another interface that disclose the same information (yes, I understand the performance impact). Thanks, M. -- Without deviation from the norm, progress is not possible.