On Thu, 10 Nov 2022 21:13:54 +0000,
Oliver Upton <oliver.upton@xxxxxxxxx> wrote:
>
> On Thu, Nov 10, 2022 at 12:22:12PM +0000, Marc Zyngier wrote:
> > > +static bool kvm_hvc_call_user_trapped(struct kvm_vcpu *vcpu, u32 func_id)
> > > +{
> > > +	struct kvm *kvm = vcpu->kvm;
> > > +	unsigned long *bmap = &kvm->arch.smccc_feat.user_trap_bmap;
> > > +
> > > +	switch (ARM_SMCCC_OWNER_NUM(func_id)) {
> > > +	case ARM_SMCCC_OWNER_ARCH:
> > > +		return test_bit(KVM_ARM_USER_HYPERCALL_OWNER_ARCH, bmap);
> > > +	case ARM_SMCCC_OWNER_CPU:
> > > +		return test_bit(KVM_ARM_USER_HYPERCALL_OWNER_CPU, bmap);
> > > +	case ARM_SMCCC_OWNER_SIP:
> > > +		return test_bit(KVM_ARM_USER_HYPERCALL_OWNER_SIP, bmap);
> > > +	case ARM_SMCCC_OWNER_OEM:
> > > +		return test_bit(KVM_ARM_USER_HYPERCALL_OWNER_OEM, bmap);
> > > +	case ARM_SMCCC_OWNER_STANDARD:
> > > +		return test_bit(KVM_ARM_USER_HYPERCALL_OWNER_STANDARD, bmap);
> > > +	case ARM_SMCCC_OWNER_STANDARD_HYP:
> > > +		return test_bit(KVM_ARM_USER_HYPERCALL_OWNER_STANDARD_HYP, bmap);
> > > +	case ARM_SMCCC_OWNER_VENDOR_HYP:
> > > +		return test_bit(KVM_ARM_USER_HYPERCALL_OWNER_VENDOR_HYP, bmap);
> > > +	case ARM_SMCCC_OWNER_TRUSTED_APP ... ARM_SMCCC_OWNER_TRUSTED_APP_END:
> > > +		return test_bit(KVM_ARM_USER_HYPERCALL_OWNER_TRUSTED_APP, bmap);
> > > +	case ARM_SMCCC_OWNER_TRUSTED_OS ... ARM_SMCCC_OWNER_TRUSTED_OS_END:
> > > +		return test_bit(KVM_ARM_USER_HYPERCALL_OWNER_TRUSTED_OS, bmap);
> > > +	default:
> > > +		return false;
> > > +	}
> >
> > You have multiple problems here:
> >
> > - the granularity is way too coarse. You want to express arbitrary
> >   ranges, and not necessarily grab a whole owner range.
> >
> > - you have now an overlap between ranges that are handled in the
> >   kernel (PSCI, spectre mitigations) and ranges that userspace wants
> >   to observe. Not good.
>
> We need to come to agreement on what degree of mix-and-match should be
> supported.
>
> Spectre really ought to be in the kernel, and I don't think anyone is
> particularly excited about reimplementing PSCI. Right now my interest
> in this starts and ends with forwarding the vendor-specific hypercall
> range to userspace, allowing something like Hyper-V PV on KVM.
>
> > If we are going down this road, this can only be done at the
> > *function* level. And userspace must know that the kernel will refuse
> > to forward some ranges.
>
> The goal of what I was trying to get at is that either the kernel or
> userspace takes ownership of a range that has an ABI, but not both. i.e.
> you really wouldn't want some VMM or cloud provider trapping portions of
> KVM's vendor-specific range while still reporting a 'vanilla' ABI at the
> time of discovery. Same goes for PSCI, TRNG, etc.

But I definitely think this is one of the major use cases. For
example, there is value in taking PSCI to userspace in order to
implement a newer version of the spec, or to support sub-features that
KVM doesn't (want to) implement. I don't think this changes the ABI
from the guest perspective.

pKVM also has a use case for this where userspace gets a notification
of the hypercall that a guest has performed to share memory.

Communication with a TEE also is on the cards, as would be a FFA
implementation. All of this could be implemented in KVM, or in
userspace, depending on what users of these misfeatures want to do.

> > So obviously, this cannot be a simple bitmap. Making it a radix tree
> > (or an xarray, which is basically the same thing) could work. And the
> > filtering request from userspace can be similar to what we have for
> > the PMU filters.
>
> Right, we'll need a more robust data structure for all this.
>
> My only concern is that communicating the hypercall filter between
> user/kernel with a set of ranges or function numbers is that we could be
> mutating what KVM *doesn't* already implement into an ABI of sorts.
>
> i.e. suppose that userspace wants to filter function(s) in an
> unallocated/unused range of function numbers. Later down the line KVM
> adds support for a new shiny thing and the filter becomes a subset of a
> now allocated range of calls. We then reject the filter due to the
> incongruence.

But isn't the problem to ask for ranges that are unallocated in the
first place? What semantics can userspace give to such a thing other
than replying "not implemented", which is what the kernel would do
anyway?

The more interesting problem is when you want to emulate another
hypervisor, and the vendor spaces overlap (a very likely outcome).
Somehow, this means overriding all the KVM-specific hypercalls, and
letting userspace deal with it. But again, this can be done on a
per-function basis.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
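
For readers following the thread, the per-function filtering idea can be sketched in plain userspace C. This is only an illustration of the concept under stated assumptions, not the proposed KVM interface: every name here (`hvc_filter`, `hvc_filter_add`, `hvc_filter_lookup`, `hvc_kernel_only`) is hypothetical, a flat array stands in for the xarray mentioned above, and the kernel-only list contains just one Spectre mitigation call as an example of a function the kernel would refuse to forward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical actions; this is not an existing KVM UAPI. */
enum hvc_action {
	HVC_HANDLE_KERNEL,	/* default: the kernel handles or rejects the call */
	HVC_FWD_USER,		/* the call is forwarded to userspace */
};

/* A run of consecutive SMCCC function IDs. */
struct hvc_range {
	uint32_t base;		/* first function ID */
	uint32_t nr;		/* number of function IDs */
};

#define HVC_FILTER_MAX 16

/* Flat array standing in for an xarray keyed by function ID. */
struct hvc_filter {
	struct hvc_range fwd[HVC_FILTER_MAX];
	size_t nr_fwd;
};

/* Assumption: functions the kernel always keeps for itself. */
static const struct hvc_range hvc_kernel_only[] = {
	{ 0x80008000u, 1 },	/* ARM_SMCCC_ARCH_WORKAROUND_1 */
};

static bool ranges_overlap(const struct hvc_range *a, const struct hvc_range *b)
{
	/* 64-bit ends so that base + nr cannot wrap around. */
	uint64_t a_end = (uint64_t)a->base + a->nr;
	uint64_t b_end = (uint64_t)b->base + b->nr;

	return a->base < b_end && b->base < a_end;
}

/* Ask for [base, base + nr) to be forwarded; returns 0 or -1 on refusal. */
static int hvc_filter_add(struct hvc_filter *f, uint32_t base, uint32_t nr)
{
	struct hvc_range r = { base, nr };
	size_t i;

	assert(f != NULL);

	if (nr == 0 || (uint64_t)base + nr > (1ULL << 32))
		return -1;

	/* Refuse anything touching a function the kernel will not forward. */
	for (i = 0; i < sizeof(hvc_kernel_only) / sizeof(hvc_kernel_only[0]); i++) {
		if (ranges_overlap(&r, &hvc_kernel_only[i]))
			return -1;
	}

	if (f->nr_fwd == HVC_FILTER_MAX)
		return -1;

	f->fwd[f->nr_fwd++] = r;
	return 0;
}

/* Decide, per function ID, who gets to handle the call. */
static enum hvc_action hvc_filter_lookup(const struct hvc_filter *f, uint32_t func_id)
{
	struct hvc_range one = { func_id, 1 };
	size_t i;

	assert(f != NULL);

	for (i = 0; i < f->nr_fwd; i++) {
		if (ranges_overlap(&f->fwd[i], &one))
			return HVC_FWD_USER;
	}

	return HVC_HANDLE_KERNEL;
}
```

With this shape, a VMM emulating another hypervisor could request the whole vendor-specific range starting at 0x86000000, and could take over a single PSCI function without claiming the rest of the standard range, while a request that overlaps the kernel-only entry fails at registration time rather than silently at trap time.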