On 01/05/2020 23:45, Sean Christopherson wrote:
On Fri, May 01, 2020 at 11:51:47AM -0700, Forrest Yuan Yu wrote:
The purpose of this new hypercall is to exchange messages between
guest and hypervisor. For example, a guest may want to ask the
hypervisor to harden security by setting restricted access permissions
on a guest SLAT entry. In this case, the guest can use this hypercall
to send a message to the hypervisor, which will do its job and send
back anything it wants the guest to know.
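For reference, the guest side of such an exchange would presumably be a
plain kvm_hypercall4(). A minimal sketch, with a made-up message code
that is not part of this patch:

#include <asm/kvm_para.h>

/* UCALL_MSG_PROTECT_PAGE is hypothetical; the patch defines no messages. */
#define UCALL_MSG_PROTECT_PAGE	1

static long ucall_protect_page(unsigned long gpa, unsigned long perm)
{
	/* nr lands in RAX, the four arguments in RBX/RCX/RDX/RSI. */
	return kvm_hypercall4(KVM_HC_UCALL, UCALL_MSG_PROTECT_PAGE,
			      gpa, perm, 0);
}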
Hrm, so this reintroduces KVM_EXIT_HYPERCALL without justifying _why_ it
needs to be reintroduced. I'm not familiar with the history, but the
comments in the documentation advise "use KVM_EXIT_IO or KVM_EXIT_MMIO".
Both of these options have the disadvantage of requiring instruction
emulation (although a trivial one for PIO), which enlarges the host
attack surface.
I think this is one of the reasons why Hyper-V defined the doorbell
kick for their PV devices (e.g. NetVSC/StorVSC) with a hypercall
instead of PIO/MMIO.
(This is currently not very relevant, as KVM's instruction emulator is
not optional and is not even offloaded to host userspace. But it is
relevant for the future...)
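For comparison, a PIO doorbell on the guest side is a single OUT to a
trapped port, which reaches userspace as KVM_EXIT_IO; note that VMX/SVM
report port, size and direction in the exit info for non-string PIO, so
the decode is essentially free. The port number below is made up:

#include <linux/types.h>
#include <asm/io.h>

#define DOORBELL_PORT	0x5000	/* hypothetical, otherwise unclaimed */

static void ring_doorbell(u32 queue_id)
{
	/* Traps to the VMM as KVM_EXIT_IO, with the written value in
	 * the shared kvm_run PIO data area. */
	outl(queue_id, DOORBELL_PORT);
}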
Off the top of my head, IO and/or MMIO has a few advantages:
- Allows the guest kernel to delegate permissions to guest userspace,
whereas KVM restricts hypercalls to CPL0.
- Allows "pass-through", whereas VMCALL is unconditionally forwarded to
L1.
- Is vendor agnostic, e.g. VMX and SVM recognize different opcodes for
VMCALL vs. VMMCALL.
I agree with all of the above (I believe a similar rationale led VMware
to design their backdoor PIO interface).
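That interface, for reference, is just a magic IN from a fixed port,
usable from any ring the guest kernel grants I/O permission to. Roughly
(constants as in the open-vm-tools backdoor definitions, quoted here
from memory):

#include <linux/types.h>

#define BDOOR_MAGIC		0x564D5868	/* "VMXh" */
#define BDOOR_PORT		0x5658
#define BDOOR_CMD_GETVERSION	10

static u32 vmware_backdoor_version(void)
{
	u32 eax, ebx, ecx, edx;

	/* A plain IN; unlike VMCALL, this can be delegated to guest
	 * userspace via ioperm()/iopl(). */
	asm volatile("inl %%dx, %%eax"
		     : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
		     : "a" (BDOOR_MAGIC), "c" (BDOOR_CMD_GETVERSION),
		       "d" (BDOOR_PORT));
	return eax;
}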
Also note that AWS recently introduced the Nitro Enclave PV device,
which is also de facto a PV control-plane interface between guest and
host userspace.
Why couldn't a similar approach have been used here?
(The capability is exposed on a per-VM basis by attaching the PV device
to the VM; the communication interface is device-specific and requires
no KVM changes, only host userspace changes.)
Signed-off-by: Forrest Yuan Yu <yuanyu@xxxxxxxxxx>
---
diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index 01b081f6e7ea..ff313f6827bf 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -86,6 +86,9 @@ KVM_FEATURE_PV_SCHED_YIELD 13 guest checks this feature bit
before using paravirtualized
sched yield.
+KVM_FEATURE_UCALL 14 guest checks this feature bit
+ before using the ucall hypercall.
Why make the UCALL a KVM CPUID feature? I can understand wanting to query
KVM support from host userspace, but presumably the guest will care about
capabilities built on top of the UCALL, not the UCALL itself.
I agree with this.
In the case of the PV device approach, device detection by the guest is
itself the capability discovery, as sketched below.
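To make the contrast concrete, here is a sketch of both discovery
styles; the PCI IDs are made up for illustration:

#include <linux/module.h>
#include <linux/pci.h>
#include <asm/kvm_para.h>

/* CPUID route: the guest tests the proposed feature bit in the KVM
 * CPUID leaf (0x40000001:EAX). */
static bool have_ucall(void)
{
	return kvm_para_has_feature(KVM_FEATURE_UCALL);
}

/* PV-device route: a successful probe of the (hypothetical) device
 * *is* the capability discovery. */
static const struct pci_device_id ucall_pv_ids[] = {
	{ PCI_DEVICE(0x1af4, 0x10f0) },	/* made-up ID in the virtio vendor range */
	{ }
};

static int ucall_pv_probe(struct pci_dev *pdev,
			  const struct pci_device_id *id)
{
	/* Reaching here means the host attached the device. */
	return 0;
}

static struct pci_driver ucall_pv_driver = {
	.name		= "ucall-pv",
	.id_table	= ucall_pv_ids,
	.probe		= ucall_pv_probe,
};
module_pci_driver(ucall_pv_driver);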
+
KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24 host will warn if no guest-side
per-cpu warps are expected in
kvmclock
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c5835f9cb9ad..388a4f89464d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3385,6 +3385,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_GET_MSR_FEATURES:
case KVM_CAP_MSR_PLATFORM_INFO:
case KVM_CAP_EXCEPTION_PAYLOAD:
+ case KVM_CAP_UCALL:
For whatever reason I have a mental block with UCALL; can we call this
KVM_CAP_USERSPACE_HYPERCALL?
+1
r = 1;
break;
case KVM_CAP_SYNC_REGS:
@@ -4895,6 +4896,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
kvm->arch.exception_payload_enabled = cap->args[0];
r = 0;
break;
+ case KVM_CAP_UCALL:
+ kvm->arch.hypercall_ucall_enabled = cap->args[0];
+ r = 0;
+ break;
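On the userspace side, opting in would be the usual KVM_ENABLE_CAP on
the VM fd; a minimal sketch:

#include <sys/ioctl.h>
#include <linux/kvm.h>

static int enable_userspace_hypercall(int vm_fd)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_UCALL,	/* or KVM_CAP_USERSPACE_HYPERCALL */
		.args = { 1 },		/* args[0] != 0 => enable */
	};

	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}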
default:
r = -EINVAL;
break;
@@ -7554,6 +7559,19 @@ static void kvm_sched_yield(struct kvm *kvm, unsigned long dest_id)
kvm_vcpu_yield_to(target);
}
+static int complete_hypercall(struct kvm_vcpu *vcpu)
+{
+ u64 ret = vcpu->run->hypercall.ret;
+
+ if (!is_64_bit_mode(vcpu))
Do we really anticipate adding support for 32-bit guests? Honest question.
+ ret = (u32)ret;
+ kvm_rax_write(vcpu, ret);
+
+ ++vcpu->stat.hypercalls;
+
+ return kvm_skip_emulated_instruction(vcpu);
+}
+
int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
{
unsigned long nr, a0, a1, a2, a3, ret;
@@ -7605,17 +7623,26 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
kvm_sched_yield(vcpu->kvm, a0);
ret = 0;
break;
+ case KVM_HC_UCALL:
+ if (vcpu->kvm->arch.hypercall_ucall_enabled) {
+ vcpu->run->hypercall.nr = nr;
+ vcpu->run->hypercall.args[0] = a0;
+ vcpu->run->hypercall.args[1] = a1;
+ vcpu->run->hypercall.args[2] = a2;
+ vcpu->run->hypercall.args[3] = a3;
If performance is a justification for a more direct userspace exit, why
limit it to just four parameters? E.g. why not copy all registers to
kvm_sync_regs and reverse the process on the way back in?
I don't think performance should be relevant for a hypercall interface.
It's a control-plane path.
If a fast-path is required, the guest should use this interface to
coordinate a separate fast-path (e.g. via a ring-buffer on some guest
memory page), along the lines sketched below.
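Something like this, say: the hypercall (or device) only advertises one
shared page at setup, and afterwards the producer never exits just to
move data. The layout is entirely made up:

#include <linux/types.h>
#include <linux/compiler.h>
#include <asm/barrier.h>

/* Hypothetical single-page SPSC ring negotiated once at setup. */
struct ucall_ring {
	u32 head;		/* consumer-written */
	u32 tail;		/* producer-written */
	u8  data[2048];		/* power of two, so free-running
				 * indices wrap cleanly */
};

static bool ring_put(struct ucall_ring *r, u8 byte)
{
	u32 tail = READ_ONCE(r->tail);

	if (tail - READ_ONCE(r->head) == sizeof(r->data))
		return false;		/* full */

	r->data[tail % sizeof(r->data)] = byte;
	smp_wmb();			/* publish data before tail */
	WRITE_ONCE(r->tail, tail + 1);
	return true;
}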
Anyway, these kinds of questions are another reason why I agree with
Sean that using a PV device seems preferable.
Instead of forcing a general userspace hypercall interface standard, one
can just implement whatever device-specific PV device one wants in host
userspace.
BTW, in QEMU's VMPort implementation, userspace calls
cpu_synchronize_state(), which de facto syncs tons of vCPU state from
KVM to userspace, not just the GP registers.
Because it's a slow-path, it's considered fine. :P
-Liran
+ vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
+ vcpu->arch.complete_userspace_io = complete_hypercall;
+ return 0; /* message is going to userspace */
+ }
+ ret = -KVM_ENOSYS;
+ break;
default:
ret = -KVM_ENOSYS;
break;
}
out:
- if (!op_64_bit)
- ret = (u32)ret;
- kvm_rax_write(vcpu, ret);
-
- ++vcpu->stat.hypercalls;
- return kvm_skip_emulated_instruction(vcpu);
+ vcpu->run->hypercall.ret = ret;
+ return complete_hypercall(vcpu);
}
EXPORT_SYMBOL_GPL(kvm_emulate_hypercall);
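For completeness, the userspace half that pairs with
complete_hypercall() would be roughly the following inside the VMM's
vcpu run loop; handle_ucall() stands in for whatever dispatch the VMM
defines:

#include <linux/kvm.h>

/* Hypothetical VMM-defined message dispatcher. */
static __u64 handle_ucall(const __u64 *args)
{
	return 0;	/* e.g. success */
}

static void handle_exit_hypercall(struct kvm_run *run)
{
	/* args[] were filled by kvm_emulate_hypercall(); whatever is
	 * stored in .ret is written to the guest's RAX by
	 * complete_hypercall() on the next KVM_RUN. */
	run->hypercall.ret = handle_ucall(run->hypercall.args);
}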