Sean Christopherson <seanjc@xxxxxxxxxx> writes:

> On Thu, Aug 25, 2022, Vitaly Kuznetsov wrote:
>> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
>>
>> > This is what I ended up with as a way to dig ourselves out of the eVMCS
>> > conundrum. Not well tested, though KUT and selftests pass. The enforcement
>> > added by "KVM: nVMX: Enforce unsupported eVMCS in VMX MSRs for host accesses"
>> > is not tested at all (and lacks a changelog).
>>
>> Trying to enable KVM_CAP_HYPERV_ENLIGHTENED_VMCS2 in its new shape in
>> QEMU so I can test it and I immediately stumble upon
>>
>> ~/qemu/build/qemu-system-x86_64 -machine q35,accel=kvm,kernel-irqchip=split -cpu host,hv-evmcs-2022,hv-evmcs,hv-vpindex,hv-vapic
>> qemu-system-x86_64: error: failed to set MSR 0x48d to 0xff00000016
>> qemu-system-x86_64: ../target/i386/kvm/kvm.c:3107: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
>>
>> Turns out, at least with "-cpu host" QEMU reads VMX feature MSRs first
>> and enables eVMCS after.
>
> Heh, of course there had to be a corner case.
>

Unfortunately, it's not a corner case: named CPU models in QEMU behave
exactly the same (I just forgot to add '+vmx' yesterday). In fact, it
seems QEMU uses the system-wide KVM_GET_MSRS (which results in
vmx_get_msr_feature() in our case), which gives unfiltered values. As it
is system-wide, it just can't filter anything. This happens even before
KVM_CREATE_VCPU is called, so switching to the per-vCPU ioctl is not an
option. What's worse is that all the discovered features (including VMX
features) are passed to the upper layers of the virtualization stack,
starting with libvirt, and the upper layers may want to enable some of
the "available" features explicitly. Teaching everyone what's available
with eVMCS and what's not seems to be a hard task.
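
To make the failure above concrete, here is a minimal sketch in C of the
two operations involved. It is not KVM's code: the function names and the
mask value are assumptions for illustration only. It relies on the
architectural layout of a VMX control capability MSR such as 0x48d
(IA32_VMX_TRUE_PINBASED_CTLS), where bits 31:0 report the controls that
must be 1 and bits 63:32 report the controls that may be 1.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical mask of pin-based controls that cannot be used with eVMCS.
 * The value is an assumption for this example, not taken from KVM.
 */
#define EXAMPLE_EVMCS_UNSUPPORTED_PINCTRL 0xc0u

/*
 * Filter a VMX control capability MSR: clear the unsupported controls
 * from the "may be 1" half (bits 63:32), leave bits 31:0 untouched.
 */
static uint64_t filter_ctrl_msr(uint64_t msr, uint32_t unsupported)
{
    return msr & ~((uint64_t)unsupported << 32);
}

/*
 * Simplified model of enforcement on a host write: the requested value
 * must keep every "must be 1" bit of the supported value set, and must
 * not advertise any "may be 1" bit that the supported value lacks.
 */
static bool ctrl_msr_write_ok(uint64_t supported, uint64_t requested)
{
    uint32_t sup_lo = (uint32_t)supported;
    uint32_t req_lo = (uint32_t)requested;
    uint32_t sup_hi = (uint32_t)(supported >> 32);
    uint32_t req_hi = (uint32_t)(requested >> 32);

    if ((sup_lo & ~req_lo) != 0)
        return false; /* a required control was dropped */
    if ((req_hi & ~sup_hi) != 0)
        return false; /* an unsupported control was advertised */
    return true;
}
```

Under these assumptions, filter_ctrl_msr(0xff00000016, 0xc0) yields
0x3f00000016, and writing back the unfiltered 0xff00000016 is rejected
because bits 6 and 7 of the upper half are no longer advertised. That is
the sequence QEMU runs into when it reads the feature MSRs before eVMCS is
enabled and sets them afterwards.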

This use-case can probably be solved by making eVMCS enablement a per-VM
thing (already did locally) and creating a per-VM version of KVM_GET_MSRS
which will give us filtered VMX MSRs when eVMCS is enabled. Note:
silently filtering out features when vCPUs are created is bad, as the list
of such features will change over time. This is guaranteed to break
migrations.

Honestly, I'm starting to think the 'evmcs revisions' idea (keep the exact
list of features in KVM and update it every couple of years when a new
Hyper-V version is released) is easier. It's just a list, it doesn't
require much. The main downside, as was already mentioned, is that the
userspace VMM doesn't see which VMX features are actually passed to the
guest unless it is also taught about these "evmcs revisions" (more than
just the latest number available). This, to a certain extent, can probably
be solved by the VMM itself by doing KVM_GET_MSRS after the vCPU is
created (this won't help much with feature discovery by the upper layers,
though). This, however, is a new use-case, unsupported by the current
KVM_CAP_HYPERV_ENLIGHTENED_VMCS implementation.

eVMCS seems to be special in that a) it evolves over time and b) it is
mutually exclusive with *some* other features, but the list changes. We
don't seem to have anything like that in KVM/QEMU, thus all the confusion.

-- 
Vitaly