2018-01-25 01:55-0800, Liran Alon:
> ----- vkuznets@xxxxxxxxxx wrote:
> > I was investigating an issue with seabios >= 1.10 which stopped working
> > for nested KVM on Hyper-V. The problem appears to be in
> > handle_ept_violation() function: when we do fast mmio we need to skip
> > the instruction so we do kvm_skip_emulated_instruction(). This, however,
> > depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
> > However, this is not the case.
> >
> > Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
> > EPT MISCONFIG occurs. While on real hardware it was observed to be set,
> > some hypervisors follow the spec and don't set it; we end up advancing
> > IP with some random value.
> >
> > I checked with Microsoft and they confirmed they don't fill
> > VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
> >
> > Fix the issue by disabling fast mmio when running nested.
> >
> > Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> > ---
> >  arch/x86/kvm/vmx.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index c829d89e2e63..54afb446f38e 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
> >      /*
> >       * A nested guest cannot optimize MMIO vmexits, because we have an
> >       * nGPA here instead of the required GPA.
> > +     * Skipping instruction below depends on undefined behavior: Intel's
> > +     * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
> > +     * when EPT MISCONFIG occurs and while on real hardware it was observed
> > +     * to be set, other hypervisors (namely Hyper-V) don't set it, we end
> > +     * up advancing IP with some random value. Disable fast mmio when
> > +     * running nested and keep it for real hardware in hope that
> > +     * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
> >
> If Intel manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS on EPT_MISCONFIG,
> I don't think we should do this on real-hardware as-well.

Neither do I, but you can see the last discussion on this topic,
https://patchwork.kernel.org/patch/9903811/. In short, we've agreed to
limit the hack to real hardware and wait for Intel or virtio changes.

Michael and Jason, any progress on implementing a fast virtio mechanism
that doesn't rely on undefined behavior?

(Encode the length of the writing instruction into the last 4 bits of
the MMIO address, declare through a side channel that accesses to the
MMIO area always use a certain instruction length, use a hypercall, ...)

Thanks.
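
To make the first idea in the list above concrete (instruction length
carried in the last 4 bits of the MMIO address), here is a rough sketch.
Everything in it is hypothetical: doorbell_encode(), doorbell_insn_len()
and doorbell_base() are illustration names, not existing KVM or virtio
interfaces, and it assumes the device exposes a 16-byte-aligned doorbell
window. The one fact it leans on is that an x86 instruction is at most
15 bytes long, so its length fits in 4 bits.

```c
#include <stdint.h>

/* x86 instructions are at most 15 bytes long, so 4 bits are enough. */
#define INSN_LEN_MASK 0xFULL

/*
 * Guest side (hypothetical): fold the length of the store instruction
 * into the low 4 bits of a 16-byte-aligned doorbell address.
 * Returns 0 on success, -1 if the base is misaligned or the length is
 * outside 1..15.
 */
static inline int doorbell_encode(uint64_t base, unsigned int insn_len,
                                  uint64_t *gpa)
{
    if ((base & INSN_LEN_MASK) || insn_len == 0 || insn_len > 15)
        return -1;
    *gpa = base | insn_len;
    return 0;
}

/* Host side (hypothetical): recover the length from the faulting GPA. */
static inline unsigned int doorbell_insn_len(uint64_t gpa)
{
    return (unsigned int)(gpa & INSN_LEN_MASK);
}

/* Host side (hypothetical): recover the aligned doorbell base address. */
static inline uint64_t doorbell_base(uint64_t gpa)
{
    return gpa & ~INSN_LEN_MASK;
}
```

With something like this, the host could advance the guest IP by
doorbell_insn_len(gpa) on the fast path instead of reading
VM_EXIT_INSTRUCTION_LEN, at the cost of the guest driver having to know
the exact length of the instruction it emits.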