On 16/08/2017 21:59, Michael S. Tsirkin wrote:
> On Wed, Aug 16, 2017 at 09:03:17PM +0200, Radim Krčmář wrote:
>> 2017-08-16 19:19+0200, Paolo Bonzini:
>>> On 16/08/2017 18:50, Michael S. Tsirkin wrote:
>>>> On Wed, Aug 16, 2017 at 03:30:31PM +0200, Paolo Bonzini wrote:
>>>>> While you can filter out instruction fetches, that's not enough. A data
>>>>> read could happen because someone pointed the IDT to MMIO area, and who
>>>>> knows what the VM-exit instruction length points to in that case.
>>>>
>>>> Thinking more about it, I don't really see how anything
>>>> legal guest might be doing with virtio would trigger anything
>>>> but a fault after decoding the instruction. How does
>>>> skipping instruction even make sense in the example you give?
>>>
>>> There's no such thing as a legal guest. Anything that the hypervisor
>>> does, that differs from real hardware, is a possible escalation path.
>>>
>>> This in fact makes me doubt the EMULTYPE_SKIP patch too.
>>
>> The main hack is that we expect EPT misconfig within a given range to be
>> a MMIO NULL write. I think it is fine -- EMULTYPE_SKIP is a common path
>> that should have well tested error paths and, IIUC, virtio doesn't allow
>> any other access, so it is a problem of the guest if a buggy/malicious
>> application can access virtio memory.

Yes, I agree. EMULTYPE_SKIP is fine because failed decoding still
causes an exception to be injected. Maybe it's better to gate the
EMULTYPE_SKIP emulation on the exit qualification saying this is a write
and also not a page table walk---just in case.

>>>> how about we blacklist nested virt for this optimization?
>>
>> Not every hypervisor can be easily detected ...
>
> Hypervisors that don't set a hypervisor bit in CPUID are violating the
> spec themselves, aren't they? Anyway, we can add a management option
> for use in a nested scenario.

No, the hypervisor bit only says that CPUID leaf 0x40000000 is defined.
See for example
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458:
"Intel and AMD have also reserved CPUID leaves 0x40000000 - 0x400000FF
for software use. Hypervisors can use these leaves to provide an
interface to pass information from the hypervisor to the guest operating
system running inside a virtual machine. The hypervisor bit indicates
the presence of a hypervisor and that it is safe to test these
additional software leaves".

>> KVM uses standard features and SDM clearly says that the
>> instruction length field is undefined.
>
> True. Let's see whether intel can commit to a stronger definition.
> I don't think there's any rush to make this change.

I disagree. Relying on undefined processor features is a bad idea.

> It's just that this has been there for 3 years and people have built a
> product around this.

Around 700 clock cycles?

Paolo
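
For reference, a minimal sketch of what the hypervisor bit does and does
not tell a guest. The helper names below are hypothetical, not taken from
KVM or any patch in this thread. The sketch assumes the register values
were already read with CPUID: bit 31 of ECX from leaf 1 is the hypervisor
bit, and leaf 0x40000000 returns a 12-byte vendor signature in EBX, ECX
and EDX. The bit alone identifies nothing; a guest has to read the
signature to learn which hypervisor it is running on, and a hypervisor
that leaves the bit clear is not covered by this check at all.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* CPUID.1:ECX bit 31: "hypervisor present", i.e. leaf 0x40000000 may be queried. */
#define CPUID_1_ECX_HYPERVISOR (UINT32_C(1) << 31)

/* Hypothetical helper: test the hypervisor bit in a CPUID.1:ECX value. */
static bool hypervisor_bit_set(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID_1_ECX_HYPERVISOR) != 0;
}

/*
 * Hypothetical helper: build the NUL-terminated vendor signature from the
 * EBX, ECX and EDX values returned by CPUID leaf 0x40000000. Bytes are
 * taken least-significant first, matching how the registers hold the string.
 */
static void hypervisor_signature(uint32_t ebx, uint32_t ecx, uint32_t edx,
                                 char out[13])
{
    const uint32_t regs[3] = { ebx, ecx, edx };

    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[4 * r + b] = (char)((regs[r] >> (8 * b)) & 0xff);
    out[12] = '\0';
}
```

On a real x86 guest the inputs could come from, for example, the __cpuid
macro in the GCC/Clang <cpuid.h> header. Comparing the resulting string
against a known signature such as "KVMKVMKVM" only helps for hypervisors
that both set the bit and publish a signature, which is the limitation
being pointed out above.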