Re: KVM's sloppiness wrt IA32_SPEC_CTRL and IA32_PRED_CMD

On 7/20/2023 9:58 AM, Chao Gao wrote:
On Thu, Jul 20, 2023 at 09:25:14AM +0800, Xiaoyao Li wrote:
On 7/20/2023 2:08 AM, Jim Mattson wrote:
Normally, we would restrict guest MSR writes based on guest CPU
features. However, with IA32_SPEC_CTRL and IA32_PRED_CMD, this is not
the case.

This issue isn't specific to these two MSRs. Any MSR that is not
intercepted and has bits reserved for future extensions may run
into this issue. Right?

Luckily, KVM defines a list of MSRs that can be passed through for VMX:

static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS]  = {
	MSR_IA32_SPEC_CTRL,
	MSR_IA32_PRED_CMD,
	MSR_IA32_FLUSH_CMD,
	MSR_IA32_TSC,
#ifdef CONFIG_X86_64
	MSR_FS_BASE,
	MSR_GS_BASE,
	MSR_KERNEL_GS_BASE,
	MSR_IA32_XFD,
	MSR_IA32_XFD_ERR,
#endif
	MSR_IA32_SYSENTER_CS,
	MSR_IA32_SYSENTER_ESP,
	MSR_IA32_SYSENTER_EIP,
	MSR_CORE_C1_RES,
	MSR_CORE_C3_RESIDENCY,
	MSR_CORE_C6_RESIDENCY,
	MSR_CORE_C7_RESIDENCY,
};

and only a few of them have reserved bits, so it's feasible to fix them.

IMO, this is a conflict of interest between disabling the MSR write
intercept for fewer VM exits and the host's control over the value the
guest writes to the MSR.

We may need something like CR0/CR4 masks and read shadows for all MSRs
to address this fundamental issue.

That looks unacceptable for HW vendors. There are so many MSRs.
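
For reference, the CR0/CR4 guest/host mask and read shadow semantics
mentioned above can be sketched as follows. This is a simplified model,
not KVM or hardware code: struct masked_reg and both helpers are
hypothetical names, and a per-MSR version of this mechanism is only the
idea floated in this thread, not an existing feature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical per-register state, modeled on the VMCS CR0/CR4 fields. */
struct masked_reg {
	uint64_t host_mask;   /* bits owned by the host */
	uint64_t read_shadow; /* what the guest sees for host-owned bits */
	uint64_t hw_value;    /* value actually loaded in hardware */
};

/* Guest read: host-owned bits come from the shadow, the rest from hardware. */
static uint64_t masked_reg_read(const struct masked_reg *r)
{
	assert(r != NULL);
	return (r->read_shadow & r->host_mask) | (r->hw_value & ~r->host_mask);
}

/*
 * Guest write: returns true if the write would cause a VM exit, i.e. if it
 * tries to set a host-owned bit to a value that differs from the shadow.
 * Otherwise only the guest-owned bits are updated in hardware.
 */
static bool masked_reg_write(struct masked_reg *r, uint64_t val)
{
	assert(r != NULL);
	if ((val ^ r->read_shadow) & r->host_mask)
		return true;

	r->hw_value = (r->hw_value & r->host_mask) | (val & ~r->host_mask);
	return false;
}
```

With host_mask covering only the bits the host cares about, writes that
touch nothing but guest-owned bits never exit, which is the property
that makes the scheme attractive compared to intercepting every write.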


For the first non-zero write to IA32_SPEC_CTRL, we check to see that
the host supports the value written. We don't care whether or not the
guest supports the value written (as long as it supports the MSR).
After the first non-zero write, we stop intercepting writes to
IA32_SPEC_CTRL, so the guest can write any value supported by the
hardware. This could be problematic in heterogeneous migration pools.
For instance, a VM that starts on a Cascade Lake host may set
IA32_SPEC_CTRL.PSFD[bit 7], even if the guest
CPUID.(EAX=07H,ECX=02H):EDX.PSFD[bit 0] is clear. Then, if that VM is
migrated to a Skylake host, KVM_SET_MSRS will refuse to set
IA32_SPEC_CTRL to its current value, because Skylake doesn't support
PSFD.
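
The sequence described above can be modeled roughly as below. This is a
toy model, not KVM code: struct vcpu_model and the model_* helpers are
hypothetical, and the "strict" variant is only an illustration of
filtering on guest CPUID, not a proposed patch. The bit positions are
the architectural IA32_SPEC_CTRL ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

#define SPEC_CTRL_IBRS  (1ULL << 0)
#define SPEC_CTRL_STIBP (1ULL << 1)
#define SPEC_CTRL_SSBD  (1ULL << 2)
#define SPEC_CTRL_PSFD  (1ULL << 7)

/* Hypothetical model of one vCPU; not KVM's actual data structure. */
struct vcpu_model {
	uint64_t host_supported;  /* bits the current host accepts */
	uint64_t guest_supported; /* bits the guest CPUID advertises */
	bool write_intercepted;   /* true until the first non-zero write */
	uint64_t spec_ctrl;       /* current MSR value */
};

/*
 * Behavior as described in the thread: the value is checked against the
 * host only (by KVM while intercepted, by hardware afterwards), and the
 * intercept is dropped on the first non-zero write.
 * Returns false if the write would fault.
 */
static bool model_wrmsr_spec_ctrl(struct vcpu_model *v, uint64_t val)
{
	assert(v != NULL);
	if (val & ~v->host_supported)
		return false;

	v->spec_ctrl = val;
	if (val)
		v->write_intercepted = false;
	return true;
}

/* Illustrative alternative: also reject bits the guest CPUID lacks. */
static bool model_wrmsr_spec_ctrl_strict(struct vcpu_model *v, uint64_t val)
{
	assert(v != NULL);
	if (val & ~v->guest_supported)
		return false;
	return model_wrmsr_spec_ctrl(v, val);
}

/* Restoring the MSR on a destination host fails on unsupported bits. */
static bool model_restore_spec_ctrl(const struct vcpu_model *src,
				    uint64_t dst_host_supported)
{
	assert(src != NULL);
	return !(src->spec_ctrl & ~dst_host_supported);
}
```

In this model, a guest whose CPUID lacks PSFD can still set bit 7 on a
host that supports it, and the later restore on a host without PSFD is
rejected. The strict variant catches the write at the source instead,
but only for as long as the write is actually intercepted.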

It is the guest's fault. Can we modify the guest kernel in this case?

I don't think it's the guest's fault. The guest can do whatever it wants, and KVM cannot make assumptions about the guest's behavior.


We disable write intercepts for IA32_PRED_CMD as long as the guest
supports the MSR. That's fine for now, since only one bit of PRED_CMD
has been defined. Hence, guest support and host support are
equivalent...today. But, are we really comfortable with letting the
guest set any IA32_PRED_CMD bit that may be defined in the future?

The same question applies to IA32_SPEC_CTRL. Are we comfortable with
letting the guest write to any bit that may be defined in the future?

My point is that we need to fix it, though Chao has a different view:
sometimes performance may be more important [*].

[*] https://lore.kernel.org/all/ZGdE3jNS11wV+V2w@chao-email/

Maybe KVM can provide an option to QEMU, e.g., we can define a KVM quirk.
Disabling the quirk would mean always intercepting IA32_SPEC_CTRL MSR writes.


At least the AMD approach with V_SPEC_CTRL prevents the guest from
clearing any bits set by the host, but on Intel, it's a total
free-for-all. What happens when a new bit is defined that absolutely
must be set to 1 all of the time?

I suppose there is no such bit now. For SPR and future CPUs, the
"virtualize IA32_SPEC_CTRL" VMX feature can lock some bits to 0 or 1
regardless of the value written by guests.
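
A rough sketch of how such bit locking could behave, assuming a mask of
host-controlled bits plus a shadow value returned to the guest on reads.
This is a model of my understanding of the feature, not hardware or KVM
code; struct spec_ctrl_virt and the virt_* helpers are hypothetical
names.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of the virtualized IA32_SPEC_CTRL state. */
struct spec_ctrl_virt {
	uint64_t mask;   /* bits the guest cannot change */
	uint64_t shadow; /* value the guest believes it wrote */
	uint64_t hw;     /* value in effect while the guest runs */
};

/* Guest WRMSR: masked bits keep the host's value, the rest follow the guest. */
static void virt_wrmsr_spec_ctrl(struct spec_ctrl_virt *s, uint64_t val)
{
	assert(s != NULL);
	s->hw = (s->hw & s->mask) | (val & ~s->mask);
	s->shadow = val;
}

/* Guest RDMSR: the guest reads back what it wrote, not the effective value. */
static uint64_t virt_rdmsr_spec_ctrl(const struct spec_ctrl_virt *s)
{
	assert(s != NULL);
	return s->shadow;
}
```

A bit is locked to 1 by setting it in both mask and hw, and locked to 0
by setting it in mask and clearing it in hw. Neither case needs a
VM exit on the guest's writes.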



