On 9/10/2021 5:41 AM, Sean Christopherson wrote:
On Fri, Aug 27, 2021, Xiaoyao Li wrote:
CPUID 0xD leaves reports the capabilities of Intel PT, e.g. it decides
which bits are valid to be set in MSR_IA32_RTIT_CTL, and reports the
number of PT ADDR ranges.
KVM needs to check that guest CPUID values set by userspace doesn't
enable any bit which is not supported by bare metal. Otherwise,
1. it will trigger vm-entry failure if hardware unsupported bit is
exposed to guest and set by guest.
2. it triggers #GP when context switch PT MSRs if exposing more
RTIT_ADDR* MSRs than hardware capacity.
Signed-off-by: Xiaoyao Li <xiaoyao.li@xxxxxxxxx>
---
There is bit 31 of CPUID(0xD, 0).ECX that doesn't restrict any bit in
MSR_IA32_RTIT_CTL. If guest has different value than host, it won't
cause any vm-entry failure, but guest will parse the PT packet with
wrong format.
I also check it to be same as host to ensure the virtualization correctness.
Changes in v2:
- Call out that if configuring more PT ADDR MSRs than hardware, it can
cause #GP when context switch.
---
arch/x86/kvm/cpuid.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 739be5da3bca..0c8e06a24156 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -76,6 +76,7 @@ static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent)
{
struct kvm_cpuid_entry2 *best;
+ u32 eax, ebx, ecx, edx;
/*
* The existing code assumes virtual address is 48-bit or 57-bit in the
@@ -89,6 +90,30 @@ static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent)
return -EINVAL;
}
+ /*
+ * CPUID 0xD leaves tell Intel PT capabilities, which decides
CPUID.0xD is XSAVE state, CPUID.0x14 is Intel PT. This series needs tests...
My apologize.
+ * pt_desc.ctl_bitmask in later update_intel_pt_cfg().
+ *
+ * pt_desc.ctl_bitmask decides the legal value for guest
+ * MSR_IA32_RTIT_CTL. KVM cannot support PT capabilities beyond native,
+ * otherwise it will trigger vm-entry failure if guest sets native
+ * unsupported bits in MSR_IA32_RTIT_CTL.
+ */
+ best = cpuid_entry2_find(entries, nent, 0xD, 0);
+ if (best) {
+ cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx);
+ if (best->ebx & ~ebx || best->ecx & ~ecx)
+ return -EINVAL;
+ }
+ best = cpuid_entry2_find(entries, nent, 0xD, 1);
+ if (best) {
+ cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx);
+ if (((best->eax & 0x7) > (eax & 0x7)) ||
Ugh, looking at the rest of the code, even this isn't sufficient because
pt_desc.guest.addr_{a,b} are hardcoded at 4 entries, i.e. running KVM on hardware
with >4 entries will lead to buffer overflows.
it's hardcoded to 4 because there is a note of "no processors support
more than 4 address ranges" in SDM vol.3 Chapter 31.3.1, table 31-11
One option would be to bump that to the theoretical max of 15, which doesn't seem
too horrible, especially if pt_desc as a whole is allocated on-demand, which it
probably should be since it isn't exactly tiny (nor ubiquitous)
A different option would be to let userspace define whatever it wants for guest
CPUID, and instead cap nr_addr_ranges at min(host.cpuid, guest.cpuid, RTIT_ADDR_RANGE).
Letting userspace generate a bad MSR_IA32_RTIT_CTL is not problematic, there are
plenty of ways userspace can deliberately trigger VM-Entry failure due to invalid
guest state (even if this is a VM-Fail condition, it's not a danger to KVM).
I'm fine to only safe guard the nr_addr_range if VM-Entry failure
doesn't matter.
+ ((best->eax & ~eax) >> 16) ||
+ (best->ebx & ~ebx))
+ return -EINVAL;
+ }
+
return 0;
}
--
2.27.0