Re: [PATCH v3 3/4] KVM: nVMX: relax canonical checks on some x86 registers in vmx host state

On Fri, 2024-08-16 at 15:03 -0700, Sean Christopherson wrote:
> On Fri, Aug 16, 2024, mlevitsk@xxxxxxxxxx wrote:
> > > Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> > > ---
> > >  arch/x86/kvm/vmx/nested.c | 30 +++++++++++++++++++++++-------
> > >  1 file changed, 23 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > index 2392a7ef254d..3f18edff80ac 100644
> > > --- a/arch/x86/kvm/vmx/nested.c
> > > +++ b/arch/x86/kvm/vmx/nested.c
> > > @@ -2969,6 +2969,22 @@ static int nested_vmx_check_address_space_size(struct kvm_vcpu *vcpu,
> > >         return 0;
> > >  }
> > >  
> > > +static bool is_l1_noncanonical_address_static(u64 la, struct kvm_vcpu *vcpu)
> > > +{
> > > +       u8 max_guest_address_bits = guest_can_use(vcpu, X86_FEATURE_LA57) ? 57 : 48;
> 
> I don't see any reason to use LA57 support from guest CPUID for the VMCS checks.
> The virtualization hole exists and can't be safely plugged for all cases, so why
> bother trying to plug it only for some cases?

I also thought that at first, but there is a counter-argument:

My idea was that the guest really ought not to write non-canonical values if its CPUID doesn't
advertise 5-level paging; there is absolutely no reason to do so.

However, if the guest does it via WRMSR, the MSR is usually not intercepted, so to stay
consistent it makes sense to allow it in the emulation path as well, as we discussed.

But when the VMRESUME/VMLAUNCH instruction, which is *always* emulated, writes those MSRs on
VM exit, I don't see a reason to allow the virtualization hole.

Then again, it turns out (I didn't expect this) that instructions like LGDT also don't check
CR4.LA57, and these are passed through as well, so I guess singling out the VMX instructions
is no longer worthwhile.
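
Just to make the distinction concrete, here is a minimal sketch (the helper names are only
illustrative; KVM's __is_canonical_address() already provides the underlying sign-extension
check):

	/* "static" check: width depends only on whether the CPU supports LA57 */
	static bool is_noncanonical_static(u64 la, bool cpu_has_la57)
	{
		u8 bits = cpu_has_la57 ? 57 : 48;

		/* canonical <=> bits [63:bits-1] are a sign extension of bit bits-1 */
		return ((s64)(la << (64 - bits)) >> (64 - bits)) != (s64)la;
	}

	/* "dynamic" check: width depends on the current value of CR4.LA57 */
	static bool is_noncanonical_dynamic(u64 la, bool cr4_la57)
	{
		u8 bits = cr4_la57 ? 57 : 48;

		return ((s64)(la << (64 - bits)) >> (64 - bits)) != (s64)la;
	}

The question is which of the two each register/instruction follows, and my testing indicates
that the registers touched here follow the static variant.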

> 
> It'd be very odd that an L1 could set a "bad" value via WRMSR, but then couldn't
> load that same value on VM-Exit, e.g. if L1 gets the VMCS value by doing RDMSR.
> 
> > > +       /*
> > > +        * Most x86 registers that hold linear addresses, such as
> > > +        * segment bases and addresses used by instructions (e.g. SYSENTER),
> > > +        * are subject to a static canonicality check whose width
> > > +        * depends only on the CPU's support for 5-level paging,
> > > +        * rather than on the state of CR4.LA57.
> > > +        *
> > > +        * In other words, the check depends only on the CPU model,
> > > +        * not on runtime state.
> > > +        */
> > > +       return !__is_canonical_address(la, max_guest_address_bits);
> > > +}
> > > +
> > >  static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
> > >                                        struct vmcs12 *vmcs12)
> > >  {
> > > @@ -2979,8 +2995,8 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
> > >             CC(!kvm_vcpu_is_legal_cr3(vcpu, vmcs12->host_cr3)))
> > >                 return -EINVAL;
> > >  
> > > -       if (CC(is_noncanonical_address(vmcs12->host_ia32_sysenter_esp, vcpu)) ||
> > > -           CC(is_noncanonical_address(vmcs12->host_ia32_sysenter_eip, vcpu)))
> > > +       if (CC(is_l1_noncanonical_address_static(vmcs12->host_ia32_sysenter_esp, vcpu)) ||
> > > +           CC(is_l1_noncanonical_address_static(vmcs12->host_ia32_sysenter_eip, vcpu)))
> > >                 return -EINVAL;
> > >  
> > >         if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PAT) &&
> > > @@ -3014,11 +3030,11 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
> > >             CC(vmcs12->host_ss_selector == 0 && !ia32e))
> > >                 return -EINVAL;
> > >  
> > > -       if (CC(is_noncanonical_address(vmcs12->host_fs_base, vcpu)) ||
> > > -           CC(is_noncanonical_address(vmcs12->host_gs_base, vcpu)) ||
> > > -           CC(is_noncanonical_address(vmcs12->host_gdtr_base, vcpu)) ||
> > > -           CC(is_noncanonical_address(vmcs12->host_idtr_base, vcpu)) ||
> > > -           CC(is_noncanonical_address(vmcs12->host_tr_base, vcpu)) ||
> > > +       if (CC(is_l1_noncanonical_address_static(vmcs12->host_fs_base, vcpu)) ||
> > > +           CC(is_l1_noncanonical_address_static(vmcs12->host_gs_base, vcpu)) ||
> > > +           CC(is_l1_noncanonical_address_static(vmcs12->host_gdtr_base, vcpu)) ||
> > > +           CC(is_l1_noncanonical_address_static(vmcs12->host_idtr_base, vcpu)) ||
> > > +           CC(is_l1_noncanonical_address_static(vmcs12->host_tr_base, vcpu)) ||
> 
> If loads via LTR, LLDT, and LGDT are indeed exempt, then we need to update
> emul_is_noncanonical_address() too.

Sadly the answer to this is yes, at least on Intel. I will test on AMD as soon as I can grab
a Zen4 machine again.

And since these instructions are also unintercepted, it makes sense to use host CPUID
for them as well.
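
Roughly what I have in mind for the emulator side (just a sketch; the helper name and the way
the supported width is obtained are placeholders, not the actual emulate.c interface):

	/*
	 * Bases loaded by emulated LGDT/LIDT/LLDT/LTR would be checked against
	 * the width the CPU supports (57 bits with LA57, 48 bits without),
	 * instead of against the current value of CR4.LA57.
	 */
	static bool emul_base_is_noncanonical(u64 base, bool cpu_has_la57)
	{
		unsigned int bits = cpu_has_la57 ? 57 : 48;

		return ((s64)(base << (64 - bits)) >> (64 - bits)) != (s64)base;
	}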

I attached two kvm-unit-tests, which I hope to polish for publishing soon. They pass with
flying colors with this patch series and, unless I made a mistake, confirm most of my
findings.

I checked the HOST_RIP field separately by patching the L0 kernel and observing that it
either hangs/crashes or fails VM entry of the first guest.

Best regards,
	Maxim Levitsky

> 
> The best idea I have is to have a separate flow for system registers (not a great
> name, but I can't think of anything better), and the
> 
> E.g. s/is_host_noncanonical_msr_value/is_non_canonical_system_reg, and then
> wire that up to the emulator.
> 
> > >             CC(is_noncanonical_address(vmcs12->host_rip, vcpu)))
> > >                 return -EINVAL;
> > >  




From e6d7bd2aa4f185881714a6103e9e672b0dbd12ab Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
Date: Fri, 16 Aug 2024 12:40:20 +0300
Subject: [PATCH 1/2] Add a test for writing canonical values to various MSRs

Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>

---
 lib/x86/asm/setup.h |   1 +
 lib/x86/processor.h |   5 +
 lib/x86/vm.h        |   1 +
 x86/Makefile.x86_64 |   1 +
 x86/cstart64.S      |  35 +++++++
 x86/msr_canonical.c | 236 ++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 279 insertions(+)
 create mode 100644 x86/msr_canonical.c

diff --git a/lib/x86/asm/setup.h b/lib/x86/asm/setup.h
index 458eac858..399ced1f9 100644
--- a/lib/x86/asm/setup.h
+++ b/lib/x86/asm/setup.h
@@ -14,6 +14,7 @@ unsigned long setup_tss(u8 *stacktop);
 
 efi_status_t setup_efi(efi_bootinfo_t *efi_bootinfo);
 void setup_5level_page_table(void);
+void setup_4level_page_table(void);
 #endif /* CONFIG_EFI */
 
 void save_id(void);
diff --git a/lib/x86/processor.h b/lib/x86/processor.h
index da1ed6628..d478eff91 100644
--- a/lib/x86/processor.h
+++ b/lib/x86/processor.h
@@ -468,6 +468,11 @@ static inline int rdmsr_safe(u32 index, uint64_t *val)
 	return rdreg64_safe("rdmsr", index, val);
 }
 
+static inline int rdmsr_fep_safe(u32 index, uint64_t *val)
+{
+	return __rdreg64_safe(KVM_FEP, "rdmsr", index, val);
+}
+
 static inline int wrmsr_safe(u32 index, u64 val)
 {
 	return wrreg64_safe("wrmsr", index, val);
diff --git a/lib/x86/vm.h b/lib/x86/vm.h
index cf39787aa..60ace1a84 100644
--- a/lib/x86/vm.h
+++ b/lib/x86/vm.h
@@ -8,6 +8,7 @@
 #include "asm/bitops.h"
 
 void setup_5level_page_table(void);
+void setup_4level_page_table(void);
 
 struct pte_search {
 	int level;
diff --git a/x86/Makefile.x86_64 b/x86/Makefile.x86_64
index 2771a6fad..1bc1c10b0 100644
--- a/x86/Makefile.x86_64
+++ b/x86/Makefile.x86_64
@@ -38,6 +38,7 @@ tests += $(TEST_DIR)/rdpru.$(exe)
 tests += $(TEST_DIR)/pks.$(exe)
 tests += $(TEST_DIR)/pmu_lbr.$(exe)
 tests += $(TEST_DIR)/pmu_pebs.$(exe)
+tests += $(TEST_DIR)/msr_canonical.$(exe)
 
 ifeq ($(CONFIG_EFI),y)
 tests += $(TEST_DIR)/amd_sev.$(exe)
diff --git a/x86/cstart64.S b/x86/cstart64.S
index 4dff11027..a91d55d00 100644
--- a/x86/cstart64.S
+++ b/x86/cstart64.S
@@ -92,6 +92,27 @@ switch_to_5level:
 	call enter_long_mode
 	jmpl $8, $lvl5
 
+
+switch_to_4level:
+	mov %cr0, %eax
+	btr $31, %eax
+	mov %eax, %cr0
+
+	mov $ptl4, %eax
+	mov %eax, pt_root
+
+	/* Disable CR4.LA57 */
+	mov %cr4, %eax
+	btr $12, %eax
+	mov %eax, %cr4
+
+	mov $0x10, %ax
+	mov %ax, %ss
+
+	call enter_long_mode
+	jmpl $8, $lvl4
+
+
 smp_stacktop:	.long stacktop - 4096
 
 .align 16
@@ -139,3 +160,17 @@ setup_5level_page_table:
 	lretq
 lvl5:
 	retq
+
+
+.globl setup_4level_page_table
+setup_4level_page_table:
+	/* Check if 4-level paging is already enabled */
+	mov %cr4, %rax
+	test $0x1000, %eax
+	jz lvl4
+
+	pushq $32
+	pushq $switch_to_4level
+	lretq
+lvl4:
+	retq
diff --git a/x86/msr_canonical.c b/x86/msr_canonical.c
new file mode 100644
index 000000000..89e809d90
--- /dev/null
+++ b/x86/msr_canonical.c
@@ -0,0 +1,236 @@
+#include "libcflat.h"
+
+#include "apic.h"
+#include "processor.h"
+#include "msr.h"
+#include "x86/vm.h"
+#include "asm/setup.h"
+
+static ulong msr_list[] = {
+	/* MSR_GS_BASE is excluded: the test needs it for the _safe macros */
+	MSR_IA32_SYSENTER_ESP,
+	MSR_IA32_SYSENTER_EIP,
+	MSR_FS_BASE,
+	MSR_KERNEL_GS_BASE,
+	MSR_LSTAR,
+	MSR_CSTAR
+};
+
+#define TEST_VALUE0 0xffffffceb1600000 /* canonical for both 48-bit and 57-bit widths */
+#define TEST_VALUE1 0xff4547ceb1600000 /* canonical only for the 57-bit width */
+#define TEST_VALUE2 0xfa4547ceb1600000 /* non-canonical for both widths */
+
+
+static void test_msrs0(void)
+{
+	int i;
+
+	for (i = 0 ; i < ARRAY_SIZE(msr_list) ; i++) {
+
+		u64 value = rdmsr(msr_list[i]);
+		u64 value1;
+		int vector;
+
+		// write test value via kvm
+		vector = wrmsr_fep_safe(msr_list[i], TEST_VALUE0);
+
+		if (vector)
+			report_fail("%d: Failed to write MSR via emulation", i);
+		else {
+			// read test value via hardware
+			if (rdmsr(msr_list[i]) != TEST_VALUE0)
+				report_fail("%d: Wrong msr value set via emulation", i);
+		}
+
+		// restore the original value
+		wrmsr(msr_list[i], value);
+
+		// now write test value via hardware
+		wrmsr(msr_list[i], TEST_VALUE0);
+
+		// now read test value via kvm
+		vector = rdmsr_fep_safe(msr_list[i], &value1);
+
+		if (vector)
+			report_fail("%d: Failed to read MSR via emulation", i);
+		else {
+			if (value1 != TEST_VALUE0)
+				report_fail("%d: Wrong value read via emulation", i);
+		}
+
+
+		// restore the original value
+		wrmsr(msr_list[i], value);
+	}
+}
+
+
+
+
+/*
+ * Write the given test value to each tested MSR, both directly (hardware)
+ * and via KVM emulation, and print the exception vector (if any) for each.
+ */
+static void test_msrs1(u64 test_value)
+{
+	int i, vector1, vector2;
+
+	for (i = 0 ; i < ARRAY_SIZE(msr_list) ; i++) {
+
+		u64 value = rdmsr(msr_list[i]);
+
+		vector1 = wrmsr_safe(msr_list[i], test_value);
+		wrmsr(msr_list[i], value);
+
+		vector2 = wrmsr_fep_safe(msr_list[i], test_value);
+		wrmsr(msr_list[i], value);
+
+		printf("%d: hw exception: %d kvm exception %d\n", i, vector1, vector2);
+
+	}
+}
+
+
+/*
+ * While 5-level paging is enabled, write a value that is canonical
+ * for 57-bit addresses but non-canonical for 48-bit addresses to the
+ * set of tested MSRs, then switch back to 4-level paging and check
+ * that the MSRs still hold that value (i.e. the non-canonical value
+ * survives the paging-mode switch).
+ */
+static void test_msrs2(void)
+{
+	int i;
+
+	setup_5level_page_table();
+
+	for (i = 0 ; i < ARRAY_SIZE(msr_list) ; i++) {
+		wrmsr(msr_list[i], TEST_VALUE1);
+	}
+
+	setup_4level_page_table();
+
+	for (i = 0 ; i < ARRAY_SIZE(msr_list) ; i++)
+		if (rdmsr(msr_list[i]) != TEST_VALUE1)
+			report_fail("MSR %i didn't preserve value when switching back to 4 level paging", i);
+
+}
+
+
+static void test_lldt_host(u64 value)
+{
+	u16 original_ldt = sldt();
+
+	set_gdt_entry(FIRST_SPARE_SEL, value, 0x100, 0x82, 0);
+	lldt(FIRST_SPARE_SEL);
+	lldt(original_ldt);
+}
+
+static void test_ltr_host(u64 value)
+{
+	size_t tss_offset;
+
+	set_gdt_entry(FIRST_SPARE_SEL, value, 0x100, 0x89, 0);
+	ltr(FIRST_SPARE_SEL);
+
+	/* restore the TSS */
+	tss_offset = setup_tss(NULL);
+	load_gdt_tss(tss_offset);
+}
+
+static void test_lgdt_host(u64 value)
+{
+	struct descriptor_table_ptr dt_ptr;
+	u64 orig_base;
+
+	sgdt(&dt_ptr);
+	orig_base = dt_ptr.base;
+
+	dt_ptr.base = value;
+	lgdt(&dt_ptr);
+
+	dt_ptr.base = orig_base;
+	lgdt(&dt_ptr);
+}
+
+static void test_lidt_host(u64 value)
+{
+	struct descriptor_table_ptr dt_ptr;
+	u64 orig_base;
+
+	sidt(&dt_ptr);
+	orig_base = dt_ptr.base;
+
+	dt_ptr.base = value;
+	lidt(&dt_ptr);
+
+	dt_ptr.base = orig_base;
+	lidt(&dt_ptr);
+}
+
+
+static void test_special_bases(u64 value)
+{
+	test_lgdt_host(value);
+	test_lidt_host(value);
+
+	test_lldt_host(value);
+	test_ltr_host(value);
+
+	printf("Special bases test done for %lx\n", value);
+}
+
+
+int main(int argc, char **argv)
+{
+#ifndef CONFIG_EFI
+
+	printf("Basic msr test\n");
+	test_msrs0();
+
+	printf("testing msrs with 4 level paging\n");
+	test_msrs1(TEST_VALUE0);
+	printf("\n");
+
+	printf("testing msrs with 4 level paging (4 level non canonical value)\n");
+	test_msrs1(TEST_VALUE1);
+	printf("\n");
+
+	printf("testing msrs with 4 level paging (fully non canonical value)\n");
+	test_msrs1(TEST_VALUE2);
+	printf("\n");
+
+
+	setup_5level_page_table();
+
+	printf("testing msrs with 5 level paging\n");
+	test_msrs1(TEST_VALUE0);
+	printf("\n");
+
+
+	printf("testing msrs with 5 level paging (4 level non canonical value)\n");
+	test_msrs1(TEST_VALUE1);
+	printf("\n");
+
+	printf("testing msrs with 5 level paging (fully non canonical value)\n");
+	test_msrs1(TEST_VALUE2);
+	printf("\n");
+
+
+	printf("testing that msrs remain with non canonical values after switch to 4 level paging\n");
+	test_msrs2();
+
+	setup_5level_page_table();
+
+	test_special_bases(TEST_VALUE0);
+	test_special_bases(TEST_VALUE1);
+
+	setup_4level_page_table();
+
+	test_special_bases(TEST_VALUE0);
+	test_special_bases(TEST_VALUE1);
+
+#endif
+	return report_summary();
+}
+
-- 
2.40.1

From e14288f50896caaafbdd8e66d58fe757f237b13c Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
Date: Mon, 22 Jul 2024 11:09:40 -0400
Subject: [PATCH 2/2] vmx: add test for canonical checks on various fields

Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
---
 x86/vmx_tests.c | 151 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index ffe7064c9..8f9784360 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -10732,6 +10732,7 @@ static void handle_exception_in_l1(u32 vector)
 	vmcs_write(EXC_BITMAP, old_eb);
 }
 
+
 static void vmx_exception_test(void)
 {
 	struct vmx_exception_test *t;
@@ -10754,6 +10755,155 @@ static void vmx_exception_test(void)
 	test_set_guest_finished();
 }
 
+
+#define TEST_VALUE_CANONICAL  0xffffffceb1600000
+#define TEST_VALUE_5CANONICAL 0xff4547ceb1600000
+
+static void vmx_canonical_test_guest(void)
+{
+	while (true) {
+		vmcall();
+	}
+}
+
+static int get_host_value(u64 vmcs_field, u64 *value)
+{
+	struct descriptor_table_ptr dt_ptr;
+
+	switch(vmcs_field) {
+	case HOST_SYSENTER_ESP:
+		*value = rdmsr(MSR_IA32_SYSENTER_ESP);
+		break;
+	case HOST_SYSENTER_EIP:
+		*value = rdmsr(MSR_IA32_SYSENTER_EIP);
+		break;
+	case HOST_BASE_FS:
+		*value = rdmsr(MSR_FS_BASE);
+		break;
+	case HOST_BASE_GS:
+		*value = rdmsr(MSR_GS_BASE);
+		break;
+	case HOST_BASE_GDTR:
+		sgdt(&dt_ptr);
+		*value = dt_ptr.base;
+		break;
+	case HOST_BASE_IDTR:
+		sidt(&dt_ptr);
+		*value = dt_ptr.base;
+		break;
+	case HOST_BASE_TR:
+		*value = get_gdt_entry_base(get_tss_descr());
+		/* value might not reflect the actual base if changed by VMX */
+		return 1;
+	default:
+		assert(0);
+	}
+	return 0;
+}
+
+static void set_host_value(u64 vmcs_field, u64 value)
+{
+	struct descriptor_table_ptr dt_ptr;
+
+	switch(vmcs_field) {
+	case HOST_SYSENTER_ESP:
+		wrmsr(MSR_IA32_SYSENTER_ESP, value);
+		break;
+	case HOST_SYSENTER_EIP:
+		wrmsr(MSR_IA32_SYSENTER_EIP, value);
+		break;
+	case HOST_BASE_FS:
+		wrmsr(MSR_FS_BASE, value);
+		break;
+	case HOST_BASE_GS:
+		wrmsr(MSR_GS_BASE, value);
+		break;
+	case HOST_BASE_GDTR:
+		sgdt(&dt_ptr);
+		dt_ptr.base = value;
+		lgdt(&dt_ptr);
+		break;
+	case HOST_BASE_IDTR:
+		sidt(&dt_ptr);
+		dt_ptr.base = value;
+		lidt(&dt_ptr);
+		break;
+	case HOST_BASE_TR:
+		/* set the base and clear the busy bit */
+		set_gdt_entry(FIRST_SPARE_SEL, value, 0x200, 0x89, 0);
+		ltr(FIRST_SPARE_SEL);
+		break;
+	}
+}
+
+static void do_vmx_canonical_test_one_field(const char* name, u64 field)
+{
+	/* backup the msr and field values */
+	u64 host_org_value, test_value;
+	u64 field_org_value = vmcs_read(field);
+
+	get_host_value(field, &host_org_value);
+
+	/* write the 57-bit-only-canonical value directly on the host and check that it was written */
+	set_host_value(field, TEST_VALUE_5CANONICAL);
+	if (!get_host_value(field, &test_value)) {
+		report(test_value == TEST_VALUE_5CANONICAL, "%s: HOST value is set to test value directly", name);
+	}
+
+	/* write the 57-bit-only-canonical value via the VMLAUNCH/VMRESUME host-state load */
+	set_host_value(field, TEST_VALUE_CANONICAL);
+	vmcs_write(field, TEST_VALUE_5CANONICAL);
+
+	enter_guest();
+	skip_exit_vmcall();
+
+	if (!get_host_value(field, &test_value)) {
+		/* check that the host register now holds the value from the VMCS host field */
+		report(test_value == TEST_VALUE_5CANONICAL, "%s: HOST value is set to test value via VMLAUNCH/VMRESUME", name);
+	}
+
+	/* Restore original values */
+	vmcs_write(field, field_org_value);
+	set_host_value(field, host_org_value);
+}
+
+#define vmx_canonical_test_one_field(field) \
+	do_vmx_canonical_test_one_field(#field, field)
+
+
+
+static void test_lldt_host(u64 value)
+{
+	u16 original_ldt = sldt();
+
+	set_gdt_entry(FIRST_SPARE_SEL, value, 0x100, 0x82, 0);
+	lldt(FIRST_SPARE_SEL);
+	lldt(original_ldt);
+}
+
+static void vmx_canonical_test(void)
+{
+	report(!(read_cr4() & X86_CR4_LA57), "4 level paging");
+
+	test_set_guest(vmx_canonical_test_guest);
+
+	test_lldt_host(TEST_VALUE_5CANONICAL);
+
+	vmx_canonical_test_one_field(HOST_SYSENTER_ESP);
+	vmx_canonical_test_one_field(HOST_SYSENTER_EIP);
+
+	vmx_canonical_test_one_field(HOST_BASE_FS);
+	vmx_canonical_test_one_field(HOST_BASE_GS);
+
+	vmx_canonical_test_one_field(HOST_BASE_GDTR);
+	vmx_canonical_test_one_field(HOST_BASE_IDTR);
+
+	vmx_canonical_test_one_field(HOST_BASE_TR);
+
+
+	test_set_guest_finished();
+}
+
 enum Vid_op {
 	VID_OP_SET_ISR,
 	VID_OP_NOP,
@@ -11262,5 +11412,6 @@ struct vmx_test vmx_tests[] = {
 	TEST(vmx_pf_invvpid_test),
 	TEST(vmx_pf_vpid_test),
 	TEST(vmx_exception_test),
+	TEST(vmx_canonical_test),
 	{ NULL, NULL, NULL, NULL, NULL, {0} },
 };
-- 
2.40.1

