On Tue, Apr 16, 2024, Julian Stecklina wrote: > From: Thomas Prescher <thomas.prescher@xxxxxxxxxxxxxxxxxxxxx> > > This issue occurs when the kernel is interrupted by a signal while > running a L2 guest. If the signal is meant to be delivered to the L0 > VMM, and L0 updates CR4 for L1, i.e. when the VMM sets > KVM_SYNC_X86_SREGS in kvm_run->kvm_dirty_regs, the kernel programs an > incorrect read shadow value for L2's CR4. > > The result is that the guest can read a value for CR4 where bits from > L1 have leaked into L2. No, this is a userspace bug. If L2 is active when userspace stuffs register state, then from KVM's perspective the incoming value is L2's value. E.g. if userspace *wants* to update L2 CR4 for whatever reason, this patch would result in L2 getting a stale value, i.e. the value of CR4 at the time of VM-Enter. And even if userspace wants to change L1, this patch is wrong, as KVM is writing vmcs02.GUEST_CR4, i.e. is clobbering the L2 CR4 that was programmed by L1, *and* is dropping the CR4 value that userspace wanted to stuff for L1. To fix this, your userspace needs to either wait until L2 isn't active, or force the vCPU out of L2 (which isn't easy, but it's doable if absolutely necessary). Pulling in a snippet from the initial bug report[*], : The reason why this triggers in VirtualBox and not in Qemu is that there are : cases where VirtualBox marks CR4 dirty even though it hasn't changed. simply not trying to stuff register state dirty when L2 is active sounds like it would resolve the issue. https://lore.kernel.org/all/af2ede328efee9dc3761333bd47648ee6f752686.camel@xxxxxxxxxxxxxxxxxxxxx > We found this issue by running uXen [1] as L2 in VirtualBox/KVM [2]. > The issue can also easily be reproduced in Qemu/KVM if we force a sreg > sync on each call to KVM_RUN [3]. The issue can also be reproduced by > running a L2 Windows 10. In the Windows case, CR4.VMXE leaks from L1 > to L2 causing the OS to blue-screen with a kernel thread exception > during TLB invalidation where the following code sequence triggers the > issue: > > mov rax, cr4 <--- L2 reads CR4 with contents from L1 > mov rcx, cr4 > btc 0x7, rax <--- L2 toggles CR4.PGE > mov cr4, rax <--- #GP because L2 writes CR4 with reserved bits set > mov cr4, rcx > > The existing code seems to fixup CR4_READ_SHADOW after calling > vmx_set_cr4 except in __set_sregs_common. While we could fix it there > as well, it's easier to just handle it centrally. > > There might be a similar issue with CR0. > > [1] https://github.com/OpenXT/uxen > [2] https://github.com/cyberus-technology/virtualbox-kvm > [3] https://github.com/tpressure/qemu/commit/d64c9d5e76f3f3b747bea7653d677bd61e13aafe > > Signed-off-by: Julian Stecklina <julian.stecklina@xxxxxxxxxxxxxxxxxxxxx> > Signed-off-by: Thomas Prescher <thomas.prescher@xxxxxxxxxxxxxxxxxxxxx> SoB is reversed, yours should come after Thomas'. > --- > arch/x86/kvm/vmx/vmx.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index 6780313914f8..0d4af00245f3 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -3474,7 +3474,11 @@ void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) > hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE); > } > > - vmcs_writel(CR4_READ_SHADOW, cr4); > + if (is_guest_mode(vcpu)) > + vmcs_writel(CR4_READ_SHADOW, nested_read_cr4(get_vmcs12(vcpu))); > + else > + vmcs_writel(CR4_READ_SHADOW, cr4); > + > vmcs_writel(GUEST_CR4, hw_cr4); > > if ((cr4 ^ old_cr4) & (X86_CR4_OSXSAVE | X86_CR4_PKE)) > -- > 2.43.2 >