Hi Sean,

On Tue, 2024-04-16 at 07:35 -0700, Sean Christopherson wrote:
> On Tue, Apr 16, 2024, Julian Stecklina wrote:
> > From: Thomas Prescher <thomas.prescher@xxxxxxxxxxxxxxxxxxxxx>
> >
> > This issue occurs when the kernel is interrupted by a signal while
> > running a L2 guest. If the signal is meant to be delivered to the L0
> > VMM, and L0 updates CR4 for L1, i.e. when the VMM sets
> > KVM_SYNC_X86_SREGS in kvm_run->kvm_dirty_regs, the kernel programs an
> > incorrect read shadow value for L2's CR4.
> >
> > The result is that the guest can read a value for CR4 where bits from
> > L1 have leaked into L2.
>
> No, this is a userspace bug. If L2 is active when userspace stuffs
> register state, then from KVM's perspective the incoming value is L2's
> value. E.g. if userspace *wants* to update L2 CR4 for whatever reason,
> this patch would result in L2 getting a stale value, i.e. the value of
> CR4 at the time of VM-Enter.
>
> And even if userspace wants to change L1, this patch is wrong, as KVM
> is writing vmcs02.GUEST_CR4, i.e. is clobbering the L2 CR4 that was
> programmed by L1, *and* is dropping the CR4 value that userspace wanted
> to stuff for L1.
>
> To fix this, your userspace needs to either wait until L2 isn't active,
> or force the vCPU out of L2 (which isn't easy, but it's doable if
> absolutely necessary).

What you say makes sense. Is there any way for userspace to detect
whether L2 is currently active after returning from KVM_RUN? I couldn't
find anything in the official documentation:
https://docs.kernel.org/virt/kvm/api.html

Can you point me in the right direction?

> Pulling in a snippet from the initial bug report[*],
>
> : The reason why this triggers in VirtualBox and not in Qemu is that there are
> : cases where VirtualBox marks CR4 dirty even though it hasn't changed.
>
> simply not trying to stuff register state dirty when L2 is active
> sounds like it would resolve the issue.
>
> https://lore.kernel.org/all/af2ede328efee9dc3761333bd47648ee6f752686.camel@xxxxxxxxxxxxxxxxxxxxx
>
> > We found this issue by running uXen [1] as L2 in VirtualBox/KVM [2].
> > The issue can also easily be reproduced in Qemu/KVM if we force a sreg
> > sync on each call to KVM_RUN [3]. The issue can also be reproduced by
> > running a L2 Windows 10. In the Windows case, CR4.VMXE leaks from L1
> > to L2 causing the OS to blue-screen with a kernel thread exception
> > during TLB invalidation where the following code sequence triggers the
> > issue:
> >
> > mov rax, cr4   <--- L2 reads CR4 with contents from L1
> > mov rcx, cr4
> > btc 0x7, rax   <--- L2 toggles CR4.PGE
> > mov cr4, rax   <--- #GP because L2 writes CR4 with reserved bits set
> > mov cr4, rcx
> >
> > The existing code seems to fixup CR4_READ_SHADOW after calling
> > vmx_set_cr4 except in __set_sregs_common. While we could fix it there
> > as well, it's easier to just handle it centrally.
> >
> > There might be a similar issue with CR0.
> >
> > [1] https://github.com/OpenXT/uxen
> > [2] https://github.com/cyberus-technology/virtualbox-kvm
> > [3] https://github.com/tpressure/qemu/commit/d64c9d5e76f3f3b747bea7653d677bd61e13aafe
> >
> > Signed-off-by: Julian Stecklina <julian.stecklina@xxxxxxxxxxxxxxxxxxxxx>
> > Signed-off-by: Thomas Prescher <thomas.prescher@xxxxxxxxxxxxxxxxxxxxx>
>
> SoB is reversed, yours should come after Thomas'.
>
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 6780313914f8..0d4af00245f3 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -3474,7 +3474,11 @@ void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> >  			hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
> >  	}
> >
> > -	vmcs_writel(CR4_READ_SHADOW, cr4);
> > +	if (is_guest_mode(vcpu))
> > +		vmcs_writel(CR4_READ_SHADOW, nested_read_cr4(get_vmcs12(vcpu)));
> > +	else
> > +		vmcs_writel(CR4_READ_SHADOW, cr4);
> > +
> >  	vmcs_writel(GUEST_CR4, hw_cr4);
> >
> >  	if ((cr4 ^ old_cr4) & (X86_CR4_OSXSAVE | X86_CR4_PKE))
> > --
> > 2.43.2
> >
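
For illustration only, the userspace-side rule suggested above (do not mark
sregs dirty while L2 is active, and do not mark them dirty when CR4 has not
changed) could be sketched roughly as below. This is a sketch under
assumptions, not what VirtualBox or Qemu actually do: struct sketch_vcpu and
sketch_sregs_dirty_mask() are hypothetical names, and the two constants are
local stand-ins that mirror KVM_SYNC_X86_SREGS and
KVM_STATE_NESTED_GUEST_MODE from the uapi headers. It also assumes the VMM
learns about guest mode from the flags field returned by
KVM_GET_NESTED_STATE, which is one possible source rather than a confirmed
answer to the question asked above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local stand-ins mirroring the uapi values; real code would include
 * <linux/kvm.h> and use the kernel's names directly.
 */
#define SKETCH_SYNC_X86_SREGS    (1ULL << 1)  /* mirrors KVM_SYNC_X86_SREGS */
#define SKETCH_NESTED_GUEST_MODE 0x1u         /* mirrors KVM_STATE_NESTED_GUEST_MODE */

/* Hypothetical VMM-side bookkeeping for one vCPU. */
struct sketch_vcpu {
	uint64_t cached_cr4;    /* CR4 as last read back from KVM */
	uint64_t wanted_cr4;    /* CR4 the VMM would like L1 to have */
	uint16_t nested_flags;  /* assumed: flags from KVM_GET_NESTED_STATE */
};

/*
 * Return the bits to OR into kvm_run->kvm_dirty_regs before the next
 * KVM_RUN. Returns 0 when the update must be skipped or deferred.
 */
static uint64_t sketch_sregs_dirty_mask(const struct sketch_vcpu *v)
{
	assert(v != NULL);

	/* L2 is active: KVM would treat the value as L2's, so defer. */
	if (v->nested_flags & SKETCH_NESTED_GUEST_MODE)
		return 0;

	/* Nothing changed: marking sregs dirty would be a spurious sync. */
	if (v->wanted_cr4 == v->cached_cr4)
		return 0;

	return SKETCH_SYNC_X86_SREGS;
}
```

A caller would OR the returned mask into kvm_run->kvm_dirty_regs and, when
it gets 0 because of guest mode, keep wanted_cr4 around and retry once the
vCPU has left L2.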