On Thu, Jul 25, 2024 at 01:52:32PM -0400, Maxim Levitsky wrote: KVM: VMX: >reset the segment cache after segment initialization in vmx_vcpu_reset >to avoid stale uninitialized data being cached in the segment cache. > >In particular the following scenario is possible when full preemption >is enabled: > >- vCPU is just created, and the vCPU thread is preempted before SS.AR_BYTES >is written in vmx_vcpu_reset. > >- During preemption, the kvm_arch_vcpu_in_kernel is called which >reads SS's segment AR byte to determine if the CPU was in the kernel. > >That caches 0 value of SS.AR_BYTES, then eventually the vCPU thread will be >preempted back, then set the correct SS.AR_BYTES value in the vmcs >and the cached value will remain stale, and could be read e.g via >KVM_GET_SREGS. > >Usually this is not a problem because VMX segment cache is reset on each >vCPU run, but if the userspace (e.g KVM selftests do) reads the segment >registers just after the vCPU was created, and modifies some of them >but passes through other registers and in this case SS.AR_BYTES, >the stale value of it will make it into the vmcs, >and later lead to a VM entry failure due to incorrect SS segment type. I looked into the same issue last week, which was reported by someone internally. > >Fix this by moving the vmx_segment_cache_clear() call to be after the >segments are initialized. > >Note that this still doesn't fix the issue of kvm_arch_vcpu_in_kernel >getting stale data during the segment setup, and that issue will >be addressed later. > >Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx> Do you need a Fixes tag and/or Cc: stable? >--- > arch/x86/kvm/vmx/vmx.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > >diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c >index fa9f307d9b18..d43bb755e15c 100644 >--- a/arch/x86/kvm/vmx/vmx.c >+++ b/arch/x86/kvm/vmx/vmx.c >@@ -4870,9 +4870,6 @@ void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) > vmx->hv_deadline_tsc = -1; > kvm_set_cr8(vcpu, 0); > >- vmx_segment_cache_clear(vmx); >- kvm_register_mark_available(vcpu, VCPU_EXREG_SEGMENTS); >- > seg_setup(VCPU_SREG_CS); > vmcs_write16(GUEST_CS_SELECTOR, 0xf000); > vmcs_writel(GUEST_CS_BASE, 0xffff0000ul); >@@ -4899,6 +4896,9 @@ void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) > vmcs_writel(GUEST_IDTR_BASE, 0); > vmcs_write32(GUEST_IDTR_LIMIT, 0xffff); > >+ vmx_segment_cache_clear(vmx); >+ kvm_register_mark_available(vcpu, VCPU_EXREG_SEGMENTS); vmx_segment_cache_clear() is called in a few other sites. I think at least the call in __vmx_set_segment() should be fixed, because QEMU may read SS.AR right after a write to it. if the write was preempted after the cache was cleared but before the new value being written into VMCS, QEMU would find that SS.AR held a stale value. >+ > vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE); > vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, 0); > vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, 0); >-- >2.26.3 > >