On Wed, Jul 03, 2024 at 01:14:09PM -0700, Reinette Chatre wrote: > Hi Sean, > > On 6/28/24 5:39 PM, Sean Christopherson wrote: > > Forking this off to try and avoid confusion... > > > > On Wed, Jun 12, 2024, Reinette Chatre wrote: > > ... > > > > + > > > + freq = (tmict - tmcct) * tdcrs[i].divide_count * tsc_hz / (tsc1 - tsc0); > > > + /* Check if measured frequency is within 1% of configured frequency. */ > > > + GUEST_ASSERT(freq < apic_hz * 101 / 100); > > > + GUEST_ASSERT(freq > apic_hz * 99 / 100); > > > + } > > > > This test fails on our SKX, CLX, and ICX systems due to what appears to be a CPU > > bug. It looks like something APICv related is clobbering internal VMX timer state? > > Or maybe there's a tearing or truncation issue? > > It has been a few days. Just a note to let you know that we are investigating this. > On my side I have not yet been able to reproduce this issue. I tested > kvm-x86-next-2024.06.28 on an ICX and an CLX system by running 100 iterations of > apic_bus_clock_test and they all passed. Since I have lack of experience here there are > some Intel virtualization experts helping out with this investigation and I hope that > they will be some insights from the analysis and testing that you already provided. > > Reinette > I can reproduce this on my side, even with apicv disabled by 'insmod $kernel_path/arch/x86/kvm/kvm-intel.ko enable_apicv=N'. @Sean I think we're observing same issue, please see the details below: apic_bus_clock_test can't reproduce this may because the preemption timer value calculation relies on apic bus frequency/TMICT/Divide count/host TSC frequency and ratio between preemption timer and host TSC frequency, too many factors to generate the 'magic' value there. So I changed KVM and added a small KVM kselftest tool to set the preemption timer value directly from guest, this makes the reproducing easily. The changes are attached at end of this comment. The trace I captured below came form host with 1.7GHz TSC, the VM_EXIT_SAVE_VMX_PREEMPTION_TIMER is enabled to get the cpu saved vmcs.VMX_PREEMPTION_TIMER_VALUE after VMEXIT. I set the vmcs.VMX_PREEMPTION_TIMER_VALUE to 0x880042ad which is the 'magic' number on this 1.7Ghz TSC machine: preempt_test 20677.199521: kvm:kvm_vmx_debug: kvm_vmx_debug: 3, a0:0x77fd5554 a1:0x880042ad a2:0x880042ad a3:0x20462e98d9b9 a0: The previous vmcs.VMX_PREEMPTION_TIMER_VALUE value saved by CPU when VMEXIT. a1: The new preemption timer value wrote to vmcs.VMX_PREEMPTION_TIMER_VALUE. a2: The value read back from vmcs.VMX_PREEMPTION_TIMER_VALUE, for double confirmation. a3: The host tsc at the time point, debug only. preempt_test 20677.199579: kvm:kvm_exit: reason PREEMPTION_TIMER rip 0x40274d info 0 0 intr 0 preempt_test 20677.199579: kvm:kvm_vmx_debug: kvm_vmx_debug: 2, a0:0x34 a1:0x0 a2:0x87fea9b0 a3:0x20462e9a5749 a0: The VMEXIT reason, 0x34 is preemption timer VMEXIT. a1: The read back vmcs.VMX_PREEMPTION_TIMER_VALUE value, here 0. a2: The next preemption timer value should be written to vmcs, calculates from the (target tsc - current tsc) >> 7. Now the preemption timer vmexit happend only after ~58 microseconds elapsed, it should happen after ~171.79 seconds but not such soon, the issue is reproduced. Another more easy way to observe this symptom w/o care the 'magic' preemption timer vlaue is use the maximum preemption timer value 0xffffffff, below log w/ 0xffffffff is captured from same machine: preempt_test 20530.456589: kvm:kvm_vmx_debug: kvm_vmx_debug: 3, a0:0x77fd5551 a1:0xffffffff a2:0xffffffff a3:0x200c1971ca5d a0: The previous vmcs.VMX_PREEMPTION_TIMER_VALUE value saved by CPU when VMEXIT. a1: The new preemption timer value wrote to vmcs.VMX_PREEMPTION_TIMER_VALUE. a2: The read back value from vmcs.VMX_PREEMPTION_TIMER_VALUE, double confirmation. a3: the host tsc at the time point, debug only. preempt_test 20530.456690: kvm:kvm_exit: reason VMCALL rip 0x4131a0 info 0 0 intr 0 The preempt_test checks preemption timer state every 100us, this VMEXIT is expected behavior. preempt_test 20530.456691: kvm:kvm_entry: vcpu 0, rip 0x4131a3 preempt_test 20530.456691: kvm:kvm_vmx_debug: kvm_vmx_debug: 3, a0:0x77ff82cc a1:0xfffe900b a2:0xfffe900b a3:0x200c19746d45 a0: The previous vmcs.VMX_PREEMPTION_TIMER_VALUE value saved by CPU when VMEXIT. The difference value shouldn't be such huge number 0xffffffff - 0x77ff82cc = 0x88007D33 when just ~100us elapsed from previous VMENTRY, the issue is reproduced. Use 0x88000000 as preemption timer value to verify this preempt_test tool, the preemption timer VMEXIT happend after ~171.45 seconds which is expected behavior on the host 1.7Ghz TSC system: The preemption timer VMEXIT should happen after: 2281701376 × 128 / 1700000000 = 171.79 seconds. Attached my changes in KVM and kselftest tool for reproducing here, based on: https://github.com/kvm-x86/linux.git tag:kvm-x86-next-2024.06.28 Patch 01: