Hi Sean,

On Tue Feb 20, 2024 at 4:18 PM UTC, Sean Christopherson wrote:
> On Mon, Feb 19, 2024, Nicolas Saenz Julienne wrote:
> > Under certain extreme conditions, the tick-based cputime accounting may
> > produce inaccurate data. For instance, guest CPU usage is sensitive to
> > interrupts firing right before the tick's expiration. This forces the
> > guest into kernel context, and has that time slice wrongly accounted as
> > system time. This issue is exacerbated if the interrupt source is in
> > sync with the tick, significantly skewing usage metrics towards system
> > time.
>
> ...
>
> > NOTE: This wasn't tested in depth, and it's mostly intended to highlight
> > the issue we're trying to solve. Also ccing KVM folks, since it's
> > relevant to guest CPU usage accounting.
>
> How bad is the synchronization issue on upstream kernels? We tried to address
> that in commit 160457140187 ("KVM: x86: Defer vtime accounting 'til after IRQ handling").
>
> I don't expect it to be foolproof, but it'd be good to know if there's a blatant
> flaw and/or easily closed hole.

The issue is not really about the interrupts themselves, but their side
effects. For instance, let's say the guest sets up a Hyper-V stimer that
consistently fires 1 us before the preemption tick. The preemption tick
will expire while the vCPU thread is running with !PF_VCPU (maybe inside
kvm_hv_process_stimers(), for example). As long as they both stay in
sync, you'll get 100% system usage. I was able to reproduce this one
through kvm-unit-tests. There, the race window is too small to keep the
interrupts in sync for long periods of time, but it is still large enough
to produce random bursts of system usage (which is unacceptable for some
use-cases).

Other use-cases have bigger race windows and managed to maintain high
system CPU usage over long periods of time. For example, with user-space
HPET emulation, or KVM+Xen (I don't know the fine details on these, but
VIRT_CPU_ACCOUNTING_GEN fixes the mis-accounting).
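
To make the mechanism above concrete, here's a toy model of tick-based
sampling. It is not kernel code: the function name, the 4 ms tick (HZ=250)
and the 5 us exit-handling cost are all assumptions picked for
illustration. It only counts how many ticks land while the vCPU thread is
handling an exit in host context, and compares that with the time really
spent there.

```python
def tick_sampled_system_share(tick_us, timer_us, phase_us, exit_us, n_ticks):
    """Toy model of tick-based cputime accounting for one vCPU thread.

    The guest timer fires at phase_us + j * timer_us, and each firing keeps
    the thread in host context (no PF_VCPU) for exit_us microseconds.
    Every tick charges one whole tick period to whatever context it
    happens to interrupt. Assumes exit_us <= timer_us.

    Returns (share of ticks charged to system, real long-run host share).
    """
    system_ticks = 0
    for k in range(1, n_ticks + 1):
        tick_time = k * tick_us
        # Time elapsed since the most recent timer firing.
        since_fire = (tick_time - phase_us) % timer_us
        if since_fire < exit_us:
            system_ticks += 1  # the tick landed inside the exit handler
    return system_ticks / n_ticks, exit_us / timer_us


if __name__ == "__main__":
    # Hypothetical numbers: timer in lockstep with a 4 ms tick,
    # firing 1 us before it.
    print(tick_sampled_system_share(4000, 4000, 3999, 5, 1000))
    # Same timer with a 1 us longer period, so it drifts against the tick.
    print(tick_sampled_system_share(4000, 4001, 3999, 5, 4001))
```

In this model the lockstep case charges every tick to system time even
though the handler only occupies 0.125% of each period. Once the timer
drifts against the tick, the sampled share converges on the real one.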

It all comes down to the same situation. Something triggers an exit, and
the vCPU thread goes past 'vtime_account_guest_exit()' just in time for
the tick interrupt to show up.

Note that we're running with 160457140187 ("KVM: x86: Defer vtime
accounting 'til after IRQ handling") on the kernel that reproduced these
issues. The RFC fix was tested against an upstream kernel by tracing
cputime accounting and making sure the right code-paths were exercised.

Nicolas