On 1/7/21 4:33 AM, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
>
>> On Wed, Jan 06, 2021, Vitaly Kuznetsov wrote:
>>> Looking back, I don't quite understand why we wanted to account ticks
>>> between vmexit and exiting guest context as 'guest' in the first place;
>>> to my understanding 'guest time' is time spent within VMX non-root
>>> operation, the rest is KVM overhead (system).
>> With tick-based accounting, if the tick IRQ is received after PF_VCPU is cleared
>> then that tick will be accounted to the host/system. The motivation for opening
>> an IRQ window after VM-Exit is to handle the case where the guest is constantly
>> exiting for a different reason _just_ before the tick arrives, e.g. if the guest
>> has its tick configured such that the guest and host ticks get synchronized
>> in a bad way.
>>
>> This is a non-issue when using CONFIG_VIRT_CPU_ACCOUNTING_GEN=y, at least with a
>> stable TSC, as the accounting happens during guest_exit_irqoff() itself.
>> Accounting might be less-than-stellar if TSC is unstable, but I don't think it
>> would be as binary of a failure as tick-based accounting.
>>
> Oh, yea, I vaguely remember we had to deal with a very similar problem
> but for userspace/kernel accounting. It was possible to observe e.g. a
> userspace task going 100% kernel while in reality it was just perfectly
> synchronized with the tick and doing a syscall just before it arrives
> (or something like that, I may be misremembering the details).
>
> So depending on the frequency, it is probably possible to e.g. observe
> '100% host' with tick based accounting, the guest just has to
> synchronize exiting to KVM in a way that the tick will always arrive
> past guest_exit_irqoff().
>
> It seems to me this is a fundamental problem in case the frequency of
> guest exits can match the frequency of the time accounting tick.
>

Just to make sure that I am understanding things correctly.
There are two issues:

1. The first issue is with the tick IRQs that arrive after PF_VCPU is
   cleared: they are accounted to the system context, at least on
   setups where CONFIG_VIRT_CPU_ACCOUNTING_GEN is not enabled. With the
   patch "KVM: x86: Unconditionally enable irqs in guest context", we
   at least take care of the scenario where the guest is constantly
   exiting just before the tick arrives.

2. The second issue, which Sean mentioned, was introduced by moving
   guest_exit_irqoff() closer to VM-exit. Because of this change, any
   tick that fires after IRQs are disabled is incorrectly accounted to
   the system context. This is because we exit the guest context early,
   without ensuring that the guest state required to handle IRQs has
   been restored.

So the increase in system time (as reported by cpuacct.stat) that I was
observing is not entirely correct after all.

Am I missing anything here?

--
Thanks
Nitesh
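
The synchronization effect discussed in this thread can be illustrated with a small toy model. This is only a sketch under stated assumptions, not kernel code: it assumes each tick charges one whole tick period to whichever context the CPU is in at the instant the tick fires, and the names `in_guest` and `simulate`, as well as all the timing values, are hypothetical.

```python
"""Toy model of tick-based CPU time accounting (not kernel code).

Assumption: each tick charges one whole tick period to whichever context
the CPU is in at the instant the tick fires.
"""


def in_guest(t_us, period_us, guest_start_us, guest_end_us):
    """True if the hypothetical vCPU is in guest context at time t_us.

    The vCPU repeats a cycle of length period_us and is in guest context
    for phases in [guest_start_us, guest_end_us); the rest is host time.
    """
    phase = t_us % period_us
    return guest_start_us <= phase < guest_end_us


def simulate(total_us, tick_us, period_us, guest_start_us, guest_end_us):
    """Return (guest_ticks, system_ticks, true_guest_us) over total_us."""
    guest_ticks = 0
    system_ticks = 0
    # Ticks fire at tick_us, 2 * tick_us, ... up to and including total_us.
    for t in range(tick_us, total_us + 1, tick_us):
        if in_guest(t, period_us, guest_start_us, guest_end_us):
            guest_ticks += 1
        else:
            system_ticks += 1
    # Ground truth: count every microsecond actually spent in guest context.
    true_guest_us = sum(
        1
        for t in range(total_us)
        if in_guest(t, period_us, guest_start_us, guest_end_us)
    )
    return guest_ticks, system_ticks, true_guest_us


if __name__ == "__main__":
    total = 100_000
    # Guest exits 10 us before every 1000 us tick and re-enters 50 us after it.
    synced = simulate(total, 1000, 1000, 50, 990)
    # Same guest window, but the exit cycle drifts against the tick.
    drifting = simulate(total, 1000, 1003, 50, 990)
    for name, (g, s, true_us) in (("synchronized", synced), ("drifting", drifting)):
        print(f"{name}: guest ticks={g}, system ticks={s}, "
              f"true guest share={true_us / total:.1%}")
```

In the synchronized case every tick lands in the host window, so the model charges no ticks to the guest even though the guest actually runs for 94% of the time. Once the exit cycle drifts relative to the tick, most ticks land in guest context again.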