Excerpts from Laurent Vivier's message of October 28, 2021 10:48 pm: > On 27/10/2021 16:21, Nicholas Piggin wrote: >> From: Laurent Vivier <lvivier@xxxxxxxxxx> >> >> Commit 112665286d08 ("KVM: PPC: Book3S HV: Context tracking exit guest >> context before enabling irqs") moved guest_exit() into the interrupt >> protected area to avoid wrong context warning (or worse). The problem is >> that tick-based time accounting has not yet been updated at this point >> (because it depends on the timer interrupt firing), so the guest time >> gets incorrectly accounted to system time. >> >> To fix the problem, follow the x86 fix in commit 160457140187 ("Defer >> vtime accounting 'til after IRQ handling"), and allow host IRQs to run >> before accounting the guest exit time. >> >> In the case vtime accounting is enabled, this is not required because TB >> is used directly for accounting. >> >> Before this patch, with CONFIG_TICK_CPU_ACCOUNTING=y in the host and a >> guest running a kernel compile, the 'guest' fields of /proc/stat are >> stuck at zero. With the patch they can be observed increasing roughly as >> expected. >> >> Fixes: e233d54d4d97 ("KVM: booke: use __kvm_guest_exit") >> Fixes: 112665286d08 ("KVM: PPC: Book3S HV: Context tracking exit guest context before enabling irqs") >> Cc: <stable@xxxxxxxxxxxxxxx> # 5.12 >> Signed-off-by: Laurent Vivier <lvivier@xxxxxxxxxx> >> [np: only required for tick accounting, add Book3E fix, tweak changelog] >> Signed-off-by: Nicholas Piggin <npiggin@xxxxxxxxx> >> --- >> Since v2: >> - I took over the patch with Laurent's blessing. >> - Changed to avoid processing IRQs if we do have vtime accounting >> enabled. >> - Changed so in either case the accounting is called with irqs disabled. >> - Added similar Book3E fix. >> - Rebased on upstream, tested, observed bug and confirmed fix. >> >> arch/powerpc/kvm/book3s_hv.c | 30 ++++++++++++++++++++++++++++-- >> arch/powerpc/kvm/booke.c | 16 +++++++++++++++- >> 2 files changed, 43 insertions(+), 3 deletions(-) >> >> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c >> index 2acb1c96cfaf..7b74fc0a986b 100644 >> --- a/arch/powerpc/kvm/book3s_hv.c >> +++ b/arch/powerpc/kvm/book3s_hv.c >> @@ -3726,7 +3726,20 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc) >> >> kvmppc_set_host_core(pcpu); >> >> - guest_exit_irqoff(); >> + context_tracking_guest_exit(); >> + if (!vtime_accounting_enabled_this_cpu()) { >> + local_irq_enable(); >> + /* >> + * Service IRQs here before vtime_account_guest_exit() so any >> + * ticks that occurred while running the guest are accounted to >> + * the guest. If vtime accounting is enabled, accounting uses >> + * TB rather than ticks, so it can be done without enabling >> + * interrupts here, which has the problem that it accounts >> + * interrupt processing overhead to the host. >> + */ >> + local_irq_disable(); >> + } >> + vtime_account_guest_exit(); >> >> local_irq_enable(); >> >> @@ -4510,7 +4523,20 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit, >> >> kvmppc_set_host_core(pcpu); >> >> - guest_exit_irqoff(); >> + context_tracking_guest_exit(); >> + if (!vtime_accounting_enabled_this_cpu()) { >> + local_irq_enable(); >> + /* >> + * Service IRQs here before vtime_account_guest_exit() so any >> + * ticks that occurred while running the guest are accounted to >> + * the guest. If vtime accounting is enabled, accounting uses >> + * TB rather than ticks, so it can be done without enabling >> + * interrupts here, which has the problem that it accounts >> + * interrupt processing overhead to the host. >> + */ >> + local_irq_disable(); >> + } >> + vtime_account_guest_exit(); >> >> local_irq_enable(); >> >> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c >> index 977801c83aff..8c15c90dd3a9 100644 >> --- a/arch/powerpc/kvm/booke.c >> +++ b/arch/powerpc/kvm/booke.c >> @@ -1042,7 +1042,21 @@ int kvmppc_handle_exit(struct kvm_vcpu *vcpu, unsigned int exit_nr) >> } >> >> trace_kvm_exit(exit_nr, vcpu); >> - guest_exit_irqoff(); >> + >> + context_tracking_guest_exit(); >> + if (!vtime_accounting_enabled_this_cpu()) { >> + local_irq_enable(); >> + /* >> + * Service IRQs here before vtime_account_guest_exit() so any >> + * ticks that occurred while running the guest are accounted to >> + * the guest. If vtime accounting is enabled, accounting uses >> + * TB rather than ticks, so it can be done without enabling >> + * interrupts here, which has the problem that it accounts >> + * interrupt processing overhead to the host. >> + */ >> + local_irq_disable(); >> + } >> + vtime_account_guest_exit(); >> >> local_irq_enable(); >> >> > > I'm wondering if we should put the context_tracking_guest_exit() just after the > "srcu_read_unlock(&vc->kvm->srcu, srcu_idx);" as it was before 61bd0f66ff92 ("KVM: PPC: > Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN")? For the run_single_vcpu path, I _think_ it shouldn't matter. It's mostly just fixing up low level powerpc details. But actually I wonder whether we should move the guest context entirely inside the SRCU lock because it's a high level host locking primitive. For the kvmppc_run_core path, we are actually taking the vc->lock spin lock as well. Arguably it's waiting for secondary threads in the guest but I think changing to host context as soon as possible could make sense. If we don't have an actual bug identified it could wait for next merge perhaps. Thanks, Nick >