Re: [PATCH] KVM: stats: add stats to detect if vcpu is currently halted

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



+1 to the rephrasing of the commit message.

On Wed, Aug 18, 2021 at 11:09 AM Jing Zhang <jingzhangos@xxxxxxxxxx> wrote:
>
> Hi Sean,
>
> On Tue, Aug 17, 2021 at 4:46 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Tue, Aug 17, 2021, Jing Zhang wrote:
> > > Current guest/host/halt stats don't show when we are currently halting
> >
> > s/we are/KVM is
> >
> > And I would probably reword it to "when KVM is blocking a vCPU in response to
> > the vCPU activity state, e.g. halt".  More on that below.
> >
> > > well. If a guest halts for a long period of time they could appear
> > > pathologically blocked but really it's the opposite there's nothing to
> > > do.
> > > Simply count the number of times we enter and leave the kvm_vcpu_block
> >
> > s/we/KVM
> >
> > In general, it's good practice to avoid pronouns in comments and changelogs as
> > doing so all but forces using precise, unambiguous language.  Things like 'it'
> > and 'they' are ok when it's abundantly clear what they refer to, but 'we' and 'us'
> > are best avoided entirely.
> >
> > > function per vcpu, if they are unequal, then a VCPU is currently
> > > halting.
> > > The existing stats like halt_exits and halt_wakeups don't quite capture
> > > this. The time spend halted and halt polling is reported eventually, but
> > > not until we wakeup and resume. If a guest were to indefinitely halt one
> > > of it's CPUs we would never know, it may simply appear blocked.
> >      ^^^^      ^^
> >      its       userspace?
> >
> >
> > The "blocked" terminology is a bit confusing since KVM is explicitly blocking
> > the vCPU, it just happens to mostly do so in response to a guest HLT.  I think
> > "block" is intended to mean "vCPU task not run", but it would be helpful to make
> > that clear.
> >
> That's a good point. Will reword the comments as you suggested.
> > > Original-by: Cannon Matthews <cannonmatthews@xxxxxxxxxx>
> > > Signed-off-by: Jing Zhang <jingzhangos@xxxxxxxxxx>
> > > ---
> > >  include/linux/kvm_host.h  | 4 +++-
> > >  include/linux/kvm_types.h | 2 ++
> > >  virt/kvm/kvm_main.c       | 2 ++
> > >  3 files changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index d447b21cdd73..23d2e19af3ce 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -1459,7 +1459,9 @@ struct _kvm_stats_desc {
> > >       STATS_DESC_LOGHIST_TIME_NSEC(VCPU_GENERIC, halt_poll_fail_hist,        \
> > >                       HALT_POLL_HIST_COUNT),                                 \
> > >       STATS_DESC_LOGHIST_TIME_NSEC(VCPU_GENERIC, halt_wait_hist,             \
> > > -                     HALT_POLL_HIST_COUNT)
> > > +                     HALT_POLL_HIST_COUNT),                                 \
> > > +     STATS_DESC_COUNTER(VCPU_GENERIC, halt_block_starts),                   \
> > > +     STATS_DESC_COUNTER(VCPU_GENERIC, halt_block_ends)
> >
> > Why two counters?  It's per-vCPU, can't this just be a "blocked" flag or so?  I
> > get that all the other stats use "halt", but that's technically wrong as KVM will
> > block vCPUs that are not runnable for other reason, e.g. because they're in WFS
> > on x86.

The point is to separate "blocked because not runable" and "guest
explicitly asked to do nothing"

IIRC I originally wrote this patch to help discriminate how we spent
VCPU thread time,
in particular into two categories of essentially  "doing useful work
on behalf of the guest" and
"blocked from doing useful work."

Since a guest has explictly asked for a vcpu to HLT, this is "useful
work on behalf of the guest"
even though the thread is "blocked" from running.

This allows answering questions like, are we spending too much time
waiting on mutexes, or
long running kernel routines rather than running the vcpu in guest
mode, or did the guest explictly
tell us to not doing anything.

So I would suggest keeping the "halt" part of the counters' name, and
remove the "blocked" part
rather than the other way around. We explicitly do not want to include
non-halt blockages in this.

That being said I suppose a boolean could work as well. I think we did
this because it worked with
and mirrored existing stats rather than anything particularly nuanced.
Though there might be some
eventual consistency sort of concerns with how these stats are updated
and exported that could make
monotonic increasing counters more useful.

> The two counters are used to determine the reason why vCPU is not
> running. If the halt_block_ends is one less than halt_block_starts,
> then we know the vCPU is explicitly blocked by KVM. Otherwise, we know
> there might be something wrong with the vCPU. Does this make sense?
> Will rename from "halt_block_*" to "vcpu_block_*".
>
> Thanks,
> Jing



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux