RE: [PATCH 2/2 v2] KVM: PPC: booke: Add watchdog emulation

Bhushan Bharat-R65777 <R65777@xxxxxxxxxxxxx> · Tue, 17 Jul 2012 09:57:08 +0000

> -----Original Message-----
> From: kvm-ppc-owner@xxxxxxxxxxxxxxx [mailto:kvm-ppc-owner@xxxxxxxxxxxxxxx] On
> Behalf Of Alexander Graf
> Sent: Tuesday, July 17, 2012 12:50 PM
> To: Wood Scott-B07421
> Cc: Bhushan Bharat-R65777; <kvm-ppc@xxxxxxxxxxxxxxx>; <kvm@xxxxxxxxxxxxxxx>;
> <bharatb.yadav@xxxxxxxxx>; Bhushan Bharat-R65777; Benjamin Herrenschmidt; Kumar
> Gala
> Subject: Re: [PATCH 2/2 v2] KVM: PPC: booke: Add watchdog emulation
> 
> 
> 
> On 17.07.2012, at 03:02, Scott Wood <scottwood@xxxxxxxxxxxxx> wrote:
> 
> > On 07/16/2012 12:18 PM, Alexander Graf wrote:
> >>> +/*
> >>> + * Return the number of jiffies until the next timeout.  If the
> >> timeout is
> >>> + * longer than the NEXT_TIMER_MAX_DELTA, that
> >>
> >> then?
> >>
> >>> return NEXT_TIMER_MAX_DELTA
> >>> + * instead.
> >>
> >> I can read code.
> >
> > Come on, it's not exactly x++; /* add one to x */
> >
> > It's faster to read code (as well as know the constraints within which
> > you can modify it without having to spend a lot of time digesting all
> > the callers' use cases) when you have a high level description of its
> > interface contract, and can be selective about when to zoom in to the
> > details.  Linux kernel code tends to be bad about this.
> 
> Yeah, not opposed to leave that part in :).
> 
> >
> >> The important piece of information in the comment is
> >> missing: The reason.
> >
> > The reason for what?  Why you want to know the next timeout?  That's
> > the caller's business.  Or why we use NEXT_TIMER_MAX_DELTA as the limit?
> 
> Why we use the limit. IIRC it was explained in the last thread, just didn't make
> its way into the comment.

Earlier we have a comment on the #define MAX_TIMEOUT (new define added for a purpose, so the comment described the puspose).
Now we uses the generic #define NEX_TIMER_MAX_DELTA (include/linux/timer.h), so removed the comment.

> 
> >
> >>> +void kvmppc_watchdog_func(unsigned long data) {
> >>> +    struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
> >>> +    u32 tsr, new_tsr;
> >>> +    int final;
> >>> +
> >>> +    do {
> >>> +        new_tsr = tsr = vcpu->arch.tsr;
> >>> +        final = 0;
> >>> +
> >>> +        /* Time out event */
> >>> +        if (tsr & TSR_ENW) {
> >>> +            if (tsr & TSR_WIS) {
> >>> +                new_tsr = (tsr & ~TCR_WRC_MASK) |
> >>> +                      (vcpu->arch.tcr & TCR_WRC_MASK);
> >>> +                vcpu->arch.tcr &= ~TCR_WRC_MASK;
> >>
> >> Can't we just poke the vcpu to exit the VM and do the above on its own?
> >
> > We've discussed this before.  TSR updates are done via atomics, and we
> > send a request for the vcpu to act on the result.  This is how the
> > decrementer works.
> >
> > http://www.spinics.net/lists/kvm-ppc/msg03169.html
> 
> Yeah, the major difference to the dec is the atomicity of the whole thing. Dec
> changes one bit to enable the interrupt line. The final expiration is more
> complex.

Is not setting the TSR.WRS atomic here (cmpxchg() will handle this)?

> 
> >
> >> This is the watdog expired case, right?
> >
> > Final expiration, yes.
> >
> >> I'd also prefer to have an
> >> explicit event for the expiry than a special TSR check in the main loop.
> >
> > So check TSR[WRS] in update_timer_ints(), and have it queue a
> > pseudoexception?
> 
> Or here.

Do we mean define a sudo IROPRIO for final expiry.

> 
> > That would eliminate the need to change the runnable function.
> >
> >> Also call me sceptic on the reset of tcr. If our user space watchdog
> >> event is "write a message", then we essentially want to hide the fact
> >> that the watchdog expired from the guest, right? In that case, the
> >> second time-out wouldn't do anything guest visible.
> >
> > This was probably copied straight out of the hardware documentation,
> > which explicitly says TCR[WRC] gets set to zero on final expiration
> > (as part of reset).  We should leave that part up to userspace.  It
> > definitely shouldn't be done inside the cmpxchg loop (or from
> > interrupt context -- only TSR gets the atomic treatment).  I don't
> > think the read of TCR outside vcpu context is a problem, though.
> 
> Yeah, but it'd just make me less wary if only the vcpu thread itself accesses
> vcpu internal registers that aren't irq state and thus designed for it (TSR).
> 
> But yes, the most flexible way would probably be to do it from user space. Since
> it'd happen from within the vcpu context of user space, we can also guarantee
> that the TCR access is atomic.

Yes, will move the tcr.wrc clearing to userspace.

> 
> >
> >>> int kvm_arch_vcpu_runnable(struct kvm_vcpu *v) {
> >>> -    return !(v->arch.shared->msr & MSR_WE) ||
> >>> -           !!(v->arch.pending_exceptions) ||
> >>> -           v->requests;
> >>> +    bool ret = !(v->arch.shared->msr & MSR_WE) ||
> >>> +           !!(v->arch.pending_exceptions) ||
> >>> +           v->requests;
> >>> +
> >>> +    ret = ret || kvmppc_get_tsr_wrc(v);
> >>
> >> Why do you need to declare the cpu as non-runnable when a watchdog
> >> event occured?
> >
> > It's the other way around -- it's always runnable when a watchdog exit
> > is pending.  It's like a pending exception.
> 
> Ah, so yes, we should just shove it into pending_exceptions then.

Pending_exception? You mean sudo again here as said earlier.

Thanks
-Bharat
> 
> >
> >>> diff --git a/include/linux/kvm.h b/include/linux/kvm.h index
> >>> 2ce09aa..b9fdb52 100644
> >>> --- a/include/linux/kvm.h
> >>> +++ b/include/linux/kvm.h
> >>> @@ -163,6 +163,7 @@ struct kvm_pit_config {
> >>> #define KVM_EXIT_OSI              18
> >>> #define KVM_EXIT_PAPR_HCALL      19
> >>> #define KVM_EXIT_S390_UCONTROL      20
> >>> +#define KVM_EXIT_WDT              21
> >>
> >> Please make this a more generic KVM_EXIT_WATCHDOG so that other archs
> >> may have the chance to reuse it.
> >
> > WDT is generic (85 of 115 files in drivers/watchdog/ contain "wdt" in
> > their names), just overly abbreviated.  KVM_EXIT_WATCHDOG is more readable.
> 
> Ah, didn't know that it's a commonly used abbreviation. It's not like we're
> constrained in our line length for exit code handling usually though, so
> readable still wins for me :)
> 
> Alex
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body
> of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at
> http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html