RE: [PATCH 2/2 v2] KVM: PPC: booke: Add watchdog emulation

Bhushan Bharat-R65777 <R65777@xxxxxxxxxxxxx> · Tue, 17 Jul 2012 13:15:57 +0000

> -----Original Message-----
> From: kvm-ppc-owner@xxxxxxxxxxxxxxx [mailto:kvm-ppc-owner@xxxxxxxxxxxxxxx] On
> Behalf Of Alexander Graf
> Sent: Tuesday, July 17, 2012 6:22 PM
> To: Bhushan Bharat-R65777
> Cc: Wood Scott-B07421; <kvm-ppc@xxxxxxxxxxxxxxx>; <kvm@xxxxxxxxxxxxxxx>;
> <bharatb.yadav@xxxxxxxxx>; Benjamin Herrenschmidt; Kumar Gala; Avi Kivity
> Subject: Re: [PATCH 2/2 v2] KVM: PPC: booke: Add watchdog emulation
> 
> On 07/17/2012 11:57 AM, Bhushan Bharat-R65777 wrote:
> >
> >> -----Original Message-----
> >> From: kvm-ppc-owner@xxxxxxxxxxxxxxx
> >> [mailto:kvm-ppc-owner@xxxxxxxxxxxxxxx] On Behalf Of Alexander Graf
> >> Sent: Tuesday, July 17, 2012 12:50 PM
> >> To: Wood Scott-B07421
> >> Cc: Bhushan Bharat-R65777; <kvm-ppc@xxxxxxxxxxxxxxx>;
> >> <kvm@xxxxxxxxxxxxxxx>; <bharatb.yadav@xxxxxxxxx>; Bhushan
> >> Bharat-R65777; Benjamin Herrenschmidt; Kumar Gala
> >> Subject: Re: [PATCH 2/2 v2] KVM: PPC: booke: Add watchdog emulation
> >>
> >>
> >>
> >> On 17.07.2012, at 03:02, Scott Wood <scottwood@xxxxxxxxxxxxx> wrote:
> >>
> >>> On 07/16/2012 12:18 PM, Alexander Graf wrote:
> >>>>> +/*
> >>>>> + * Return the number of jiffies until the next timeout.  If the
> >>>> timeout is
> >>>>> + * longer than the NEXT_TIMER_MAX_DELTA, that
> >>>> then?
> >>>>
> >>>>> return NEXT_TIMER_MAX_DELTA
> >>>>> + * instead.
> >>>> I can read code.
> >>> Come on, it's not exactly x++; /* add one to x */
> >>>
> >>> It's faster to read code (as well as know the constraints within
> >>> which you can modify it without having to spend a lot of time
> >>> digesting all the callers' use cases) when you have a high level
> >>> description of its interface contract, and can be selective about
> >>> when to zoom in to the details.  Linux kernel code tends to be bad about
> this.
> >> Yeah, not opposed to leave that part in :).
> >>
> >>>> The important piece of information in the comment is
> >>>> missing: The reason.
> >>> The reason for what?  Why you want to know the next timeout?  That's
> >>> the caller's business.  Or why we use NEXT_TIMER_MAX_DELTA as the limit?
> >> Why we use the limit. IIRC it was explained in the last thread, just
> >> didn't make its way into the comment.
> > Earlier we have a comment on the #define MAX_TIMEOUT (new define added for a
> purpose, so the comment described the puspose).
> > Now we uses the generic #define NEX_TIMER_MAX_DELTA (include/linux/timer.h),
> so removed the comment.
> 
> Ah, ok. Just saying, if you comment on some mechanism, like you did here, please
> also include the reasoning behind it. For example
> 
>    Do foo if x is true.
> 
> isn't particularly helpful. However
> 
>    Do foo if x is true because the bar API will break with high values
> 
> is very helpful. It includes the action and reason of the code :).
> Alternatively, to me the same as above would be
> 
>    /* bar API will break with high values */
>    if (x)
>      do(foo)
> 
> because in this case the code is the action description. Either variant works
> fine for me.

Ok :)

> 
> >
> >>>>> +void kvmppc_watchdog_func(unsigned long data) {
> >>>>> +    struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
> >>>>> +    u32 tsr, new_tsr;
> >>>>> +    int final;
> >>>>> +
> >>>>> +    do {
> >>>>> +        new_tsr = tsr = vcpu->arch.tsr;
> >>>>> +        final = 0;
> >>>>> +
> >>>>> +        /* Time out event */
> >>>>> +        if (tsr & TSR_ENW) {
> >>>>> +            if (tsr & TSR_WIS) {
> >>>>> +                new_tsr = (tsr & ~TCR_WRC_MASK) |
> >>>>> +                      (vcpu->arch.tcr & TCR_WRC_MASK);
> >>>>> +                vcpu->arch.tcr &= ~TCR_WRC_MASK;
> >>>> Can't we just poke the vcpu to exit the VM and do the above on its own?
> >>> We've discussed this before.  TSR updates are done via atomics, and
> >>> we send a request for the vcpu to act on the result.  This is how
> >>> the decrementer works.
> >>>
> >>> http://www.spinics.net/lists/kvm-ppc/msg03169.html
> >> Yeah, the major difference to the dec is the atomicity of the whole
> >> thing. Dec changes one bit to enable the interrupt line. The final
> >> expiration is more complex.
> > Is not setting the TSR.WRS atomic here (cmpxchg() will handle this)?
> 
> Final expiration sets TCR. TSR should be ok.

It is clearing some TCR bits :)

Let us move the TCR clearing to userspace (please see below response ^^). Then it is just setting TSR. Right?

> 
> >
> >>>> This is the watdog expired case, right?
> >>> Final expiration, yes.
> >>>
> >>>> I'd also prefer to have an
> >>>> explicit event for the expiry than a special TSR check in the main loop.
> >>> So check TSR[WRS] in update_timer_ints(), and have it queue a
> >>> pseudoexception?
> >> Or here.
> > Do we mean define a sudo IROPRIO for final expiry.
> 
> We can also define an event that is sent through kvm_make_request. But yeah,
> IRQPRIO is probably easier. Not 100% sure which way is better though. Avi, any
> preferences?
> 
> >
> >>> That would eliminate the need to change the runnable function.
> >>>
> >>>> Also call me sceptic on the reset of tcr. If our user space watchdog
> >>>> event is "write a message", then we essentially want to hide the fact
> >>>> that the watchdog expired from the guest, right? In that case, the
> >>>> second time-out wouldn't do anything guest visible.
> >>> This was probably copied straight out of the hardware documentation,
> >>> which explicitly says TCR[WRC] gets set to zero on final expiration
> >>> (as part of reset).  We should leave that part up to userspace.  It
> >>> definitely shouldn't be done inside the cmpxchg loop (or from
> >>> interrupt context -- only TSR gets the atomic treatment).  I don't
> >>> think the read of TCR outside vcpu context is a problem, though.
> >> Yeah, but it'd just make me less wary if only the vcpu thread itself accesses
> >> vcpu internal registers that aren't irq state and thus designed for it (TSR).
> >>
> >> But yes, the most flexible way would probably be to do it from user space.
> Since
> >> it'd happen from within the vcpu context of user space, we can also guarantee
> >> that the TCR access is atomic.
> > Yes, will move the tcr.wrc clearing to userspace.

^^ Here .. It is good to move clearing the TCR to guest.

Thanks
-Bharat

> >
> >>>>> int kvm_arch_vcpu_runnable(struct kvm_vcpu *v) {
> >>>>> -    return !(v->arch.shared->msr & MSR_WE) ||
> >>>>> -           !!(v->arch.pending_exceptions) ||
> >>>>> -           v->requests;
> >>>>> +    bool ret = !(v->arch.shared->msr & MSR_WE) ||
> >>>>> +           !!(v->arch.pending_exceptions) ||
> >>>>> +           v->requests;
> >>>>> +
> >>>>> +    ret = ret || kvmppc_get_tsr_wrc(v);
> >>>> Why do you need to declare the cpu as non-runnable when a watchdog
> >>>> event occured?
> >>> It's the other way around -- it's always runnable when a watchdog exit
> >>> is pending.  It's like a pending exception.
> >> Ah, so yes, we should just shove it into pending_exceptions then.
> > Pending_exception? You mean sudo again here as said earlier.
> 
> pseudo :). Yeah, I'm referring to above. No need to check 500 different
> conditions when we already have a bitmap that says "event is pending".
> 
> 
> Alex
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html