Re: [PATCH 2/2 v2] KVM: PPC: booke: Add watchdog emulation

Alexander Graf <agraf@xxxxxxx> · Tue, 17 Jul 2012 09:20:20 +0200

On 17.07.2012, at 03:02, Scott Wood <scottwood@xxxxxxxxxxxxx> wrote:

> On 07/16/2012 12:18 PM, Alexander Graf wrote:
>>> +/*
>>> + * Return the number of jiffies until the next timeout.  If the
>> timeout is
>>> + * longer than the NEXT_TIMER_MAX_DELTA, that
>> 
>> then?
>> 
>>> return NEXT_TIMER_MAX_DELTA
>>> + * instead.
>> 
>> I can read code.
> 
> Come on, it's not exactly x++; /* add one to x */
> 
> It's faster to read code (as well as know the constraints within which
> you can modify it without having to spend a lot of time digesting all
> the callers' use cases) when you have a high level description of its
> interface contract, and can be selective about when to zoom in to the
> details.  Linux kernel code tends to be bad about this.

Yeah, not opposed to leave that part in :).

> 
>> The important piece of information in the comment is
>> missing: The reason.
> 
> The reason for what?  Why you want to know the next timeout?  That's the
> caller's business.  Or why we use NEXT_TIMER_MAX_DELTA as the limit?

Why we use the limit. IIRC it was explained in the last thread, just didn't make its way into the comment.

> 
>>> +void kvmppc_watchdog_func(unsigned long data)
>>> +{
>>> +    struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
>>> +    u32 tsr, new_tsr;
>>> +    int final;
>>> +
>>> +    do {
>>> +        new_tsr = tsr = vcpu->arch.tsr;
>>> +        final = 0;
>>> +
>>> +        /* Time out event */
>>> +        if (tsr & TSR_ENW) {
>>> +            if (tsr & TSR_WIS) {
>>> +                new_tsr = (tsr & ~TCR_WRC_MASK) |
>>> +                      (vcpu->arch.tcr & TCR_WRC_MASK);
>>> +                vcpu->arch.tcr &= ~TCR_WRC_MASK;
>> 
>> Can't we just poke the vcpu to exit the VM and do the above on its own?
> 
> We've discussed this before.  TSR updates are done via atomics, and we
> send a request for the vcpu to act on the result.  This is how the
> decrementer works.
> 
> http://www.spinics.net/lists/kvm-ppc/msg03169.html

Yeah, the major difference to the dec is the atomicity of the whole thing. Dec changes one bit to enable the interrupt line. The final expiration is more complex.

> 
>> This is the watdog expired case, right?
> 
> Final expiration, yes.
> 
>> I'd also prefer to have an
>> explicit event for the expiry than a special TSR check in the main loop.
> 
> So check TSR[WRS] in update_timer_ints(), and have it queue a
> pseudoexception?  

Or here.

> That would eliminate the need to change the runnable
> function.
>    
>> Also call me sceptic on the reset of tcr. If our user space watchdog
>> event is "write a message", then we essentially want to hide the fact
>> that the watchdog expired from the guest, right? In that case, the
>> second time-out wouldn't do anything guest visible.
> 
> This was probably copied straight out of the hardware documentation,
> which explicitly says TCR[WRC] gets set to zero on final expiration (as
> part of reset).  We should leave that part up to userspace.  It
> definitely shouldn't be done inside the cmpxchg loop (or from interrupt
> context -- only TSR gets the atomic treatment).  I don't think the read
> of TCR outside vcpu context is a problem, though.

Yeah, but it'd just make me less wary if only the vcpu thread itself accesses vcpu internal registers that aren't irq state and thus designed for it (TSR).

But yes, the most flexible way would probably be to do it from user space. Since it'd happen from within the vcpu context of user space, we can also guarantee that the TCR access is atomic.

> 
>>> int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>>> {
>>> -    return !(v->arch.shared->msr & MSR_WE) ||
>>> -           !!(v->arch.pending_exceptions) ||
>>> -           v->requests;
>>> +    bool ret = !(v->arch.shared->msr & MSR_WE) ||
>>> +           !!(v->arch.pending_exceptions) ||
>>> +           v->requests;
>>> +
>>> +    ret = ret || kvmppc_get_tsr_wrc(v);
>> 
>> Why do you need to declare the cpu as non-runnable when a watchdog event
>> occured?
> 
> It's the other way around -- it's always runnable when a watchdog exit
> is pending.  It's like a pending exception.

Ah, so yes, we should just shove it into pending_exceptions then.

> 
>>> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
>>> index 2ce09aa..b9fdb52 100644
>>> --- a/include/linux/kvm.h
>>> +++ b/include/linux/kvm.h
>>> @@ -163,6 +163,7 @@ struct kvm_pit_config {
>>> #define KVM_EXIT_OSI              18
>>> #define KVM_EXIT_PAPR_HCALL      19
>>> #define KVM_EXIT_S390_UCONTROL      20
>>> +#define KVM_EXIT_WDT              21
>> 
>> Please make this a more generic KVM_EXIT_WATCHDOG so that other archs
>> may have the chance to reuse it.
> 
> WDT is generic (85 of 115 files in drivers/watchdog/ contain "wdt" in
> their names), just overly abbreviated.  KVM_EXIT_WATCHDOG is more readable.

Ah, didn't know that it's a commonly used abbreviation. It's not like we're constrained in our line length for exit code handling usually though, so readable still wins for me :)

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html