Re: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation

Alexander Graf <agraf@xxxxxxx> · Mon, 9 Jul 2012 10:49:25 +0200

On 09.07.2012, at 07:13, Bhushan Bharat-R65777 <R65777@xxxxxxxxxxxxx> wrote:

> 
> 
>> -----Original Message-----
>> From: Alexander Graf [mailto:agraf@xxxxxxx]
>> Sent: Saturday, July 07, 2012 1:21 PM
>> To: Wood Scott-B07421
>> Cc: Bhushan Bharat-R65777; kvm-ppc@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; Bhushan Bharat-R65777
>> Subject: Re: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation
>> 
>> 
>> On 07.07.2012, at 01:37, Scott Wood wrote:
>> 
>>> On 07/06/2012 08:17 AM, Alexander Graf wrote:
>>>> On 28.06.2012, at 08:17, Bharat Bhushan wrote:
>>>>> +/*
>>>>> + * The timer system can almost deal with LONG_MAX timeouts, except that
>>>>> + * when you get very close to LONG_MAX, the slack added can cause overflow.
>>>>> + *
>>>>> + * LONG_MAX/2 is a conservative threshold, but it should be adequate for
>>>>> + * any realistic use.
>>>>> + */
>>>>> +#define MAX_TIMEOUT (LONG_MAX/2)
>>>> 
>>>> Should this really be in kvm code?
>>> 
>>> It looks like we can use NEXT_TIMER_MAX_DELTA for this.
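A userspace sketch of that substitution, for illustration only. It assumes NEXT_TIMER_MAX_DELTA is ((1UL << 30) - 1), its definition in include/linux/timer.h around the time of this thread; watchdog_should_arm() is a hypothetical helper, not something from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumption: mirrors the 2012-era definition in include/linux/timer.h. */
#define NEXT_TIMER_MAX_DELTA ((1UL << 30) - 1)

/* The replacement cut-off is no larger than the old MAX_TIMEOUT (LONG_MAX/2). */
static_assert(NEXT_TIMER_MAX_DELTA <= LONG_MAX / 2,
              "new threshold must not be looser than LONG_MAX/2");

/*
 * Hypothetical helper: should a watchdog timeout of nr_jiffies arm the
 * host timer (mod_timer) or disarm it (del_timer)?  Anything at or past
 * NEXT_TIMER_MAX_DELTA is treated as "never fires".
 */
static bool watchdog_should_arm(unsigned long nr_jiffies)
{
	return nr_jiffies < NEXT_TIMER_MAX_DELTA;
}
```

The static_assert only documents that the new cut-off is no looser than the open-coded LONG_MAX/2 on either 32- or 64-bit hosts.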
>>> 
>>>>> +    mask = 1ULL << (63 - period);
>>>>> +    tb = get_tb();
>>>>> +    if (tb & mask)
>>>>> +        nr_jiffies += mask;
>>>> 
>>>> To be honest, you lost me here. nr_jiffies is jiffies, right? While
>>>> mask is basically in timebase granularity. I suppose you're just
>>>> reusing the variable here for no good reason - the compiler will
>>>> gladly optimize things for you if you write things a bit more verbosely
>>>> :).
>>> 
>>> Probably due to the way do_div() works, but yeah, it's confusing.  Maybe
>>> something generic like "ticks", "interval", "remaining", etc. would be
>>> better, with a comment on the do_div saying it's converting timebase
>>> ticks into jiffies.
>> 
>> Well, you could start off with a variable "delta_tb", then do
>> 
>>  nr_jiffies = delta_tb;
>>  x = do_div(...);
>> 
>> and things would suddenly become readable :). Of course I don't object to
>> comments along the code either :).
> 
> Ok
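Roughly what that could look like, as plain userspace C rather than kernel code: do_div() (which divides a u64 in place and returns the remainder) is replaced by / and %, and tb and tb_ticks_per_jiffy are passed in instead of coming from get_tb() and kernel state. It assumes, as the quoted fragment suggests, that the watchdog fires when the timebase bit selected by 1ULL << (63 - period) goes from 0 to 1. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Timebase ticks until the selected TB bit next goes 0 -> 1.
 * period uses IBM bit numbering (bit 0 is the MSB), hence 63 - period.
 */
static uint64_t watchdog_delta_tb(uint64_t tb, unsigned int period)
{
	assert(period <= 63);
	uint64_t mask = 1ULL << (63 - period);

	/* Ticks until the low bits carry into the selected bit. */
	uint64_t delta_tb = mask - (tb & (mask - 1));

	/*
	 * If the bit is already set, that carry clears it (1 -> 0); a full
	 * extra mask ticks are then needed before it is set again.
	 */
	if (tb & mask)
		delta_tb += mask;

	return delta_tb;
}

/* Convert timebase ticks to jiffies, rounding up so we never fire early. */
static uint64_t tb_to_jiffies(uint64_t delta_tb, uint64_t tb_ticks_per_jiffy)
{
	assert(tb_ticks_per_jiffy != 0);
	uint64_t nr_jiffies = delta_tb / tb_ticks_per_jiffy;

	if (delta_tb % tb_ticks_per_jiffy)
		nr_jiffies++;
	return nr_jiffies;
}
```

For example, with period = 60 (mask = 8) and tb = 9, the selected bit is already set, so it has to clear and set again: 7 + 8 = 15 ticks, which rounds up to 4 jiffies at 4 ticks per jiffy.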
> 
>> 
>>> 
>>>>> +static void arm_next_watchdog(struct kvm_vcpu *vcpu)
>>>>> +{
>>>>> +    unsigned long nr_jiffies;
>>>>> +
>>>>> +    nr_jiffies = watchdog_next_timeout(vcpu);
>>>>> +    if (nr_jiffies < MAX_TIMEOUT)
>>>>> +        mod_timer(&vcpu->arch.wdt_timer, jiffies + nr_jiffies);
>>>>> +    else
>>>>> +        del_timer(&vcpu->arch.wdt_timer);
>>>> 
>>>> Can you del a timer that's not armed? Could that ever happen in this case?
>>> 
>>> "del_timer() deactivates a timer - this works on both active and
>>> inactive timers."
>> 
>> Ah, good :).
>> 
>>> 
>>>> Also, could the timer possibly be running somewhere, so do we need
>>>> del_timer_sync? Or don't we need to care?
>>> 
>>> This can be called in the context of the timer, so del_timer_sync()
>>> would hang.
>>> 
>>> As for what would happen if a caller from a different context were to
>>> race with a timer, I think you could end up with the timer armed based
>>> on an old TCR.  del_timer_sync() won't help though, unless you replace
>>> mod_timer() with del_timer_sync() plus add_timer() (with a check to see
>>> whether it's running in timer context).  A better solution is probably
>>> to use a spinlock in arm_next_watchdog().
>> 
>> Yup. Either way, we have a race that the guest might not expect.
> 
> Ok, will use spinlock in arm_next_watchdog().
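One way the spinlock could be placed, sketched in userspace C: a C11 atomic_flag spinlock stands in for a kernel spinlock_t, and plain fields stand in for mod_timer()/del_timer(). The struct and its fields are hypothetical, and the sketch assumes callers update TCR before calling arm_next_watchdog(). The point is that the timeout is read inside the lock, so whichever caller runs last arms the timer from the newest TCR, and nothing in the path waits for the timer callback to finish (unlike del_timer_sync()), so it can still be called from that callback.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NEXT_TIMER_MAX_DELTA ((1UL << 30) - 1)  /* assumed, as in timer.h */

/* Hypothetical userspace stand-in for the relevant vcpu->arch fields. */
struct fake_vcpu {
	atomic_flag wdt_lock;          /* stands in for a kernel spinlock_t */
	unsigned long timeout_jiffies; /* what watchdog_next_timeout() would derive from TCR */
	bool timer_armed;              /* models wdt_timer being pending */
	unsigned long timer_expires;
};

static void wdt_spin_lock(struct fake_vcpu *v)
{
	while (atomic_flag_test_and_set_explicit(&v->wdt_lock, memory_order_acquire))
		; /* spin until the holder releases the flag */
}

static void wdt_spin_unlock(struct fake_vcpu *v)
{
	atomic_flag_clear_explicit(&v->wdt_lock, memory_order_release);
}

/* Read the timeout and update the timer as one critical section. */
static void arm_next_watchdog(struct fake_vcpu *v, unsigned long now)
{
	wdt_spin_lock(v);
	unsigned long nr_jiffies = v->timeout_jiffies;

	if (nr_jiffies < NEXT_TIMER_MAX_DELTA) {
		/* models mod_timer(&wdt_timer, now + nr_jiffies) */
		v->timer_armed = true;
		v->timer_expires = now + nr_jiffies;
	} else {
		/* models del_timer(): fine even if the timer is not pending */
		v->timer_armed = false;
	}
	wdt_spin_unlock(v);
}
```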
> 
>> 
>>> 
>>>>> +void kvmppc_watchdog_func(unsigned long data)
>>>>> +{
>>>>> +    struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
>>>>> +    u32 tsr, new_tsr;
>>>>> +    int final;
>>>>> +
>>>>> +    do {
>>>>> +        new_tsr = tsr = vcpu->arch.tsr;
>>>>> +        final = 0;
>>>>> +
>>>>> +        /* Time out event */
>>>>> +        if (tsr & TSR_ENW) {
>>>>> +            if (tsr & TSR_WIS) {
>>>>> +                new_tsr = (tsr & ~TCR_WRC_MASK) |
>>>>> +                      (vcpu->arch.tcr & TCR_WRC_MASK);
>>>>> +                vcpu->arch.tcr &= ~TCR_WRC_MASK;
>>>>> +                final = 1;
>>>>> +            } else {
>>>>> +                new_tsr = tsr | TSR_WIS;
>>>>> +            }
>>>>> +        } else {
>>>>> +            new_tsr = tsr | TSR_ENW;
>>>>> +        }
>>>>> +    } while (cmpxchg(&vcpu->arch.tsr, tsr, new_tsr) != tsr);
>>>>> +
>>>>> +    if (new_tsr & (TSR_WIS | TCR_WRC_MASK)) {
>>>>> +        smp_wmb();
>>>>> +        kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
>>>>> +        kvm_vcpu_kick(vcpu);
>>>>> +    }
>>>>> +
>>>>> +    /*
>>>>> +     * Avoid getting a storm of timers if the guest sets
>>>>> +     * the period very short.  We'll restart it if anything
>>>>> +     * changes.
>>>>> +     */
>>>>> +    if (!final)
>>>>> +        arm_next_watchdog(vcpu);
>>>> 
>>>> Mind explaining this part a bit further?
>>> 
>>> The whole function, or some subset near the end?
>>> 
>>> The "if (!final)" check means that we stop running the timer after final
>>> expiration, to prevent the host from being flooded with timers if the
>>> guest sets a short period but does not have TCR set to exit to QEMU.
>>> Timers will resume the next time TSR/TCR is updated.
>> 
>> Ah. The semantics make sense. The comment, however, is slightly too short. Please
>> explain this in a more verbose way, so someone who didn't write the code knows
>> what's going on :).
> 
> Ok.
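A userspace sketch of the TSR state machine the callback implements, using C11 atomic_compare_exchange_weak() in place of cmpxchg(). Bit values are assumed to match asm/reg_booke.h, and struct wdt_state / watchdog_expire() are hypothetical. One deliberate difference from the quoted patch: TCR[WRC] is cleared only after the compare-and-swap succeeds, because clearing it inside the loop means a retried iteration would latch a WRC of zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values assumed to match arch/powerpc/include/asm/reg_booke.h. */
#define TSR_ENW       0x80000000u
#define TSR_WIS       0x40000000u
#define TCR_WRC_MASK  0x30000000u
#define TSR_WRS_MASK  0x30000000u  /* TSR[WRS] sits at the same bits as TCR[WRC] */

/* Hypothetical stand-in for the vcpu fields the callback touches. */
struct wdt_state {
	_Atomic uint32_t tsr;
	uint32_t tcr;
};

/*
 * One watchdog expiry: ENW clear -> set ENW; ENW set -> set WIS;
 * both set -> final expiry, latching TCR[WRC] into TSR[WRS].
 * Returns true on final expiry, when the caller should stop re-arming.
 */
static bool watchdog_expire(struct wdt_state *s)
{
	uint32_t tsr = atomic_load(&s->tsr);
	uint32_t new_tsr;
	bool final;

	do {
		final = false;
		if (!(tsr & TSR_ENW)) {
			new_tsr = tsr | TSR_ENW;
		} else if (!(tsr & TSR_WIS)) {
			new_tsr = tsr | TSR_WIS;
		} else {
			new_tsr = (tsr & ~TSR_WRS_MASK) | (s->tcr & TCR_WRC_MASK);
			final = true;
		}
		/* On failure, tsr is reloaded with the current value and we retry. */
	} while (!atomic_compare_exchange_weak(&s->tsr, &tsr, new_tsr));

	/* Only touch TCR once the new TSR is committed. */
	if (final)
		s->tcr &= ~TCR_WRC_MASK;
	return final;
}
```

The return value is what drives the "if (!final)" check: once it reports final expiry, the caller stops re-arming the host timer until TSR or TCR is written again.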
> 
>> 
>>> 
>>>>> @@ -1106,7 +1213,14 @@ static int set_sregs_base(struct kvm_vcpu *vcpu,
>>>>>    }
>>>>> 
>>>>>    if (sregs->u.e.update_special & KVM_SREGS_E_UPDATE_TSR) {
>>>>> +        u32 old_tsr = vcpu->arch.tsr;
>>>>> +
>>>>>        vcpu->arch.tsr = sregs->u.e.tsr;
>>>>> +
>>>>> +        if ((old_tsr ^ vcpu->arch.tsr) &
>>>>> +            (TSR_ENW | TSR_WIS | TCR_WRC_MASK))
>>>>> +            arm_next_watchdog(vcpu);
>>>> 
>>>> Why isn't this one guarded by vcpu->arch.watchdog_enable?
>>> 
>>> I'm not sure that any of them should be -- there's no reason for the
>>> watchdog interrupt mechanism to be dependent on QEMU, only the
>>> heavyweight exit on final expiration.
>> 
>> Well - I like the concept of having new features switchable. Overlapping the
>> "watchdog is implemented" feature with "user space wants watchdog exits" makes
>> sense. But I definitely want to have a switch for the former, because we
>> otherwise differ quite substantially from the emulation we had before.
> 
> Ok, will guard with watchdog_enable.
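A sketch of the guarded TSR update, again as userspace C with hypothetical names; arm_next_watchdog() here is only a counter standing in for the real function. It uses TSR_WRS_MASK where the patch writes TCR_WRC_MASK; both are assumed to cover the same two bits, as in asm/reg_booke.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TSR_ENW       0x80000000u  /* assumed, as in asm/reg_booke.h */
#define TSR_WIS       0x40000000u
#define TSR_WRS_MASK  0x30000000u

/* Hypothetical stand-in for the vcpu state touched by set_sregs_base(). */
struct sregs_vcpu {
	uint32_t tsr;
	bool watchdog_enable;   /* set only when userspace opts in */
	int rearm_count;        /* counts calls to the stand-in below */
};

/* Stand-in for the real arm_next_watchdog(): just records the call. */
static void arm_next_watchdog(struct sregs_vcpu *v)
{
	v->rearm_count++;
}

/* Re-arm only if the feature is enabled and a watchdog-related bit changed. */
static void set_tsr(struct sregs_vcpu *v, uint32_t new_tsr)
{
	uint32_t old_tsr = v->tsr;

	v->tsr = new_tsr;
	if (v->watchdog_enable &&
	    ((old_tsr ^ new_tsr) & (TSR_ENW | TSR_WIS | TSR_WRS_MASK)))
		arm_next_watchdog(v);
}
```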
> 
>> 
>>> 
>>>>> +            spr_val &= ~TCR_WRC_MASK;
>>>>> +        kvmppc_set_tcr(vcpu,
>>>>> +                       spr_val | (TCR_WRC_MASK & vcpu->arch.tcr));
>>>> 
>>>> In fact, what you're trying to do here is keep TCR_WRC always on when it was
>>>> enabled once. So all you need is the OR here. No need for the mask above.
>>> 
>>> WRC is a 2-bit field that is supposed to preserve its value once written
>>> to be non-zero.  Not that we actually do anything different based on the
>>> specific non-zero value, but still we should implement the architected
>>> semantics.
>> 
>> Ah, being 2 bits wide, the above code suddenly makes more sense :). How about
>> 
>> /*
>>  * WRC is a 2-bit field that is supposed to preserve its value once
>>  * written to be non-zero.
>>  */
>> spr_val &= ~TCR_WRC_MASK;
>> spr_val |= vcpu->arch.tcr & TCR_WRC_MASK;
>> kvmppc_set_tcr(vcpu, spr_val);
> 
> I think you mean:
> 
> if (TCR_WRC_MASK & vcpu->arch.tcr) {
>    spr_val &= ~TCR_WRC_MASK;
>    spr_val |= vcpu->arch.tcr & TCR_WRC_MASK;
> }
> kvmppc_set_tcr(vcpu, spr_val);

Eh, yes, of course :). Plus the comment.
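For reference, the agreed update as a stand-alone helper (userspace C; tcr_write_value() is a hypothetical name and the mask value is assumed from asm/reg_booke.h). The if matters: without it, the unconditional mask-and-OR would always copy back a zero WRC, so the guest could never set the field in the first place.

```c
#include <assert.h>
#include <stdint.h>

#define TCR_WRC_MASK 0x30000000u  /* assumed, as in asm/reg_booke.h */

/*
 * Hypothetical helper computing the TCR value to install on a guest mtspr.
 * WRC is a 2-bit field that is supposed to preserve its value once written
 * to be non-zero: while it is still zero the guest may set it; afterwards
 * the guest's WRC bits are ignored and the old value is kept.
 */
static uint32_t tcr_write_value(uint32_t old_tcr, uint32_t spr_val)
{
	if (old_tcr & TCR_WRC_MASK) {
		spr_val &= ~TCR_WRC_MASK;
		spr_val |= old_tcr & TCR_WRC_MASK;
	}
	return spr_val;
}
```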

Alex

> 
> Thanks
> -Bharat
> 
>> 
>> 
>> Alex
>> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html