Re: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation

Scott Wood <scottwood@xxxxxxxxxxxxx> · Fri, 6 Jul 2012 18:37:55 -0500

On 07/06/2012 08:17 AM, Alexander Graf wrote:
> On 28.06.2012, at 08:17, Bharat Bhushan wrote:
>> +/*
>> + * The timer system can almost deal with LONG_MAX timeouts, except that
>> + * when you get very close to LONG_MAX, the slack added can cause overflow.
>> + *
>> + * LONG_MAX/2 is a conservative threshold, but it should be adequate for
>> + * any realistic use.
>> + */
>> +#define MAX_TIMEOUT (LONG_MAX/2)
> 
> Should this really be in kvm code?

It looks like we can use NEXT_TIMER_MAX_DELTA for this.

>> +	mask = 1ULL << (63 - period);
>> +	tb = get_tb();
>> +	if (tb & mask)
>> +		nr_jiffies += mask;
> 
> To be honest, you lost me here. nr_jiffies is jiffies, right? While
> mask is basically in timebase granularity. I suppose you're just
> reusing the variable here for no good reason - the compiler will
> gladly optimize things for you if you write things a bit more verbose
> :).

Probably due to the way do_div() works, but yeah, it's confusing.  Maybe
something generic like "ticks", "interval", "remaining", etc. would be
better, with a comment on the do_div saying it's converting timebase
ticks into jiffies.

>> +static void arm_next_watchdog(struct kvm_vcpu *vcpu)
>> +{
>> +	unsigned long nr_jiffies;
>> +
>> +	nr_jiffies = watchdog_next_timeout(vcpu);
>> +	if (nr_jiffies < MAX_TIMEOUT)
>> +		mod_timer(&vcpu->arch.wdt_timer, jiffies + nr_jiffies);
>> +	else
>> +		del_timer(&vcpu->arch.wdt_timer);
> 
> Can you del a timer that's not armed? Could that ever happen in this case?

"del_timer() deactivates a timer - this works on both active and
inactive timers."

> Also, could the timer possibly be running somewhere, so do we need del_timer_sync? Or don't we need to care?

This can be called in the context of the timer, so del_timer_sync()
would hang.

As for what would happen if a caller from a different context were to
race with a timer, I think you could end up with the timer armed based
on an old TCR.  del_timer_sync() won't help though, unless you replace
mod_timer() with del_timer_sync() plus add_timer() (with a check to see
whether it's running in timer context).  A better solution is probably
to use a spinlock in arm_next_watchdog().

>> +void kvmppc_watchdog_func(unsigned long data)
>> +{
>> +	struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
>> +	u32 tsr, new_tsr;
>> +	int final;
>> +
>> +	do {
>> +		new_tsr = tsr = vcpu->arch.tsr;
>> +		final = 0;
>> +
>> +		/* Time out event */
>> +		if (tsr & TSR_ENW) {
>> +			if (tsr & TSR_WIS) {
>> +				new_tsr = (tsr & ~TCR_WRC_MASK) |
>> +					  (vcpu->arch.tcr & TCR_WRC_MASK);
>> +				vcpu->arch.tcr &= ~TCR_WRC_MASK;
>> +				final = 1;
>> +			} else {
>> +				new_tsr = tsr | TSR_WIS;
>> +			}
>> +		} else {
>> +			new_tsr = tsr | TSR_ENW;
>> +		}
>> +	} while (cmpxchg(&vcpu->arch.tsr, tsr, new_tsr) != tsr);
>> +
>> +	if (new_tsr & (TSR_WIS | TCR_WRC_MASK)) {
>> +		smp_wmb();
>> +		kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
>> +		kvm_vcpu_kick(vcpu);
>> +	}
>> +
>> +	/*
>> +	 * Avoid getting a storm of timers if the guest sets
>> +	 * the period very short.  We'll restart it if anything
>> +	 * changes.
>> +	 */
>> +	if (!final)
>> +		arm_next_watchdog(vcpu);
> 
> Mind to explain this part a bit further?

The whole function, or some subset near the end?

The "if (!final)" check means that we stop running the timer after final
expiration, to prevent the host from being flooded with timers if the
guest sets a short period but does not have TCR set to exit to QEMU.
Timers will resume the next time TSR/TCR is updated.

>> @@ -1106,7 +1213,14 @@ static int set_sregs_base(struct kvm_vcpu *vcpu,
>> 	}
>>
>> 	if (sregs->u.e.update_special & KVM_SREGS_E_UPDATE_TSR) {
>> +		u32 old_tsr = vcpu->arch.tsr;
>> +
>> 		vcpu->arch.tsr = sregs->u.e.tsr;
>> +
>> +		if ((old_tsr ^ vcpu->arch.tsr) &
>> +		    (TSR_ENW | TSR_WIS | TCR_WRC_MASK))
>> +			arm_next_watchdog(vcpu);
> 
> Why isn't this one guarded by vcpu->arch.watchdog_enable?

I'm not sure that any of them should be -- there's no reason for the
watchdog interrupt mechanism to be dependent on QEMU, only the
heavyweight exit on final expiration.

>> +			spr_val &= ~TCR_WRC_MASK;
>> +		kvmppc_set_tcr(vcpu,
>> +		               spr_val | (TCR_WRC_MASK & vcpu->arch.tcr));
> 
> In fact, what you're trying to do here is keep TCR_WRC always on when it was enabled once. So all you need is the OR here. No need for the mask above.

WRC is a 2-bit field that is supposed to preserve its value once written
to be non-zero.  Not that we actually do anything different based on the
specific non-zero value, but still we should implement the architected
semantics.

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html