RE: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Alexander Graf [mailto:agraf@xxxxxxx]
> Sent: Saturday, July 07, 2012 1:21 PM
> To: Wood Scott-B07421
> Cc: Bhushan Bharat-R65777; kvm-ppc@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; Bhushan
> Bharat-R65777
> Subject: Re: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation
> 
> 
> On 07.07.2012, at 01:37, Scott Wood wrote:
> 
> > On 07/06/2012 08:17 AM, Alexander Graf wrote:
> >> On 28.06.2012, at 08:17, Bharat Bhushan wrote:
> >>> +/*
> >>> + * The timer system can almost deal with LONG_MAX timeouts, except that
> >>> + * when you get very close to LONG_MAX, the slack added can cause overflow.
> >>> + *
> >>> + * LONG_MAX/2 is a conservative threshold, but it should be adequate for
> >>> + * any realistic use.
> >>> + */
> >>> +#define MAX_TIMEOUT (LONG_MAX/2)
> >>
> >> Should this really be in kvm code?
> >
> > It looks like we can use NEXT_TIMER_MAX_DELTA for this.
> >
> >>> +	mask = 1ULL << (63 - period);
> >>> +	tb = get_tb();
> >>> +	if (tb & mask)
> >>> +		nr_jiffies += mask;
> >>
> >> To be honest, you lost me here. nr_jiffies is jiffies, right? While
> >> mask is basically in timebase granularity. I suppose you're just
> >> reusing the variable here for no good reason - the compiler will
> >> gladly optimize things for you if you write things a bit more verbose
> >> :).
> >
> > Probably due to the way do_div() works, but yeah, it's confusing.  Maybe
> > something generic like "ticks", "interval", "remaining", etc. would be
> > better, with a comment on the do_div saying it's converting timebase
> > ticks into jiffies.
> 
> Well, you could start off with a variable "delta_tb", then do
> 
>   nr_jiffies = delta_tb;
>   x = do_div(...);
> 
> and things would suddenly become readable :). Of course I don't object to
> comments along the code either :).

Ok

> 
> >
> >>> +static void arm_next_watchdog(struct kvm_vcpu *vcpu)
> >>> +{
> >>> +	unsigned long nr_jiffies;
> >>> +
> >>> +	nr_jiffies = watchdog_next_timeout(vcpu);
> >>> +	if (nr_jiffies < MAX_TIMEOUT)
> >>> +		mod_timer(&vcpu->arch.wdt_timer, jiffies + nr_jiffies);
> >>> +	else
> >>> +		del_timer(&vcpu->arch.wdt_timer);
> >>
> >> Can you del a timer that's not armed? Could that ever happen in this case?
> >
> > "del_timer() deactivates a timer - this works on both active and
> > inactive timers."
> 
> Ah, good :).
> 
> >
> >> Also, could the timer possibly be running somewhere, so do we need
> del_timer_sync? Or don't we need to care?
> >
> > This can be called in the context of the timer, so del_timer_sync()
> > would hang.
> >
> > As for what would happen if a caller from a different context were to
> > race with a timer, I think you could end up with the timer armed based
> > on an old TCR.  del_timer_sync() won't help though, unless you replace
> > mod_timer() with del_timer_sync() plus add_timer() (with a check to see
> > whether it's running in timer context).  A better solution is probably
> > to use a spinlock in arm_next_watchdog().
> 
> Yup. Either way, we have a race that the guest might not expect.

Ok, will use spinlock in arm_next_watchdog().

> 
> >
> >>> +void kvmppc_watchdog_func(unsigned long data)
> >>> +{
> >>> +	struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
> >>> +	u32 tsr, new_tsr;
> >>> +	int final;
> >>> +
> >>> +	do {
> >>> +		new_tsr = tsr = vcpu->arch.tsr;
> >>> +		final = 0;
> >>> +
> >>> +		/* Time out event */
> >>> +		if (tsr & TSR_ENW) {
> >>> +			if (tsr & TSR_WIS) {
> >>> +				new_tsr = (tsr & ~TCR_WRC_MASK) |
> >>> +					  (vcpu->arch.tcr & TCR_WRC_MASK);
> >>> +				vcpu->arch.tcr &= ~TCR_WRC_MASK;
> >>> +				final = 1;
> >>> +			} else {
> >>> +				new_tsr = tsr | TSR_WIS;
> >>> +			}
> >>> +		} else {
> >>> +			new_tsr = tsr | TSR_ENW;
> >>> +		}
> >>> +	} while (cmpxchg(&vcpu->arch.tsr, tsr, new_tsr) != tsr);
> >>> +
> >>> +	if (new_tsr & (TSR_WIS | TCR_WRC_MASK)) {
> >>> +		smp_wmb();
> >>> +		kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
> >>> +		kvm_vcpu_kick(vcpu);
> >>> +	}
> >>> +
> >>> +	/*
> >>> +	 * Avoid getting a storm of timers if the guest sets
> >>> +	 * the period very short.  We'll restart it if anything
> >>> +	 * changes.
> >>> +	 */
> >>> +	if (!final)
> >>> +		arm_next_watchdog(vcpu);
> >>
> >> Mind to explain this part a bit further?
> >
> > The whole function, or some subset near the end?
> >
> > The "if (!final)" check means that we stop running the timer after final
> > expiration, to prevent the host from being flooded with timers if the
> > guest sets a short period but does not have TCR set to exit to QEMU.
> > Timers will resume the next time TSR/TCR is updated.
> 
> Ah. The semantics make sense. The comment however is slightly too short. Please
> explain this in a more verbose way, so someone who didn't write the code knows
> what's going on :).

Ok.

> 
> >
> >>> @@ -1106,7 +1213,14 @@ static int set_sregs_base(struct kvm_vcpu *vcpu,
> >>> 	}
> >>>
> >>> 	if (sregs->u.e.update_special & KVM_SREGS_E_UPDATE_TSR) {
> >>> +		u32 old_tsr = vcpu->arch.tsr;
> >>> +
> >>> 		vcpu->arch.tsr = sregs->u.e.tsr;
> >>> +
> >>> +		if ((old_tsr ^ vcpu->arch.tsr) &
> >>> +		    (TSR_ENW | TSR_WIS | TCR_WRC_MASK))
> >>> +			arm_next_watchdog(vcpu);
> >>
> >> Why isn't this one guarded by vcpu->arch.watchdog_enable?
> >
> > I'm not sure that any of them should be -- there's no reason for the
> > watchdog interrupt mechanism to be dependent on QEMU, only the
> > heavyweight exit on final expiration.
> 
> Well - I like the concept of having new features switchable. Overlapping the
> "watchdog is implemented" feature with "user space wants watchdog exits" makes
> sense. But I definitely want to have a switch for the former, because we
> otherwise differ quite substantially from the emulation we had before.

Ok, will guard with watchdog_enable.

> 
> >
> >>> +			spr_val &= ~TCR_WRC_MASK;
> >>> +		kvmppc_set_tcr(vcpu,
> >>> +		               spr_val | (TCR_WRC_MASK & vcpu->arch.tcr));
> >>
> >> In fact, what you're trying to do here is keep TCR_WRC always on when it was
> enabled once. So all you need is the OR here. No need for the mask above.
> >
> > WRC is a 2-bit field that is supposed to preserve its value once written
> > to be non-zero.  Not that we actually do anything different based on the
> > specific non-zero value, but still we should implement the architected
> > semantics.
> 
> Ah, being 2 bits wide, the above code suddenly makes more sense :). How about
> 
> /* WRC is a 2-bit field that is supposed to preserve its value once written to
> be non-zero */
> spr_val &= ~TCR_WRC_MASK;
> spr_val |= vcpu->arch.tcr & TCR_WRC_MASK;
> kvmppc_set_tcr(vcpu, spr_val);

I think you mean:

if (TCR_WRC_MASK & vcpu->arch.tcr) {
    spr_val &= ~TCR_WRC_MASK;
    spr_val |= vcpu->arch.tcr & TCR_WRC_MASK;
}
kvmppc_set_tcr(vcpu, spr_val);

Thanks
-Bharat

> 
> 
> Alex
> 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux