On 07/06/2012 08:17 AM, Alexander Graf wrote: > On 28.06.2012, at 08:17, Bharat Bhushan wrote: >> +/* >> + * The timer system can almost deal with LONG_MAX timeouts, except that >> + * when you get very close to LONG_MAX, the slack added can cause overflow. >> + * >> + * LONG_MAX/2 is a conservative threshold, but it should be adequate for >> + * any realistic use. >> + */ >> +#define MAX_TIMEOUT (LONG_MAX/2) > > Should this really be in kvm code? It looks like we can use NEXT_TIMER_MAX_DELTA for this. >> + mask = 1ULL << (63 - period); >> + tb = get_tb(); >> + if (tb & mask) >> + nr_jiffies += mask; > > To be honest, you lost me here. nr_jiffies is jiffies, right? While > mask is basically in timebase granularity. I suppose you're just > reusing the variable here for no good reason - the compiler will > gladly optimize things for you if you write things a bit more verbose > :). Probably due to the way do_div() works, but yeah, it's confusing. Maybe something generic like "ticks", "interval", "remaining", etc. would be better, with a comment on the do_div saying it's converting timebase ticks into jiffies. >> +static void arm_next_watchdog(struct kvm_vcpu *vcpu) >> +{ >> + unsigned long nr_jiffies; >> + >> + nr_jiffies = watchdog_next_timeout(vcpu); >> + if (nr_jiffies < MAX_TIMEOUT) >> + mod_timer(&vcpu->arch.wdt_timer, jiffies + nr_jiffies); >> + else >> + del_timer(&vcpu->arch.wdt_timer); > > Can you del a timer that's not armed? Could that ever happen in this case? "del_timer() deactivates a timer - this works on both active and inactive timers." > Also, could the timer possibly be running somewhere, so do we need del_timer_sync? Or don't we need to care? This can be called in the context of the timer, so del_timer_sync() would hang. As for what would happen if a caller from a different context were to race with a timer, I think you could end up with the timer armed based on an old TCR. del_timer_sync() won't help though, unless you replace mod_timer() with del_timer_sync() plus add_timer() (with a check to see whether it's running in timer context). A better solution is probably to use a spinlock in arm_next_watchdog(). >> +void kvmppc_watchdog_func(unsigned long data) >> +{ >> + struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data; >> + u32 tsr, new_tsr; >> + int final; >> + >> + do { >> + new_tsr = tsr = vcpu->arch.tsr; >> + final = 0; >> + >> + /* Time out event */ >> + if (tsr & TSR_ENW) { >> + if (tsr & TSR_WIS) { >> + new_tsr = (tsr & ~TCR_WRC_MASK) | >> + (vcpu->arch.tcr & TCR_WRC_MASK); >> + vcpu->arch.tcr &= ~TCR_WRC_MASK; >> + final = 1; >> + } else { >> + new_tsr = tsr | TSR_WIS; >> + } >> + } else { >> + new_tsr = tsr | TSR_ENW; >> + } >> + } while (cmpxchg(&vcpu->arch.tsr, tsr, new_tsr) != tsr); >> + >> + if (new_tsr & (TSR_WIS | TCR_WRC_MASK)) { >> + smp_wmb(); >> + kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu); >> + kvm_vcpu_kick(vcpu); >> + } >> + >> + /* >> + * Avoid getting a storm of timers if the guest sets >> + * the period very short. We'll restart it if anything >> + * changes. >> + */ >> + if (!final) >> + arm_next_watchdog(vcpu); > > Mind to explain this part a bit further? The whole function, or some subset near the end? The "if (!final)" check means that we stop running the timer after final expiration, to prevent the host from being flooded with timers if the guest sets a short period but does not have TCR set to exit to QEMU. Timers will resume the next time TSR/TCR is updated. >> @@ -1106,7 +1213,14 @@ static int set_sregs_base(struct kvm_vcpu *vcpu, >> } >> >> if (sregs->u.e.update_special & KVM_SREGS_E_UPDATE_TSR) { >> + u32 old_tsr = vcpu->arch.tsr; >> + >> vcpu->arch.tsr = sregs->u.e.tsr; >> + >> + if ((old_tsr ^ vcpu->arch.tsr) & >> + (TSR_ENW | TSR_WIS | TCR_WRC_MASK)) >> + arm_next_watchdog(vcpu); > > Why isn't this one guarded by vcpu->arch.watchdog_enable? I'm not sure that any of them should be -- there's no reason for the watchdog interrupt mechanism to be dependent on QEMU, only the heavyweight exit on final expiration. >> + spr_val &= ~TCR_WRC_MASK; >> + kvmppc_set_tcr(vcpu, >> + spr_val | (TCR_WRC_MASK & vcpu->arch.tcr)); > > In fact, what you're trying to do here is keep TCR_WRC always on when it was enabled once. So all you need is the OR here. No need for the mask above. WRC is a 2-bit field that is supposed to preserve its value once written to be non-zero. Not that we actually do anything different based on the specific non-zero value, but still we should implement the architected semantics. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html