On 07.07.2012, at 01:37, Scott Wood wrote: > On 07/06/2012 08:17 AM, Alexander Graf wrote: >> On 28.06.2012, at 08:17, Bharat Bhushan wrote: >>> +/* >>> + * The timer system can almost deal with LONG_MAX timeouts, except that >>> + * when you get very close to LONG_MAX, the slack added can cause overflow. >>> + * >>> + * LONG_MAX/2 is a conservative threshold, but it should be adequate for >>> + * any realistic use. >>> + */ >>> +#define MAX_TIMEOUT (LONG_MAX/2) >> >> Should this really be in kvm code? > > It looks like we can use NEXT_TIMER_MAX_DELTA for this. > >>> + mask = 1ULL << (63 - period); >>> + tb = get_tb(); >>> + if (tb & mask) >>> + nr_jiffies += mask; >> >> To be honest, you lost me here. nr_jiffies is jiffies, right? While >> mask is basically in timebase granularity. I suppose you're just >> reusing the variable here for no good reason - the compiler will >> gladly optimize things for you if you write things a bit more verbose >> :). > > Probably due to the way do_div() works, but yeah, it's confusing. Maybe > something generic like "ticks", "interval", "remaining", etc. would be > better, with a comment on the do_div saying it's converting timebase > ticks into jiffies. Well, you could start off with a variable "delta_tb", then do nr_jiffies = delta_tb; x = do_div(...); and things would suddenly become readable :). Of course I don't object to comments along the code either :). > >>> +static void arm_next_watchdog(struct kvm_vcpu *vcpu) >>> +{ >>> + unsigned long nr_jiffies; >>> + >>> + nr_jiffies = watchdog_next_timeout(vcpu); >>> + if (nr_jiffies < MAX_TIMEOUT) >>> + mod_timer(&vcpu->arch.wdt_timer, jiffies + nr_jiffies); >>> + else >>> + del_timer(&vcpu->arch.wdt_timer); >> >> Can you del a timer that's not armed? Could that ever happen in this case? > > "del_timer() deactivates a timer - this works on both active and > inactive timers." Ah, good :). > >> Also, could the timer possibly be running somewhere, so do we need del_timer_sync? Or don't we need to care? > > This can be called in the context of the timer, so del_timer_sync() > would hang. > > As for what would happen if a caller from a different context were to > race with a timer, I think you could end up with the timer armed based > on an old TCR. del_timer_sync() won't help though, unless you replace > mod_timer() with del_timer_sync() plus add_timer() (with a check to see > whether it's running in timer context). A better solution is probably > to use a spinlock in arm_next_watchdog(). Yup. Either way, we have a race that the guest might not expect. > >>> +void kvmppc_watchdog_func(unsigned long data) >>> +{ >>> + struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data; >>> + u32 tsr, new_tsr; >>> + int final; >>> + >>> + do { >>> + new_tsr = tsr = vcpu->arch.tsr; >>> + final = 0; >>> + >>> + /* Time out event */ >>> + if (tsr & TSR_ENW) { >>> + if (tsr & TSR_WIS) { >>> + new_tsr = (tsr & ~TCR_WRC_MASK) | >>> + (vcpu->arch.tcr & TCR_WRC_MASK); >>> + vcpu->arch.tcr &= ~TCR_WRC_MASK; >>> + final = 1; >>> + } else { >>> + new_tsr = tsr | TSR_WIS; >>> + } >>> + } else { >>> + new_tsr = tsr | TSR_ENW; >>> + } >>> + } while (cmpxchg(&vcpu->arch.tsr, tsr, new_tsr) != tsr); >>> + >>> + if (new_tsr & (TSR_WIS | TCR_WRC_MASK)) { >>> + smp_wmb(); >>> + kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu); >>> + kvm_vcpu_kick(vcpu); >>> + } >>> + >>> + /* >>> + * Avoid getting a storm of timers if the guest sets >>> + * the period very short. We'll restart it if anything >>> + * changes. >>> + */ >>> + if (!final) >>> + arm_next_watchdog(vcpu); >> >> Mind to explain this part a bit further? > > The whole function, or some subset near the end? > > The "if (!final)" check means that we stop running the timer after final > expiration, to prevent the host from being flooded with timers if the > guest sets a short period but does not have TCR set to exit to QEMU. > Timers will resume the next time TSR/TCR is updated. Ah. The semantics make sense. The comment however is slightly too short. Please explain this in a more verbose way, so someone who didn't write the code knows what's going on :). > >>> @@ -1106,7 +1213,14 @@ static int set_sregs_base(struct kvm_vcpu *vcpu, >>> } >>> >>> if (sregs->u.e.update_special & KVM_SREGS_E_UPDATE_TSR) { >>> + u32 old_tsr = vcpu->arch.tsr; >>> + >>> vcpu->arch.tsr = sregs->u.e.tsr; >>> + >>> + if ((old_tsr ^ vcpu->arch.tsr) & >>> + (TSR_ENW | TSR_WIS | TCR_WRC_MASK)) >>> + arm_next_watchdog(vcpu); >> >> Why isn't this one guarded by vcpu->arch.watchdog_enable? > > I'm not sure that any of them should be -- there's no reason for the > watchdog interrupt mechanism to be dependent on QEMU, only the > heavyweight exit on final expiration. Well - I like the concept of having new features switchable. Overlapping the "watchdog is implemented" feature with "user space wants watchdog exits" makes sense. But I definitely want to have a switch for the former, because we otherwise differ quite substantially from the emulation we had before. > >>> + spr_val &= ~TCR_WRC_MASK; >>> + kvmppc_set_tcr(vcpu, >>> + spr_val | (TCR_WRC_MASK & vcpu->arch.tcr)); >> >> In fact, what you're trying to do here is keep TCR_WRC always on when it was enabled once. So all you need is the OR here. No need for the mask above. > > WRC is a 2-bit field that is supposed to preserve its value once written > to be non-zero. Not that we actually do anything different based on the > specific non-zero value, but still we should implement the architected > semantics. Ah, being 2 bits wide, the above code suddenly makes more sense :). How about /* WRC is a 2-bit field that is supposed to preserve its value once written to be non-zero */ spr_val &= ~TCR_WRC_MASK; spr_val |= vcpu->arch.tcr & TCR_WRC_MASK; kvmppc_set_tcr(vcpu, spr_val); Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html