On 09.07.2012, at 19:09, Scott Wood wrote: > On 07/07/2012 02:50 AM, Alexander Graf wrote: >> >> On 07.07.2012, at 01:37, Scott Wood wrote: >> >>> On 07/06/2012 08:17 AM, Alexander Graf wrote: >>>> On 28.06.2012, at 08:17, Bharat Bhushan wrote: >>>>> +/* >>>>> + * The timer system can almost deal with LONG_MAX timeouts, except that >>>>> + * when you get very close to LONG_MAX, the slack added can cause overflow. >>>>> + * >>>>> + * LONG_MAX/2 is a conservative threshold, but it should be adequate for >>>>> + * any realistic use. >>>>> + */ >>>>> +#define MAX_TIMEOUT (LONG_MAX/2) >>>> >>>> Should this really be in kvm code? >>> >>> It looks like we can use NEXT_TIMER_MAX_DELTA for this. >>> >>>>> + mask = 1ULL << (63 - period); >>>>> + tb = get_tb(); >>>>> + if (tb & mask) >>>>> + nr_jiffies += mask; >>>> >>>> To be honest, you lost me here. nr_jiffies is jiffies, right? While >>>> mask is basically in timebase granularity. I suppose you're just >>>> reusing the variable here for no good reason - the compiler will >>>> gladly optimize things for you if you write things a bit more verbose >>>> :). >>> >>> Probably due to the way do_div() works, but yeah, it's confusing. Maybe >>> something generic like "ticks", "interval", "remaining", etc. would be >>> better, with a comment on the do_div saying it's converting timebase >>> ticks into jiffies. >> >> Well, you could start off with a variable "delta_tb", then do >> >> nr_jiffies = delta_tb; >> x = do_div(...); >> >> and things would suddenly become readable :). Of course I don't object to comments along the code either :). >> >>> >>>>> +static void arm_next_watchdog(struct kvm_vcpu *vcpu) >>>>> +{ >>>>> + unsigned long nr_jiffies; >>>>> + >>>>> + nr_jiffies = watchdog_next_timeout(vcpu); >>>>> + if (nr_jiffies < MAX_TIMEOUT) >>>>> + mod_timer(&vcpu->arch.wdt_timer, jiffies + nr_jiffies); >>>>> + else >>>>> + del_timer(&vcpu->arch.wdt_timer); >>>> >>>> Can you del a timer that's not armed? Could that ever happen in this case? >>> >>> "del_timer() deactivates a timer - this works on both active and >>> inactive timers." >> >> Ah, good :). >> >>> >>>> Also, could the timer possibly be running somewhere, so do we need del_timer_sync? Or don't we need to care? >>> >>> This can be called in the context of the timer, so del_timer_sync() >>> would hang. >>> >>> As for what would happen if a caller from a different context were to >>> race with a timer, I think you could end up with the timer armed based >>> on an old TCR. del_timer_sync() won't help though, unless you replace >>> mod_timer() with del_timer_sync() plus add_timer() (with a check to see >>> whether it's running in timer context). A better solution is probably >>> to use a spinlock in arm_next_watchdog(). >> >> Yup. Either way, we have a race that the guest might not expect. >> >>> >>>>> +void kvmppc_watchdog_func(unsigned long data) >>>>> +{ >>>>> + struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data; >>>>> + u32 tsr, new_tsr; >>>>> + int final; >>>>> + >>>>> + do { >>>>> + new_tsr = tsr = vcpu->arch.tsr; >>>>> + final = 0; >>>>> + >>>>> + /* Time out event */ >>>>> + if (tsr & TSR_ENW) { >>>>> + if (tsr & TSR_WIS) { >>>>> + new_tsr = (tsr & ~TCR_WRC_MASK) | >>>>> + (vcpu->arch.tcr & TCR_WRC_MASK); >>>>> + vcpu->arch.tcr &= ~TCR_WRC_MASK; >>>>> + final = 1; >>>>> + } else { >>>>> + new_tsr = tsr | TSR_WIS; >>>>> + } >>>>> + } else { >>>>> + new_tsr = tsr | TSR_ENW; >>>>> + } >>>>> + } while (cmpxchg(&vcpu->arch.tsr, tsr, new_tsr) != tsr); >>>>> + >>>>> + if (new_tsr & (TSR_WIS | TCR_WRC_MASK)) { >>>>> + smp_wmb(); >>>>> + kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu); >>>>> + kvm_vcpu_kick(vcpu); >>>>> + } >>>>> + >>>>> + /* >>>>> + * Avoid getting a storm of timers if the guest sets >>>>> + * the period very short. We'll restart it if anything >>>>> + * changes. >>>>> + */ >>>>> + if (!final) >>>>> + arm_next_watchdog(vcpu); >>>> >>>> Mind to explain this part a bit further? >>> >>> The whole function, or some subset near the end? >>> >>> The "if (!final)" check means that we stop running the timer after final >>> expiration, to prevent the host from being flooded with timers if the >>> guest sets a short period but does not have TCR set to exit to QEMU. >>> Timers will resume the next time TSR/TCR is updated. >> >> Ah. The semantics make sense. The comment however is slightly too short. Please explain this in a more verbose way, so someone who didn't write the code knows what's going on :). >> >>> >>>>> @@ -1106,7 +1213,14 @@ static int set_sregs_base(struct kvm_vcpu *vcpu, >>>>> } >>>>> >>>>> if (sregs->u.e.update_special & KVM_SREGS_E_UPDATE_TSR) { >>>>> + u32 old_tsr = vcpu->arch.tsr; >>>>> + >>>>> vcpu->arch.tsr = sregs->u.e.tsr; >>>>> + >>>>> + if ((old_tsr ^ vcpu->arch.tsr) & >>>>> + (TSR_ENW | TSR_WIS | TCR_WRC_MASK)) >>>>> + arm_next_watchdog(vcpu); >>>> >>>> Why isn't this one guarded by vcpu->arch.watchdog_enable? >>> >>> I'm not sure that any of them should be -- there's no reason for the >>> watchdog interrupt mechanism to be dependent on QEMU, only the >>> heavyweight exit on final expiration. >> >> Well - I like the concept of having new features switchable. > > The code quickly becomes an unmantainable mess if every new feature is > switchable (and where is the line between new feature and bugfix when it > comes to filling in missing emulation?). The reason for switchability > in this case is limited to QEMU interface compatibility. > > The internal-to-KVM part of the watchdog isn't much different from the > FIT -- we're not going to conditionalize that, are we? > >> Overlapping the "watchdog is implemented" feature with "user space >> wants watchdog exits" makes sense. But I definitely want to have a >> switch for the former, because we otherwise differ quite >> substantially from the emulation we had before. > > It differs in that it is now a bit closer to being correct. If we can maintain user space backwards compatibility, I'm fine with not switching it, since we're effectively fixing a bug wrt the implementation of the announced features, yes. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html