On 09.07.2012, at 07:13, Bhushan Bharat-R65777 <R65777@xxxxxxxxxxxxx> wrote:

>
>
>> -----Original Message-----
>> From: Alexander Graf [mailto:agraf@xxxxxxx]
>> Sent: Saturday, July 07, 2012 1:21 PM
>> To: Wood Scott-B07421
>> Cc: Bhushan Bharat-R65777; kvm-ppc@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; Bhushan
>> Bharat-R65777
>> Subject: Re: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation
>>
>>
>> On 07.07.2012, at 01:37, Scott Wood wrote:
>>
>>> On 07/06/2012 08:17 AM, Alexander Graf wrote:
>>>> On 28.06.2012, at 08:17, Bharat Bhushan wrote:
>>>>> +/*
>>>>> + * The timer system can almost deal with LONG_MAX timeouts, except that
>>>>> + * when you get very close to LONG_MAX, the slack added can cause overflow.
>>>>> + *
>>>>> + * LONG_MAX/2 is a conservative threshold, but it should be adequate for
>>>>> + * any realistic use.
>>>>> + */
>>>>> +#define MAX_TIMEOUT (LONG_MAX/2)
>>>>
>>>> Should this really be in kvm code?
>>>
>>> It looks like we can use NEXT_TIMER_MAX_DELTA for this.
>>>
>>>>> +	mask = 1ULL << (63 - period);
>>>>> +	tb = get_tb();
>>>>> +	if (tb & mask)
>>>>> +		nr_jiffies += mask;
>>>>
>>>> To be honest, you lost me here. nr_jiffies is jiffies, right? While
>>>> mask is basically in timebase granularity. I suppose you're just
>>>> reusing the variable here for no good reason - the compiler will
>>>> gladly optimize things for you if you write things a bit more verbose
>>>> :).
>>>
>>> Probably due to the way do_div() works, but yeah, it's confusing. Maybe
>>> something generic like "ticks", "interval", "remaining", etc. would be
>>> better, with a comment on the do_div saying it's converting timebase
>>> ticks into jiffies.
>>
>> Well, you could start off with a variable "delta_tb", then do
>>
>> nr_jiffies = delta_tb;
>> x = do_div(...);
>>
>> and things would suddenly become readable :). Of course I don't object to
>> comments along the code either :).
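To make the "delta_tb first, then do_div" suggestion concrete, here is a minimal user-space sketch. It is not the patch itself: div_u64_inplace() is a hypothetical stand-in for the kernel's do_div() (which divides its first argument in place and yields the remainder), watchdog_delta_tb() is one reading of the quoted hunk (ticks until the watched timebase bit next goes from 0 to 1), and tb_ticks_per_jiffy is passed in as a plain parameter. Rounding up on a non-zero remainder is an assumption, chosen so the timer would not fire early.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's do_div(): divides *n in place
 * and returns the remainder. */
static uint32_t div_u64_inplace(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/* Timebase ticks until the watched bit next changes from 0 to 1.
 * "period" counts bits from the most significant end, as in the hunk.
 * No overflow handling for the widest periods; this is only a sketch. */
static uint64_t watchdog_delta_tb(uint64_t tb, unsigned int period)
{
	uint64_t mask, delta_tb;

	assert(period <= 63);
	mask = 1ULL << (63 - period);

	/* Ticks until the bits below the watched bit wrap around. */
	delta_tb = mask - (tb & (mask - 1));

	/* If the bit is already 1, it must first drop to 0 and then
	 * count up again, which costs one more full interval. */
	if (tb & mask)
		delta_tb += mask;

	return delta_tb;
}

/* Convert timebase ticks into jiffies, rounding up. */
static uint64_t tb_delta_to_jiffies(uint64_t delta_tb,
				    uint32_t tb_ticks_per_jiffy)
{
	uint64_t nr_jiffies = delta_tb;

	assert(tb_ticks_per_jiffy != 0);
	if (div_u64_inplace(&nr_jiffies, tb_ticks_per_jiffy))
		nr_jiffies++;

	return nr_jiffies;
}
```

Keeping the timebase quantity and the jiffies quantity in separately named variables is the whole point: the unit change happens at exactly one visible line.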
>
> Ok
>
>>
>>>
>>>>> +static void arm_next_watchdog(struct kvm_vcpu *vcpu)
>>>>> +{
>>>>> +	unsigned long nr_jiffies;
>>>>> +
>>>>> +	nr_jiffies = watchdog_next_timeout(vcpu);
>>>>> +	if (nr_jiffies < MAX_TIMEOUT)
>>>>> +		mod_timer(&vcpu->arch.wdt_timer, jiffies + nr_jiffies);
>>>>> +	else
>>>>> +		del_timer(&vcpu->arch.wdt_timer);
>>>>
>>>> Can you del a timer that's not armed? Could that ever happen in this case?
>>>
>>> "del_timer() deactivates a timer - this works on both active and
>>> inactive timers."
>>
>> Ah, good :).
>>
>>>
>>>> Also, could the timer possibly be running somewhere, so do we need
>>>> del_timer_sync? Or don't we need to care?
>>>
>>> This can be called in the context of the timer, so del_timer_sync()
>>> would hang.
>>>
>>> As for what would happen if a caller from a different context were to
>>> race with a timer, I think you could end up with the timer armed based
>>> on an old TCR. del_timer_sync() won't help though, unless you replace
>>> mod_timer() with del_timer_sync() plus add_timer() (with a check to see
>>> whether it's running in timer context). A better solution is probably
>>> to use a spinlock in arm_next_watchdog().
>>
>> Yup. Either way, we have a race that the guest might not expect.
>
> Ok, will use spinlock in arm_next_watchdog().
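The locking idea agreed on above can be sketched in user space roughly as follows. Everything here is hypothetical scaffolding rather than kernel code: struct fake_wdt stands in for the vcpu's watchdog state, a C11 atomic_flag stands in for a kernel spinlock, and the armed/expires fields stand in for mod_timer()/del_timer(). The one property the sketch tries to show is that the timeout is read and the timer is (re)armed inside the same critical section, so a concurrent caller cannot leave the timer armed from a stale value. Which kernel lock variant fits timer context is outside the scope of this sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-vcpu watchdog state. */
struct fake_wdt {
	atomic_flag lock;           /* stands in for a kernel spinlock */
	unsigned long next_timeout; /* stands in for a value derived from TCR */
	bool armed;                 /* stands in for the timer being pending */
	unsigned long expires;
};

static void fake_wdt_init(struct fake_wdt *w)
{
	atomic_flag_clear(&w->lock);
	w->next_timeout = 0;
	w->armed = false;
	w->expires = 0;
}

static void fake_wdt_lock(struct fake_wdt *w)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&w->lock,
						 memory_order_acquire))
		;
}

static void fake_wdt_unlock(struct fake_wdt *w)
{
	atomic_flag_clear_explicit(&w->lock, memory_order_release);
}

/* Read the timeout and arm or disarm under one lock hold. */
static void fake_arm_next_watchdog(struct fake_wdt *w, unsigned long now,
				   unsigned long max_timeout)
{
	unsigned long nr_jiffies;

	fake_wdt_lock(w);
	nr_jiffies = w->next_timeout;
	if (nr_jiffies < max_timeout) {
		w->armed = true;             /* mod_timer() equivalent */
		w->expires = now + nr_jiffies;
	} else {
		w->armed = false;            /* del_timer() equivalent */
	}
	fake_wdt_unlock(w);
}
```

For the lock to mean anything, whatever updates next_timeout from another context would have to take the same lock; the sketch leaves that writer out.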
>
>>
>>>
>>>>> +void kvmppc_watchdog_func(unsigned long data)
>>>>> +{
>>>>> +	struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
>>>>> +	u32 tsr, new_tsr;
>>>>> +	int final;
>>>>> +
>>>>> +	do {
>>>>> +		new_tsr = tsr = vcpu->arch.tsr;
>>>>> +		final = 0;
>>>>> +
>>>>> +		/* Time out event */
>>>>> +		if (tsr & TSR_ENW) {
>>>>> +			if (tsr & TSR_WIS) {
>>>>> +				new_tsr = (tsr & ~TCR_WRC_MASK) |
>>>>> +					  (vcpu->arch.tcr & TCR_WRC_MASK);
>>>>> +				vcpu->arch.tcr &= ~TCR_WRC_MASK;
>>>>> +				final = 1;
>>>>> +			} else {
>>>>> +				new_tsr = tsr | TSR_WIS;
>>>>> +			}
>>>>> +		} else {
>>>>> +			new_tsr = tsr | TSR_ENW;
>>>>> +		}
>>>>> +	} while (cmpxchg(&vcpu->arch.tsr, tsr, new_tsr) != tsr);
>>>>> +
>>>>> +	if (new_tsr & (TSR_WIS | TCR_WRC_MASK)) {
>>>>> +		smp_wmb();
>>>>> +		kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
>>>>> +		kvm_vcpu_kick(vcpu);
>>>>> +	}
>>>>> +
>>>>> +	/*
>>>>> +	 * Avoid getting a storm of timers if the guest sets
>>>>> +	 * the period very short. We'll restart it if anything
>>>>> +	 * changes.
>>>>> +	 */
>>>>> +	if (!final)
>>>>> +		arm_next_watchdog(vcpu);
>>>>
>>>> Mind to explain this part a bit further?
>>>
>>> The whole function, or some subset near the end?
>>>
>>> The "if (!final)" check means that we stop running the timer after final
>>> expiration, to prevent the host from being flooded with timers if the
>>> guest sets a short period but does not have TCR set to exit to QEMU.
>>> Timers will resume the next time TSR/TCR is updated.
>>
>> Ah. The semantics make sense. The comment however is slightly too short. Please
>> explain this in a more verbose way, so someone who didn't write the code knows
>> what's going on :).
>
> Ok.
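For readers following the expiry logic, the three-step progression (set ENW, then WIS, then the final action with no re-arm) can be modelled in user space with C11 atomics. This is a sketch under assumptions, not the patch: struct wdt_state is a hypothetical stand-in for the vcpu fields, atomic_compare_exchange_weak() replaces the kernel's cmpxchg(), and the bit values are what I understand the Book E register layout to be, so treat them as assumed. One deliberate difference: the sketch clears TCR's WRC field after the loop rather than inside it, so that a retried iteration still sees the original field.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, following the Book E TSR/TCR layout. */
#define TSR_ENW      0x80000000u
#define TSR_WIS      0x40000000u
#define TCR_WRC_MASK 0x30000000u

/* Hypothetical stand-in for the relevant vcpu->arch fields. */
struct wdt_state {
	_Atomic uint32_t tsr;
	uint32_t tcr;
};

/* One watchdog expiry. Returns true on the final expiration, i.e. when
 * the caller should stop re-arming the timer until TSR/TCR changes. */
static bool wdt_expire(struct wdt_state *s)
{
	uint32_t tsr, new_tsr;
	bool final;

	tsr = atomic_load(&s->tsr);
	do {
		final = false;

		if (tsr & TSR_ENW) {
			if (tsr & TSR_WIS) {
				/* Third expiry: record the configured
				 * reset action in TSR. */
				new_tsr = (tsr & ~TCR_WRC_MASK) |
					  (s->tcr & TCR_WRC_MASK);
				final = true;
			} else {
				/* Second expiry: raise the interrupt status. */
				new_tsr = tsr | TSR_WIS;
			}
		} else {
			/* First expiry: note that the next one counts. */
			new_tsr = tsr | TSR_ENW;
		}
		/* On failure, tsr is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&s->tsr, &tsr, new_tsr));

	if (final)
		s->tcr &= ~TCR_WRC_MASK;

	return final;
}
```

A caller would re-arm its timer only while wdt_expire() returns false, which mirrors the "if (!final)" check discussed above.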
>
>>
>>>
>>>>> @@ -1106,7 +1213,14 @@ static int set_sregs_base(struct kvm_vcpu *vcpu,
>>>>> 	}
>>>>>
>>>>> 	if (sregs->u.e.update_special & KVM_SREGS_E_UPDATE_TSR) {
>>>>> +		u32 old_tsr = vcpu->arch.tsr;
>>>>> +
>>>>> 		vcpu->arch.tsr = sregs->u.e.tsr;
>>>>> +
>>>>> +		if ((old_tsr ^ vcpu->arch.tsr) &
>>>>> +		    (TSR_ENW | TSR_WIS | TCR_WRC_MASK))
>>>>> +			arm_next_watchdog(vcpu);
>>>>
>>>> Why isn't this one guarded by vcpu->arch.watchdog_enable?
>>>
>>> I'm not sure that any of them should be -- there's no reason for the
>>> watchdog interrupt mechanism to be dependent on QEMU, only the
>>> heavyweight exit on final expiration.
>>
>> Well - I like the concept of having new features switchable. Overlapping the
>> "watchdog is implemented" feature with "user space wants watchdog exits" makes
>> sense. But I definitely want to have a switch for the former, because we
>> otherwise differ quite substantially from the emulation we had before.
>
> Ok, will guard with watchdog_enable.
>
>>
>>>
>>>>> +			spr_val &= ~TCR_WRC_MASK;
>>>>> +			kvmppc_set_tcr(vcpu,
>>>>> +				       spr_val | (TCR_WRC_MASK & vcpu->arch.tcr));
>>>>
>>>> In fact, what you're trying to do here is keep TCR_WRC always on when it was
>>>> enabled once. So all you need is the OR here. No need for the mask above.
>>>
>>> WRC is a 2-bit field that is supposed to preserve its value once written
>>> to be non-zero. Not that we actually do anything different based on the
>>> specific non-zero value, but still we should implement the architected
>>> semantics.
>>
>> Ah, being 2 bits wide, the above code suddenly makes more sense :).
>> How about
>>
>> /* WRC is a 2-bit field that is supposed to preserve its value once written to
>> be non-zero */
>> spr_val &= ~TCR_WRC_MASK;
>> spr_val |= vcpu->arch.tcr & TCR_WRC_MASK;
>> kvmppc_set_tcr(vcpu, spr_val);
>
> I think you mean:
>
> if (TCR_WRC_MASK & vcpu->arch.tcr) {
> 	spr_val &= ~TCR_WRC_MASK;
> 	spr_val |= vcpu->arch.tcr & TCR_WRC_MASK;
> }
> kvmppc_set_tcr(vcpu, spr_val);

Eh, yes, of course :). Plus the comment.

Alex

>
> Thanks
> -Bharat
>
>>
>>
>> Alex
>>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
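The difference between the two variants is easy to check in isolation. Below is a small stand-alone sketch of the conditional form agreed on above, written as a pure function: tcr_apply_guest_write() is a hypothetical name, old_tcr plays the role of vcpu->arch.tcr, and the TCR_WRC_MASK value is assumed from the Book E layout. Without the "if", a guest's first write to WRC would be masked away while the stored field is still zero, so the field could never become non-zero.

```c
#include <stdint.h>

/* Assumed position of the 2-bit WRC field in TCR. */
#define TCR_WRC_MASK 0x30000000u

/* WRC preserves its value once it has been written non-zero; until then
 * the guest's value passes through unchanged. Returns the value that
 * would be handed to kvmppc_set_tcr(). */
static uint32_t tcr_apply_guest_write(uint32_t old_tcr, uint32_t spr_val)
{
	if (old_tcr & TCR_WRC_MASK) {
		spr_val &= ~TCR_WRC_MASK;
		spr_val |= old_tcr & TCR_WRC_MASK;
	}

	return spr_val;
}
```

For example, with old_tcr equal to 0 a write of 0x10000000 is accepted as is, while with old_tcr equal to 0x20000000 the same write leaves WRC at 0x20000000 and only the other bits follow the guest's value.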