> -----Original Message----- > From: Alexander Graf [mailto:agraf@xxxxxxx] > Sent: Saturday, July 07, 2012 1:21 PM > To: Wood Scott-B07421 > Cc: Bhushan Bharat-R65777; kvm-ppc@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; Bhushan > Bharat-R65777 > Subject: Re: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation > > > On 07.07.2012, at 01:37, Scott Wood wrote: > > > On 07/06/2012 08:17 AM, Alexander Graf wrote: > >> On 28.06.2012, at 08:17, Bharat Bhushan wrote: > >>> +/* > >>> + * The timer system can almost deal with LONG_MAX timeouts, except that > >>> + * when you get very close to LONG_MAX, the slack added can cause overflow. > >>> + * > >>> + * LONG_MAX/2 is a conservative threshold, but it should be adequate for > >>> + * any realistic use. > >>> + */ > >>> +#define MAX_TIMEOUT (LONG_MAX/2) > >> > >> Should this really be in kvm code? > > > > It looks like we can use NEXT_TIMER_MAX_DELTA for this. > > > >>> + mask = 1ULL << (63 - period); > >>> + tb = get_tb(); > >>> + if (tb & mask) > >>> + nr_jiffies += mask; > >> > >> To be honest, you lost me here. nr_jiffies is jiffies, right? While > >> mask is basically in timebase granularity. I suppose you're just > >> reusing the variable here for no good reason - the compiler will > >> gladly optimize things for you if you write things a bit more verbose > >> :). > > > > Probably due to the way do_div() works, but yeah, it's confusing. Maybe > > something generic like "ticks", "interval", "remaining", etc. would be > > better, with a comment on the do_div saying it's converting timebase > > ticks into jiffies. > > Well, you could start off with a variable "delta_tb", then do > > nr_jiffies = delta_tb; > x = do_div(...); > > and things would suddenly become readable :). Of course I don't object to > comments along the code either :). Ok > > > > >>> +static void arm_next_watchdog(struct kvm_vcpu *vcpu) > >>> +{ > >>> + unsigned long nr_jiffies; > >>> + > >>> + nr_jiffies = watchdog_next_timeout(vcpu); > >>> + if (nr_jiffies < MAX_TIMEOUT) > >>> + mod_timer(&vcpu->arch.wdt_timer, jiffies + nr_jiffies); > >>> + else > >>> + del_timer(&vcpu->arch.wdt_timer); > >> > >> Can you del a timer that's not armed? Could that ever happen in this case? > > > > "del_timer() deactivates a timer - this works on both active and > > inactive timers." > > Ah, good :). > > > > >> Also, could the timer possibly be running somewhere, so do we need > del_timer_sync? Or don't we need to care? > > > > This can be called in the context of the timer, so del_timer_sync() > > would hang. > > > > As for what would happen if a caller from a different context were to > > race with a timer, I think you could end up with the timer armed based > > on an old TCR. del_timer_sync() won't help though, unless you replace > > mod_timer() with del_timer_sync() plus add_timer() (with a check to see > > whether it's running in timer context). A better solution is probably > > to use a spinlock in arm_next_watchdog(). > > Yup. Either way, we have a race that the guest might not expect. Ok, will use spinlock in arm_next_watchdog(). > > > > >>> +void kvmppc_watchdog_func(unsigned long data) > >>> +{ > >>> + struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data; > >>> + u32 tsr, new_tsr; > >>> + int final; > >>> + > >>> + do { > >>> + new_tsr = tsr = vcpu->arch.tsr; > >>> + final = 0; > >>> + > >>> + /* Time out event */ > >>> + if (tsr & TSR_ENW) { > >>> + if (tsr & TSR_WIS) { > >>> + new_tsr = (tsr & ~TCR_WRC_MASK) | > >>> + (vcpu->arch.tcr & TCR_WRC_MASK); > >>> + vcpu->arch.tcr &= ~TCR_WRC_MASK; > >>> + final = 1; > >>> + } else { > >>> + new_tsr = tsr | TSR_WIS; > >>> + } > >>> + } else { > >>> + new_tsr = tsr | TSR_ENW; > >>> + } > >>> + } while (cmpxchg(&vcpu->arch.tsr, tsr, new_tsr) != tsr); > >>> + > >>> + if (new_tsr & (TSR_WIS | TCR_WRC_MASK)) { > >>> + smp_wmb(); > >>> + kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu); > >>> + kvm_vcpu_kick(vcpu); > >>> + } > >>> + > >>> + /* > >>> + * Avoid getting a storm of timers if the guest sets > >>> + * the period very short. We'll restart it if anything > >>> + * changes. > >>> + */ > >>> + if (!final) > >>> + arm_next_watchdog(vcpu); > >> > >> Mind to explain this part a bit further? > > > > The whole function, or some subset near the end? > > > > The "if (!final)" check means that we stop running the timer after final > > expiration, to prevent the host from being flooded with timers if the > > guest sets a short period but does not have TCR set to exit to QEMU. > > Timers will resume the next time TSR/TCR is updated. > > Ah. The semantics make sense. The comment however is slightly too short. Please > explain this in a more verbose way, so someone who didn't write the code knows > what's going on :). Ok. > > > > >>> @@ -1106,7 +1213,14 @@ static int set_sregs_base(struct kvm_vcpu *vcpu, > >>> } > >>> > >>> if (sregs->u.e.update_special & KVM_SREGS_E_UPDATE_TSR) { > >>> + u32 old_tsr = vcpu->arch.tsr; > >>> + > >>> vcpu->arch.tsr = sregs->u.e.tsr; > >>> + > >>> + if ((old_tsr ^ vcpu->arch.tsr) & > >>> + (TSR_ENW | TSR_WIS | TCR_WRC_MASK)) > >>> + arm_next_watchdog(vcpu); > >> > >> Why isn't this one guarded by vcpu->arch.watchdog_enable? > > > > I'm not sure that any of them should be -- there's no reason for the > > watchdog interrupt mechanism to be dependent on QEMU, only the > > heavyweight exit on final expiration. > > Well - I like the concept of having new features switchable. Overlapping the > "watchdog is implemented" feature with "user space wants watchdog exits" makes > sense. But I definitely want to have a switch for the former, because we > otherwise differ quite substantially from the emulation we had before. Ok, will guard with watchdog_enable. > > > > >>> + spr_val &= ~TCR_WRC_MASK; > >>> + kvmppc_set_tcr(vcpu, > >>> + spr_val | (TCR_WRC_MASK & vcpu->arch.tcr)); > >> > >> In fact, what you're trying to do here is keep TCR_WRC always on when it was > enabled once. So all you need is the OR here. No need for the mask above. > > > > WRC is a 2-bit field that is supposed to preserve its value once written > > to be non-zero. Not that we actually do anything different based on the > > specific non-zero value, but still we should implement the architected > > semantics. > > Ah, being 2 bits wide, the above code suddenly makes more sense :). How about > > /* WRC is a 2-bit field that is supposed to preserve its value once written to > be non-zero */ > spr_val &= ~TCR_WRC_MASK; > spr_val |= vcpu->arch.tcr & TCR_WRC_MASK; > kvmppc_set_tcr(vcpu, spr_val); I think you mean: if (TCR_WRC_MASK & vcpu->arch.tcr) { spr_val &= ~TCR_WRC_MASK; spr_val |= vcpu->arch.tcr & TCR_WRC_MASK; } kvmppc_set_tcr(vcpu, spr_val); Thanks -Bharat > > > Alex > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html