Re: [PATCH 2/2 v2] KVM: PPC: booke: Add watchdog emulation

Alexander Graf <agraf@xxxxxxx> · Tue, 17 Jul 2012 14:51:56 +0200

On 07/17/2012 11:57 AM, Bhushan Bharat-R65777 wrote:

-----Original Message-----
From: kvm-ppc-owner@xxxxxxxxxxxxxxx [mailto:kvm-ppc-owner@xxxxxxxxxxxxxxx] On
Behalf Of Alexander Graf
Sent: Tuesday, July 17, 2012 12:50 PM
To: Wood Scott-B07421
Cc: Bhushan Bharat-R65777; <kvm-ppc@xxxxxxxxxxxxxxx>; <kvm@xxxxxxxxxxxxxxx>;
<bharatb.yadav@xxxxxxxxx>; Bhushan Bharat-R65777; Benjamin Herrenschmidt; Kumar
Gala
Subject: Re: [PATCH 2/2 v2] KVM: PPC: booke: Add watchdog emulation

On 17.07.2012, at 03:02, Scott Wood <scottwood@xxxxxxxxxxxxx> wrote:

On 07/16/2012 12:18 PM, Alexander Graf wrote:
+/*
+ * Return the number of jiffies until the next timeout.  If the
timeout is
+ * longer than the NEXT_TIMER_MAX_DELTA, that
then?

return NEXT_TIMER_MAX_DELTA
+ * instead.
I can read code.
Come on, it's not exactly x++; /* add one to x */

It's faster to read code (as well as know the constraints within which
you can modify it without having to spend a lot of time digesting all
the callers' use cases) when you have a high level description of its
interface contract, and can be selective about when to zoom in to the
details.  Linux kernel code tends to be bad about this.
Yeah, not opposed to leave that part in :).

The important piece of information in the comment is
missing: The reason.
The reason for what?  Why you want to know the next timeout?  That's
the caller's business.  Or why we use NEXT_TIMER_MAX_DELTA as the limit?
Why we use the limit. IIRC it was explained in the last thread, just didn't make
its way into the comment.
Earlier we have a comment on the #define MAX_TIMEOUT (new define added for a purpose, so the comment described the puspose).
Now we uses the generic #define NEX_TIMER_MAX_DELTA (include/linux/timer.h), so removed the comment.

Ah, ok. Just saying, if you comment on some mechanism, like you did 
here, please also include the reasoning behind it. For example

  Do foo if x is true.

isn't particularly helpful. However

  Do foo if x is true because the bar API will break with high values

is very helpful. It includes the action and reason of the code :). 
Alternatively, to me the same as above would be

  /* bar API will break with high values */
  if (x)
    do(foo)

because in this case the code is the action description. Either variant 
works fine for me.

+void kvmppc_watchdog_func(unsigned long data) {
+    struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
+    u32 tsr, new_tsr;
+    int final;
+
+    do {
+        new_tsr = tsr = vcpu->arch.tsr;
+        final = 0;
+
+        /* Time out event */
+        if (tsr & TSR_ENW) {
+            if (tsr & TSR_WIS) {
+                new_tsr = (tsr & ~TCR_WRC_MASK) |
+                      (vcpu->arch.tcr & TCR_WRC_MASK);
+                vcpu->arch.tcr &= ~TCR_WRC_MASK;
Can't we just poke the vcpu to exit the VM and do the above on its own?
We've discussed this before.  TSR updates are done via atomics, and we
send a request for the vcpu to act on the result.  This is how the
decrementer works.

http://www.spinics.net/lists/kvm-ppc/msg03169.html
Yeah, the major difference to the dec is the atomicity of the whole thing. Dec
changes one bit to enable the interrupt line. The final expiration is more
complex.
Is not setting the TSR.WRS atomic here (cmpxchg() will handle this)?

Final expiration sets TCR. TSR should be ok.

This is the watdog expired case, right?
Final expiration, yes.

I'd also prefer to have an
explicit event for the expiry than a special TSR check in the main loop.
So check TSR[WRS] in update_timer_ints(), and have it queue a
pseudoexception?
Or here.
Do we mean define a sudo IROPRIO for final expiry.

We can also define an event that is sent through kvm_make_request. But 
yeah, IRQPRIO is probably easier. Not 100% sure which way is better 
though. Avi, any preferences?

That would eliminate the need to change the runnable function.

Also call me sceptic on the reset of tcr. If our user space watchdog
event is "write a message", then we essentially want to hide the fact
that the watchdog expired from the guest, right? In that case, the
second time-out wouldn't do anything guest visible.
This was probably copied straight out of the hardware documentation,
which explicitly says TCR[WRC] gets set to zero on final expiration
(as part of reset).  We should leave that part up to userspace.  It
definitely shouldn't be done inside the cmpxchg loop (or from
interrupt context -- only TSR gets the atomic treatment).  I don't
think the read of TCR outside vcpu context is a problem, though.
Yeah, but it'd just make me less wary if only the vcpu thread itself accesses
vcpu internal registers that aren't irq state and thus designed for it (TSR).

But yes, the most flexible way would probably be to do it from user space. Since
it'd happen from within the vcpu context of user space, we can also guarantee
that the TCR access is atomic.
Yes, will move the tcr.wrc clearing to userspace.

int kvm_arch_vcpu_runnable(struct kvm_vcpu *v) {
-    return !(v->arch.shared->msr & MSR_WE) ||
-           !!(v->arch.pending_exceptions) ||
-           v->requests;
+    bool ret = !(v->arch.shared->msr & MSR_WE) ||
+           !!(v->arch.pending_exceptions) ||
+           v->requests;
+
+    ret = ret || kvmppc_get_tsr_wrc(v);
Why do you need to declare the cpu as non-runnable when a watchdog
event occured?
It's the other way around -- it's always runnable when a watchdog exit
is pending.  It's like a pending exception.
Ah, so yes, we should just shove it into pending_exceptions then.
Pending_exception? You mean sudo again here as said earlier.

pseudo :). Yeah, I'm referring to above. No need to check 500 different 
conditions when we already have a bitmap that says "event is pending".

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html