Re: 2.6.35-rc1 regression with pvclock and smp guests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Zachary Amsden wrote:
On 07/27/2010 11:51 AM, Andre Przywara wrote:
Andre Przywara wrote:
Avi Kivity wrote:
  On 07/27/2010 04:48 PM, Andre Przywara wrote:
Wierd.  Maybe the clock goes crazy.

Let's see if it jumps forward alot:

         } while (unlikely(last != ret));
+
+       {
+            static u64 last_report;
+            if (ret > last_report + 10000) {
+                    last_report = ret;
+                    printk("kvmclock: %llx\n", ret);
+            }
+
+       }

         return ret;
  }

Worth updating the 'return last' to update ret and goto the new code, so we don't miss that path.
Did that. There is _a lot_ of output (about 350 lines per second via the 115k serial console), both with smp=1 and smp=2. The majority is differing about 2,000,000 (ticks?), but a handful of them are in the range of 20 million.
nanoseconds.  So 2-20ms.  Consistent with 350 lines/sec.

No difference between smp=2 and smp=1.
I also get some "BUG: recent printk recursion!" and I don't see any kernel boot progress beyond outputting the BogoMIPS value.
Right, printk() wants the time too.

BTW: I found two message from your earlier debug statement:
[    0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
[    0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock
Those are from kvmclock initialization, not from the older patch.

I'm completely confused, everything seems to be in order.

Let's see. if you s/return last/return ret/ in the original, does this help things along? this makes pvclock drop the computation and should be exactly the same as before the patch.
Yes, this works, both smp version boot. I see a short very short break after the line in question, but then it proceeds well. Thanks for your help, now I got a much better insight into the issue. I will see if I can find something more.
Did some more investigations, some observations:
- The cmpxchg does not seem to be a problem, I didn't see the loop iterated more than once. - Turning off printk-timestamps makes the bug go away. But I guess it is just hiding or deferring it, and it's no real workaround anyway. - I instrumented the "if (ret < last) return last;" statement, when the kernel hangs I get only printks from there, although it has hit before:
----------
[    0.820000] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    0.820000] returning last instead (cnt=19001)
[    0.820000] returning last instead (cnt=20001)
The last line repeats forever with the same timestamp, the counter (counting the number of "return last;") increments about 3500 times/second.

I will see if I find something more...


gcc --version?
That would be 4.3.3
I compiled the guest kernel with 4.4.4 also, that made no difference.

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux