Hello again,
I'm glad to see that you were able to make sense of those test samples :).
Is that behavior with the preemption timer normal at the KVM level? (I'm
guessing not!).
OpenBSD has timer tick sysctl variables, but they can't be changed
without recompiling the kernel. Here is what the default layout looks
like on OpenBSD 6.1:
# sysctl -a | grep -i clockrate
kern.clockrate=tick = 10000, tickadj = 40, hz = 100, profhz = 100,
stathz = 100
Best regards,
Andrew
On 1/2/2018 7:02 PM, Paolo Bonzini wrote:
On 02/01/2018 20:02, Andrew Davis wrote:
Here's the post-boot/2sec/4sec experiment run again, with "trace-cmd
record -e kvm".
http://45.63.6.248/kvm3/trace-01-boot.tar.gz
http://45.63.6.248/kvm3/trace-02-2sec.tar.gz
http://45.63.6.248/kvm3/trace-03-4sec.tar.gz
http://45.63.6.248/kvm3/env.txt
That's pretty interesting. The normal trace looks like this (one CPU
only and with some traces omitted):
CPU-1135 [014] 440078.978674: kvm_exit: reason HLT rip 0xffffffff816a0b6b info 0 0
CPU-1135 [014] 440078.988645: kvm_vcpu_wakeup: wait time 9967011 ns, polling valid
CPU-1135 [014] 440078.988660: kvm_exit: reason HLT rip 0xffffffff816a0b6b info 0 0
CPU-1135 [014] 440078.998645: kvm_vcpu_wakeup: wait time 9980730 ns, polling valid
CPU-1135 [014] 440078.998660: kvm_exit: reason HLT rip 0xffffffff816a0b6b info 0 0
CPU-1135 [014] 440079.008646: kvm_vcpu_wakeup: wait time 9982416 ns, polling valid
CPU-1135 [014] 440079.008664: kvm_exit: reason HLT rip 0xffffffff816a0b6b info 0 0
CPU-1135 [014] 440079.018644: kvm_vcpu_wakeup: wait time 9976479 ns, polling valid
CPU-1135 [014] 440079.018659: kvm_exit: reason HLT rip 0xffffffff816a0b6b info 0 0
CPU-1135 [014] 440079.028644: kvm_vcpu_wakeup: wait time 9981113 ns, polling valid
CPU-1135 [014] 440079.028659: kvm_exit: reason HLT rip 0xffffffff816a0b6b info 0 0
CPU-1135 [014] 440079.038648: kvm_vcpu_wakeup: wait time 9985185 ns, polling valid
CPU-1135 [014] 440079.038663: kvm_exit: reason HLT rip 0xffffffff816a0b6b info 0 0
CPU-1135 [014] 440079.048646: kvm_vcpu_wakeup: wait time 9979267 ns, polling valid
Then, it looks like this:
CPU-1135 [027] 440530.308689: kvm_vcpu_wakeup: wait time 19968272 ns, polling valid
CPU-1135 [027] 440530.308696: kvm_exit: reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
CPU-1135 [027] 440530.308704: kvm_exit: reason HLT rip 0xffffffff816a0b6b info 0 0
CPU-1135 [027] 440530.328690: kvm_vcpu_wakeup: wait time 19984070 ns, polling valid
CPU-1135 [027] 440530.328698: kvm_exit: reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
CPU-1135 [027] 440530.328705: kvm_exit: reason HLT rip 0xffffffff816a0b6b info 0 0
CPU-1135 [027] 440530.348691: kvm_vcpu_wakeup: wait time 19982890 ns, polling valid
CPU-1135 [027] 440530.348699: kvm_exit: reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
CPU-1135 [027] 440530.348707: kvm_exit: reason HLT rip 0xffffffff816a0b6b info 0 0
CPU-1135 [027] 440530.368691: kvm_vcpu_wakeup: wait time 19981827 ns, polling valid
CPU-1135 [027] 440530.368699: kvm_exit: reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
CPU-1135 [027] 440530.368707: kvm_exit: reason HLT rip 0xffffffff816a0b6b info 0 0
CPU-1135 [027] 440530.388692: kvm_vcpu_wakeup: wait time 19981959 ns, polling valid
And then it looks like this:
CPU-1135 [014] 441652.963757: kvm_exit: reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
CPU-1135 [014] 441652.963764: kvm_exit: reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
CPU-1135 [014] 441652.963768: kvm_exit: reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
CPU-1135 [014] 441652.963775: kvm_exit: reason HLT rip 0xffffffff816a0b6b info 0 0
CPU-1135 [014] 441653.003745: kvm_vcpu_wakeup: wait time 39964879 ns, polling valid
CPU-1135 [014] 441653.003758: kvm_exit: reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
CPU-1135 [014] 441653.003765: kvm_exit: reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
CPU-1135 [014] 441653.003772: kvm_exit: reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
CPU-1135 [014] 441653.003783: kvm_exit: reason HLT rip 0xffffffff816a0b6b info 0 0
CPU-1135 [014] 441653.043745: kvm_vcpu_wakeup: wait time 39957085 ns, polling valid
CPU-1135 [014] 441653.043765: kvm_exit: reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
CPU-1135 [014] 441653.043770: kvm_exit: reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
CPU-1135 [014] 441653.043774: kvm_exit: reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
CPU-1135 [014] 441653.043783: kvm_exit: reason HLT rip 0xffffffff816a0b6b info 0 0
CPU-1135 [014] 441653.083734: kvm_vcpu_wakeup: wait time 39945744 ns, polling valid
So the preemption timer is set to a very small value after some time. This causes
the deadline to be advanced, and the next HLT to wake up after twice the correct
deadline.
Then the premption timer is set to a very small value for three times in a row,
each of which advances the deadline; the next HLT then wakes up after four times
the regular time.
This validates your observation that disabling the preemption timer works around
the bug. I'll take a look and try to create a test case. Out of curiosity, can
the OpenBSD timer tick can be adjusted? If so, does the bug trigger faster if
you increase the frequency (or slower if you decrease it)?
Thanks,
Paolo