Re: Timing issue with QEMU/KVM and OpenBSD

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



What does the APIC say it is calibrated too? I'm curious how the
output of https://github.com/openbsd/src/blob/master/sys/arch/amd64/amd64/lapic.c#L501
compares to reality.

On Wed, Jan 10, 2018 at 4:40 PM, Steve Rutherford
<srutherford@xxxxxxxxxx> wrote:
> That mode moves the x86 PIC, PIT, and IOAPIC up to userspace. I'm not
> particularly familiar with OpenBSD so I can't claim to know how these
> interact. If it's heavily reliant on the PIT for time, I could imagine
> things going wrong (though that is just speculation on my part).
>
> On Wed, Jan 10, 2018 at 3:30 PM, Andrew Davis <adavis@xxxxxxxxxxxxxxx> wrote:
>> Hello again,
>>
>> Just a follow-up with some more info. One of my vendors suggested a
>> different workaround for the clock issue.
>>
>> "-machine accel=kvm,kernel_irqchip=off"
>>
>> Setting "kernel_irqchip=off" seems to resolve the timing issue. I ran this
>> for roughly 2 hrs on my test hardware and didn't experience the timing issue
>> (normally it becomes present after 10 mins). However, as far as I
>> understand, this flag moves some interrupt handling to a userspace emulator
>> which results in lower overall performance.
>>
>> This is what I noticed with the different irqchip modes:
>>
>> Without kernel_irqchip specified:    timing issue became present after 10
>> mins.
>> With kernel_irqchip=off:             no timing issue present, ran for 2 hrs.
>> With kernel_irqchip=split:           timing issue became present after 9
>> mins.
>>
>> Best regards,
>> Andrew
>>
>>
>>
>> On 1/3/2018 7:19 PM, Paolo Bonzini wrote:
>>>
>>> On 04/01/2018 01:01, Andrew Davis wrote:
>>>>
>>>> Hello again,
>>>>
>>>> I'm glad to see that you were able to make sense of those test samples
>>>> :).
>>>>
>>>> Is that behavior with the preemption timer normal at the KVM level? (I'm
>>>> guessing not!).
>>>
>>> Indeed, not at all. :)  It's probably some kind of rounding error that
>>> gets bigger and bigger, one bit at a time, or an overflow.
>>>
>>> Paolo
>>>
>>>> OpenBSD has timer tick sysctl variables, but they can't be changed
>>>> without recompiling the kernel. Here is what the default layout looks
>>>> like on OpenBSD 6.1:
>>>>
>>>>      # sysctl -a | grep -i clockrate
>>>>      kern.clockrate=tick = 10000, tickadj = 40, hz = 100, profhz = 100,
>>>> stathz = 100
>>>>
>>>> Best regards,
>>>> Andrew
>>>>
>>>>
>>>> On 1/2/2018 7:02 PM, Paolo Bonzini wrote:
>>>>>
>>>>> On 02/01/2018 20:02, Andrew Davis wrote:
>>>>>>
>>>>>> Here's the post-boot/2sec/4sec experiment run again, with "trace-cmd
>>>>>> record -e kvm".
>>>>>>
>>>>>> http://45.63.6.248/kvm3/trace-01-boot.tar.gz
>>>>>> http://45.63.6.248/kvm3/trace-02-2sec.tar.gz
>>>>>> http://45.63.6.248/kvm3/trace-03-4sec.tar.gz
>>>>>> http://45.63.6.248/kvm3/env.txt
>>>>>
>>>>> That's pretty interesting.  The normal trace looks like this (one CPU
>>>>> only and with some traces omitted):
>>>>>
>>>>>                CPU-1135  [014] 440078.978674: kvm_exit:
>>>>> reason HLT rip 0xffffffff816a0b6b info 0 0
>>>>>                CPU-1135  [014] 440078.988645: kvm_vcpu_wakeup:
>>>>> wait time 9967011 ns, polling valid
>>>>>                CPU-1135  [014] 440078.988660: kvm_exit:
>>>>> reason HLT rip 0xffffffff816a0b6b info 0 0
>>>>>                CPU-1135  [014] 440078.998645: kvm_vcpu_wakeup:
>>>>> wait time 9980730 ns, polling valid
>>>>>                CPU-1135  [014] 440078.998660: kvm_exit:
>>>>> reason HLT rip 0xffffffff816a0b6b info 0 0
>>>>>                CPU-1135  [014] 440079.008646: kvm_vcpu_wakeup:
>>>>> wait time 9982416 ns, polling valid
>>>>>                CPU-1135  [014] 440079.008664: kvm_exit:
>>>>> reason HLT rip 0xffffffff816a0b6b info 0 0
>>>>>                CPU-1135  [014] 440079.018644: kvm_vcpu_wakeup:
>>>>> wait time 9976479 ns, polling valid
>>>>>                CPU-1135  [014] 440079.018659: kvm_exit:
>>>>> reason HLT rip 0xffffffff816a0b6b info 0 0
>>>>>                CPU-1135  [014] 440079.028644: kvm_vcpu_wakeup:
>>>>> wait time 9981113 ns, polling valid
>>>>>                CPU-1135  [014] 440079.028659: kvm_exit:
>>>>> reason HLT rip 0xffffffff816a0b6b info 0 0
>>>>>                CPU-1135  [014] 440079.038648: kvm_vcpu_wakeup:
>>>>> wait time 9985185 ns, polling valid
>>>>>                CPU-1135  [014] 440079.038663: kvm_exit:
>>>>> reason HLT rip 0xffffffff816a0b6b info 0 0
>>>>>                CPU-1135  [014] 440079.048646: kvm_vcpu_wakeup:
>>>>> wait time 9979267 ns, polling valid
>>>>>
>>>>> Then, it looks like this:
>>>>>
>>>>>                CPU-1135  [027] 440530.308689: kvm_vcpu_wakeup:
>>>>> wait time 19968272 ns, polling valid
>>>>>                CPU-1135  [027] 440530.308696: kvm_exit:
>>>>> reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
>>>>>                CPU-1135  [027] 440530.308704: kvm_exit:
>>>>> reason HLT rip 0xffffffff816a0b6b info 0 0
>>>>>                CPU-1135  [027] 440530.328690: kvm_vcpu_wakeup:
>>>>> wait time 19984070 ns, polling valid
>>>>>                CPU-1135  [027] 440530.328698: kvm_exit:
>>>>> reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
>>>>>                CPU-1135  [027] 440530.328705: kvm_exit:
>>>>> reason HLT rip 0xffffffff816a0b6b info 0 0
>>>>>                CPU-1135  [027] 440530.348691: kvm_vcpu_wakeup:
>>>>> wait time 19982890 ns, polling valid
>>>>>                CPU-1135  [027] 440530.348699: kvm_exit:
>>>>> reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
>>>>>                CPU-1135  [027] 440530.348707: kvm_exit:
>>>>> reason HLT rip 0xffffffff816a0b6b info 0 0
>>>>>                CPU-1135  [027] 440530.368691: kvm_vcpu_wakeup:
>>>>> wait time 19981827 ns, polling valid
>>>>>                CPU-1135  [027] 440530.368699: kvm_exit:
>>>>> reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
>>>>>                CPU-1135  [027] 440530.368707: kvm_exit:
>>>>> reason HLT rip 0xffffffff816a0b6b info 0 0
>>>>>                CPU-1135  [027] 440530.388692: kvm_vcpu_wakeup:
>>>>> wait time 19981959 ns, polling valid
>>>>>
>>>>> And then it looks like this:
>>>>>
>>>>>                CPU-1135  [014] 441652.963757: kvm_exit:
>>>>> reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
>>>>>                CPU-1135  [014] 441652.963764: kvm_exit:
>>>>> reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
>>>>>                CPU-1135  [014] 441652.963768: kvm_exit:
>>>>> reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
>>>>>                CPU-1135  [014] 441652.963775: kvm_exit:
>>>>> reason HLT rip 0xffffffff816a0b6b info 0 0
>>>>>                CPU-1135  [014] 441653.003745: kvm_vcpu_wakeup:
>>>>> wait time 39964879 ns, polling valid
>>>>>                CPU-1135  [014] 441653.003758: kvm_exit:
>>>>> reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
>>>>>                CPU-1135  [014] 441653.003765: kvm_exit:
>>>>> reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
>>>>>                CPU-1135  [014] 441653.003772: kvm_exit:
>>>>> reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
>>>>>                CPU-1135  [014] 441653.003783: kvm_exit:
>>>>> reason HLT rip 0xffffffff816a0b6b info 0 0
>>>>>                CPU-1135  [014] 441653.043745: kvm_vcpu_wakeup:
>>>>> wait time 39957085 ns, polling valid
>>>>>                CPU-1135  [014] 441653.043765: kvm_exit:
>>>>> reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
>>>>>                CPU-1135  [014] 441653.043770: kvm_exit:
>>>>> reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
>>>>>                CPU-1135  [014] 441653.043774: kvm_exit:
>>>>> reason PREEMPTION_TIMER rip 0xffffffff816a0b6c info 0 0
>>>>>                CPU-1135  [014] 441653.043783: kvm_exit:
>>>>> reason HLT rip 0xffffffff816a0b6b info 0 0
>>>>>                CPU-1135  [014] 441653.083734: kvm_vcpu_wakeup:
>>>>> wait time 39945744 ns, polling valid
>>>>>
>>>>> So the preemption timer is set to a very small value after some time.
>>>>> This causes
>>>>> the deadline to be advanced, and the next HLT to wake up after twice
>>>>> the correct
>>>>> deadline.
>>>>>
>>>>> Then the premption timer is set to a very small value for three times
>>>>> in a row,
>>>>> each of which advances the deadline; the next HLT then wakes up after
>>>>> four times
>>>>> the regular time.
>>>>>
>>>>> This validates your observation that disabling the preemption timer
>>>>> works around
>>>>> the bug.  I'll take a look and try to create a test case.  Out of
>>>>> curiosity, can
>>>>> the OpenBSD timer tick can be adjusted?  If so, does the bug trigger
>>>>> faster if
>>>>> you increase the frequency (or slower if you decrease it)?
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Paolo
>>
>>



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux