Re: kvmclock doesn't work, help?

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Wed, 9 Dec 2015 14:43:47 -0800

On Wed, Dec 9, 2015 at 2:27 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Wed, Dec 9, 2015 at 2:12 PM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>>
>>
>> On 09/12/2015 22:49, Andy Lutomirski wrote:
>>> On Wed, Dec 9, 2015 at 1:16 PM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>>>>
>>>>
>>>> On 09/12/2015 22:10, Andy Lutomirski wrote:
>>>>> Can we please stop making kvmclock more complex?  It's a beast right
>>>>> now, and not in a good way.  It's far too tangled with the vclock
>>>>> machinery on both the host and guest sides, the pvclock stuff is not
>>>>> well thought out (even in principle in an ABI sense), and it's never
>>>>> been clear to my what problem exactly the kvmclock stuff is supposed
>>>>> to solve.
>>>>
>>>> It's supposed to solve the problem that:
>>>>
>>>> - not all hosts have a working TSC
>>>
>>> Fine, but we don't need any vdso integration for that.
>>
>> Well, you still want a fast time source.  That was a given. :)
>
> If the host can't figure out how to give *itself* a fast time source,
> I'd be surprised if KVM can manage to give the guest a fast, reliable
> time source.
>
>>
>>>> - even if they all do, virtual machines can be migrated (or
>>>> saved/restored) to a host with a different TSC frequency
>>>>
>>>> - any MMIO- or PIO-based mechanism to access the current time is orders
>>>> of magnitude slower than the TSC and less precise too.
>>>
>>> Yup.  But TSC by itself gets that benefit, too.
>>
>> Yes, the problem is if you want to solve all three of them.  The first
>> two are solved by the ACPI PM timer with a decent resolution (70
>> ns---much faster anyway than an I/O port access).  The third is solved
>> by TSC.  To solve all three, you need kvmclock.
>
> Still confused.  Is kvmclock really used in cases where even the host
> can't pull of working TSC?
>
>>
>>>>> I'm somewhat tempted to suggest that we delete kvmclock entirely and
>>>>> start over.  A correctly functioning KVM guest using TSC (i.e.
>>>>> ignoring kvmclock entirely) seems to work rather more reliably and
>>>>> considerably faster than a kvmclock guest.
>>>>
>>>> If all your hosts have a working TSC and you don't do migration or
>>>> save/restore, that's a valid configuration.  It's not a good default,
>>>> however.
>>>
>>> Er?
>>>
>>> kvmclock is still really quite slow and buggy.
>>
>> Unless it takes 3-4000 clock cycles for a gettimeofday, which it
>> shouldn't even with vdso disabled, it's definitely not slower than PIO.
>>
>>> And the patch I identified is definitely a problem here:
>>>
>>> [  136.131241] KVM: disabling fast timing permanently due to inability
>>> to recover from suspend
>>>
>>> I got that on the host with this whitespace-damaged patch:
>>>
>>>         if (backwards_tsc) {
>>>                 u64 delta_cyc = max_tsc - local_tsc;
>>> +               if (!backwards_tsc_observed)
>>> +                       pr_warn("KVM: disabling fast timing
>>> permanently due to inability to recover from suspend\n");
>>>
>>> when I suspended and resumed.
>>>
>>> Can anyone explain what problem
>>> 16a9602158861687c78b6de6dc6a79e6e8a9136f is supposed to solve?  On
>>> brief inspection, it just seems to be incorrect.  Shouldn't KVM's
>>> normal TSC logic handle that case right?  After all, all vcpus should
>>> be paused when we resume from suspend.  At worst, we should just need
>>> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu) on all vcpus.  (Actually,
>>> shouldn't we do that regardless of which way the TSC jumped on
>>> suspend/resume?  After all, the jTSC-to-wall-clock offset is quite
>>> likely to change except on the very small handful of CPUs (if any)
>>> that keep the TSC running in S3 and hibernate.
>>
>> I don't recall the details of that patch, so Marcelo will have to answer
>> this, or Alex too since he chimed in the original thread.  At least it
>> should be made conditional on the existence of a VM at suspend time (and
>> the master clock stuff should be made per VM, as I suggested at
>> https://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg102316.html).
>>
>> It would indeed be great if the master clock could be dropped.  But I'm
>> definitely missing some of the subtle details. :(
>
> Me, too.
>
> Anyway, see the attached untested patch.  Marcelo?

That patch seems to work.  I have valid timing before and after host
suspend.  When I suspend and resume the host with a running guest, I
get:

[   26.504071] clocksource: timekeeping watchdog: Marking clocksource
'tsc' as unstable because the skew is too large:
[   26.505253] clocksource:                       'kvm-clock' wd_now:
66744c542 wd_last: 564b09794 mask: ffffffffffffffff
[   26.506436] clocksource:                       'tsc' cs_now:
fffffee310b133c8 cs_last: cf5d0b952 mask: ffffffffffffffff

in the guest, which is arguably correct.  KVM could be further
improved to update the tsc offset after suspend/resume to get rid of
that artifact.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html