On Wed, Dec 9, 2015 at 2:27 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote: > On Wed, Dec 9, 2015 at 2:12 PM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote: >> >> >> On 09/12/2015 22:49, Andy Lutomirski wrote: >>> On Wed, Dec 9, 2015 at 1:16 PM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote: >>>> >>>> >>>> On 09/12/2015 22:10, Andy Lutomirski wrote: >>>>> Can we please stop making kvmclock more complex? It's a beast right >>>>> now, and not in a good way. It's far too tangled with the vclock >>>>> machinery on both the host and guest sides, the pvclock stuff is not >>>>> well thought out (even in principle in an ABI sense), and it's never >>>>> been clear to my what problem exactly the kvmclock stuff is supposed >>>>> to solve. >>>> >>>> It's supposed to solve the problem that: >>>> >>>> - not all hosts have a working TSC >>> >>> Fine, but we don't need any vdso integration for that. >> >> Well, you still want a fast time source. That was a given. :) > > If the host can't figure out how to give *itself* a fast time source, > I'd be surprised if KVM can manage to give the guest a fast, reliable > time source. > >> >>>> - even if they all do, virtual machines can be migrated (or >>>> saved/restored) to a host with a different TSC frequency >>>> >>>> - any MMIO- or PIO-based mechanism to access the current time is orders >>>> of magnitude slower than the TSC and less precise too. >>> >>> Yup. But TSC by itself gets that benefit, too. >> >> Yes, the problem is if you want to solve all three of them. The first >> two are solved by the ACPI PM timer with a decent resolution (70 >> ns---much faster anyway than an I/O port access). The third is solved >> by TSC. To solve all three, you need kvmclock. > > Still confused. Is kvmclock really used in cases where even the host > can't pull of working TSC? > >> >>>>> I'm somewhat tempted to suggest that we delete kvmclock entirely and >>>>> start over. A correctly functioning KVM guest using TSC (i.e. >>>>> ignoring kvmclock entirely) seems to work rather more reliably and >>>>> considerably faster than a kvmclock guest. >>>> >>>> If all your hosts have a working TSC and you don't do migration or >>>> save/restore, that's a valid configuration. It's not a good default, >>>> however. >>> >>> Er? >>> >>> kvmclock is still really quite slow and buggy. >> >> Unless it takes 3-4000 clock cycles for a gettimeofday, which it >> shouldn't even with vdso disabled, it's definitely not slower than PIO. >> >>> And the patch I identified is definitely a problem here: >>> >>> [ 136.131241] KVM: disabling fast timing permanently due to inability >>> to recover from suspend >>> >>> I got that on the host with this whitespace-damaged patch: >>> >>> if (backwards_tsc) { >>> u64 delta_cyc = max_tsc - local_tsc; >>> + if (!backwards_tsc_observed) >>> + pr_warn("KVM: disabling fast timing >>> permanently due to inability to recover from suspend\n"); >>> >>> when I suspended and resumed. >>> >>> Can anyone explain what problem >>> 16a9602158861687c78b6de6dc6a79e6e8a9136f is supposed to solve? On >>> brief inspection, it just seems to be incorrect. Shouldn't KVM's >>> normal TSC logic handle that case right? After all, all vcpus should >>> be paused when we resume from suspend. At worst, we should just need >>> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu) on all vcpus. (Actually, >>> shouldn't we do that regardless of which way the TSC jumped on >>> suspend/resume? After all, the jTSC-to-wall-clock offset is quite >>> likely to change except on the very small handful of CPUs (if any) >>> that keep the TSC running in S3 and hibernate. >> >> I don't recall the details of that patch, so Marcelo will have to answer >> this, or Alex too since he chimed in the original thread. At least it >> should be made conditional on the existence of a VM at suspend time (and >> the master clock stuff should be made per VM, as I suggested at >> https://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg102316.html). >> >> It would indeed be great if the master clock could be dropped. But I'm >> definitely missing some of the subtle details. :( > > Me, too. > > Anyway, see the attached untested patch. Marcelo? That patch seems to work. I have valid timing before and after host suspend. When I suspend and resume the host with a running guest, I get: [ 26.504071] clocksource: timekeeping watchdog: Marking clocksource 'tsc' as unstable because the skew is too large: [ 26.505253] clocksource: 'kvm-clock' wd_now: 66744c542 wd_last: 564b09794 mask: ffffffffffffffff [ 26.506436] clocksource: 'tsc' cs_now: fffffee310b133c8 cs_last: cf5d0b952 mask: ffffffffffffffff in the guest, which is arguably correct. KVM could be further improved to update the tsc offset after suspend/resume to get rid of that artifact. --Andy -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html