On Thu, 2024-02-01 at 11:19 +0100, Vitaly Kuznetsov wrote: > David Woodhouse <dwmw2@xxxxxxxxxxxxx> writes: > > > Sorry for delayed response. > > > > On Thu, 2024-01-11 at 14:59 +0100, Vitaly Kuznetsov wrote: > > > > > > + vm_ioctl(vm, KVM_GET_CLOCK, &kcdata); > > > + delta = (wc->sec * NSEC_PER_SEC + wc->nsec) - (kcdata.realtime - kcdata.clock); > > > > I think you need to check for KVM_CLOCK_REALTIME in the flags here, > > don't you? It might not always be set. > > Good suggestion; while this shouldn't be a common use-case on x86_64, it > is possible when TSC is *not* the clocksource used on the host. I guess > we can just skip the sub-test with a message in this case. > > > > > And also, nobody should ever be using KVM_CLOCK_REALTIME as it stands > > at the moment. It's fundamentally broken because it should always have > > used CLOCK_TAI not CLOCK_REALTIME. > > > > I'm in the process of fixing that up as an incremental ABI change, > > putting the TAI offset into one of the spare pad fields in the > > kvm_clock_data and adding a new KVM_CLOCK_TAI_OFFSET flag. > > > > > + TEST_ASSERT(llabs(delta) < 100, > > > + "Guest's epoch from shinfo %d.%09d differs from KVM_GET_CLOCK %lld.%lld", > > > + wc->sec, wc->nsec, (kcdata.realtime - kcdata.clock) / NSEC_PER_SEC, > > > + (kcdata.realtime - kcdata.clock) % NSEC_PER_SEC); > > > > > > > > Replace the check with comparing wall clock data from shinfo to > > > KVM_GET_CLOCK. The later gives both realtime and kvmclock so guest's epoch > > > can be calculated by subtraction. Note, the computed epoch may still differ > > > a few nanoseconds from shinfo as different TSC is used and there are > > > rounding errors but 100 nanoseconds margin should be enough to cover > > > it (famous last words). > > > > Aren't those just bugs? Surely this *should* be cycle-perfect, if the > > host has a stable TSC? > > > > But I suppose this isn't the test case for that. This is just testing > > that the Xen shinfo is doing basically the right thing. > > > > And we're going to need a few more fixes like this WIP before we get > > get to cycle perfection from the horrid mess that is our kvmclock code: > > https://git.infradead.org/?p=users/dwmw2/linux.git;a=commitdiff;h=bc557e5ee4 > > (Some time ago I was considering suggesting we switch to Hypet-V TSC page > clocksource for Linux guests to avoid fixing the mess :-) It's becoming > less and less relevant as with modern hardware which supports TSC > scaling (and which doesn't support famous TSC bugs like TSC divergence > across sockets :-) passing through raw TSC to guests is becoming more > and more popular.) Except for migration, which is kind of hosed. And live update too. Where you are only doing a kexec on the *same* host and your time source is the *same* TSC, there's no excuse for anything less than cycle perfection. But KVM is a long way from that :) > Regarding this fix, what's your decree? My intention here is to > basically get rid on xen_shinfo flakyness without reducing test > coverage. I.e. we still want to test that shinfo data is somewhat > sane. Later, when you get your 'cycle perfection' and 'TAI' patches in, > we can always adjust the test accordingly. In case you agree, I can > re-send this with KVM_CLOCK_REALTIME check added. I think just the KVM_CLOCK_REALTIME check is all we need. No need for *this* test to be pedantic about the clock precision; mostly it's just checking that something that looks a *bit* like a time value is written to the right place in guest memory. So forget TAI, accept a slop of over 1s, with a snarky comment that the API using CLOCK_REALTIME should never have been accepted into the kernel in the first place. (I mean that; we don't want people cutting and pasting it as an example).
Attachment:
smime.p7s
Description: S/MIME cryptographic signature