Re: [PATCH] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK

David Woodhouse <dwmw2@xxxxxxxxxxxxx> · Thu, 01 Feb 2024 08:02:00 -0800

On Thu, 2024-02-01 at 11:19 +0100, Vitaly Kuznetsov wrote:
> David Woodhouse <dwmw2@xxxxxxxxxxxxx> writes:
> 
> > Sorry for delayed response.
> > 
> > On Thu, 2024-01-11 at 14:59 +0100, Vitaly Kuznetsov wrote:
> > > 
> > > +       vm_ioctl(vm, KVM_GET_CLOCK, &kcdata);
> > > +       delta = (wc->sec * NSEC_PER_SEC + wc->nsec) - (kcdata.realtime - kcdata.clock);
> > 
> > I think you need to check for KVM_CLOCK_REALTIME in the flags here,
> > don't you? It might not always be set.
> 
> Good suggestion; while this shouldn't be a common use-case on x86_64, it
> is possible when TSC is *not* the clocksource used on the host. I guess
> we can just skip the sub-test with a message in this case.
> 
> > 
> > And also, nobody should ever be using KVM_CLOCK_REALTIME as it stands
> > at the moment. It's fundamentally broken because it should always have
> > used CLOCK_TAI not CLOCK_REALTIME.
> > 
> > I'm in the process of fixing that up as an incremental ABI change,
> > putting the TAI offset into one of the spare pad fields in the
> > kvm_clock_data and adding a new KVM_CLOCK_TAI_OFFSET flag.
> > 
> > > +       TEST_ASSERT(llabs(delta) < 100,
> > > +                   "Guest's epoch from shinfo %d.%09d differs from KVM_GET_CLOCK %lld.%lld",
> > > +                   wc->sec, wc->nsec, (kcdata.realtime - kcdata.clock) / NSEC_PER_SEC,
> > > +                   (kcdata.realtime - kcdata.clock) % NSEC_PER_SEC);
> > > 
> > 
> > > Replace the check with comparing wall clock data from shinfo to
> > > KVM_GET_CLOCK. The later gives both realtime and kvmclock so guest's epoch
> > > can be calculated by subtraction. Note, the computed epoch may still differ
> > > a few nanoseconds from shinfo as different TSC is used and there are
> > > rounding errors but 100 nanoseconds margin should be enough to cover
> > > it (famous last words).
> > 
> > Aren't those just bugs? Surely this *should* be cycle-perfect, if the
> > host has a stable TSC?
> > 
> > But I suppose this isn't the test case for that. This is just testing
> > that the Xen shinfo is doing basically the right thing.
> > 
> > And we're going to need a few more fixes like this WIP before we get
> > get to cycle perfection from the horrid mess that is our kvmclock code:
> > https://git.infradead.org/?p=users/dwmw2/linux.git;a=commitdiff;h=bc557e5ee4
> 
> (Some time ago I was considering suggesting we switch to Hypet-V TSC page
> clocksource for Linux guests to avoid fixing the mess :-) It's becoming
> less and less relevant as with modern hardware which supports TSC
> scaling (and which doesn't support famous TSC bugs like TSC divergence
> across sockets :-) passing through raw TSC to guests is becoming more
> and more popular.)

Except for migration, which is kind of hosed. And live update too.
Where you are only doing a kexec on the *same* host and your time
source is the *same* TSC, there's no excuse for anything less than
cycle perfection. But KVM is a long way from that :)

> Regarding this fix, what's your decree? My intention here is to
> basically get rid on xen_shinfo flakyness without reducing test
> coverage. I.e. we still want to test that shinfo data is somewhat
> sane. Later, when you get your 'cycle perfection' and 'TAI' patches in,
> we can always adjust the test accordingly. In case you agree, I can
> re-send this with KVM_CLOCK_REALTIME check added.

I think just the KVM_CLOCK_REALTIME check is all we need. No need for
*this* test to be pedantic about the clock precision; mostly it's just
checking that something that looks a *bit* like a time value is written
to the right place in guest memory.

So forget TAI, accept a slop of over 1s, with a snarky comment that the
API using CLOCK_REALTIME should never have been accepted into the
kernel in the first place. (I mean that; we don't want people cutting
and pasting it as an example).
Attachment:
smime.p7s

Description: S/MIME cryptographic signature