Re: [PATCH] KVM x86/xen: add an override for PVCLOCK_TSC_STABLE_BIT

On 10/10/2023 18:32, David Woodhouse wrote:
> On Tue, 2023-10-10 at 09:40 +0000, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@xxxxxxxxxx>
> >
> > Unless explicitly told to do so (by passing 'clocksource=tsc' and
> > 'tsc=stable:socket', and then jumping through some hoops concerning
> > potential CPU hotplug) Xen will never use TSC as its clocksource.
> > Hence, by default, a Xen guest will not see PVCLOCK_TSC_STABLE_BIT set
> > in either the primary or secondary pvclock memory areas. This has
> > led to bugs in some guest kernels which only become evident if
> > PVCLOCK_TSC_STABLE_BIT *is* set in the pvclock.

> Specifically, some OL7 kernels backported the whole pvclock vDSO thing
> but *forgot* https://git.kernel.org/torvalds/c/9f08890ab and thus kill
> init with a SIGBUS the first time it tries to read a clock, because
> they don't actually map the pvclock pages to userspace :)
>
> They apparently never noticed because evidently *their* Xen fleet
> doesn't actually jump through all those hoops to use the TSC as its
> clocksource either.
>
> It's a fairly safe bet that there are more broken guest kernels out
> there too, hence needing to work around it.

> > Hence, to support
> > such guests, give the VMM a new attribute to tell KVM to forcibly
> > clear the bit in the Xen pvclocks.
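
[Illustrative note: a minimal sketch of what "forcibly clear the bit" amounts
to, assuming the pvclock flags are an 8-bit field and that KVM holds a per-VM
boolean for the override. The helper name xen_pvclock_flags() and the
force_unstable parameter are hypothetical and are not taken from the patch.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the pvclock flags byte, as in the Linux pvclock ABI header. */
#define PVCLOCK_TSC_STABLE_BIT (1u << 0)

/*
 * Hypothetical helper: return the flags byte that would be written into a
 * Xen pvclock area. When the VMM has asked for the override, the stable-TSC
 * bit is masked out and every other flag is left untouched.
 */
uint8_t xen_pvclock_flags(uint8_t flags, bool force_unstable)
{
    if (force_unstable)
        flags &= (uint8_t)~PVCLOCK_TSC_STABLE_BIT;
    return flags;
}
```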

> I frowned at the "PVCLOCK" part of the new attribute for a while,
> thinking that perhaps if we're going to have a set of flags to tweak
> behaviour, we shouldn't be so specific. Call it 'XEN_FEATURES' or
> something... but then I realised we'd want to *advertise* the set of
> bits which is available for userspace to set...
>
> ... and then I realised we already do. That's exactly what the set of
> bits returned, and *set*, with KVM_CAP_XEN_HVM is for.
>
> So let's ditch the new *attribute*, and just add your new (renamed)
> KVM_XEN_HVM_CONFIG_PVCLOCK_NO_STABLE_TSC cap to the set of
> permitted_flags in kvm_xen_hvm_config() so that userspace can enable it
> that way like it does the INTERCEPT_HYPERCALL and EVTCHN_SEND
> behaviours.
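
[Illustrative note: a rough sketch of the permitted_flags idea, assuming the
check is a plain bitmask test that rejects any flag outside the permitted
set. The EX_* names and their bit positions are placeholders for this
example only, and check_xen_hvm_config_flags() is a hypothetical stand-in
for the validation done in kvm_xen_hvm_config(); none of this is the actual
kernel code.]

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder flag bits for illustration; the real values live in the
 * KVM uapi header and the bit for the new flag is an assumption here. */
#define EX_CFG_INTERCEPT_HCALL        (1u << 1)
#define EX_CFG_EVTCHN_SEND            (1u << 5)
#define EX_CFG_PVCLOCK_NO_STABLE_TSC  (1u << 7)

/*
 * Hypothetical validation step: userspace may set only the flags listed in
 * 'permitted'. Adding the new flag to that mask is what lets the VMM turn
 * the behaviour on through the existing config path.
 */
int check_xen_hvm_config_flags(uint32_t flags)
{
    const uint32_t permitted = EX_CFG_INTERCEPT_HCALL |
                               EX_CFG_EVTCHN_SEND |
                               EX_CFG_PVCLOCK_NO_STABLE_TSC;

    if (flags & ~permitted)
        return -EINVAL;   /* unknown or non-settable flag requested */
    return 0;
}
```

With this shape, a VMM that requests the interception flag together with the
new one passes the check, while a request containing any other bit is
rejected with -EINVAL.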


Ok, sounds like a plan. I'll look at configuring it that way instead.

  Paul




