On Thu, 2024-03-21 at 09:37 -0700, Reinette Chatre wrote:
> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
>
> Add the KVM_CAP_X86_APIC_BUS_FREQUENCY capability to configure the APIC
> bus clock frequency for APIC timer emulation.
> Allow KVM_ENABLE_CAP(KVM_CAP_X86_APIC_BUS_FREQUENCY) to set the
> frequency, expressed as the APIC bus cycle length in nanoseconds. When
> using this capability, the userspace VMM should configure CPUID leaf
> 0x15 to advertise the frequency.
>
> Vishal reported that the TDX guest kernel expects a 25MHz APIC bus
> frequency but ends up getting interrupts at a significantly higher rate.
>
> The TDX architecture hard-codes the core crystal clock frequency to
> 25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
> does not allow the VMM to override the value.
>
> In addition, per Intel SDM:
> "The APIC timer frequency will be the processor’s bus clock or core
> crystal clock frequency (when TSC/core crystal clock ratio is
> enumerated in CPUID leaf 0x15) divided by the value specified in
> the divide configuration register."
>
> The resulting 25MHz APIC bus frequency conflicts with KVM's hard-coded
> APIC bus frequency of 1GHz.
>
> KVM doesn't enumerate CPUID leaf 0x15 to the guest unless the userspace
> VMM sets it using KVM_SET_CPUID. If CPUID leaf 0x15 is enumerated, the
> guest kernel uses it as the APIC bus frequency. If not, the guest kernel
> measures the frequency based on other known timers like the ACPI timer
> or the legacy PIT. As reported by Vishal, the TDX guest kernel expects a
> 25MHz timer frequency but gets timer interrupts more frequently due to
> the 1GHz frequency used by KVM.
>
> To ensure that the guest doesn't have a conflicting view of the APIC bus
> frequency, allow userspace to tell KVM to use the same frequency that
> TDX mandates instead of the default 1GHz.
>
> There are several options to address this:
> 1. Make KVM able to configure the APIC bus frequency (this series).
>    Pro: It resembles the existing hardware.
>         Recent Intel CPUs use 25MHz.
>    Con: Requires the VMM to emulate the APIC timer at 25MHz.
> 2. Make the TDX architecture enumerate a configurable frequency in CPUID
>    leaf 0x15, or not enumerate the leaf at all.
>    Pro: Any APIC bus frequency is allowed.
>    Con: Deviates from the TDX architecture.
> 3. Make the TDX guest kernel use 1GHz when it's running on KVM.
>    Con: The kernel ignores CPUID leaf 0x15.
> 4. Change CPUID leaf 0x15 under TDX to report the crystal clock frequency
>    as 1GHz.
>    Pro: This has been the virtual APIC frequency for KVM guests for 13
>         years.
>    Pro: This requires changing only one hard-coded constant in TDX.
>    Con: It doesn't work with other VMMs, as TDX isn't specific to KVM.
>    Con: The core crystal clock frequency is also used to calculate the
>         TSC frequency.
>    Con: If it is configured to a value different from hardware, it will
>         break the correctness of Intel PT Mini Time Count (MTC) packets
>         in TDs.
>
> Reported-by: Vishal Annapurve <vannapurve@xxxxxxxxxx>
> Closes: https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@xxxxxxxxxx/

Is Closes appropriate, given that the issue Vishal hit was on non-upstream code?

> Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> Co-developed-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
> Signed-off-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>