Re: [PATCH V5 3/4] KVM: x86: Add a capability to configure bus frequency for APIC timer

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Apr 25, 2024 at 03:07:01PM -0700, Reinette Chatre wrote:
> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
>
> Add KVM_CAP_X86_APIC_BUS_CYCLES_NS capability to configure the APIC
> bus clock frequency for APIC timer emulation.
> Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_CYCLES_NS) to set the
> frequency in nanoseconds. When using this capability, the user space
> VMM should configure CPUID leaf 0x15 to advertise the frequency.
>
> Vishal reported that the TDX guest kernel expects a 25MHz APIC bus
> frequency but ends up getting interrupts at a significantly higher rate.
>
> The TDX architecture hard-codes the core crystal clock frequency to
> 25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
> does not allow the VMM to override the value.
>
> In addition, per Intel SDM:
>     "The APIC timer frequency will be the processor’s bus clock or core
>      crystal clock frequency (when TSC/core crystal clock ratio is
>      enumerated in CPUID leaf 0x15) divided by the value specified in
>      the divide configuration register."
>
> The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
> APIC bus frequency of 1GHz.
>
> The KVM doesn't enumerate CPUID leaf 0x15 to the guest unless the user
> space VMM sets it using KVM_SET_CPUID. If the CPUID leaf 0x15 is
> enumerated, the guest kernel uses it as the APIC bus frequency. If not,
> the guest kernel measures the frequency based on other known timers like
> the ACPI timer or the legacy PIT. As reported by Vishal the TDX guest
> kernel expects a 25MHz timer frequency but gets timer interrupt more
> frequently due to the 1GHz frequency used by KVM.
>
> To ensure that the guest doesn't have a conflicting view of the APIC bus
> frequency, allow the userspace to tell KVM to use the same frequency that
> TDX mandates instead of the default 1Ghz.
>
> There are several options to address this:
> 1. Make the KVM able to configure APIC bus frequency (this series).
>    Pro: It resembles the existing hardware.  The recent Intel CPUs
>         adapts 25MHz.
>    Con: Require the VMM to emulate the APIC timer at 25MHz.
> 2. Make the TDX architecture enumerate CPUID leaf 0x15 to configurable
>    frequency or not enumerate it.
>    Pro: Any APIC bus frequency is allowed.
>    Con: Deviates from TDX architecture.
> 3. Make the TDX guest kernel use 1GHz when it's running on KVM.
>    Con: The kernel ignores CPUID leaf 0x15.
> 4. Change CPUID leaf 0x15 under TDX to report the crystal clock frequency
>    as 1 GHz.
>    Pro: This has been the virtual APIC frequency for KVM guests for 13
>         years.
>    Pro: This requires changing only one hard-coded constant in TDX.
>    Con: It doesn't work with other VMMs as TDX isn't specific to KVM.
>    Con: Core crystal clock frequency is also used to calculate TSC
>         frequency.
>    Con: If it is configured to value different from hardware, it will
>         break the correctness of INTEL-PT Mini Time Count (MTC) packets
>         in TDs.

Reviewed-by: Yuan Yao <yuan.yao@xxxxxxxxx>

>
> Reported-by: Vishal Annapurve <vannapurve@xxxxxxxxxx>
> Closes: https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@xxxxxxxxxx/
> Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
> Co-developed-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
> Signed-off-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
> ---
> Changes v5:
> - Rename capability KVM_CAP_X86_APIC_BUS_FREQUENCY ->
>   KVM_CAP_X86_APIC_BUS_CYCLES_NS. (Xiaoyao Li)
> - Add Rick's Reviewed-by tag.
>
> Changes v4:
> - Rework implementation following Sean's guidance in:
>   https://lore.kernel.org/all/ZdjzIgS6EAeCsUue@xxxxxxxxxx/
> - Reword con #2 to acknowledge feedback. (Sean)
> - Add the "con" information from Xiaoyao during earlier review of v2.
> - Rework changelog to address comments related to "bus clock" vs
>   "core crystal clock" frequency. (Xiaoyao)
> - Drop snippet about impact on TSC deadline timer emulation. (Maxim)
> - Drop Maxim Levitsky's "Reviewed-by" tag due to many changes to patch
>   since tag received.
> - Switch "Subject:" to match custom "KVM: X86:" -> "KVM: x86:"
>
> Changes v3:
> - Added reviewed-by Maxim Levitsky.
> - Minor update of the commit message.
>
> Changes v2:
> - Add check if vcpu isn't created.
> - Add check if lapic chip is in-kernel emulation.
> - Fix build error for i386.
> - Add document to api.rst.
> - Typo in the commit message.
>
>  Documentation/virt/kvm/api.rst | 17 +++++++++++++++++
>  arch/x86/kvm/x86.c             | 27 +++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h       |  1 +
>  3 files changed, 45 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index f0b76ff5030d..f014cd9b2217 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8063,6 +8063,23 @@ error/annotated fault.
>
>  See KVM_EXIT_MEMORY_FAULT for more information.
>
> +7.35 KVM_CAP_X86_APIC_BUS_CYCLES_NS
> +-----------------------------------
> +
> +:Architectures: x86
> +:Target: VM
> +:Parameters: args[0] is the desired APIC bus clock rate, in nanoseconds
> +:Returns: 0 on success, -EINVAL if args[0] contains an invalid value for the
> +          frequency or if any vCPUs have been created, -ENXIO if a virtual
> +          local APIC has not been created using KVM_CREATE_IRQCHIP.
> +
> +This capability sets VM's APIC bus clock frequency, used by KVM's in-kernel
> +virtual APIC when emulating APIC timers.  KVM's default value can be retrieved
> +by KVM_CHECK_EXTENSION.
> +
> +Note: Userspace is responsible for correctly configuring CPUID 0x15, a.k.a. the
> +core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest.
> +
>  8. Other capabilities.
>  ======================
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 10e6315103f4..fa6954c9a9d2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4715,6 +4715,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_MEMORY_FAULT_INFO:
>  		r = 1;
>  		break;
> +	case KVM_CAP_X86_APIC_BUS_CYCLES_NS:
> +		r = APIC_BUS_CYCLE_NS_DEFAULT;
> +		break;
>  	case KVM_CAP_EXIT_HYPERCALL:
>  		r = KVM_EXIT_HYPERCALL_VALID_MASK;
>  		break;
> @@ -6755,6 +6758,30 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		}
>  		mutex_unlock(&kvm->lock);
>  		break;
> +	case KVM_CAP_X86_APIC_BUS_CYCLES_NS: {
> +		u64 bus_cycle_ns = cap->args[0];
> +		u64 unused;
> +
> +		r = -EINVAL;
> +		/*
> +		 * Guard against overflow in tmict_to_ns(). 128 is the highest
> +		 * divide value that can be programmed in APIC_TDCR.
> +		 */
> +		if (!bus_cycle_ns ||
> +		    check_mul_overflow((u64)U32_MAX * 128, bus_cycle_ns, &unused))
> +			break;
> +
> +		r = 0;
> +		mutex_lock(&kvm->lock);
> +		if (!irqchip_in_kernel(kvm))
> +			r = -ENXIO;
> +		else if (kvm->created_vcpus)
> +			r = -EINVAL;
> +		else
> +			kvm->arch.apic_bus_cycle_ns = bus_cycle_ns;
> +		mutex_unlock(&kvm->lock);
> +		break;
> +	}
>  	default:
>  		r = -EINVAL;
>  		break;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 2190adbe3002..6a4d9432ab11 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -917,6 +917,7 @@ struct kvm_enable_cap {
>  #define KVM_CAP_MEMORY_ATTRIBUTES 233
>  #define KVM_CAP_GUEST_MEMFD 234
>  #define KVM_CAP_VM_TYPES 235
> +#define KVM_CAP_X86_APIC_BUS_CYCLES_NS 236
>
>  struct kvm_irq_routing_irqchip {
>  	__u32 irqchip;
> --
> 2.34.1
>
>




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux