Re: [PATCH v5 7/8] KVM: VMX: Update PID-pointer table entry when APIC ID is changed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 02, 2022, Sean Christopherson wrote:
> On Thu, Jan 13, 2022, Sean Christopherson wrote:
> > On Tue, Jan 11, 2022, Maxim Levitsky wrote:
> > > Both Intel and AMD's PRM also state that changing APIC ID is implementation
> > > dependent.
> > >  
> > > I vote to forbid changing apic id, at least in the case any APIC acceleration
> > > is used, be that APICv or AVIC.
> > 
> > That has my vote as well.  For IPIv in particular there's not much concern with
> > backwards compability, i.e. we can tie the behavior to enable_ipiv.
> 
> Hrm, it may not be that simple.  There's some crusty (really, really crusty) code
> in Linux's boot code that writes APIC_ID.  IIUC, the intent is to play nice with
> running a UP crash dump kernel on "BSP" that isn't "the BSP", e.g. has a non-zero
> APIC ID.
 
...

> It's entirely possible that this path is unused in a KVM guest, but I don't think
> we can know that with 100% certainty.
> 
> But I also completely agree that attempting to keep the tables up-to-date is ugly
> and a waste of time and effort, e.g. as Maxim pointed out, the current AVIC code
> is comically broken.
> 
> Rather than disallowing the write, what if we add yet another inhibit that disables
> APICv if IPI virtualization is enabled and a vCPU has an APIC ID != vcpu_id?  KVM
> is equipped to handle the emulation, so it just means that a guest that's doing 
> weird things loses a big of performance.

LOL, this is all such a mess.  The x2apic ID is actually indirectly writable on
AMD CPUs.  Per the APM:

  A value previously written by software to the 8-bit APIC_ID register (MMIO offset 30h) is
  converted by hardware into the appropriate format and reflected into the 32-bit x2APIC_ID
  register (MSR 802h).

I confirmed this on hardware (Milan).  That means KVM's handling of the x2APIC ID
in kvm_lapic_set_base() is wrong, at least with respect to AMD.

Intel's SDM is a bit vague.  I _think_ it means Intel CPUs treat the the x2APIC ID
and xAPIC ID as two completely independent assets.  I haven't been able to glean any
info from hardware because writes to the legacy xAPIC ID are ignored on all CPUs
I've tested (Haswell and Cascade lake).

  The x2APIC ID (32 bits) and the legacy local xAPIC ID (8 bits) are preserved
  across this transition.

Given that the xAPIC ID appears to no longer be writable on Intel CPUs, it's
impossible that _generic_ kernel code can rely on xAPIC ID being writable.  That
just leaves the aforementioned amd_numa_init() crud.

Linux's handling of that is:

  void __init x86_numa_init(void)
  {
	if (!numa_off) {
  #ifdef CONFIG_ACPI_NUMA
		if (!numa_init(x86_acpi_numa_init))
			return;
  #endif
  #ifdef CONFIG_AMD_NUMA
		if (!numa_init(amd_numa_init))
			return;
  #endif
	}

	numa_init(dummy_numa_init);
  }

i.e. ACPI_NUMA gets priority and thus amd_numa_init() will never be reached if
the NUMA topology is enumerated in the ACPI tables.  Furthermore, the VMM would
have to actually emulate an old AMD northbridge, which is also extremely unlikely.

The odds of breaking a guest are further diminised given that KVM doesn't emulate
the xAPIC ID => x2APIC ID hilarity on AMD CPUs and no one has complained.

So, rather than tie this to IPI virtualization, I think we should either make
the xAPIC ID read-only across the board, or if we want to hedge in case someone
has a crazy use case, make the xAPIC ID read-only by default, add a module param
to let userspace opt-in to a writable xAPIC ID, and report x2APIC and APICv as
unsupported if the xAPIC ID is writable.  E.g. rougly this, plus your AVIC patches
if we want to hedge.

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 28be02adc669..32854ac403a8 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -539,8 +539,15 @@ void kvm_set_cpu_caps(void)
                0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
                F(F16C) | F(RDRAND)
        );
-       /* KVM emulates x2apic in software irrespective of host support. */
-       kvm_cpu_cap_set(X86_FEATURE_X2APIC);
+       /*
+        * KVM emulates x2apic in software irrespective of host support.  Due
+        * to architecturally difference between Intel and AMD, x2APIC is not
+        * supported if the xAPIC ID is writable.
+        */
+       if (!xapic_id_writable)
+               kvm_cpu_cap_set(X86_FEATURE_X2APIC);
+       else
+               kvm_cpu_cap_clear(X86_FEATURE_X2APIC);

        kvm_cpu_cap_mask(CPUID_1_EDX,
                F(FPU) | F(VME) | F(DE) | F(PSE) |
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 670361bf1d81..6b42b65e7a42 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2047,10 +2047,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)

        switch (reg) {
        case APIC_ID:           /* Local APIC ID */
-               if (!apic_x2apic_mode(apic))
+               if (apic_x2apic_mode(apic))
+                       ret = 1;
+               else if (xapic_id_writable)
                        kvm_apic_set_xapic_id(apic, val >> 24);
-               else
-                       ret = 1;
                break;

        case APIC_TASKPRI:
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9cef8e4598df..71a3bcdb3317 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4743,7 +4743,8 @@ static __init int svm_hardware_setup(void)
                        nrips = false;
        }

-       enable_apicv = avic = avic && npt_enabled && boot_cpu_has(X86_FEATURE_AVIC);
+       enable_apicv = avic = avic && !xapic_id_writable && npt_enabled &&
+                      boot_cpu_has(X86_FEATURE_AVIC);

        if (enable_apicv) {
                pr_info("AVIC enabled\n");
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1b135473677b..fad7b36fbb1d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7939,7 +7939,7 @@ static __init int hardware_setup(void)
                ple_window_shrink = 0;
        }

-       if (!cpu_has_vmx_apicv())
+       if (!cpu_has_vmx_apicv() || xapic_id_writable)
                enable_apicv = 0;
        if (!enable_apicv)
                vmx_x86_ops.sync_pir_to_irr = NULL;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eaad2f485b64..ef6eba8c832a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -174,6 +174,10 @@ module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR);
 static int __read_mostly lapic_timer_advance_ns = -1;
 module_param(lapic_timer_advance_ns, int, S_IRUGO | S_IWUSR);

+bool __read_mostly xapic_id_writable;
+module_param(xapic_id_writable, bool, 0444);
+EXPORT_SYMBOL_GPL(xapic_id_writable);
+
 static bool __read_mostly vector_hashing = true;
 module_param(vector_hashing, bool, S_IRUGO);

diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 1ebd5a7594da..142663ff9cba 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -346,6 +346,7 @@ static inline bool kvm_mpx_supported(void)

 extern unsigned int min_timer_period_us;

+extern bool xapic_id_writable;
 extern bool enable_vmware_backdoor;

 extern int pi_inject_timer;




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux