On Wed, Feb 02, 2022, Sean Christopherson wrote: > On Thu, Jan 13, 2022, Sean Christopherson wrote: > > On Tue, Jan 11, 2022, Maxim Levitsky wrote: > > > Both Intel and AMD's PRM also state that changing APIC ID is implementation > > > dependent. > > > > > > I vote to forbid changing apic id, at least in the case any APIC acceleration > > > is used, be that APICv or AVIC. > > > > That has my vote as well. For IPIv in particular there's not much concern with > > backwards compability, i.e. we can tie the behavior to enable_ipiv. > > Hrm, it may not be that simple. There's some crusty (really, really crusty) code > in Linux's boot code that writes APIC_ID. IIUC, the intent is to play nice with > running a UP crash dump kernel on "BSP" that isn't "the BSP", e.g. has a non-zero > APIC ID. ... > It's entirely possible that this path is unused in a KVM guest, but I don't think > we can know that with 100% certainty. > > But I also completely agree that attempting to keep the tables up-to-date is ugly > and a waste of time and effort, e.g. as Maxim pointed out, the current AVIC code > is comically broken. > > Rather than disallowing the write, what if we add yet another inhibit that disables > APICv if IPI virtualization is enabled and a vCPU has an APIC ID != vcpu_id? KVM > is equipped to handle the emulation, so it just means that a guest that's doing > weird things loses a big of performance. LOL, this is all such a mess. The x2apic ID is actually indirectly writable on AMD CPUs. Per the APM: A value previously written by software to the 8-bit APIC_ID register (MMIO offset 30h) is converted by hardware into the appropriate format and reflected into the 32-bit x2APIC_ID register (MSR 802h). I confirmed this on hardware (Milan). That means KVM's handling of the x2APIC ID in kvm_lapic_set_base() is wrong, at least with respect to AMD. Intel's SDM is a bit vague. I _think_ it means Intel CPUs treat the the x2APIC ID and xAPIC ID as two completely independent assets. I haven't been able to glean any info from hardware because writes to the legacy xAPIC ID are ignored on all CPUs I've tested (Haswell and Cascade lake). The x2APIC ID (32 bits) and the legacy local xAPIC ID (8 bits) are preserved across this transition. Given that the xAPIC ID appears to no longer be writable on Intel CPUs, it's impossible that _generic_ kernel code can rely on xAPIC ID being writable. That just leaves the aforementioned amd_numa_init() crud. Linux's handling of that is: void __init x86_numa_init(void) { if (!numa_off) { #ifdef CONFIG_ACPI_NUMA if (!numa_init(x86_acpi_numa_init)) return; #endif #ifdef CONFIG_AMD_NUMA if (!numa_init(amd_numa_init)) return; #endif } numa_init(dummy_numa_init); } i.e. ACPI_NUMA gets priority and thus amd_numa_init() will never be reached if the NUMA topology is enumerated in the ACPI tables. Furthermore, the VMM would have to actually emulate an old AMD northbridge, which is also extremely unlikely. The odds of breaking a guest are further diminised given that KVM doesn't emulate the xAPIC ID => x2APIC ID hilarity on AMD CPUs and no one has complained. So, rather than tie this to IPI virtualization, I think we should either make the xAPIC ID read-only across the board, or if we want to hedge in case someone has a crazy use case, make the xAPIC ID read-only by default, add a module param to let userspace opt-in to a writable xAPIC ID, and report x2APIC and APICv as unsupported if the xAPIC ID is writable. E.g. rougly this, plus your AVIC patches if we want to hedge. diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 28be02adc669..32854ac403a8 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -539,8 +539,15 @@ void kvm_set_cpu_caps(void) 0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) | F(F16C) | F(RDRAND) ); - /* KVM emulates x2apic in software irrespective of host support. */ - kvm_cpu_cap_set(X86_FEATURE_X2APIC); + /* + * KVM emulates x2apic in software irrespective of host support. Due + * to architecturally difference between Intel and AMD, x2APIC is not + * supported if the xAPIC ID is writable. + */ + if (!xapic_id_writable) + kvm_cpu_cap_set(X86_FEATURE_X2APIC); + else + kvm_cpu_cap_clear(X86_FEATURE_X2APIC); kvm_cpu_cap_mask(CPUID_1_EDX, F(FPU) | F(VME) | F(DE) | F(PSE) | diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 670361bf1d81..6b42b65e7a42 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2047,10 +2047,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val) switch (reg) { case APIC_ID: /* Local APIC ID */ - if (!apic_x2apic_mode(apic)) + if (apic_x2apic_mode(apic)) + ret = 1; + else if (xapic_id_writable) kvm_apic_set_xapic_id(apic, val >> 24); - else - ret = 1; break; case APIC_TASKPRI: diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 9cef8e4598df..71a3bcdb3317 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4743,7 +4743,8 @@ static __init int svm_hardware_setup(void) nrips = false; } - enable_apicv = avic = avic && npt_enabled && boot_cpu_has(X86_FEATURE_AVIC); + enable_apicv = avic = avic && !xapic_id_writable && npt_enabled && + boot_cpu_has(X86_FEATURE_AVIC); if (enable_apicv) { pr_info("AVIC enabled\n"); diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 1b135473677b..fad7b36fbb1d 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7939,7 +7939,7 @@ static __init int hardware_setup(void) ple_window_shrink = 0; } - if (!cpu_has_vmx_apicv()) + if (!cpu_has_vmx_apicv() || xapic_id_writable) enable_apicv = 0; if (!enable_apicv) vmx_x86_ops.sync_pir_to_irr = NULL; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index eaad2f485b64..ef6eba8c832a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -174,6 +174,10 @@ module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR); static int __read_mostly lapic_timer_advance_ns = -1; module_param(lapic_timer_advance_ns, int, S_IRUGO | S_IWUSR); +bool __read_mostly xapic_id_writable; +module_param(xapic_id_writable, bool, 0444); +EXPORT_SYMBOL_GPL(xapic_id_writable); + static bool __read_mostly vector_hashing = true; module_param(vector_hashing, bool, S_IRUGO); diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 1ebd5a7594da..142663ff9cba 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -346,6 +346,7 @@ static inline bool kvm_mpx_supported(void) extern unsigned int min_timer_period_us; +extern bool xapic_id_writable; extern bool enable_vmware_backdoor; extern int pi_inject_timer;