On Wed, 2022-03-09 at 06:01 +0000, Sean Christopherson wrote: > TL;DR: Maxim, any objection to yet another inhibit? Any potential issues you can think of? > > On Wed, Mar 09, 2022, Chao Gao wrote: > > On Tue, Mar 08, 2022 at 11:04:01PM +0000, Sean Christopherson wrote: > > > On Fri, Feb 25, 2022, Zeng Guang wrote: > > > > From: Maxim Levitsky <mlevitsk@xxxxxxxxxx> > > > > > > > > No normal guest has any reason to change physical APIC IDs, > > > > > > I don't think we can reasonably assume this, my analysis in the link (that I just > > > realized I deleted from context here) shows it's at least plausible that an existing > > > guest could rely on the APIC ID being writable. And that's just one kernel, who > > > know what else is out there, especially given that people use KVM to emulate really > > > old stuff, often on really old hardware. > > > > Making xAPIC ID readonly is not only based on your analysis, but also Intel SDM > > clearly saying writable xAPIC ID is processor model specific and ***software should > > avoid writing to xAPIC ID***. > > Intel isn't the only vendor KVM supports, and xAPIC ID is fully writable according > to AMD's docs and AMD's hardware. x2APIC is even (indirectly) writable, but luckily > KVM has never modeled that... > > Don't get me wrong, I would love to make xAPIC ID read-only, and I fully agree > that the probability of breaking someone's setup is very low, I just don't think > the benefits of forcing it are worth the risk of breaking userspace. > > > If writable xAPIC ID support should be retained and is tied to a module param, > > live migration would depend on KVM's module params: e.g., migrate a VM with > > modified xAPIC ID (apic_id_readonly off on this system) to one with > > xapic_id_readonly on would fail, right? Is this failure desired? > > Hrm, I was originally thinking it's not a terrible outcome, but I was assuming > that userspace would gracefully handle migration failure. That's a bad assumption. > > > if not, we need to have a VM-scope control. e.g., add an inhibitor of APICv > > (XAPIC_ID_MODIFIED) and disable APICv forever for this VM if its vCPUs or > > QEMU modifies xAPIC ID. > > Inhibiting APICv if IPIv is enabled (implied for AMD's AVIC) is probably a better > option than a module param. I was worried about ending up with silently degraded > VM performance, but that's easily solved by adding a stat to track APICv inhibitions, > which would be useful for other cases too (getting AMD's AVIC enabled is comically > difficult). > > That would also let us drop the code buggy avic_handle_apic_id_update(). > > And it wouldn't necessarily have to be forever, though I agree that's a perfectly > fine approach until we have data that shows anything fancier is necessary. > > > > Practically speaking, anyone that wants to deploy IPIv is going to have to make > > > the switch at some point, but that doesn't help people running legacy crud that > > > don't care about IPIv. > > > > > > I was thinking a module param would be trivial, and it is (see below) if the > > > param is off by default. A module param will also provide a convenient opportunity > > > to resolve the loophole reported by Maxim[1][2], though it's a bit funky. > > > > Could you share the links? > > Doh, sorry (they're both in this one). > > https://lore.kernel.org/all/20220301135526.136554-5-mlevitsk@xxxxxxxxxx > My opinion on this subject is very simple: we need to draw the line somewhere. There is balance between supporting (poorly) unused hardware features and not supporting them at all. Writable APIC id is not just some legacy feature like task switch but a feature that is frowned upon in both Intel and AMD manual: Yes, look at AMD's SDM at: "16.3.3 Local APIC ID Unique local APIC IDs are assigned to each CPU core in the system. The value is determined by hardware, based on the number of CPU cores on the processor and the node ID of the processor. The APIC ID is located in the APIC ID register at APIC offset 20h. See Figure 16-3. It is model dependent, whether software can modify the APIC ID Register. The initial value of the APIC ID (after a reset) is the value returned in CPUID function 0000_0001h_EBX[31:24]." Also in section '16.12 x2APIC_ID' of SDM it is mentioned: RDMSR. An RDMSR of MSR 0802h returns the x2APIC_ID in EAX[31:0]. The x2APIC_ID is a read-only register. Attempting to write MSR 802h or attempting to read this MSR when not in x2APIC mode causes a #GP(0) exception. See 16.11 “Accessing x2APIC Registers”. CPUID. The x2APIC ID is reported by the following CPUID functions Fn0000_000B (Extended Topology Enumeration) and CPUID Fn8000_001E (Extended APIC ID) as follows: