Michael! On Tue, Aug 01 2023 at 00:12, Thomas Gleixner wrote: > On Mon, Jul 31 2023 at 21:27, Michael Kelley wrote: > Clearly the hyper-v BIOS people put a lot of thoughts into this: > >> x2APIC features / processor topology (0xb): >> extended APIC ID = 0 >> --- level 0 --- >> level number = 0x0 (0) >> level type = thread (1) >> bit width of level = 0x1 (1) >> number of logical processors at level = 0x2 (2) >> --- level 1 --- >> level number = 0x1 (1) >> level type = core (2) >> bit width of level = 0x6 (6) >> number of logical processors at level = 0x40 (64) > > FAIL: ^^^^^ > > While that field is not meant for topology evaluation it is at least > expected to tell the actual number of logical processors at that level > which are actually available. > > The CPUID APIC ID aka initial_apicid clearly tells that the topology has > 36 logical CPUs in package 0 and 36 in package 1 according to your > table. > > On real hardware this looks like this: > > --- level 1 --- > level number = 0x1 (1) > level type = core (2) > bit width of level = 0x6 (6) > number of logical processors at level = 0x38 (56) > > Which corresponds to reality and is consistent. But sure, consistency is > overrated. So I looked really hard to find some hint how to detect this situation on the boot CPU, which allows us to mitigate it, but there is none at all. So we are caught between a rock and a hard place, which provides us two mutually exclusive options to chose from: 1) Have a sane topology evaluation mechanism which solves the known problems of hybrid systems, wrong sizing estimates and other unpleasantries. 2) Support the Hyper-V BIOS trainwreck forever. Unsurprisingly #2 is not really an option as #1 is a crucial issue for the kernel and we need it resolved urgently as of yesterday. So while I'm definitely a strong supporter of no-regression policy, I have to make an argument here why this particular issue is _not_ covered: 1) Hyper-V BIOS/firmware violates the firmware specification and requirements which are clearly spelled out in the SDM. 2) This violatation is reported on every boot with one promiment message per brought up AP where the initial APIC ID as provided by CPUID leaf 0xB deviates from the APIC ID read from "hardware", which is also provided by MADT starting with CPU 36 in the provided example: "[FIRMWARE BUG] CPU36: APIC id mismatch. Firmware: 40 APIC: 24" repeating itself up to CPU71 with the relevant diverging APIC IDs. At least that's what the upstream kernel produces according to validate_apic_and_package_id() in such an situation. 3) This is known for years and the Hyper-V Linux team tried to get this resolved, but obviously their arguments fell on deaf ears. IOW, the firmware BUG message has been ignored willfully for years due to "works for me, why should I care?" attitude. Seriously, kernel development cannot be held hostage forever by the wilful ignorance of a BIOS team, which refuses to adhere to specifications and defines their own world order. The x86 maintainer team is chosing the lesser of two evils and lets those who created the problem and refused to resolve it deal with the outcome. Just to clarify. This is not preventing affected guests from booting. The worst consequence is a slight performance regression because the firmware provided topology information is not matching reality and therefore the scheduler placement vs. L3 affinity sucks. That's clearly not a kernel problem. I'm happy to aid accelerating this thought process by elevating the existing pr_err(FW_BUG....) to a solid WARN_ON_ONCE(). See below. Thanks, tglx --- --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1688,7 +1688,7 @@ static void validate_apic_and_package_id apicid = apic->cpu_present_to_apicid(cpu); - if (apicid != c->topo.apicid) { + if (WARN_ON_ONCE(apicid != c->topo.apicid)) { pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x APIC: %x\n", cpu, apicid, c->topo.initial_apicid); }