From: Thomas Gleixner <tglx@xxxxxxxxxxxxx> Sent: Tuesday, August 1, 2023 3:25 PM > > Michael! > > On Tue, Aug 01 2023 at 00:12, Thomas Gleixner wrote: > > On Mon, Jul 31 2023 at 21:27, Michael Kelley wrote: > > Clearly the hyper-v BIOS people put a lot of thoughts into this: > > > >> x2APIC features / processor topology (0xb): > >> extended APIC ID = 0 > >> --- level 0 --- > >> level number = 0x0 (0) > >> level type = thread (1) > >> bit width of level = 0x1 (1) > >> number of logical processors at level = 0x2 (2) > >> --- level 1 --- > >> level number = 0x1 (1) > >> level type = core (2) > >> bit width of level = 0x6 (6) > >> number of logical processors at level = 0x40 (64) > > > > FAIL: ^^^^^ > > > > While that field is not meant for topology evaluation it is at least > > expected to tell the actual number of logical processors at that level > > which are actually available. > > > > The CPUID APIC ID aka initial_apicid clearly tells that the topology has > > 36 logical CPUs in package 0 and 36 in package 1 according to your > > table. > > > > On real hardware this looks like this: > > > > --- level 1 --- > > level number = 0x1 (1) > > level type = core (2) > > bit width of level = 0x6 (6) > > number of logical processors at level = 0x38 (56) > > > > Which corresponds to reality and is consistent. But sure, consistency is > > overrated. > > So I looked really hard to find some hint how to detect this situation > on the boot CPU, which allows us to mitigate it, but there is none at > all. > > So we are caught between a rock and a hard place, which provides us two > mutually exclusive options to chose from: > > 1) Have a sane topology evaluation mechanism which solves the known > problems of hybrid systems, wrong sizing estimates and other > unpleasantries. > > 2) Support the Hyper-V BIOS trainwreck forever. > > Unsurprisingly #2 is not really an option as #1 is a crucial issue for > the kernel and we need it resolved urgently as of yesterday. > > So while I'm definitely a strong supporter of no-regression policy, I > have to make an argument here why this particular issue is _not_ > covered: > > 1) Hyper-V BIOS/firmware violates the firmware specification and > requirements which are clearly spelled out in the SDM. > > 2) This violatation is reported on every boot with one promiment > message per brought up AP where the initial APIC ID as provided by > CPUID leaf 0xB deviates from the APIC ID read from "hardware", which is > also provided by MADT starting with CPU 36 in the provided example: > > "[FIRMWARE BUG] CPU36: APIC id mismatch. Firmware: 40 APIC: 24" > > repeating itself up to CPU71 with the relevant diverging APIC IDs. > > At least that's what the upstream kernel produces according to > validate_apic_and_package_id() in such an situation. > > 3) This is known for years and the Hyper-V Linux team tried to get this > resolved, but obviously their arguments fell on deaf ears. > > IOW, the firmware BUG message has been ignored willfully for years > due to "works for me, why should I care?" attitude. > > Seriously, kernel development cannot be held hostage forever by the > wilful ignorance of a BIOS team, which refuses to adhere to > specifications and defines their own world order. > > The x86 maintainer team is chosing the lesser of two evils and lets > those who created the problem and refused to resolve it deal with the > outcome. Fair enough. I don't have any basis to argue otherwise. I'm in discussions with the Hyper-V team about getting it fully fixed in Hyper-V, and it looks like there's some movement to make it happen. > > Just to clarify. This is not preventing affected guests from booting. > The worst consequence is a slight performance regression because the > firmware provided topology information is not matching reality and > therefore the scheduler placement vs. L3 affinity sucks. That's clearly > not a kernel problem. Yes, if Linux will still boots and runs, that helps. Then it really is up the (virtual) firmware in Hyper-V to provide the correct topology information so performance is as expected. > > I'm happy to aid accelerating this thought process by elevating the > existing pr_err(FW_BUG....) to a solid WARN_ON_ONCE(). See below. Your choice. In this particular case, it won't make a difference either way. Michael > > Thanks, > > tglx > --- > --- a/arch/x86/kernel/cpu/common.c > +++ b/arch/x86/kernel/cpu/common.c > @@ -1688,7 +1688,7 @@ static void validate_apic_and_package_id > > apicid = apic->cpu_present_to_apicid(cpu); > > - if (apicid != c->topo.apicid) { > + if (WARN_ON_ONCE(apicid != c->topo.apicid)) { > pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x APIC: %x\n", > cpu, apicid, c->topo.initial_apicid); > }