From: Michael Kelley <mhklinux@xxxxxxxxxxx> Sent: Wednesday, April 17, 2024 3:35 PM
>
> From: Michael Schierl <schierlm@xxxxxx> Sent: Wednesday, April 17, 2024 2:08 PM
> >
> > > Don't let the type 10 distract you. It is entirely possible that the
> > > byte corresponding to type == 10 is already part of the corrupted
> > > memory area. Can you check if the DMI table generated by Hyper-V is
> > > supposed to contain type 10 records at all?
> >
> > How? Hyper-V is not open source :-)
>
> I think that request from Jean is targeted to me or the Microsoft
> people on the thread. :-)
>
> >
> > My best guess to get Linux out of the equation would be to boot my
> > trusted MS-DOS 6.2 floppy and use debug.com to dump the DMI:
> >
> > > | A:\>debug
> > > | -df000:93d0 [to inspect]
> > > | -nfromdos.dmi
> > > | -rcx
> > > | CX 0000
> > > | :439B
> > > | -w f000:93d0
> > > | -q
> >
> > The result is byte-for-byte identical to the DMI dump I created from
> > sysfs and pasted earlier in this thread. Of course, it does not have to
> > be identical to the memory situation while it was parsed.
>
> I've been looking at the details of the DMI blob in a Linux VM on my
> local Windows 11 laptop, as well as in a Generation 1 VM in the Azure
> public cloud, which uses Hyper-V. The overall size and layout
> of the DMI blob appears to be the same in both cases. The blob is
> corrupted in the VM on the local laptop, but good in the Azure VM.
>
> I was wondering how to check if the Linux bootloaders and grub
> were somehow corrupting the DMI blob, but now you've
> answered the question by running MS-DOS and dumping the
> contents. Excellent experiment!
>
> I still want to understand why 32-bit Linux is taking an oops during
> boot while 64-bit Linux does not.

The difference is in this statement in dmi_save_devices():

    count = (dm->length - sizeof(struct dmi_header)) / 2;

On a 64-bit system, count is 0xFFFFFFFE.
That's seen as a negative value, and the "for" loop does not do any
iterations. So nothing bad happens. But on a 32-bit system, count is
0x7FFFFFFE. That's a big positive number, and the "for" loop iterates
to non-existent memory as Michael Schierl originally described.

I don't know the "C" rules for mixed signed and unsigned expressions,
and how they differ on 32-bit and 64-bit systems. But that's the cause
of the different behavior.

Regardless of the 32-bit vs. 64-bit behavior, the DMI blob is malformed,
almost certainly as created by Hyper-V. I'll see if I can bring this to
the attention of one of my previous contacts on the Hyper-V team.

Michael

> During boot, I can see that 64-bit
> Linux wanders through the corrupted part of the DMI blob and
> looks at a lot of bogus entries before it gets back on track again.
> But the bogus entries don't cause an oops. Once I figure out
> those details, we still have the corrupted DMI blob, and based on
> your MS-DOS experiment, it's looking like Hyper-V created the
> corrupted form. I want to think more about how to debug that.
>
> FWIW, in comparing the Azure VM with my local VM, it looks like
> the corrupted entry is the first type 4 entry describing a CPU.
>
> Michael Kelley
>
> >
> > > You should also check the memory map (as displayed early at boot, so
> > > near the top of dmesg) and verify that the DMI table is located in a
> > > "reserved" memory area, so that area can't be used for memory
> > > allocation.
> >
> > The e820 memory map was included in the early printk output I posted
> > earlier:
> >
> > > [ 0.000000] BIOS-provided physical RAM map:
> > > [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> > > [ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> > > [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
> > > [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffeffff] usable
> > > [ 0.000000] BIOS-e820: [mem 0x000000007fff0000-0x000000007fffefff] ACPI data
> > > [ 0.000000] BIOS-e820: [mem 0x000000007ffff000-0x000000007fffffff] ACPI NVS
> >
> > And from the dmidecode I pasted earlier:
> >
> > > Table at 0x000F93D0.
> >
> > The size is 0x0000439B, so the last byte should be at 0x000FD76A, well
> > inside the third e820 entry (the second reserved one) - and accessible
> > even from DOS without requiring any extra effort.
> >
> > > So the table starts at physical address 0xba135000, which is in the
> > > following memory map segment:
> > >
> > > reserve setup_data: [mem 0x00000000b87b0000-0x00000000bb77dfff] reserved
> >
> > Looks like UEFI, and well outside the 1MB range :-)
> >
> > > If the whole DMI table IS located in a "reserved" memory area, it can
> > > still get corrupted, but only by code which itself operates on data
> > > located in a reserved memory area.
> > >
> > > Both DMI tables are corrupted, but are they corrupted in the exact same
> > > way?
> >
> > At least the dumped tables are byte-for-byte identical on both OS
> > flavors. And (as I tested above) byte-for-byte identical to a version
> > dumped from MS-DOS.
> >
> >
> > Regards,
> >
> >
> > Michael
>
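
On the mixed signed/unsigned question in the dmi_save_devices() discussion above, here is a minimal C sketch of how the two count values can arise. It rests on assumptions not stated in this thread: that count is a 32-bit int, that sizeof(struct dmi_header) is 4, and that the corrupted entry has a length byte of 0. The function names are hypothetical, and fixed-width types stand in for size_t on each architecture so the sketch behaves the same on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: struct dmi_header is 4 bytes (u8 type, u8 length, u16 handle). */
#define DMI_HEADER_SIZE 4

/* Models the expression when size_t is 32 bits wide (32-bit kernel). */
static int32_t count_on_32bit(uint8_t length)
{
    /* length is converted to the unsigned type, so 0 - 4 wraps to 0xFFFFFFFC. */
    uint32_t diff = (uint32_t)length - (uint32_t)DMI_HEADER_SIZE;
    /* 0xFFFFFFFC / 2 = 0x7FFFFFFE, which still fits in a signed 32-bit int. */
    return (int32_t)(diff / 2);
}

/* Models the expression when size_t is 64 bits wide (64-bit kernel). */
static int32_t count_on_64bit(uint8_t length)
{
    /* 0 - 4 wraps to 0xFFFFFFFFFFFFFFFC; halved, it is 0x7FFFFFFFFFFFFFFE. */
    uint64_t diff = (uint64_t)length - (uint64_t)DMI_HEADER_SIZE;
    /*
     * Storing the 64-bit quotient in a 32-bit int keeps only the low 32 bits
     * (0xFFFFFFFE). The unsigned-to-signed conversion is implementation-defined
     * before C23; gcc and clang wrap, giving -2.
     */
    return (int32_t)(uint32_t)(diff / 2);
}

/* Self-check: assumed corrupted length 0, and a well-formed length of 6. */
static void check_counts(void)
{
    assert(count_on_32bit(0) == 0x7FFFFFFE); /* huge positive: loop overruns */
    assert(count_on_64bit(0) == -2);         /* negative: loop body never runs */
    assert(count_on_32bit(6) == 1);
    assert(count_on_64bit(6) == 1);
}
```

In both models the subtraction is done in the unsigned type and wraps. The only difference is whether the halved result still fits in the 32-bit int it is stored into, so a guard such as checking dm->length against the header size before subtracting would sidestep the problem on either architecture.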