Re: Early kernel panic in dmi_decode when running 32-bit kernel on Hyper-V on Windows 11

Jean DELVARE <jdelvare@xxxxxxxx> · Wed, 17 Apr 2024 11:43:40 +0200

Hi Michael and Michael,

Thanks to both of you for all the data and early analysis.

On Tue, 2024-04-16 at 23:20 +0000, Michael Kelley wrote:
> Thanks for the information.  I now have a repro of "dmidecode"
> in user space complaining about a zero length entry, when running
> in a Gen 1 VM with a 64-bit Linux guest.  Looking at
> /sys/firmware/dmi/tables/DMI, that section of the DMI blob definitely
> seems messed up.  The handle is 0x0005, which is the next handle in
> sequence, but the length and type of the entry are zero.  This is a bit
> different from the type 10 entry that you saw the 32-bit kernel
> choking on, and I don't have an explanation for that.  After this
> bogus entry, there are a few bytes I don't recognize, then about
> 100 bytes of zeros, which also seems weird.

Don't let the type 10 distract you. It is entirely possible that the
byte corresponding to type == 10 is already part of the corrupted
memory area. Can you check if the DMI table generated by Hyper-V is
supposed to contain type 10 records at all?

This smells like the DMI table has been overwritten by "something".
Either it happened even before boot, that is, the DMI table generated
by the VM itself is corrupted in the first place, or the DMI table was
originally good but other kernel code wrote some data at the same
memory location (I've seen this once in the past, although that was on
bare metal). That would possibly still be the result of bad information
provided by the VM (for example 2 "hardware" features being told to use
overlapping memory ranges).

You should also check the memory map (as displayed early at boot, so
near the top of dmesg) and verify that the DMI table is located in a
"reserved" memory area, so that area can't be used for memory
allocation. Example on my laptop:

# dmidecode 3.4
Getting SMBIOS data from sysfs.
SMBIOS 3.1.1 present.
Table at 0xBA135000.

So the table starts at physical address 0xba135000, which is in the
following memory map segment:

reserve setup_data: [mem 0x00000000b87b0000-0x00000000bb77dfff] reserved

This memory area is marked as "reserved" so all is well. In my case,
the table is 2256 bytes in size (not always displayed by dmidecode by
default, but you can check the size of file
/sys/firmware/dmi/tables/DMI), so the last byte of the table is at
0xba135000 + 0x8d0 - 1 = 0xba1358cf, which is still within the reserved
range.
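
The same arithmetic can be written down as a small check. This is only
a sketch for illustration: the struct and function names are made up
for this example, and the addresses used with it are the ones from my
laptop above, not from the Hyper-V guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive physical address range, as printed in the boot memory map. */
struct mem_range {
	uint64_t start;
	uint64_t end;	/* inclusive, like the "[mem a-b]" lines in dmesg */
};

/* Address of the last byte of a table of "size" bytes starting at "base". */
static uint64_t table_last_byte(uint64_t base, uint64_t size)
{
	assert(size > 0);
	return base + size - 1;
}

/* True if the whole table, first to last byte, lies inside the range. */
static bool table_in_range(uint64_t base, uint64_t size, struct mem_range r)
{
	return base >= r.start && table_last_byte(base, size) <= r.end;
}
```

With base 0xba135000, size 2256 and the range 0xb87b0000-0xbb77dfff,
table_in_range() returns true. The same check has to be repeated with
the table address and size reported in the affected VM.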

If the whole DMI table is NOT located in a "reserved" memory area then
it can get corrupted by any memory allocation.

If the whole DMI table IS located in a "reserved" memory area, it can
still get corrupted, but only by code which itself operates on data
located in a reserved memory area.

> But at this point, it's good that I have a repro. It has been a while since
> I've built and run a 32-bit kernel, but I think I can get that set up with
> the ability to get output during early boot. I'll do some further
> debugging with dmidecode and with the 32-bit kernel to figure out
> what's going on.  There are several mysteries here:  1) Is Hyper-V
> really building a bad DMI blob, or is something else trashing it?

This is a good question. My guess is that the table gets corrupted
afterwards, but better not to assume, and actually check what the table
looks like at generation time, from the host's perspective.

> 2) Why does a 64-bit kernel succeed on the putative bad DMI blob,
> while a 32-bit kernel fails?

Both DMI tables are corrupted, but are they corrupted in the exact same
way?

>   3) Is dmidecode seeing something different from the Linux kernel?

The DMI table is remapped early at boot time and the result is then
read by dmidecode through /sys/firmware/dmi/tables/DMI. To be honest,
I'm not sure if this "remapping" is a one-time copy or if future
corruption would be reflected in the file. In any case, dmidecode can't
possibly see a less corrupted version of the table. The different
outcome is because dmidecode is more robust to invalid input than the
in-kernel parser.

Note that you can force dmidecode to read the table directly from memory
by using the --no-sysfs option.

> Give me a few days to sort all this out.  And if Linux can be made
> more robust in the face of a bad DMI table entry, I'll submit a
> Linux kernel patch for that.

I agree that the in-kernel DMI table parser should not choke on bad
data. dmidecode has an explicit check on "short entries":

		/*
		 * If a short entry is found (less than 4 bytes), not only it
		 * is invalid, but we cannot reliably locate the next entry.
		 * Better stop at this point, and let the user know his/her
		 * table is broken.
		 */
		if (h.length < 4)
		{
			if (!(opt.flags & FLAG_QUIET))
			{
				fprintf(stderr,
					"Invalid entry length (%u). DMI table "
					"is broken! Stop.\n\n",
					(unsigned int)h.length);
				opt.flags |= FLAG_QUIET;
			}
			break;
		}

We need to add something similar to the kernel DMI table parser,
presumably in dmi_scan.c:dmi_decode_table().
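
To illustrate the idea, here is a rough user-space sketch of a table
walker with that check. It is NOT the kernel code and not a proposed
patch: the function name and the "broken" flag are invented for this
example, and it only assumes the usual entry layout (a 4-byte header
holding type, length and handle, then the formatted area, then a string
set terminated by two NUL bytes).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Walk a raw DMI table of "len" bytes and return the number of
 * well-formed entries. *broken is set to 1 if the walk had to stop
 * early because of a short or truncated entry.
 * (Hypothetical helper, for illustration only.)
 */
static int dmi_walk_sketch(const uint8_t *buf, size_t len, int *broken)
{
	const uint8_t *data = buf;
	const uint8_t *end = buf + len;
	int count = 0;

	*broken = 0;
	while ((size_t)(end - data) >= 4) {
		uint8_t length = data[1];	/* header: type, length, handle */
		const uint8_t *next;

		/* Short entry: the next entry cannot be located, so stop. */
		if (length < 4) {
			*broken = 1;
			break;
		}
		/* The formatted area must fit in the remaining bytes. */
		if ((size_t)(end - data) < length) {
			*broken = 1;
			break;
		}
		/* Skip the string set, which ends with two NUL bytes. */
		next = data + length;
		while ((size_t)(end - next) >= 2 && (next[0] || next[1]))
			next++;
		if ((size_t)(end - next) < 2) {
			*broken = 1;
			break;
		}
		count++;
		data = next + 2;
	}
	return count;
}
```

Fed with two valid entries followed by an entry whose type and length
are both zero (as in Michael's dump), this stops after the second entry
and sets the flag, instead of looping or reading garbage.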

-- 
Jean Delvare
SUSE L3 Support