On 14 July 2018 at 02:20, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Jul 13, 2018 at 4:51 PM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> I'm building a "replace VM_BUG_ON() with proper printk's instead" right now.
>
> Ok, the machine now stays up, and I get messages like
>
>     Removed VM_BUG_ON()!
>      pfn c2400 - c25ff
>      zone DMA32 DMA
>      zone pfn 1000 1
>
>     Removed VM_BUG_ON()!
>      pfn c0a00 - c0bff
>      zone DMA32 DMA
>      zone pfn 1000 1
>
>     Removed VM_BUG_ON()!
>      pfn c2200 - c23ff
>      zone DMA DMA32
>      zone pfn 1 1000
>
> instead.
>
> That's from
>
> +               printk("Removed VM_BUG_ON()!\n");
> +               printk(" pfn %lx - %lx\n", page_to_pfn(start_page),
> +                       page_to_pfn(end_page));
> +               printk(" zone %s %s\n", page_zone(start_page)->name,
> +                       page_zone(end_page)->name);
> +               printk(" zone pfn %lx %lx\n",
> +                       page_zone(start_page)->zone_start_pfn,
> +                       page_zone(end_page)->zone_start_pfn);
>
> inside an if() statement that replaced that VM_BUG_ON().
>
> WTF? That's just odd.
>
> But everything seems to work fine, and now it doesn't crash.
>
> But there's something really odd going on wrt page_zone() and/or page_to_pfn().
>
> page_to_pfn() implies this is just regular memory in the 3GB area. It
> is likely related to this:
>
>   BIOS-e820: [mem 0x00000000c0b33000-0x00000000c226cfff] reserved
>   BIOS-e820: [mem 0x00000000c226d000-0x00000000c227efff] ACPI data
>   BIOS-e820: [mem 0x00000000c227f000-0x00000000c2439fff] usable
>   BIOS-e820: [mem 0x00000000c243a000-0x00000000c2a61fff] ACPI NVS
>   BIOS-e820: [mem 0x00000000c2a62000-0x00000000c32fefff] reserved
>   BIOS-e820: [mem 0x00000000c32ff000-0x00000000c32fffff] usable
>   BIOS-e820: [mem 0x00000000c3300000-0x00000000c7ffffff] reserved
>
> I dunno. It's a bit odd. I'm not sure I understand that VM_BUG_ON().
> Adding Ard (who worked on the memblock_next_valid_pfn() thing not that
> long ago) and must have hit this same BUG_ON() because he modified it
> not that long ago.
>
> Ard, I triggered the VM_BUG_ON() in mm/page_alloc.c:2016, with a call trace of
>
>   RIP: move_freepages_block()
>   Call Trace:
>     steal_suitable_fallback
>     get_page_from_freelist
>     ...
>
> just for some context.
>

I won't have time to dig into this before the middle of the week, but at
first glance, it seems that those reserved regions have been assigned to
the wrong zone. Given that they are reserved, that by itself probably does
not matter, and I'd be more interested in understanding where those
intervals that start or end right in the middle of a reserved region are
coming from.