On 07/13/2018 08:28 PM, Linus Torvalds wrote:
> On Fri, Jul 13, 2018 at 5:20 PM Pavel Tatashin
> <pasha.tatashin@xxxxxxxxxx> wrote:
>>
>> I'd like to try to reproduce it as well, were you able to reproduce this problem in qemu? What were the qemu arguments if so?
>
> No, this is actually on raw hardware. I've had a unstable machine for
> the last couple of weeks, and it just hung with no sign of where.
>
> I finally reproduced it reliably by booting with less memory
> ("mem=6G") and then putting the machine under memory pressure and then
> I could get it on the console when the machine died. Before that it
> was just an occasional hung machine randomly every other day or
> whatever.
>
> If it reproduces in emulation, that will certainly make it easier to
> see the messages.
>
> But since I suspect it might be related to having that odd (read: real
> life) e820 table setup, it might not reproduce in emulation. At least
> when I boot up in lkvm-run, I don't see those ACPI tables and ACPI NVS
> sections, which seems to be related to this.
>
> I'm attaching my kernel-config (this is the non-debug one - it does
> have CONFIG_DEBUG_VM, but none of the other debug options I ran with
> for the last few days in the hope of catching it earlier).
>
> Linus
>

I will try to reproduce it on bare metal.

I believe the problem was narrowed down to this commit:

124049decbb1 x86/e820: put !E820_TYPE_RAM regions into memblock.reserved

The commit intends to zero the memmap (struct pages) for every hole in the
e820 ranges by marking those holes reserved in memblock. Later,
zero_resv_unavail() walks the memmap ranges and zeroes the struct page for
every page that is reserved but has no physical backing known to the kernel.

Pavel
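
For readers following along, the rule described above (zero the struct page of every page that is reserved but not backed by known memory) can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the kernel code: the function names, the dict standing in for the memmap, the 0xdead "garbage" marker, and the example ranges are all hypothetical, and page-aligned ranges are assumed.

```python
# Toy user-space model of the idea only; NOT the kernel implementation.
# All names and values here are hypothetical illustrations.
PAGE_SIZE = 4096


def pfn_range(start, end):
    """Page frame numbers touched by the byte range [start, end)."""
    return range(start // PAGE_SIZE, (end + PAGE_SIZE - 1) // PAGE_SIZE)


def zero_reserved_unavailable(memory, reserved, memmap):
    """Zero memmap entries for pages that are reserved but not in memory.

    memory and reserved are lists of (start, end) byte ranges; memmap maps
    a pfn to a stand-in for its struct page. Returns the number of entries
    that were zeroed.
    """
    backed = set()
    for start, end in memory:
        backed.update(pfn_range(start, end))

    to_zero = set()
    for start, end in reserved:
        for pfn in pfn_range(start, end):
            # Only pages with a memmap entry and no known backing qualify.
            if pfn in memmap and pfn not in backed:
                to_zero.add(pfn)

    for pfn in to_zero:
        memmap[pfn] = 0
    return len(to_zero)


# Two RAM ranges (pfns 0-3 and 8-11) with a hole at pfns 4-7.
memory = [(0 * PAGE_SIZE, 4 * PAGE_SIZE), (8 * PAGE_SIZE, 12 * PAGE_SIZE)]
# A reserved range (pfns 2-5) that straddles RAM and the hole.
reserved = [(2 * PAGE_SIZE, 6 * PAGE_SIZE)]
# Every entry starts out as uninitialized "garbage".
memmap = {pfn: 0xDEAD for pfn in range(12)}

zeroed = zero_reserved_unavailable(memory, reserved, memmap)
print(zeroed, sorted(pfn for pfn, page in memmap.items() if page == 0))
```

In this model only pfns 4 and 5 are zeroed. Pfns 6 and 7 sit in the hole but are not marked reserved, so their entries keep the garbage value; in this toy model, that is the kind of gap that marking the !E820_TYPE_RAM regions as reserved would close.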