On Fri, Jul 13, 2018 at 11:25 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, Jul 13, 2018 at 8:04 PM Pavel Tatashin
> <pasha.tatashin@xxxxxxxxxx> wrote:
> >
> > > You can't just memset() the 'struct page' to zero after it's been set up.
> >
> > That should not be happening, unless there is a bug.
>
> Well, it does seem to happen. My memory stress-tester has been running
> for about half an hour now with the revert I posted - it used to
> trigger the problem in maybe ~5 minutes before.
>
> So I do think that revert fixes it for me. No guarantees, but since I
> figured out how to trigger it, it's been fairly reliable.
>
> > We want to zero those struct pages so we do not have uninitialized
> > data accessed by various parts of the code that rounds down large
> > pages and access the first page in section without verifying that the
> > page is valid. The example of this is described in commit that
> > introduced zero_resv_unavail()
>
> I'm attaching the relevant (?) parts of dmesg, which has the node
> ranges, maybe you can see what the problem with the code is.
>
> (NOTE! This dmesg is with that "mem=6G" command line option, which causes that
>
>   e820: remove [mem 0x180000000-0xfffffffffffffffe] usable
>
> line - that's just because it's my stress-test boot. It happens with
> or without it, but without the "mem=6G" it took days to trigger).
>
> I'm more than willing to test patches (either for added information or
> for testing fixes), although I think I'm getting off the computer for
> today.

Thank you. I am ok with reverting these patches. I will study the bug
that was introduced by "f7f99100d8d9 mm: stop zeroing memory during
allocation in vmemmap", and post a fixed version later.

Thank you,
Pavel