On Sun, 2019-03-17 at 15:22 +0000, Mel Gorman wrote:
> On Fri, Mar 15, 2019 at 04:58:27PM -0400, Daniel Jordan wrote:
> > On Tue, Mar 12, 2019 at 10:55:27PM +0500, Mikhail Gavrilov wrote:
> > > Hi folks.
> > > I am observed kernel panic after updated to git commit 610cd4eadec4.
> > > I am did not make git bisect because this crashes occurs spontaneously
> > > and I not have exactly instruction how reproduce it.
> > >
> > > Hope backtrace below could help understand how fix it:
> > >
> > > page:ffffef46607ce000 is uninitialized and poisoned
> > > raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> > > raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> > > page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> > > ------------[ cut here ]------------
> > > kernel BUG at include/linux/mm.h:1020!
> > > invalid opcode: 0000 [#1] SMP NOPTI
> > > CPU: 1 PID: 118 Comm: kswapd0 Tainted: G C
> > > 5.1.0-0.rc0.git4.1.fc31.x86_64 #1
> > > Hardware name: System manufacturer System Product Name/ROG STRIX
> > > X470-I GAMING, BIOS 1201 12/07/2018
> > > RIP: 0010:__reset_isolation_pfn+0x244/0x2b0
> >
> > This is new code, from e332f741a8dd1 ("mm, compaction: be selective about
> > what pageblocks to clear skip hints"), so I added some folks.
> >
>
> I'm travelling at the moment and only online intermittently but I think
> it's worth noting that the check being tripped is during a call to
> page_zone() that also happened before the patch was merged too. I don't
> think it's a new check as such. I haven't been able to isolate a source
> of corruption in the series yet and suspected in at least one case that
> there is another source of corruption that is causing unrelated
> subsystems to trip over.
>

So reverting this patch on top of mainline fixed the memory corruption for
me, or at least made it much harder to reproduce.

dbe2d4e4f12e ("mm, compaction: round-robin the order while searching the free lists for a target")

This is easy to reproduce on both KVM and bare metal using the following
reproducer:

# swapoff -a
# i=0; while :; do i=$((i+1)); echo $i | tee /tmp/log ; /opt/ltp/testcases/bin/oom01; sleep 5; done

The memory corruption always happens within 300 tries. With the above patch
reverted, both mainline and linux-next have survived 1k+ attempts so far.
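
To confirm a revert over a fixed number of runs (like the 1k+ attempts
above), the same loop could be wrapped in a bounded helper. This is only a
sketch: the function name repro_loop, its argument order and the
stop-on-first-failure behaviour are assumptions, not part of LTP, and it
treats a non-zero exit from oom01 as a failed run. A kernel BUG like the one
quoted above may take the machine down before the function returns; in that
case the last attempt number is what was echoed to the console and written
to the log file, exactly as with the original one-liner.

```shell
# Hypothetical helper (not part of LTP): run CMD up to MAX times, writing
# the current attempt number to LOG (and stdout) before each run, and pause
# PAUSE seconds between runs.
# Usage: repro_loop CMD MAX [LOG] [PAUSE]
repro_loop() {
    cmd=$1
    max=$2
    log=${3:-/tmp/log}
    pause=${4:-5}
    i=0
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        # Record the attempt number first, as the original loop does.
        echo "$i" | tee "$log"
        # $cmd is left unquoted so a command with arguments is word-split.
        if ! $cmd; then
            echo "attempt $i: '$cmd' failed" >&2
            return 1
        fi
        sleep "$pause"
    done
    echo "survived $max attempts"
}
```

For example, after running swapoff -a as root:
repro_loop /opt/ltp/testcases/bin/oom01 1000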