Re: kernel BUG at include/linux/mm.h:1020!

On Sun, 2019-03-17 at 15:22 +0000, Mel Gorman wrote:
> On Fri, Mar 15, 2019 at 04:58:27PM -0400, Daniel Jordan wrote:
> > On Tue, Mar 12, 2019 at 10:55:27PM +0500, Mikhail Gavrilov wrote:
> > > Hi folks.
> > > I observed a kernel panic after updating to git commit 610cd4eadec4.
> > > I did not run a git bisect because the crash occurs spontaneously
> > > and I do not have exact instructions for reproducing it.
> > > 
> > > I hope the backtrace below helps in understanding how to fix it:
> > > 
> > > page:ffffef46607ce000 is uninitialized and poisoned
> > > raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> > > raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> > > page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> > > ------------[ cut here ]------------
> > > kernel BUG at include/linux/mm.h:1020!
> > > invalid opcode: 0000 [#1] SMP NOPTI
> > > CPU: 1 PID: 118 Comm: kswapd0 Tainted: G         C
> > > 5.1.0-0.rc0.git4.1.fc31.x86_64 #1
> > > Hardware name: System manufacturer System Product Name/ROG STRIX
> > > X470-I GAMING, BIOS 1201 12/07/2018
> > > RIP: 0010:__reset_isolation_pfn+0x244/0x2b0
> > 
> > This is new code, from e332f741a8dd1 ("mm, compaction: be selective about
> > what pageblocks to clear skip hints"), so I added some folks.
> > 
> 
> I'm travelling at the moment and only online intermittently, but I think
> it's worth noting that the check being tripped is in a call to
> page_zone() that also happened before the patch was merged. I don't
> think it's a new check as such. I haven't been able to isolate a source
> of corruption in the series yet, and I suspected in at least one case that
> there is another source of corruption that is causing unrelated
> subsystems to trip over it.
> 

Reverting the following patch on top of mainline fixed the memory corruption
for me, or at least made it much harder to reproduce.

dbe2d4e4f12e ("mm, compaction: round-robin the order while searching the free
lists for a target")

The corruption is easy to reproduce on both KVM and bare metal using the
following reproducer:

# swapoff -a
# i=0; while :; do i=$((i+1)); echo $i | tee /tmp/log ;
/opt/ltp/testcases/bin/oom01; sleep 5; done

The memory corruption always happens within 300 tries. With the above patch
reverted, both mainline and linux-next have survived 1k+ attempts so far.
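
For unattended runs, the reproducer loop could be bounded and made to stop as
soon as a BUG line shows up in the kernel log. The following is only a sketch
under assumptions: run_repro, bug_seen and REPRO_LOG are hypothetical names
introduced here for illustration, and grepping dmesg for "kernel BUG at" is one
possible detection method, not something the original report used. Old BUG
lines already in the log would match as well, so start from a fresh boot or
clear the ring buffer with "dmesg -C" first.

```shell
#!/bin/sh
# Sketch of a bounded reproducer loop. All function and variable names
# here are hypothetical helpers, not part of the original report.

# Return 0 if the kernel log contains a BUG line (assumed detection method).
bug_seen() {
    dmesg | grep -q 'kernel BUG at'
}

# run_repro MAX_TRIES SLEEP_SECS CMD [ARGS...]
# Run CMD up to MAX_TRIES times, recording the attempt number in
# $REPRO_LOG (default /tmp/log, as in the original loop).
# Returns 1 as soon as bug_seen succeeds, 0 if all tries pass.
run_repro() {
    max_tries=$1
    pause=$2
    shift 2
    i=0
    while [ "$i" -lt "$max_tries" ]; do
        i=$((i + 1))
        echo "$i" | tee "${REPRO_LOG:-/tmp/log}"
        # Exit status is deliberately ignored: the test case may fail
        # or be OOM-killed without the kernel being corrupted.
        "$@"
        if bug_seen; then
            echo "kernel BUG seen after $i tries"
            return 1
        fi
        sleep "$pause"
    done
    echo "no kernel BUG after $i tries"
    return 0
}
```

With the functions above defined in a root shell, the original loop capped at
300 tries would be:

# swapoff -a
# run_repro 300 5 /opt/ltp/testcases/bin/oom01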



