On Tue, Mar 19, 2019 at 03:14:51PM -0400, Qian Cai wrote:
> On Sun, 2019-03-17 at 15:22 +0000, Mel Gorman wrote:
> > On Fri, Mar 15, 2019 at 04:58:27PM -0400, Daniel Jordan wrote:
> > > On Tue, Mar 12, 2019 at 10:55:27PM +0500, Mikhail Gavrilov wrote:
> > > > Hi folks.
> > > > I observed a kernel panic after updating to git commit 610cd4eadec4.
> > > > I did not do a git bisect because these crashes occur spontaneously
> > > > and I do not have exact instructions for reproducing them.
> > > >
> > > > I hope the backtrace below helps to understand how to fix it:
> > > >
> > > > page:ffffef46607ce000 is uninitialized and poisoned
> > > > raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> > > > raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> > > > page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> > > > ------------[ cut here ]------------
> > > > kernel BUG at include/linux/mm.h:1020!
> > > > invalid opcode: 0000 [#1] SMP NOPTI
> > > > CPU: 1 PID: 118 Comm: kswapd0 Tainted: G C
> > > > 5.1.0-0.rc0.git4.1.fc31.x86_64 #1
> > > > Hardware name: System manufacturer System Product Name/ROG STRIX
> > > > X470-I GAMING, BIOS 1201 12/07/2018
> > > > RIP: 0010:__reset_isolation_pfn+0x244/0x2b0
> > >
> > > This is new code, from e332f741a8dd1 ("mm, compaction: be selective about
> > > what pageblocks to clear skip hints"), so I added some folks.
> > >
> >
> > I'm travelling at the moment and only online intermittently, but I think
> > it's worth noting that the check being tripped is during a call to
> > page_zone(), a call that was also made before the patch was merged. I
> > don't think it's a new check as such. I haven't been able to isolate a
> > source of corruption in the series yet, and I suspected in at least one
> > case that another source of corruption is causing unrelated subsystems
> > to trip over.
> >
>
> So reverting this patch on top of mainline fixed the memory corruption
> for me, or at least made it much harder to reproduce.
>
> dbe2d4e4f12e ("mm, compaction: round-robin the order while searching the free
> lists for a target")
>

OK, thanks for that. I'm just about to fly and haven't re-examined the patch
in detail. When I'm properly back online, I'll review it again and check
whether there are cases where order goes negative, which would lead to
improper accesses. It's possible that next_search_order() ends up with
negative values because of assumptions made about the value of cc->order.
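To illustrate the failure mode suspected above, here is a hypothetical, simplified user-space model. It is not the code in mm/compaction.c: wrap_order(), wrap_order_checked(), free_count(), MAX_ORDER_SKETCH and free_list_nr are invented for this sketch. It assumes a round-robin search that steps down one order at a time and wraps back to cc->order - 1. If cc->order is 0 or -1 (as far as I know, -1 is used when compacting a whole zone, for example via /proc/sys/vm/compact_memory), the wrapped value is itself negative and would index a per-order array out of bounds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch, not mm/compaction.c: a tiny model of per-order free lists. */
#define MAX_ORDER_SKETCH 11

static int free_list_nr[MAX_ORDER_SKETCH]; /* free blocks per order (all zero here) */

/* Indexing by order is only valid for 0 <= order < MAX_ORDER_SKETCH;
 * a negative order would read outside the array, so trap it here. */
static int free_count(int order)
{
	assert(order >= 0 && order < MAX_ORDER_SKETCH);
	return free_list_nr[order];
}

/* Hypothetical round-robin step: move to the next lower order and wrap
 * back to cc_order - 1. This silently assumes cc_order >= 1. */
static int wrap_order(int order, int cc_order)
{
	order--;
	if (order < 0)
		order = cc_order - 1; /* still negative if cc_order is 0 or -1 */
	return order;
}

/* Hypothetical guarded variant: refuse to search when no order below
 * cc_order exists, instead of producing an order that cannot index a list. */
static bool wrap_order_checked(int order, int cc_order, int *next)
{
	if (cc_order <= 0)
		return false;
	order--;
	if (order < 0)
		order = cc_order - 1;
	*next = order;
	return true;
}
```

With cc_order = 3 the walk cycles 2, 1, 0, 2, ...; with cc_order = 0 or -1 every wrapped order is negative, which is exactly the kind of value that must never reach an array indexed by order. The checked variant simply refuses to search when there is no lower order to visit; that is one possible guard, not necessarily the fix that was eventually applied upstream.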