(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

Mel, we may have a regression from e332f741a8dd1 ("mm, compaction: be
selective about what pageblocks to clear skip hints").  The crash sure
looks like the one which 60fce36afa9c77c7 ("mm/compaction.c: correct
zone boundary handling when isolating pages from a pageblock") fixed,
but Gabriele can reproduce it with 5.1.5.  I've confirmed that 5.1.5
has 60fce36afa9c77c7.

Thanks.

On Mon, 27 May 2019 10:12:30 +0000 bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=203715
>
>             Bug ID: 203715
>            Summary: BUG: unable to handle kernel NULL pointer dereference
>                     under stress (possibly related to
>                     https://lkml.org/lkml/2019/5/24/292 ?)
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 5.1+
>           Hardware: x86-64
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Page Allocator
>           Assignee: akpm@xxxxxxxxxxxxxxxxxxxx
>           Reporter: balducci@xxxxxxxx
>         Regression: No
>
> Created attachment 282949
>   --> https://bugzilla.kernel.org/attachment.cgi?id=282949&action=edit
> crash log n.1
>
> hello
>
> since 5.1 I'm getting machine freezes like:
>
> May 7 18:00:10 dschgrazlin3 kernel: BUG: unable to handle kernel NULL
> pointer dereference at 0000000000000000
> May 7 18:00:10 dschgrazlin3 kernel: #PF error: [normal kernel read fault]
> May 7 18:00:10 dschgrazlin3 kernel: PGD 0 P4D 0
> May 7 18:00:10 dschgrazlin3 kernel: Oops: 0000 [#1] SMP
> May 7 18:00:10 dschgrazlin3 kernel: CPU: 3 PID: 44 Comm: kswapd0 Not
> tainted 5.1.0 #1
> May 7 18:00:10 dschgrazlin3 kernel: Hardware name: System manufacturer
> System Product Name/F2A85-M PRO, BIOS 5104 09/14/2012
> May 7 18:00:10 dschgrazlin3 kernel: RIP:
> 0010:__reset_isolation_pfn+0x2cb/0x410
> [...]
> May 7 18:00:10 dschgrazlin3 kernel: Call Trace:
> May 7 18:00:10 dschgrazlin3 kernel: __reset_isolation_suitable+0x95/0x110
> May 7 18:00:10 dschgrazlin3 kernel: ? __wake_up_common_lock+0xd0/0xd0
> May 7 18:00:10 dschgrazlin3 kernel: reset_isolation_suitable+0x34/0x40
> May 7 18:00:10 dschgrazlin3 kernel: kswapd+0xad/0x2c0
> May 7 18:00:10 dschgrazlin3 kernel: ? __wake_up_common_lock+0xd0/0xd0
> May 7 18:00:10 dschgrazlin3 kernel: ? balance_pgdat+0x440/0x440
> May 7 18:00:10 dschgrazlin3 kernel: kthread+0xff/0x120
> May 7 18:00:10 dschgrazlin3 kernel: ?
> __kthread_create_on_node+0x1b0/0x1b0
> May 7 18:00:10 dschgrazlin3 kernel: ret_from_fork+0x1f/0x30
> May 7 18:00:10 dschgrazlin3 kernel: CR2: 0000000000000000
> May 7 18:00:10 dschgrazlin3 kernel: ---[ end trace 075fb7a28df7d1d4 ]---
> May 7 18:00:10 dschgrazlin3 kernel: RIP:
> 0010:__reset_isolation_pfn+0x2cb/0x410
> [...]
>
> (complete logs attached)
>
> I started having this during firefox build, but experienced it during
> other build processes (mesa, gcc). The problem always appears under
> heavy load of the machine.
>
> Unfortunately, the problem cannot be triggered with probability=1,
> although firefox build triggers the machine freeze almost always (at
> random points of the build, though)
>
> I experience the problem on two twin boxes, which makes me exclude HW
> issues.
>
> Absolutely no problems when running kernels <5.1 (<=5.0.15)
>
> In some cases, I got the kernel screams without complete machine freeze,
> but with heavily reduced functionality of the whole system (eg ls
> command hanging)
>
> Due to the issue not being always reproducible, bisection isn't 100%
> reliable; however the first bad commit seems to be
> e332f741a8dd1ec9a6dc8aa997296ecbfe64323e
>
> I'll be happy to provide any other file/information which might be
> useful
>
> --
> You are receiving this mail because:
> You are the assignee for the bug.