Re: [Bug 203715] New: BUG: unable to handle kernel NULL pointer dereference under stress (possibly related to https://lkml.org/lkml/2019/5/24/292 ?)

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

Mel, we may have a regression from e332f741a8dd1 ("mm, compaction: be
selective about what pageblocks to clear skip hints").  The crash sure
looks like the one which 60fce36afa9c77c7 ("mm/compaction.c: correct
zone boundary handling when isolating pages from a pageblock") fixed,
but Gabriele can reproduce it with 5.1.5.  I've confirmed that 5.1.5
has 60fce36afa9c77c7.

Thanks.

On Mon, 27 May 2019 10:12:30 +0000 bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=203715
> 
>             Bug ID: 203715
>            Summary: BUG: unable to handle kernel NULL pointer dereference
>                     under stress (possibly related to
>                     https://lkml.org/lkml/2019/5/24/292 ?)
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 5.1+
>           Hardware: x86-64
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Page Allocator
>           Assignee: akpm@xxxxxxxxxxxxxxxxxxxx
>           Reporter: balducci@xxxxxxxx
>         Regression: No
> 
> Created attachment 282949
>   --> https://bugzilla.kernel.org/attachment.cgi?id=282949&action=edit
> crash log n.1
> 
> hello
> 
> since 5.1 I'm getting machine freezes like:
> 
>     May  7 18:00:10 dschgrazlin3 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
>     May  7 18:00:10 dschgrazlin3 kernel: #PF error: [normal kernel read fault]
>     May  7 18:00:10 dschgrazlin3 kernel: PGD 0 P4D 0
>     May  7 18:00:10 dschgrazlin3 kernel: Oops: 0000 [#1] SMP
>     May  7 18:00:10 dschgrazlin3 kernel: CPU: 3 PID: 44 Comm: kswapd0 Not tainted 5.1.0 #1
>     May  7 18:00:10 dschgrazlin3 kernel: Hardware name: System manufacturer System Product Name/F2A85-M PRO, BIOS 5104 09/14/2012
>     May  7 18:00:10 dschgrazlin3 kernel: RIP: 0010:__reset_isolation_pfn+0x2cb/0x410
>     [...]
>     May  7 18:00:10 dschgrazlin3 kernel: Call Trace:
>     May  7 18:00:10 dschgrazlin3 kernel:  __reset_isolation_suitable+0x95/0x110
>     May  7 18:00:10 dschgrazlin3 kernel:  ? __wake_up_common_lock+0xd0/0xd0
>     May  7 18:00:10 dschgrazlin3 kernel:  reset_isolation_suitable+0x34/0x40
>     May  7 18:00:10 dschgrazlin3 kernel:  kswapd+0xad/0x2c0
>     May  7 18:00:10 dschgrazlin3 kernel:  ? __wake_up_common_lock+0xd0/0xd0
>     May  7 18:00:10 dschgrazlin3 kernel:  ? balance_pgdat+0x440/0x440
>     May  7 18:00:10 dschgrazlin3 kernel:  kthread+0xff/0x120
>     May  7 18:00:10 dschgrazlin3 kernel:  ? __kthread_create_on_node+0x1b0/0x1b0
>     May  7 18:00:10 dschgrazlin3 kernel:  ret_from_fork+0x1f/0x30
>     May  7 18:00:10 dschgrazlin3 kernel: CR2: 0000000000000000
>     May  7 18:00:10 dschgrazlin3 kernel: ---[ end trace 075fb7a28df7d1d4 ]---
>     May  7 18:00:10 dschgrazlin3 kernel: RIP: 0010:__reset_isolation_pfn+0x2cb/0x410
>     [...]
> 
> (complete logs attached)
> 
> I started having this during firefox build, but experienced it during
> other build processes (mesa, gcc). The problem always appears under
> heavy load of the machine.
> 
> Unfortunately, the problem cannot be triggered with probability=1,
> although firefox build triggers the machine freeze almost always (at
> random points of the build, though)
> 
> I experience the problem on two twin boxes, which makes me exclude HW
> issues.
> 
> Absolutely no problems when running kernels <5.1 (<=5.0.15)
> 
> In some cases, I got the kernel screams without complete machine freeze,
> but with heavily reduced functionality of the whole system (eg ls
> command hanging)
> 
> Due to the issue not being always reproducible, bisection isn't 100%
> reliable; however the first bad commit seems to be
> e332f741a8dd1ec9a6dc8aa997296ecbfe64323e
> 
> I'll be happy to provide any other file/information which might be
> useful
> 
> -- 
> You are receiving this mail because:
> You are the assignee for the bug.



