Re: [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16]

Vlastimil Babka <vbabka@xxxxxxx> · Mon, 7 Oct 2019 15:28:17 +0200

On 10/1/19 9:40 PM, Florian Weimer wrote:
> * Vlastimil Babka:
> 
>> On 9/30/19 11:17 PM, Dave Chinner wrote:
>>> On Mon, Sep 30, 2019 at 09:07:53PM +0200, Florian Weimer wrote:
>>>> * Dave Chinner:
>>>>
>>>>> On Mon, Sep 30, 2019 at 09:28:27AM +0200, Florian Weimer wrote:
>>>>>> Simply running “du -hc” on a large directory tree causes du to be
>>>>>> killed because of kernel paging request failure in the XFS code.
>>>>>
>>>>> dmesg output? if the system was still running, then you might be
>>>>> able to pull the trace from syslog. But we can't do much without
>>>>> knowing what the actual failure was....
>>>>
>>>> Huh.  I actually have something in syslog:
>>>>
>>>> [ 4001.238411] BUG: kernel NULL pointer dereference, address: 0000000000000000
>>>> [ 4001.238415] #PF: supervisor read access in kernel mode
>>>> [ 4001.238417] #PF: error_code(0x0000) - not-present page
>>>> [ 4001.238418] PGD 0 P4D 0 
>>>> [ 4001.238420] Oops: 0000 [#1] SMP PTI
>>>> [ 4001.238423] CPU: 3 PID: 143 Comm: kswapd0 Tainted: G I 5.2.16fw+ #1
>>>> [ 4001.238424] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0701 05/10/2011
>>>> [ 4001.238430] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
>>>
>>> That's memory compaction code it's crashed in.
>>>
>>>> [ 4001.238432] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7
>>
>> Tried to decode it, but couldn't match it to source code, my version of
>> compiled code is too different. Would it be possible to either send
>> mm/compaction.o from the matching build, or output of 'objdump -d -l'
>> for the __reset_isolation_pfn function?
> 
> See below.  I don't have debuginfo for this build, and the binary does
> not reproduce for some reason.  Due to the heavy inlining, it might be
> quite hard to figure out what's going on.

Thanks, but I'm still not able to "decompile" that in my head.

> I've switched to kernel builds with debuginfo from now on.  I'm
> surprised that it's not the default.

Let's see if you can reproduce it with that.

However, I've noticed at least something weird:

>      37e:	49 8b 16             	mov    (%r14),%rdx
>      381:	48 89 d0             	mov    %rdx,%rax
>      384:	48 c1 ea 35          	shr    $0x35,%rdx
>      388:	48 8b 14 d7          	mov    (%rdi,%rdx,8),%rdx
>      38c:	48 c1 e8 2d          	shr    $0x2d,%rax
>      390:	48 85 d2             	test   %rdx,%rdx
>      393:	74 0a                	je     39f <__reset_isolation_pfn+0x27f>

IIUC, this will jump to 39f when rdx is zero.

>      395:	0f b6 c0             	movzbl %al,%eax
>      398:	48 c1 e0 04          	shl    $0x4,%rax
>      39c:	48 01 c2             	add    %rax,%rdx
>      39f:	48 8b 02             	mov    (%rdx),%rax

And this is where we crash: the load at 39f dereferences rdx, which is
zero. So the test+branch might have sent us directly here, skipping the
add at 39c. Sounds like an inverted condition somewhere? Or possibly a
result of compiler optimizations.
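
To make the control flow easier to follow, here is a rough C model of
the quoted instructions from 37e to 39f. This is only a sketch: the
names (struct entry, lookup, load_word0) are made up for illustration
and are not kernel identifiers. Only the shift counts (0x35, 0x2d), the
byte mask from movzbl, and the 16-byte stride from "shl $0x4" are taken
from the disassembly. It suggests one reading besides an inverted
condition: a two-level table lookup that yields NULL for an empty
first-level slot, followed by a load that does not check for NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte second-level entry (stride implied by shl $0x4). */
struct entry {
    uint64_t word0;   /* the field read by "mov (%rdx),%rax" at 39f */
    uint64_t word1;   /* padding so that sizeof(struct entry) == 16 */
};

/*
 * Models 37e..39c. "flags" stands for the value loaded from (%r14),
 * "roots" for the table addressed through %rdi. The return value is
 * what ends up in %rdx when execution reaches 39f.
 */
static inline struct entry *lookup(struct entry **roots, uint64_t flags)
{
    struct entry *root = roots[flags >> 53];   /* shr $0x35; mov (%rdi,%rdx,8) */
    uint64_t idx = (flags >> 45) & 0xff;       /* shr $0x2d; movzbl %al,%eax  */

    if (root == NULL)                          /* test %rdx,%rdx; je 39f      */
        return NULL;                           /* reaches 39f with rdx == 0   */

    return &root[idx];                         /* shl $0x4; add %rax,%rdx     */
}

/*
 * The instruction at 39f corresponds to an unchecked "lookup(...)->word0".
 * This variant adds the NULL check that the compiled code does not have.
 * Returns 0 and stores the value on success, -1 if the slot is empty.
 */
static inline int load_word0(struct entry **roots, uint64_t flags,
                             uint64_t *out)
{
    struct entry *e = lookup(roots, flags);

    if (e == NULL)
        return -1;
    *out = e->word0;
    return 0;
}
```

In this model the branch at 393 is not inverted: it skips the index
arithmetic when the first-level pointer is NULL, and the fault comes
from whoever dereferences the result. Whether that matches the actual
source is something the debuginfo build should be able to confirm.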