Hi guys,
On 08/21/2018 11:44 AM, Michal Hocko wrote:
On Fri 17-08-18 15:44:27, Mikulas Patocka wrote:
I report this crash on ARM64 on the kernel 4.17.11. The reason is that the
function move_freepages_block accesses contiguous runs of
pageblock_nr_pages. The ARM64 firmware sets holes of reserved memory there
and when move_freepages_block stumbles over this hole, it accesses
uninitialized page structures and crashes.
Any idea if this is nomap (so a hole in the linear map), or a missing struct page?
00000000-03ffffff : System RAM
00080000-007bffff : Kernel code
00820000-00aa3fff : Kernel data
04200000-bf80ffff : System RAM
bf810000-bfbeffff : reserved
bfbf0000-bfc8ffff : System RAM
bfc90000-bffdffff : reserved
bffe0000-bfffffff : System RAM
c0000000-dfffffff : MEM
c0000000-c00fffff : PCI Bus 0000:01
c0000000-c0003fff : 0000:01:00.0
c0000000-c0003fff : nvme
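To make the pageblock question concrete, here is a rough userspace sketch of the rounding that a pageblock walk implies. It is not kernel code: the helper names are made up for illustration, and the 4kB page size with pageblock order 9 (2MB blocks) is an assumption, since the thread does not state the host configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: physical address -> page frame number. */
static uint64_t addr_to_pfn(uint64_t addr, unsigned int page_shift)
{
	return addr >> page_shift;
}

/* First pfn of the pageblock containing pfn (blocks are 2^order pages). */
static uint64_t block_start_pfn(uint64_t pfn, unsigned int pageblock_order)
{
	assert(pageblock_order < 64);
	return pfn & ~((UINT64_C(1) << pageblock_order) - 1);
}

/* Last pfn of the pageblock containing pfn. */
static uint64_t block_end_pfn(uint64_t pfn, unsigned int pageblock_order)
{
	return block_start_pfn(pfn, pageblock_order) +
	       (UINT64_C(1) << pageblock_order) - 1;
}

/*
 * Nonzero if a region starting at first_pfn begins part-way through a
 * pageblock, i.e. it shares that block with whatever precedes it.
 */
static int block_shared_with_previous(uint64_t first_pfn,
				      unsigned int pageblock_order)
{
	return block_start_pfn(first_pfn, pageblock_order) < first_pfn;
}
```

Under those assumptions the reserved region at bf810000 starts at pfn 0xbf810, whose pageblock spans pfns 0xbf800 to 0xbf9ff, so the block begins in System RAM and runs into the reserved range. If the guest instead uses 64kB pages with pageblock order 13 (also an assumption), the same address falls in a block covering a0000000-bfffffff, which takes in both reserved ranges.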
To test Laura's bounds-of-zone theory [0], could you put some empty space between the
nvme and the System RAM? (It sounds like this is a KVM guest). Reducing the amount of
memory is probably easiest.
The bug was already reported here for x86:
https://bugzilla.redhat.com/show_bug.cgi?id=1598462
For x86, it was fixed in the kernel 4.17.7 - but I observed it in the
kernel 4.17.11 on ARM64. I also observed it on 4.18-rc kernels running in
a KVM virtual machine on ARM when I compiled the guest kernel with 64kB page
size.
I'm not sure this is the same bug.
[1] reports hitting a VM_BUG, whereas this one is a dereference of -ENOENT:
Unable to handle kernel paging request at virtual address fffffffffffffffe
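For anyone wondering how that address maps to -ENOENT, a small userspace sketch follows. The helper name is hypothetical, and it assumes ENOENT is 2 (as on Linux) and a 64-bit address space.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: the address you end up with when a negative
 * errno value is sign-extended and then used as a 64-bit pointer.
 */
static uint64_t errno_as_address(int err)
{
	assert(err < 0);	/* kernel-style errors are negative */
	return (uint64_t)(int64_t)err;
}
```

errno_as_address(-ENOENT) comes out as 0xfffffffffffffffe, which matches the faulting address in the report.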
Does your kernel have HOLES_IN_ZONE enabled? (It looks like it depends on NUMA)
Could you reproduce this with CONFIG_DEBUG_VM enabled?
move_freepages() uses pfn_valid_within(), so it should handle missing struct pages in
this range.
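The reason HOLES_IN_ZONE matters here can be shown with a toy userspace model. As far as I recall, in kernels of this era pfn_valid_within() only does a real pfn_valid() check when CONFIG_HOLES_IN_ZONE is set and is a constant 1 otherwise; the sketch below mimics that with a runtime flag. All names are hypothetical and the "memory map" is just a bool array, not struct pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy memory map: valid[pfn] is true where a struct page exists. */
struct toy_map {
	const bool *valid;
	size_t nr;
};

static bool toy_pfn_valid(const struct toy_map *m, size_t pfn)
{
	return pfn < m->nr && m->valid[pfn];
}

/* Stand-in for pfn_valid_within(): a real check only with holes_in_zone. */
static bool toy_pfn_valid_within(const struct toy_map *m, size_t pfn,
				 bool holes_in_zone)
{
	return holes_in_zone ? toy_pfn_valid(m, pfn) : true;
}

/*
 * Walk [start, end] the way a move_freepages()-style loop would and count
 * the pfns it goes on to touch even though no struct page exists there.
 */
static size_t count_bad_touches(const struct toy_map *m, size_t start,
				size_t end, bool holes_in_zone)
{
	size_t bad = 0;

	assert(start <= end);
	for (size_t pfn = start; pfn <= end; pfn++) {
		if (!toy_pfn_valid_within(m, pfn, holes_in_zone))
			continue;	/* hole detected and skipped */
		if (!toy_pfn_valid(m, pfn))
			bad++;		/* would read a missing struct page */
	}
	return bad;
}
```

With a two-page hole in the middle of the range, the walk touches both missing entries when holes_in_zone is false and none when it is true, which is why the config option is worth checking before looking further.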
CPU: 3 PID: 14823 Comm: updatedb.mlocat Not tainted 4.17.11 #16
Hardware name: Marvell Armada 8040 MacchiatoBin/Armada 8040 MacchiatoBin, BIOS EDK II Jul 30 2018
pstate: 00000085 (nzcv daIf -PAN -UAO)
pc : move_freepages_block+0xb4/0x160
lr : steal_suitable_fallback+0xe4/0x188
Any chance you could addr2line these?
Call trace:
move_freepages_block+0xb4/0x160
get_page_from_freelist+0xad8/0xea8
__alloc_pages_nodemask+0xac/0x970
new_slab+0xc0/0x348
___slab_alloc.constprop.32+0x2cc/0x350
__slab_alloc.isra.26.constprop.31+0x24/0x38
kmem_cache_alloc+0x168/0x198
spadfs_alloc_inode+0x2c/0x88
alloc_inode+0x20/0xa0
iget5_locked+0xf8/0x1c0
spadfs_iget+0x44/0x4c8
spadfs_lookup+0x70/0x108
Hmmm. What's this?
Thanks,
James
[0] https://www.spinics.net/lists/linux-mm/msg157223.html
[1] https://www.spinics.net/lists/linux-mm/msg156764.html