Hi Mikulas,

On 23/08/18 12:02, Mikulas Patocka wrote:
> On Tue, 21 Aug 2018, James Morse wrote:
>> On 08/21/2018 11:44 AM, Michal Hocko wrote:
>>> On Fri 17-08-18 15:44:27, Mikulas Patocka wrote:
>>>> I report this crash on ARM64 on the kernel 4.17.11. The reason is that the
>>>> function move_freepages_block accesses contiguous runs of
>>>> pageblock_nr_pages. The ARM64 firmware sets holes of reserved memory there
>>>> and when move_freepages_block stumbles over this hole, it accesses
>>>> uninitialized page structures and crashes.
>>
>> Any idea if this is nomap (so a hole in the linear map), or a missing struct
>> page?
>
> The page for this hole seems to be filled with 0xff.

This sounds like a memblock:nomap region: it has a struct page, but the struct
page hasn't been initialized.

deferred_init_memmap() won't initialise struct pages for memblock:nomap pages
as its for_each_free_mem_range() loops use MEMBLOCK_NONE as the required flags.

pfn_valid() will return false for these nomap pages, so the struct page should
never be accessed. For the fault you're seeing, move_freepages() is using
pfn_valid_within(), but this is optimised out as you don't have HOLES_IN_ZONE.

This looks like a disconnect between nomap, ARCH_HAS_HOLES_MEMORYMODEL and
HOLES_IN_ZONE.

Arm64 only enables HOLES_IN_ZONE for NUMA systems:
6d526ee26ccd ("arm64: mm: enable CONFIG_HOLES_IN_ZONE for NUMA")

It doesn't look like you can disable ARCH_HAS_HOLES_MEMORYMODEL or SPARSEMEM
for arm64.

My best-guess is that pfn_valid_within() shouldn't be optimised out if
ARCH_HAS_HOLES_MEMORYMODEL, even if HOLES_IN_ZONE isn't set.

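To illustrate what I mean by "optimised out", here is a rough userspace
sketch, not kernel code. The demo_* names, the DEMO_HOLES_IN_ZONE switch and
the hole at pfns 0x100..0x1ff are all made up for the example; only the shape
of the macro is meant to mirror mmzone.h.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for pfn_valid(): pretend pfns 0x100..0x1ff are a
 * nomap hole whose struct pages were never initialised.
 */
static bool demo_pfn_valid(unsigned long pfn)
{
	return pfn < 0x100 || pfn > 0x1ff;
}

/*
 * Same shape as the mmzone.h macro: without the config option the check
 * collapses to a constant, so the hole is never noticed.
 */
#ifdef DEMO_HOLES_IN_ZONE
#define demo_pfn_valid_within(pfn) demo_pfn_valid(pfn)
#else
#define demo_pfn_valid_within(pfn) (1)
#endif

/*
 * Walk [start, end] the way a move_freepages()-style loop would, and count
 * how many pfns inside the hole get past the "within" check.
 */
unsigned long demo_bad_accesses(unsigned long start, unsigned long end)
{
	unsigned long pfn, bad = 0;

	assert(start <= end);

	for (pfn = start; pfn <= end; pfn++) {
		if (!demo_pfn_valid_within(pfn))
			continue;	/* hole skipped: nothing touched */
		if (!demo_pfn_valid(pfn))
			bad++;		/* would touch an uninitialised struct page */
	}
	return bad;
}
```

Built as-is, a walk over 0xf0..0x10f touches the 16 pfns that fall inside the
pretend hole. Built with -DDEMO_HOLES_IN_ZONE, the same walk touches none of
them, which is the behaviour I'd expect the real macro to have on your system.
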
Does something like this solve the problem?:
============================%<============================
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32699b2dc52a..5e27095a15f4 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1295,7 +1295,7 @@ void memory_present(int nid, unsigned long start, unsigned long end);
  * pfn_valid_within() should be used in this case; we optimise this away
  * when we have no holes within a MAX_ORDER_NR_PAGES block.
  */
-#ifdef CONFIG_HOLES_IN_ZONE
+#if defined(CONFIG_HOLES_IN_ZONE) || defined(CONFIG_ARCH_HAS_HOLES_MEMORYMODEL)
 #define pfn_valid_within(pfn) pfn_valid(pfn)
 #else
 #define pfn_valid_within(pfn) (1)
============================%<============================


>> To test Laura's bounds-of-zone theory [0], could you put some empty space
>> between the nvme and the System RAM? (It sounds like this is a KVM guest).
>> Reducing the amount of memory is probably easiest.
>
> This is not KVM - it is real hardware with real PCIe nvme device. I don't
> have smaller memory stick.

Ah, you mentioned KVM/guests further down, given your nvme is right up against
the top of the System RAM I assumed this was a guest!


> The board can use u-boot firmware or EFI firmware. The u-boot firmware
> doesn't put a hole in the memory map and the board has been running with
> it for several months without a problem.

> The EFI firmware puts a hole below 0xc0000000 and I got a crash after two
> weeks of uptime.

This will be because of UEFI's use of nomap when the EFI memory map describes
the memory as having incompatible attributes to the kernel linear-map.
(if you boot with efi=debug it will dump the uefi memory map)


> I analyzed the assembler:
> PageBuddy in move_freepages returns false
> Then we call PageLRU, the macro calls PF_HEAD which is compound_page()
> compound_page reads page->compound_head, it is 0xffffffffffffffff, so it
> resturns 0xfffffffffffffffe - and accessing this address causes crash

Thanks!
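
To spell out the arithmetic in your analysis, here is a rough userspace sketch.
It is not the kernel's code: the demo_* names are made up, and it only models
my understanding that bit 0 of page->compound_head tags a tail page, with the
head pointer stored in the remaining bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the head lookup: if bit 0 of the compound_head word
 * is set, the page is treated as a tail page and the head is (word - 1);
 * otherwise the page is its own head.
 */
uint64_t demo_compound_head(uint64_t page_addr, uint64_t compound_head_word)
{
	if (compound_head_word & 1)
		return compound_head_word - 1;
	return page_addr;
}

/* Usage example: an all-ones struct page versus a zeroed one. */
void demo_compound_head_selfcheck(void)
{
	/* 0xff-filled struct page: looks like a tail page with a bogus head */
	assert(demo_compound_head(0x1000, UINT64_MAX) == 0xfffffffffffffffeULL);

	/* zeroed word: bit 0 clear, so the page itself is returned */
	assert(demo_compound_head(0x1000, 0) == 0x1000);
}
```

So an all-ones struct page looks like a tail page whose "head" is
0xfffffffffffffffe, which matches the address you saw the access fault on.
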
That wasn't straightforward to work out without the vmlinux.

Because you see all-ones, even in KVM, it looks like the struct page is being
initialized like that deliberately... I haven't found where this might be
happening.


Thanks,

James