On 10/16/19 9:38 PM, Florian Weimer wrote:
> This time, I've got a kernel with debugging information (still
> 5.2.18). The crash is at offset 0x39f:
>
>         if (!mem_section[SECTION_NR_TO_ROOT(nr)])
>  384:   48 c1 ea 35             shr    $0x35,%rdx
>  388:   48 8b 14 d7             mov    (%rdi,%rdx,8),%rdx
>  38c:   48 c1 e8 2d             shr    $0x2d,%rax
>  390:   48 85 d2                test   %rdx,%rdx
>  393:   74 0a                   je     39f <__reset_isolation_pfn+0x27f>
>         return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
>  395:   0f b6 c0                movzbl %al,%eax
>  398:   48 c1 e0 04             shl    $0x4,%rax
>  39c:   48 01 c2                add    %rax,%rdx
>         unsigned long map = section->section_mem_map;
>  39f:   48 8b 02                mov    (%rdx),%rax
>         clear_pageblock_skip(page);
>  3a2:   4c 89 f2                mov    %r14,%rdx
>  3a5:   41 b8 01 00 00 00       mov    $0x1,%r8d
>  3ab:   31 f6                   xor    %esi,%esi
>  3ad:   b9 03 00 00 00          mov    $0x3,%ecx
>  3b2:   4c 89 f7                mov    %r14,%rdi
>
> Hmm, -l output is likely more helpful here:
>
> /home/fw/src/linux/linux/mm/compaction.c:293
>  37a:   a8 10                   test   $0x10,%al
>  37c:   74 bc                   je     33a <__reset_isolation_pfn+0x21a>
> page_to_section():
> /home/fw/src/linux/linux/./include/linux/mm.h:1265
>  37e:   49 8b 16                mov    (%r14),%rdx
>  381:   48 89 d0                mov    %rdx,%rax
> __nr_to_section():
> /home/fw/src/linux/linux/./include/linux/mmzone.h:1218
>  384:   48 c1 ea 35             shr    $0x35,%rdx
>  388:   48 8b 14 d7             mov    (%rdi,%rdx,8),%rdx
> page_to_section():
> /home/fw/src/linux/linux/./include/linux/mm.h:1265
>  38c:   48 c1 e8 2d             shr    $0x2d,%rax
> __nr_to_section():
> /home/fw/src/linux/linux/./include/linux/mmzone.h:1218
>  390:   48 85 d2                test   %rdx,%rdx
>  393:   74 0a                   je     39f <__reset_isolation_pfn+0x27f>
> /home/fw/src/linux/linux/./include/linux/mmzone.h:1220
>  395:   0f b6 c0                movzbl %al,%eax
>  398:   48 c1 e0 04             shl    $0x4,%rax
>  39c:   48 01 c2                add    %rax,%rdx
> __section_mem_map_addr():
> /home/fw/src/linux/linux/./include/linux/mmzone.h:1247
>  39f:   48 8b 02                mov    (%rdx),%rax
> __reset_isolation_pfn():
> /home/fw/src/linux/linux/mm/compaction.c:294
>  3a2:   4c 89 f2                mov    %r14,%rdx
>  3a5:   41 b8 01 00 00 00       mov    $0x1,%r8d
>  3ab:   31 f6                   xor    %esi,%esi
>
> It's this loop:
>
>  286         /*
>  287          * Only clear the hint if a sample indicates there is either a
>  288          * free page or an LRU page in the block. One or other condition
>  289          * is necessary for the block to be a migration source/target.
>  290          */
>  291         do {
>  292                 if (pfn_valid_within(pfn)) {
>  293                         if (check_source && PageLRU(page)) {
>  294                                 clear_pageblock_skip(page);

Thanks. It looks like the fault is indeed here, in the page_to_pfn()
embedded in the clear_pageblock_skip() expansion. We've got a wrong
struct page pointer, so page_to_section() gives us a bogus value,
__nr_to_section() returns a null pointer, and __section_mem_map_addr()
then dereferences it.

Hopefully commit [1] addresses the reason why we got a wrong page
pointer. You could try cherry-picking it to your stable tree, or wait
until it appears in a (hopefully near) future stable 5.3.y release
(5.2 is EOL, so it won't appear there).

Thanks,
Vlastimil

>  295                                 return true;
>  296                         }
>  297
>  298                         if (check_target && PageBuddy(page)) {
>  299                                 clear_pageblock_skip(page);
>  300                                 return true;
>  301                         }
>  302                 }
>  303
>  304                 page += (1 << PAGE_ALLOC_COSTLY_ORDER);
>  305                 pfn += (1 << PAGE_ALLOC_COSTLY_ORDER);
>  306         } while (page < end_page);
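
For anyone following the chain described in the reply (a bogus page->flags
value leading to a NULL section pointer that is then dereferenced), here is
a small user-space sketch in C. It is only a model, not kernel code: the
shift and mask constants are read off the quoted disassembly (shr $0x2d,
movzbl %al, shl $0x4) and are assumptions about this particular build, and
the names mem_section_model, install_root() and section_mem_map_checked()
are hypothetical. Unlike the path in the trace, the sketch reports the NULL
case instead of dereferencing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed from the quoted disassembly; other kernel configs will differ. */
#define SECTIONS_PGSHIFT   45                      /* shr $0x2d on page->flags */
#define SECTIONS_PER_ROOT  256                     /* movzbl %al keeps 8 bits  */
#define SECTION_ROOT_MASK  (SECTIONS_PER_ROOT - 1)
#define NR_SECTION_ROOTS   2048                    /* (64 - 45) - 8 = 11 bits  */

/* Hypothetical 16-byte stand-in for struct mem_section (shl $0x4 scaling). */
struct mem_section_model {
    unsigned long section_mem_map;
    unsigned long pad;
};

/* Two-level table: most roots are never populated and stay NULL. */
static struct mem_section_model *roots[NR_SECTION_ROOTS];

/* Hypothetical helper: publish one root of SECTIONS_PER_ROOT entries. */
static void install_root(unsigned long root, struct mem_section_model *entries)
{
    assert(root < NR_SECTION_ROOTS);
    roots[root] = entries;
}

/* Models page_to_section(): the section number sits in the top flag bits. */
static unsigned long section_nr_from_flags(uint64_t flags)
{
    return (unsigned long)(flags >> SECTIONS_PGSHIFT);
}

/* Models __nr_to_section(): returns NULL when the root is not populated. */
static struct mem_section_model *nr_to_section_model(unsigned long nr)
{
    unsigned long root = nr / SECTIONS_PER_ROOT;

    assert(root < NR_SECTION_ROOTS);
    if (!roots[root])
        return NULL;
    return &roots[root][nr & SECTION_ROOT_MASK];
}

/*
 * Models __section_mem_map_addr(), but with a NULL check added.
 * Returns 0 and stores the map on success, or -1 at the point where the
 * unchecked load at offset 0x39f in the trace would have faulted.
 */
static int section_mem_map_checked(uint64_t flags, unsigned long *map)
{
    struct mem_section_model *ms =
        nr_to_section_model(section_nr_from_flags(flags));

    if (!ms)
        return -1;
    *map = ms->section_mem_map;
    return 0;
}
```

With one root installed, flags that encode a populated section resolve
normally, while flags read through a stray struct page pointer will
typically select an unpopulated root and hit the -1 path.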