* Vlastimil Babka:

> On 9/30/19 11:17 PM, Dave Chinner wrote:
>> On Mon, Sep 30, 2019 at 09:07:53PM +0200, Florian Weimer wrote:
>>> * Dave Chinner:
>>>
>>>> On Mon, Sep 30, 2019 at 09:28:27AM +0200, Florian Weimer wrote:
>>>>> Simply running “du -hc” on a large directory tree causes du to be
>>>>> killed because of kernel paging request failure in the XFS code.
>>>>
>>>> dmesg output? if the system was still running, then you might be
>>>> able to pull the trace from syslog. But we can't do much without
>>>> knowing what the actual failure was....
>>>
>>> Huh. I actually have something in syslog:
>>>
>>> [ 4001.238411] BUG: kernel NULL pointer dereference, address:
>>> 0000000000000000
>>> [ 4001.238415] #PF: supervisor read access in kernel mode
>>> [ 4001.238417] #PF: error_code(0x0000) - not-present page
>>> [ 4001.238418] PGD 0 P4D 0
>>> [ 4001.238420] Oops: 0000 [#1] SMP PTI
>>> [ 4001.238423] CPU: 3 PID: 143 Comm: kswapd0 Tainted: G I 5.2.16fw+
>>> #1
>>> [ 4001.238424] Hardware name: System manufacturer System Product
>>> Name/P6X58D-E, BIOS 0701 05/10/2011
>>> [ 4001.238430] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
>>
>> That's memory compaction code it's crashed in.
>>
>>> [ 4001.238432] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0
>>> 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1
>>> e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00
>>> 00 00 4c 89 f7
>
> Tried to decode it, but couldn't match it to source code, my version of
> compiled code is too different. Would it be possible to either send
> mm/compaction.o from the matching build, or output of 'objdump -d -l'
> for the __reset_isolation_pfn function?

(dropping the fs lists)

I got another crash, this time triggered by rsync (large tree with many
small files, few files changed).

Oops:

[41969.140117] BUG: kernel NULL pointer dereference, address: 0000000000000000
[41969.140121] #PF: supervisor read access in kernel mode
[41969.140122] #PF: error_code(0x0000) - not-present page
[41969.140123] PGD 0 P4D 0
[41969.140125] Oops: 0000 [#1] SMP PTI
[41969.140127] CPU: 5 PID: 144 Comm: kswapd0 Tainted: G I 5.2.18fw+ #10
[41969.140128] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0701 05/10/2011
[41969.140133] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
[41969.140134] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7
[41969.140135] RSP: 0018:ffffc900003ffde0 EFLAGS: 00010246
[41969.140137] RAX: 000000000004fdac RBX: 0000000000118000 RCX: 0000000000000000
[41969.140138] RDX: 0000000000000000 RSI: 0000000000000230 RDI: ffff88833fffa000
[41969.140138] RBP: ffffc900003ffe18 R08: 000000000000003c R09: ffff888335080000
[41969.140139] R10: ffff88833fff9000 R11: 0000000000000000 R12: 0000000000000001
[41969.140140] R13: 0000000000000001 R14: ffff888338dc01c0 R15: 0000000000000001
[41969.140141] FS: 0000000000000000(0000) GS:ffff888333d40000(0000) knlGS:0000000000000000
[41969.140142] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[41969.140143] CR2: 0000000000000000 CR3: 000000000200a001 CR4: 00000000000206e0
[41969.140144] Call Trace:
[41969.140147]  __reset_isolation_suitable+0x9b/0x120
[41969.140149]  reset_isolation_suitable+0x3b/0x40
[41969.140152]  kswapd+0x98/0x300
[41969.140154]  ? wait_woken+0x80/0x80
[41969.140157]  kthread+0x114/0x130
[41969.140158]  ? balance_pgdat+0x450/0x450
[41969.140159]  ? kthread_park+0x80/0x80
[41969.140162]  ret_from_fork+0x1f/0x30
[41969.140163] Modules linked in: usb_storage nfnetlink 8021q garp stp llc fuse ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_filter xt_state xt_conntrack iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter tun ip6_tables binfmt_misc mxm_wmi evdev snd_hda_codec_hdmi coretemp serio_raw snd_hda_intel kvm_intel snd_hda_codec kvm snd_hwdep irqbypass snd_hda_core pcspkr snd_pcm snd_timer snd soundcore sg i7core_edac asus_atk0110 wmi button loop ip_tables x_tables raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 multipath linear md_mod hid_generic usbhid hid crc32c_intel psmouse sr_mod cdrom radeon e1000e xhci_pci ptp ehci_pci uhci_hcd xhci_hcd pps_core ehci_hcd sky2 usbcore ttm usb_common sd_mod
[41969.140187] CR2: 0000000000000000
[41969.140189] ---[ end trace e27ddb472a95c047 ]---

This time, I've got a kernel with debugging information (still 5.2.18).

The crash is at offset 0x39f:

        if (!mem_section[SECTION_NR_TO_ROOT(nr)])
 384:   48 c1 ea 35             shr    $0x35,%rdx
 388:   48 8b 14 d7             mov    (%rdi,%rdx,8),%rdx
 38c:   48 c1 e8 2d             shr    $0x2d,%rax
 390:   48 85 d2                test   %rdx,%rdx
 393:   74 0a                   je     39f <__reset_isolation_pfn+0x27f>
        return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
 395:   0f b6 c0                movzbl %al,%eax
 398:   48 c1 e0 04             shl    $0x4,%rax
 39c:   48 01 c2                add    %rax,%rdx
        unsigned long map = section->section_mem_map;
 39f:   48 8b 02                mov    (%rdx),%rax
                        clear_pageblock_skip(page);
 3a2:   4c 89 f2                mov    %r14,%rdx
 3a5:   41 b8 01 00 00 00       mov    $0x1,%r8d
 3ab:   31 f6                   xor    %esi,%esi
 3ad:   b9 03 00 00 00          mov    $0x3,%ecx
 3b2:   4c 89 f7                mov    %r14,%rdi

Hmm, -l output is likely more helpful here:

/home/fw/src/linux/linux/mm/compaction.c:293
 37a:   a8 10                   test   $0x10,%al
 37c:   74 bc                   je     33a <__reset_isolation_pfn+0x21a>
page_to_section():
/home/fw/src/linux/linux/./include/linux/mm.h:1265
 37e:   49 8b 16                mov    (%r14),%rdx
 381:   48 89 d0                mov    %rdx,%rax
__nr_to_section():
/home/fw/src/linux/linux/./include/linux/mmzone.h:1218
 384:   48 c1 ea 35             shr    $0x35,%rdx
 388:   48 8b 14 d7             mov    (%rdi,%rdx,8),%rdx
page_to_section():
/home/fw/src/linux/linux/./include/linux/mm.h:1265
 38c:   48 c1 e8 2d             shr    $0x2d,%rax
__nr_to_section():
/home/fw/src/linux/linux/./include/linux/mmzone.h:1218
 390:   48 85 d2                test   %rdx,%rdx
 393:   74 0a                   je     39f <__reset_isolation_pfn+0x27f>
/home/fw/src/linux/linux/./include/linux/mmzone.h:1220
 395:   0f b6 c0                movzbl %al,%eax
 398:   48 c1 e0 04             shl    $0x4,%rax
 39c:   48 01 c2                add    %rax,%rdx
__section_mem_map_addr():
/home/fw/src/linux/linux/./include/linux/mmzone.h:1247
 39f:   48 8b 02                mov    (%rdx),%rax
__reset_isolation_pfn():
/home/fw/src/linux/linux/mm/compaction.c:294
 3a2:   4c 89 f2                mov    %r14,%rdx
 3a5:   41 b8 01 00 00 00       mov    $0x1,%r8d
 3ab:   31 f6                   xor    %esi,%esi

It's this loop:

   286          /*
   287           * Only clear the hint if a sample indicates there is either a
   288           * free page or an LRU page in the block. One or other condition
   289           * is necessary for the block to be a migration source/target.
   290           */
   291          do {
   292                  if (pfn_valid_within(pfn)) {
   293                          if (check_source && PageLRU(page)) {
   294                                  clear_pageblock_skip(page);
   295                                  return true;
   296                          }
   297
   298                          if (check_target && PageBuddy(page)) {
   299                                  clear_pageblock_skip(page);
   300                                  return true;
   301                          }
   302                  }
   303
   304                  page += (1 << PAGE_ALLOC_COSTLY_ORDER);
   305                  pfn += (1 << PAGE_ALLOC_COSTLY_ORDER);
   306          } while (page < end_page);