Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Mar 19, 2025 at 07:24:23PM +0000, Matthew Wilcox wrote:
> On Wed, Mar 19, 2025 at 12:16:41PM -0700, Luis Chamberlain wrote:
> > On Wed, Mar 19, 2025 at 09:55:11AM -0700, Luis Chamberlain wrote:
> > > FWIW, I'm not seeing this crash or any kernel splat within the
> > > same time (I'll let this run the full 2.5 hours now to verify) on
> > > vanilla 6.14.0-rc3 + the 64k-sector-size patches, which would explain why I
> > > hadn't seen this in my earlier testing over 10 ext4 profiles on fstests. This
> > > particular crash seems likely to be an artifact on the development cycle on
> > > next-20250317.
> > 
> > I confirm that with a vanilla 6.14.0-rc3 + the 64k-sector-size patches a 2.5
> > hour run generic/750 doesn't crash at all. So indeed something on the
> > development cycle leads to this particular crash.
> 
> We can't debug two problems at once.
> 
> FOr the first problem, I've demonstrated what the cause is, and that's
> definitely introduced by your patch, so we need to figure out a
> solution.

Sure, yeah I followed that.

> For the second problem, we don't know what it is.  Do you want to bisect
> it to figure out which commit introduced it?

Sure, the culprit is the patch titled:

mm: page_alloc: trace type pollution from compaction capturing

Johannes, any ideas? You can reproduce easily (1-2 minutes) by running
fstests against ext4 with a 4k block size filesystem on linux-next
against the test generic/750.

Below is the splat decoded.

Mar 20 11:52:55 extra-ext4-4k kernel: Linux version 6.14.0-rc6+ (mcgrof@beefy) (gcc (Debian 14.2.0-16) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #51 SMP PREEMPT_DYNAMIC Thu Mar 20 11:50:32 UTC 2025
Mar 20 11:52:55 extra-ext4-4k kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.0-rc6+ root=PARTUUID=503fa6f2-2d5b-4d7e-8cf8-3a811de326ce ro console=tty0 console=tty1 console=ttyS0,115200n8 console=ttyS0

< -- etc -->

Mar 20 11:55:27 extra-ext4-4k unknown: run fstests generic/750 at 2025-03-20 11:55:27
Mar 20 11:55:28 extra-ext4-4k kernel: EXT4-fs (loop5): mounted filesystem c20cbdee-a370-4743-80aa-95dec0beaaa2 r/w with ordered data mode. Quota mode: none.
Mar 20 11:56:29 extra-ext4-4k kernel: BUG: unable to handle page fault for address: ffff93098000ba00
Mar 20 11:56:29 extra-ext4-4k kernel: #PF: supervisor read access in kernel mode
Mar 20 11:56:29 extra-ext4-4k kernel: #PF: error_code(0x0000) - not-present page
Mar 20 11:56:29 extra-ext4-4k kernel: PGD 3a201067 P4D 3a201067 PUD 0
Mar 20 11:56:29 extra-ext4-4k kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
Mar 20 11:56:29 extra-ext4-4k kernel: CPU: 0 UID: 0 PID: 74 Comm: kcompactd0 Not tainted 6.14.0-rc6+ #51
Mar 20 11:56:29 extra-ext4-4k kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.11-5 01/28/2025
Mar 20 11:56:29 extra-ext4-4k kernel: RIP: 0010:__zone_watermark_ok (mm/page_alloc.c:3256) 
Mar 20 11:56:29 extra-ext4-4k kernel: Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85
All code
========
   0:	00 00                	add    %al,(%rax)
   2:	00 41 f7             	add    %al,-0x9(%rcx)
   5:	c0 38 02             	sarb   $0x2,(%rax)
   8:	00 00                	add    %al,(%rax)
   a:	0f 85 2c 01 00 00    	jne    0x13c
  10:	48 8b 4f 30          	mov    0x30(%rdi),%rcx
  14:	48 63 d2             	movslq %edx,%rdx
  17:	48 01 ca             	add    %rcx,%rdx
  1a:	85 db                	test   %ebx,%ebx
  1c:	0f 84 f3 00 00 00    	je     0x115
  22:	49 29 d1             	sub    %rdx,%r9
  25:	bb 80 00 00 00       	mov    $0x80,%ebx
  2a:*	4c 03 54 f7 38       	add    0x38(%rdi,%rsi,8),%r10		<-- trapping instruction
  2f:	31 d2                	xor    %edx,%edx
  31:	4d 39 ca             	cmp    %r9,%r10
  34:	0f 8d d2 00 00 00    	jge    0x10c
  3a:	ba 01 00 00 00       	mov    $0x1,%edx
  3f:	85                   	.byte 0x85

Code starting with the faulting instruction
===========================================
   0:	4c 03 54 f7 38       	add    0x38(%rdi,%rsi,8),%r10
   5:	31 d2                	xor    %edx,%edx
   7:	4d 39 ca             	cmp    %r9,%r10
   a:	0f 8d d2 00 00 00    	jge    0xe2
  10:	ba 01 00 00 00       	mov    $0x1,%edx
  15:	85                   	.byte 0x85
Mar 20 11:56:29 extra-ext4-4k kernel: RSP: 0018:ffffa5bb002b7c78 EFLAGS: 00010206
Mar 20 11:56:29 extra-ext4-4k kernel: RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: RDX: 0000000000000000 RSI: 0000000000002431 RDI: ffff93097fff9840
Mar 20 11:56:29 extra-ext4-4k kernel: RBP: 0000000000000009 R08: 0000000000000080 R09: 0000000000005e90
Mar 20 11:56:29 extra-ext4-4k kernel: R10: 0000000000000c8e R11: 0000000000000c8e R12: 0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: R13: 0000000000002431 R14: 0000000000000002 R15: 0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: FS:  0000000000000000(0000) GS:ffff93097bc00000(0000) knlGS:0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 20 11:56:29 extra-ext4-4k kernel: CR2: ffff93098000ba00 CR3: 000000010c602004 CR4: 0000000000772ef0
Mar 20 11:56:29 extra-ext4-4k kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 20 11:56:29 extra-ext4-4k kernel: PKRU: 55555554
Mar 20 11:56:29 extra-ext4-4k kernel: Call Trace:
Mar 20 11:56:29 extra-ext4-4k kernel:  <TASK>
Mar 20 11:56:29 extra-ext4-4k kernel: ? __die_body.cold (arch/x86/kernel/dumpstack.c:478 (discriminator 1) arch/x86/kernel/dumpstack.c:465 (discriminator 1) arch/x86/kernel/dumpstack.c:420 (discriminator 1)) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? page_fault_oops (arch/x86/mm/fault.c:710 (discriminator 1)) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? search_module_extables (kernel/module/main.c:3733 (discriminator 3)) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? __zone_watermark_ok (mm/page_alloc.c:3256) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? search_bpf_extables (kernel/bpf/core.c:804) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? exc_page_fault (arch/x86/mm/fault.c:1182 (discriminator 1) arch/x86/mm/fault.c:1478 (discriminator 1) arch/x86/mm/fault.c:1538 (discriminator 1)) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:574) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? __zone_watermark_ok (mm/page_alloc.c:3256) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? asm_sysvec_call_function_single (./arch/x86/include/asm/idtentry.h:574) 
Mar 20 11:56:29 extra-ext4-4k kernel: compaction_suitable (mm/compaction.c:2438) 
Mar 20 11:56:29 extra-ext4-4k kernel: compaction_suit_allocation_order (mm/compaction.c:2525 (discriminator 1)) 
Mar 20 11:56:29 extra-ext4-4k kernel: kcompactd_do_work (mm/compaction.c:3106) 
Mar 20 11:56:29 extra-ext4-4k kernel: kcompactd (mm/compaction.c:3220) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? __pfx_autoremove_wake_function (kernel/sched/wait.c:383) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? __pfx_kcompactd (mm/compaction.c:3184) 
Mar 20 11:56:29 extra-ext4-4k kernel: kthread (kernel/kthread.c:464) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? __pfx_kthread (kernel/kthread.c:413) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? _raw_spin_unlock (./include/linux/spinlock_api_smp.h:143 (discriminator 3) kernel/locking/spinlock.c:186 (discriminator 3)) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? finish_task_switch.isra.0 (./arch/x86/include/asm/paravirt.h:691 kernel/sched/sched.h:1533 kernel/sched/core.c:5132 kernel/sched/core.c:5250) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? __pfx_kthread (kernel/kthread.c:413) 
Mar 20 11:56:29 extra-ext4-4k kernel: ret_from_fork (arch/x86/kernel/process.c:148) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? __pfx_kthread (kernel/kthread.c:413) 
Mar 20 11:56:29 extra-ext4-4k kernel: ret_from_fork_asm (arch/x86/entry/entry_64.S:257) 
Mar 20 11:56:29 extra-ext4-4k kernel:  </TASK>
Mar 20 11:56:29 extra-ext4-4k kernel: Modules linked in: loop sunrpc 9p nls_iso8859_1 nls_cp437 vfat crc32c_generic fat kvm_intel kvm ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 aesni_intel gf128mul crypto_simd 9pnet_virtio cryptd virtio_console virtio_balloon button evdev joydev serio_raw dm_mod nvme_fabrics drm nvme_core nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vsock autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 md_mod virtio_net net_failover failover virtio_blk psmouse virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring
Mar 20 11:56:29 extra-ext4-4k kernel: CR2: ffff93098000ba00
Mar 20 11:56:29 extra-ext4-4k kernel: ---[ end trace 0000000000000000 ]---
Mar 20 11:56:29 extra-ext4-4k kernel: RIP: 0010:__zone_watermark_ok (mm/page_alloc.c:3256) 
Mar 20 11:56:29 extra-ext4-4k kernel: Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85
All code
========
   0:	00 00                	add    %al,(%rax)
   2:	00 41 f7             	add    %al,-0x9(%rcx)
   5:	c0 38 02             	sarb   $0x2,(%rax)
   8:	00 00                	add    %al,(%rax)
   a:	0f 85 2c 01 00 00    	jne    0x13c
  10:	48 8b 4f 30          	mov    0x30(%rdi),%rcx
  14:	48 63 d2             	movslq %edx,%rdx
  17:	48 01 ca             	add    %rcx,%rdx
  1a:	85 db                	test   %ebx,%ebx
  1c:	0f 84 f3 00 00 00    	je     0x115
  22:	49 29 d1             	sub    %rdx,%r9
  25:	bb 80 00 00 00       	mov    $0x80,%ebx
  2a:*	4c 03 54 f7 38       	add    0x38(%rdi,%rsi,8),%r10		<-- trapping instruction
  2f:	31 d2                	xor    %edx,%edx
  31:	4d 39 ca             	cmp    %r9,%r10
  34:	0f 8d d2 00 00 00    	jge    0x10c
  3a:	ba 01 00 00 00       	mov    $0x1,%edx
  3f:	85                   	.byte 0x85

Code starting with the faulting instruction
===========================================
   0:	4c 03 54 f7 38       	add    0x38(%rdi,%rsi,8),%r10
   5:	31 d2                	xor    %edx,%edx
   7:	4d 39 ca             	cmp    %r9,%r10
   a:	0f 8d d2 00 00 00    	jge    0xe2
  10:	ba 01 00 00 00       	mov    $0x1,%edx
  15:	85                   	.byte 0x85
Mar 20 11:56:29 extra-ext4-4k kernel: RSP: 0018:ffffa5bb002b7c78 EFLAGS: 00010206
Mar 20 11:56:29 extra-ext4-4k kernel: RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: RDX: 0000000000000000 RSI: 0000000000002431 RDI: ffff93097fff9840
Mar 20 11:56:29 extra-ext4-4k kernel: RBP: 0000000000000009 R08: 0000000000000080 R09: 0000000000005e90
Mar 20 11:56:29 extra-ext4-4k kernel: R10: 0000000000000c8e R11: 0000000000000c8e R12: 0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: R13: 0000000000002431 R14: 0000000000000002 R15: 0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: FS:  0000000000000000(0000) GS:ffff93097bc00000(0000) knlGS:0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 20 11:56:29 extra-ext4-4k kernel: CR2: ffff93098000ba00 CR3: 000000010c602004 CR4: 0000000000772ef0
Mar 20 11:56:29 extra-ext4-4k kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 20 11:56:29 extra-ext4-4k kernel: PKRU: 55555554
Mar 20 11:56:29 extra-ext4-4k kernel: note: kcompactd0[74] exited with irqs disabled

  Luis




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux