Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c


 



Hey Luis,

On Thu, Mar 20, 2025 at 05:11:19AM -0700, Luis Chamberlain wrote:
> On Wed, Mar 19, 2025 at 07:24:23PM +0000, Matthew Wilcox wrote:
> > On Wed, Mar 19, 2025 at 12:16:41PM -0700, Luis Chamberlain wrote:
> > > On Wed, Mar 19, 2025 at 09:55:11AM -0700, Luis Chamberlain wrote:
> > > > FWIW, I'm not seeing this crash or any kernel splat within the
> > > > same time (I'll let this run the full 2.5 hours now to verify) on
> > > > vanilla 6.14.0-rc3 + the 64k-sector-size patches, which would explain why I
> > > > hadn't seen this in my earlier testing over 10 ext4 profiles on fstests. This
> > > > particular crash seems likely to be an artifact of the development cycle in
> > > > next-20250317.
> > > 
> > > I confirm that with a vanilla 6.14.0-rc3 + the 64k-sector-size patches a 2.5
> > > hour run generic/750 doesn't crash at all. So indeed something on the
> > > development cycle leads to this particular crash.
> > 
> > We can't debug two problems at once.
> > 
> > For the first problem, I've demonstrated what the cause is, and that's
> > definitely introduced by your patch, so we need to figure out a
> > solution.
> 
> Sure, yeah I followed that.
> 
> > For the second problem, we don't know what it is.  Do you want to bisect
> > it to figure out which commit introduced it?
> 
> Sure, the culprit is the patch titled:
> 
> mm: page_alloc: trace type pollution from compaction capturing
> 
> Johannes, any ideas? You can reproduce easily (1-2 minutes) by running
> fstests against ext4 with a 4k block size filesystem on linux-next
> against the test generic/750.

Sorry for the late reply, I just saw your emails now.

> Below is the splat decoded.
> 
> Mar 20 11:52:55 extra-ext4-4k kernel: Linux version 6.14.0-rc6+ (mcgrof@beefy) (gcc (Debian 14.2.0-16) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #51 SMP PREEMPT_DYNAMIC Thu Mar 20 11:50:32 UTC 2025
> Mar 20 11:52:55 extra-ext4-4k kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.0-rc6+ root=PARTUUID=503fa6f2-2d5b-4d7e-8cf8-3a811de326ce ro console=tty0 console=tty1 console=ttyS0,115200n8 console=ttyS0
> 
> < -- etc -->
> 
> Mar 20 11:55:27 extra-ext4-4k unknown: run fstests generic/750 at 2025-03-20 11:55:27
> Mar 20 11:55:28 extra-ext4-4k kernel: EXT4-fs (loop5): mounted filesystem c20cbdee-a370-4743-80aa-95dec0beaaa2 r/w with ordered data mode. Quota mode: none.
> Mar 20 11:56:29 extra-ext4-4k kernel: BUG: unable to handle page fault for address: ffff93098000ba00
> Mar 20 11:56:29 extra-ext4-4k kernel: #PF: supervisor read access in kernel mode
> Mar 20 11:56:29 extra-ext4-4k kernel: #PF: error_code(0x0000) - not-present page
> Mar 20 11:56:29 extra-ext4-4k kernel: PGD 3a201067 P4D 3a201067 PUD 0
> Mar 20 11:56:29 extra-ext4-4k kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> Mar 20 11:56:29 extra-ext4-4k kernel: CPU: 0 UID: 0 PID: 74 Comm: kcompactd0 Not tainted 6.14.0-rc6+ #51
> Mar 20 11:56:29 extra-ext4-4k kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.11-5 01/28/2025
> Mar 20 11:56:29 extra-ext4-4k kernel: RIP: 0010:__zone_watermark_ok (mm/page_alloc.c:3256) 
> Mar 20 11:56:29 extra-ext4-4k kernel: Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85
> All code
> ========
>    0:	00 00                	add    %al,(%rax)
>    2:	00 41 f7             	add    %al,-0x9(%rcx)
>    5:	c0 38 02             	sarb   $0x2,(%rax)
>    8:	00 00                	add    %al,(%rax)
>    a:	0f 85 2c 01 00 00    	jne    0x13c
>   10:	48 8b 4f 30          	mov    0x30(%rdi),%rcx
>   14:	48 63 d2             	movslq %edx,%rdx
>   17:	48 01 ca             	add    %rcx,%rdx
>   1a:	85 db                	test   %ebx,%ebx
>   1c:	0f 84 f3 00 00 00    	je     0x115
>   22:	49 29 d1             	sub    %rdx,%r9
>   25:	bb 80 00 00 00       	mov    $0x80,%ebx
>   2a:*	4c 03 54 f7 38       	add    0x38(%rdi,%rsi,8),%r10		<-- trapping instruction
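
For anyone reading the decode above: the trapping instruction reads
memory through the operand 0x38(%rdi,%rsi,8), i.e. from the address
%rdi + %rsi * 8 + 0x38. A rough sketch of that arithmetic follows. The
helper names are made up for illustration, and the register values are
not part of the quoted excerpt, so they have to come from the full
register dump in the splat.

```python
# Sketch of x86-64 effective-address arithmetic for an operand of the
# form disp(base,index,scale), e.g. 0x38(%rdi,%rsi,8) from the decode.
# Function names are hypothetical; register values must be supplied by
# the reader from the full oops, they are not known here.

MASK64 = (1 << 64) - 1


def effective_address(base, index, scale=8, disp=0x38):
    """Return base + index * scale + disp, wrapped to 64 bits."""
    return (base + index * scale + disp) & MASK64


def index_for_fault(base, fault, scale=8, disp=0x38):
    """Given a base register value and the faulting address, return the
    signed index register value that makes the operand hit that address."""
    delta = (fault - base - disp) & MASK64
    if delta >= 1 << 63:          # reinterpret as a signed 64-bit value
        delta -= 1 << 64
    if delta % scale:
        raise ValueError("fault address is not reachable with this base")
    return delta // scale
```

With %rdi taken from the register dump and the faulting address from the
"unable to handle page fault" line, index_for_fault() gives the %rsi
value the instruction must have used, which is a quick way to tell
whether the index itself was out of range.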

This looks like the same issue the bot reported here:

https://lore.kernel.org/all/20250321135524.GA1888695@xxxxxxxxxxx/

There is a fix for it queued in next-20250318 and later. Could you
please double check with your reproducer against a more recent next?

Thanks



