Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply

On Mon, Jan 22, 2024 at 09:18:56PM +0800, Zorro Lang wrote:
> On Mon, Jan 22, 2024 at 10:21:07PM +1100, Dave Chinner wrote:
> > On Mon, Jan 22, 2024 at 03:23:12PM +0800, Zorro Lang wrote:
> > > On Sun, Jan 21, 2024 at 10:58:49AM +1100, Dave Chinner wrote:
> > > > On Sat, Jan 20, 2024 at 07:26:00PM +0800, Zorro Lang wrote:
> > > > > On Fri, Jan 19, 2024 at 06:17:24PM +1100, Dave Chinner wrote:
> > > > > > Perhaps a bisect from 6.7 to 6.7+linux-xfs/for-next to identify what
> > > > > > fixed it? Nothing in the for-next branch really looks relevant to
> > > > > > the problem to me....
> > > > > 
> > > > > Hi Dave,
> > > > > 
> > > > > Finally, I got a chance to reproduce this issue on the latest upstream
> > > > > mainline kernel (HEAD=9d64bf433c53) (and linux-xfs) again.
> > > > > 
> > > > > It looks like some userspace update hides the issue, but I haven't worked
> > > > > out which change does that, since it was a whole system version upgrade.
> > > > > I reproduced the issue again by using an old RHEL distro (with the newest
> > > > > kernel). (I'll try to find out which change caused it later if necessary.)
> > > > > 
> > > > > Anyway, I enabled "CONFIG_XFS_ASSERT_FATAL=y" and "CONFIG_XFS_DEBUG=y" as
> > > > > you suggested, and got the xfs metadump file after it crashed [1] and rebooted.
> > > > > 
> > > > > Because g/648 runs its test on a loop image in SCRATCH_MNT, I didn't dump
> > > > > SCRATCH_DEV; instead I dumped the $SCRATCH_MNT/testfs file. You can get
> > > > > the metadump file from:
> > > > > 
> > > > > https://drive.google.com/file/d/14q7iRl7vFyrEKvv_Wqqwlue6vHGdIFO1/view?usp=sharing
> > > > 
> > > > Ok, I forgot the log on s390 is in big-endian format. I don't have a
> > > > big-endian machine here, so I can't replay the log to trace it or find
> > > > out which disk address the buffer belongs to. I can't even use
> > > > xfs_logprint to dump the log.
> > > > 
> > > > Can you take that metadump, restore it on the s390 machine, and
> > > > trace a mount attempt? i.e in one shell run 'trace-cmd record -e
> > > > xfs\*' and then in another shell run 'mount testfs.img /mnt/test'
> > > 
> > > The 'mount testfs.img /mnt/test' will crash the kernel and reboot
> > > the system directly ...
> > 
> > Turn off panic-on-oops. Something like 'echo 0 >
> > /proc/sys/kernel/panic_on_oops' will do that, I think.
> 
> Thanks, it helps. I did below steps:

Thanks!

> 
> # trace-cmd record -e xfs\*

One modification to this:

# trace-cmd record -e xfs\* -e printk

So it captures the console output, too.
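With that addition, the whole session would look something like this
(a sketch, using the same paths as your run):

# trace-cmd record -e xfs\* -e printk
  <in a second shell>
# mount testfs.img /mnt/tmp
  <back in the first shell, Ctrl-C the record, then>
# trace-cmd report > testfs.trace.txt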


> Hit Ctrl^C to stop recording
> ^CCPU0 data recorded at offset=0x5b7000
>     90112 bytes in size
> CPU1 data recorded at offset=0x5cd000
>     57344 bytes in size
> CPU2 data recorded at offset=0x5db000
>     9945088 bytes in size
> CPU3 data recorded at offset=0xf57000
>     786432 bytes in size
> # mount testfs.img /mnt/tmp
> Segmentation fault
> # (Ctrl^C the trace-cmd record process)
....
> 
> # trace-cmd report > testfs.trace.txt
> # bzip2 testfs.trace.txt
> 
> Please download it from:
> https://drive.google.com/file/d/1FgpPidbMZHSjZinyc_WbVGfvwp2btA86/view?usp=sharing

Excellent, but I also need the metadump to go with the trace. My
fault, I should have made that clear.
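To keep the two in sync, it's best to trace a mount of an image
freshly restored from the metadump being sent, e.g. (a sketch;
"testfs.metadump" is just an example filename):

# xfs_mdrestore testfs.metadump testfs.img
# mount testfs.img /mnt/tmp    <- traced as above

and then send that same metadump file along with the trace report.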

My initial scan of the trace indicates that there is something
whacky about the buffer that failed:

mount-6449  [002] 180724.335208: xfs_log_recover_buf_reg_buf: dev 7:0 daddr 0x331fb0, bbcount 0x58, flags 0x5000, size 2, map_size 11

It's got a size of 0x58 BBs, or 44kB. That's not a complete
directory buffer; the directory buffer should be 0x80 BBs (64kB) in
size.
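(For reference, BBs are 512-byte basic blocks, so the arithmetic is:

    0x58 BBs =  88 * 512 = 45056 bytes = 44kB
    0x80 BBs = 128 * 512 = 65536 bytes = 64kB

i.e. the logged buffer covers only 44kB of the 64kB directory block.)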

I see that buf log format item in the journal over and over again at
that same size, and that is how the buffer is initialised and read
from disk during recovery. So it looks like the buf log item
written to the journal for this directory block is bad in a way I've
never seen before.
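Since the log is native-endian on the s390 box, xfs_logprint run
there against the restored image should be able to decode those buf
log format items directly, e.g. (a sketch; options may need tweaking):

# xfs_logprint -t testfs.img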

At this point I suspect that something has gone wrong at runtime,
maybe to do with logging a compound buffer, but my initial thought
is that this isn't a recovery bug at all. However, I'll need a
matching trace and metadump to confirm that.
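If anyone wants to eyeball the on-disk state of that buffer, xfs_db
on the restored image can seek to the daddr from the trace above,
something like (a sketch):

# xfs_db -f -c "daddr 0x331fb0" -c "print" testfs.img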

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



