Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply

On Mon, Jan 22, 2024 at 10:21:07PM +1100, Dave Chinner wrote:
> On Mon, Jan 22, 2024 at 03:23:12PM +0800, Zorro Lang wrote:
> > On Sun, Jan 21, 2024 at 10:58:49AM +1100, Dave Chinner wrote:
> > > On Sat, Jan 20, 2024 at 07:26:00PM +0800, Zorro Lang wrote:
> > > > On Fri, Jan 19, 2024 at 06:17:24PM +1100, Dave Chinner wrote:
> > > > > Perhaps a bisect from 6.7 to 6.7+linux-xfs/for-next to identify what
> > > > > fixed it? Nothing in the for-next branch really looks relevant to
> > > > > the problem to me....
> > > > 
> > > > Hi Dave,
> > > > 
> > > > Finally, I got a chance to reproduce this issue on latest upstream mainline
> > > > linux (HEAD=9d64bf433c53) (and linux-xfs) again.
> > > > 
> > > > Looks like some userspace updates hide the issue, but I haven't found out what
> > > > change does that, due to it's a big change about a whole system version. I
> > > > reproduced this issue again by using an old RHEL distro (but the kernel is the newest).
> > > > (I'll try to find out what changes cause that later if it's necessary)
> > > > 
> > > > Anyway, I enabled the "CONFIG_XFS_ASSERT_FATAL=y" and "CONFIG_XFS_DEBUG=y" as
> > > > you suggested. And got the xfs metadump file after it crashed [1] and rebooted.
> > > > 
> > > > Due to g/648 tests on a loopimg in SCRATCH_MNT, so I didn't dump the SCRATCH_DEV,
> > > > but dumped the $SCRATCH_MNT/testfs file, you can get the metadump file from:
> > > > 
> > > > https://drive.google.com/file/d/14q7iRl7vFyrEKvv_Wqqwlue6vHGdIFO1/view?usp=sharing
> > > 
> > > Ok, I forgot the log on s390 is in big endian format. I don't have a
> > > bigendian machine here, so I can't replay the log to trace it or
> > > find out what disk address the buffer belongs. I can't even use
> > > xfs_logprint to dump the log.
> > > 
> > > Can you take that metadump, restore it on the s390 machine, and
> > > trace a mount attempt? i.e in one shell run 'trace-cmd record -e
> > > xfs\*' and then in another shell run 'mount testfs.img /mnt/test'
> > 
> > The 'mount testfs.img /mnt/test' will crash the kernel and reboot
> > the system directly ...
> 
> Turn off panic-on-oops. Some thing like 'echo 0 >
> /proc/sys/kernel/panic_on_oops' will do that, I think.

Thanks, that helped. I ran the following steps:

# trace-cmd record -e xfs\*
Hit Ctrl^C to stop recording
^CCPU0 data recorded at offset=0x5b7000
    90112 bytes in size
CPU1 data recorded at offset=0x5cd000
    57344 bytes in size
CPU2 data recorded at offset=0x5db000
    9945088 bytes in size
CPU3 data recorded at offset=0xf57000
    786432 bytes in size
# mount testfs.img /mnt/tmp
Segmentation fault
# (Ctrl^C the trace-cmd record process)
# dmesg
[180724.293443] loop: module loaded
[180724.294001] loop0: detected capacity change from 0 to 6876344
[180724.296987] XFS (loop0): Mounting V5 Filesystem 59e2f6ae-ceab-4232-9531-a85417847238
[180724.309088] XFS (loop0): Starting recovery (logdev: internal)
[180724.335207] XFS (loop0): Bad dir block magic!
[180724.335210] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
[180724.335264] ------------[ cut here ]------------
[180724.335265] kernel BUG at fs/xfs/xfs_message.c:102!
[180724.335356] monitor event: 0040 ilc:2 [#1] SMP 
[180724.335362] Modules linked in: loop sunrpc rfkill vfio_ccw mdev vfio_iommu_type1 zcrypt_cex4 vfio iommufd drm fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng virtio_net des_s390 sha3_512_s390 net_failover sha3_256_s390 failover virtio_blk dm_mirror dm_region_hash dm_log dm_mod pkey zcrypt aes_s390
[180724.335379] CPU: 2 PID: 6449 Comm: mount Kdump: loaded Not tainted 6.7.0+ #1
[180724.335382] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
[180724.335384] Krnl PSW : 0704e00180000000 000003ff7fe692ca (assfail+0x62/0x68 [xfs])
[180724.335727]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[180724.335729] Krnl GPRS: c000000000000021 000003ff7fee3a40 ffffffffffffffea 000000000000000a
[180724.335731]            000003800005b3c0 0000000000000000 000003ff800004c4 0000000300014178
[180724.335732]            0000000090a87e80 0000000300014178 000000008bcf6000 00000000924a5000
[180724.335734]            000003ffbe72ef68 000003ff7ffe4c20 000003ff7fe692a8 000003800005b468
[180724.335742] Krnl Code: 000003ff7fe692bc: f0a8000407fe       srp     4(11,%r0),2046,8
                           000003ff7fe692c2: 47000700           bc      0,1792
                          #000003ff7fe692c6: af000000           mc      0,0
                          >000003ff7fe692ca: 0707               bcr     0,%r7
                           000003ff7fe692cc: 0707               bcr     0,%r7
                           000003ff7fe692ce: 0707               bcr     0,%r7
                           000003ff7fe692d0: c00400133e0c       brcl    0,000003ff800d0ee8
                           000003ff7fe692d6: eb6ff0480024       stmg    %r6,%r15,72(%r15)
[180724.335753] Call Trace:
[180724.335754]  [<000003ff7fe692ca>] assfail+0x62/0x68 [xfs] 
[180724.335835] ([<000003ff7fe692a8>] assfail+0x40/0x68 [xfs])
[180724.335915]  [<000003ff7fe8323e>] xlog_recover_validate_buf_type+0x2a6/0x5c8 [xfs] 
[180724.335997]  [<000003ff7fe845ba>] xlog_recover_buf_commit_pass2+0x382/0x448 [xfs] 
[180724.336078]  [<000003ff7fe8e89a>] xlog_recover_items_pass2+0x72/0xf0 [xfs] 
[180724.336159]  [<000003ff7fe8f7ce>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] 
[180724.336240]  [<000003ff7fe8f930>] xlog_recovery_process_trans+0x140/0x148 [xfs] 
[180724.336321]  [<000003ff7fe8f9f8>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] 
[180724.336402]  [<000003ff7fe9002e>] xlog_recover_process_data+0xb6/0x168 [xfs] 
[180724.336482]  [<000003ff7fe901e4>] xlog_recover_process+0x104/0x150 [xfs] 
[180724.336563]  [<000003ff7fe905e2>] xlog_do_recovery_pass+0x3b2/0x748 [xfs] 
[180724.336643]  [<000003ff7fe90dd0>] xlog_do_log_recovery+0x88/0xd8 [xfs] 
[180724.336727]  [<000003ff7fe90e6c>] xlog_do_recover+0x4c/0x218 [xfs] 
[180724.336808]  [<000003ff7fe9247a>] xlog_recover+0xda/0x1a0 [xfs] 
[180724.336888]  [<000003ff7fe78d36>] xfs_log_mount+0x11e/0x280 [xfs] 
[180724.336967]  [<000003ff7fe6a756>] xfs_mountfs+0x3e6/0x920 [xfs] 
[180724.337047]  [<000003ff7fe71ffc>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] 
[180724.337127]  [<00000000552adf88>] get_tree_bdev+0x120/0x1a8 
[180724.337142]  [<00000000552ab690>] vfs_get_tree+0x38/0x110 
[180724.337146]  [<00000000552dee28>] do_new_mount+0x188/0x2e0 
[180724.337150]  [<00000000552dfaa4>] path_mount+0x1ac/0x818 
[180724.337153]  [<00000000552e0214>] __s390x_sys_mount+0x104/0x148 
[180724.337156]  [<0000000055934796>] __do_syscall+0x21e/0x2b0 
[180724.337163]  [<0000000055944d60>] system_call+0x70/0x98 
[180724.337170] Last Breaking-Event-Address:
[180724.337221]  [<000003ff7fe692b2>] assfail+0x4a/0x68 [xfs]
[180724.337301] ---[ end trace 0000000000000000 ]---

# trace-cmd report > testfs.trace.txt
# bzip2 testfs.trace.txt

Please download it from:
https://drive.google.com/file/d/1FgpPidbMZHSjZinyc_WbVGfvwp2btA86/view?usp=sharing

Hope it has what you need :)

Thanks,
Zorro
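
For anyone triaging a similar capture, here is a minimal sketch of how the
assertion location and the first non-assfail frame could be pulled out of
the dmesg text. It is only an illustration: the sample lines are copied from
the output above, while the helper name summarize() and both regular
expressions are assumptions of this sketch and are not part of xfstests or
any XFS tooling.

```python
import re

# Sample lines copied from the dmesg output in this message.
DMESG = """\
[180724.335207] XFS (loop0): Bad dir block magic!
[180724.335210] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
[180724.335754]  [<000003ff7fe692ca>] assfail+0x62/0x68 [xfs]
[180724.335835] ([<000003ff7fe692a8>] assfail+0x40/0x68 [xfs])
[180724.335915]  [<000003ff7fe8323e>] xlog_recover_validate_buf_type+0x2a6/0x5c8 [xfs]
[180724.335997]  [<000003ff7fe845ba>] xlog_recover_buf_commit_pass2+0x382/0x448 [xfs]
"""

# Hypothetical patterns, written against the line shapes seen above.
ASSERT_RE = re.compile(
    r"XFS: Assertion failed: (?P<expr>.*), file: (?P<file>\S+), line: (?P<line>\d+)"
)
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize(dmesg):
    """Return ((expr, file, line) or None, [caller function names])."""
    assertion = None
    frames = []
    for raw in dmesg.splitlines():
        m = ASSERT_RE.search(raw)
        if m and assertion is None:
            assertion = (m["expr"], m["file"], int(m["line"]))
            continue
        m = FRAME_RE.search(raw)
        # Skip assfail itself; the interesting frame is its caller.
        if m and m["func"] != "assfail":
            frames.append(m["func"])
    return assertion, frames


assertion, frames = summarize(DMESG)
print(assertion)
print(frames[0] if frames else None)
```

The first frame after assfail is the function that tripped the assert, which
is usually the quickest pointer to where in log recovery the buffer was
rejected.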

> 
> 
> > > and then after the assert fail terminate the tracing and run
> > > 'trace-cmd report > testfs.trace.txt'?
> > 
> > ... Can I still get the trace report after rebooting?
> 
> Not that I know of. But, then again, I don't reboot test machines
> when an oops or assert fail occurs - I like to have a warm corpse
> left behind that I can poke around in with various blunt instruments
> to see what went wrong....
> 
> -Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> 




