On Wed, May 17, 2023 at 10:04:49AM +1000, Dave Chinner wrote: > Lock order in XFS is AGI -> AGF, hence for operations involving > inode unlinked list operations we always lock the AGI first. Inode > unlinked list operations operate on the inode cluster buffer, > so the lock order there is AGI -> inode cluster buffer. Hi Dave, This commit reliably produces an assertion failure for me. I haven't tried to analyse why. It's pretty clear though; I can run generic/426 in a loop for hundreds of seconds on the parent commit (cb042117488d), but it'll die within 30 seconds on commit 82842fee6e59. export MKFS_OPTIONS="-m reflink=1,rmapbt=1 -i sparse=1 -b size=1024" I suspect the size=1024 is the important thing, but I haven't tested that hypothesis. This is on an x86-64 virtual machine; full qemu command line at the end [1] 00028 FSTYP -- xfs (debug) 00028 PLATFORM -- Linux/x86_64 pepe-kvm 6.4.0-rc5-00004-g82842fee6e59 #182 SMP PREEMPT_DYNAMIC Sat Jun 24 22:51:32 EDT 2023 00028 MKFS_OPTIONS -- -f -m reflink=1,rmapbt=1 -i sparse=1 -b size=1024 /dev/sdc 00028 MOUNT_OPTIONS -- /dev/sdc /mnt/scratch 00028 00028 XFS (sdc): Mounting V5 Filesystem 591c2048-7cce-4eda-acf7-649e19cd8554 00028 XFS (sdc): Ending clean mount 00028 XFS (sdc): Unmounting Filesystem 591c2048-7cce-4eda-acf7-649e19cd8554 00028 XFS (sdb): EXPERIMENTAL online scrub feature in use. Use at your own risk! 00028 XFS (sdb): Unmounting Filesystem 9db9e0a2-c05b-4690-a938-ae8f7b70be8e 00028 XFS (sdb): Mounting V5 Filesystem 9db9e0a2-c05b-4690-a938-ae8f7b70be8e 00028 XFS (sdb): Ending clean mount 00028 generic/426 run fstests generic/426 at 2023-06-25 02:52:07 00029 XFS: Assertion failed: bp->b_flags & XBF_DONE, file: fs/xfs/xfs_trans_buf.c, line: 241 00029 ------------[ cut here ]------------ 00029 kernel BUG at fs/xfs/xfs_message.c:102! 00029 invalid opcode: 0000 [#1] PREEMPT SMP NOPTI 00029 CPU: 1 PID: 62 Comm: kworker/1:1 Kdump: loaded Not tainted 6.4.0-rc5-00004-g82842fee6e59 #182 00029 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 00029 Workqueue: xfs-inodegc/sdb xfs_inodegc_worker 00029 RIP: 0010:assfail+0x30/0x40 00029 Code: c9 48 c7 c2 48 f8 ea 81 48 89 f1 48 89 e5 48 89 fe 48 c7 c7 b9 cc e5 81 e8 fd fd ff ff 80 3d f6 2f d3 00 00 75 04 0f 0b 5d c3 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 48 63 f6 49 89 00029 RSP: 0018:ffff88800317bc78 EFLAGS: 00010202 00029 RAX: 00000000ffffffea RBX: ffff88800611e000 RCX: 000000007fffffff 00029 RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffffffff81e5ccb9 00029 RBP: ffff88800317bc78 R08: 0000000000000000 R09: 000000000000000a 00029 R10: 000000000000000a R11: 0fffffffffffffff R12: ffff88800c780800 00029 R13: ffff88800317bce0 R14: 0000000000000001 R15: ffff88800c73d000 00029 FS: 0000000000000000(0000) GS:ffff88807d840000(0000) knlGS:0000000000000000 00029 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 00029 CR2: 00005623b1911068 CR3: 000000000ee28003 CR4: 0000000000770ea0 00029 PKRU: 55555554 00029 Call Trace: 00029 <TASK> 00029 ? show_regs+0x5c/0x70 00029 ? die+0x32/0x90 00029 ? do_trap+0xbb/0xe0 00029 ? do_error_trap+0x67/0x90 00029 ? assfail+0x30/0x40 00029 ? exc_invalid_op+0x52/0x70 00029 ? assfail+0x30/0x40 00029 ? asm_exc_invalid_op+0x1b/0x20 00029 ? assfail+0x30/0x40 00029 ? assfail+0x23/0x40 00029 xfs_trans_read_buf_map+0x2d9/0x480 00029 xfs_imap_to_bp+0x3d/0x40 00029 xfs_inode_item_precommit+0x176/0x200 00029 xfs_trans_run_precommits+0x65/0xc0 00029 __xfs_trans_commit+0x3d/0x360 00029 xfs_trans_commit+0xb/0x10 00029 xfs_inactive_ifree.isra.0+0xea/0x200 00029 xfs_inactive+0x132/0x230 00029 xfs_inodegc_worker+0xb6/0x1a0 00029 process_one_work+0x1a9/0x3a0 00029 worker_thread+0x4e/0x3a0 00029 ? process_one_work+0x3a0/0x3a0 00029 kthread+0xf9/0x130 In case things have moved around since that commit, the particular line throwing the assertion is in this paragraph: if (bp) { ASSERT(xfs_buf_islocked(bp)); ASSERT(bp->b_transp == tp); ASSERT(bp->b_log_item != NULL); ASSERT(!bp->b_error); ASSERT(bp->b_flags & XBF_DONE); It's the last one that trips. Sorry for not catching this earlier; my test suite experienced a bit of a failure and I only just got around to fixing it enough to run all the way through. [1] qemu-system-x86_64 -nodefaults -nographic -cpu host -machine type=q35,accel=kvm,nvdimm=on -m 2G,slots=8,maxmem=256G -smp 8 -kernel /home/willy/kernel/linux-next/.build_test_kernel-x86_64/kpgk/vmlinuz -append mitigations=off console=hvc0 root=/dev/sda rw log_buf_len=8M ktest.dir=/home/willy/kernel/ktest ktest.env=/tmp/build-test-kernel-FzOfFCHDVD/env crashkernel=128M no_console_suspend page_owner=on -device virtio-serial -chardev stdio,id=console -device virtconsole,chardev=console -serial unix:/tmp/build-test-kernel-FzOfFCHDVD/vm-kgdb,server,nowait -monitor unix:/tmp/build-test-kernel-FzOfFCHDVD/vm-mon,server,nowait -gdb unix:/tmp/build-test-kernel-FzOfFCHDVD/vm-gdb,server,nowait -device virtio-rng-pci -virtfs local,path=/,mount_tag=host,security_model=none -device virtio-scsi-pci,id=hba -nic user,model=virtio,hostfwd=tcp:127.0.0.1:28201-:22 -drive if=none,format=raw,id=disk0,file=/var/lib/ktest/root.amd64,snapshot=on -device scsi-hd,bus=hba.0,drive=disk0 -drive if=none,format=raw,id=disk1,file=/tmp/build-test-kernel-FzOfFCHDVD/dev-1,cache=unsafe -device scsi-hd,bus=hba.0,drive=disk1 -drive if=none,format=raw,id=disk2,file=/tmp/build-test-kernel-FzOfFCHDVD/dev-2,cache=unsafe -device scsi-hd,bus=hba.0,drive=disk2 -drive if=none,format=raw,id=disk3,file=/tmp/build-test-kernel-FzOfFCHDVD/dev-3,cache=unsafe -device scsi-hd,bus=hba.0,drive=disk3 -drive if=none,format=raw,id=disk4,file=/tmp/build-test-kernel-FzOfFCHDVD/dev-4,cache=unsafe -device scsi-hd,bus=hba.0,drive=disk4