Re: [PATCH 4/4] xfs: fix AGF vs inode cluster buffer deadlock

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Sun, 25 Jun 2023 03:58:15 +0100

On Wed, May 17, 2023 at 10:04:49AM +1000, Dave Chinner wrote:
> Lock order in XFS is AGI -> AGF, hence for operations involving
> inode unlinked list operations we always lock the AGI first. Inode
> unlinked list operations operate on the inode cluster buffer,
> so the lock order there is AGI -> inode cluster buffer.

Hi Dave,

This commit reliably produces an assertion failure for me.  I haven't
tried to analyse why.  It's pretty clear though; I can run generic/426
in a loop for hundreds of seconds on the parent commit (cb042117488d),
but it'll die within 30 seconds on commit 82842fee6e59.

    export MKFS_OPTIONS="-m reflink=1,rmapbt=1 -i sparse=1 -b size=1024"

I suspect the size=1024 is the important thing, but I haven't tested
that hypothesis.  This is on an x86-64 virtual machine; the full qemu
command line is at the end [1].

00028 FSTYP         -- xfs (debug)
00028 PLATFORM      -- Linux/x86_64 pepe-kvm 6.4.0-rc5-00004-g82842fee6e59 #182 SMP PREEMPT_DYNAMIC Sat Jun 24 22:51:32 EDT 2023
00028 MKFS_OPTIONS  -- -f -m reflink=1,rmapbt=1 -i sparse=1 -b size=1024 /dev/sdc
00028 MOUNT_OPTIONS -- /dev/sdc /mnt/scratch
00028
00028 XFS (sdc): Mounting V5 Filesystem 591c2048-7cce-4eda-acf7-649e19cd8554
00028 XFS (sdc): Ending clean mount
00028 XFS (sdc): Unmounting Filesystem 591c2048-7cce-4eda-acf7-649e19cd8554
00028 XFS (sdb): EXPERIMENTAL online scrub feature in use. Use at your own risk!
00028 XFS (sdb): Unmounting Filesystem 9db9e0a2-c05b-4690-a938-ae8f7b70be8e
00028 XFS (sdb): Mounting V5 Filesystem 9db9e0a2-c05b-4690-a938-ae8f7b70be8e
00028 XFS (sdb): Ending clean mount
00028 generic/426       run fstests generic/426 at 2023-06-25 02:52:07
00029 XFS: Assertion failed: bp->b_flags & XBF_DONE, file: fs/xfs/xfs_trans_buf.c, line: 241
00029 ------------[ cut here ]------------
00029 kernel BUG at fs/xfs/xfs_message.c:102!
00029 invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
00029 CPU: 1 PID: 62 Comm: kworker/1:1 Kdump: loaded Not tainted 6.4.0-rc5-00004-g82842fee6e59 #182
00029 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
00029 Workqueue: xfs-inodegc/sdb xfs_inodegc_worker
00029 RIP: 0010:assfail+0x30/0x40
00029 Code: c9 48 c7 c2 48 f8 ea 81 48 89 f1 48 89 e5 48 89 fe 48 c7 c7 b9 cc e5 81 e8 fd fd ff ff 80 3d f6 2f d3 00 00 75 04 0f 0b 5d c3 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 48 63 f6 49 89
00029 RSP: 0018:ffff88800317bc78 EFLAGS: 00010202
00029 RAX: 00000000ffffffea RBX: ffff88800611e000 RCX: 000000007fffffff
00029 RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffffffff81e5ccb9
00029 RBP: ffff88800317bc78 R08: 0000000000000000 R09: 000000000000000a
00029 R10: 000000000000000a R11: 0fffffffffffffff R12: ffff88800c780800
00029 R13: ffff88800317bce0 R14: 0000000000000001 R15: ffff88800c73d000
00029 FS:  0000000000000000(0000) GS:ffff88807d840000(0000) knlGS:0000000000000000
00029 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
00029 CR2: 00005623b1911068 CR3: 000000000ee28003 CR4: 0000000000770ea0
00029 PKRU: 55555554
00029 Call Trace:
00029  <TASK>
00029  ? show_regs+0x5c/0x70
00029  ? die+0x32/0x90
00029  ? do_trap+0xbb/0xe0
00029  ? do_error_trap+0x67/0x90
00029  ? assfail+0x30/0x40
00029  ? exc_invalid_op+0x52/0x70
00029  ? assfail+0x30/0x40
00029  ? asm_exc_invalid_op+0x1b/0x20
00029  ? assfail+0x30/0x40
00029  ? assfail+0x23/0x40
00029  xfs_trans_read_buf_map+0x2d9/0x480
00029  xfs_imap_to_bp+0x3d/0x40
00029  xfs_inode_item_precommit+0x176/0x200
00029  xfs_trans_run_precommits+0x65/0xc0
00029  __xfs_trans_commit+0x3d/0x360
00029  xfs_trans_commit+0xb/0x10
00029  xfs_inactive_ifree.isra.0+0xea/0x200
00029  xfs_inactive+0x132/0x230
00029  xfs_inodegc_worker+0xb6/0x1a0
00029  process_one_work+0x1a9/0x3a0
00029  worker_thread+0x4e/0x3a0
00029  ? process_one_work+0x3a0/0x3a0
00029  kthread+0xf9/0x130

In case things have moved around since that commit, the particular line
throwing the assertion is in this paragraph:

        if (bp) {
                ASSERT(xfs_buf_islocked(bp));
                ASSERT(bp->b_transp == tp);
                ASSERT(bp->b_log_item != NULL);
                ASSERT(!bp->b_error);
                ASSERT(bp->b_flags & XBF_DONE);

It's the last one that trips.  Sorry for not catching this earlier; my
test suite experienced a bit of a failure and I only just got around to
fixing it enough to run all the way through.

[1] qemu-system-x86_64 -nodefaults -nographic -cpu host -machine type=q35,accel=kvm,nvdimm=on -m 2G,slots=8,maxmem=256G -smp 8 -kernel /home/willy/kernel/linux-next/.build_test_kernel-x86_64/kpgk/vmlinuz -append mitigations=off console=hvc0 root=/dev/sda rw log_buf_len=8M ktest.dir=/home/willy/kernel/ktest ktest.env=/tmp/build-test-kernel-FzOfFCHDVD/env crashkernel=128M no_console_suspend page_owner=on -device virtio-serial -chardev stdio,id=console -device virtconsole,chardev=console -serial unix:/tmp/build-test-kernel-FzOfFCHDVD/vm-kgdb,server,nowait -monitor unix:/tmp/build-test-kernel-FzOfFCHDVD/vm-mon,server,nowait -gdb unix:/tmp/build-test-kernel-FzOfFCHDVD/vm-gdb,server,nowait -device virtio-rng-pci -virtfs local,path=/,mount_tag=host,security_model=none -device virtio-scsi-pci,id=hba -nic user,model=virtio,hostfwd=tcp:127.0.0.1:28201-:22 -drive if=none,format=raw,id=disk0,file=/var/lib/ktest/root.amd64,snapshot=on -device scsi-hd,bus=hba.0,drive=disk0 -drive if=none,format=raw,id=disk1,file=/tmp/build-test-kernel-FzOfFCHDVD/dev-1,cache=unsafe -device scsi-hd,bus=hba.0,drive=disk1 -drive if=none,format=raw,id=disk2,file=/tmp/build-test-kernel-FzOfFCHDVD/dev-2,cache=unsafe -device scsi-hd,bus=hba.0,drive=disk2 -drive if=none,format=raw,id=disk3,file=/tmp/build-test-kernel-FzOfFCHDVD/dev-3,cache=unsafe -device scsi-hd,bus=hba.0,drive=disk3 -drive if=none,format=raw,id=disk4,file=/tmp/build-test-kernel-FzOfFCHDVD/dev-4,cache=unsafe -device scsi-hd,bus=hba.0,drive=disk4