On Mon, Mar 21, 2022 at 12:23:27PM +1100, Dave Chinner wrote: > Hi folks, > > These two patches are followups to my recent series of > shutdown/recovery fixes. The cluster buffer lock patch addresses a > race condition that started to show up regularly once the fixes in > the previous series were done - it is a regression from the async > inode reclaim work that was done almost 2 years ago now. > > The second patch is something I'm really surprised has taken this > long to uncover. There is a check in intent recovery/cancellation > that checks that there are no intent items in the AIL after the > first non-intent item is found. This behaviour was correct back when > we only had standalone intent items (i.e. EFI/EFD), but when we > started to chain complex operations by intents, the recovery of an > incomplete intent can log and commit new intents and they can end up > in the AIL before log recovery is complete and finished processing > the deferred items. Hence the ASSERT() check that no intents > exist in the AIL after the first non-intent item is simply invalid. > > With these two patches, I'm back to being able to run hundreds of > cycles of g/388 or -g recoveryloop without seeing any failures. Hmm, with this series applied to current for-next + your other log patches + my xfs_reserve_blocks infinite loop fix and cleanup series I see this while running fstests on an all-defaults v5 filesystem. I /think/ it's crashing in the "xfs_buf_ispinned(bp)" right below the "ASSERT(iip->ili_item.li_buf);" in xfs_inode_item_push. [ 1294.636121] run fstests generic/251 at 2022-03-21 16:06:27 [ 1295.168287] XFS (sda): Mounting V5 Filesystem [ 1295.180601] XFS (sda): Ending clean mount [ 1295.471352] XFS (sdf): Mounting V5 Filesystem [ 1295.482082] XFS (sdf): Ending clean mount [ 1295.483612] XFS (sdf): Quotacheck needed: Please wait. [ 1295.487265] XFS (sdf): Quotacheck: Done. [ 1376.010962] XFS (sdf): xlog_verify_grant_tail: space > BBTOB(tail_blocks) [ 1386.006736] XFS: Assertion failed: iip->ili_item.li_buf, file: fs/xfs/xfs_inode_item.c, line: 547 [ 1386.007571] ------------[ cut here ]------------ [ 1386.007991] WARNING: CPU: 0 PID: 419577 at fs/xfs/xfs_message.c:112 assfail+0x3c/0x40 [xfs] [ 1386.008837] Modules linked in: btrfs blake2b_generic xor lzo_compress lzo_decompress zlib_deflate raid6_pq zstd_compress ext4 mbcache jbd2 dm_flakey dm_snapshot dm_bufio dm_zero xfs libcrc32c ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_set_hash_ip xt_REDIRECT xt_set ip_set_hash_net ip_set_hash_mac ip_set iptable_nat nf_nat nf_conntrack nfnetlink nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter bfq pvpanic_mmio pvpanic sch_fq_codel ip_tables x_tables overlay nfsv4 af_packet [ 1386.012759] CPU: 0 PID: 419577 Comm: xfsaild/sdf Not tainted 5.17.0-rc6-djwx #rc6 38f9b0b98565b531708bd3bd34525d3eec280440 [ 1386.013718] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 1386.014493] RIP: 0010:assfail+0x3c/0x40 [xfs] [ 1386.015098] Code: 90 9c 45 a0 e8 81 f9 ff ff 8a 1d d3 28 0c 00 80 fb 01 76 0f 0f b6 f3 48 c7 c7 60 81 4f a0 e8 b3 4d fd e0 80 e3 01 74 02 0f 0b <0f> 0b 5b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24 [ 1386.016706] RSP: 0018:ffffc9000530be18 EFLAGS: 00010246 [ 1386.017172] RAX: 0000000000000000 RBX: ffff888063ace300 RCX: 0000000000000000 [ 1386.017801] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffa0449eb0 [ 1386.018435] RBP: ffff88805fa897c0 R08: 0000000000000000 R09: 000000000000000a [ 1386.019059] R10: 000000000000000a R11: f000000000000000 R12: ffff88800ad06f40 [ 1386.019685] R13: ffff88800ad06f68 R14: 00000016000191e9 R15: 0000000000000000 [ 1386.020395] FS: 0000000000000000(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000 [ 1386.021172] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1386.021680] CR2: 0000564ba4fc9968 CR3: 000000002852c002 CR4: 00000000001706b0 [ 1386.022313] Call Trace: [ 1386.022555] <TASK> [ 1386.022763] xfs_inode_item_push+0xf4/0x140 [xfs 9a019b85afd13531ec29ae154df8383f289b73eb] [ 1386.023579] xfsaild+0x402/0xc30 [xfs 9a019b85afd13531ec29ae154df8383f289b73eb] [ 1386.024367] ? __set_cpus_allowed_ptr_locked+0xe0/0x1a0 [ 1386.024841] ? preempt_count_add+0x73/0xa0 [ 1386.025255] ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs 9a019b85afd13531ec29ae154df8383f289b73eb] [ 1386.026163] kthread+0xea/0x110 [ 1386.026487] ? kthread_complete_and_exit+0x20/0x20 [ 1386.026924] ret_from_fork+0x1f/0x30 [ 1386.027261] </TASK> [ 1386.027475] ---[ end trace 0000000000000000 ]--- [ 1386.027889] BUG: kernel NULL pointer dereference, address: 000000000000017c [ 1386.028498] #PF: supervisor read access in kernel mode [ 1386.028955] #PF: error_code(0x0000) - not-present page [ 1386.029419] PGD 0 P4D 0 [ 1386.029663] Oops: 0000 [#1] PREEMPT SMP [ 1386.030012] CPU: 0 PID: 419577 Comm: xfsaild/sdf Tainted: G W 5.17.0-rc6-djwx #rc6 38f9b0b98565b531708bd3bd34525d3eec280440 [ 1386.031100] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 1386.031855] RIP: 0010:xfs_inode_item_push+0x3d/0x140 [xfs] [ 1386.032435] Code: 8b 7f 48 48 89 fb 48 8b af 90 00 00 00 4d 85 ff 0f 84 b1 00 00 00 8b 85 c8 00 00 00 41 bc 01 00 00 00 85 c0 0f 85 87 00 00 00 <41> 8b 87 7c 01 00 00 85 c0 75 7c f6 85 f8 00 00 00 02 75 73 4c 8d [ 1386.034005] RSP: 0018:ffffc9000530be28 EFLAGS: 00010246 [ 1386.034569] RAX: 0000000000000000 RBX: ffff888063ace360 RCX: 0000000000000000 [ 1386.035230] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffa0449eb0 [ 1386.035845] RBP: ffff88805fa897c0 R08: 0000000000000000 R09: 000000000000000a [ 1386.036462] R10: 000000000000000a R11: f000000000000000 R12: 0000000000000001 [ 1386.037069] R13: ffff88800ad06f68 R14: 00000016000191e9 R15: 0000000000000000 [ 1386.037722] FS: 0000000000000000(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000 [ 1386.038504] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1386.038998] CR2: 000000000000017c CR3: 000000002852c002 CR4: 00000000001706b0 [ 1386.039616] Call Trace: [ 1386.039849] <TASK> [ 1386.040055] xfsaild+0x402/0xc30 [xfs 9a019b85afd13531ec29ae154df8383f289b73eb] [ 1386.040800] ? __set_cpus_allowed_ptr_locked+0xe0/0x1a0 [ 1386.041267] ? preempt_count_add+0x73/0xa0 [ 1386.041634] ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs 9a019b85afd13531ec29ae154df8383f289b73eb] [ 1386.042487] kthread+0xea/0x110 [ 1386.042777] ? kthread_complete_and_exit+0x20/0x20 [ 1386.043207] ret_from_fork+0x1f/0x30 [ 1386.043584] </TASK> [ 1386.043795] Modules linked in: btrfs blake2b_generic xor lzo_compress lzo_decompress zlib_deflate raid6_pq zstd_compress ext4 mbcache jbd2 dm_flakey dm_snapshot dm_bufio dm_zero xfs libcrc32c ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_set_hash_ip xt_REDIRECT xt_set ip_set_hash_net ip_set_hash_mac ip_set iptable_nat nf_nat nf_conntrack nfnetlink nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter bfq pvpanic_mmio pvpanic sch_fq_codel ip_tables x_tables overlay nfsv4 af_packet [ 1386.047627] Dumping ftrace buffer: [ 1386.047940] (ftrace buffer empty) [ 1386.048267] CR2: 000000000000017c [ 1386.048569] ---[ end trace 0000000000000000 ]--- [ 1386.049097] RIP: 0010:xfs_inode_item_push+0x3d/0x140 [xfs] [ 1386.049704] Code: 8b 7f 48 48 89 fb 48 8b af 90 00 00 00 4d 85 ff 0f 84 b1 00 00 00 8b 85 c8 00 00 00 41 bc 01 00 00 00 85 c0 0f 85 87 00 00 00 <41> 8b 87 7c 01 00 00 85 c0 75 7c f6 85 f8 00 00 00 02 75 73 4c 8d [ 1386.051276] RSP: 0018:ffffc9000530be28 EFLAGS: 00010246 [ 1386.051732] RAX: 0000000000000000 RBX: ffff888063ace360 RCX: 0000000000000000 [ 1386.052338] RDX: 00000000ffffffc0 RSI: 0000000000000000 RDI: ffffffffa0449eb0 [ 1386.052945] RBP: ffff88805fa897c0 R08: 0000000000000000 R09: 000000000000000a [ 1386.053555] R10: 000000000000000a R11: f000000000000000 R12: 0000000000000001 [ 1386.054228] R13: ffff88800ad06f68 R14: 00000016000191e9 R15: 0000000000000000 [ 1386.054855] FS: 0000000000000000(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000 [ 1386.055535] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1386.056034] CR2: 000000000000017c CR3: 000000002852c002 CR4: 00000000001706b0 [ 1386.056653] note: xfsaild/sdf[419577] exited with preempt_count 1 --D