> On 13-Oct-2023, at 2:49 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> 
> On Thu, Oct 12, 2023 at 03:51:04PM +0530, Sachin Sant wrote:
>> While running xfstests on an IBM Power10 server with an XFS filesystem using
>> a 4k block size, the following crash was seen:
>> 
>> [ 1609.771548] run fstests xfs/238 at 2023-10-11 17:00:40
>> [ 1609.971214] XFS (sdb1): Mounting V5 Filesystem 1105d60c-9514-4e7a-af6a-632d99bf06ea
>> [ 1609.980693] XFS (sdb1): Ending clean mount
>> [ 1609.982166] xfs filesystem being mounted at /mnt/test supports timestamps until 2038-01-19 (0x7fffffff)
>> [ 1610.024793] XFS (sdb2): Mounting V5 Filesystem 60de8f0c-c80e-4a2a-b60a-f41a9cc0feca
>> [ 1610.030295] XFS (sdb2): Ending clean mount
>> [ 1610.031342] xfs filesystem being mounted at /mnt/scratch supports timestamps until 2038-01-19 (0x7fffffff)
>> [ 1610.087139] XFS: Assertion failed: bp->b_flags & XBF_DONE, file: fs/xfs/xfs_trans_buf.c, line: 241
>> [ 1610.087162] ------------[ cut here ]------------
>> [ 1610.087165] kernel BUG at fs/xfs/xfs_message.c:102!
>> [ 1610.087168] Oops: Exception in kernel mode, sig: 5 [#1]
>> [ 1610.087171] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries
>> [ 1610.087175] Modules linked in: overlay dm_zero dm_thin_pool dm_persistent_data dm_bio_prison dm_snapshot dm_bufio loop dm_flakey xfs dm_mod nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding rfkill ip_set tls nf_tables libcrc32c nfnetlink pseries_rng vmx_crypto ext4 mbcache jbd2 sd_mod t10_pi crc64_rocksoft crc64 sg ibmvscsi scsi_transport_srp ibmveth fuse [last unloaded: scsi_debug]
>> [ 1610.087220] CPU: 11 PID: 225897 Comm: kworker/11:46 Not tainted 6.6.0-rc5-gb8b05bc6d83c #1
>> [ 1610.087224] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries
>> [ 1610.087227] Workqueue: xfs-inodegc/sdb2 xfs_inodegc_worker [xfs]
>> [ 1610.087303] NIP: c008000002857edc LR: c008000002857ec8 CTR: 000000007fffffff
>> [ 1610.087306] REGS: c00000002441b810 TRAP: 0700 Not tainted (6.6.0-rc5-gb8b05bc6d83c)
>> [ 1610.087309] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28000424 XER: 00000007
>> [ 1610.087318] CFAR: c008000002857d1c IRQMASK: 0
>> [ 1610.087318] GPR00: c008000002857ec8 c00000002441bab0 c008000002a68300 ffffffffffffffea
>> [ 1610.087318] GPR04: 000000000000000a c00000002441b9b0 0000000000000000 c0080000016c6c40
>> [ 1610.087318] GPR08: ffffffffffffffc0 0000000000000001 0000000000000000 c00800000285a3a8
>> [ 1610.087318] GPR12: c0000000008311d0 c00000117fff1b00 c000000000197de8 c00000000936bf40
>> [ 1610.087318] GPR16: 0000000000000000 0000000000000000 0000000000000000 c0000009d16d48b0
>> [ 1610.087318] GPR20: c000000079e80205 c00000001cb52f80 c00000001cb52fc0 0000000080000000
>> [ 1610.087318] GPR24: 0000000000000001 c00000002441bbc8 c0000000e77a7000 c0000000209b7e00
>> [ 1610.087318] GPR28: c00800000279ae48 c0080000016b25f0 c00000002441bc10 c00000002dabaee8
>> [ 1610.087354] NIP [c008000002857edc] assfail+0x54/0x74 [xfs]
>> [ 1610.087420] LR [c008000002857ec8] assfail+0x40/0x74 [xfs]
>> [ 1610.087485] Call Trace:
>> [ 1610.087486] [c00000002441bab0] [c008000002857ec8] assfail+0x40/0x74 [xfs] (unreliable)
>> [ 1610.087551] [c00000002441bb10] [c00800000281cebc] xfs_trans_read_buf_map+0x384/0x590 [xfs]
>> [ 1610.087622] [c00000002441bba0] [c00800000279ae48] xfs_imap_to_bp+0x70/0xa8 [xfs]
>> [ 1610.087691] [c00000002441bbf0] [c00800000280c3ec] xfs_inode_item_precommit+0x244/0x320 [xfs]
>> [ 1610.087760] [c00000002441bc60] [c0080000027f3034] xfs_trans_run_precommits+0xac/0x160 [xfs]
>> [ 1610.087830] [c00000002441bcb0] [c0080000027f45b0] __xfs_trans_commit+0x68/0x430 [xfs]
>> [ 1610.087900] [c00000002441bd20] [c0080000027dfc30] xfs_inactive_ifree+0x158/0x2a0 [xfs]
>> [ 1610.087969] [c00000002441bdb0] [c0080000027dff84] xfs_inactive+0x20c/0x420 [xfs]
>> [ 1610.088037] [c00000002441bdf0] [c0080000027ceb90] xfs_inodegc_worker+0x148/0x250 [xfs]
>> [ 1610.088106] [c00000002441be40] [c000000000188130] process_scheduled_works+0x230/0x4f0
>> [ 1610.088113] [c00000002441bf10] [c00000000018b164] worker_thread+0x1e4/0x500
>> [ 1610.088118] [c00000002441bf90] [c000000000197f18] kthread+0x138/0x140
>> [ 1610.088122] [c00000002441bfe0] [c00000000000df98] start_kernel_thread+0x14/0x18
>> [ 1610.088127] Code: e8a5ca38 7c641b78 3c620000 e863ca48 f8010010 f821ffa1 4bfffd91 3d220000 e929ca50 89290010 2f890000 419e0008 <0fe00000> 0fe00000 38210060 e8010010
>> [ 1610.088140] ---[ end trace 0000000000000000 ]---
>> [ 1610.093928] sh (1070303): drop_caches: 3
>> [ 1610.095600] pstore: backend (nvram) writing error (-1)
>> [ 1610.095605]
>> 
>> The xfs/238 test was executing when the crash was encountered.
>> Devices were formatted with a 4k block size.
> 
> Yeah, I've seen this once before, I think I know what the problem is
> from analysis of that failure, but I've been unable to reproduce it
> again so I've not been able to confirm the diagnosis nor test a fix.
> 

Yeah, the problem is not seen consistently. In the last week I have seen
it twice. An interesting aspect is that the problem is only seen with the
powerpc/merge branch; the same test run on mainline or linux-next does
not reproduce it.

> tl;dr: we just unlinked an inode whose cluster buffer has been
> invalidated by xfs_icluster_free(). We go to log the inode, but this
> is the first time we've logged the inode since it was last cleaned,
> so it goes to read the cluster buffer to attach it. It finds the
> cluster buffer already marked stale in the transaction, so the DONE
> flag is not set and the ASSERT fires.
> 
> i.e. it appears to me that this requires inode cluster buffer
> writeback between the unlink() operation and the inodegc
> inactivation process to set the initial conditions for the problem
> to trigger, and then have just a single inode in the inobt chunk
> that triggers freeing of the chunk whilst the inode itself is clean.
> 
> I need to confirm that this is the case before trying to fix it,
> because this inode log item vs stale inode cluster buffer path is
> tricky and nasty and there might be something else going on.
> However, I haven't been able to reproduce this to be able to confirm
> this hypothesis yet.
> 
> I suspect the fix may well be to use xfs_trans_buf_get() in the
> xfs_inode_item_precommit() path if XFS_ISTALE is already set on the
> inode we are trying to log. We don't need a populated cluster buffer
> to read data out of or write data into in this path - all we need to
> do is attach the inode to the buffer so that when the buffer
> invalidation is committed to the journal it will also correctly
> finish the stale inode log item.
> 

Thanks for the analysis.

- Sachin
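
For illustration only, an untested sketch of the kind of change described
above might look roughly like the fragment below, placed where
xfs_inode_item_precommit() currently reads the cluster buffer via
xfs_imap_to_bp(). It assumes the lookup helper is spelled
xfs_trans_get_buf() in current kernels, and it elides the ili_lock
handling and error-path details around that call site, so treat it as a
sketch of the idea rather than a verified patch:

	/*
	 * Sketch, not a tested fix: if the inode is already XFS_ISTALE,
	 * its cluster buffer was invalidated earlier in this transaction
	 * when the inode chunk was freed, so it will never carry XBF_DONE
	 * and must not be read. Look it up with xfs_trans_get_buf()
	 * instead - the buffer is only needed so the stale inode log item
	 * can be attached to it, not to access any inode data.
	 */
	if (xfs_iflags_test(ip, XFS_ISTALE)) {
		error = xfs_trans_get_buf(tp, ip->i_mount->m_ddev_targp,
				ip->i_imap.im_blkno, ip->i_imap.im_len,
				0, &bp);
	} else {
		error = xfs_imap_to_bp(ip->i_mount, tp, &ip->i_imap, &bp);
	}
	if (error)
		return error;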