"Darrick J. Wong" <djwong@xxxxxxxxxx> writes: Hi Darrick, Thanks for your analysis and quick help on this. >> >> Hi Darrick, >> >> Please find the information collected from the system. We added some >> debug logs and looks like it is exactly what is happening which you >> pointed out. >> >> We added a debug kernel patch to get more info from the system which >> you had requested [1] >> >> 1. We first breaked into emergency shell where root fs is first getting >> mounted on /sysroot as "ro" filesystem. Here are the logs. >> >> [ OK ] Started File System Check on /dev/mapper/rhel_ltcden3--lp1-root. >> Mounting /sysroot... >> [ 7.203990] SGI XFS with ACLs, security attributes, quota, no debug enabled >> [ 7.205835] XFS (dm-0): Mounting V5 Filesystem 7b801289-75a7-4d39-8cd3-24526e9e9da7 >> [ ***] A start job is running for /sysroot (15s / 1min 35s)[ 17.439377] XFS (dm-0): Starting recovery (logdev: internal) >> [ *** ] A start job is running for /sysroot (16s / 1min 35s)[ 17.771158] xfs_log_mount_finish: Recovery needed is set >> [ 17.771172] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:0 >> [ 17.771179] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:1 >> [ 17.771184] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:2 >> [ 17.771190] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:3 >> [ 17.771196] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:4 >> [ 17.771201] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:5 >> [ 17.801033] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:6 >> [ 17.801041] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:7 >> [ 17.801046] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:8 >> [ 17.801052] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:9 >> [ 17.801057] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:10 >> [ 17.801063] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:11 >> [ 17.801068] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:12 >> [ 17.801272] xlog_recover_iunlink_bucket: bucket: 13, agino: 3064909, ino: 3064909, iget ret: 0, previno:18446744073709551615, prev_agino:4294967295 >> >> <previno, prev_agino> is just <-1 %ull and -1 %u> in above. That's why >> the huge value. > > Ok, so log recovery finds 3064909 and clears it... > >> [ 17.801281] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:13 >> [ 17.801287] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:14 > > <snip the rest of these...> > >> [ 17.844910] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:3, bucket:62 >> [ 17.844916] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:3, bucket:63 >> [ 17.886079] XFS (dm-0): Ending recovery (logdev: internal) >> [ OK ] Mounted /sysroot. >> [ OK ] Reached target Initrd Root File System. >> >> >> 2. Then these are the logs from xfs_repair -n /dev/dm-0 >> Here you will notice the same agi 3064909 in bucket 13 (from phase-2) which got also >> printed in above xlog_recover_iunlink_ag() function. >> >> switch_root:/# xfs_repair -n /dev/dm-0 >> Phase 1 - find and verify superblock... >> Phase 2 - using internal log >> - zero log... >> - scan filesystem freespace and inode maps... >> agi unlinked bucket 13 is 3064909 in ag 0 (inode=3064909) > > ...yet here we find that 3064909 is still on the unlinked list? > > Just to confirm -- you ran xfs_repair -n after the successful recovery > above, right? > Yes, that's right. >> - found root inode chunk >> Phase 3 - for each AG... >> - scan (but don't clear) agi unlinked lists... >> - process known inodes and perform inode discovery... >> - agno = 0 >> - agno = 1 >> - agno = 2 >> - agno = 3 >> - process newly discovered inodes... >> Phase 4 - check for duplicate blocks... >> - setting up duplicate extent list... >> - check for inodes claiming duplicate blocks... >> - agno = 0 >> - agno = 2 >> - agno = 1 >> - agno = 3 >> No modify flag set, skipping phase 5 >> Phase 6 - check inode connectivity... >> - traversing filesystem ... >> - traversal finished ... >> - moving disconnected inodes to lost+found ... >> Phase 7 - verify link counts... >> would have reset inode 3064909 nlinks from 4294967291 to 2 > > Oh now that's interesting. Inode on unlinked list with grossly nonzero > (but probably underflowed) link count. That might explain why iunlink > recovery ignores the inode. Is inode 3064909 reachable via the > directory tree? > > Would you mind sending me a metadump to play with? metadump -ago would > be best, if filenames/xattrnames aren't sensitive customer data. Sorry about the delay. I am checking for any permissions part internally. Meanwhile - I can help out if you would like me to try anything. >> No modify flag set, skipping filesystem flush and exiting. >> >> >> 3. Then we exit from the shell for the system to continue booting. >> Here it will continue.. Just pasting the logs where the warning gets >> generated and some extra logs are getting printed for the same inode >> with our patch. >> >> >> it continues >> ================ >> [ 587.999113] ------------[ cut here ]------------ >> [ 587.999121] WARNING: CPU: 48 PID: 2026 at fs/xfs/xfs_inode.c:1839 xfs_iunlink_lookup+0x58/0x80 [xfs] >> [ 587.999185] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink rfkill sunrpc xts pseries_rng vmx_crypto xfs libcrc32c sd_mod sg ibmvscsi ibmveth scsi_transport_srp nvme nvme_core t10_pi crc64_rocksoft crc64 dm_mirror dm_region_hash dm_log dm_mod >> [ 587.999215] CPU: 48 PID: 2026 Comm: in:imjournal Not tainted 6.2.0-rc8ssh+ #38 >> [ 587.999219] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1010.22 (NH1010_122) hv:phyp pSeries >> [ 587.999222] NIP: c00800000065fa80 LR: c00800000065fa4c CTR: c000000000ea4d40 >> [ 587.999226] REGS: c00000001aa83650 TRAP: 0700 Not tainted (6.2.0-rc8ssh+) >> [ 587.999228] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 24224842 XER: 00000000 >> [ 587.999236] CFAR: c00800000065fa54 IRQMASK: 0 >> [ 587.999236] GPR00: c00000004570b2c8 c00000001aa838f0 c008000000708300 0000000000000000 >> [ 587.999236] GPR04: 00000000002ec44d 0000000000000000 0000000000000000 c00000000413faf0 >> [ 587.999236] GPR08: 0000000000000000 c00000000413fba0 0000000000000000 fffffffffffffffd >> [ 587.999236] GPR12: 0000000000000040 c000004afeccd880 0000000000000000 0000000004000000 >> [ 587.999236] GPR16: c00000001aa83b38 c00000001aa83a38 c00000001aa83a68 c000000035264c00 >> [ 587.999236] GPR20: c00000004570b200 0000000000008000 c00000000886c400 00000000002ec44d >> [ 587.999236] GPR24: 000000000014040d 000000000000000d c000000051875400 00000000002ec44d >> [ 587.999236] GPR28: c000000035262c00 c00000004570b200 c00000008f2d8b90 000000000014040d >> [ 587.999272] NIP [c00800000065fa80] xfs_iunlink_lookup+0x58/0x80 [xfs] >> [ 587.999327] LR [c00800000065fa4c] xfs_iunlink_lookup+0x24/0x80 [xfs] >> [ 587.999379] Call Trace: >> [ 587.999381] [c00000001aa838f0] [c00000008f2d8b90] 0xc00000008f2d8b90 (unreliable) >> [ 587.999385] [c00000001aa83910] [c008000000660094] xfs_iunlink+0x1bc/0x2c0 [xfs] >> [ 587.999438] [c00000001aa839d0] [c008000000664804] xfs_rename+0x69c/0xd10 [xfs] >> [ 587.999491] [c00000001aa83b10] [c00800000065e020] xfs_vn_rename+0xf8/0x1f0 [xfs] >> [ 587.999544] [c00000001aa83ba0] [c000000000579efc] vfs_rename+0x9bc/0xdf0 >> [ 587.999549] [c00000001aa83c90] [c00000000058018c] do_renameat2+0x3dc/0x5c0 >> [ 587.999553] [c00000001aa83de0] [c000000000580520] sys_rename+0x60/0x80 >> [ 587.999557] [c00000001aa83e10] [c000000000033630] system_call_exception+0x150/0x3b0 >> [ 587.999562] [c00000001aa83e50] [c00000000000c554] system_call_common+0xf4/0x258 >> [ 587.999567] --- interrupt: c00 at 0x7fff96082e20 >> [ 587.999569] NIP: 00007fff96082e20 LR: 00007fff95c45e24 CTR: 0000000000000000 >> [ 587.999572] REGS: c00000001aa83e80 TRAP: 0c00 Not tainted (6.2.0-rc8ssh+) >> [ 587.999575] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 2a082202 XER: 00000000 >> [ 587.999584] IRQMASK: 0 >> [ 587.999584] GPR00: 0000000000000026 00007fff94f5d220 00007fff96207300 00007fff94f5d288 >> [ 587.999584] GPR04: 0000000172b76b70 0000000000000000 0600000000000000 0000000000000002 >> [ 587.999584] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> [ 587.999584] GPR12: 0000000000000000 00007fff94f666e0 0000000172b65930 0000000000000000 >> [ 587.999584] GPR16: 0000000132fe7648 00007fff95c47ce0 0000000000000000 000000000000004c >> [ 587.999584] GPR20: 0000000000000082 00007fff94f5e3a0 00007fff94f5e398 00007fff880029b0 >> [ 587.999584] GPR24: 0000000000000000 00007fff94f5e388 00007fff94f5e378 00007fff94f5e3b0 >> [ 587.999584] GPR28: 00007fff95c60000 00007fff95c605e0 00007fff88000c10 00007fff94f5d288 >> [ 587.999618] NIP [00007fff96082e20] 0x7fff96082e20 >> [ 587.999620] LR [00007fff95c45e24] 0x7fff95c45e24 >> [ 587.999622] --- interrupt: c00 >> [ 587.999624] Code: 2c230000 4182002c e9230020 2fa90000 419e0020 38210020 e8010010 7c0803a6 4e800020 60000000 60000000 60000000 <0fe00000> 60000000 60000000 60000000 >> [ 587.999637] ---[ end trace 0000000000000000 ]--- >> [ 587.999640] xfs_iunlink_update_backref: next_agino: 3064909 cannot be found >> [ 587.999643] xfs_iunlink_insert_inode: Cannot find backref for agino:1311757, ip->i_ino:1311757, next_agino: 3064909 agno:0 >> [ 587.999646] XFS (dm-0): Internal error xfs_trans_cancel at line 1097 of file fs/xfs/xfs_trans.c. Caller xfs_rename+0x9cc/0xd10 [xfs] >> >> ^^^ There are the extra info printing the next_agino to be the same >> agino 3064909 for xfs_iunlink_lookup has failed. > > Yep, then runtime code encounters agi[0].unlinked[13] == 3064909, but > doesn't find an xfs_inode in the cache for it, and shuts down the > filesystem. > Sure, thanks for the info. > --D > >> >> [ 587.999701] CPU: 48 PID: 2026 Comm: in:imjournal Tainted: G W 6.2.0-rc8ssh+ #38 >> [ 587.999705] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1010.22 (NH1010_122) hv:phyp pSeries >> [ 587.999708] Call Trace: >> [ 587.999709] [c00000001aa838f0] [c000000000e87328] dump_stack_lvl+0x6c/0x9c (unreliable) >> [ 587.999715] [c00000001aa83920] [c008000000646a84] xfs_error_report+0x5c/0x80 [xfs] >> [ 587.999767] [c00000001aa83980] [c008000000676860] xfs_trans_cancel+0x178/0x1b0 [xfs] >> [ 587.999823] [c00000001aa839d0] [c008000000664b34] xfs_rename+0x9cc/0xd10 [xfs] >> [ 587.999876] [c00000001aa83b10] [c00800000065e020] xfs_vn_rename+0xf8/0x1f0 [xfs] >> [ 587.999929] [c00000001aa83ba0] [c000000000579efc] vfs_rename+0x9bc/0xdf0 >> [ 587.999933] [c00000001aa83c90] [c00000000058018c] do_renameat2+0x3dc/0x5c0 >> [ 587.999937] [c00000001aa83de0] [c000000000580520] sys_rename+0x60/0x80 >> [ 587.999941] [c00000001aa83e10] [c000000000033630] system_call_exception+0x150/0x3b0 >> [ 587.999945] [c00000001aa83e50] [c00000000000c554] system_call_common+0xf4/0x258 >> [ 587.999950] --- interrupt: c00 at 0x7fff96082e20 >> [ 587.999952] NIP: 00007fff96082e20 LR: 00007fff95c45e24 CTR: 0000000000000000 >> [ 587.999955] REGS: c00000001aa83e80 TRAP: 0c00 Tainted: G W (6.2.0-rc8ssh+) >> [ 587.999958] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 2a082202 XER: 00000000 >> [ 587.999967] IRQMASK: 0 >> [ 587.999967] GPR00: 0000000000000026 00007fff94f5d220 00007fff96207300 00007fff94f5d288 >> [ 587.999967] GPR04: 0000000172b76b70 0000000000000000 0600000000000000 0000000000000002 >> [ 587.999967] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> [ 587.999967] GPR12: 0000000000000000 00007fff94f666e0 0000000172b65930 0000000000000000 >> [ 587.999967] GPR16: 0000000132fe7648 00007fff95c47ce0 0000000000000000 000000000000004c >> [ 587.999967] GPR20: 0000000000000082 00007fff94f5e3a0 00007fff94f5e398 00007fff880029b0 >> [ 587.999967] GPR24: 0000000000000000 00007fff94f5e388 00007fff94f5e378 00007fff94f5e3b0 >> [ 587.999967] GPR28: 00007fff95c60000 00007fff95c605e0 00007fff88000c10 00007fff94f5d288 >> [ 588.000000] NIP [00007fff96082e20] 0x7fff96082e20 >> [ 588.000002] LR [00007fff95c45e24] 0x7fff95c45e24 >> [ 588.000004] --- interrupt: c00 >> [ 588.012398] Core dump to |/usr/lib/systemd/systemd-coredump pipe failed >> [ 588.020328] XFS (dm-0): Corruption of in-memory data (0x8) detected at xfs_trans_cancel+0x190/0x1b0 [xfs] (fs/xfs/xfs_trans.c:1098). Shutting down filesystem. >> [ 588.020388] XFS (dm-0): Please unmount the filesystem and rectify the problem(s) >> >> >> 4. Here is the patch diff which we used to collect the info. >> >> <patch> >> ============= >> >> root-> git diff >> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c >> index 5808abab786c..86b8cab7f759 100644 >> --- a/fs/xfs/xfs_inode.c >> +++ b/fs/xfs/xfs_inode.c >> @@ -1859,8 +1859,10 @@ xfs_iunlink_update_backref( >> return 0; >> >> ip = xfs_iunlink_lookup(pag, next_agino); >> - if (!ip) >> + if (!ip) { >> + pr_err("%s: next_agino: %u cannot be found\n", __func__, next_agino); >> return -EFSCORRUPTED; >> + } >> ip->i_prev_unlinked = prev_agino; >> return 0; >> } >> @@ -1935,8 +1937,11 @@ xfs_iunlink_insert_inode( >> * inode. >> */ >> error = xfs_iunlink_update_backref(pag, agino, next_agino); >> - if (error) >> + if (error) { >> + pr_crit("%s: Cannot find backref for agino:%u, ip->i_ino:%llu, next_agino: %u agno:%u\n", >> + __func__, agino, ip->i_ino, next_agino, pag->pag_agno); >> return error; >> + } >> >> if (next_agino != NULLAGINO) { >> /* >> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c >> index fc61cc024023..035fc1eba871 100644 >> --- a/fs/xfs/xfs_log.c >> +++ b/fs/xfs/xfs_log.c >> @@ -825,6 +825,10 @@ xfs_log_mount_finish( >> */ >> mp->m_super->s_flags |= SB_ACTIVE; >> xfs_log_work_queue(mp); >> + if (xlog_recovery_needed(log)) >> + pr_crit("%s: Recovery needed is set\n", __func__); >> + else >> + pr_crit("%s: Recovery needed not set\n", __func__); >> if (xlog_recovery_needed(log)) >> error = xlog_recover_finish(log); >> mp->m_super->s_flags &= ~SB_ACTIVE; >> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c >> index 322eb2ee6c55..6caa8147b443 100644 >> --- a/fs/xfs/xfs_log_recover.c >> +++ b/fs/xfs/xfs_log_recover.c >> @@ -2696,8 +2696,13 @@ xlog_recover_iunlink_bucket( >> ASSERT(VFS_I(ip)->i_nlink == 0); >> ASSERT(VFS_I(ip)->i_mode != 0); >> xfs_iflags_clear(ip, XFS_IRECOVERY); >> - agino = ip->i_next_unlinked; >> >> + if (bucket == 13) { >> + pr_crit("%s: bucket: %d, agino: %u, ino: %llu, iget ret: %d, previno:%llu, prev_agino:%u\n", >> + __func__, bucket, agino, ip->i_ino, error, prev_ip ? prev_ip->i_ino : -1, prev_ip ? XFS_INO_TO_AGINO(mp, prev_ip->i_ino) : -1); >> + } >> + >> + agino = ip->i_next_unlinked; >> if (prev_ip) { >> ip->i_prev_unlinked = prev_agino; >> xfs_irele(prev_ip); >> @@ -2789,8 +2794,11 @@ xlog_recover_iunlink_ag( >> * bucket and remaining inodes on it unreferenced and >> * unfreeable. >> */ >> + pr_crit("%s: Failed in xlog_recover_iunlink_bucket %d\n", __func__, error); >> xfs_inodegc_flush(pag->pag_mount); >> xlog_recover_clear_agi_bucket(pag, bucket); >> + } else { >> + pr_crit("%s: ran xlog_recover_iunlink_bucket for agi:%u, bucket:%d\n", __func__, pag->pag_agno, bucket); >> } >> } >> >> >> > That evidence will guide us towards a kernel patch. >> >> I can spend some more time to debug and understand on how to fix this. >> But thought of sharing this info meanwhile and see if there are any >> pointers on how to fix this in kernel. >> >> Let me know if any other info is needed. We haven't yet run xfs_repair >> on the device w/o -n option. >> >> -ritesh