Dave Chinner <david@xxxxxxxxxxxxx> writes:

> On Thu, Mar 09, 2023 at 07:56:06PM +0530, Ritesh Harjani wrote:
>> Hi Darrick,
>>
>> Please find the information collected from the system. We added some
>> debug logs, and it looks like exactly what you pointed out is happening.
>>
>> We added a debug kernel patch to get more info from the system, which
>> you had requested [1]
>>
>> 1. We first broke into the emergency shell where the root fs is first
>> getting mounted on /sysroot as a "ro" filesystem. Here are the logs.
>>
>> [ OK ] Started File System Check on /dev/mapper/rhel_ltcden3--lp1-root.
>> Mounting /sysroot...
>> [ 7.203990] SGI XFS with ACLs, security attributes, quota, no debug enabled
>> [ 7.205835] XFS (dm-0): Mounting V5 Filesystem 7b801289-75a7-4d39-8cd3-24526e9e9da7
>> [ ***] A start job is running for /sysroot (15s / 1min 35s)[ 17.439377] XFS (dm-0): Starting recovery (logdev: internal)
>> [ *** ] A start job is running for /sysroot (16s / 1min 35s)[ 17.771158] xfs_log_mount_finish: Recovery needed is set
>> [ 17.771172] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:0
>> [ 17.771179] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:1
>> [ 17.771184] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:2
>> [ 17.771190] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:3
>> [ 17.771196] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:4
>> [ 17.771201] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:5
>> [ 17.801033] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:6
>> [ 17.801041] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:7
>> [ 17.801046] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:8
>> [ 17.801052] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:9
>> [ 17.801057] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:10
>> [ 17.801063] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:11
>> [ 17.801068] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:12
>> [ 17.801272] xlog_recover_iunlink_bucket: bucket: 13, agino: 3064909, ino: 3064909, iget ret: 0, previno:18446744073709551615, prev_agino:4294967295
>>
>> <previno, prev_agino> is just <-1 %ull and -1 %u> in the above. That's
>> why the huge values.
>
> That's NULLFSINO and NULLAGINO respectively. That's the indication
> that this is the last inode on the chain and that the recovery loop
> should terminate at this inode.
>
> Nothing looks wrong to me there.

Yes, that's right.
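
(For the archives: both fields are the all-ones sentinel, i.e. -1 cast to
the unsigned 64-bit and 32-bit inode number types, which is why they print
as 18446744073709551615 and 4294967295. A tiny standalone sketch; the
typedefs below only mirror the widths of xfs_ino_t and xfs_agino_t and are
not the real XFS definitions:)

/* Illustrative only: stand-ins for xfs_ino_t (64-bit) and xfs_agino_t
 * (32-bit); this is not XFS code. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t demo_ino_t;                    /* stand-in for xfs_ino_t */
typedef uint32_t demo_agino_t;                  /* stand-in for xfs_agino_t */

#define DEMO_NULLFSINO  ((demo_ino_t)-1)        /* all-ones sentinel, like NULLFSINO */
#define DEMO_NULLAGINO  ((demo_agino_t)-1)      /* all-ones sentinel, like NULLAGINO */

int main(void)
{
        demo_ino_t previno = DEMO_NULLFSINO;
        demo_agino_t prev_agino = DEMO_NULLAGINO;

        /* prints previno:18446744073709551615, prev_agino:4294967295 */
        printf("previno:%" PRIu64 ", prev_agino:%" PRIu32 "\n",
               previno, prev_agino);
        return 0;
}
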
>
>
>> 2. Then these are the logs from xfs_repair -n /dev/dm-0
>> Here you will notice the same agi 3064909 in bucket 13 (from phase-2), which also got
>> printed by the xlog_recover_iunlink_ag() function above.
>>
>> switch_root:/# xfs_repair -n /dev/dm-0
>> Phase 1 - find and verify superblock...
>> Phase 2 - using internal log
>> - zero log...
>> - scan filesystem freespace and inode maps...
>> agi unlinked bucket 13 is 3064909 in ag 0 (inode=3064909)
>> - found root inode chunk
>> Phase 3 - for each AG...
>> - scan (but don't clear) agi unlinked lists...
>> - process known inodes and perform inode discovery...
>> - agno = 0
>> - agno = 1
>> - agno = 2
>> - agno = 3
>> - process newly discovered inodes...
>> Phase 4 - check for duplicate blocks...
>> - setting up duplicate extent list...
>> - check for inodes claiming duplicate blocks...
>> - agno = 0
>> - agno = 2
>> - agno = 1
>> - agno = 3
>> No modify flag set, skipping phase 5
>> Phase 6 - check inode connectivity...
>> - traversing filesystem ...
>> - traversal finished ...
>> - moving disconnected inodes to lost+found ...
>> Phase 7 - verify link counts...
>> would have reset inode 3064909 nlinks from 4294967291 to 2
>
> And there's the problem. The link count, as a signed int, is -5.
> It got reset to 2 probably because it was moved to lost+found due to
> having a non-zero link count. Not sure how it got to -5 and I
> suspect we'll never really know. However, I think that's irrelevant
> because if we fix the problem with unlink recovery failing to
> register an unlink recovery error from inodegc we'll intentionally
> leak the inode and it won't ever be a problem again.
>
>
>> No modify flag set, skipping filesystem flush and exiting.
>>
>>
>> 3. Then we exit from the shell for the system to continue booting.
>> Here it will continue. Just pasting the logs where the warning gets
>> generated and some extra logs are getting printed for the same inode
>> with our patch.
>>
>>
>> it continues
>> ================
>> [ 587.999113] ------------[ cut here ]------------
>> [ 587.999121] WARNING: CPU: 48 PID: 2026 at fs/xfs/xfs_inode.c:1839 xfs_iunlink_lookup+0x58/0x80 [xfs]
>
> Yeah, that happens because we failed to recover the unlinked inode,
> we don't feed errors that occur in inodegc back to the unlink
> recovery code, so it didn't clear the AGI bucket when we failed to
> remove the inode from the unlinked list in inodegc. Hence it goes to
> add the new inode to the start of unlinked list, and doesn't find the
> broken, unrecovered inode that the bucket points to in cache.
>
> IOWs, we fix the problem with feeding back errors from inodegc to
> iunlink recovery, and this specific issue will go away.
>
>> I can spend some more time to debug and understand how to fix this.
>> But I thought of sharing this info meanwhile to see if there are any
>> pointers on how to fix this in the kernel.
>
> I suspect that we need to do something like gather an error in
> xfs_inodegc_flush() like we do for gathering the error in
> xfs_btree_split() from the offloaded xfs_btree_split_worker()...
>

Thanks, Dave, for the pointers. I haven't yet started looking into it.
Once I get a few other things off my plate, I can start looking into this.

-ritesh
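
P.S. Noting down, for when I pick this up, the error-gathering pattern in
xfs_btree_split() that you are referring to: the real work is punted to
xfs_btree_split_worker() on a workqueue, the worker stores its return value
in a result field of the caller's on-stack args structure, and the caller
waits on a completion and then returns that result. A rough sketch of the
same shape with made-up names (not the actual XFS code, and not the inodegc
fix itself):

#include <linux/completion.h>
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/workqueue.h>

/* All identifiers below are hypothetical; this only illustrates the
 * "offload the work, gather its error" pattern used by xfs_btree_split(). */
struct demo_offload_args {
        struct work_struct      work;
        struct completion       *done;
        int                     result;         /* error gathered from the worker */
};

static void demo_offload_worker(struct work_struct *work)
{
        struct demo_offload_args *args =
                container_of(work, struct demo_offload_args, work);

        /* ... do the real work here and record its return value ... */
        args->result = -EIO;    /* stand-in for whatever error the work hits */
        complete(args->done);
}

static int demo_offload(struct workqueue_struct *wq)
{
        struct demo_offload_args args;
        DECLARE_COMPLETION_ONSTACK(done);

        args.done = &done;
        INIT_WORK_ONSTACK(&args.work, demo_offload_worker);
        queue_work(wq, &args.work);
        wait_for_completion(&done);
        destroy_work_on_stack(&args.work);

        /* the caller now sees the worker's error instead of losing it */
        return args.result;
}

Presumably xfs_inodegc_flush() (or its callers in iunlink recovery) would
need a similar way to hand back the error hit in inodegc so that the AGI
bucket handling can react to it, per your suggestion above.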