On Thu, Aug 31, 2023 at 03:39:28PM -0500, Eric Sandeen wrote: > On 8/31/23 7:39 AM, Ritesh Harjani (IBM) wrote: > > "Darrick J. Wong" <djwong@xxxxxxxxxx> writes: > > > >> From: Darrick J. Wong <djwong@xxxxxxxxxx> > >> > >> shrikanth hegde reports that filesystems fail shortly after mount with > >> the following failure: > >> > >> WARNING: CPU: 56 PID: 12450 at fs/xfs/xfs_inode.c:1839 xfs_iunlink_lookup+0x58/0x80 [xfs] > >> > >> This of course is the WARN_ON_ONCE in xfs_iunlink_lookup: > >> > >> ip = radix_tree_lookup(&pag->pag_ici_root, agino); > >> if (WARN_ON_ONCE(!ip || !ip->i_ino)) { ... } > >> > >> From diagnostic data collected by the bug reporters, it would appear > >> that we cleanly mounted a filesystem that contained unlinked inodes. > >> Unlinked inodes are only processed as a final step of log recovery, > >> which means that clean mounts do not process the unlinked list at all. > >> > >> Prior to the introduction of the incore unlinked lists, this wasn't a > >> problem because the unlink code would (very expensively) traverse the > >> entire ondisk metadata iunlink chain to keep things up to date. > >> However, the incore unlinked list code complains when it realizes that > >> it is out of sync with the ondisk metadata and shuts down the fs, which > >> is bad. > >> > >> Ritesh proposed to solve this problem by unconditionally parsing the > >> unlinked lists at mount time, but this imposes a mount time cost for > >> every filesystem to catch something that should be very infrequent. > >> Instead, let's target the places where we can encounter a next_unlinked > >> pointer that refers to an inode that is not in cache, and load it into > >> cache. > >> > >> Note: This patch does not address the problem of iget loading an inode > >> from the middle of the iunlink list and needing to set i_prev_unlinked > >> correctly. > >> > >> Reported-by: shrikanth hegde <sshegde@xxxxxxxxxxxxxxxxxx> > >> Triaged-by: Ritesh Harjani <ritesh.list@xxxxxxxxx> > >> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > >> --- > >> v2: log that we're doing runtime recovery, dont mess with DONTCACHE, > >> and actually return ENOLINK > >> --- > >> fs/xfs/xfs_inode.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++--- > >> fs/xfs/xfs_trace.h | 25 +++++++++++++++++ > >> 2 files changed, 96 insertions(+), 4 deletions(-) > > > > Hi Darrick, > > > > Thanks for taking a look at this. I tested this patch on the setup where > > Shrikanth earlier saw the crash. I still can see a problem. I saw it is > > taking the branch from > > > > + /* If this is not an unlinked inode, something is very wrong. */ > > + if (VFS_I(next_ip)->i_nlink != 0) { > > + error = -EFSCORRUPTED; > > + goto rele; > > + } > > > > Here are the logs of reference - > > > > [ 21.399573] XFS (dm-0): Found unrecovered unlinked inode 0x2ec44d in AG 0x0. Initiating recovery. > > [ 21.400150] XFS (dm-0): Internal error xfs_trans_cancel at line 1104 of file fs/xfs/xfs_trans.c. Caller xfs_remove+0x1a0/0x310 [xfs] > > Do you have a metadump for that filesystem, to examine that inode? IIRC, Ritesh's problem was that there were inodes on the unlinked list *and* they had nonzero i_nlink. This patch doesn't address that; you'll have to wait for the online repair version. --D > -Eric > > > [ 21.400222] CPU: 0 PID: 1629 Comm: systemd-tmpfile Not tainted 6.5.0+ #2 > > [ 21.400226] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1010.22 (NH1010_122) hv:phyp pSeries > > [ 21.400230] Call Trace: > > [ 21.400231] [c000000014cdbb70] [c000000000f377b8] dump_stack_lvl+0x6c/0x9c (unreliable) > > [ 21.400239] [c000000014cdbba0] [c008000000f7c204] xfs_error_report+0x5c/0x80 [xfs] > > [ 21.400303] [c000000014cdbc00] [c008000000fab320] xfs_trans_cancel+0x178/0x1b0 [xfs] > > [ 21.400371] [c000000014cdbc50] [c008000000f999d8] xfs_remove+0x1a0/0x310 [xfs] > > [ 21.400432] [c000000014cdbcc0] [c008000000f93eb0] xfs_vn_unlink+0x68/0xf0 [xfs] > > [ 21.400493] [c000000014cdbd20] [c0000000005b8038] vfs_rmdir+0x178/0x300 > > [ 21.400498] [c000000014cdbd60] [c0000000005be444] do_rmdir+0x124/0x240 > > [ 21.400502] [c000000014cdbdf0] [c0000000005be594] sys_rmdir+0x34/0x50 > > [ 21.400506] [c000000014cdbe10] [c000000000033c38] system_call_exception+0x148/0x3a0 > > [ 21.400511] [c000000014cdbe50] [c00000000000c6d4] system_call_common+0xf4/0x258 > > >