On Mon, May 20, 2024 at 06:04:47PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@xxxxxxxxxx>
>
> An internal user complained about log recovery failing on a symlink
> ("Bad dinode after recovery") with the following (excerpted) format:
>
> core.magic = 0x494e
> core.mode = 0120777
> core.version = 3
> core.format = 2 (extents)
> core.nlinkv2 = 1
> core.nextents = 1
> core.size = 297
> core.nblocks = 1
> core.naextents = 0
> core.forkoff = 0
> core.aformat = 2 (extents)
> u3.bmx[0] = [startoff,startblock,blockcount,extentflag]
> 0:[0,12,1,0]
>
> This is a symbolic link with a 297-byte target stored in a disk block,
> which is to say this is a symlink with a remote target.  The forkoff is
> 0, which is to say that there's 512 - 176 == 336 bytes in the inode core
> to store the data fork.
>
> Eventually, testing of generic/388 failed with the same inode corruption
> message during inode recovery.  In writing a debugging patch to call
> xfs_dinode_verify on dirty inode log items when we're committing
> transactions, I observed that xfs/298 can reproduce the problem quite
> quickly.
>
> xfs/298 creates a symbolic link, adds some extended attributes, then
> deletes them all.  The test failure occurs when the final removexattr
> also deletes the attr fork because that does not convert the remote
> symlink back into a shortform symlink.  That is how we trip this test.
> The only reason why xfs/298 only triggers with the debug patch added is
> that it deletes the symlink, so the final iflush shows the inode as
> free.
>
> I wrote a quick fstest to emulate the behavior of xfs/298, except that
> it leaves the symlinks on the filesystem after inducing the "corrupt"
> state.  Kernels going back at least as far as 4.18 have written out
> symlink inodes in this manner and prior to 1eb70f54c445f they did not
> object to reading them back in.
>
> Because we've been writing out inodes this way for quite some time, the
> only way to fix this is to relax the check for symbolic links.
> Directories don't have this problem because di_size is bumped to
> blocksize during the sf->data conversion.
>
> Fixes: 1eb70f54c445f ("xfs: validate inode fork size against fork format")
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> ---
>  fs/xfs/libxfs/xfs_inode_buf.c |   13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index 2305e64a4d5a9..88f4f2a1855ae 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -375,16 +375,27 @@ xfs_dinode_verify_fork(
>  	 * For fork types that can contain local data, check that the fork
>  	 * format matches the size of local data contained within the fork.
>  	 *
> +	 * A symlink with a small target can have a data fork can be in extents

This doesn't parse.  Do you mean something like:

	 * Even a symlink with a target small enough to fit into the inode can
	 * be stored in extent format if ...

?

The existing parts of the comment could also use a bit of an overhaul
and be moved closer to the code they are documenting while you are at it.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@xxxxxx>
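
For anyone following along, the arithmetic in the commit message can be
sketched as a small standalone C snippet.  This is only an illustration
under stated assumptions: the struct, enum, and function names below are
hypothetical and are not the kernel's xfs_dinode_verify_fork() code, and
the "relaxed" rule is one reading of "relax the check for symbolic
links", not the actual hunk from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sizes taken from the commit message (512 - 176 == 336). */
enum { INODE_SIZE = 512, INODE_CORE_SIZE = 176 };

/* Numeric values match the dump above ("core.format = 2 (extents)"). */
enum fork_format { FMT_LOCAL = 1, FMT_EXTENTS = 2 };

/* Hypothetical, cut-down view of the fields the check looks at. */
struct sketch_inode {
	uint64_t size;			/* core.size */
	uint8_t forkoff;		/* core.forkoff; 0 means no attr fork */
	enum fork_format format;	/* core.format */
	bool is_symlink;		/* core.mode is 0120xxx */
};

/* With forkoff == 0 the data fork gets everything after the inode core. */
uint32_t data_fork_size(const struct sketch_inode *ip)
{
	if (ip->forkoff == 0)
		return INODE_SIZE - INODE_CORE_SIZE;
	/* Assumption: forkoff counts 8-byte units. */
	return (uint32_t)ip->forkoff << 3;
}

/* Strict rule: data that fits in the fork must be in local format. */
bool strict_check_ok(const struct sketch_inode *ip)
{
	if (ip->size <= data_fork_size(ip))
		return ip->format == FMT_LOCAL;
	return true;
}

/* Relaxed rule: a symlink that would fit may also be in extents format. */
bool relaxed_check_ok(const struct sketch_inode *ip)
{
	if (ip->is_symlink && ip->size <= data_fork_size(ip))
		return ip->format == FMT_LOCAL || ip->format == FMT_EXTENTS;
	return strict_check_ok(ip);
}
```

Feeding in the inode from the report (size 297, forkoff 0, extents
format, symlink) gives a 336-byte data fork, so the strict rule rejects
it while the relaxed rule accepts it.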