Re: [PATCH] xfs: symlinks can be zero length during log recovery

On Sat, Jun 16, 2018 at 10:10:34AM +1000, Dave Chinner wrote:
> On Fri, Jun 15, 2018 at 07:31:26AM -0400, Brian Foster wrote:
> > On Fri, Jun 15, 2018 at 11:43:14AM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > 
> > > A log recovery failure has been reproduced where a symlink inode has
> > > a zero length in extent form. It was caused by a shutdown during a
> > > combined fstress+fsmark workload.
> > > 
> > > To fix it, we have to allow zero length symlink inodes through
> > > xfs_dinode_verify() during log recovery. We already specifically
> > > check and allow this case in the shortform symlink fork verifier,
> > > but in this case we don't get that far, and the inode is not in
> > > shortform format.
> > > 
> > > Update the dinode verifier to handle this case, and change the
> > > symlink fork verifier to only allow this case to exist during log
> > > recovery.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > > ---
> > 
> > Seems Ok to me, but before we restrict some of the existing checks to
> > log recovery I am curious about one thing. xfs_inactive_symlink() has
> > this:
> > 
> >         /*
> >          * Zero length symlinks _can_ exist.
> >          */
> >         pathlen = (int)ip->i_d.di_size;
> >         if (!pathlen) {
> >                 xfs_iunlock(ip, XFS_ILOCK_EXCL);
> >                 return 0;
> >         }
> > 
> > I'm not quite sure what case that covers, but it seems slightly
> > inconsistent with the fork verifier change (simply because that path is
> > not exclusive to the read from disk case), at least. Any idea?
> 
> Yeah, that's what I'm trying to chase down right now. I had the
> verifier fire on inode writeback during generic/269. I don't know
> yet where these zero length symlinks are coming from, and none of
> the comments (there's a couple that say the above)
> actually give any hint to their source.

Ok, so there is this comment in fs/namei.c w.r.t. symlink handling
before getname_flags():

* POSIX.1 2.4: an empty pathname is invalid (ENOENT).

So the call chain is

sys_symlink(oldname ....)
  do_symlinkat(oldname ...)
    getname(oldname)
      getname_flags(oldname, 0, NULL)
        len = strncpy_from_user(... oldname ....)
        ....
        if (!len) {
                if (!(flags & LOOKUP_EMPTY))
                        return -ENOENT;
        }

So we should never see a zero length symlink from userspace as flags
is always zero. Hence if we are seeing zero length symlinks on disk,
then that's an XFS implementation issue, not a user API requirement.
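
The userspace side of this is easy to check. A minimal sketch, assuming
a Linux system and a writable directory for the link; the helper name
try_empty_symlink() is made up for illustration and is not part of any
existing test:

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to create a symlink whose target
 * is the empty string. Returns 0 on success, or the errno value on
 * failure.
 */
static int try_empty_symlink(const char *linkpath)
{
	if (symlink("", linkpath) == 0) {
		/* Not expected on Linux; clean up if it somehow worked. */
		unlink(linkpath);
		return 0;
	}
	return errno;
}
```

On Linux this should come back with ENOENT, because the empty target
is rejected in getname_flags() before any filesystem code gets to run.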

There are two issues in the symlink code that can lead to zero length
symlinks firing the verifiers. They are symptoms of the same core
issue in xfs_inactive_symlink(): the inode is unlocked between the
symlink inactivation/truncation and the inode being freed. This
opens a window for the inode to be written to disk before
xfs_ifree() removes it from the unlinked list, marks it free in the
inobt and zeros the mode.

The first, and simplest to solve, issue is the shortform verifier.
This verifier doesn't actually verify on disk state - it verifies
*in memory inode fork state*. Specifically, it checks for a zero
length inode fork (ifp->if_bytes) and explicitly says "this can
happen". The only place it can happen is in the window between
xfs_inactive_symlink() and xfs_ifree(), because
xfs_inactive_symlink() tears down the data fork. It doesn't,
however, change the inode size, so if the inode is written back to
disk in this window, it's written with a non-zero size, leaving the
data fork in the inode untouched. i.e. the inode on disk is still a
valid symlink.

Fixing this is easy. xfs_ifree() already cleans up the in-memory
data and attr fork structures, so there is absolutely no need to
do it in xfs_inactive_symlink(). With that change, the symlink
verifier error goes away.
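
To make the window concrete, here is a toy userspace model of it. This
is only a sketch: struct toy_inode and all the toy_*() functions are
invented stand-ins, not the real XFS structures or verifiers, and the
check is stricter than the current shortform verifier (which allows
the empty fork).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the in-memory inode; NOT the real struct xfs_inode. */
struct toy_inode {
	uint64_t di_size;	/* inode size, like ip->i_d.di_size */
	size_t	 if_bytes;	/* in-memory data fork size, like ifp->if_bytes */
};

/* Toy fork check that treats an empty in-memory symlink fork as an error. */
static bool toy_symlink_fork_ok(const struct toy_inode *ip)
{
	return ip->if_bytes != 0;
}

/* Current behaviour: inactivation tears down the fork, size untouched. */
static void toy_inactive_symlink_old(struct toy_inode *ip)
{
	ip->if_bytes = 0;
}

/* Proposed behaviour: leave the fork alone; the free step cleans it up. */
static void toy_inactive_symlink_new(struct toy_inode *ip)
{
	(void)ip;
}

/* Model writeback landing between inactivation and freeing. */
static bool toy_writeback_in_window(void (*inactivate)(struct toy_inode *))
{
	struct toy_inode ip = { .di_size = 12, .if_bytes = 12 };

	inactivate(&ip);
	return toy_symlink_fork_ok(&ip);
}
```

With the old inactivation the check trips in the window; with the
teardown left to the free step there is nothing for it to trip on.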

Which leaves remote symlink inactivation. This runs a transaction
that truncates away the symlink extent and sets the inode size to
zero. IOWs, it creates an actual path for zero length symlinks on
disk. However, the symlink inode at this point is unreferenced by
userspace and is on the unlinked list, and hence userspace can never
see a zero length symlink inode. This does, however, create a
problem - we can get zero length symlinks on disk in log recovery
because the inode size is set to zero at the same time the EFI
intents are recorded. Hence if log recovery then reads the inode off
disk to replay EFIs or other non-completed intents, it can see a
symlink inode with zero length.

So we have a choice here: either special case log recovery for
symlink inode verification, or prevent zero length extent form
symlink inodes from existing on disk. The former is essentially the
patch I posted, the latter requires discussion.

If we want to avoid zero length extent form symlinks on disk we
either need to make the inactivation and freeing atomic, or we can
make symlink inactivation change the type of inode to something that
allows zero length. The former is complex and a major undertaking
(new deferred op, a bunch of new intents, log recovery work, etc)
while the latter is one line of code. i.e. we simply change the
mode of the inode to a regular file at the same time we set the size
to zero.

If the transaction with the EFIs and zero size goes to disk, we
don't really care what the inode type is. It's on the unlinked list,
can't be seen from userspace, and we just need to run extent removal
and freeing on it in log recovery. Hence if we change it to be a
regular file inode, then we maintain the "no zero length symlinks
on disk" rule, and we get cleanup occurring without any new code or
concerns being created.

Thoughts?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx