On Thu, Jan 19, 2023 at 10:39:34AM -0800, Christoph Hellwig wrote:
> On Thu, Jan 19, 2023 at 04:14:11PM +1100, Dave Chinner wrote:
> > If we hit this race condition, re-reading the extent list from disk
> > isn't going to fix the corruption, so I don't see much point in
> > papering over the problem just by changing the locking and failing
> > to read in the extent list again and returning -EFSCORRUPTED to the
> > operation.
> 
> Yep.
> 
> > So.... shouldn't we mark the inode as sick when we detect the extent
> > list corruption issue? i.e. before destroying the iext tree, calling
> > xfs_inode_mark_sick(XFS_SICK_INO_BMBTD) (or BMBTA, depending on the
> > fork being read) so that there is a record of the BMBT being
> > corrupt?
> 
> Yes.
> 
> > That would mean that this path simply becomes:
> > 
> > 	if (ip->i_sick & XFS_SICK_INO_BMBTD) {
> > 		xfs_iunlock(ip, lock_mode);
> > 		return -EFSCORRUPTED;
> > 	}
> 
> This path being xfs_ilock_{data,attr}_map_shared?  These don't
> return an error.

I was thinking we just change those functions to take an "int *lockmode"
parameter and return an error, similar to what we do in the IO path
with the xfs_ilock_iocb() wrapper.

> But if we make sure xfs_need_iread_extents
> returns true for XFS_SICK_INO_BMBTD, xfs_iread_extents can
> return -EFSCORRUPTED.

I don't think that solves the race condition, because
xfs_need_iread_extents() is run unlocked. Just as it can race with the
extent list being filled and then removed again while we wait on the
ILOCK, it can also return true before XFS_SICK_INO_BMBTD is set; then,
once we get the lock, we find XFS_SICK_INO_BMBTD set and the extent
list empty...

Hence I think the check for extent list corruption has to be done after
we gain the inode lock, so that we wait correctly for the result of the
racing extent load before proceeding. A rough sketch of what I mean is
below.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
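
Completely untested, and only the data fork variant is shown here
(xfs_ilock_attr_map_shared would get the same treatment with
XFS_SICK_INO_BMBTA); the sick-flag check under the lock is the snippet
quoted above:

	int
	xfs_ilock_data_map_shared(
		struct xfs_inode	*ip,
		int			*lock_mode)
	{
		*lock_mode = XFS_ILOCK_SHARED;
		if (xfs_need_iread_extents(&ip->i_df))
			*lock_mode = XFS_ILOCK_EXCL;
		xfs_ilock(ip, *lock_mode);

		/*
		 * Now that we hold the ILOCK, any racing extent read
		 * has completed. If it failed and tore down the iext
		 * tree, it marked the inode sick first, so we see
		 * that here and fail instead of re-reading the
		 * extent list from the corrupt BMBT.
		 */
		if (ip->i_sick & XFS_SICK_INO_BMBTD) {
			xfs_iunlock(ip, *lock_mode);
			return -EFSCORRUPTED;
		}
		return 0;
	}

Callers then change from:

	lock_mode = xfs_ilock_data_map_shared(ip);

to:

	error = xfs_ilock_data_map_shared(ip, &lock_mode);
	if (error)
		return error;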