On Fri, Dec 17, 2021 at 08:17:48AM +1100, Dave Chinner wrote:
> On Thu, Dec 16, 2021 at 11:25:49AM -0800, Darrick J. Wong wrote:
> > On Thu, Dec 16, 2021 at 04:05:37PM +1100, Dave Chinner wrote:
> > > On Wed, Dec 15, 2021 at 05:09:32PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > >
> > > > When xfs_scrub encounters a directory with a leaf1 block, it tries to
> > > > validate that the leaf1 block's bestcount (aka the best free count of
> > > > each directory data block) is the correct size.  Previously, this author
> > > > believed that comparing bestcount to the directory isize (since
> > > > directory data blocks are under isize, and leaf/bestfree blocks are
> > > > above it) was sufficient.
> > > >
> > > > Unfortunately during testing of online repair, it was discovered that it
> > > > is possible to create a directory with a hole between the last directory
> > > > block and isize.
> > >
> > > We have xfs_da3_swap_lastblock() that can leave an -empty- da block
> > > between the last referenced block and isize, but that's not a "hole"
> > > in the file.  If you don't mean xfs_da3_swap_lastblock(), then can
> > > you clarify what you mean by a "hole" here and explain to me how the
> > > situation it occurs in comes about?
> >
> > I don't actually know how it comes about.  I wrote a test that sets up
> > fsstress to expand and contract directories and races xfs_scrub -n, and
> > noticed that I'd periodically get complaints about directories (usually
> > $SCRATCH_MNT/p$CPU) where the last block(s) before i_size were actually
> > holes.
>
> Is that test getting to ENOSPC at all?

Yes.  That particular VM has a generous 8GB of SCRATCH_DEV to make the
repairs more interesting.

> > I began reading the dir2 code to try to figure out how this came about
> > (clearly we're not updating i_size somewhere) but then took the shortcut
> > of seeing if xfs_repair or xfs_check complained about this situation.
> > Neither of them did, and I found a couple more directories in a similar
> > situation on my crash test dummy machine, and concluded "Wellllp, I
> > guess this is part of the ondisk format!" and committed the patch.
> >
> > Also, I thought xfs_da3_swap_lastblock only operates on leaf and da
> > btree blocks, not the blocks containing directory entries?
>
> Ah, right you are.  I noticed xfs_da_shrink_inode() being called from
> leaf_to_block() and thought it might be swapping the leaf with the
> last data block that we probably just removed.  Looking at the code,
> that is not going to happen AFAICT...
>
> > I /think/ the actual explanation is that something goes wrong in
> > xfs_dir2_shrink_inode (maybe?) such that the mapping goes away but
> > i_disk_size doesn't get updated?  Not sure how /that/ can happen,
> > though...
>
> Actually, the ENOSPC case in xfs_dir2_shrink_inode is the likely
> case.  If we can't free the block because bunmapi gets ENOSPC due
> to xfs_dir_rename() being called without a block reservation, it'll
> just get left there as an empty data block.  If all the other dir
> data blocks around it get removed properly, it could eventually end
> up between the last valid entry and isize....
>
> There are lots of weird corner cases around ENOSPC in the directory
> code, perhaps this is just another of them...

<nod>  The next time I reproduce it, I'll send you a metadump.

--D

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx