On Wed, Jan 05, 2022 at 11:09:47AM +1100, Dave Chinner wrote: > On Thu, Dec 16, 2021 at 03:56:09PM +1100, Dave Chinner wrote: > > On Wed, Dec 15, 2021 at 05:09:21PM -0800, Darrick J. Wong wrote: > > > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > > > I was poking around in the directory code while diagnosing online fsck > > > bugs, and noticed that xfs_readdir doesn't actually take the directory > > > ILOCK when it calls xfs_dir2_isblock. xfs_dir_open most probably loaded > > > the data fork mappings > > > > Yup, that is pretty much guaranteed. If the inode is extent or btree form as the > > extent count will be non-zero, hence we can only get to the > > xfs_dir2_isblock() check if the inode has moved from local to block > > form between the open and xfs_dir2_isblock() get in the getdents > > code. > > > > > and the VFS took i_rwsem (aka IOLOCK_SHARED) so > > > we're protected against writer threads, but we really need to follow the > > > locking model like we do in other places. The same applies to the > > > shortform getdents function. > > > > Locking rules should be the same as xfs_dir_lookup()..... > > Ok, I assumed the locking in xfs_dir_lookup() is optimal.... > > .... which it turns out it isn't. All calls to xfs_dir_lookup() > occur with the directory locked at the VFS level, so the internal > contents of the directory can never change during a lookup. Hence > holding the ILOCK across the entire lookup is both unnecessary and > excessive. > > What xfs_dir_lookup() should be doing is what xfs_readdir() is > largely already doing - just locking the ILOCK around buffer read > operations when we are mapping directory offsets to physical disk > locations and reading them from disk. Changing this is a > significant set of changes, so it's not something that needs to be > done right now. > > However, we still need to protect the xfs_dir2_isblock() lookup call > in xfs_readdir(). > > > > While we're at it, clean up the somewhat strange structure of this > > > function. > > > > > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > > --- > > > fs/xfs/xfs_dir2_readdir.c | 28 +++++++++++++++++----------- > > > 1 file changed, 17 insertions(+), 11 deletions(-) > > > > > > > > > diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c > > > index 8310005af00f..25560151c273 100644 > > > --- a/fs/xfs/xfs_dir2_readdir.c > > > +++ b/fs/xfs/xfs_dir2_readdir.c > > > @@ -507,8 +507,9 @@ xfs_readdir( > > > size_t bufsize) > > > { > > > struct xfs_da_args args = { NULL }; > > > - int rval; > > > - int v; > > > + unsigned int lock_mode; > > > + int error; > > > + int isblock; > > > > > > trace_xfs_readdir(dp); > > > > > > @@ -522,14 +523,19 @@ xfs_readdir( > > > args.geo = dp->i_mount->m_dir_geo; > > > args.trans = tp; > > > > > > - if (dp->i_df.if_format == XFS_DINODE_FMT_LOCAL) > > > - rval = xfs_dir2_sf_getdents(&args, ctx); > > > - else if ((rval = xfs_dir2_isblock(&args, &v))) > > > - ; > > > - else if (v) > > > - rval = xfs_dir2_block_getdents(&args, ctx); > > > - else > > > - rval = xfs_dir2_leaf_getdents(&args, ctx, bufsize); > > > + lock_mode = xfs_ilock_data_map_shared(dp); > > > + if (dp->i_df.if_format == XFS_DINODE_FMT_LOCAL) { > > > + xfs_iunlock(dp, lock_mode); > > > + return xfs_dir2_sf_getdents(&args, ctx); > > > + } > > Directory inode format cannot change here, so we don't need to > hold the ILOCK at all to do shortform checks. Ok. > > > > > > - return rval; > > > + error = xfs_dir2_isblock(&args, &isblock); > > > + xfs_iunlock(dp, lock_mode); > > > + if (error) > > > + return error; > > > + > > > + if (isblock) > > > + return xfs_dir2_block_getdents(&args, ctx); > > Can the xfs_dir2_isblock() call be moved into > xfs_dir2_block_getdents() where it already takes the ILOCK to read > the block? Yeah. > > > + > > > + return xfs_dir2_leaf_getdents(&args, ctx, bufsize); > > Otherwise this patch is correct and this is where we should start > fixing the directory locking mess... <nod> --D > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx