On Mon, Apr 10, 2023 at 06:06:38PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > While fuzzing the data fork extent count on a btree-format directory > with xfs/375, I observed the following (excerpted) splat: > > XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/libxfs/xfs_bmap.c, line: 1208 > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 43192 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs] > Call Trace: > <TASK> > xfs_iread_extents+0x1af/0x210 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd] > xchk_dir_walk+0xb8/0x190 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd] > xchk_parent_count_parent_dentries+0x41/0x80 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd] > xchk_parent_validate+0x199/0x2e0 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd] > xchk_parent+0xdf/0x130 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd] > xfs_scrub_metadata+0x2b8/0x730 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd] > xfs_scrubv_metadata+0x38b/0x4d0 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd] > xfs_ioc_scrubv_metadata+0x111/0x160 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd] > xfs_file_ioctl+0x367/0xf50 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd] > __x64_sys_ioctl+0x82/0xa0 > do_syscall_64+0x2b/0x80 > entry_SYSCALL_64_after_hwframe+0x46/0xb0 > > The cause of this is a race condition in xfs_ilock_data_map_shared, > which performs an unlocked access to the data fork to guess which lock > mode it needs: > > Thread 0 Thread 1 > > xfs_need_iread_extents > <observe no iext tree> > xfs_ilock(..., ILOCK_EXCL) > xfs_iread_extents > <observe no iext tree> > <check ILOCK_EXCL> > <load bmbt extents into iext> > <notice iext size doesn't > match nextents> > xfs_need_iread_extents > <observe iext tree> > xfs_ilock(..., ILOCK_SHARED) > <tear down iext tree> > xfs_iunlock(..., ILOCK_EXCL) > xfs_iread_extents > <observe no iext tree> > <check ILOCK_EXCL> > *BOOM* > > Fix this race by adding a flag to the xfs_ifork structure to indicate > that we have not yet read in the extent records and changing the > predicate to look at the flag state, not if_height. The memory barrier > ensures that the flag will not be set until the very end of the > function. > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > --- > fs/xfs/libxfs/xfs_bmap.c | 2 ++ > fs/xfs/libxfs/xfs_inode_fork.c | 2 ++ > fs/xfs/libxfs/xfs_inode_fork.h | 3 ++- > 3 files changed, 6 insertions(+), 1 deletion(-) > > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c > index 34de6e6898c4..5f96e7ce7b4a 100644 > --- a/fs/xfs/libxfs/xfs_bmap.c > +++ b/fs/xfs/libxfs/xfs_bmap.c > @@ -1171,6 +1171,8 @@ xfs_iread_extents( > goto out; > } > ASSERT(ir.loaded == xfs_iext_count(ifp)); > + smp_mb(); > + ifp->if_needextents = 0; Hmmm - if this is to ensure that everything above is completed before the clearing of this flag is visible everywhere else, then we should be able to use load_acquire/store_release semantics? i.e. the above is smp_store_release(ifp->if_needextents, 0); and we use smp_load_acquire(ifp->if_needextents) when we need to read the value to ensure that all the changes made before the relevant stores are also visible? > return 0; > out: > xfs_iext_destroy(ifp); > diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c > index 6b21760184d9..eadae924dc42 100644 > --- a/fs/xfs/libxfs/xfs_inode_fork.c > +++ b/fs/xfs/libxfs/xfs_inode_fork.c > @@ -174,6 +174,8 @@ xfs_iformat_btree( > int level; > > ifp = xfs_ifork_ptr(ip, whichfork); > + ifp->if_needextents = 1; Hmmm - what's the guarantee that the reader will see ifp->if_format set correctly if they if_needextents = 1? Wouldn't it be better to set this at the same time we set the ifp->if_format value? We clear it unconditionally above in xfs_iread_extents(), so why not set it unconditionally there, too, before we start. i.e. /* * Set the format before we set needsextents with release * semantics. This ensures that we can use acquire semantics * on needextents in xfs_need_iread_extents() and be * guaranteed to see a valid format value after that load. */ ifp->if_format = dip->di_format; smp_store_release(ifp->if_needextents, 1); That then means xfs_need_iread_extents() is guaranteed to see a valid ifp->if_format if ifp->if_needextents is set if we do: /* returns true if the fork has extents but they are not read in yet. */ static inline bool xfs_need_iread_extents(struct xfs_ifork *ifp) { /* see xfs_iread_extents() for needextents semantics */ return smp_load_acquire(ifp->if_needextents) && ifp->if_format == XFS_DINODE_FMT_BTREE; } Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx