On Sun, Aug 11, 2024 at 10:59:52AM +0200, Christoph Hellwig wrote:
> On Fri, Aug 09, 2024 at 09:03:24AM +1000, Dave Chinner wrote:
> > The test and set here is racy. A long time can pass between the test
> > and the setting of the flag,
>
> The race window is much tighter due to the iolock, but if we really
> care about the race here, the right fix for that is to keep a second
> check for the XFS_EOFBLOCKS_RELEASED flag inside the iolock.

Right, that's exactly what the code I proposed below does.

> > so maybe this should be optimised to
> > something like:
> >
> > 	if (inode->i_nlink &&
> > 	    (file->f_mode & FMODE_WRITE) &&
> > 	    (!(ip->i_flags & XFS_EOFBLOCKS_RELEASED)) &&
> > 	    xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
> > 		if (xfs_can_free_eofblocks(ip) &&
> > 		    !xfs_iflags_test_and_set(ip, XFS_EOFBLOCKS_RELEASED))
> > 			xfs_free_eofblocks(ip);
> > 		xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> > 	}
>
> All these direct i_flags accesses are actually racy too (at least in
> theory).

Yes, but we really don't care about racing against the bit being set.
The flag never gets cleared unless a truncate down occurs, so we don't
really have to care about racing with that case - there will be no
eofblocks to free. If the test races with another release call setting
the flag (i.e. we see it clear) then we are going to go the slow way
and then do exactly the right thing according to the current bit state
once we hold the IO lock and the i_flags_lock.
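To make that shape concrete, here's a minimal standalone sketch of the
"unlocked hint, recheck under the lock" pattern being described - generic
pthreads C rather than the XFS code, with the struct, field and function
names all invented for the example:

	/*
	 * Sketch only: the unlocked test may see a stale value, which
	 * only means we take the slow path; the test under the lock is
	 * the one that decides.
	 */
	#include <pthread.h>
	#include <stdbool.h>

	struct obj {
		pthread_mutex_t lock;
		bool released;	/* only ever transitions false -> true here */
	};

	static void maybe_release(struct obj *o)
	{
		/*
		 * Racy unlocked hint; kernel code would use READ_ONCE()
		 * or an atomic load for this read.
		 */
		if (o->released)
			return;

		pthread_mutex_lock(&o->lock);
		/* Recheck under the lock - this test is authoritative. */
		if (!o->released) {
			o->released = true;
			/* ... one-shot release work goes here ... */
		}
		pthread_mutex_unlock(&o->lock);
	}

The unlocked read can be stale in either direction; the only consequence
is an unnecessary trip through the lock, never a missed or doubled
release.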
> We'd probably be better off moving those over to the atomic
> bitops and only using i_lock for any coordination beyond the actual
> flags. I'd rather not get into that here for now, even if it is a
> worthwhile project for later.

That doesn't solve the exclusive cacheline access problem Mateusz
reported. Atomic bitops would isolate the flag updates, but the atomic
test-and-set op still requires exclusive access to the cacheline.
Hence we'd still need the test-test-and-set optimisation here to avoid
exclusive cacheline contention when the bit is already set...
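For reference, the test-test-and-set shape looks something like this -
again a standalone C11-atomics sketch with invented names, not the
kernel's test_bit()/test_and_set_bit() bitops:

	/*
	 * The plain load up front leaves the cacheline in shared state
	 * when the flag is already set; only the rare clear -> set
	 * transition pays for exclusive access in the fetch_or.
	 */
	#include <stdatomic.h>
	#include <stdbool.h>

	#define FLAG_RELEASED	(1u << 0)

	struct obj {
		atomic_uint flags;
	};

	/* Returns true iff this caller is the one that set the flag. */
	static bool test_test_and_set(struct obj *o)
	{
		/* Test: read-only, no cacheline ownership required. */
		if (atomic_load_explicit(&o->flags,
					 memory_order_relaxed) &
		    FLAG_RELEASED)
			return false;

		/* Test-and-set: the RMW that needs exclusive access. */
		return !(atomic_fetch_or_explicit(&o->flags,
						  FLAG_RELEASED,
						  memory_order_acq_rel) &
			 FLAG_RELEASED);
	}

Once most callers hit the early return, the flag word stays shared
between CPUs instead of ping-ponging in exclusive state.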
> > I do wonder, though - why do we need to hold the IOLOCK to call
> > xfs_can_free_eofblocks()? The only thing that really needs
> > serialisation is the xfs_bmapi_read() call, and that's done under
> > the ILOCK, not the IOLOCK. Sure, xfs_free_eofblocks() needs the
> > IOLOCK because it's effectively a truncate w.r.t. extending writes,
> > but races with extending writes while checking if we need to do
> > that operation aren't really a big deal. Worst case is we take the
> > lock and free the EOF blocks beyond the writes we raced with.
> >
> > What am I missing here?
>
> I think the prime part of the story is that xfs_can_free_eofblocks
> was split out of xfs_free_eofblocks, which requires the iolock. But
> I'm not sure if some of the checks are a little racy without the
> iolock,

Ok. I think the checks are racy even with the iolock - most of the
checks are for inode metadata that is modified under the ilock
(e.g. i_diflags, i_delayed_blks) or the ip->i_flags_lock
(e.g. VFS_I(ip)->i_size for serialisation with updates via
xfs_dio_write_end_io()). Hence I don't think that holding the IO lock
here makes any difference at all...

> although I doubt it matters in practice, as they are all
> optimizations. I'd need to take a deeper look at this, so maybe it's
> worth a follow-on together with the changes in i_flags handling.

*nod*

-Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx