On Fri, May 22, 2020 at 05:13:34PM -0700, Darrick J. Wong wrote: > On Fri, May 22, 2020 at 01:50:27PM +1000, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > Now that we have all the dirty inodes attached to the cluster > > buffer, we don't actually have to do radix tree lookups to find > > them. Sure, the radix tree is efficient, but walking a linked list > > of just the dirty inodes attached to the buffer is much better. > > > > We are also no longer dependent on having a locked inode passed into > > the function to determine where to start the lookup. This means we > > can drop it from the function call and treat all inodes the same. > > > > We also make xfs_iflush_cluster skip inodes marked with > > XFS_IRECLAIM. This we avoid races with inodes that reclaim is > > actively referencing or are being re-initialised by inode lookup. If > > they are actually dirty, they'll get written by a future cluster > > flush.... > > > > We also add a shutdown check after obtaining the flush lock so that > > we catch inodes that are dirty in memory and may have inconsistent > > state due to the shutdown in progress. We abort these inodes > > directly and so they remove themselves directly from the buffer list > > and the AIL rather than having to wait for the buffer to be failed > > and callbacks run to be processed correctly. > > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > > --- > > fs/xfs/xfs_inode.c | 150 ++++++++++++++++------------------------ > > fs/xfs/xfs_inode.h | 2 +- > > fs/xfs/xfs_inode_item.c | 15 +++- > > 3 files changed, 74 insertions(+), 93 deletions(-) > > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c > > index cbf8edf62d102..7db0f97e537e3 100644 > > --- a/fs/xfs/xfs_inode.c > > +++ b/fs/xfs/xfs_inode.c > > @@ -3428,7 +3428,7 @@ xfs_rename( > > return error; > > } > > > > -static int > > +int > > xfs_iflush( > > Not sure why this drops the static? Stray hunk from reordering the patchset. This used to be before the removal of writeback from reclaim, so it needed to be called from there. But that ordering made no sense as it required heaps of temporary changes to the reclaim code, so I put it first and got rid of of all the reclaim writeback futzing. Clearly I forgot to remove this hunk when I cleared xfs_iflush() out of the header file. > > - if (!cip->i_ino) { > > - xfs_ifunlock(cip); > > - xfs_iunlock(cip, XFS_ILOCK_SHARED); > > + if (XFS_FORCED_SHUTDOWN(mp)) { > > + xfs_iunpin_wait(ip); > > + /* xfs_iflush_abort() drops the flush lock */ > > + xfs_iflush_abort(ip); > > + xfs_iunlock(ip, XFS_ILOCK_SHARED); > > + error = EIO; > > error = -EIO? Yup, good catch. > > - error = xfs_iflush_cluster(ip, bp); > > + /* > > + * We need to hold a reference for flushing the cluster buffer as it may > > + * fail the buffer without IO submission. In which case, we better have > > + * a reference for that completion as otherwise we don't get a reference > > + * for IO until we queue it for delwri submission. > > <confused> > > What completion are we talking about? Does this refer to the fact that > xfs_iflush_cluster handles a flush failure by simulating an async write > failure which could result in us giving away the inode log item's > reference to the buffer? Yes. This is the way the code currently works before this patchset. The IO reference used by the AIL writeback comes from xfs_imap_to_bp(), and getting rid of that from the writeback path means we no longer have an IO reference to the buffer before calling xfs_iflush_cluster(). And, yes, the xfs_buf_ioend_fail() call in xfs_iflush_cluster() implicitly relies on this reference existing. Hence we have to take one ourselves and we cannot rely on the IO reference we get from adding the buffer to the delwri list. This was a bug I found a couple of hours before I posted the patchset, and the bisect pointed at this patch as the cause. It may be that it should be in the patch that gets rid of the xfs_iflush() call and propagate through that way. However, after 3 days of continuous bisecting to find bugs as a result of implicit, undocumented stuff like this over and over again, the novelty was starting to wear a bit thin. I'll revisit this when I've regained some patience.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx