On Mon, 2011-07-04 at 15:27 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
>
> When inodes are marked stale in a transaction, they are treated
> specially when the inode log item is being inserted into the AIL.
> It tries to avoid moving the log item forward in the AIL due to a
> race condition with the writing of the underlying buffer back to disk.
> This was "fixed" in commit de25c18 ("xfs: avoid moving stale inodes in
> the AIL").
>
> To avoid moving the item forward, we return a LSN smaller than the
> commit_lsn of the completing transaction, thereby trying to trick
> the commit code into not moving the inode forward at all. I'm not
> sure this ever worked as intended - it assumes the inode is already
> in the AIL, but I don't think the returned LSN would have been small
> enough to prevent moving the inode. It appears that the reason it
> worked is that the lower LSN of the inodes meant they were inserted
> into the AIL and flushed before the inode buffer (which was moved to
> the commit_lsn of the transaction).
>
> The big problem is that with delayed logging, the returning of the
> different LSN means insertion takes the slow, non-bulk path. Worse
> yet is that insertion is to a position -before- the commit_lsn so it
> is doing an AIL traversal on every insertion, and has to walk over
> all the items that have already been inserted into the AIL. It's
> expensive.
>
> To compound the matter further, with delayed logging inodes are
> likely to go from clean to stale in a single checkpoint, which means
> they aren't even in the AIL at all when we come across them at AIL
> insertion time. Hence these were all getting inserted into the AIL
> when they simply do not need to be as inodes marked XFS_ISTALE are
> never written back.
>
> Transactional/recovery integrity is maintained in this case by the
> other items in the unlink transaction that were modified (e.g. the
> AGI btree blocks) and committed in the same checkpoint.
>
> So to fix this, simply unpin the stale inodes directly in
> xfs_inode_item_committed() and return -1 to indicate that the AIL
> insertion code does not need to do any further processing of these
> inodes.
>
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>

I suggest one comment update, which I can do for you or it can
be done at another time. But this looks good. I'll send it
to Linus tomorrow.

Reviewed-by: Alex Elder <aelder@xxxxxxx>

> ---
>  fs/xfs/xfs_inode_item.c |   14 ++++++++------
>  fs/xfs/xfs_trans.c      |    2 +-
>  2 files changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> index 09983a3..b1e88d5 100644

. . .

> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 7c7bc2b..3744337 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -1474,7 +1474,7 @@ xfs_trans_committed_bulk(
>  			lip->li_flags |= XFS_LI_ABORTED;
>  		item_lsn = IOP_COMMITTED(lip, commit_lsn);
>
> -		/* item_lsn of -1 means the item was freed */
> +		/* item_lsn of -1 means the item needs no further processing */

Probably should update the corresponding comment in
xfs_trans_item_committed() too. I have done this in
my local copy.

>  		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
>  			continue;
>
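
For anyone following the thread without the tree handy, here is a rough
user-space sketch of the convention the patch relies on: the committed
callback unpins a stale item itself and returns -1, and the bulk commit
loop skips any item that reports -1. This is only an illustration. The
names lsn_t, struct log_item, item_committed() and committed_bulk() are
hypothetical stand-ins, not the kernel's definitions, and the pin
handling is reduced to a single flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for xfs_lsn_t. */
typedef int64_t lsn_t;

/* Sentinel: the item needs no further processing by the commit loop. */
#define LSN_NO_FURTHER_PROCESSING ((lsn_t)-1)

/* Simplified, hypothetical log item; not the kernel's xfs_log_item. */
struct log_item {
	int	stale;	/* stand-in for the inode being XFS_ISTALE */
	int	pinned;	/* simplified pin state: 1 = pinned */
	lsn_t	lsn;	/* AIL position; 0 means "not in the AIL" */
};

/*
 * Sketch of a committed callback: a stale item is unpinned here and
 * reports the sentinel so the caller never inserts it into the AIL.
 */
static lsn_t item_committed(struct log_item *lip, lsn_t commit_lsn)
{
	if (lip->stale) {
		lip->pinned = 0;
		return LSN_NO_FURTHER_PROCESSING;
	}
	return commit_lsn;
}

/*
 * Sketch of the bulk commit loop: items reporting the sentinel are
 * skipped, everything else is "inserted" at the returned LSN and
 * unpinned. Returns the number of items inserted.
 */
static size_t committed_bulk(struct log_item **items, size_t n,
			     lsn_t commit_lsn)
{
	size_t inserted = 0;

	assert(items != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		lsn_t item_lsn = item_committed(items[i], commit_lsn);

		/* -1 means the item needs no further processing */
		if (item_lsn == LSN_NO_FURTHER_PROCESSING)
			continue;

		items[i]->lsn = item_lsn;
		items[i]->pinned = 0;
		inserted++;
	}
	return inserted;
}
```

In this sketch a stale item ends up unpinned with its lsn still 0, so it
never enters the list, while a normal item lands at commit_lsn. That is
the behaviour the commit message describes, minus all of the real
locking and AIL cursor handling.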