On Wed, Jul 01, 2020 at 10:32:19AM -0400, Brian Foster wrote: > On Tue, Jun 23, 2020 at 07:50:15PM +1000, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > Tracking dirty inodes via cluster buffers creates lock ordering > > issues with logging unlinked inode updates direct to the cluster > > buffer. The unlinked inode list is unordered, so we can lock cluster > > buffers in random orders and that causes deadlocks. > > > > To solve this problem, we really want to dealy locking the cluster > > buffers until the pre-commit phase where we can order the buffers > > correctly along with all the other inode cluster buffers that are > > locked by the transaction. However, to do this we need to be able to > > tell the transaction which inodes need to have there unlinked list > > updated and what it should be updated to. > > > > We can delay the buffer update to the pre-commit phase based on the > > fact taht all unlinked inode list updates are serialised by the AGI > > buffer. It will be locked into the transaction before the list > > update starts, and will remain locked until the transaction commits. > > Hence we can lock and update the cluster buffers safely any time > > during the transaction and we are still safe from other racing > > unlinked list updates. > > > > The iunlink log item currently only exists in memory. we need a log > > item to attach information to the transaction, but it's context > > is completely owned by the transaction. Hence it is never formatted > > or inserted into the CIL, nor is it seen by the journal, the AIL or > > log recovery. > > > > This makes it a very simple log item, and the changes makes results > > in adding addition buffer log items to the transaction. Hence once > > the iunlink log item has run it's pre-commit operation, it can be > > dropped by the transaction and released. > > > > The creation of this in-memory intent does not prevent us from > > extending it in future to the journal to replace buffer based > > logging of the unlinked list. Changing the format of the items we > > write to the on disk journal is beyond the scope of this patchset, > > hence we limit it to being in-memory only. > > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > > --- > > fs/xfs/Makefile | 1 + > > fs/xfs/xfs_inode.c | 70 +++---------------- > > fs/xfs/xfs_inode_item.c | 3 +- > > fs/xfs/xfs_iunlink_item.c | 141 ++++++++++++++++++++++++++++++++++++++ > > fs/xfs/xfs_iunlink_item.h | 24 +++++++ > > fs/xfs/xfs_super.c | 10 +++ > > 6 files changed, 189 insertions(+), 60 deletions(-) > > create mode 100644 fs/xfs/xfs_iunlink_item.c > > create mode 100644 fs/xfs/xfs_iunlink_item.h > > > ... > > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c > > index 0494b907c63d..bc1970c37edc 100644 > > --- a/fs/xfs/xfs_inode_item.c > > +++ b/fs/xfs/xfs_inode_item.c > > @@ -488,8 +488,9 @@ xfs_inode_item_push( > > ASSERT(iip->ili_item.li_buf); > > > > if (xfs_ipincount(ip) > 0 || xfs_buf_ispinned(bp) || > > - (ip->i_flags & XFS_ISTALE)) > > + (ip->i_flags & XFS_ISTALE)) { > > return XFS_ITEM_PINNED; > > + } > > Spurious change..? *nod* Likely a left over from adding debug traceprints... > > /* If the inode is already flush locked, we're already flushing. */ > > if (xfs_iflags_test(ip, XFS_IFLUSHING)) > > diff --git a/fs/xfs/xfs_iunlink_item.c b/fs/xfs/xfs_iunlink_item.c > > new file mode 100644 > > index 000000000000..83f1dc81133b > > --- /dev/null > > +++ b/fs/xfs/xfs_iunlink_item.c > > @@ -0,0 +1,141 @@ > ... > > + > > +static const struct xfs_item_ops xfs_iunlink_item_ops = { > > + .flags = XFS_ITEM_RELEASE_WHEN_COMMITTED, > > + .iop_release = xfs_iunlink_item_release, > > Presumably we need the release callback for transaction abort, but the > flag looks unnecessary. That triggers a release on commit to the on-disk > log, which IIUC should never happen for this item. You are probably right - I didn't look that further than "it should be freed at commit time" and the flag name implies it is freed at commit time. Which, of course, then raises the question: "Which commit are we talking about here?". But because it's RFC work at this point I didn't bother chasing that detail down because the code worked and I had other things to do..... > > + .iop_sort = xfs_iunlink_item_sort, > > + .iop_precommit = xfs_iunlink_item_precommit, > > +}; > > + > > + > > +/* > > + * Initialize the inode log item for a newly allocated (in-core) inode. > > + * > > + * Inode extents can only reside within an AG. Hence specify the starting > > + * block for the inode chunk by offset within an AG as well as the > > + * length of the allocated extent. > > + * > > + * This joins the item to the transaction and marks it dirty so > > + * that we don't need a separate call to do this, nor does the > > + * caller need to know anything about the iunlink item. > > + */ > > Looks like some copy/paste remnants in the comment. Yup, I did just copy-pasta at lot of stuff around here... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx