On Thu, Jul 02, 2020 at 08:24:28AM +1000, Dave Chinner wrote: > On Wed, Jul 01, 2020 at 10:32:19AM -0400, Brian Foster wrote: > > On Tue, Jun 23, 2020 at 07:50:15PM +1000, Dave Chinner wrote: > > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > > > Tracking dirty inodes via cluster buffers creates lock ordering > > > issues with logging unlinked inode updates direct to the cluster > > > buffer. The unlinked inode list is unordered, so we can lock cluster > > > buffers in random orders and that causes deadlocks. > > > > > > To solve this problem, we really want to dealy locking the cluster > > > buffers until the pre-commit phase where we can order the buffers > > > correctly along with all the other inode cluster buffers that are > > > locked by the transaction. However, to do this we need to be able to > > > tell the transaction which inodes need to have there unlinked list > > > updated and what it should be updated to. > > > > > > We can delay the buffer update to the pre-commit phase based on the > > > fact taht all unlinked inode list updates are serialised by the AGI > > > buffer. It will be locked into the transaction before the list > > > update starts, and will remain locked until the transaction commits. > > > Hence we can lock and update the cluster buffers safely any time > > > during the transaction and we are still safe from other racing > > > unlinked list updates. > > > > > > The iunlink log item currently only exists in memory. we need a log > > > item to attach information to the transaction, but it's context > > > is completely owned by the transaction. Hence it is never formatted > > > or inserted into the CIL, nor is it seen by the journal, the AIL or > > > log recovery. > > > > > > This makes it a very simple log item, and the changes makes results > > > in adding addition buffer log items to the transaction. Hence once > > > the iunlink log item has run it's pre-commit operation, it can be > > > dropped by the transaction and released. > > > > > > The creation of this in-memory intent does not prevent us from > > > extending it in future to the journal to replace buffer based > > > logging of the unlinked list. Changing the format of the items we > > > write to the on disk journal is beyond the scope of this patchset, > > > hence we limit it to being in-memory only. > > > > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > > > --- > > > fs/xfs/Makefile | 1 + > > > fs/xfs/xfs_inode.c | 70 +++---------------- > > > fs/xfs/xfs_inode_item.c | 3 +- > > > fs/xfs/xfs_iunlink_item.c | 141 ++++++++++++++++++++++++++++++++++++++ > > > fs/xfs/xfs_iunlink_item.h | 24 +++++++ > > > fs/xfs/xfs_super.c | 10 +++ > > > 6 files changed, 189 insertions(+), 60 deletions(-) > > > create mode 100644 fs/xfs/xfs_iunlink_item.c > > > create mode 100644 fs/xfs/xfs_iunlink_item.h > > > ... > > > diff --git a/fs/xfs/xfs_iunlink_item.c b/fs/xfs/xfs_iunlink_item.c > > > new file mode 100644 > > > index 000000000000..83f1dc81133b > > > --- /dev/null > > > +++ b/fs/xfs/xfs_iunlink_item.c > > > @@ -0,0 +1,141 @@ > > ... > > > + > > > +static const struct xfs_item_ops xfs_iunlink_item_ops = { > > > + .flags = XFS_ITEM_RELEASE_WHEN_COMMITTED, > > > + .iop_release = xfs_iunlink_item_release, > > > > Presumably we need the release callback for transaction abort, but the > > flag looks unnecessary. That triggers a release on commit to the on-disk > > log, which IIUC should never happen for this item. > > You are probably right - I didn't look that further than "it should > be freed at commit time" and the flag name implies it is freed at > commit time. > > Which, of course, then raises the question: "Which commit are we > talking about here?". But because it's RFC work at this point I > didn't bother chasing that detail down because the code worked and I > had other things to do..... > I think it's funcionally harmless since this item would never be in a situation where that flag has an effect. FWIW, I was never a big fan of the iop_release() factoring and it hasn't really grown on me since. I routinely have to double/triple check these callbacks where I didn't before, especially when we have to consider things like this behavior altering flag and cases where some of the other callbacks route back into the release callback, etc. Brian > > > + .iop_sort = xfs_iunlink_item_sort, > > > + .iop_precommit = xfs_iunlink_item_precommit, > > > +}; > > > + > > > + > > > +/* > > > + * Initialize the inode log item for a newly allocated (in-core) inode. > > > + * > > > + * Inode extents can only reside within an AG. Hence specify the starting > > > + * block for the inode chunk by offset within an AG as well as the > > > + * length of the allocated extent. > > > + * > > > + * This joins the item to the transaction and marks it dirty so > > > + * that we don't need a separate call to do this, nor does the > > > + * caller need to know anything about the iunlink item. > > > + */ > > > > Looks like some copy/paste remnants in the comment. > > Yup, I did just copy-pasta at lot of stuff around here... > > Cheers, > > Dave. > > -- > Dave Chinner > david@xxxxxxxxxxxxx >