Re: [RFC PATCH 2/2] xfs: don't access AGI on unlinked inodes if it can

Gao Xiang <hsiangkao@xxxxxxxxxx> · Thu, 9 Jul 2020 08:55:26 +0800

On Thu, Jul 09, 2020 at 09:33:11AM +1000, Dave Chinner wrote:
> On Tue, Jul 07, 2020 at 09:57:41PM +0800, Gao Xiang wrote:
> > Currently, we use AGI buffer lock to protect in-memory linked list for
> > unlinked inodes but since it's not necessary to modify AGI unless the
> > head of the unlinked list is modified. So let's removing the AGI buffer
> > modification dependency if possible, including 1) adding another per-AG
> > dedicated lock to protect the whole list and 2) inserting unlinked
> > inodes from tail.
> > 
> > For 2), the tail of bucket 0 is now recorded in perag for xfs_iunlink()
> > to use. xfs_iunlink_remove() still support old multiple short bucket
> > lists for recovery code.
> 
> I would split this into two separate patches. One to move to a perag
> based locking strategy, another to change from head to tail
> addition as they are largely independent algorithmic changes.

Yes, that is much better from the perspective of spilting patches and
I thought that before. It seems that is like 2 steps but the proposed
target solution is as a whole (in other words, 2 steps are code-related)
and I'm not sure how large these code is sharable or can be inherited
but rather than introduce some code in patch 2 and then remove immediately
and turn into a new code in patch 3). I'm not sure how large logic could
be sharable between these 2 dependent steps so I didn't do that.

I will spilt patches in the next RFC version to make a try.

> 
> > Note that some paths take AGI lock in its transaction in advance,
> > so the proper locking order is only AGI lock -> unlinked list lock.
> 
> These paths should be documented in the commit message as well
> as in code comments so the reviewer is aware of those code paths
> and can verify that your assumptions about locking order are
> correct.

Yeah, It has been documented in the following.

+xfs_iunlink(
 	struct xfs_trans	*tp,
-	struct xfs_buf		*agibp,
 	struct xfs_inode	*ip)
 {
 	struct xfs_mount	*mp = tp->t_mountp;
-	struct xfs_agi		*agi = agibp->b_addr;
-	xfs_agino_t		agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
+	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
+	struct xfs_perag	*pag;
+	struct xfs_inode	*pip;
 	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
-	xfs_agino_t		next_agino;
+	int			error;
+
+	ASSERT(VFS_I(ip)->i_nlink == 0);
+	ASSERT(VFS_I(ip)->i_mode != 0);
+	trace_xfs_iunlink(ip);
+
+	pag = xfs_perag_get(mp, agno);

 	/*
-	 * Get the index into the agi hash table for the list this inode will
-	 * go on.  Make sure the pointer isn't garbage and that this inode
-	 * isn't already on the list.
+	 * some paths (e.g. xfs_create_tmpfile) could take AGI lock
+	 * in this transaction in advance and the only locking order
+	 * AGI buf lock -> pag_unlinked_mutex is safe.

> 
> > 
> > Signed-off-by: Gao Xiang <hsiangkao@xxxxxxxxxx>
> > ---
> >  fs/xfs/xfs_inode.c       | 251 ++++++++++++++++++++-------------------
> >  fs/xfs/xfs_log_recover.c |   6 +
> >  fs/xfs/xfs_mount.c       |   3 +
> >  fs/xfs/xfs_mount.h       |   3 +
> >  4 files changed, 144 insertions(+), 119 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index 10565fa5ace4..d33e5b198534 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -1994,182 +1994,195 @@ xfs_iunlink_update_bucket(
> >  }
> >  
> >  /*
> > - * Always insert at the head, so we only have to do a next inode lookup to
> > - * update it's prev pointer. The AGI bucket will point at the one we are
> > - * inserting.
> > + * This is called when the inode's link count has gone to 0 or we are creating
> > + * a tmpfile via O_TMPFILE.  The inode @ip must have nlink == 0.
> > + *
> > + * We place the on-disk inode on a list in the AGI.  It will be pulled from this
> > + * list when the inode is freed.
> >   */
> > -static int
> > -xfs_iunlink_insert_inode(
> 
> Hmmm. Flattening the code also make the patch harder to follow as it
> combines code movement/rearrangement with algorithmic changes.  We
> try to separate out code movement/rearrangement into their own
> patches so that the movement is easy to verify by itself.
> 
> Also, the helper functions help document the separation of the
> unlinked list manipulations from the from setup and locking
> requirements for the list manipulations, and this largely undoes all
> that. I added these helpers because it completely untangled the mess
> that was present before the RFC patchset I posted. THat is, I
> couldn't easily modify the existing code because it interleaved
> the locking, the backref hash manipulations and
> the on-disk list manipulations in ways I found difficult to
> understand and manage. Short, simple, clear functions are much
> better than long, multiple operation functions...
> 
> i.e. this:
> 
> xfs_iunlink()
> {
> 
> 	get locks
> 	do list insert
> 	drop locks
> }
> 
> Is better for understanding, maintenance and future modification
> than:
> 
> xfs_iunlink()
> {
> 
> 	get perag
> 	lock perag
> 	look at tail of list
> 	if (empty) {
> 		unlock perag
> 		read/lock AGI
> 		lock perag
> 		look at tail of list
> 		if (empty)
> 			do head insert
> 			goto out
> 	}
> 	do tail insert
> out:
> 	update inode/pag tails
> 	unlock
> 	drop perag
> }
> 
> It's trivial for a reader to understand what the first version of
> xfs_iunlink() is going to do without needing to understand the
> intraccies of the locking strategies. However, it takes time and
> effort to undestand exactly waht the second one is doing because
> it's not clear where lock ends and list modifications start, nor
> what the locking rules are for the different modifications that are
> being made. Essentially, it goes back to the complex
> locking-intertwined-with-modification-algorithm problem the current
> TOT code has.
> 
> I'd much prefer to see something like this:
> 
> /*
>  * Inode allocation in the O_TMPFILE path defines the AGI/unlinked
>  * list lock order as being AGI->perag unlinked list lock. We are
>  * inverting it here as the fast path tail addition does not need to
>  * modify the AGI at all. Hence we only need the AGI lock if the
>  * tail is empty, but if we fail to get it without blocking then we
>  * need to fall back to the slower, correct lock order.
>  */
> xfs_iunlink_insert_lock()
> {
> 	get perag;
> 	lock_perag();
> 	if (!tail empty)
> 		return;
> 	if (trylock AGI)
> 		return;

(adding some notes here, this patch doesn't use try lock here
 finally but unlock perag and take AGI and relock and recheck tail_empty....
 since the tail non-empty is rare...)

> 
> 	/*
> 	 * Slow path, need to lock AGI first. Don't even bother
> 	 * rechecking tail pointers or trying to optimise for
> 	 * minimal AGI lock hold time as racing unlink list mods
> 	 * will all block on the perag lock once we take that. They
> 	 * will then hit the !tail empty fast path and not require
> 	 * the AGI lock at all.
> 	 */
> 	lock AGI
> 	lock_perag()
> 	return;
> }
> 
> The non-AGI locking fast path is slightly different in the remove
> case, so we'll have a slightly different helper function in that
> case which checks where the inode being removed is in the list.
> 
> In both cases, though, the unlock should be the same:
> 
> xfs_iunlink_unlock()
> {
> 	/* Does not unlock AGI, ever. commit does that. */
> 	unlock perag
> 	put perag
> }
> 
> This keeps the list locking completely separate from the list
> manipulations and allows us to document the locking constraints and
> reasons for why it is or isn't optimised for specific conditions
> without cluttering up the list manipulations code.

Yeah, the _lock and _unlock pair is helpful, I will go for this direction.

Thanks,
Gao Xiang

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
>