On Fri, Oct 11, 2019 at 11:29:45AM -0400, Brian Foster wrote:
> On Fri, Oct 11, 2019 at 03:18:25AM -0700, Christoph Hellwig wrote:
> > On Wed, Oct 09, 2019 at 02:21:14PM +1100, Dave Chinner wrote:
> > > Factor the common AIL deletion code that does all the wakeups into a
> > > helper so we only have one copy of this somewhat tricky code to
> > > interface with all the wakeups necessary when the LSN of the log
> > > tail changes.
> > > 
> > > xfs_ail_push_sync() is temporary infrastructure to facilitate
> > > non-blocking, IO-less inode reclaim throttling that allows further
> > > structural changes to be made. Once those structural changes are
> > > made, the need for this function goes away and it is removed,
> > > leaving us with only the xfs_ail_update_finish() factoring when this
> > > is all done.
> > 
> > The xfs_ail_update_finish work here is in an earlier patch, so the
> > changelog will need some updates.
> > 
> > > +	spin_lock(&ailp->ail_lock);
> > > +	while ((lip = xfs_ail_min(ailp)) != NULL) {
> > > +		prepare_to_wait(&ailp->ail_push, &wait, TASK_UNINTERRUPTIBLE);
> > > +		if (XFS_FORCED_SHUTDOWN(ailp->ail_mount) ||
> > > +		    XFS_LSN_CMP(threshold_lsn, lip->li_lsn) <= 0)
> 
> Wasn't this supposed to change to < 0? The rfc series had that logic,
> but it changed from <= to < later in the wrong patch.

I probably forgot because this code gets removed at the end of the
series. Hence I haven't cared about exact correctness or neatness, as
it's just temporary scaffolding to keep stuff from breaking horribly
while the changeover to non-blocking algorithms is done.

It works well enough that I can't break it as it stands - I've tested
each patch individually with both load and fstests, so this code
doesn't introduce any bisect landmines - and it prevents a bunch of
problems in OOM conditions by retaining the blocking behaviour of
reclaim until we no longer need it...

> > > +			break;
> > > +		/* XXX: cmpxchg? */
> > > +		while (XFS_LSN_CMP(threshold_lsn, ailp->ail_target) > 0)
> > > +			xfs_trans_ail_copy_lsn(ailp, &ailp->ail_target, &threshold_lsn);
> > 
> > This code looks broken on 32-bit given that xfs_trans_ail_copy_lsn takes
> > the ail_lock there.  Just replacing the xfs_trans_ail_copy_lsn call with
> > a direct assignment would fix that, no need for cmpxchg either as far
> > as I can tell (and it would fix that too long line as well).

Oh, right. I'll fix that.

> > still looks odd, I think this should simply be an if.
> > 
> > > +		wake_up_process(ailp->ail_task);
> > > +		spin_unlock(&ailp->ail_lock);
> > 
> > xfsaild will take ail_lock pretty quickly. I think we should drop
> > the lock before waking it.
> 
> Can't we replace this whole thing with something that repurposes
> xfs_ail_push_all_sync()? That only requires some tweaks to the existing
> function and the new _push_all_sync() wrapper ends up looking something
> like:
> 
> 	while ((threshold_lsn = xfs_ail_max_lsn(ailp)) != 0)
> 		xfs_ail_push_sync(ailp, threshold_lsn);
> 
> There's an extra lock cycle, but that's still only on tail updates. That
> doesn't seem unreasonable to me for the usage of _push_all_sync().

The whole thing goes away, so there is zero point in trying to
optimise or perfect this code. It's temporary code, treat it as such.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
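
[For reference, a sketch of how the xfs_ail_push_sync() wait loop might
look with the review feedback above folded in: Brian's "< 0" comparison,
Christoph's direct ail_target assignment under ail_lock in place of
xfs_trans_ail_copy_lsn() (whose internal locking would self-deadlock on
32-bit here), the inner while collapsed to an if, and ail_lock dropped
before waking xfsaild. The schedule()/finish_wait() framing is an
assumption based on the standard prepare_to_wait() pattern; it is not
part of the hunks quoted in this thread.]

	spin_lock(&ailp->ail_lock);
	while ((lip = xfs_ail_min(ailp)) != NULL) {
		prepare_to_wait(&ailp->ail_push, &wait, TASK_UNINTERRUPTIBLE);
		if (XFS_FORCED_SHUTDOWN(ailp->ail_mount) ||
		    XFS_LSN_CMP(threshold_lsn, lip->li_lsn) < 0)
			break;
		/* Direct assignment is safe: we hold ail_lock here. */
		if (XFS_LSN_CMP(threshold_lsn, ailp->ail_target) > 0)
			ailp->ail_target = threshold_lsn;
		/* Drop the lock before the wakeup so xfsaild can take it. */
		spin_unlock(&ailp->ail_lock);
		wake_up_process(ailp->ail_task);
		schedule();
		spin_lock(&ailp->ail_lock);
	}
	spin_unlock(&ailp->ail_lock);
	finish_wait(&ailp->ail_push, &wait);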