On Thu, Aug 25, 2011 at 03:57:19PM -0500, Alex Elder wrote:
> On Thu, 2011-08-25 at 17:17 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > There is no reason we need a thread per filesystem to do the
> > flushing of the delayed write buffer queue. This can be easily
> > handled by a global concurrency managed workqueue.
> > 
> > Convert the delayed write buffer handling to use workqueues and
> > workqueue flushes to implement buffer writeback by embedding a
> > delayed work structure into the struct xfs_buftarg and using that to
> > control flushing. This greatly simplifies the process of flushing
> > and also removes a bunch of duplicated code between buftarg flushing
> > and delwri buffer writeback.
> 
> I point out one bug below.  I also question one of the
> changes and have some suggestions.  I'd like to see this
> updated and get another chance to review the result.
> 
> 					-Alex
> 
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > ---
> >  fs/xfs/xfs_buf.c       |  165 ++++++++++++++++++++----------------------
> >  fs/xfs/xfs_buf.h       |    5 +-
> >  fs/xfs/xfs_dquot.c     |    1 -
> >  fs/xfs/xfs_trans_ail.c |    2 +-
> >  4 files changed, 72 insertions(+), 101 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index 410de9f..b1b8c0c 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> 
> . . .
> 
> > @@ -1407,8 +1407,9 @@ xfs_buf_delwri_queue(
> >  	}
> > 
> >  	if (list_empty(dwq)) {
> > -		/* start xfsbufd as it is about to have something to do */
> > -		wake_up_process(bp->b_target->bt_task);
> > +		/* queue a delayed flush as we are about to queue a buffer */
> > +		queue_delayed_work(xfs_buf_wq, &bp->b_target->bt_delwrite_work,
> > +				xfs_buf_timer_centisecs * msecs_to_jiffies(10));
> 
> I think this should be done *after* the buffer has been
> added to the delayed work queue.  I realize there will be
> a small delay so it should be fine, but...  It also doesn't
> have to be done with bt_delwrite_lock held.
The reason it is done there is so we don't need a local variable to
store whether we need to queue work. Because the worker must first
grab the bt_delwrite_lock before it will do anything, even if the
worker runs immediately it will block until we've finished adding
this item to the list and dropped the lock. So it really doesn't
matter that we queue the work before adding the buffer to the list,
and the code is actually simpler doing it this way....

> >  	}
> > 
> >  	bp->b_flags |= _XBF_DELWRI_Q;
> > @@ -1486,13 +1487,13 @@ STATIC int
> >  xfs_buf_delwri_split(
> >  	xfs_buftarg_t	*target,
> >  	struct list_head *list,
> > -	unsigned long	age)
> > +	unsigned long	age,
> > +	int		force)
> >  {
> >  	xfs_buf_t	*bp, *n;
> >  	struct list_head *dwq = &target->bt_delwrite_queue;
> >  	spinlock_t	*dwlk = &target->bt_delwrite_lock;
> >  	int		skipped = 0;
> > -	int		force;
> > 
> >  	force = test_and_clear_bit(XBT_FORCE_FLUSH, &target->bt_flags);
> 
> You forgot to delete this line when you made "force" be
> an argument to this function.

Oops. So I did. I screwed that up when updating it after all the
recent xfs_buf.c changes. Will fix.

> I think if you delete this line all is well, but you
> need to test this.

Of course ;)

> > -	xfs_buf_runall_queues(xfsconvertd_workqueue);
> > -	xfs_buf_runall_queues(xfsdatad_workqueue);
> > -	xfs_buf_runall_queues(xfslogd_workqueue);
> > +	long	age = xfs_buf_age_centisecs * msecs_to_jiffies(10);
> > +	int	force = 0;
> > 
> > -	set_bit(XBT_FORCE_FLUSH, &target->bt_flags);
> > -	pincount = xfs_buf_delwri_split(target, &tmp_list, 0);
> > +	if (test_and_clear_bit(XBT_FORCE_FLUSH, &btp->bt_flags)) {
> > +		force = 1;
> > +		age = 0;
> > +	}
> 
> xfs_buf_delwri_split() ignores its "age" argument if "force"
> is set.  This function never uses "age" otherwise.  So the
> above can just be:
> 
> 	force = test_and_clear_bit(XBT_FORCE_FLUSH, &btp->bt_flags);

Ok.
> > +/*
> > + * Flush all the queued buffer work, then flush any remaining dirty buffers
> > + * and wait for them to complete. If there are buffers remaining on the delwri
> > + * queue, then they were pinned so couldn't be flushed. Return a value of 1 to
> > + * indicate that there were pinned buffers and the caller needs to retry the
> > + * flush.
> > + */
> > +int
> > +xfs_flush_buftarg(
> > +	xfs_buftarg_t	*target,
> > +	int		wait)
> 
> Since this function now ignores its "wait" argument,
> you could eliminate it, and perhaps get rid of the
> one (first) call in xfs_quiesce_fs() that passes 0.

I'll leave that to another patch.

> > +{
> > +	xfs_buf_runall_queues(xfsconvertd_workqueue);
> > +	xfs_buf_runall_queues(xfsdatad_workqueue);
> > +	xfs_buf_runall_queues(xfslogd_workqueue);
> > +
> > +	set_bit(XBT_FORCE_FLUSH, &target->bt_flags);
> > +	flush_delayed_work_sync(&target->bt_delwrite_work);
> > +
> > +	if (!list_empty(&target->bt_delwrite_queue))
> > +		return 1;
> > +	return 0;
> 
> Maybe just:
> 	return !list_empty(&target->bt_delwrite_queue);
> 
> (But I understand why you might prefer what you did.)

Yeah, I prefer the explicit return values to having to work out
whether it is returning 0 or 1. Call me lazy. ;)

> > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > index db62959..1fb9d93 100644
> > --- a/fs/xfs/xfs_dquot.c
> > +++ b/fs/xfs/xfs_dquot.c
> > @@ -1446,7 +1446,6 @@ xfs_qm_dqflock_pushbuf_wait(
> >  		if (xfs_buf_ispinned(bp))
> >  			xfs_log_force(mp, 0);
> >  		xfs_buf_delwri_promote(bp);
> > -		wake_up_process(bp->b_target->bt_task);
> 
> Why does this not need to be replaced with a
> flush_delayed_work() call?  Was this wake_up_process()
> not needed before?

The wake up that is done here is simply to cause the delwri buffer
queue to be flushed immediately, rather than waiting for the next
timeout. The thing is that the xfsbufd can idle with no next defined
wakeup, so this was added to ensure the xfsbufd was woken and the
buffer flushed.
xfs_qm_dqflock_pushbuf_wait(), however, is never called in a
performance or latency critical path, so this was purely defensive to
ensure the xfsbufd was awake, as I wasn't certain I'd got everything
right in the xfsbufd idling code.

With the workqueue changeover, I'm pretty certain that the delayed
workqueue processing will continue until the delwri queue is empty,
so there is no need for a defensive flush of it here - the dquot
buffer will get flushed in a short while now that it is at the head
of the delwri queue, without needing to explicitly ensure the queue
is running...

> If that's the case it seems like
> that change is independent of the switch to work queues
> and should be done separately (first).

I don't think it is separable from the workqueue changeover...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs