On Mon, Feb 08, 2016 at 04:44:18PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> Currently we can build a long ioend chain during ->writepages that
> gets attached to the writepage context. IO submission only then
> occurs when we finish all the writepage processing. This means we
> can have many ioends allocated and pending, and this violates the
> mempool guarantees that we need to give about forwards progress.
> i.e. we really should only have one ioend being built at a time,
> otherwise we may drain the mempool trying to allocate a new ioend
> and that blocks submission, completion and freeing of ioends that
> are already in progress.
> 
> To prevent this situation from happening, we need to submit ioends
> for IO as soon as they are ready for dispatch rather than queuing
> them for later submission. This means the ioends have bios built
> immediately and they get queued on any plug that is currently
> active. Hence if we schedule away from writeback, the ioends that
> have been built will make forwards progress due to the plug flushing
> on context switch. This will also prevent context switches from
> creating unnecessary IO submission latency.
> 
> We can't completely avoid having nested IO allocation - when we have
> a block size smaller than a page size, we still need to hold the
> ioend submission until after we have marked the current page dirty.
> Hence we may need multiple ioends to be held while the current page
> is completely mapped and made ready for IO dispatch. We cannot avoid
> this problem - the current code already has this ioend chaining
> within a page so we can mostly ignore that it occurs.
> 
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_aops.c | 94 ++++++++++++++++++++++++++++---------------------------
>  1 file changed, 48 insertions(+), 46 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index e5b8214..852ecf2 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
...
> @@ -738,29 +726,22 @@ xfs_writepage_submit(
>  	struct writeback_control	*wbc,
>  	int				status)
>  {
> -	struct blk_plug		plug;
> -
> -	/* Reserve log space if we might write beyond the on-disk inode size. */
> -	if (!status && wpc->ioend && wpc->ioend->io_type != XFS_IO_UNWRITTEN &&
> -	    xfs_ioend_is_append(wpc->ioend))
> -		status = xfs_setfilesize_trans_alloc(wpc->ioend);
> -
> -	if (wpc->iohead) {
> -		blk_start_plug(&plug);
> -		xfs_submit_ioend(wbc, wpc->iohead, status);
> -		blk_finish_plug(&plug);
> -	}

We've dropped the plug here, but I don't see anything added in
xfs_vm_writepages(). Shouldn't we have one there now that ioends are
submitted as we go? generic_writepages() uses one around its
write_cache_pages() call.
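To be concrete, something along these lines is what I have in mind.
This is only an untested sketch: it reuses the xfs_writepage_submit()
signature from this patch, mirrors the plug placement in
generic_writepages(), and elides the wpc initialisation (whatever the
rest of the series settles on):

STATIC int
xfs_vm_writepages(
	struct address_space		*mapping,
	struct writeback_control	*wbc)
{
	struct xfs_writepage_ctx wpc = { };	/* init as per the rest of the series */
	struct blk_plug			plug;
	int				ret;

	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);

	/*
	 * Plug across the whole pagecache walk so the ioends we now
	 * submit one at a time still get batched by the block layer.
	 */
	blk_start_plug(&plug);
	ret = write_cache_pages(mapping, wbc, xfs_do_writepage, &wpc);
	ret = xfs_writepage_submit(&wpc, wbc, ret);
	blk_finish_plug(&plug);
	return ret;
}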
> +	if (wpc->ioend)
> +		xfs_submit_ioend(wbc, wpc->ioend, status);
>  	return status;
>  }
>  
>  static int
>  xfs_writepage_map(
>  	struct xfs_writepage_ctx *wpc,
> +	struct writeback_control *wbc,
>  	struct inode		*inode,
>  	struct page		*page,
>  	loff_t			offset,
>  	__uint64_t		end_offset)
>  {
> +	struct xfs_ioend	*ioend_to_submit = NULL;
> +	struct xfs_ioend	*ioend_tail = NULL;
>  	struct buffer_head	*bh, *head;
>  	ssize_t			len = 1 << inode->i_blkbits;
>  	int			error = 0;
> @@ -827,23 +808,37 @@ xfs_writepage_map(
>  						 offset);
>  		}
>  		if (wpc->imap_valid) {
> +			struct xfs_ioend *ioend;
> +
>  			lock_buffer(bh);
>  			if (wpc->io_type != XFS_IO_OVERWRITE)
>  				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
> -			xfs_add_to_ioend(inode, bh, offset, wpc);

"Big picture" comment here, please, i.e. something along the lines of
(feel free to rewrite/fix/enhance):

"This implements an immediate ioend submission policy. If a new ioend
is required, the old ioend is returned and slated for submission on
function exit. The purpose of this policy is to avoid creating and
holding large chains of ioend objects in memory. While ioends are
submitted immediately after they are completed, block plugging helps
provide batching."

> +			ioend = xfs_add_to_ioend(inode, bh, offset, wpc);
> +			if (ioend) {
> +				ioend->io_list = NULL;
> +				if (!ioend_to_submit)
> +					ioend_to_submit = ioend;
> +				else
> +					ioend_tail->io_list = ioend;
> +				ioend_tail = ioend;
> +			}
>  			count++;
>  		}
>  
> -		if (!wpc->iohead)
> -			wpc->iohead = wpc->ioend;
> -
>  	} while (offset += len, ((bh = bh->b_this_page) != head));
>  
>  	if (uptodate && bh == head)
>  		SetPageUptodate(page);
>  
>  	xfs_start_page_writeback(page, 1, count);
> -	ASSERT(wpc->iohead || !count);
> +	ASSERT(wpc->ioend || !count);
> +	while (ioend_to_submit) {
> +		struct xfs_ioend *next = ioend_to_submit->io_list;
> +
> +		ioend_to_submit->io_list = NULL;
> +		xfs_submit_ioend(wbc, ioend_to_submit, 0);
> +		ioend_to_submit = next;
> +	}
>  	return 0;
>  
>  out_error:
> @@ -853,9 +848,16 @@ out_error:
>  	 * ioend, then we can't touch it here and need to rely on IO submission
>  	 * to unlock it.
>  	 */
> -	if (count)
> +	if (count) {
>  		xfs_start_page_writeback(page, 0, count);
> -	else {
> +		while (ioend_to_submit) {
> +			struct xfs_ioend *next = ioend_to_submit->io_list;
> +
> +			ioend_to_submit->io_list = NULL;
> +			xfs_submit_ioend(wbc, ioend_to_submit, 0);
> +			ioend_to_submit = next;
> +		}
> +	} else {
>  		xfs_aops_discard_page(page);
>  		ClearPageUptodate(page);
>  		unlock_page(page);

If we get an error and count == 0, we know that ioend_to_submit is
NULL, because it can only be set once the first buffer has been added.
That said, that doesn't mean we don't have an ioend waiting on the
wpc. If we do, we still return the error and that ioend is errored
out. I wonder whether that is really necessary if we haven't added any
buffers from this page - could we submit that ioend normally in that
case? OTOH, that might complicate the error reporting, and an error
here might be serious enough that it isn't worth it, as opposed to
just making sure we clean everything up appropriately.

Brian

> @@ -975,14 +977,14 @@ xfs_do_writepage(
>  		end_offset = offset;
>  	}
>  
> -	error = xfs_writepage_map(wpc, inode, page, offset, end_offset);
> +	error = xfs_writepage_map(wpc, wbc, inode, page, offset, end_offset);
>  	if (error)
>  		goto out_error;
>  	return 0;
>  
>  out_error:
>  	/*
> -	 * We have to fail the iohead here because we buffers locked in the
> +	 * We have to fail the ioend here because we buffers locked in the
>  	 * ioend chain. If we don't do this, we'll deadlock invalidating the
>  	 * page as that tries to lock the buffers on the page. Also, because we
>  	 * have set pages under writeback, we have to run IO completion to mark
> -- 
> 2.5.0
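P.S. One more nit while I'm here: the ioend_to_submit walk is now open
coded twice in xfs_writepage_map(), once on the success path and once
on the count != 0 error path. A trivial helper would remove the
duplication - purely illustrative, hypothetical name, untested:

static void
xfs_submit_ioend_list(
	struct writeback_control	*wbc,
	struct xfs_ioend		*ioend)
{
	/* Dispatch each ioend that was chained up for submission on exit. */
	while (ioend) {
		struct xfs_ioend	*next = ioend->io_list;

		ioend->io_list = NULL;
		xfs_submit_ioend(wbc, ioend, 0);
		ioend = next;
	}
}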