Re: [PATCH 1/2] xfs: xfs_check_page_type buffer checks need help

Brian Foster <bfoster@xxxxxxxxxx> · Wed, 5 Mar 2014 17:08:20 -0500

On Wed, Mar 05, 2014 at 12:11:32PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> xfs_aops_discard_page() was introduced in the following commit:
> 
>   xfs: truncate delalloc extents when IO fails in writeback
> 
> ... to clean up left over delalloc ranges after I/O failure in
> ->writepage(). generic/224 tests for this scenario and occasionally
> reproduces panics on sub-4k blocksize filesystems.
> 
> The cause of this is failure to clean up the delalloc range on a
> page where the first buffer does not match one of the expected
> states of xfs_check_page_type(). If a buffer is not unwritten,
> delayed or dirty&mapped, xfs_check_page_type() stops and
> immediately returns 0.
> 
> The stress test of generic/224 creates a scenario where the first
> several buffers of a page with delayed buffers are mapped & uptodate
> and some subsequent buffer is delayed. If the ->writepage() happens
> to fail for this page, xfs_aops_discard_page() incorrectly skips
> the entire page.
> 
> This then causes later failures either when direct IO maps the range
> and finds the stale delayed buffer, or we evict the inode and find
> that the inode still has a delayed block reservation accounted to
> it.
> 
> We can easily fix this xfs_aops_discard_page() failure by making
> xfs_check_page_type() check all buffers, but this breaks
> xfs_convert_page() more than it is already broken. Indeed,
> xfs_convert_page() wants xfs_check_page_type() to tell it if the
> first buffers on the pages are of a type that can be aggregated into
> the contiguous IO that is already being built.
> 
> xfs_convert_page() should not be writing random buffers out of a
> page, but the current behaviour will cause it to do so if there are
> buffers that don't match the current specification on the page.
> Hence for xfs_convert_page() we need to:
> 
> 	a) return "not ok" if the first buffer on the page does not
> 	match the specification provided to we don't write anything;
> 	and
> 	b) abort it's buffer-add-to-io loop the moment we come
> 	across a buffer that does not match the specification.
> 
> Hence we need to fix both xfs_check_page_type() and
> xfs_convert_page() to work correctly with pages that have mixed
> buffer types, whilst allowing xfs_aops_discard_page() to scan all
> buffers on the page for a type match.
> 
> Reported-by: Brian Foster <bfoster@xxxxxxxxxx>
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---

This looks pretty good to me as well. I notice one little thing that
might not be a real problem, but worth a quick thought...

>  fs/xfs/xfs_aops.c | 81 ++++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 50 insertions(+), 31 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index ef62c6b..98016b3 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -632,38 +632,46 @@ xfs_map_at_offset(
>  }
>  
...
>  /*
> @@ -697,7 +705,7 @@ xfs_convert_page(
>  		goto fail_unlock_page;
>  	if (page->mapping != inode->i_mapping)
>  		goto fail_unlock_page;
> -	if (!xfs_check_page_type(page, (*ioendp)->io_type))
> +	if (!xfs_check_page_type(page, (*ioendp)->io_type, false))
>  		goto fail_unlock_page;
>  
>  	/*
> @@ -742,6 +750,15 @@ xfs_convert_page(
>  	p_offset = p_offset ? roundup(p_offset, len) : PAGE_CACHE_SIZE;
>  	page_dirty = p_offset / len;
>  
> +	/*
> +	 * The moment we find a buffer that doesn't match our current type
> +	 * specification or can't be written, abort the loop and start
> +	 * writeback. As per the above xfs_imap_valid() check, only
> +	 * xfs_vm_writepage() can handle partial page writeback fully - we are
> +	 * limited here to the buffers that are contiguous with the current
> +	 * ioend, and hence a buffer we can't write breaks that contiguity and
> +	 * we have to defer the rest of the IO to xfs_vm_writepage().
> +	 */
>  	bh = head = page_buffers(page);
>  	do {
>  		if (offset >= end_offset)
> @@ -750,7 +767,7 @@ xfs_convert_page(
>  			uptodate = 0;
>  		if (!(PageUptodate(page) || buffer_uptodate(bh))) {
>  			done = 1;
> -			continue;
> +			break;
>  		}
>  
>  		if (buffer_unwritten(bh) || buffer_delay(bh) ||
> @@ -762,10 +779,11 @@ xfs_convert_page(
>  			else
>  				type = XFS_IO_OVERWRITE;
>  
> -			if (!xfs_imap_valid(inode, imap, offset)) {
> -				done = 1;
> -				continue;
> -			}
> +			/*
> +			 * imap should always be valid because of the above
> +			 * partial page end_offset check on the imap.
> +			 */
> +			ASSERT(xfs_imap_valid(inode, imap, offset));
>  
>  			lock_buffer(bh);
>  			if (type != XFS_IO_OVERWRITE)
> @@ -777,6 +795,7 @@ xfs_convert_page(
>  			count++;
>  		} else {
>  			done = 1;
> +			break;
>  		}
>  	} while (offset += len, (bh = bh->b_this_page) != head);
> 

The next couple lines after the loop are:

	if (uptodate && bh == head)
		SetPageUptodate(page);

Now that we can break out of the loop, the "bh == head" part of that
check might not necessarily mean what it used to mean. The uptodate
variable is initialized to 1 and we reset to 0 the moment we encounter a
!uptodate buffer. Do you think it's possible to get here on the first
buffer of the page, without having reset 'uptodate,'  and potentially
incorrectly set the page uptodate?

Brian

> @@ -868,7 +887,7 @@ xfs_aops_discard_page(
>  	struct buffer_head	*bh, *head;
>  	loff_t			offset = page_offset(page);
>  
> -	if (!xfs_check_page_type(page, XFS_IO_DELALLOC))
> +	if (!xfs_check_page_type(page, XFS_IO_DELALLOC, true))
>  		goto out_invalidate;
>  
>  	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
> -- 
> 1.9.0
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs