Re: [PATCH 01/14] xfs: fix sub-page blocksize data integrity writes

Brian Foster <bfoster@xxxxxxxxxx> · Mon, 20 May 2013 14:02:59 -0400

On 05/19/2013 07:51 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> FSX on 512 byte block size filesystems has been failing for some
> time with corrupted data. The fault dates back to the change in
> the writeback data integrity algorithm that uses a mark-and-sweep
> approach to avoid data writeback livelocks.
> 
> Unfortunately, a side effect of this mark-and-sweep approach is that
> each page will only be written once for a data integrity sync, and
> there is a condition in writeback in XFS where a page may require
> two writeback attempts to be fully written. As a result of the high
> level change, we now only get a partial page writeback during the
> integrity sync because the first pass through writeback clears the
> mark left on the page index to tell writeback that the page needs
> writeback....
> 
> The cause is writing a partial page in the clustering code. This can
> happen when a mapping boundary falls in the middle of a page - we
> end up writing back the first part of the page that the mapping
> covers, but then never revisit the page to have the remainder mapped
> and written.
> 
> The fix is simple - if the mapping boundary falls inside a page,
> then simply abort clustering without touching the page. This means
> that the next ->writepage entry that write_cache_pages() will make
> is the page we aborted on, and xfs_vm_writepage() will map all
> sections of the page correctly. This behaviour is also optimal for
> non-data integrity writes, as it results in contiguous sequential
> writeback of the file rather than missing small holes and having to
> write them as "random" writes in a future pass.
> 
> With this fix, all the fsx tests in xfstests now pass on a 512 byte
> block size filesystem on a 4k page machine.
> 
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---

Looks good to me.

Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>

>  fs/xfs/xfs_aops.c |   19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 2b2691b..f04eceb 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -725,6 +725,25 @@ xfs_convert_page(
>  			(xfs_off_t)(page->index + 1) << PAGE_CACHE_SHIFT,
>  			i_size_read(inode));
>  
> +	/*
> +	 * If the current map does not span the entire page we are about to try
> +	 * to write, then give up. The only way we can write a page that spans
> +	 * multiple mappings in a single writeback iteration is via the
> +	 * xfs_vm_writepage() function. Data integrity writeback requires the
> +	 * entire page to be written in a single attempt, otherwise the part of
> +	 * the page we don't write here doesn't get written as part of the data
> +	 * integrity sync.
> +	 *
> +	 * For normal writeback, we also don't attempt to write partial pages
> +	 * here as it simply means that write_cache_pages() will see it under
> +	 * writeback and ignore the page until some point in the future, at which
> +	 * time this will be the only page in the file that needs writeback.
> +	 * Hence for more optimal IO patterns, we should always avoid partial
> +	 * page writeback due to multiple mappings on a page here.
> +	 */
> +	if (!xfs_imap_valid(inode, imap, end_offset))
> +		goto fail_unlock_page;
> +
>  	len = 1 << inode->i_blkbits;
>  	p_offset = min_t(unsigned long, end_offset & (PAGE_CACHE_SIZE - 1),
>  					PAGE_CACHE_SIZE);
> 
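
For readers following along, here is a rough userspace sketch of the check the hunk adds. It is not the kernel code: the struct and the sketch_* helpers are hypothetical names made up for illustration, and it assumes the validity test simply asks whether the block containing the given byte offset lies inside the current mapping. The end offset is computed the way the context lines of the hunk do, as the smaller of the end of the page and the file size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an extent mapping, in fs blocks. */
struct sketch_map {
	uint64_t start_block;	/* first file block covered by the mapping */
	uint64_t block_count;	/* number of blocks the mapping covers */
};

/*
 * Assumed semantics of the validity check: true if the block that
 * contains byte 'offset' falls inside the mapping.
 */
static bool sketch_map_valid(const struct sketch_map *map, uint64_t offset,
			     unsigned int blkbits)
{
	uint64_t block = offset >> blkbits;

	return block >= map->start_block &&
	       block < map->start_block + map->block_count;
}

/* End offset as in the hunk's context: min(end of page, file size). */
static uint64_t sketch_end_offset(uint64_t page_index, unsigned int page_shift,
				  uint64_t isize)
{
	uint64_t page_end = (page_index + 1) << page_shift;

	return page_end < isize ? page_end : isize;
}

/*
 * Decide whether clustering may write this page under the current
 * mapping. A false return models "goto fail_unlock_page": the page is
 * left untouched so the next ->writepage call can map all of it.
 */
static bool sketch_can_cluster_page(const struct sketch_map *map,
				    uint64_t page_index,
				    unsigned int page_shift,
				    unsigned int blkbits, uint64_t isize)
{
	uint64_t end = sketch_end_offset(page_index, page_shift, isize);

	return sketch_map_valid(map, end, blkbits);
}
```

With 512 byte blocks (blkbits = 9), 4k pages (page_shift = 12) and a mapping covering blocks 0-11 (bytes 0-6143), page 0 passes the check, while page 1, where the mapping ends at byte 6144 in the middle of the page, is rejected and clustering stops there.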
