Re: [PATCH 03/27] xfs: use write_cache_pages for writeback clustering

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 29, 2011 at 10:01:12AM -0400, Christoph Hellwig wrote:
> Instead of implementing our own writeback clustering use write_cache_pages
> to do it for us.  This means the guts of the current writepage implementation
> become a new helper used both for implementing ->writepage and as a callback
> to write_cache_pages for ->writepages.  A new struct xfs_writeback_ctx
> is used to track block mapping state and the ioend chain over multiple
> invocation of it.
> 
> The advantage over the old code is that we avoid a double pagevec lookup,
> and a more efficient handling of extent boundaries inside a page for
> small blocksize filesystems, as well as having less XFS specific code.

It's not more efficient right now, due to a little bug:

> @@ -973,36 +821,38 @@ xfs_vm_writepage(
>  		 * buffers covering holes here.
>  		 */
>  		if (!buffer_mapped(bh) && buffer_uptodate(bh)) {
> -			imap_valid = 0;
> +			ctx->imap_valid = 0;
>  			continue;
>  		}
>  
>  		if (buffer_unwritten(bh)) {
>  			if (type != IO_UNWRITTEN) {
>  				type = IO_UNWRITTEN;
> -				imap_valid = 0;
> +				ctx->imap_valid = 0;
>  			}
>  		} else if (buffer_delay(bh)) {
>  			if (type != IO_DELALLOC) {
>  				type = IO_DELALLOC;
> -				imap_valid = 0;
> +				ctx->imap_valid = 0;
>  			}
>  		} else if (buffer_uptodate(bh)) {
>  			if (type != IO_OVERWRITE) {
>  				type = IO_OVERWRITE;
> -				imap_valid = 0;
> +				ctx->imap_valid = 0;
>  			}
>  		} else {
>  			if (PageUptodate(page)) {
>  				ASSERT(buffer_mapped(bh));
> -				imap_valid = 0;
> +				ctx->imap_valid = 0;
>  			}
>  			continue;
>  		}

This piece of logic checks is the type of buffer has changed from the
previous buffer. This used to work just fine, but now "type" is
local to the __xfs_vm_writepage() function, while the imap life
spanѕ multiple calls to the __xfs_vm_writepage() function. Hence
type is reinitialised to IO_OVERWRITE on every page that written,
and so for delalloc we are invalidating the imap and looking it up
again on every page. Traces show this sort of behaviour:

           <...>-514   [000] 689640.881953: xfs_writepage:        dev 253:16 ino 0x552248 pgoff 0xf7000 size 0xa00000 offset 0 delalloc 1 unwritten 0
           <...>-514   [000] 689640.881954: xfs_ilock:            dev 253:16 ino 0x552248 flags ILOCK_SHARED caller xfs_map_blocks
           <...>-514   [000] 689640.881954: xfs_iunlock:          dev 253:16 ino 0x552248 flags ILOCK_SHARED caller xfs_map_blocks
           <...>-514   [000] 689640.881954: xfs_map_blocks_found: dev 253:16 ino 0x552248 size 0x0 new_size 0x0 offset 0xf7000 count 1024 type  startoff 0x0 startblock 6297609 blockcount 0x2800
           <...>-514   [000] 689640.881956: xfs_writepage:        dev 253:16 ino 0x552248 pgoff 0xf8000 size 0xa00000 offset 0 delalloc 1 unwritten 0
           <...>-514   [000] 689640.881957: xfs_ilock:            dev 253:16 ino 0x552248 flags ILOCK_SHARED caller xfs_map_blocks
           <...>-514   [000] 689640.881957: xfs_iunlock:          dev 253:16 ino 0x552248 flags ILOCK_SHARED caller xfs_map_blocks
           <...>-514   [000] 689640.881957: xfs_map_blocks_found: dev 253:16 ino 0x552248 size 0x0 new_size 0x0 offset 0xf8000 count 1024 type  startoff 0x0 startblock 6297609 blockcount 0x2800
           <...>-514   [000] 689640.881960: xfs_writepage:        dev 253:16 ino 0x552248 pgoff 0xf9000 size 0xa00000 offset 0 delalloc 1 unwritten 0
           <...>-514   [000] 689640.881960: xfs_ilock:            dev 253:16 ino 0x552248 flags ILOCK_SHARED caller xfs_map_blocks
           <...>-514   [000] 689640.881961: xfs_iunlock:          dev 253:16 ino 0x552248 flags ILOCK_SHARED caller xfs_map_blocks
           <...>-514   [000] 689640.881961: xfs_map_blocks_found: dev 253:16 ino 0x552248 size 0x0 new_size 0x0 offset 0xf9000 count 1024 type  startoff 0x0 startblock 6297609 blockcount 0x2800

IOWs, the type field also needs to be moved into the writepage
context structure so that we don't keep doing needless extent map
lookups.

With the following patch, the trace output now looks like this for
delalloc writeback:

           <...>-12623 [000] 694093.594883: xfs_writepage:        dev 253:16 ino 0x2300a5 pgoff 0x505000 size 0xa00000 offset 0 delalloc 1 unwritten 0
           <...>-12623 [000] 694093.594884: xfs_writepage:        dev 253:16 ino 0x2300a5 pgoff 0x506000 size 0xa00000 offset 0 delalloc 1 unwritten 0
           <...>-12623 [000] 694093.594884: xfs_writepage:        dev 253:16 ino 0x2300a5 pgoff 0x507000 size 0xa00000 offset 0 delalloc 1 unwritten 0
           <...>-12623 [000] 694093.594885: xfs_writepage:        dev 253:16 ino 0x2300a5 pgoff 0x508000 size 0xa00000 offset 0 delalloc 1 unwritten 0
           <...>-12623 [000] 694093.594885: xfs_writepage:        dev 253:16 ino 0x2300a5 pgoff 0x509000 size 0xa00000 offset 0 delalloc 1 unwritten 0
           <...>-12623 [000] 694093.594886: xfs_writepage:        dev 253:16 ino 0x2300a5 pgoff 0x50a000 size 0xa00000 offset 0 delalloc 1 unwritten 0
           <...>-12623 [000] 694093.594887: xfs_writepage:        dev 253:16 ino 0x2300a5 pgoff 0x50b000 size 0xa00000 offset 0 delalloc 1 unwritten 0
           <...>-12623 [000] 694093.594888: xfs_writepage:        dev 253:16 ino 0x2300a5 pgoff 0x50c000 size 0xa00000 offset 0 delalloc 1 unwritten 0


i.e. there mapping lookup is no longer occurring for every page.

As a side effect, the failure case I'm seeing with test 180 has gone
from 5-10 files with the wrong size to >200 files with the wrong
size with this patch, so clearly there is something wrong with file
size updates getting to disk that this patch set makes worse.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

xfs: io type needs to be part of the writepage context

From: Dave Chinner <dchinner@xxxxxxxxxx>

If we don't pass the IO type we are mapping with the writeage
context, then the imap is recalculated on every delalloc page that
is passed to _xfs_vm_writepage(). This defeats the purpose of having
a cached imap between calls and increases the overhead of delalloc
writeback significantly.

Fix this by moving the io type into the writepage context structure
so that it moves with the cached imap through the stack.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
---
 fs/xfs/linux-2.6/xfs_aops.c |   30 ++++++++++++++++++------------
 1 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index 73dac4b..25b63cd 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -40,6 +40,7 @@
 
 struct xfs_writeback_ctx {
 	unsigned int		imap_valid;
+	unsigned int		io_type;
 	struct xfs_bmbt_irec	imap;
 	struct xfs_ioend	*iohead;
 	struct xfs_ioend	*ioend;
@@ -804,7 +805,6 @@ __xfs_vm_writepage(
 
 	bh = head = page_buffers(page);
 	offset = page_offset(page);
-	type = IO_OVERWRITE;
 
 	do {
 		int new_ioend = 0;
@@ -826,18 +826,18 @@ __xfs_vm_writepage(
 		}
 
 		if (buffer_unwritten(bh)) {
-			if (type != IO_UNWRITTEN) {
-				type = IO_UNWRITTEN;
+			if (ctx->io_type != IO_UNWRITTEN) {
+				ctx->io_type = IO_UNWRITTEN;
 				ctx->imap_valid = 0;
 			}
 		} else if (buffer_delay(bh)) {
-			if (type != IO_DELALLOC) {
-				type = IO_DELALLOC;
+			if (ctx->io_type != IO_DELALLOC) {
+				ctx->io_type = IO_DELALLOC;
 				ctx->imap_valid = 0;
 			}
 		} else if (buffer_uptodate(bh)) {
-			if (type != IO_OVERWRITE) {
-				type = IO_OVERWRITE;
+			if (ctx->io_type != IO_OVERWRITE) {
+				ctx->io_type = IO_OVERWRITE;
 				ctx->imap_valid = 0;
 			}
 		} else {
@@ -862,7 +862,8 @@ __xfs_vm_writepage(
 			 * time.
 			 */
 			new_ioend = 1;
-			err = xfs_map_blocks(inode, offset, &ctx->imap, type);
+			err = xfs_map_blocks(inode, offset, &ctx->imap,
+					     ctx->io_type);
 			if (err)
 				goto error;
 			ctx->imap_valid =
@@ -870,11 +871,12 @@ __xfs_vm_writepage(
 		}
 		if (ctx->imap_valid) {
 			lock_buffer(bh);
-			if (type != IO_OVERWRITE) {
+			if (ctx->io_type != IO_OVERWRITE) {
 				xfs_map_at_offset(inode, bh, &ctx->imap,
 						  offset);
 			}
-			xfs_add_to_ioend(ctx, inode, bh, offset, type, new_ioend);
+			xfs_add_to_ioend(ctx, inode, bh, offset, ctx->io_type,
+					 new_ioend);
 			count++;
 		}
 	} while (offset += len, ((bh = bh->b_this_page) != head));
@@ -902,7 +904,9 @@ xfs_vm_writepage(
 	struct page		*page,
 	struct writeback_control *wbc)
 {
-	struct xfs_writeback_ctx ctx = { };
+	struct xfs_writeback_ctx ctx = {
+		.io_type = IO_OVERWRITE,
+	};
 	int ret;
 
 	/*
@@ -939,7 +943,9 @@ xfs_vm_writepages(
 	struct address_space	*mapping,
 	struct writeback_control *wbc)
 {
-	struct xfs_writeback_ctx ctx = { };
+	struct xfs_writeback_ctx ctx = {
+		.io_type = IO_OVERWRITE,
+	};
 	int ret;
 
 	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs



[Index of Archives]     [Linux XFS Devel]     [Linux Filesystem Development]     [Filesystem Testing]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux