On Wed, Jun 29, 2011 at 10:01:12AM -0400, Christoph Hellwig wrote: > Instead of implementing our own writeback clustering use write_cache_pages > to do it for us. This means the guts of the current writepage implementation > become a new helper used both for implementing ->writepage and as a callback > to write_cache_pages for ->writepages. A new struct xfs_writeback_ctx > is used to track block mapping state and the ioend chain over multiple > invocation of it. > > The advantage over the old code is that we avoid a double pagevec lookup, > and a more efficient handling of extent boundaries inside a page for > small blocksize filesystems, as well as having less XFS specific code. It's not more efficient right now, due to a little bug: > @@ -973,36 +821,38 @@ xfs_vm_writepage( > * buffers covering holes here. > */ > if (!buffer_mapped(bh) && buffer_uptodate(bh)) { > - imap_valid = 0; > + ctx->imap_valid = 0; > continue; > } > > if (buffer_unwritten(bh)) { > if (type != IO_UNWRITTEN) { > type = IO_UNWRITTEN; > - imap_valid = 0; > + ctx->imap_valid = 0; > } > } else if (buffer_delay(bh)) { > if (type != IO_DELALLOC) { > type = IO_DELALLOC; > - imap_valid = 0; > + ctx->imap_valid = 0; > } > } else if (buffer_uptodate(bh)) { > if (type != IO_OVERWRITE) { > type = IO_OVERWRITE; > - imap_valid = 0; > + ctx->imap_valid = 0; > } > } else { > if (PageUptodate(page)) { > ASSERT(buffer_mapped(bh)); > - imap_valid = 0; > + ctx->imap_valid = 0; > } > continue; > } This piece of logic checks is the type of buffer has changed from the previous buffer. This used to work just fine, but now "type" is local to the __xfs_vm_writepage() function, while the imap life spanѕ multiple calls to the __xfs_vm_writepage() function. Hence type is reinitialised to IO_OVERWRITE on every page that written, and so for delalloc we are invalidating the imap and looking it up again on every page. Traces show this sort of behaviour: <...>-514 [000] 689640.881953: xfs_writepage: dev 253:16 ino 0x552248 pgoff 0xf7000 size 0xa00000 offset 0 delalloc 1 unwritten 0 <...>-514 [000] 689640.881954: xfs_ilock: dev 253:16 ino 0x552248 flags ILOCK_SHARED caller xfs_map_blocks <...>-514 [000] 689640.881954: xfs_iunlock: dev 253:16 ino 0x552248 flags ILOCK_SHARED caller xfs_map_blocks <...>-514 [000] 689640.881954: xfs_map_blocks_found: dev 253:16 ino 0x552248 size 0x0 new_size 0x0 offset 0xf7000 count 1024 type startoff 0x0 startblock 6297609 blockcount 0x2800 <...>-514 [000] 689640.881956: xfs_writepage: dev 253:16 ino 0x552248 pgoff 0xf8000 size 0xa00000 offset 0 delalloc 1 unwritten 0 <...>-514 [000] 689640.881957: xfs_ilock: dev 253:16 ino 0x552248 flags ILOCK_SHARED caller xfs_map_blocks <...>-514 [000] 689640.881957: xfs_iunlock: dev 253:16 ino 0x552248 flags ILOCK_SHARED caller xfs_map_blocks <...>-514 [000] 689640.881957: xfs_map_blocks_found: dev 253:16 ino 0x552248 size 0x0 new_size 0x0 offset 0xf8000 count 1024 type startoff 0x0 startblock 6297609 blockcount 0x2800 <...>-514 [000] 689640.881960: xfs_writepage: dev 253:16 ino 0x552248 pgoff 0xf9000 size 0xa00000 offset 0 delalloc 1 unwritten 0 <...>-514 [000] 689640.881960: xfs_ilock: dev 253:16 ino 0x552248 flags ILOCK_SHARED caller xfs_map_blocks <...>-514 [000] 689640.881961: xfs_iunlock: dev 253:16 ino 0x552248 flags ILOCK_SHARED caller xfs_map_blocks <...>-514 [000] 689640.881961: xfs_map_blocks_found: dev 253:16 ino 0x552248 size 0x0 new_size 0x0 offset 0xf9000 count 1024 type startoff 0x0 startblock 6297609 blockcount 0x2800 IOWs, the type field also needs to be moved into the writepage context structure so that we don't keep doing needless extent map lookups. With the following patch, the trace output now looks like this for delalloc writeback: <...>-12623 [000] 694093.594883: xfs_writepage: dev 253:16 ino 0x2300a5 pgoff 0x505000 size 0xa00000 offset 0 delalloc 1 unwritten 0 <...>-12623 [000] 694093.594884: xfs_writepage: dev 253:16 ino 0x2300a5 pgoff 0x506000 size 0xa00000 offset 0 delalloc 1 unwritten 0 <...>-12623 [000] 694093.594884: xfs_writepage: dev 253:16 ino 0x2300a5 pgoff 0x507000 size 0xa00000 offset 0 delalloc 1 unwritten 0 <...>-12623 [000] 694093.594885: xfs_writepage: dev 253:16 ino 0x2300a5 pgoff 0x508000 size 0xa00000 offset 0 delalloc 1 unwritten 0 <...>-12623 [000] 694093.594885: xfs_writepage: dev 253:16 ino 0x2300a5 pgoff 0x509000 size 0xa00000 offset 0 delalloc 1 unwritten 0 <...>-12623 [000] 694093.594886: xfs_writepage: dev 253:16 ino 0x2300a5 pgoff 0x50a000 size 0xa00000 offset 0 delalloc 1 unwritten 0 <...>-12623 [000] 694093.594887: xfs_writepage: dev 253:16 ino 0x2300a5 pgoff 0x50b000 size 0xa00000 offset 0 delalloc 1 unwritten 0 <...>-12623 [000] 694093.594888: xfs_writepage: dev 253:16 ino 0x2300a5 pgoff 0x50c000 size 0xa00000 offset 0 delalloc 1 unwritten 0 i.e. there mapping lookup is no longer occurring for every page. As a side effect, the failure case I'm seeing with test 180 has gone from 5-10 files with the wrong size to >200 files with the wrong size with this patch, so clearly there is something wrong with file size updates getting to disk that this patch set makes worse. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx xfs: io type needs to be part of the writepage context From: Dave Chinner <dchinner@xxxxxxxxxx> If we don't pass the IO type we are mapping with the writeage context, then the imap is recalculated on every delalloc page that is passed to _xfs_vm_writepage(). This defeats the purpose of having a cached imap between calls and increases the overhead of delalloc writeback significantly. Fix this by moving the io type into the writepage context structure so that it moves with the cached imap through the stack. Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> --- fs/xfs/linux-2.6/xfs_aops.c | 30 ++++++++++++++++++------------ 1 files changed, 18 insertions(+), 12 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c index 73dac4b..25b63cd 100644 --- a/fs/xfs/linux-2.6/xfs_aops.c +++ b/fs/xfs/linux-2.6/xfs_aops.c @@ -40,6 +40,7 @@ struct xfs_writeback_ctx { unsigned int imap_valid; + unsigned int io_type; struct xfs_bmbt_irec imap; struct xfs_ioend *iohead; struct xfs_ioend *ioend; @@ -804,7 +805,6 @@ __xfs_vm_writepage( bh = head = page_buffers(page); offset = page_offset(page); - type = IO_OVERWRITE; do { int new_ioend = 0; @@ -826,18 +826,18 @@ __xfs_vm_writepage( } if (buffer_unwritten(bh)) { - if (type != IO_UNWRITTEN) { - type = IO_UNWRITTEN; + if (ctx->io_type != IO_UNWRITTEN) { + ctx->io_type = IO_UNWRITTEN; ctx->imap_valid = 0; } } else if (buffer_delay(bh)) { - if (type != IO_DELALLOC) { - type = IO_DELALLOC; + if (ctx->io_type != IO_DELALLOC) { + ctx->io_type = IO_DELALLOC; ctx->imap_valid = 0; } } else if (buffer_uptodate(bh)) { - if (type != IO_OVERWRITE) { - type = IO_OVERWRITE; + if (ctx->io_type != IO_OVERWRITE) { + ctx->io_type = IO_OVERWRITE; ctx->imap_valid = 0; } } else { @@ -862,7 +862,8 @@ __xfs_vm_writepage( * time. */ new_ioend = 1; - err = xfs_map_blocks(inode, offset, &ctx->imap, type); + err = xfs_map_blocks(inode, offset, &ctx->imap, + ctx->io_type); if (err) goto error; ctx->imap_valid = @@ -870,11 +871,12 @@ __xfs_vm_writepage( } if (ctx->imap_valid) { lock_buffer(bh); - if (type != IO_OVERWRITE) { + if (ctx->io_type != IO_OVERWRITE) { xfs_map_at_offset(inode, bh, &ctx->imap, offset); } - xfs_add_to_ioend(ctx, inode, bh, offset, type, new_ioend); + xfs_add_to_ioend(ctx, inode, bh, offset, ctx->io_type, + new_ioend); count++; } } while (offset += len, ((bh = bh->b_this_page) != head)); @@ -902,7 +904,9 @@ xfs_vm_writepage( struct page *page, struct writeback_control *wbc) { - struct xfs_writeback_ctx ctx = { }; + struct xfs_writeback_ctx ctx = { + .io_type = IO_OVERWRITE, + }; int ret; /* @@ -939,7 +943,9 @@ xfs_vm_writepages( struct address_space *mapping, struct writeback_control *wbc) { - struct xfs_writeback_ctx ctx = { }; + struct xfs_writeback_ctx ctx = { + .io_type = IO_OVERWRITE, + }; int ret; xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED); _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs