On Saturday October 31, markdelfman@xxxxxxxxxxxxxx wrote:
> I am hopeful that you or another member of this group could offer some
> advice / patch to implement the print options you suggested... if so i
> would happily allocate resource and time to do what i can to help
> with this.

I've spent a little while exploring this.

It very definitely appears to be an XFS problem, interacting in
interesting ways with the VM.

I built a 4-drive raid6 and did some simple testing on 2.6.28.5 and
2.6.28.6 using each of xfs and ext2.

  ext2 gives write throughput of 65MB/sec on .5 and 66MB/sec on .6
  xfs gives 86MB/sec on .5 and only 51MB/sec on .6

When write_cache_pages is called it calls 'writepage' some number of
times. On ext2, writepage will write at most one page.
On xfs writepage will sometimes write multiple pages.
I created a patch as below that prints (in a fairly cryptic way)
the number of 'writepage' calls and the number of pages that XFS
actually wrote.

For ext2, the number of writepage calls is at most 1536 and averages
around 140.

For xfs with .5, there is usually only one call to writepage and it
writes around 800 pages.
For .6 there are about 200 calls to writepage but they achieve
an average of about 700 pages together.

So as you can see, there is very different behaviour.
I notice a more recent patch in XFS in mainline which looks like a
dirty hack to try to address this problem.
I suggest you try that patch and/or take this to the XFS developers.
NeilBrown
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 08d2b96..aa4bccc 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -875,6 +875,8 @@ int write_cache_pages(struct address_space *mapping,
 	int cycled;
 	int range_whole = 0;
 	long nr_to_write = wbc->nr_to_write;
+	long hidden_writes = 0;
+	long clear_writes = 0;
 
 	if (wbc->nonblocking && bdi_write_congested(bdi)) {
 		wbc->encountered_congestion = 1;
@@ -961,7 +963,11 @@ continue_unlock:
 			if (!clear_page_dirty_for_io(page))
 				goto continue_unlock;
 
+			{ int orig_nr_to_write = wbc->nr_to_write;
 			ret = (*writepage)(page, wbc, data);
+			hidden_writes += orig_nr_to_write - wbc->nr_to_write;
+			clear_writes ++;
+			}
 			if (unlikely(ret)) {
 				if (ret == AOP_WRITEPAGE_ACTIVATE) {
 					unlock_page(page);
@@ -1008,12 +1014,37 @@ continue_unlock:
 		end = writeback_index - 1;
 		goto retry;
 	}
+
 	if (!wbc->no_nrwrite_index_update) {
 		if (wbc->range_cyclic || (range_whole && nr_to_write > 0))
 			mapping->writeback_index = done_index;
 		wbc->nr_to_write = nr_to_write;
 	}
+	{ static int sum, cnt, max;
+		static unsigned long previous;
+		static int sum2, max2;
+
+		sum += clear_writes;
+		cnt += 1;
+
+		if (max < clear_writes) max = clear_writes;
+
+		sum2 += hidden_writes;
+		if (max2 < hidden_writes) max2 = hidden_writes;
+
+		if (cnt > 100 && time_after(jiffies, previous + 10*HZ)) {
+			printk("write_page_cache: sum=%d cnt=%d max=%d mean=%d sum2=%d max2=%d mean2=%d\n",
+			       sum, cnt, max, sum/cnt,
+			       sum2, max2, sum2/cnt);
+			sum = 0;
+			cnt = 0;
+			max = 0;
+			max2 = 0;
+			sum2 = 0;
+			previous = jiffies;
+		}
+	}
 
 	return ret;
 }
 EXPORT_SYMBOL(write_cache_pages);
------------------------------------------------------
From c8a4051c3731b6db224482218cfd535ab9393ff8 Mon Sep 17 00:00:00 2001
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Fri, 31 Jul 2009 00:02:17 -0500
Subject: [PATCH] xfs: bump up nr_to_write in xfs_vm_writepage

VM calculation for nr_to_write seems off. Bump it way
up, this gets simple streaming writes zippy again.
To be reviewed again after Jens' writeback changes.

Signed-off-by: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxxx>
Cc: Chris Mason <chris.mason@xxxxxxxxxx>
Reviewed-by: Felix Blyakher <felixb@xxxxxxx>
Signed-off-by: Felix Blyakher <felixb@xxxxxxx>
---
fs/xfs/linux-2.6/xfs_aops.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index 7ec89fc..aecf251 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -1268,6 +1268,14 @@ xfs_vm_writepage(
 	if (!page_has_buffers(page))
 		create_empty_buffers(page, 1 << inode->i_blkbits, 0);
 
+
+	/*
+	 * VM calculation for nr_to_write seems off. Bump it way
+	 * up, this gets simple streaming writes zippy again.
+	 * To be reviewed again after Jens' writeback changes.
+	 */
+	wbc->nr_to_write *= 4;
+
 	/*
 	 * Convert delayed allocate, unwritten or unmapped space
 	 * to real space and flush out to disk.
--
1.6.4.3