Re: MD write performance issue - found Catalyst patches


 



Great!
So the dirty hack bumped to x16 really does work! (While we wait for Jens, that is; as the patch says: "To be reviewed again after Jens' writeback changes.") Thanks for trying up to x32. Still, RAID-6 XFS write speed is not yet back to the old level... maybe the old code was better at filling RAID stripes exactly, who knows.

Mark, yes, personally I would be very interested in seeing how 2.6.31 performs on your hardware, so that I can see, e.g., exactly how much my 3ware 9650 controllers suck... (so please also try vanilla 2.6.31, which I think already includes the x4 hack; don't just try it with x16).

We might also be interested in 2.6.32 performance if you have time, not least because 2.6.32 includes the fixes for the CPU lockups in big arrays during resyncs that were reported on this list, which is a good incentive to upgrade. (Neil, by the way, is there any chance those lockup fixes get backported to the 2.6.31.x stable series?)
Thank you!
Asdo


mark delfman wrote:
Hi Gents,

Attached are the results of some testing with the XFS patch... as we can
see, it does make a reasonable difference! Trying values of 4, 16 and 32
for the multiplier shows that 16 is a good level...

Is this a 'safe' patch at 16?

I think there may still be some performance to be gained,
especially in the R6 configs, which is where most people would be
interested, I suspect... but it's a great start!


I think I should maybe jump up to .31 and see how it reacts.....

Neil, I applied your writepage patch and have the output if it is of
interest...

Thank you for the help with the patching and Linux!!!!


mark



On Wed, Nov 4, 2009 at 5:25 PM, Asdo <asdo@xxxxxxxxxxxxx> wrote:
Hey, great job Neil and Mark.
Mark, your benchmarks seem to confirm Neil's analysis: ext2 and ext3 are
not slowed down between 2.6.28.5 and 2.6.28.6.
Mark, why don't you try applying the patch below by Eric Sandeen, which
Neil found, to 2.6.28.6 to see if the XFS write performance comes back?
Thank you for your efforts
Asdo

mark delfman wrote:
Some FS comparisons are attached as a PDF.

Not sure what to make of them yet, but worth posting.


On Tue, Nov 3, 2009 at 12:11 PM, mark delfman
<markdelfman@xxxxxxxxxxxxxx> wrote:

Thanks Neil,

I seem to recall that I tried this on EXT3 and saw the same results as
XFS, but with your code and suggestions I think it is well worth me
trying some more tests and reporting back....


Mark

On Tue, Nov 3, 2009 at 4:58 AM, Neil Brown <neilb@xxxxxxx> wrote:

On Saturday October 31, markdelfman@xxxxxxxxxxxxxx wrote:

I am hopeful that you or another member of this group could offer some
advice / a patch to implement the print options you suggested... if so, I
would happily allocate resources and time to do what I can to help
with this.

I've spent a little while exploring this.
It appears to very definitely be an XFS problem, interacting in
interesting ways with the VM.

I built a 4-drive raid6 and did some simple testing on 2.6.28.5 and
2.6.28.6 using each of xfs and ext2.

ext2 gives write throughput of 65MB/sec on .5 and 66MB/sec on .6
xfs gives 86MB/sec on .5 and only 51MB/sec on .6


When write_cache_pages is called it calls 'writepage' some number of
times.  On ext2, writepage will write at most one page.
On xfs writepage will sometimes write multiple pages.

I created a patch as below that prints (in a fairly cryptic way)
the number of 'writepage' calls and the number of pages that XFS
actually wrote.

For ext2, the number of writepage calls is at most 1536 and averages
around 140

For xfs with .5, there is usually only one call to writepage and it
writes around 800 pages.
For .6 there are about 200 calls to writepage, but together they
achieve an average of about 700 pages.

So as you can see, there is very different behaviour.

I notice a more recent patch in XFS in mainline which looks like a
dirty hack to try to address this problem.

I suggest you try that patch and/or take this to the XFS developers.

NeilBrown



diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 08d2b96..aa4bccc 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -875,6 +875,8 @@ int write_cache_pages(struct address_space *mapping,
      int cycled;
      int range_whole = 0;
      long nr_to_write = wbc->nr_to_write;
+       long hidden_writes = 0;
+       long clear_writes = 0;

      if (wbc->nonblocking && bdi_write_congested(bdi)) {
              wbc->encountered_congestion = 1;
@@ -961,7 +963,11 @@ continue_unlock:
                      if (!clear_page_dirty_for_io(page))
                              goto continue_unlock;

+                       { int orig_nr_to_write = wbc->nr_to_write;
                      ret = (*writepage)(page, wbc, data);
+                       hidden_writes += orig_nr_to_write - wbc->nr_to_write;
+                       clear_writes ++;
+                       }
                      if (unlikely(ret)) {
                              if (ret == AOP_WRITEPAGE_ACTIVATE) {
                                      unlock_page(page);
@@ -1008,12 +1014,37 @@ continue_unlock:
              end = writeback_index - 1;
              goto retry;
      }
+
      if (!wbc->no_nrwrite_index_update) {
              if (wbc->range_cyclic || (range_whole && nr_to_write > 0))
                      mapping->writeback_index = done_index;
              wbc->nr_to_write = nr_to_write;
      }

+       { static int sum, cnt, max;
+       static unsigned long previous;
+       static int sum2, max2;
+
+       sum += clear_writes;
+       cnt += 1;
+
+       if (max < clear_writes) max = clear_writes;
+
+       sum2 += hidden_writes;
+       if (max2 < hidden_writes) max2 = hidden_writes;
+
+       if (cnt > 100 && time_after(jiffies, previous + 10*HZ)) {
+               printk("write_page_cache: sum=%d cnt=%d max=%d mean=%d sum2=%d max2=%d mean2=%d\n",
+                      sum, cnt, max, sum/cnt,
+                      sum2, max2, sum2/cnt);
+               sum = 0;
+               cnt = 0;
+               max = 0;
+               max2 = 0;
+               sum2 = 0;
+               previous = jiffies;
+       }
+       }
      return ret;
 }
 EXPORT_SYMBOL(write_cache_pages);


------------------------------------------------------
From c8a4051c3731b6db224482218cfd535ab9393ff8 Mon Sep 17 00:00:00 2001
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Fri, 31 Jul 2009 00:02:17 -0500
Subject: [PATCH] xfs: bump up nr_to_write in xfs_vm_writepage

VM calculation for nr_to_write seems off.  Bump it way
up, this gets simple streaming writes zippy again.
To be reviewed again after Jens' writeback changes.

Signed-off-by: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxxx>
Cc: Chris Mason <chris.mason@xxxxxxxxxx>
Reviewed-by: Felix Blyakher <felixb@xxxxxxx>
Signed-off-by: Felix Blyakher <felixb@xxxxxxx>
---
 fs/xfs/linux-2.6/xfs_aops.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index 7ec89fc..aecf251 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -1268,6 +1268,14 @@ xfs_vm_writepage(
      if (!page_has_buffers(page))
              create_empty_buffers(page, 1 << inode->i_blkbits, 0);

+
+       /*
+        *  VM calculation for nr_to_write seems off.  Bump it way
+        *  up, this gets simple streaming writes zippy again.
+        *  To be reviewed again after Jens' writeback changes.
+        */
+       wbc->nr_to_write *= 4;
+
      /*
       * Convert delayed allocate, unwritten or unmapped space
       * to real space and flush out to disk.
--
1.6.4.3




--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


