Attached are later kernel results. Not an awful lot of difference (apart from
the native 28.6 run, because 28.6 doesn't have the patch included).
2.6.32-rc6 is certainly up to 10% faster on R6.

Note we are running around 10 tests on each; this is a low number for
averages and the results move around by 100MB or more, but in this case we
did not need to be overly accurate. They show maybe a 20% reduction writing
to the FS as opposed to direct to MD, whilst reads are now 20% 'faster' on
XFS than to the raw device (reaching 2GB/s)... 2.6.3x seems better at read
caching on XFS.

I have only graphed the writes...

Mark

On Thu, Nov 5, 2009 at 7:09 PM, Asdo <asdo@xxxxxxxxxxxxx> wrote:
> Great!
> So the dirty hack pumped up to x16 really does work! (while we wait for
> Jens, as written in the patch: "To be reviewed again after Jens'
> writeback changes.") Thanks for having tried up to x32.
> Still, RAID-6 XFS write is not yet up to the old speed... maybe the old
> code was better at filling RAID stripes exactly, who knows.
> Mark, yep, personally I would be very interested in seeing how 2.6.31
> performs on your hardware, so I can e.g. see exactly how much my 3ware
> 9650 controllers suck... (so please also try vanilla 2.6.31, which I
> think has an integrated x4 hack; do not just try with x16, please)
> We might also be interested in 2.6.32 performance if you have time,
> also because 2.6.32 includes the fixes for the CPU lockups in big
> arrays during resyncs which were reported on this list, and this is a
> good incentive for upgrading (Neil, btw, is there any chance those
> lockup fixes get backported to the 2.6.31.x stable series?).
> Thank you!
> Asdo
>
>
> mark delfman wrote:
>>
>> Hi Gents,
>>
>> Attached is the result of some testing with the XFS patch... as we can
>> see, it does make a reasonable difference! Changing the value between
>> 4, 16 and 32 shows 16 is a good level...
>>
>> Is this a 'safe' patch at 16?
>>
>> I think that maybe there is still some performance to be gained,
>> especially in the R6 configs, which is where most would be interested
>> I suspect... but it's a great start!
>>
>>
>> I think that I should jump up to maybe .31 and see how this reacts...
>>
>> Neil, I applied your writepage patch and have outputs if these are of
>> interest...
>>
>> Thank you for the help with the patching and Linux!!!!
>>
>>
>> mark
>>
>>
>>
>> On Wed, Nov 4, 2009 at 5:25 PM, Asdo <asdo@xxxxxxxxxxxxx> wrote:
>>
>>>
>>> Hey, great job Neil and Mark.
>>> Mark, your benchmarks seem to confirm Neil's analysis: ext2 and ext3
>>> are not slowed down from 2.6.28.5 to 2.6.28.6.
>>> Mark, why don't you try applying the patch below by Eric Sandeen,
>>> found by Neil, to 2.6.28.6 to see if the XFS write performance comes
>>> back?
>>> Thank you for your efforts
>>> Asdo
>>>
>>> mark delfman wrote:
>>>
>>>>
>>>> Some FS comparisons attached as a PDF.
>>>>
>>>> Not sure what to make of them as yet, but worth posting.
>>>>
>>>>
>>>> On Tue, Nov 3, 2009 at 12:11 PM, mark delfman
>>>> <markdelfman@xxxxxxxxxxxxxx> wrote:
>>>>
>>>>
>>>>>
>>>>> Thanks Neil,
>>>>>
>>>>> I seem to recall that I tried this on ext3 and saw the same results
>>>>> as XFS, but with your code and suggestions I think it is well worth
>>>>> me trying some more tests and reporting back...
>>>>>
>>>>>
>>>>> Mark
>>>>>
>>>>> On Tue, Nov 3, 2009 at 4:58 AM, Neil Brown <neilb@xxxxxxx> wrote:
>>>>>
>>>>>>
>>>>>> On Saturday October 31, markdelfman@xxxxxxxxxxxxxx wrote:
>>>>>>
>>>>>>>
>>>>>>> I am hopeful that you or another member of this group could offer
>>>>>>> some advice / a patch to implement the print options you
>>>>>>> suggested... if so, I would happily allocate resources and time
>>>>>>> to do what I can to help with this.
>>>>>>>
>>>>>>
>>>>>> I've spent a little while exploring this.
>>>>>> It appears very definitely to be an XFS problem, interacting in
>>>>>> interesting ways with the VM.
>>>>>>
>>>>>> I built a 4-drive raid6 and did some simple testing on 2.6.28.5 and
>>>>>> 2.6.28.6 using each of XFS and ext2.
>>>>>>
>>>>>> ext2 gives write throughput of 65MB/sec on .5 and 66MB/sec on .6;
>>>>>> XFS gives 86MB/sec on .5 and only 51MB/sec on .6.
>>>>>>
>>>>>>
>>>>>> When write_cache_pages is called, it calls 'writepage' some number
>>>>>> of times. On ext2, writepage will write at most one page.
>>>>>> On XFS, writepage will sometimes write multiple pages.
>>>>>>
>>>>>> I created a patch, included below, that prints (in a fairly cryptic
>>>>>> way) the number of 'writepage' calls and the number of pages that
>>>>>> XFS actually wrote.
>>>>>>
>>>>>> For ext2, the number of writepage calls is at most 1536 and
>>>>>> averages around 140.
>>>>>>
>>>>>> For XFS with .5, there is usually only one call to writepage and it
>>>>>> writes around 800 pages.
>>>>>> For .6 there are about 200 calls to writepage, but together they
>>>>>> achieve an average of only about 700 pages.
>>>>>>
>>>>>> So as you can see, the behaviour is very different.
>>>>>>
>>>>>> I notice a more recent patch to XFS in mainline which looks like a
>>>>>> dirty hack to try to address this problem.
>>>>>>
>>>>>> I suggest you try that patch and/or take this to the XFS developers.
>>>>>>
>>>>>> NeilBrown
>>>>>>
>>>>>>
>>>>>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>>>>>> index 08d2b96..aa4bccc 100644
>>>>>> --- a/mm/page-writeback.c
>>>>>> +++ b/mm/page-writeback.c
>>>>>> @@ -875,6 +875,8 @@ int write_cache_pages(struct address_space *mapping,
>>>>>>  	int cycled;
>>>>>>  	int range_whole = 0;
>>>>>>  	long nr_to_write = wbc->nr_to_write;
>>>>>> +	long hidden_writes = 0;
>>>>>> +	long clear_writes = 0;
>>>>>>
>>>>>>  	if (wbc->nonblocking && bdi_write_congested(bdi)) {
>>>>>>  		wbc->encountered_congestion = 1;
>>>>>> @@ -961,7 +963,11 @@ continue_unlock:
>>>>>>  			if (!clear_page_dirty_for_io(page))
>>>>>>  				goto continue_unlock;
>>>>>>
>>>>>> +			{ int orig_nr_to_write = wbc->nr_to_write;
>>>>>>  			ret = (*writepage)(page, wbc, data);
>>>>>> +			hidden_writes += orig_nr_to_write - wbc->nr_to_write;
>>>>>> +			clear_writes++;
>>>>>> +			}
>>>>>>  			if (unlikely(ret)) {
>>>>>>  				if (ret == AOP_WRITEPAGE_ACTIVATE) {
>>>>>>  					unlock_page(page);
>>>>>> @@ -1008,12 +1014,37 @@ continue_unlock:
>>>>>>  		end = writeback_index - 1;
>>>>>>  		goto retry;
>>>>>>  	}
>>>>>> +
>>>>>>  	if (!wbc->no_nrwrite_index_update) {
>>>>>>  		if (wbc->range_cyclic || (range_whole && nr_to_write > 0))
>>>>>>  			mapping->writeback_index = done_index;
>>>>>>  		wbc->nr_to_write = nr_to_write;
>>>>>>  	}
>>>>>>
>>>>>> +	{ static int sum, cnt, max;
>>>>>> +	static unsigned long previous;
>>>>>> +	static int sum2, max2;
>>>>>> +
>>>>>> +	sum += clear_writes;
>>>>>> +	cnt += 1;
>>>>>> +
>>>>>> +	if (max < clear_writes) max = clear_writes;
>>>>>> +
>>>>>> +	sum2 += hidden_writes;
>>>>>> +	if (max2 < hidden_writes) max2 = hidden_writes;
>>>>>> +
>>>>>> +	if (cnt > 100 && time_after(jiffies, previous + 10*HZ)) {
>>>>>> +		printk("write_page_cache: sum=%d cnt=%d max=%d mean=%d sum2=%d max2=%d mean2=%d\n",
>>>>>> +		       sum, cnt, max, sum/cnt,
>>>>>> +		       sum2, max2, sum2/cnt);
>>>>>> +		sum = 0;
>>>>>> +		cnt = 0;
>>>>>> +		max = 0;
>>>>>> +		max2 = 0;
>>>>>> +		sum2 = 0;
>>>>>> +		previous = jiffies;
>>>>>> +	}
>>>>>> +	}
>>>>>>  	return ret;
>>>>>>  }
>>>>>>  EXPORT_SYMBOL(write_cache_pages);
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------
>>>>>> From c8a4051c3731b6db224482218cfd535ab9393ff8 Mon Sep 17 00:00:00 2001
>>>>>> From: Eric Sandeen <sandeen@xxxxxxxxxxx>
>>>>>> Date: Fri, 31 Jul 2009 00:02:17 -0500
>>>>>> Subject: [PATCH] xfs: bump up nr_to_write in xfs_vm_writepage
>>>>>>
>>>>>> VM calculation for nr_to_write seems off.  Bump it way
>>>>>> up, this gets simple streaming writes zippy again.
>>>>>> To be reviewed again after Jens' writeback changes.
>>>>>>
>>>>>> Signed-off-by: Christoph Hellwig <hch@xxxxxxxxxxxxx>
>>>>>> Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxxx>
>>>>>> Cc: Chris Mason <chris.mason@xxxxxxxxxx>
>>>>>> Reviewed-by: Felix Blyakher <felixb@xxxxxxx>
>>>>>> Signed-off-by: Felix Blyakher <felixb@xxxxxxx>
>>>>>> ---
>>>>>>  fs/xfs/linux-2.6/xfs_aops.c |    8 ++++++++
>>>>>>  1 files changed, 8 insertions(+), 0 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
>>>>>> index 7ec89fc..aecf251 100644
>>>>>> --- a/fs/xfs/linux-2.6/xfs_aops.c
>>>>>> +++ b/fs/xfs/linux-2.6/xfs_aops.c
>>>>>> @@ -1268,6 +1268,14 @@ xfs_vm_writepage(
>>>>>>  	if (!page_has_buffers(page))
>>>>>>  		create_empty_buffers(page, 1 << inode->i_blkbits, 0);
>>>>>>
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * VM calculation for nr_to_write seems off.  Bump it way
>>>>>> +	 * up, this gets simple streaming writes zippy again.
>>>>>> +	 * To be reviewed again after Jens' writeback changes.
>>>>>> +	 */
>>>>>> +	wbc->nr_to_write *= 4;
>>>>>> +
>>>>>>  	/*
>>>>>>  	 * Convert delayed allocate, unwritten or unmapped space
>>>>>>  	 * to real space and flush out to disk.
>>>>>> --
>>>>>> 1.6.4.3
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>
>>>
>
>
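For reference, a minimal sketch of the x16 variant being discussed (the
"dirty hack pumped at x16"): it is simply Eric Sandeen's hunk above with the
multiplier raised from 4 to 16, the level Mark reports working well in his
runs. This is an illustration only, against the 2.6.28-era
fs/xfs/linux-2.6/xfs_aops.c shown in the patch, not a reviewed change:

	/*
	 * VM calculation for nr_to_write seems off.  Bump it way
	 * up, this gets simple streaming writes zippy again.
	 * To be reviewed again after Jens' writeback changes.
	 */
	wbc->nr_to_write *= 16;	/* x4 in the patch above; x16 per Mark's testing */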
Attachment: XFSvMD_2.pdf (Adobe PDF document)