Re: MD write performance issue - found Catalyst patches

Thanks Neil,

I seem to recall that I tried this on EXT3 and saw the same results as
with XFS, but with your code and suggestions I think it is well worth
running some more tests and reporting back.


Mark

On Tue, Nov 3, 2009 at 4:58 AM, Neil Brown <neilb@xxxxxxx> wrote:
> On Saturday October 31, markdelfman@xxxxxxxxxxxxxx wrote:
>>
>> I am hopeful that you or another member of this group could offer some
>> advice / patch to implement the print options you suggested... if so I
>> would happily allocate resources and time to do what I can to help
>> with this.
>
>
> I've spent a little while exploring this.
> It appears to very definitely be an XFS problem, interacting in
> interesting ways with the VM.
>
> I built a 4-drive raid6 and did some simple testing on 2.6.28.5 and
> 2.6.28.6 using each of xfs and ext2.
>
> ext2 gives write throughput of 65MB/sec on 2.6.28.5 and 66MB/sec on 2.6.28.6.
> xfs gives 86MB/sec on 2.6.28.5 and only 51MB/sec on 2.6.28.6.
>
>
> When write_cache_pages is called it calls 'writepage' some number of
> times.  On ext2, writepage will write at most one page.
> On xfs writepage will sometimes write multiple pages.
>
> I created a patch as below that prints (in a fairly cryptic way)
> the number of 'writepage' calls and the number of pages that XFS
> actually wrote.
>
> For ext2, the number of writepage calls is at most 1536 and averages
> around 140.
>
> For xfs with .5, there is usually only one call to writepage and it
> writes around 800 pages.
> For .6 there are about 200 calls to writepage, but together they
> achieve an average of about 700 pages.
>
> So as you can see, there is very different behaviour.
>
> I notice a more recent patch in XFS in mainline which looks like a
> dirty hack to try to address this problem.
>
> I suggest you try that patch and/or take this to the XFS developers.
>
> NeilBrown
>
>
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 08d2b96..aa4bccc 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -875,6 +875,8 @@ int write_cache_pages(struct address_space *mapping,
>        int cycled;
>        int range_whole = 0;
>        long nr_to_write = wbc->nr_to_write;
> +       long hidden_writes = 0;
> +       long clear_writes = 0;
>
>        if (wbc->nonblocking && bdi_write_congested(bdi)) {
>                wbc->encountered_congestion = 1;
> @@ -961,7 +963,11 @@ continue_unlock:
>                        if (!clear_page_dirty_for_io(page))
>                                goto continue_unlock;
>
> +                       { int orig_nr_to_write = wbc->nr_to_write;
>                        ret = (*writepage)(page, wbc, data);
> +                       hidden_writes += orig_nr_to_write - wbc->nr_to_write;
> +                       clear_writes ++;
> +                       }
>                        if (unlikely(ret)) {
>                                if (ret == AOP_WRITEPAGE_ACTIVATE) {
>                                        unlock_page(page);
> @@ -1008,12 +1014,37 @@ continue_unlock:
>                end = writeback_index - 1;
>                goto retry;
>        }
> +
>        if (!wbc->no_nrwrite_index_update) {
>                if (wbc->range_cyclic || (range_whole && nr_to_write > 0))
>                        mapping->writeback_index = done_index;
>                wbc->nr_to_write = nr_to_write;
>        }
>
> +       { static int sum, cnt, max;
> +       static unsigned long previous;
> +       static int sum2, max2;
> +
> +       sum += clear_writes;
> +       cnt += 1;
> +
> +       if (max < clear_writes) max = clear_writes;
> +
> +       sum2 += hidden_writes;
> +       if (max2 < hidden_writes) max2 = hidden_writes;
> +
> +       if (cnt > 100 && time_after(jiffies, previous + 10*HZ)) {
> +               printk("write_page_cache: sum=%d cnt=%d max=%d mean=%d sum2=%d max2=%d mean2=%d\n",
> +                      sum, cnt, max, sum/cnt,
> +                      sum2, max2, sum2/cnt);
> +               sum = 0;
> +               cnt = 0;
> +               max = 0;
> +               max2 = 0;
> +               sum2 = 0;
> +               previous = jiffies;
> +       }
> +       }
>        return ret;
>  }
>  EXPORT_SYMBOL(write_cache_pages);
>
>
> ------------------------------------------------------
> From c8a4051c3731b6db224482218cfd535ab9393ff8 Mon Sep 17 00:00:00 2001
> From: Eric Sandeen <sandeen@xxxxxxxxxxx>
> Date: Fri, 31 Jul 2009 00:02:17 -0500
> Subject: [PATCH] xfs: bump up nr_to_write in xfs_vm_writepage
>
> VM calculation for nr_to_write seems off.  Bump it way
> up, this gets simple streaming writes zippy again.
> To be reviewed again after Jens' writeback changes.
>
> Signed-off-by: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxxx>
> Cc: Chris Mason <chris.mason@xxxxxxxxxx>
> Reviewed-by: Felix Blyakher <felixb@xxxxxxx>
> Signed-off-by: Felix Blyakher <felixb@xxxxxxx>
> ---
>  fs/xfs/linux-2.6/xfs_aops.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
>
> diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
> index 7ec89fc..aecf251 100644
> --- a/fs/xfs/linux-2.6/xfs_aops.c
> +++ b/fs/xfs/linux-2.6/xfs_aops.c
> @@ -1268,6 +1268,14 @@ xfs_vm_writepage(
>        if (!page_has_buffers(page))
>                create_empty_buffers(page, 1 << inode->i_blkbits, 0);
>
> +
> +       /*
> +        *  VM calculation for nr_to_write seems off.  Bump it way
> +        *  up, this gets simple streaming writes zippy again.
> +        *  To be reviewed again after Jens' writeback changes.
> +        */
> +       wbc->nr_to_write *= 4;
> +
>        /*
>         * Convert delayed allocate, unwritten or unmapped space
>         * to real space and flush out to disk.
> --
> 1.6.4.3
>
>
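
For anyone following the instrumentation patch quoted above, here is a
minimal userspace sketch of the same accounting idea: count how often
the writepage callback is invoked (clear_writes) and how far the
callback itself lowers nr_to_write (hidden_writes). Only those two
counter names and nr_to_write come from the patch. struct wbc_sketch,
the two callbacks, run_pass() and the cluster size of 3 extra pages are
hypothetical, chosen purely for illustration; this is not kernel or XFS
code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for struct writeback_control: budget field only. */
struct wbc_sketch {
    long nr_to_write;
};

/* Counters named after the ones in the quoted patch. */
struct pass_stats {
    long clear_writes;   /* number of writepage calls */
    long hidden_writes;  /* pages the callback charged to the budget itself */
};

typedef int (*writepage_fn)(struct wbc_sketch *wbc);

/* Assumed "one page per call" callback: leaves the budget alone. */
static int writepage_single(struct wbc_sketch *wbc)
{
    (void)wbc;
    return 0;
}

/* Assumed clustering callback: writes up to 3 extra pages per call
 * and subtracts them from the budget (the 3 is an arbitrary choice). */
static int writepage_cluster(struct wbc_sketch *wbc)
{
    long extra = 3;

    if (extra > wbc->nr_to_write)
        extra = wbc->nr_to_write;
    wbc->nr_to_write -= extra;
    return 0;
}

/* Simplified writeback pass: hand pages to the callback until the
 * dirty pages or the budget run out, recording both counters.
 * The sketch does not stop a callback from "writing" more pages
 * than remain dirty; pick inputs that divide evenly. */
static struct pass_stats run_pass(writepage_fn writepage,
                                  long dirty_pages, long budget)
{
    struct wbc_sketch wbc = { .nr_to_write = budget };
    struct pass_stats st = { 0, 0 };

    assert(budget > 0);
    while (dirty_pages > 0 && wbc.nr_to_write > 0) {
        long before = wbc.nr_to_write;
        long hidden;

        writepage(&wbc);
        hidden = before - wbc.nr_to_write;

        st.hidden_writes += hidden;
        st.clear_writes++;
        dirty_pages -= 1 + hidden;  /* the page handed over plus the extras */
        wbc.nr_to_write--;          /* caller charges for the page it handed over */
    }
    return st;
}

/* Print the counters in roughly the style of the patch's printk. */
static void report(const char *name, struct pass_stats st)
{
    printf("%s: calls=%ld hidden=%ld\n",
           name, st.clear_writes, st.hidden_writes);
}
```

Calling run_pass(writepage_single, 1000, 1024) and
run_pass(writepage_cluster, 1000, 1024) and passing the results to
report() shows the contrast: the first needs one call per page, the
second far fewer calls with most pages written "behind the back" of the
caller. If I read Neil's figures right, the same ratio for xfs on .6 is
roughly 700 pages over 200 calls, i.e. about 3.5 pages per call, against
about 800 pages in a single call on .5.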