Re: [patch 0/3 v3] MD: improve raid1/10 write performance for fast storage

On Fri, Jun 29, 2012 at 02:10:30PM +0800, Shaohua Li wrote:
> 2012/6/28 NeilBrown <neilb@xxxxxxx>:
> > On Wed, 13 Jun 2012 17:11:43 +0800 Shaohua Li <shli@xxxxxxxxxx> wrote:
> >
> >> In raid1/10, all write requests are dispatched from a single thread. With
> >> fast storage, that thread becomes a bottleneck, because it dispatches
> >> requests too slowly. The thread also migrates freely, so the request
> >> completion cpu doesn't match the submission cpu even when the driver/block
> >> layer supports that. This causes bad cache behaviour. Neither issue is a
> >> big deal for slow storage.
> >>
> >> Switching the dispatching to a percpu/perthread basis dramatically
> >> increases performance. The more raid disks there are, the bigger the
> >> boost. In a 4-disk raid10 setup, this can double the throughput.
> >>
> >> Percpu/perthread dispatch doesn't harm slow storage. It is the same way a
> >> raw device is accessed, and the block plug is set correctly, which helps
> >> with request merging and reduces lock contention.
> >>
> >> V2->V3:
> >> rebase to latest tree and fix cpuhotplug issue
> >>
> >> V1->V2:
> >> 1. Dropped the direct dispatch patches. They gave a bigger performance
> >> improvement, but are hopeless to get correct.
> >> 2. Added an MD specific workqueue to do the percpu dispatch.
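
For illustration, here is a minimal sketch of what percpu dispatch through a
dedicated workqueue could look like. It is not Shaohua's actual patch; the
names (md_dispatch_wq, md_percpu_queue_bio, struct percpu_dispatch) and the
locking details are assumptions made only to show the shape of the idea:

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/percpu.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

/* Illustrative only: one pending bio list and one work item per cpu. */
struct percpu_dispatch {
	spinlock_t		lock;
	struct bio_list		pending;
	struct work_struct	work;
};

static struct workqueue_struct *md_dispatch_wq;		/* assumed name */
static struct percpu_dispatch __percpu *md_dispatch;	/* assumed name */

/* Runs on the cpu the bios were queued from and submits them under a
 * block plug, so the block layer can still merge adjacent requests. */
static void md_dispatch_work(struct work_struct *work)
{
	struct percpu_dispatch *d =
		container_of(work, struct percpu_dispatch, work);
	struct bio_list local;
	struct blk_plug plug;
	struct bio *bio;

	bio_list_init(&local);
	spin_lock_irq(&d->lock);
	bio_list_merge(&local, &d->pending);
	bio_list_init(&d->pending);
	spin_unlock_irq(&d->lock);

	blk_start_plug(&plug);
	while ((bio = bio_list_pop(&local)) != NULL)
		generic_make_request(bio);
	blk_finish_plug(&plug);
}

/* Queue a write on the submitting cpu instead of handing it to the
 * single md thread. */
static void md_percpu_queue_bio(struct bio *bio)
{
	struct percpu_dispatch *d = per_cpu_ptr(md_dispatch, get_cpu());

	spin_lock_irq(&d->lock);
	bio_list_add(&d->pending, bio);
	spin_unlock_irq(&d->lock);
	queue_work_on(smp_processor_id(), md_dispatch_wq, &d->work);
	put_cpu();
}

The blk_start_plug()/blk_finish_plug() pair in the work function is the block
plug mentioned in the cover letter: requests submitted from the per-cpu worker
can still be merged, and lock contention stays low even though writes no
longer funnel through one thread.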
> >
> >
> > Hi.
> >
> > I still don't like the per-cpu allocations and the extra work queues.
> >
> > The following patch demonstrates how I would like to address this issue.  It
> > should submit requests from the same thread that initially made the request -
> > at least in most cases.
> >
> > It leverages the plugging code and pushes everything out on unplug, unless
> > the unplug comes from a scheduler call (which should be uncommon).  In that
> > case it falls back to passing all the requests to the md thread.
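
Roughly, the shape of the unplug callback described above could look like the
sketch below. The raid1_plug_cb wrapper, its fields and the exact calls are
assumptions for illustration, not necessarily what Neil's patch does:

/* Wrapper that blk_check_plugged() could allocate (zeroed) when a write is
 * queued while a plug is active. */
struct raid1_plug_cb {
	struct blk_plug_cb	cb;
	struct bio_list		pending;
};

static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
{
	struct raid1_plug_cb *plug = container_of(cb, struct raid1_plug_cb, cb);
	struct mddev *mddev = plug->cb.data;
	struct r1conf *conf = mddev->private;
	struct bio *bio;

	if (from_schedule) {
		/* Unplugged on a context switch: don't submit from this
		 * context, hand everything to the md thread as before. */
		spin_lock_irq(&conf->device_lock);
		bio_list_merge(&conf->pending_bio_list, &plug->pending);
		spin_unlock_irq(&conf->device_lock);
		md_wakeup_thread(mddev->thread);
		kfree(plug);
		return;
	}

	/* Common case: submit from the thread that made the requests.
	 * The bitmap would also need to be flushed here before the writes
	 * go out, which is the extra bitmap flush discussed below. */
	while ((bio = bio_list_pop(&plug->pending)) != NULL)
		generic_make_request(bio);
	kfree(plug);
}

With something like this, most writes never touch the md thread at all; only
unplugs triggered from the scheduler take the old path.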
> >
> > Obviously if we proceed with this I'll split this up into neat reviewable
> > patches.  However before that it would help to know if it really helps as I
> > think it should.
> >
> > So would you be able to test it on your SSD hardware and see how it compares
> > to the current code, and to your code?  Thanks.
> >
> > I have only tested it lightly myself so there could still be bugs, but
> > hopefully not obvious ones.
> >
> > A simple "time mkfs" test on very modest hardware shows a 25% reduction in
> > total time (168s -> 127s).  I guess that's a 33% increase in speed?
> > However sequential writes with 'dd' seem a little slower (14MB/s -> 13.6MB/s)
> >
> > There are some hacks in there that need to be cleaned up, but I think the
> > general structure looks good.
> 
> Though I considered this approach before, and scheduling from the unplug
> callback was an issue. Maybe I overlooked something at that time; the
> from_schedule check looks promising.

I tried raid1/raid10 performance with this patch (with a similar change for
raid10, and a plug added in the raid1/10 unplug function for dispatching), and
the result is ok. The from_schedule check does the trick; there isn't the race
I mentioned before. I also double-checked how often unplug is called from
schedule, and the rate is very low.

Now the only question is whether the extra bitmap flush could be an overhead.
Our card doesn't show such overhead, so I'm not sure.

Thanks,
Shaohua

