2012/6/28 NeilBrown <neilb@xxxxxxx>:
> On Wed, 13 Jun 2012 17:11:43 +0800 Shaohua Li <shli@xxxxxxxxxx> wrote:
>
>> In raid1/10, all write requests are dispatched in a single thread. On fast
>> storage, the thread is a bottleneck because it dispatches requests too
>> slowly. The thread also migrates freely, so the request-completion CPU
>> doesn't match the submission CPU even when the driver/block layer supports
>> that. This causes bad cache behaviour. Neither issue is a big deal for
>> slow storage.
>>
>> Switching the dispatching to a percpu/perthread basis dramatically
>> increases performance. The more RAID disks there are, the bigger the
>> performance boost. In a 4-disk raid10 setup, this can double the
>> throughput.
>>
>> percpu/perthread dispatch doesn't harm slow storage. It is the way a raw
>> device is accessed, and the block plug is set correctly, which helps with
>> request merging and reduces lock contention.
>>
>> V2->V3:
>> rebase to latest tree and fix a cpu-hotplug issue
>>
>> V1->V2:
>> 1. dropped the direct-dispatch patches. They gave a bigger performance
>> improvement, but were hopeless to make correct.
>> 2. added an MD-specific workqueue to do the percpu dispatch.
>
>
> Hi.
>
> I still don't like the per-cpu allocations and the extra work queues.
>
> The following patch demonstrates how I would like to address this issue.
> It should submit requests from the same thread that initially made the
> request - at least in most cases.
>
> It leverages the plugging code and pushes everything out on the unplug,
> unless that comes from a scheduler call (which should be uncommon). In
> that case it falls back on passing all the requests to the md thread.
>
> Obviously if we proceed with this I'll split it up into neat, reviewable
> patches. Before that, however, it would help to know whether it really
> helps as I think it should.
>
> So would you be able to test it on your SSD hardware and see how it
> compares with the current code, and with your code? Thanks.
>
> I have only tested it lightly myself, so there could still be bugs, but
> hopefully no obvious ones.
>
> A simple "time mkfs" test on very modest hardware shows a 25% reduction in
> total time (168s -> 127s). I guess that's a 33% increase in speed?
> However, sequential writes with 'dd' seem a little slower (14MB/s ->
> 13.6MB/s).
>
> There are some hacks in there that need to be cleaned up, but I think the
> general structure looks good.

I did consider this approach before, and scheduling from the unplug callback
was an issue. Maybe I overlooked something at the time; the from_schedule
check looks promising.

Thanks,
Shaohua
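
[Editor's note: for reference, below is a minimal sketch of the
plug-callback dispatch pattern under discussion, assuming the
blk_check_plugged()/blk_plug_cb API from Neil's series. The helper
queue_to_md_thread() is hypothetical; the overall shape is close to what
eventually landed in mainline as raid1_unplug() in drivers/md/raid1.c, but
this is an illustration, not Neil's actual patch.]

/*
 * A minimal sketch, assuming the blk_check_plugged()/blk_plug_cb API.
 * queue_to_md_thread() is a hypothetical stand-in for the personality's
 * "hand bios to the md thread" path.
 */
#include <linux/blkdev.h>
#include <linux/bio.h>
#include <linux/slab.h>
#include "md.h"				/* struct mddev, md_wakeup_thread() */

struct raid_plug_cb {
	struct blk_plug_cb cb;		/* must be first; see container_of() */
	struct bio_list pending;	/* writes gathered while plugged */
};

/* Hypothetical helper: splice @pending onto the md thread's queue. */
void queue_to_md_thread(struct mddev *mddev, struct bio_list *pending);

/* Runs when the submitting task's plug is released. */
static void raid_unplug(struct blk_plug_cb *cb, bool from_schedule)
{
	struct raid_plug_cb *plug = container_of(cb, struct raid_plug_cb, cb);
	struct mddev *mddev = cb->data;
	struct bio *bio;

	if (from_schedule) {
		/*
		 * Unplugged because the submitting task is about to sleep:
		 * we must not block on I/O submission here, so fall back to
		 * the md thread (the old single-thread path).
		 */
		queue_to_md_thread(mddev, &plug->pending);
		md_wakeup_thread(mddev->thread);
	} else {
		/*
		 * Normal unplug: submit directly from the thread that made
		 * the requests, keeping submission and completion on the
		 * same CPU.
		 */
		while ((bio = bio_list_pop(&plug->pending)))
			generic_make_request(bio);
	}
	kfree(plug);			/* allocated by blk_check_plugged() */
}

/* In the make_request path: gather writes on the current plug, if any. */
static void raid_queue_write(struct mddev *mddev, struct bio *bio)
{
	struct blk_plug_cb *cb =
		blk_check_plugged(raid_unplug, mddev,
				  sizeof(struct raid_plug_cb));

	if (cb) {
		/*
		 * blk_check_plugged() zeroes the allocation, so the
		 * bio_list starts out validly empty.
		 */
		struct raid_plug_cb *plug =
			container_of(cb, struct raid_plug_cb, cb);
		bio_list_add(&plug->pending, bio);
	} else {
		/* No plug in progress: use the md thread as before. */
		struct bio_list one;

		bio_list_init(&one);
		bio_list_add(&one, bio);
		queue_to_md_thread(mddev, &one);
		md_wakeup_thread(mddev->thread);
	}
}

The from_schedule flag is the crux: blk_flush_plug_list() runs the callbacks
with from_schedule=true when the plug is flushed because the task is entering
schedule(), where submitting I/O synchronously would be unsafe, so only in
that (uncommon) case does dispatch fall back to the md thread.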