On Thu, 28 Jun 2012 20:29:21 -0500 Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>
wrote:

> On 6/28/2012 4:03 AM, NeilBrown wrote:
> > On Wed, 13 Jun 2012 17:11:43 +0800 Shaohua Li <shli@xxxxxxxxxx> wrote:
> >
> >> In raid1/10, all write requests are dispatched in a single thread. In fast
> >> storage, the thread is a bottleneck, because it dispatches request too slow.
> >> Also the thread migrates freely, which makes request completion cpu not match
> >> with submission cpu even driver/block layer has such capability. This will
> >> cause bad cache issue. Both these are not a big deal for slow storage.
> >>
> >> Switching the dispatching to percpu/perthread based dramatically increases
> >> performance. The more raid disk number is, the more performance boosts. In a
> >> 4-disk raid10 setup, this can double the throughput.
> >>
> >> percpu/perthread based dispatch doesn't harm slow storage. This is the way how
> >> raw device is accessed, and there is correct block plug set which can help do
> >> request merge and reduce lock contention.
> >>
> >> V2->V3:
> >> rebase to latest tree and fix cpuhotplug issue
> >>
> >> V1->V2:
> >> 1. dropped direct dispatch patches. That has better performance improvement, but
> >> is hopelessly made correct.
> >> 2. Add a MD specific workqueue to do percpu dispatch.
> >
> > I still don't like the per-cpu allocations and the extra work queues.
>
> Why don't you like this method Neil?  Complexity?  The performance seems
> to be there.
>

Not an easy question to answer.  It just doesn't "taste" nice.  I certainly
like the performance and if this is the only way to get that performance then
we'll probably go that way.  But I'm not convinced it is the only way and I
want to explore other options first.

I guess it feels a bit heavy handed.  On machines with 1024 cores, per-cpu
allocations and per-cpu threads are not as cheap as they are on 2-core
machines.  And I'm hoping for a 1024-core phone soon :-)

NeilBrown