In raid1/10, all write requests are dispatched by a single thread. On fast storage this thread becomes a bottleneck, because it dispatches requests too slowly. The thread also migrates freely between CPUs, so the completion CPU of a request does not match its submission CPU even when the driver/block layer is capable of keeping them matched. This hurts cache locality. Neither problem is a big deal for slow storage.

Switching the dispatching to percpu/perthread based dramatically increases performance. The more disks the raid has, the bigger the boost. In a 4-disk raid10 setup, this can double the throughput.

percpu/perthread based dispatch doesn't harm slow storage. This is how a raw device is accessed, and the correct block plug is set, which helps with request merging and reduces lock contention.

V2->V3:
rebase to latest tree and fix cpuhotplug issue

V1->V2:
1. dropped direct dispatch patches. That has better performance improvement, but is hopeless to make correct.
2. Add an MD specific workqueue to do percpu dispatch.
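To illustrate the submission/completion cpu mismatch described above, here is a minimal userspace model in C. It is not the md code from this series: every name in it (struct request, pending_list, submit_shared, submit_percpu, NR_CPUS and so on) is hypothetical, and the "cpus" are plain integers rather than real threads. It only shows the bookkeeping difference between one shared pending list drained by a single dispatcher and one pending list per cpu drained on that same cpu.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS     4   /* hypothetical cpu count for the model */
#define MAX_PENDING 64  /* hypothetical capacity of each pending list */

struct request {
    int submit_cpu;     /* cpu that queued the write */
    int dispatch_cpu;   /* cpu that issued it to the member disk, -1 while pending */
};

struct pending_list {
    struct request *reqs[MAX_PENDING];
    size_t count;
};

/* One shared list, drained by a single dispatcher thread. */
static struct pending_list shared_list;
/* One list per cpu, each drained by work running on that cpu. */
static struct pending_list percpu_list[NR_CPUS];

static int list_add(struct pending_list *l, struct request *r)
{
    if (l->count == MAX_PENDING)
        return -1;      /* list full */
    l->reqs[l->count++] = r;
    return 0;
}

/* Issue everything on the list from the given cpu; returns how many were issued. */
static size_t list_flush(struct pending_list *l, int cpu)
{
    size_t n = l->count;

    for (size_t i = 0; i < n; i++)
        l->reqs[i]->dispatch_cpu = cpu;
    l->count = 0;
    return n;
}

/* Single-thread model: every write lands on the shared list. */
int submit_shared(struct request *r, int cpu)
{
    r->submit_cpu = cpu;
    r->dispatch_cpu = -1;
    return list_add(&shared_list, r);
}

/* Per-cpu model: the write stays on the submitting cpu's own list. */
int submit_percpu(struct request *r, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    r->submit_cpu = cpu;
    r->dispatch_cpu = -1;
    return list_add(&percpu_list[cpu], r);
}

/* The lone dispatcher runs on whatever cpu the scheduler put it on. */
size_t flush_shared(int dispatcher_cpu)
{
    return list_flush(&shared_list, dispatcher_cpu);
}

/* Each cpu drains only its own list, so dispatch cpu == submit cpu. */
size_t flush_percpu(void)
{
    size_t total = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        total += list_flush(&percpu_list[cpu], cpu);
    return total;
}

/* Count requests that were issued from a different cpu than they were queued on. */
size_t count_cpu_mismatches(const struct request *reqs, size_t n)
{
    size_t mismatches = 0;

    for (size_t i = 0; i < n; i++)
        if (reqs[i].dispatch_cpu != reqs[i].submit_cpu)
            mismatches++;
    return mismatches;
}
```

In this model, queueing writes round-robin from all cpus and draining them with flush_shared() leaves every request that was not queued on the dispatcher's cpu mismatched, while submit_percpu() followed by flush_percpu() leaves none. The real patches additionally have to deal with locking, plugging and cpu hotplug, which this sketch ignores.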