On Fri, 2009-10-02 at 19:37 +0200, Jens Axboe wrote:
> On Fri, Oct 02 2009, Ingo Molnar wrote:
> >
> > * Jens Axboe <jens.axboe@xxxxxxxxxx> wrote:
> >
> > > On Fri, Oct 02 2009, Ingo Molnar wrote:
> > > >
> > > > * Jens Axboe <jens.axboe@xxxxxxxxxx> wrote:
> > > >
> > > > > It's not _that_ easy, it depends a lot on the access patterns. A
> > > > > good example of that is actually the idling that we already do.
> > > > > Say you have two applications, each starting up. If you start them
> > > > > both at the same time and only care about naive low latency, then
> > > > > you'll do one IO from each of them in turn. Latency will be good,
> > > > > but throughput will be awful. And this means that in 20s they are
> > > > > both started, while with the slice idling and priority disk access
> > > > > that CFQ does, you'd hopefully have both up and running in 2s.
> > > > >
> > > > > So latency is good, definitely, but sometimes you have to worry
> > > > > about the bigger picture too. Latency is more than single IOs,
> > > > > it's often for a complete operation which may involve lots of IOs.
> > > > > Single IO latency is a benchmark thing, it's not a real-life
> > > > > issue. And that's where it becomes complex and not so black and
> > > > > white. Mike's test is a really good example of that.
> > > >
> > > > To the extent of you arguing that Mike's test is artificial (I'm not
> > > > sure you are arguing that) - Mike certainly did not do an artificial
> > > > test - he tested 'konsole' cache-cold startup latency, such as:
> > >
> > > [snip]
> > >
> > > I was saying the exact opposite: that Mike's test is a good example of
> > > a valid test. It's not measuring single IO latencies, it's doing a
> > > sequence of valid events and looking at the latency for those. It's
> > > benchmarking the bigger picture, not a microbenchmark.
> >
> > Good, so we are in violent agreement :-)
>
> Yes, perhaps that last sentence didn't provide enough evidence of which
> category I put Mike's test into :-)
>
> So to kick things off, I added an 'interactive' knob to CFQ and
> defaulted it to on, along with re-enabling slice idling for hardware
> that does tagged command queuing. This is almost completely identical
> to what Vivek Goyal originally posted, it's just combined into one
> patch and uses the term 'interactive' instead of 'fairness'. I think
> the former is a better umbrella under which to add further tweaks that
> may sacrifice throughput slightly in the quest for better latency.
>
> It's queued up in the for-linus branch.

FWIW, I did a matrix of Vivek's patch combined with my hack. It seems
we lose a bit of dd throughput relative to stock with either or both.
Five runs per row; the last column is the average.

                             run 1   run 2   run 3   run 4   run 5     avg

fairness=1 overload_delay=1
  dd pre                      65.1    65.4    67.5    64.8    65.1    65.5
  perf stat                    1.70    1.94    1.32    1.89    1.87    1.7
  dd post                     69.4    62.3    69.7    70.3    69.6    68.2

fairness=1 overload_delay=0
  dd pre                      67.0    67.8    64.7    64.7    64.9    65.8
  perf stat                    4.89    3.13    2.98    2.71    2.17    3.1
  dd post                     67.2    63.3    62.6    62.8    63.1    63.8

fairness=0 overload_delay=1
  dd pre                      65.0    66.0    66.9    64.6    67.0    65.9
  perf stat                    4.66    3.81    4.23    2.98    4.23    3.9
  dd post                     62.0    60.8    62.4    61.4    62.2    61.7

fairness=0 overload_delay=0
  dd pre                      65.3    65.6    64.9    69.5    65.8    66.2
  perf stat                   14.79    9.11   14.16    8.44   13.67   12.0
  dd post                     64.1    66.5    64.0    66.5    64.4    65.1

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
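
[Editor's note: for anyone wanting to replay the matrix above, here is a
minimal sketch of the procedure. It assumes a CFQ-era kernel carrying
Vivek's 'fairness' patch and Mike's 'overload_delay' hack, exposed as
tunables under /sys/block/<dev>/queue/iosched/; the device name, file
sizes, and helper names are illustrative, not taken from the thread.]

    #!/usr/bin/env python3
    # Minimal sketch of the tunable matrix above -- NOT the actual script
    # used in this thread. Assumes a CFQ kernel patched with the
    # 'fairness' and 'overload_delay' knobs; run as root.
    import subprocess
    import time

    DEV = "sda"                       # assumption: the disk under test
    IOSCHED = f"/sys/block/{DEV}/queue/iosched"

    def set_knob(name, value):
        # Scheduler tunables are plain sysfs files; writing "0"/"1" flips them.
        with open(f"{IOSCHED}/{name}", "w") as f:
            f.write(str(value))

    def dd_throughput(mb=1024):
        # Time a 1 GB streaming write and return MB/s, standing in for
        # the 'dd pre' / 'dd post' rows of the table.
        t0 = time.monotonic()
        subprocess.run(
            ["dd", "if=/dev/zero", "of=ddtest.img", "bs=1M",
             f"count={mb}", "conv=fdatasync"],
            check=True, capture_output=True)
        return mb / (time.monotonic() - t0)

    for fairness in (1, 0):
        for overload_delay in (1, 0):
            set_knob("fairness", fairness)
            set_knob("overload_delay", overload_delay)
            pre = dd_throughput()
            # ... the latency probe would go here, e.g. perf stat on a
            # cache-cold 'konsole' start, as in Mike's test ...
            post = dd_throughput()
            print(f"fairness={fairness} overload_delay={overload_delay}: "
                  f"dd pre {pre:.1f} MB/s  dd post {post:.1f} MB/s")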