Just one more note:

On Fri, Oct 6, 2017 at 5:20 AM, Coly Li <i@xxxxxxx> wrote:
> [snip]
> And when we talked about patch 5/5, you mentioned 1MB writes:
> "- When writeback rate is medium, it does I/O more efficiently. e.g.
> if the current writeback rate is 10MB/sec, and there are two
> contiguous 1MB segments, they would not presently be combined. A 1MB
> write would occur, then we would increase the delay counter by 100ms,
> and then the next write would wait; this new code would issue 2 1MB
> writes one after the other, and then sleep 200ms. On a disk that does
> 150MB/sec sequential, and has a 7ms seek time, this uses the disk for
> 13ms + 7ms, compared to the old code that does 13ms + 7ms * 2. This
> is the difference between using 10% of the disk's I/O throughput and
> 13% of the disk's throughput to do the same work."
>
> Then I assume the bio reorder patches should work well for write size
> from 4KB to 1MB. Also I think "hmm, if the write size is smaller, there
> will be less chance for dirty blocks to be contiguous on cached device",
> then I choose 512KB.

Please note those two conversations were talking about different
controversial stuff in response to different questions from you.

There's the ordering change, which is expected to improve peak
writeback rate for small blocks. The test scenarios we're running
right now are mostly trying to measure that. There needs to be
relatively small I/O to see a difference-- that is, backing disk
access time needs to dominate to make a difference.

The other is the change in I/O aggregation behavior when writeback
rate is not maximum. That's what I was discussing with the 1MB I/O
example. Unfortunately it is difficult to craft a test scenario to
test this one, but the argument of why it is better is pretty simple:

- It's much better to aggregate a few contiguous I/Os and issue them
  together than to issue them some time apart because of rate limiting.
- The new code will aggregate a limited number of contiguous writes
  even if it doesn't have to, to meet rate.

That is, it's much better for disk utilization to issue 5 256k writes
that are sequential to each other at the same time and then delay
250ms, instead of issuing them one at a time, 50ms apart. In the first
case, it might be a 7ms access + 13ms of writing and then 230ms of
sleeping (8% disk time used); in the second case, if user I/O is
seeking the disk away from where we're writing, it'll be 5 groups of:
7ms access + 2.5ms writing and then 40.5ms of sleeping (19% disk time
used).

Mike
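
For anyone who wants to check the utilization arithmetic in the 256k
example, here is a rough sketch. It is not bcache code; the function
name and parameters are made up for illustration, and it assumes every
separately issued write pays one full access before transferring.

```python
def disk_busy_fraction(access_ms, write_ms, period_ms):
    """Fraction of one rate-limit period the disk spends on writeback.

    Hypothetical helper: assumes one access (seek) followed by one
    transfer per period, with the disk idle for the remainder.
    """
    return (access_ms + write_ms) / period_ms


# Aggregated: five contiguous 256k writes issued together, so one
# access and about 13ms of transfer per 250ms period.
aggregated = disk_busy_fraction(access_ms=7, write_ms=13, period_ms=250)

# Separate: one 256k write every 50ms, each paying its own access
# because user I/O has moved the head away in between.
separate = disk_busy_fraction(access_ms=7, write_ms=2.5, period_ms=50)

print(f"aggregated: {aggregated:.0%}, separate: {separate:.0%}")
# aggregated: 8%, separate: 19%
```

The same amount of data is written in both cases; only the number of
accesses paid per 250ms differs (one versus five).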