I will write a test bench and send results soon. Just please note: you've
crafted a test where there's not likely to be sequential data to write back,
and chosen a block size where there is limited difference between sequential
and nonsequential writeback. Not surprisingly, you don't see a real
difference with code that is trying to optimize sequential writeback :P

Mike

On Fri, Oct 6, 2017 at 3:42 AM, Michael Lyle <mlyle@xxxxxxxx> wrote:
> Coly--
>
> Holy crap, I'm not surprised you don't see a difference if you're
> writing with 512K size! The potential benefit from merging is much
> less, and the odds of missing a merge are much smaller. 512KB is 5ms
> sequential by itself on a 100MB/sec disk, so there is lots more time to
> wait to get the next chunks in order. And even if you fail to merge, the
> potential benefit is much less: if the difference is mostly rotational
> latency from failing to merge, then we're talking 5ms vs 5+2ms.
>
> Do you even understand what you are trying to test?
>
> Mike
>
> On Fri, Oct 6, 2017 at 3:36 AM, Coly Li <i@xxxxxxx> wrote:
>> On 2017/10/6 5:20 PM, Michael Lyle wrote:
>>> Coly--
>>>
>>> I did not say the result from the changes will be random.
>>>
>>> I said the result from your test will be random, because where the
>>> writeback position is making non-contiguous holes in the data is
>>> nondeterministic: it depends on where it is on the disk at the instant
>>> that writeback begins. There is a high degree of dispersion in the
>>> test scenario you are running that is likely to exceed the differences
>>> from my patch.
>>
>> Hi Mike,
>>
>> I did the test quite carefully. Here is how I ran the test:
>> - disable writeback by echoing 0 to writeback_running.
>> - write random data into the cache until it is full or half full, then
>>   stop the I/O immediately.
>> - echo 1 to writeback_running to start writeback.
>> - record performance data immediately.
>>
>> The writeback may start at a random position, but the statistical
>> number of contiguous blocks (on the cached device) should not differ
>> much. Because fio just sends random 512KB blocks to the cache device,
>> the statistical number of contiguous blocks depends on the size of the
>> cache device relative to the cached device, and on how full the cache
>> device is.
>>
>> Indeed, I repeated some tests more than once (except the md raid5 and md
>> raid0 configurations), and the results are quite stable when I look at
>> the data charts, with no big difference.
>>
>> If you feel the performance results I provided are problematic, it would
>> be better to let the data talk. You need to show your performance test
>> numbers to prove that the bio reorder patches are helpful for general
>> workloads, or at least helpful for many typical workloads.
>>
>> Let the data talk.
>>
>> Thanks.
>>
>> --
>> Coly Li
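Mike's "5ms vs 5+2ms" estimate can be checked with a little arithmetic. The sketch below is a hypothetical model, not code from bcache or from either patch: it assumes the 100MB/sec sequential rate and the roughly 2ms missed-merge penalty quoted in the thread, and treats 1MB as 1024KB so that a 512KB block takes exactly 5ms to transfer. The function names are made up for illustration.

```python
# Hypothetical model (not bcache code): the cost of one writeback block is its
# sequential transfer time, plus a fixed rotational/seek penalty if the block
# could not be merged with its predecessor.
# Assumed figures from the thread: 100 MB/s sequential, ~2 ms missed-merge penalty.

THROUGHPUT_KB_PER_MS = 100 * 1024 / 1000  # 100 MB/s (1 MB = 1024 KB) in KB per ms


def transfer_ms(block_kb: float) -> float:
    """Time to stream one block sequentially, in milliseconds."""
    return block_kb / THROUGHPUT_KB_PER_MS


def missed_merge_slowdown(block_kb: float, penalty_ms: float = 2.0) -> float:
    """Ratio of per-block time with a missed merge to per-block time without one."""
    t = transfer_ms(block_kb)
    return (t + penalty_ms) / t


if __name__ == "__main__":
    for kb in (4, 64, 512):
        print(f"{kb:>4} KB: transfer {transfer_ms(kb):.2f} ms, "
              f"missed-merge slowdown x{missed_merge_slowdown(kb):.2f}")
```

Under these assumptions a missed merge at 512KB costs 7ms instead of 5ms (a 1.4x slowdown), while at 4KB the same 2ms penalty is roughly 50 times the transfer time itself, which is why the block size chosen for the test matters so much.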
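Coly's point that the amount of contiguous dirty data depends on the cache size relative to the backing device, and on how full the cache is, can be illustrated with a small Monte Carlo model. This is a hypothetical sketch, not bcache code: it divides the backing device into 512KB-aligned slots, marks a uniformly random subset of them dirty (as random fio writes would, once rewrites of the same offset are collapsed), and measures how often a dirty slot's successor is also dirty. It does not model where writeback starts, which is the variable Mike argues dominates the result.

```python
import random


def adjacent_dirty_fraction(backing_blocks: int, dirty_blocks: int, seed: int = 0) -> float:
    """Fraction of dirty blocks whose next block on the backing device is also dirty.

    Hypothetical model: the backing device is `backing_blocks` 512KB slots, and
    random writes leave `dirty_blocks` distinct slots dirty in the cache.
    """
    if not 0 < dirty_blocks <= backing_blocks:
        raise ValueError("need 0 < dirty_blocks <= backing_blocks")
    rng = random.Random(seed)
    # A uniformly random set of distinct slots, as left behind by random writes.
    dirty = set(rng.sample(range(backing_blocks), dirty_blocks))
    # Count dirty slots that could be merged with the following slot on writeback.
    adjacent = sum(1 for b in dirty if b + 1 in dirty)
    return adjacent / dirty_blocks
```

For a uniformly random dirty set the expected fraction is (k-1)/(n-1) for k dirty slots out of n, i.e. roughly the fraction of the backing device that is dirty, so adjacent_dirty_fraction(200_000, 20_000) comes out close to 0.1.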