On 10/06/2017 12:42 PM, Michael Lyle wrote:
> Coly--
>
> Holy crap, I'm not surprised you don't see a difference if you're
> writing with 512K size! The potential benefit from merging is much
> less, and the odds of missing a merge are much smaller. 512KB is 5ms
> sequential by itself on a 100MB/sec disk--- lots more time to wait to
> get the next chunks in order, and even if you fail to merge the
> potential benefit is much less-- if the difference is mostly
> rotational latency from failing to merge then we're talking 5ms vs
> 5+2ms.
>
> Do you even understand what you are trying to test?
>
> Mike
>
> On Fri, Oct 6, 2017 at 3:36 AM, Coly Li <i@xxxxxxx> wrote:
>> On 2017/10/6 5:20 PM, Michael Lyle wrote:
>>> Coly--
>>>
>>> I did not say the result from the changes will be random.
>>>
>>> I said the result from your test will be random, because where the
>>> writeback position is making non-contiguous holes in the data is
>>> nondeterministic-- it depends where it is on the disk at the instant
>>> that writeback begins. There is a high degree of dispersion in the
>>> test scenario you are running that is likely to exceed the differences
>>> from my patch.
>>
>> Hi Mike,
>>
>> I did the test quite carefully. Here is how I ran the test:
>> - disable writeback by echoing 0 to writeback_running.
>> - write random data into the cache until it is full or half full, then
>> stop the I/O immediately.
>> - echo 1 to writeback_running to start writeback
>> - and record performance data at once
>>
>> The position where writeback starts may be random, but there should
>> not be much difference in the statistical number of contiguous
>> blocks (on the cached device). Because fio just sends random 512KB
>> blocks onto the cache device, the statistical number of contiguous
>> blocks depends on the cache device vs. cached device size, and on how
>> much of the cache device is occupied.
>>
>> Indeed, I repeated some tests more than once (except the md raid5 and md
>> raid0 configurations), and the results are quite stable when I look at
>> the data charts, with no big difference.
>>
>> If you feel the performance result I provided is problematic, it would
>> be better to let the data talk. You need to show your performance test
>> numbers to prove that the bio reorder patches are helpful for general
>> workloads, or at least helpful to many typical workloads.
>>
>> Let the data talk.
>>

I think it would be easier for everyone concerned if Coly could attach
the fio script / cmdline and the bcache setup here.

There is still a chance that both are correct, as different hardware
setups are being used. We've seen this many times trying to establish
workable performance regression metrics for I/O; depending on the
hardware, a set of optimisations that helps on one platform can fail to
deliver the expected benefit on another. Just look at the discussion
we're currently having with Ming Lei on the SCSI mailing list about
improving sequential I/O performance.

But please, everyone, try to calm down. It's not that Coly is
deliberately blocking your patches; it's just that he doesn't see the
performance benefit on his side. It might be that he's using the wrong
parameters, but then that should be clarified once the fio script is
posted.

At the same time I don't think that the size of the dataset is
immaterial. Larger datasets take up more space, and inevitably add more
overhead just for looking up the data in memory. Plus Coly has some
quite high-powered NVMe for the caching device, which will affect
writeback patterns, too.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Teamlead Storage & Networking
hare@xxxxxxx                                   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
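The trade-off in Mike's quoted reply (sequential transfer time vs. the extra penalty of a missed merge) can be made concrete with a small back-of-the-envelope sketch. It assumes the 100MB/sec throughput and the ~2ms rotational penalty quoted in that reply, and compares a 512 KiB write against a 4 KiB write. The 4 KiB size and the helper names `transfer_ms` and `miss_slowdown` are illustrative assumptions, not taken from the thread or from bcache.

```python
# Back-of-the-envelope model of the quoted argument: a write that misses a
# merge pays an extra rotational penalty on top of its sequential transfer
# time. Throughput and penalty follow the quoted reply; the 4 KiB comparison
# size and these helper names are hypothetical, for illustration only.

DISK_BYTES_PER_SEC = 100e6  # "100MB/sec disk", taken as decimal megabytes
MISS_PENALTY_MS = 2.0       # assumed rotational latency when a merge is missed


def transfer_ms(size_bytes: int, bytes_per_sec: float = DISK_BYTES_PER_SEC) -> float:
    """Time to stream size_bytes sequentially, in milliseconds."""
    return size_bytes / bytes_per_sec * 1000.0


def miss_slowdown(size_bytes: int, penalty_ms: float = MISS_PENALTY_MS) -> float:
    """Ratio of missed-merge cost to merged cost for one write."""
    merged = transfer_ms(size_bytes)
    return (merged + penalty_ms) / merged


if __name__ == "__main__":
    for label, size in (("512 KiB", 512 * 1024), ("4 KiB", 4 * 1024)):
        merged = transfer_ms(size)
        print(f"{label}: merged {merged:.2f} ms, "
              f"missed {merged + MISS_PENALTY_MS:.2f} ms, "
              f"slowdown x{miss_slowdown(size):.2f}")
```

Under these assumptions a missed merge makes a 512 KiB write only about 1.4 times slower, but a 4 KiB write roughly 50 times slower, which is the point Mike's reply makes about testing with 512K writes.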