Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better

On 2017/10/2 1:34 AM, Michael Lyle wrote:
> On Sun, Oct 1, 2017 at 10:23 AM, Coly Li <i@xxxxxxx> wrote:
>> Hi Mike,
>>
>> Your data set is too small. Normally the bcache users I talk with
>> use bcache for distributed storage clusters or commercial databases;
>> their cache device is large and fast. It is possible we see different
>> I/O behaviors because we use different configurations.
> 
> A small dataset is sufficient to tell whether the I/O subsystem is
> successfully aggregating sequential writes or not.  :P  It doesn't
> matter whether the test is 10 minutes or 10 hours...  The writeback
> stuff walks the data in order.  :P

Hi Mike,

I have been testing your patches 4,5 all these days, and it turns out
they work better when the cache device is full of dirty data. And I can
say they work perfectly when the cache device is full of dirty data and
the backing cached device is a single spinning hard disk.

It is not the size of your data set by itself; it is that when the data
set is small, the cache can stay close to a full state, so there is a
higher chance of adjacent dirty data blocks to write back.
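
To illustrate what I mean, here is a toy simulation I wrote (my own
illustration, not anything taken from your patches): it randomly
dirties a fraction of blocks and counts how many dirty blocks have a
dirty neighbour that writeback could collapse with.

import random

def adjacent_dirty_fraction(n_blocks, dirty_ratio, seed=42):
    """Toy model: mark a fraction of blocks dirty at random, then count
    how many dirty blocks have a dirty right-hand neighbour, i.e. how
    much of the writeback could potentially be collapsed."""
    random.seed(seed)
    dirty = [random.random() < dirty_ratio for _ in range(n_blocks)]
    pairs = sum(1 for i in range(n_blocks - 1)
                if dirty[i] and dirty[i + 1])
    return pairs / max(1, sum(dirty))

for ratio in (0.1, 0.25, 0.5, 0.9):
    frac = adjacent_dirty_fraction(1_000_000, ratio)
    print(f"dirty ratio {ratio}: {frac:.2f} of dirty blocks "
          f"have a dirty neighbour")

With random writes, the fraction of dirty blocks that have a dirty
neighbour roughly tracks the occupancy, so a fuller cache simply gives
the block layer more candidates to merge.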

In the best case for your patches 4,5, with the cache device full of
dirty data and the cached device being a single spinning hard disk,
writeback can be 2x faster when there is no front-end I/O. See one set
of the performance data (the lower the better):
http://blog.coly.li/wp-content/uploads/2017/10/existing_dirty_data_on_cache_single_disk_1T_full_cache.png



As the backing cached device gets faster and faster, your patches 4,5
show less and less advantage.

For the same backing cached device size, as the cache device gets
smaller and smaller, your patches 4,5 also show less and less
advantage.
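
As a rough illustration of why that happens (the numbers below are only
my assumptions, not values measured in these tests), a back-of-envelope
model shows that merging mostly saves positioning cost, which a fast
backing device barely has:

def writeback_time_ms(n_ios, seek_ms, transfer_ms, merge_factor):
    """Very rough model: every physical I/O pays one positioning cost
    plus transfer time; merging adjacent writebacks divides the number
    of physical I/Os (and the positioning penalty) by merge_factor."""
    return (n_ios / merge_factor) * seek_ms + n_ios * transfer_ms

n = 10_000  # number of dirty extents to write back (arbitrary)

# Assumed per-I/O costs, only typical orders of magnitude, not measured.
devices = {
    "spinning HDD": dict(seek_ms=8.0, transfer_ms=3.0),
    "fast SSD":     dict(seek_ms=0.05, transfer_ms=0.5),
}

for name, cost in devices.items():
    t_plain = writeback_time_ms(n, merge_factor=1, **cost)   # no collapsing
    t_merged = writeback_time_ms(n, merge_factor=4, **cost)  # adjacent I/Os merged
    print(f"{name}: {t_plain/1000:.1f}s -> {t_merged/1000:.1f}s "
          f"({t_plain/t_merged:.2f}x)")

With these made-up numbers the merged case is about 2x faster on the
spinning disk but only a few percent faster on the fast device, which
is the same shape as the results I see.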

And in the following configuration I find the current bcache code
performs better (though not by much) than your patch 4,5 reorder
method,
- cached device: an md linear device combining 2x 1.8T hard disks
- cache device: a 1800G NVMe SSD
- fio random write, block size 512K
- dirty data occupies 50% of the cache device space (900G of 1800G)
One set of the performance data can be found here,
http://blog.coly.li/wp-content/uploads/2017/10/existing_dirty_data_on_ssd_900_1800G_cache_half.png

> 
> ***We are measuring whether the cache and I/O scheduler can correctly
> order up-to-64-outstanding writebacks from a chunk of 500 dirty
> extents-- we do not need to do 12 hours of writes first to measure
> this.***
> 
> It's important that there be actual contiguous data, though, or the
> difference will be less significant.  If you write too much, there
> will be a lot more holes in the data from writeback during the test
> and from writes bypassing the cache.
>

I see, your patches do perform better when the dirty data is contiguous
on the SSD. But we should know how likely this assumption is to hold in
the real world, especially since in some cases your patches make
writeback performance slower than the current bcache code does.
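
For reference, the effect being measured is just collapsing adjacent
dirty extents into fewer, larger writes. A minimal sketch of that idea
(my own simplification, not the actual bcache writeback code, which
lets the block layer do the merging) looks like this:

from typing import List, Tuple

Extent = Tuple[int, int]  # (start_sector, n_sectors), simplified for illustration

def collapse(extents: List[Extent]) -> List[Extent]:
    """Merge dirty extents that are back-to-back on the backing device."""
    merged: List[Extent] = []
    for start, length in sorted(extents):
        if merged and merged[-1][0] + merged[-1][1] == start:
            prev_start, prev_len = merged[-1]
            merged[-1] = (prev_start, prev_len + length)
        else:
            merged.append((start, length))
    return merged

# Contiguous dirty data: three adjacent 1MiB extents become one 3MiB write.
print(collapse([(0, 2048), (2048, 2048), (4096, 2048)]))
# Sparse dirty data: nothing to collapse, three separate writes remain.
print(collapse([(0, 2048), (8192, 2048), (65536, 2048)]))

Whether the contiguous case or the sparse case is typical is exactly
the question about real-world workloads.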

To test your patches, the following backing devices were used,
- md raid5 device composed of 4 hard disks
- md linear device composed of 2 hard disks
- md raid0 device composed of 4 hard disks
- single 250G SATA SSD
- single 1.8T hard disk

And the following cache devices were used,
- 3.8T NVMe SSD
- 1.8T NVMe SSD partition
- 232G NVMe SSD partition

> Having all the data to writeback be sequential is an
> artificial/synthetic condition that allows the difference to be
> measured more easily.  It's about a 2x difference under these
> conditions in my test environment.  I expect with real data that is
> not purely sequential it's more like a few percent.

From my tests, it seems your patches 4,5 work better in the following
situations:
1)   the backing cached device is slow
2.1) a higher ratio of (cache_device_size/cached_device_size), which
means dirty data on the cache device is more likely to be contiguous,
or 2.2) the cache device is almost full of dirty data

This is what I have observed in these days of testing. I will continue
tomorrow to find at what percentage of dirty data on the cache device
the current bcache code starts to perform worse than your patch. So far
I see that when the cache is 50% full of dirty data, and the cache
device is half the size of the cached device, your patches 4,5 show no
performance advantage in my testing environment.

All the performance comparison PNG files are too big to attach; once I
finish the final benchmark, I will combine them into a PDF file and
share a link.

Thanks.

-- 
Coly Li