Re: Write back mode Cache-tier behavior

Hi Christian,

Thanks for your quick reply.


> On Jun 5, 2017, at 2:01 PM, Christian Balzer <chibi@xxxxxxx> wrote:
> 
> 
> Hello,
> 
> On Mon, 5 Jun 2017 12:25:25 +0800 TYLin wrote:
> 
>> Hi all,
>> 
>> We’re using cache-tier with write-back mode but the write throughput is not as good as we expect. 
> 
> Numbers (what did you see and what did you expect?), versions, cluster
> HW/SW, etc etc.
> 

We use Kraken (11.2.0). Our cluster has 8 nodes; each node has 7 HDDs for the storage pool (CephFS data and metadata), 3 SSDs for the data-pool cache tier, and 1 SSD for the metadata-pool cache tier. The public and cluster networks share the same 10G NIC. We mount CephFS with the kernel client on one of the nodes and use dd/fio to test its performance. The throughput when creating a new file is about 400MB/s, but overwriting an existing file reaches more than 800MB/s. We did not expect creating a new file and overwriting an existing one to differ that much.
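
For example, the tests look roughly like this (the mount point, file name and block size here are placeholders, not necessarily our exact parameters):

    # sequential write creating a new 20GB file
    fio --name=seqwrite --ioengine=libaio --rw=write --bs=4M \
        --size=20G --direct=1 --directory=/mnt/cephfs

    # overwrite an existing file in place, without truncating it
    dd if=/dev/zero of=/mnt/cephfs/testfile bs=4M count=5120 \
        oflag=direct conv=notrunc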


>> We use CephFS and create a 20GB file in it. While the data is being written, we use iostat to collect disk statistics. From iostat, we saw that the SSDs (cache tier) are idle most of the time and the HDDs (storage tier) are busy all the time. From the documentation:
> 
> While having no real experience with CephFS (with or w/o cache-tiers), I
> do think I know what you're seeing here, see below.
> 
>> 
>> “When admins configure tiers with writeback mode, Ceph clients write data to the cache tier and receive an ACK from the cache tier. In time, the data written to the cache tier migrates to the storage tier and gets flushed from the cache tier.”
>> 
>> So the data is written to the cache tier and then flushed to the storage tier when the dirty ratio exceeds 0.4? The phrase “in time” in the document confused us. 
>> 
>> We found that the throughput of creating a new file is lower than that of overwriting an existing file, and the SSDs see more writes during overwrites. We then looked into the source code and logs. A newly created file goes through proxy_write, followed by promote_object. Does this mean that, when creating a new file, the object actually goes to the storage pool directly and is then promoted to the cache tier?
>> 
> 
> Creating a new file means creating new Ceph objects, which need to be
> present on both the backing store and the cache-tier. 
> That overhead of creating them is the difference in time you see.
> The actual data of the initial write will still be only on the cache-tier,
> btw.

You mean that when we create a new object, the client will not get an ACK until the data is written to the storage pool (only the journal?) and then promoted to the cache tier? If this is true, why do we have to wait until the object is written to both the storage pool and the cache tier? Is there any configuration that forces writes to go to the cache tier only and flush to the storage pool once the dirty ratio is reached, just as happens when overwriting an existing file? 
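
For reference, these are the kind of pool settings we have been looking at (the pool name "cachepool" is just a placeholder):

    # dirty ratio at which the cache tiering agent starts flushing
    ceph osd pool set cachepool cache_target_dirty_ratio 0.4
    # dirty ratio above which flushing happens more aggressively
    ceph osd pool set cachepool cache_target_dirty_high_ratio 0.6
    # utilisation above which clean objects start being evicted
    ceph osd pool set cachepool cache_target_full_ratio 0.8

None of them seem to control whether the first write of a brand-new object is proxied/promoted rather than written to the cache tier directly, though.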

> 
> Once a file exists and is properly (not sparsely) allocated, writes should
> indeed just go to the cache-tier until flushing (space/time/object#)
> becomes necessary. 
> That of course also requires the cache being big enough and not too busy,
> so that things actually stay in it.
> Otherwise those objects need to be promoted back in from the HDDs, making
> things slow again.
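
(If I understand correctly, the space/time/object# triggers map to pool settings such as cache_target_max_bytes, cache_target_max_objects and cache_min_flush_age / cache_min_evict_age; the values below are only examples:)

    ceph osd pool set cachepool cache_target_max_bytes 1099511627776   # 1 TiB
    ceph osd pool set cachepool cache_target_max_objects 1000000
    ceph osd pool set cachepool cache_min_flush_age 600     # seconds
    ceph osd pool set cachepool cache_min_evict_age 1800    # seconds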
> 
> Tuning a cache-tier (both parameters and size in general) isn't easy and
> with some workloads pretty impossible to get desirable results.
> 
> 
> Christian
> -- 
> Christian Balzer        Network/Systems Engineer                
> chibi@xxxxxxx   	Rakuten Communications

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



