Write Caching to hot tier not working as expected

Hi all,

I'm a newbie to Ceph.  I'm an MSP and small-scale cloud hoster, and I intend to use Ceph as production storage for a small-scale private hosting cloud.  We run ESXi as our hypervisors, so we want to present Ceph over iSCSI.

We've got Ceph Nautilus running on a 3-node cluster.  Each node contains a pair of Bronze Xeons, 128 GB RAM, 6 x 10G NICs, 8 x 10TB spinners, 2 x 2TB SATA SSDs, and a 4TB NVMe.

The HDDs and the 2TB SSDs give me an rbd pool of 24 OSDs (1024 PGs), with the SSDs partitioned and used to hold the DB and WAL.  Each SSD holds the DB/WAL for 4 HDDs.

The NVMes give me a cache pool of 3 OSDs (128 PGs), which I want to use as a hot tier.  As far as I can tell, I have followed the guidance given here: https://docs.ceph.com/docs/nautilus/rados/operations/cache-tiering/
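For reference, here is a sketch of how I wired the tier, following that doc (reconstructed from my pool settings below, so the target_bytes and hit-set values match the `ceph osd pool ls detail` output):

```shell
# Attach the NVMe pool as a writeback cache tier in front of the rbd pool
ceph osd tier add rbd cache
ceph osd tier cache-mode cache writeback
ceph osd tier set-overlay rbd cache

# Size the tier and enable hit-set tracking (required for writeback mode);
# values taken from the pool detail output further down
ceph osd pool set cache target_max_bytes 3078632557772
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 12
ceph osd pool set cache hit_set_period 3600
```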

The cluster is working, the iSCSI is working, and generally everything is looking pretty good. My only problem at this stage is that the tiering is not handling writes in the way I expect, and I really can't get my brain around why.

For my test I start with the hot tier empty.  I drained it by setting dirty_ratio = dirty_high_ratio = full_ratio = 0.  I then set dirty_ratio = 0.5, dirty_high_ratio = 0.6, and full_ratio = 0.7, and started writing data to it at high speed (a simple file copy) from a VM on ESXi.
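Concretely, the drain-and-reset sequence looked like this (using the cache_target_* parameter names from the Nautilus cache-tiering doc, which is what I mean by the shorthand above):

```shell
# Drain the tier: with all thresholds at 0, the tiering agent
# flushes and evicts everything to the backing rbd pool
ceph osd pool set cache cache_target_dirty_ratio 0.0
ceph osd pool set cache cache_target_dirty_high_ratio 0.0
ceph osd pool set cache cache_target_full_ratio 0.0

# Then restore the thresholds for the test
ceph osd pool set cache cache_target_dirty_ratio 0.5
ceph osd pool set cache cache_target_dirty_high_ratio 0.6
ceph osd pool set cache cache_target_full_ratio 0.7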

My expectation is that all inbound writes will land initially on the hot tier, giving the ESXi hosts low-latency writes, and that once the hot tier reaches 50% of its target size it will start to flush those writes down to the HDD storage.
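As a sanity check on the numbers (a quick back-of-envelope sketch, taking target_bytes from the pool detail output below):

```python
# When should flushing/eviction kick in, given the cache pool's
# target_max_bytes of 3078632557772 (~2.8 TiB) and my ratios?
target_bytes = 3_078_632_557_772

thresholds = {
    "flush starts (dirty_ratio 0.5)":    target_bytes * 0.5,
    "aggressive flush (dirty_high 0.6)": target_bytes * 0.6,
    "eviction (full_ratio 0.7)":         target_bytes * 0.7,
}
for name, b in thresholds.items():
    print(f"{name}: {b / 2**40:.2f} TiB")
```

So on my reading, roughly the first 1.4 TiB of dirty data should sit entirely on the NVMe tier before any flushing begins.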

What I actually see is that as soon as I start throwing data at the cluster, the Ceph dashboard shows writes going to both the NVMes and the HDDs, and the write latency seen by ESXi hits several hundred milliseconds.  It seems that the hot tier is absorbing only a fraction of the writes.

Here are my pool settings. 

[root@ceph00 ~]# ceph osd pool ls detail
pool 4 'rbd' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode warn last_change 6971 lfor 6971/6971/6971 flags hashpspool,selfmanaged_snaps tiers 11 read_tier 11 write_tier 11 stripe_width 0 application rbd
        removed_snaps [1~3]
pool 11 'cache' replicated size 3 min_size 2 crush_rule 3 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 7065 lfor 6971/6971/6971 flags hashpspool,incomplete_clones,selfmanaged_snaps tier_of 4 cache_mode writeback target_bytes 3078632557772 hit_set bloom{false_positive_probability: 0.001, target_size: 0, seed: 0} 3600s x12 decay_rate 0 search_last_n 0 min_read_recency_for_promote 2 stripe_width 0 application rbd
        removed_snaps [1~3]
pool 12 'pure_hdd' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode warn last_change 7059 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
        removed_snaps [1~3]


Is anyone able to point me towards a solution?

Thanks,
Steve
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


