Hello everyone!
I'm putting CephFS in production here to host Dovecot mailboxes. That's a big use case in the Dovecot community.
Versions:
Ubuntu 14.04 LTS with kernel 4.4.0-22-generic
Ceph 10.2.1-1trusty
CephFS uses the kernel client
Right now I'm migrating my users to this new system, which should amount to about 60 million e-mail files.
Each user has 2 to 4 index files that are constantly changing. What I would like is for those index files and the most recent e-mails to always live on SSDs; old mail is rarely read, so it can go to my HDs. That is my dream scenario, and it is what I had before, using different mount points and a Dovecot feature called alternate storage.
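On the old setup that split was done with Dovecot's alternate storage, roughly like this in dovecot.conf (the paths are made up and mdbox is just the example format; the point is that ALT= points at the HD mount):
# illustrative only: real paths and mailbox format differ
mail_location = mdbox:/ssd/vmail/%d/%n:ALT=/hdd/vmail/%d/%n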
So, right now, CephFS metadata is going to SSD, because I forced it there with a dedicated CRUSH ruleset, and it works great: that is a lot of IOPS kept away from the spinning drives. But data requests, both reads and writes, are mostly going to the HD cold storage.
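For reference, the SSD pools are pointed at that ruleset with something along these lines (the rule and root names here are placeholders, my real crush map uses different names):
# ceph osd crush rule create-simple ssd_ruleset ssd host   # "ssd_ruleset" / "ssd" are placeholder names
# ceph osd pool set cephfs_metadata crush_ruleset 1
# ceph osd pool set cephfs_data_cache crush_ruleset 1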
Take a look at my rados df right now:
pool name            KB           objects   clones  degraded  unfound  rd         rd KB       wr        wr KB
cephfs_data          1792522784   17845293  0       2219      0        20275251   867880582   89877651  3117356568
cephfs_data_cache    88076847     779669    0       0         0        1215633    86279650    1329401   1783581
cephfs_metadata      37739        4604869   0       0         0        134018352  1377574788  99091571  467218571
rbd                  0            0         0       0         0        0          0           0         0
  total used         4133262872   23229831
  total avail        54063412712
  total space        58196675584
So less than 5% of my data reads and less than 2% of my data writes are using the SSD hot storage. The fact that it is doing at least some IOPS on the hot storage tells me the setup itself is correct and that it is just a matter of tuning it right. For comparison, I have another cluster with a few VMs where 95% of the RBD IOPS land on the hot storage, a huge hit rate.
Some more info:
# ceph osd pool ls detail
pool 0 'rbd' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 1 'cephfs_metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 53 flags hashpspool stripe_width 0
pool 2 'cephfs_data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 70 lfor 62 flags hashpspool crash_replay_interval 45 tiers 3 read_tier 3 write_tier 3 stripe_width 0
pool 3 'cephfs_data_cache' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 70 flags hashpspool,incomplete_clones tier_of 2 cache_mode writeback target_bytes 375809638400 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 1200s x1 decay_rate 0 search_last_n 0 stripe_width 0
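For completeness, the tier itself was set up more or less like this (reconstructed from the pool dump above, so the values match what is reported there even if the exact order does not):
# ceph osd tier add cephfs_data cephfs_data_cache
# ceph osd tier cache-mode cephfs_data_cache writeback
# ceph osd tier set-overlay cephfs_data cephfs_data_cache
# ceph osd pool set cephfs_data_cache hit_set_type bloom
# ceph osd pool set cephfs_data_cache hit_set_count 1
# ceph osd pool set cephfs_data_cache hit_set_period 1200
# ceph osd pool set cephfs_data_cache target_max_bytes 375809638400   # 350 GiB of SSD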
# ceph mds dump
dumped fsmap epoch 3474
fs_name cephfs
epoch 3474
flags 0
created 2016-05-07 12:23:46.560778
modified 2016-05-17 02:25:47.843710
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 0
last_failure_osd_epoch 917
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
max_mds 1
in 0
up {0=70341}
failed
damaged
stopped
data_pools 2
metadata_pool 1
inline_data disabled
70341: 10.0.1.4:6808/7241 'd' mds.0.3470 up:active seq 35
# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
So, what I would like is for writes and fresh reads to go straight to the SSDs and stay there until the objects are old and no longer being accessed, and only then get flushed down to the HDs. That is what the docs say happens with writeback cache tiering, so I don't know why it is not working for me here. The knobs I am planning to try next are listed below.
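In case it helps the discussion, these are the cache-tier settings I am planning to experiment with next; the values are guesses on my part and not something I have validated:
# ceph osd pool set cephfs_data_cache min_read_recency_for_promote 1
# ceph osd pool set cephfs_data_cache min_write_recency_for_promote 1
# ceph osd pool set cephfs_data_cache cache_target_dirty_ratio 0.4    # guess
# ceph osd pool set cephfs_data_cache cache_target_full_ratio 0.8     # guess
# ceph osd pool set cephfs_data_cache cache_min_flush_age 600         # seconds, guess
# ceph osd pool set cephfs_data_cache cache_min_evict_age 1800        # seconds, guess
If any of those are clearly the wrong levers for this workload, please say so.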
Thank you very much for any help.
Best,
Daniel Colchete