Hello everyone!
I'm putting CephFS in production here to host Dovecot mailboxes. That's a big use case in the Dovecot community.
Versions:
Ubuntu 14.04 LTS with kernel 4.4.0-22-generic
Ceph 10.2.1-1trusty
CephFS uses the kernel client
Right now I'm migrating my users to this new system, which should amount to about 60 million e-mail files.
Each user has 2 to 4 index files that are constantly changing. What I would like is for those index files and the most recent e-mails to always live on SSDs; old mail is rarely read, so it can go to my HDs. That is my dream scenario, and it is what I had before, using different mount points and a Dovecot feature called alternate storage.
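On the old setup that split was done with Dovecot's alternate storage, roughly like this in dovecot.conf (the paths are made up and mdbox is just the example format; the point is that ALT= points at the HD mount):
# illustrative only: real paths and mailbox format differ
mail_location = mdbox:/ssd/vmail/%d/%n:ALT=/hdd/vmail/%d/%n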
So, right now, CephFS metadata is going to SSD, because I forced it there with a dedicated CRUSH ruleset, and it works great: that is a lot of IOPS kept away from the spinning drives. But data requests, both reads and writes, are mostly going to the HD cold storage.
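For reference, the SSD pools are pointed at that ruleset with something along these lines (the rule and root names here are placeholders, my real crush map uses different names):
# ceph osd crush rule create-simple ssd_ruleset ssd host   # "ssd_ruleset" / "ssd" are placeholder names
# ceph osd pool set cephfs_metadata crush_ruleset 1
# ceph osd pool set cephfs_data_cache crush_ruleset 1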
Take a look at my rados df right now:
pool name            KB           objects   clones  degraded  unfound  rd         rd KB       wr        wr KB
cephfs_data          1792522784   17845293  0       2219      0        20275251   867880582   89877651  3117356568
cephfs_data_cache    88076847     779669    0       0         0        1215633    86279650    1329401   1783581
cephfs_metadata      37739        4604869   0       0         0        134018352  1377574788  99091571  467218571
rbd                  0            0         0       0         0        0          0           0         0
  total used         4133262872   23229831
  total avail        54063412712
  total space        58196675584
So less than 5% of my data reads and less than 2% of my data writes are using the SSD hot storage. The fact that it is doing at least some IOPS on the hot storage tells me the setup itself is correct and that it is just a matter of tuning it right. For comparison, I have another cluster with a few VMs where 95% of the RBD IOPS land on the hot storage, a huge hit rate.
Some more info:
# ceph osd pool ls detail
pool 0 'rbd' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 1 'cephfs_metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 53 flags hashpspool stripe_width 0
pool 2 'cephfs_data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 70 lfor 62 flags hashpspool crash_replay_interval 45 tiers 3 read_tier 3 write_tier 3 stripe_width 0
pool 3 'cephfs_data_cache' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 70 flags hashpspool,incomplete_clones tier_of 2 cache_mode writeback target_bytes 375809638400 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 1200s x1 decay_rate 0 search_last_n 0 stripe_width 0
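For completeness, the tier itself was set up more or less like this (reconstructed from the pool dump above, so the values match what is reported there even if the exact order does not):
# ceph osd tier add cephfs_data cephfs_data_cache
# ceph osd tier cache-mode cephfs_data_cache writeback
# ceph osd tier set-overlay cephfs_data cephfs_data_cache
# ceph osd pool set cephfs_data_cache hit_set_type bloom
# ceph osd pool set cephfs_data_cache hit_set_count 1
# ceph osd pool set cephfs_data_cache hit_set_period 1200
# ceph osd pool set cephfs_data_cache target_max_bytes 375809638400   # 350 GiB of SSD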
# ceph mds dump
dumped fsmap epoch 3474
fs_name cephfs
epoch 3474
flags 0
created 2016-05-07 12:23:46.560778
modified 2016-05-17 02:25:47.843710
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 0
last_failure_osd_epoch 917
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
max_mds 1
in 0
up {0=70341}
failed
damaged
stopped
data_pools 2
metadata_pool 1
inline_data disabled
70341: 10.0.1.4:6808/7241 'd' mds.0.3470 up:active seq 35
# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
So, what I would like is for writes and fresh reads to go straight to the SSDs and stay there until the objects are old and no longer being accessed, and only then get flushed down to the HDs. That is what the docs say happens with writeback cache tiering, so I don't know why it is not working for me here. The knobs I am planning to try next are listed below.
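In case it helps the discussion, these are the cache-tier settings I am planning to experiment with next; the values are guesses on my part and not something I have validated:
# ceph osd pool set cephfs_data_cache min_read_recency_for_promote 1
# ceph osd pool set cephfs_data_cache min_write_recency_for_promote 1
# ceph osd pool set cephfs_data_cache cache_target_dirty_ratio 0.4    # guess
# ceph osd pool set cephfs_data_cache cache_target_full_ratio 0.8     # guess
# ceph osd pool set cephfs_data_cache cache_min_flush_age 600         # seconds, guess
# ceph osd pool set cephfs_data_cache cache_min_evict_age 1800        # seconds, guess
If any of those are clearly the wrong levers for this workload, please say so.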
Thank you very much for any help.
Best,
Daniel Colchete