I've recently configured something like this for a backup cluster with these settings:

ceph osd pool set cache_test hit_set_type bloom
ceph osd pool set cache_test hit_set_count 1
ceph osd pool set cache_test hit_set_period 7200
ceph osd pool set cache_test target_max_bytes 1000000000000
ceph osd pool set cache_test min_read_recency_for_promote 1
ceph osd pool set cache_test min_write_recency_for_promote 0
ceph osd pool set cache_test cache_target_dirty_ratio 0.00001
ceph osd pool set cache_test cache_target_dirty_high_ratio 0.33
ceph osd pool set cache_test cache_target_full_ratio 0.8

The goal here was just to handle bad IO patterns generated by bad backup
software (why do they love to run with a stupidly low queue depth and
small IOs?). It's not ideal, and it doesn't really match your use case,
since the data in question isn't read back here.

But yeah, I've also thought about building a specialized cache mode that
just acts as a write buffer; there are quite a few applications that
would benefit from that.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Dec 2, 2019 at 11:40 PM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>
> I'd like to configure a cache tier to act as a write buffer, so that
> when writes come in, it promotes objects, but reads never promote an
> object. We have a lot of cold data, so we would like to tier down to an
> EC pool (CephFS) after a period of about 30 days to save space. The
> storage tier and the 'cache' tier would be on the same spindles, so the
> only performance improvement would come from the faster replicated
> writes. So we don't really want to move data between tiers.
>
> The idea would be to not promote on read, since EC read performance is
> good enough, and to have writes go to the cache tier, where the data may
> be 'hot' for a week or so, then go cold.
>
> It seems that we would only need one hit_set, and if -1 can't be set
> for min_read_recency_for_promote, I could probably use 2, which would
> never hit because there is only one set (though that may error too).
> The follow-up is how big a set should be: it only really tells whether
> an object "may" be in the cache and does not determine when things are
> flushed, so all that really matters is how out of date we are okay with
> the bloom filter being, right? So we could make the period a day long
> if we are okay with that staleness? Is there any advantage to having a
> longer period for a bloom filter? Now I'm starting to wonder if I even
> need a bloom filter for this use case: can I get tiering to work
> without it and only use cache_min_flush_age/cache_min_evict_age, since
> I don't care about promoting when there are X hits in Y time?
>
> Thanks
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
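For anyone wanting to experiment with the write-buffer idea discussed in
this thread, a rough sketch of the tier setup might look like the
following. The pool names (ecpool, cachepool) are placeholders, the
numeric values are untested assumptions rather than recommendations, and
whether min_read_recency_for_promote=2 with a single hit set cleanly
prevents read promotions (or errors out) is exactly the open question
above:

```shell
# Attach a replicated pool as a writeback tier in front of an EC base
# pool. Pool names are hypothetical.
ceph osd tier add ecpool cachepool
ceph osd tier cache-mode cachepool writeback
ceph osd tier set-overlay ecpool cachepool

# A bloom hit set is still required for tiering to function; one short
# hit set keeps the tracking overhead minimal.
ceph osd pool set cachepool hit_set_type bloom
ceph osd pool set cachepool hit_set_count 1
ceph osd pool set cachepool hit_set_period 3600

# Promote on the first write, but require more read recency than can
# exist: with only 1 hit set, a read recency of 2 should never be met.
ceph osd pool set cachepool min_write_recency_for_promote 0
ceph osd pool set cachepool min_read_recency_for_promote 2

# Keep objects in the tier for ~30 days (2592000 s) before they become
# eligible for flush/eviction, and cap the tier's size.
ceph osd pool set cachepool cache_min_flush_age 2592000
ceph osd pool set cachepool cache_min_evict_age 2592000
ceph osd pool set cachepool target_max_bytes 1000000000000
```

Note that cache_min_flush_age/cache_min_evict_age only set a lower bound
on object age; the dirty/full ratios still decide when the tiering agent
actually starts flushing, so both need to be tuned together.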