Hi Robert,

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Robert LeBlanc
> Sent: 18 November 2015 00:47
> To: Ceph-User <ceph-users@xxxxxxxx>
> Subject: SSD Caching Mode Question
>
> We are inserting an SSD tier into our very busy cluster and I have a
> question regarding writeback and forward modes.
>
> Writeback is the "normal" mode for RBD with VMs. When we put the tier
> in writeback mode, we see objects being promoted, and once the ratio is
> reached objects are evicted; this works as expected. When we place the
> tier into forward mode, we don't see any objects being evicted to the
> base tier when they are written to, as described in the manual [1].
> Is this a bug? We are running 0.94.5.
>
> Now, I usually like things to work the way they are described in the
> manual; however, this "bug" is a bit advantageous for us. It appears
> that we don't have enough IOPS in the SSD tier to handle the steady
> state (we still have some more SSDs to add, but that requires shuffling
> hardware around). However, when we put the tier into forward mode, the
> latency drops and we get much more performance from the Ceph cluster.
> In writeback mode we seem to be capped at about 9K IOPS according to
> ceph -w, with spikes up to about 15K. In forward mode, however, we can
> hit 65K IOPS and sustain a steady state near 30K IOPS. I'm linking two
> graphs to show what I'm describing (for some reason the graphs seem to
> show half of what is reported by ceph -w). [2][3]

I don't know whether your lower performance is due to unwanted
promotions to the cache or something else. I have found that, with the
way the cache logic currently works, unless the bulk of your working set
fits in the cache tier, the overhead of the promotions/flushes/evictions
can impose a significant penalty. This is especially true if you are
doing IO that is small compared to the object size.
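For anyone following along, the mode flips being compared above are done
with "ceph osd tier cache-mode"; the pool name "hot-tier" below is a
placeholder, substitute your own cache pool (and note these are cluster
commands, shown here only as a reference fragment):

```
# Normal operation: cache tier absorbs both reads and writes
ceph osd tier cache-mode hot-tier writeback

# Forward mode: redirect new and modified objects to the base tier
# (the behaviour whose performance is being discussed in this thread)
ceph osd tier cache-mode hot-tier forward
```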
I believe this may be caused by the read being serviced only after the
promotion completes, rather than the read being served from the base
tier and the object then being promoted asynchronously.

> Does the promote/evict logic really add that much latency? It seems
> that overall the tier performance can be very good. We are using three
> hit sets with 10 minutes per set, and all three sets have to have a
> read to promote an object (we don't want to promote isolated reads).
> Does someone have suggestions for getting forward-like performance in
> writeback mode?

When you say you are using 3 hit sets and require 3 reads to promote, is
this via the min_read_recency_for_promote variable? My understanding is
that if it is set to 3, an object will be promoted on a hit in any of
the last 3 hit sets. The description in the documentation isn't very
clear, but looking through the code seems to support this. If you have
found a way to promote only when there is a hit in all 3 hit sets, I
would be very interested in hearing about it, as it would be very useful
to me.

> We have 35 1TB Micron M600 drives (26K IOPS single-thread direct sync
> 4K random writes, 43K in a two-thread test; we are already aware of
> the potential power-loss issue, so you don't need to bring that up) in
> 3x replication. Our current hot set is about 4.5TB and only shifts by
> about 30% over a week's time. We have cache_target_full_ratio set to
> 0.55 so that we leave a good part of each drive empty for performance.
> Also, about 90% of our reads are in 10% of the working set, and 80% of
> our writes are in about 20% of the working set.
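To make the distinction concrete, here is a small sketch (my own model,
not Ceph source code) of the two promotion policies being contrasted:
"hit in any of the last N hit sets", which is what
min_read_recency_for_promote appears to implement in Hammer, versus the
stricter "hit in all N hit sets" described above. The class and method
names are hypothetical; only the hit_set_count=3 / hit_set_period=600s
configuration is taken from the thread.

```python
from collections import deque

class HitSets:
    """Toy model of a cache tier's rotating hit sets."""

    def __init__(self, count=3):
        # Newest-first window of recent hit sets; the poster runs
        # hit_set_count=3 with hit_set_period=600 (10 minutes per set).
        self.sets = deque(maxlen=count)

    def rotate(self):
        # Start a new (empty) hit set; the oldest falls off the window.
        self.sets.appendleft(set())

    def record(self, obj):
        # A read of `obj` lands in the current (newest) hit set.
        if self.sets:
            self.sets[0].add(obj)

    def promote_any(self, obj, recency):
        # Hammer-style semantics: promote on a hit in ANY of the
        # newest `recency` hit sets.
        return any(obj in s for s in list(self.sets)[:recency])

    def promote_all(self, obj, recency):
        # Stricter policy discussed above: promote only on a hit in
        # EVERY one of the newest `recency` hit sets.
        window = list(self.sets)[:recency]
        return len(window) == recency and all(obj in s for s in window)
```

Under the "any" policy a single isolated read in the newest hit set is
enough to trigger a promotion, which is exactly the behaviour the poster
is trying to avoid; the "all" policy would only promote objects read in
each of the last three 10-minute windows.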
> [1] http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-writeback-cache
> [2] http://robert.leblancnet.us/files/performance.png
> [3] http://robert.leblancnet.us/files/promote_evict.png
>
> Thanks,
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com