My read of that doc is that you still need to either set the configs to
force all objects to be flushed or use the rados command to flush/evict
all objects.
-Sam

On Wed, Nov 18, 2015 at 2:38 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> Hi Robert,
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>> Robert LeBlanc
>> Sent: 18 November 2015 00:47
>> To: Ceph-User <ceph-users@xxxxxxxx>
>> Subject: SSD Caching Mode Question
>>
>> We are inserting an SSD tier into our very busy cluster and I have a
>> question regarding writeback and forward modes.
>>
>> Writeback is the "normal" mode for RBD with VMs. When we put the tier in
>> writeback mode we see objects being promoted, and once the ratio is
>> reached objects are evicted; this works as expected. When we place the
>> tier into forward mode, however, we don't see any objects being evicted
>> to the base tier when they are written to, as described in the manual [1].
>> Is this a bug? We are running 0.94.5.
>>
>> Now, I usually like things to work the way they are described in the
>> manual, but this "bug" is a bit advantageous for us. It appears that we
>> don't have enough IOPS in the SSD tier to handle the steady state (we
>> still have some more SSDs to add in, but it requires shuffling hardware
>> around). However, when we put the tier into forward mode, the latency
>> drops and we get much more performance from the Ceph cluster. In
>> writeback we seem to be capped at about 9K IOPS according to ceph -w,
>> with spikes up to about 15K. In forward mode we can hit 65K IOPS and
>> have a steady state near 30K IOPS. I'm linking two graphs to show what
>> I'm describing (for some reason the graphs seem to be half of what is
>> reported by ceph -w). [2][3]
>>
>
> I don't know if your lower performance is due to unwanted promotions to
> the cache or if you are seeing something else. I have found that, the way
> the cache logic currently works, unless the bulk of your working set fits
> in the cache tier, the overhead of the promotions/flushes/evictions can
> cause a significant penalty. This is especially true if you are doing IO
> which is small compared to the object size. I believe this may be caused
> by the read being serviced only after the promotion completes, rather
> than the read being served from the base tier and the object then
> promoted asynchronously.
>
>> Does the promote/evict logic really add that much latency? It seems that
>> overall the tier performance can be very good. We are using three hit
>> sets with 10 minutes per set, and all three sets have to have a read to
>> promote an object (we don't want to promote isolated reads). Does someone
>> have some suggestions for getting the forward-like performance in
>> writeback?
>
> When you say you are using 3 hit sets and require 3 reads to promote, is
> this via the min_read_recency variable? My understanding was that if it
> is set to 3 it will promote if it finds a hit in any of the last 3
> hitsets. The description isn't that clear in the documentation, but
> looking through the code seems to support this. If you have found a way
> to only promote when there is a hit in all 3 hitsets I would be very
> interested in hearing about it, as it would be very useful to me.
>
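For reference, the hit-set and recency settings discussed above are
ordinary per-pool options. A minimal sketch of the configuration Robert
describes (three hit sets of 10 minutes each), assuming bloom-type hit
sets and a cache pool named hot-pool (a placeholder, not from this
thread):

    # track recent access in three bloom-filter hit sets of 600 seconds each
    ceph osd pool set hot-pool hit_set_type bloom
    ceph osd pool set hot-pool hit_set_count 3
    ceph osd pool set hot-pool hit_set_period 600

    # recency check on reads; per Nick's reading of the code, a hit in ANY
    # of the last 3 hit sets (not all 3) is enough to trigger a promotion
    ceph osd pool set hot-pool min_read_recency_for_promote 3

Whether this gives the "hit in all 3 sets" behaviour Robert wants is
exactly the open question in Nick's reply.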
>> We have 35 1 TB Micron M600 drives (26K single-thread direct sync 4K
>> random writes, 43K in a two-thread test; we are already aware of the
>> potential power-loss issue so you don't need to bring that up) in 3x
>> replication. Our current hot set is about 4.5 TB and only shifts by
>> about 30% over a week's time. We have cache_target_full_ratio set to
>> 0.55 so that we leave a good part of each drive empty for performance.
>> Also, about 90% of our reads are in 10% of the working set and 80% of
>> our writes are in about 20% of the working set.
>>
>> [1] http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-writeback-cache
>> [2] http://robert.leblancnet.us/files/performance.png
>> [3] http://robert.leblancnet.us/files/promote_evict.png
>>
>> Thanks,
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
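For completeness, the flush/evict step Sam refers to at the top follows
the "Removing a Writeback Cache" procedure linked in [1]. A rough sketch,
again assuming a cache pool named hot-pool (a placeholder):

    # stop the cache tier handling new writes; IO is redirected to the base tier
    ceph osd tier cache-mode hot-pool forward

    # flush dirty objects and evict everything still held in the cache pool
    rados -p hot-pool cache-flush-evict-all

    # confirm the cache pool is empty before removing the tier
    rados -p hot-pool ls

The other route Sam mentions is to lower the target ratios on the cache
pool (cache_target_dirty_ratio / cache_target_full_ratio, the latter being
the knob Robert already has at 0.55) so the tiering agent flushes and
evicts the objects on its own.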