Re: SSD Caching Mode Question

Hmmm. It sounds like some objects should be flushed automatically, but
maybe not all of them. However, I'm not seeing any objects being
evicted at all, and I know that objects in the tier are being modified.
The doc [1] says:

1. Change the cache mode to forward so that new and modified objects
   will flush to the backing storage pool.
2. Ensure that the cache pool has been flushed. This may take a few
   minutes. If the cache pool still has objects, you can flush them
   manually. For example:
So I'm concerned that there is a disconnect between what is actually
happening and what is expected to happen. Nick, are you seeing
objects being evicted when in forward mode? It may be as simple as
updating the document.

The other thing is the massive performance difference between
writeback and forward. Nick, are you seeing something similar in your
environment?
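
(For anyone who wants to double-check the same thing on their own cluster:
the cache pool's object count shows up in "ceph df detail", and, assuming
the tier_* perf counter names I remember from Hammer, the tiering agent's
activity can be watched with something like:

    ceph daemon osd.0 perf dump | grep tier_

That should make it obvious whether any flushing or evicting is happening
at all.)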
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Nov 23, 2015 at 9:49 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:
> My read of that doc is that you still need to either set the configs
> to force all objects to be flushed or use the rados command to
> flush/evict all objects.
> -Sam
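
(For the archives: I read "set the configs" as dropping the flush/evict
targets on the cache pool to zero, and "the rados command" as the
cache-flush-evict-all call from the doc. Treat the exact knobs as my guess
rather than Sam's; roughly:

    ceph osd pool set {cachepool} cache_target_dirty_ratio 0.0
    ceph osd pool set {cachepool} cache_target_full_ratio 0.0
    rados -p {cachepool} cache-flush-evict-all

Either route should empty the tier before tearing it down.)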
>
> On Wed, Nov 18, 2015 at 2:38 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> Hi Robert,
>>
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>>> Robert LeBlanc
>>> Sent: 18 November 2015 00:47
>>> To: Ceph-User <ceph-users@xxxxxxxx>
>>> Subject:  SSD Caching Mode Question
>>>
>>> We are inserting an SSD tier into our very busy cluster and I have a
>>> question regarding writeback and forward modes.
>>>
>>> Writeback is the "normal" mode for RBD with VMs. When we put the tier in
>>> writeback mode we see objects being promoted, and once the ratio is
>>> reached objects are evicted; this works as expected. When we place the
>>> tier into forward mode, we don't see any objects being evicted to the
>>> base tier when they are written to, as described in the manual [1].
>>> Is this a bug? We are running 0.94.5.
>>>
>>> Now, I usually like things to work the way they are described in the
>>> manual; however, this "bug" is a bit advantageous for us. It appears that
>>> we don't have enough IOPs in the SSD tier to handle the steady state (we
>>> still have some more SSDs to add in, but it requires shuffling hardware
>>> around). However, when we put the tier into forward mode, the latency
>>> drops and we get much more performance from the Ceph cluster. In
>>> writeback we seem to be capped at about 9K IOPs according to ceph -w,
>>> with spikes up to about 15K. However, in forward mode we can hit 65K
>>> IOPs and have a steady state near 30K IOPs. I'm linking two graphs to
>>> show what I'm describing (for some reason the graphs seem to be half of
>>> what is reported by ceph -w). [2][3]
>>>
>>
>> I don't know if your lower performance is due to unwanted promotions to
>> the cache or if you are seeing something else. I have found that, the way
>> the cache logic currently works, unless the bulk of your working set fits
>> in the cache tier, the overhead of the promotions/flushes/evictions can
>> cause a significant penalty. This is especially true if you are doing IO
>> which is small compared to the object size. I believe this may be caused
>> by the read being serviced only after the promotion completes, rather
>> than the read being served from the base tier and the object then
>> promoted asynchronously.
>>
>>> Does the promote/evict logic really add that much latency? It seems that
>>> overall the tier performance can be very good. We are using three hit
>>> sets with 10 minutes per set, and all three sets have to have a read for
>>> an object to be promoted (we don't want to promote isolated reads). Does
>>> anyone have suggestions for getting forward-like performance in
>>> writeback?
>>
>> When you say you are using 3 hit sets and require 3 reads to promote, is
>> this via the min_read_recency variable? My understanding was that if it
>> is set to 3, it will promote if it finds a hit in any of the last 3 hit
>> sets. The description isn't that clear in the documentation, but looking
>> through the code seems to support this. If you have found a way to only
>> promote when there is a hit in all 3 hit sets, I would be very interested
>> in hearing about it, as it would be very useful to me.
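
(For reference, these are pool settings. On a cache pool that I'll call
"hot-pool" just for illustration, the three-hit-set, ten-minute setup being
discussed would look roughly like this, with the recency option spelled
min_read_recency_for_promote in Hammer as far as I can tell:

    ceph osd pool set hot-pool hit_set_type bloom
    ceph osd pool set hot-pool hit_set_count 3
    ceph osd pool set hot-pool hit_set_period 600
    ceph osd pool set hot-pool min_read_recency_for_promote 3

If Nick's reading of the code is right, that promotes on a hit in any of
the last three hit sets rather than requiring all three.)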
>>
>>>
>>> We have 35 1 TB Micron M600 drives (26K single-thread direct sync 4K
>>> random writes, 43K in a two-thread test; we are already aware of the
>>> potential power-loss issue, so you don't need to bring that up) in 3x
>>> replication. Our current hot set is about 4.5 TB and only shifts by
>>> about 30% over a week's time. We have cache_target_full_ratio set to
>>> 0.55 so that we leave a good part of the drives empty for performance.
>>> Also, about 90% of our reads are in 10% of the working set, and 80% of
>>> our writes are in about 20% of the working set.
>>>
>>> [1] http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-writeback-cache
>>> [2] http://robert.leblancnet.us/files/performance.png
>>> [3] http://robert.leblancnet.us/files/promote_evict.png
>>>
>>>
>>> Thanks,
>>> ----------------
>>> Robert LeBlanc
>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


