Re: SSD Caching Mode Question

> -----Original Message-----
> From: Robert LeBlanc [mailto:robert@xxxxxxxxxxxxx]
> Sent: 23 November 2015 17:16
> To: Samuel Just <sjust@xxxxxxxxxx>
> Cc: Nick Fisk <nick@xxxxxxxxxx>; Ceph-User <ceph-users@xxxxxxxx>
> Subject: Re:  SSD Caching Mode Question
> 
> Hmmm. It sounds like some objects should be flushed automatically but
> maybe not all of them. However, I'm not seeing any objects being evicted at
> all and I know that objects in the tier are being modified.
> 
> 1. Change the cache mode to forward so that new and modified objects will
> flush to the backing storage pool.
> 2. Ensure that the cache pool has been flushed. This may take a few minutes:
>     If the cache pool still has objects, you can flush them manually.
> For example:
> 
> So I'm concerned that there is a disconnect between what is actually
> happening and what is expected to happen. Nick, are you seeing objects
> being evicted when in forward mode? It may be as simple as updating the
> document.
> 

I'm not actually running forward mode, so I can't comment from a live system. A quick skim of the code doesn't reveal anything that should affect the eviction logic. Keep in mind, though, that no new blocks will be promoted, so nothing will in turn get pushed out/evicted. I would have thought flushing should still occur if the number of dirty blocks exceeds the target.
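
For reference, the procedure in the cache-tiering doc Robert linked [1] comes down to roughly the following; "hot-storage" is just a placeholder for the cache pool name, and lowering the dirty ratio to force earlier flushing is only a suggestion, not something taken from the doc:

    # switch the tier to forward so new/modified objects go to the backing pool
    ceph osd tier cache-mode hot-storage forward
    # manually flush and evict whatever is still sitting in the cache pool
    rados -p hot-storage cache-flush-evict-all
    # check whether any objects are left behind
    rados -p hot-storage ls
    # optionally nudge the tiering agent by lowering the dirty target (illustrative value)
    ceph osd pool set hot-storage cache_target_dirty_ratio 0.4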

> The other thing is the massive performance difference between writeback
> and forward. Nick, are you seeing something similar in your environment in
> this regard?

I can see in the code that the "forward" cache mode uses redirects instead of proxy reads; I wonder if this is the cause? There are a couple of additional cache modes that don't seem to be documented. "readproxy" is one of them: it uses proxy reads and in theory should give you similar functionality. It will always promote on write, but will not promote any reads.
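
If you want to test that theory, switching modes is a single pool-tier command. A rough sketch, assuming readproxy is actually available in your 0.94.5 build ("hot-storage" again being a placeholder pool name):

    # proxy reads through the cache tier instead of redirecting clients to the base pool
    ceph osd tier cache-mode hot-storage readproxy
    # switch back to writeback once you have finished comparing
    ceph osd tier cache-mode hot-storage writeback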

Actually, I bet your problem is that you are suffering from too many promotions. I'm working on a couple of changes which should hopefully improve this behaviour, but currently Ceph isn't really tracking hot blocks very well. In your config example, any 2 hits to an object within 30 minutes will cause a promotion and subsequent eviction; that's a lot of wasted IO. I should know more in the morning (UK time) about whether my changes have worked, and will hopefully present something at the perf meeting on Wednesday.
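
As a rough illustration of the knobs involved (the pool name and values below are placeholders, not recommendations), the promotion behaviour is driven by the hit set settings, and the OSD perf counters should give an idea of how much promote/flush/evict traffic is actually happening:

    # hit set tracking: e.g. 3 bloom-filter hit sets of 10 minutes (600 s) each
    ceph osd pool set hot-storage hit_set_type bloom
    ceph osd pool set hot-storage hit_set_count 3
    ceph osd pool set hot-storage hit_set_period 600
    # rough view of tiering activity on one OSD (run on the host carrying osd.0)
    ceph daemon osd.0 perf dump | grep tier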

Nick

> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> 
> 
> On Mon, Nov 23, 2015 at 9:49 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:
> > My read of that doc is that you still need to either set the configs
> > to force all objects to be flushed or use the rados command to
> > flush/evict all objects.
> > -Sam
> >
> > On Wed, Nov 18, 2015 at 2:38 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >> Hi Robert,
> >>
> >>> -----Original Message-----
> >>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
> >>> Behalf Of Robert LeBlanc
> >>> Sent: 18 November 2015 00:47
> >>> To: Ceph-User <ceph-users@xxxxxxxx>
> >>> Subject:  SSD Caching Mode Question
> >>>
> >>> We are inserting an SSD tier into our very busy cluster and I have a
> >>> question regarding writeback and forward modes.
> >>>
> >>> Writeback is the "normal" mode for RBD with VMs. When we put the
> >>> tier in writeback mode we see objects being promoted and, once the
> >>> ratio is reached, objects being evicted; this works as expected.
> >>> When we place the tier into forward mode, we don't see any objects
> >>> being evicted to the base tier when they are written to, as
> >>> described in the manual [1].
> >>> Is this a bug? We are running 0.94.5.
> >>>
> >>> Now, I usually like things to work the way they are described in
> >>> the manual,
> >>> however this "bug" is a bit advantageous for us. It appears that we
> >>> don't have enough IOPs in the SSD tier to handle the steady state
> >>> (we still have
> >>> some more SSDs to add in, but it requires shuffling hardware around).
> >>> However, when we put the tier into forward mode, the latency drops
> >>> and we get much more performance from the Ceph cluster. In
> >>> writeback we seem to be capped at about 9K IOPs according to
> >>> ceph -w, with spikes up to about 15K. However, in forward mode we
> >>> can hit 65K IOPs and have a steady state near 30K IOPs. I'm linking
> >>> two graphs to show what I'm describing (for some reason the graphs
> >>> seem to be half of what is reported by ceph -w). [2][3]
> >>>
> >>
> >> I don't know if your lower performance is due to unwanted promotions
> >> to cache or if you are seeing something else. I have found that,
> >> the way the cache logic currently works, unless the bulk of your
> >> working set fits in the cache tier, the overhead of the
> >> promotions/flushes/evictions can cause a significant penalty. This is
> >> especially true if you are doing IO which is small compared to the
> >> object size. I believe this may be caused by the read being serviced
> >> after the promotion, rather than the read being served from the base
> >> tier and then promoted asynchronously.
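
(To put rough numbers on that: with the default 4 MB RBD object size, a 4 KB read that triggers a promotion pulls a whole 4 MB object into the cache tier, and with 3x replication that is roughly 12 MB written to the SSDs to serve 4 KB of client IO, an amplification on the order of 3000:1. The exact figures depend on the object size in use, so treat this purely as an illustration.)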
> >>
> >>> Does the promote/evict logic really add that much latency? It seems
> >>> that overall the tier performance can be very good. We are using
> >>> three hit sets with 10 minutes per set and all three sets have to
> >>> have a read to promote it (we don't want to promote isolated
> >>> reads). Does someone have some suggestions for getting the
> >>> forward-like performance in writeback?
> >>
> >> When you say you are using 3 hit sets and require 3 reads to promote,
> >> is this via the min_read_recency_for_promote variable? My
> >> understanding was that if set to 3 it will promote if it finds a hit
> >> in any of the last 3 hitsets. The description isn't that clear in
> >> the documentation, but looking through the code seems to support
> >> this. If you have found a
> >> way to only promote when there is a hit in all 3 hitsets I would be
> >> very interested in hearing about it as it would be very useful to me.
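
(For anyone following along: the setting being discussed is a per-pool value, set along the lines shown below with "hot-storage" as a placeholder cache pool name, and on the reading of the code above a value of 3 means "promote on a hit in any of the last 3 hit sets", not "in all 3".)

    ceph osd pool set hot-storage min_read_recency_for_promote 3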
> >>
> >>>
> >>> We have 35 1 TB Micron M600 drives (26K single-thread direct sync
> >>> 4K random writes, 43K two-thread test; we are already aware of the
> >>> potential power loss issue so you don't need to bring that up) in
> >>> 3x replication. Our current hot set is about 4.5 TB and only shifts
> >>> by about 30% over a
> >>> week's time. We have cache_target_full_ratio set to
> >>> 0.55 so that we leave a good part of the drive empty for performance.
> >>> Also about 90% of our reads are in 10% of the working set and 80% of
> >>> our writes are in about 20% of the working set.
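
(For reference, those targets are per-pool settings along these lines; "hot-storage" is a placeholder pool name and the byte figure is purely illustrative.)

    # absolute cap for the cache pool; the flush/evict ratios are relative to this
    ceph osd pool set hot-storage target_max_bytes 4398046511104   # 4 TiB, illustrative
    ceph osd pool set hot-storage cache_target_full_ratio 0.55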
> >>>
> >>> [1] http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-writeback-cache
> >>> [2] http://robert.leblancnet.us/files/performance.png
> >>> [3] http://robert.leblancnet.us/files/promote_evict.png
> >>>
> >>>
> >>> Thanks,
> >>> ----------------
> >>> Robert LeBlanc
> >>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



