Hi Robert,

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Robert LeBlanc
> Sent: 18 November 2015 00:47
> To: Ceph-User <ceph-users@xxxxxxxx>
> Subject: SSD Caching Mode Question
>
> We are inserting an SSD tier into our very busy cluster and I have a
> question regarding writeback and forward modes.
>
> Writeback is the "normal" mode for RBD with VMs. When we put the tier
> in writeback mode, we see objects being promoted, and once the ratio is
> reached objects are evicted; this works as expected. When we place the
> tier into forward mode, we don't see any objects being evicted to the
> base tier when they are written to, as described in the manual [1].
> Is this a bug? We are running 0.94.5.
>
> Now, I usually like things to work the way they are described in the
> manual; however, this "bug" is a bit advantageous for us. It appears
> that we don't have enough IOPS in the SSD tier to handle the steady
> state (we still have some more SSDs to add, but that requires shuffling
> hardware around). However, when we put the tier into forward mode, the
> latency drops and we get much more performance from the Ceph cluster.
> In writeback mode we seem to be capped at about 9K IOPS according to
> ceph -w, with spikes up to about 15K. In forward mode, however, we can
> hit 65K IOPS and sustain a steady state near 30K IOPS. I'm linking two
> graphs to show what I'm describing (for some reason the graphs seem to
> show half of what is reported by ceph -w). [2][3]

I don't know whether your lower performance is due to unwanted
promotions to the cache or something else. I have found that, with the
way the cache logic currently works, unless the bulk of your working set
fits in the cache tier, the overhead of the promotions/flushes/evictions
can impose a significant penalty. This is especially true if you are
doing IO that is small compared to the object size.
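For anyone following along, the mode flips being compared above are done
with "ceph osd tier cache-mode"; the pool name "hot-tier" below is a
placeholder, substitute your own cache pool (and note these are cluster
commands, shown here only as a reference fragment):

```
# Normal operation: cache tier absorbs both reads and writes
ceph osd tier cache-mode hot-tier writeback

# Forward mode: redirect new and modified objects to the base tier
# (the behaviour whose performance is being discussed in this thread)
ceph osd tier cache-mode hot-tier forward
```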
I believe this may be caused by the read being serviced only after the
promotion completes, rather than the read being served from the base
tier and the object then being promoted asynchronously.

> Does the promote/evict logic really add that much latency? It seems
> that overall the tier performance can be very good. We are using three
> hit sets with 10 minutes per set, and all three sets have to have a
> read to promote an object (we don't want to promote isolated reads).
> Does someone have suggestions for getting forward-like performance in
> writeback mode?

When you say you are using 3 hit sets and require 3 reads to promote, is
this via the min_read_recency_for_promote variable? My understanding is
that if it is set to 3, an object will be promoted on a hit in any of
the last 3 hit sets. The description in the documentation isn't very
clear, but looking through the code seems to support this. If you have
found a way to promote only when there is a hit in all 3 hit sets, I
would be very interested in hearing about it, as it would be very useful
to me.

> We have 35 1TB Micron M600 drives (26K IOPS single-thread direct sync
> 4K random writes, 43K in a two-thread test; we are already aware of
> the potential power-loss issue, so you don't need to bring that up) in
> 3x replication. Our current hot set is about 4.5TB and only shifts by
> about 30% over a week's time. We have cache_target_full_ratio set to
> 0.55 so that we leave a good part of each drive empty for performance.
> Also, about 90% of our reads are in 10% of the working set, and 80% of
> our writes are in about 20% of the working set.
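To make the distinction concrete, here is a small sketch (my own model,
not Ceph source code) of the two promotion policies being contrasted:
"hit in any of the last N hit sets", which is what
min_read_recency_for_promote appears to implement in Hammer, versus the
stricter "hit in all N hit sets" described above. The class and method
names are hypothetical; only the hit_set_count=3 / hit_set_period=600s
configuration is taken from the thread.

```python
from collections import deque

class HitSets:
    """Toy model of a cache tier's rotating hit sets."""

    def __init__(self, count=3):
        # Newest-first window of recent hit sets; the poster runs
        # hit_set_count=3 with hit_set_period=600 (10 minutes per set).
        self.sets = deque(maxlen=count)

    def rotate(self):
        # Start a new (empty) hit set; the oldest falls off the window.
        self.sets.appendleft(set())

    def record(self, obj):
        # A read of `obj` lands in the current (newest) hit set.
        if self.sets:
            self.sets[0].add(obj)

    def promote_any(self, obj, recency):
        # Hammer-style semantics: promote on a hit in ANY of the
        # newest `recency` hit sets.
        return any(obj in s for s in list(self.sets)[:recency])

    def promote_all(self, obj, recency):
        # Stricter policy discussed above: promote only on a hit in
        # EVERY one of the newest `recency` hit sets.
        window = list(self.sets)[:recency]
        return len(window) == recency and all(obj in s for s in window)
```

Under the "any" policy a single isolated read in the newest hit set is
enough to trigger a promotion, which is exactly the behaviour the poster
is trying to avoid; the "all" policy would only promote objects read in
each of the last three 10-minute windows.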
> [1] http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-writeback-cache
> [2] http://robert.leblancnet.us/files/performance.png
> [3] http://robert.leblancnet.us/files/promote_evict.png
>
> Thanks,
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com