SSD Caching Mode Question

We are inserting an SSD tier into our very busy cluster and I have a
question regarding writeback and forward modes.

Writeback is the "normal" mode for RBD with VMs. When we put the tier
into writeback mode we see objects being promoted, and once the ratio
is reached objects are evicted; this works as expected. When we place
the tier into forward mode, we don't see any objects being evicted to
the base tier when they are written to, as described in the manual [1].
Is this a bug? We are running 0.94.5.
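
For reference, we switch between the two modes with the standard
cache-mode commands (a sketch; "hot-pool" is a placeholder for our
actual cache pool name):

  ceph osd tier cache-mode hot-pool writeback
  ceph osd tier cache-mode hot-pool forward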

Now, I usually like things to work the way they are described in the
manual, but this "bug" is a bit advantageous for us. It appears that
we don't have enough IOPS in the SSD tier to handle the steady state
(we still have some more SSDs to add, but that requires shuffling
hardware around). When we put the tier into forward mode, latency
drops and we get much more performance from the Ceph cluster. In
writeback mode we seem to be capped at about 9K IOPS according to
ceph -w, with spikes up to about 15K. In forward mode we can hit 65K
IOPS and have a steady state near 30K IOPS. I'm linking two graphs to
show what I'm describing (for some reason the graphs seem to show half
of what is reported by ceph -w). [2][3]

Does the promote/evict logic really add that much latency? It seems
that overall the tier performance can be very good. We are using three
hit sets with 10 minutes per set, and an object has to be read in all
three sets before it is promoted (we don't want to promote isolated
reads). Does anyone have suggestions for getting forward-like
performance in writeback mode?
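
Roughly how the hit sets are configured, for reference (a sketch;
"hot-pool" is again a placeholder, and min_read_recency_for_promote is
the setting behind the "read in all three sets" behavior):

  ceph osd pool set hot-pool hit_set_type bloom
  ceph osd pool set hot-pool hit_set_count 3
  ceph osd pool set hot-pool hit_set_period 600
  ceph osd pool set hot-pool min_read_recency_for_promote 3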

We have 35 x 1 TB Micron M600 drives (26K IOPS single-thread direct
sync 4K random writes, 43K with two threads; we are already aware of
the potential power-loss issue, so you don't need to bring that up) in
3x replication. Our current hot set is about 4.5 TB and only shifts by
about 30% over a week's time. We have cache_target_full_ratio set to
0.55 so that we leave a good part of each drive empty for performance.
Also, about 90% of our reads are in 10% of the working set and 80% of
our writes are in about 20% of the working set.
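
For anyone wanting to reproduce the drive numbers, an fio run along
these lines matches the test described above (a sketch; the device
path is a placeholder, and numjobs=2 gives the two-thread figure):

  fio --name=4k-sync-randwrite --filename=/dev/sdX --direct=1 --sync=1 \
      --rw=randwrite --bs=4k --numjobs=1 --iodepth=1 \
      --runtime=60 --time_based --group_reporting

The full ratio is set on the cache pool in the usual way:

  ceph osd pool set hot-pool cache_target_full_ratio 0.55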

[1] http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-writeback-cache
[2] http://robert.leblancnet.us/files/performance.png
[3] http://robert.leblancnet.us/files/promote_evict.png


Thanks,
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


