I think it depends. If there are no writes, then there probably won't be any blocking if there are fewer than min_size OSDs to service a PG. In an RBD workload, that is highly unlikely. If there is no blocking, then setting the full_ratio to near zero should flush out the blocks even with fewer than min_size, but any write might stall it all. You would have to test it. Just be aware that you are trying something I haven't heard of anyone doing, so it may or may not work. Do a lot of testing, think through all the failure scenarios that might occur, and try them.

Honestly, if you are writing to it, I wouldn't trust less than size=3, min_size=2 with automatic recovery. If you only have two copies and there is corruption, you can't easily tell which one is the right one; with three you at least have a vote. Ceph is also supposed to get smarter about recovery and use that voting to auto-recover the "best" candidate, and I hope that with hashing it will be a slam dunk for automatic recovery.
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1


On Wed, Feb 3, 2016 at 5:07 PM, Mihai Gheorghe <mcapsali@xxxxxxxxx> wrote:
> Does the cache pool flush when a minimum ratio is set if the pool doesn't meet min_size? I mean, does Ceph block only writes when an OSD fails in a pool of size 2, or does it block reads too?
>
> Because on paper it looks good for a small cache pool: in case of an OSD failure, set the lowest ratio to flush, wait for it to finish, and then set the pool to forward mode, or disable it completely until it's fixed.
>
> On 4 Feb 2016 01:57, "Robert LeBlanc" <robert@xxxxxxxxxxxxx> wrote:
>>
>> My experience with Hammer is showing that setting the pool to forward mode is not evicting objects, nor do I think it is flushing objects. We have had our pool in forward mode for weeks now and we still see almost the same amount of I/O to it. There has been a slight shift between SSD and HDD, but I think that is because some objects have cooled off and others have been newly accessed. You may have better luck adjusting the ratios, but we see a big hit to our cluster when we do that. We usually go 1% every minute or two to help reduce the impact of evicting the data. (We usually drop the cache full ratio 10% or so to evict some objects, and we then toggle the cache mode between writeback and forward periodically to warm up the cache.) Setting it to writeback will promote so many objects at once that it severely impacts our cluster.
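
A minimal sketch of what the gradual ratio-stepping and writeback/forward toggling described above could look like when scripted. The pool name "cachepool", the ratio values, and the timings are made-up examples, and the ceph commands used (cache_target_full_ratio, osd tier cache-mode) should be checked against your release before relying on them:

#!/usr/bin/env python
# Sketch only: lower the cache tier's full ratio about 1% per step instead
# of all at once, to limit the flush/evict impact on client I/O, then
# briefly toggle the cache mode to warm the tier back up.
# Assumptions: a cache pool named "cachepool", a drop from 80% to 70%,
# and the ceph CLI available on this host.
import subprocess
import time

CACHE_POOL = "cachepool"   # hypothetical pool name

def set_full_ratio(ratio):
    # ceph osd pool set <pool> cache_target_full_ratio <0.0-1.0>
    subprocess.check_call([
        "ceph", "osd", "pool", "set", CACHE_POOL,
        "cache_target_full_ratio", "%.2f" % ratio,
    ])

def set_cache_mode(mode):
    # ceph osd tier cache-mode <pool> <writeback|forward>
    # (newer releases may require --yes-i-really-mean-it for forward)
    subprocess.check_call(["ceph", "osd", "tier", "cache-mode", CACHE_POOL, mode])

if __name__ == "__main__":
    # Step the full ratio down from 79% to 70%, 1% every minute or two.
    for pct in range(79, 69, -1):
        set_full_ratio(pct / 100.0)
        time.sleep(90)

    # Warm the cache briefly in writeback, then fall back to forward,
    # as in the toggle procedure described in the message above.
    set_cache_mode("writeback")
    time.sleep(45)
    set_cache_mode("forward")
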
>> There is also a limit that we reach at about 10K IOPS when in writeback, whereas with forward I've seen spikes to 64K IOPS. So we turn on writeback for 30-60 seconds (or until the blocked I/O is too much for us to handle), then set it to forward for 60-120 seconds, rinse and repeat until the impact of writeback isn't so bad, then set it back to forward for a couple more weeks.
>>
>> Needless to say, cache tiering could use some more love. If I get some time, I'd like to try to help with that section of code, but I have a couple of other more pressing issues I'd like to track down first.
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Wed, Feb 3, 2016 at 10:01 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> > I think this would be better done outside of Ceph. It should be quite simple for whatever monitoring software you are using to pick up the disk failure and then set the target_dirty_ratio to a very low value or change the actual caching mode.
>> >
>> > Doing it in Ceph would be complicated, as you are then asking Ceph to decide when you are in an at-risk scenario, i.e. would you want it to flush your cache after a quick service reload or a node reboot?
>> >
>> >> -----Original Message-----
>> >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Mihai Gheorghe
>> >> Sent: 03 February 2016 16:57
>> >> To: ceph-users <ceph-users@xxxxxxxxxxxxxx>; ceph-users <ceph-users@xxxxxxxx>
>> >> Subject: Set cache tier pool forward state automatically!
>> >>
>> >> Hi,
>> >>
>> >> Is there a built-in setting in Ceph that would switch the cache pool from writeback to forward state automatically in case of an OSD failure in that pool?
>> >>
>> >> Let's say the size of the cache pool is 2. If an OSD fails, Ceph blocks writes to the pool, making the VMs that use this pool inaccessible. But an earlier copy of the data, from before the last cache flush, is present on the cold storage pool.
>> >>
>> >> In this case, is it possible, when an OSD fails, for the data on the cache pool to be flushed onto the cold storage pool and for the forward flag to be set automatically on the cache pool? That way the VM could resume writing to the block device as soon as the cache is flushed, and read/write directly from the cold storage pool until manual intervention fixes the cache pool and sets it back to writeback.
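
Below is a rough sketch of the "do it outside of Ceph" approach Nick describes and the automatic flush-then-forward behaviour Mihai is asking about: an external watcher that reacts to a down cache-tier OSD by lowering the dirty/full ratios and switching the tier to forward mode. The pool name, the OSD id list, and the ratio values are assumptions rather than anything from the thread, and given Robert's notes above about blocked I/O below min_size, it would need the same careful testing he recommends:

#!/usr/bin/env python
# Sketch of an external watcher (outside of Ceph) that notices a down OSD
# among the cache-tier OSDs and reacts by lowering the dirty/full ratios
# and putting the tier into forward mode, leaving any further decisions to
# the operator.
# Assumptions: the OSD ids backing the cache tier are listed in
# CACHE_TIER_OSDS, the cache pool is named "cachepool", and the ceph CLI
# is available; none of this comes from the original thread.
import json
import subprocess
import time

CACHE_POOL = "cachepool"          # hypothetical cache tier pool name
CACHE_TIER_OSDS = {0, 1, 2, 3}    # hypothetical OSD ids backing the tier

def down_cache_osds():
    # "ceph osd tree -f json" lists each OSD with an up/down status.
    out = subprocess.check_output(["ceph", "osd", "tree", "-f", "json"])
    tree = json.loads(out.decode("utf-8"))
    return [n["id"] for n in tree.get("nodes", [])
            if n.get("type") == "osd"
            and n["id"] in CACHE_TIER_OSDS
            and n.get("status") == "down"]

def drain_and_forward():
    # Push dirty data to the base pool aggressively, then stop promotions.
    for key, val in (("cache_target_dirty_ratio", "0.01"),
                     ("cache_target_full_ratio", "0.05")):
        subprocess.check_call(["ceph", "osd", "pool", "set", CACHE_POOL, key, val])
    # Some releases need --yes-i-really-mean-it for forward mode.
    subprocess.check_call(["ceph", "osd", "tier", "cache-mode", CACHE_POOL, "forward"])

if __name__ == "__main__":
    while True:
        down = down_cache_osds()
        if down:
            print("cache tier OSDs down: %s, draining tier" % down)
            drain_and_forward()
            break                 # hand back to the operator after reacting
        time.sleep(30)
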
>> >> This way we can get away with a pool size of 2 without worrying too much about downtime!
>> >>
>> >> Hope I was explicit enough!

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com