Re: Local SSD cache for ceph on each compute node.

Sebastien Han <seb@xxxxxxxxxx> · Thu, 17 Mar 2016 12:45:54 +0100

I’d rather see this implemented at the hypervisor level, i.e. in QEMU, so that we have a common layer for all storage backends,
although that would be less portable...

> On 17 Mar 2016, at 11:00, Nick Fisk <nick@xxxxxxxxxx> wrote:
> 
> 
> 
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>> Daniel Niasoff
>> Sent: 16 March 2016 21:02
>> To: Nick Fisk <nick@xxxxxxxxxx>; 'Van Leeuwen, Robert'
>> <rovanleeuwen@xxxxxxxx>; 'Jason Dillaman' <dillaman@xxxxxxxxxx>
>> Cc: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re:  Local SSD cache for ceph on each compute node.
>> 
>> Hi Nick,
>> 
>> Your solution requires manual configuration for each VM and cannot be
>> set up as part of an automated OpenStack deployment.
> 
> Absolutely, potentially flaky as well.
> 
>> 
>> It would be really nice if it was a hypervisor based setting as opposed to
>> a VM based setting.
> 
> Yes, I can't wait until we can just specify "rbd_cache_device=/dev/ssd" in
> the ceph.conf and get it to write to that instead. Ideally ceph would also
> provide some sort of lightweight replication for the cache devices, but
> otherwise an iSCSI SSD farm or switched SAS could be used so that the caching
> device is not tied to one physical host.
> 
>> 
>> Thanks
>> 
>> Daniel
>> 
>> -----Original Message-----
>> From: Nick Fisk [mailto:nick@xxxxxxxxxx]
>> Sent: 16 March 2016 08:59
>> To: Daniel Niasoff <daniel@xxxxxxxxxxxxxx>; 'Van Leeuwen, Robert'
>> <rovanleeuwen@xxxxxxxx>; 'Jason Dillaman' <dillaman@xxxxxxxxxx>
>> Cc: ceph-users@xxxxxxxxxxxxxx
>> Subject: RE:  Local SSD cache for ceph on each compute node.
>> 
>> 
>> 
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
>>> Of Daniel Niasoff
>>> Sent: 16 March 2016 08:26
>>> To: Van Leeuwen, Robert <rovanleeuwen@xxxxxxxx>; Jason Dillaman
>>> <dillaman@xxxxxxxxxx>
>>> Cc: ceph-users@xxxxxxxxxxxxxx
>>> Subject: Re:  Local SSD cache for ceph on each compute node.
>>> 
>>> Hi Robert,
>>> 
>>>> Caching writes would be bad because a hypervisor failure would result
>>>> in loss of the cache which pretty much guarantees inconsistent data on
>>>> the ceph volume.
>>>> Also live-migration will become problematic compared to running
>>>> everything from ceph since you will also need to migrate the
>>>> local-storage.
>> 
>> I tested a solution using iSCSI for the cache devices. Each VM was using
>> flashcache with a combination of an iSCSI LUN from an SSD and an RBD. This
>> gets around the problems of moving things around and of the hypervisor
>> going down. It's not local caching but the write latency is at least 10x
>> lower than the RBD.
>> Note I tested it, I didn't put it into production :-)
>> 
>>> 
>>> My understanding of how a writeback cache should work is that it
>>> should only take a few seconds for writes to be streamed onto the
>>> network and is focussed on resolving the speed issue of small sync
>>> writes. The writes would be bundled into larger writes that are not
>>> time sensitive.
>>> 
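To illustrate the bundling idea described above, here is a minimal sketch in Python. It is not how librbd, flashcache or dm-cache work internally. The class name `WritebackBuffer` and its methods are hypothetical, and it assumes non-overlapping writes. Small writes are acknowledged locally and later flushed as fewer, larger contiguous extents.

```python
from dataclasses import dataclass, field


@dataclass
class WritebackBuffer:
    """Toy write-back buffer (hypothetical, for illustration only)."""

    # offset -> data for writes that have not reached the backend yet
    pending: dict[int, bytes] = field(default_factory=dict)

    def write(self, offset: int, data: bytes) -> None:
        # The write is "acknowledged" here; until flush() runs,
        # the data exists only in the local cache.
        self.pending[offset] = data

    def flush(self) -> list[tuple[int, bytes]]:
        """Merge contiguous pending writes into larger extents and clear the buffer."""
        merged: list[tuple[int, bytes]] = []
        for offset in sorted(self.pending):
            data = self.pending[offset]
            if merged and merged[-1][0] + len(merged[-1][1]) == offset:
                # This write starts where the previous extent ends: extend it.
                start, previous = merged[-1]
                merged[-1] = (start, previous + data)
            else:
                merged.append((offset, data))
        self.pending.clear()
        return merged


buf = WritebackBuffer()
buf.write(4096, b"b" * 4096)
buf.write(0, b"a" * 4096)
buf.write(16384, b"c" * 4096)
extents = buf.flush()  # two extents: one 8 KiB at offset 0, one 4 KiB at 16384
```

Anything still in `pending` when the cache device is lost is exactly the "few seconds of data loss" being discussed.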
>>> So there is potential for a few seconds data loss but compared to the
>>> current trend of using ephemeral storage to solve this issue, it's a major
>>> improvement.
>> 
>> Yeah, the problem is that a couple of seconds of data loss means different
>> things to different people.
>> 
>>> 
>>>> (considering the time required for setting up and maintaining the
>>>> extra caching layer on each vm, unless you work for free ;-)
>>> 
>>> Couldn't agree more there.
>>> 
>>> I am just so surprised that the OpenStack community hasn't looked to
>>> resolve this issue. Ephemeral storage is a HUGE compromise unless you
>>> have built failure handling into every aspect of your application, but many
>>> people use OpenStack as a general purpose devstack.
>>> 
>>> (Jason pointed out his blueprint but I guess it's at least a year or 2
>>> away -
>>> http://tracker.ceph.com/projects/ceph/wiki/Rbd_-_ordered_crash-consistent_write-back_caching_extension)
>>> 
>>> I see articles discussing the idea such as this one
>>> 
>>> http://www.sebastien-han.fr/blog/2014/06/10/ceph-cache-pool-tiering-scalable-cache/
>>> 
>>> but no real straightforward, validated setup instructions.
>>> 
>>> Thanks
>>> 
>>> Daniel
>>> 
>>> 
>>> -----Original Message-----
>>> From: Van Leeuwen, Robert [mailto:rovanleeuwen@xxxxxxxx]
>>> Sent: 16 March 2016 08:11
>>> To: Jason Dillaman <dillaman@xxxxxxxxxx>; Daniel Niasoff
>>> <daniel@xxxxxxxxxxxxxx>
>>> Cc: ceph-users@xxxxxxxxxxxxxx
>>> Subject: Re:  Local SSD cache for ceph on each compute node.
>>> 
>>>> Indeed, well understood.
>>>> 
>>>> As a shorter term workaround, if you have control over the VMs, you
>>>> could always just slice out an LVM volume from local SSD/NVMe and pass it
>>>> through to the guest.  Within the guest, use dm-cache (or similar) to
>>>> add a cache front-end to your RBD volume.
>>> 
>>> If you do this you need to set up your cache as read-cache only.
>>> Caching writes would be bad because a hypervisor failure would result
>>> in loss of the cache which pretty much guarantees inconsistent data on the
>>> ceph volume.
>>> Also live-migration will become problematic compared to running
>>> everything from ceph since you will also need to migrate the
>>> local-storage.
>>> 
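The difference between a read-only (write-through) cache and a write-back cache when the hypervisor dies can be sketched as follows. This is a toy model under stated assumptions, not dm-cache or flashcache code. `CachedVolume`, `lose_cache` and the dictionaries standing in for the RBD volume and the local SSD are all hypothetical.

```python
class CachedVolume:
    """Toy model of a volume fronted by a local cache (illustration only)."""

    def __init__(self, write_back: bool):
        self.write_back = write_back
        self.backend = {}   # stands in for the RBD volume
        self.cache = {}     # stands in for the local SSD
        self.dirty = set()  # blocks present only in the cache

    def write(self, block: int, data: bytes) -> None:
        self.cache[block] = data
        if self.write_back:
            # Acknowledged without touching the backend.
            self.dirty.add(block)
        else:
            # Read-cache only: every write goes straight to the backend.
            self.backend[block] = data

    def read(self, block: int):
        if block in self.cache:
            return self.cache[block]
        data = self.backend.get(block)
        if data is not None:
            self.cache[block] = data
        return data

    def lose_cache(self) -> list:
        """Simulate a hypervisor failure; return the blocks whose data is gone."""
        lost = sorted(self.dirty)
        self.cache.clear()
        self.dirty.clear()
        return lost


read_only = CachedVolume(write_back=False)
read_only.write(1, b"x")
lost_read_only = read_only.lose_cache()   # nothing lost, backend has the data

write_back = CachedVolume(write_back=True)
write_back.write(1, b"x")
lost_write_back = write_back.lose_cache()  # block 1 never reached the backend
```

In the write-back case the backend is left without writes the guest believed were durable, which is the inconsistency described above.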
>>> The question will be if adding more ram (== more read cache) would not
>>> be more convenient and cheaper in the end.
>>> (considering the time required for setting up and maintaining the
>>> extra caching layer on each vm, unless you work for free ;-)
>>> Also reads from ceph are pretty fast compared to the biggest bottleneck:
>>> (small) sync writes.
>>> So it is debatable how much performance you would win except for some
>>> use-cases with lots of reads on very large data sets which are also
>>> very latency sensitive.
>>> 
>>> Cheers,
>>> Robert van Leeuwen
>>> 
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Cheers.
––––
Sébastien Han
Senior Cloud Architect

"Always give 100%. Unless you're giving blood."

Mail: seb@xxxxxxxxxx
Address: 11 bis, rue Roquépine - 75008 Paris
