Re: Local SSD cache for ceph on each compute node.

Hi Nick,

Your solution requires manual configuration for each VM and cannot be set up as part of an automated OpenStack deployment.

It would be really nice if this were a hypervisor-based setting rather than a per-VM setting.

Thanks 

Daniel

-----Original Message-----
From: Nick Fisk [mailto:nick@xxxxxxxxxx] 
Sent: 16 March 2016 08:59
To: Daniel Niasoff <daniel@xxxxxxxxxxxxxx>; 'Van Leeuwen, Robert' <rovanleeuwen@xxxxxxxx>; 'Jason Dillaman' <dillaman@xxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: RE:  Local SSD cache for ceph on each compute node.



> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf 
> Of Daniel Niasoff
> Sent: 16 March 2016 08:26
> To: Van Leeuwen, Robert <rovanleeuwen@xxxxxxxx>; Jason Dillaman 
> <dillaman@xxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  Local SSD cache for ceph on each compute node.
> 
> Hi Robert,
> 
> > Caching writes would be bad because a hypervisor failure would result
> > in loss of the cache which pretty much guarantees inconsistent data on
> > the ceph volume.
> > Also live-migration will become problematic compared to running
> > everything from ceph since you will also need to migrate the
> > local-storage.

I tested a solution using iSCSI for the cache devices. Each VM was using flashcache with a combination of an iSCSI LUN from an SSD and an RBD. This gets around the problem of having to move the cache around, or of losing it if the hypervisor goes down.
It's not local caching, but the write latency is at least 10x lower than the RBD. Note that I only tested it; I didn't put it into production :-)
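
To make it concrete, a rough sketch of that kind of stack - the portal
address, IQN and device names below are only placeholders, and how the RBD
reaches the machine will depend on your setup:

  # log into the iSCSI target that exports a slice of the SSD;
  # say the LUN shows up as /dev/sdx
  iscsiadm -m discovery -t sendtargets -p 192.168.0.10
  iscsiadm -m node -T iqn.2016-03.example:ssd-cache -p 192.168.0.10 --login

  # the RBD, here mapped with krbd as /dev/rbd0
  rbd map rbd/vm-disk-1

  # stack flashcache in writeback mode over the two; the filesystem then
  # lives on /dev/mapper/vmcache
  flashcache_create -p back vmcache /dev/sdx /dev/rbd0

  # compare small sync-write latency against the bare RBD with fio
  fio --name=syncwrite --filename=/dev/mapper/vmcache --rw=randwrite \
      --bs=4k --direct=1 --sync=1 --iodepth=1 --runtime=30 --time_based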

> 
> My understanding of how a writeback cache should work is that it
> should only take a few seconds for writes to be streamed onto the
> network and is focussed on resolving the speed issue of small sync
> writes. The writes would be bundled into larger writes that are not
> time sensitive.
> 
> So there is potential for a few seconds of data loss, but compared to
> the current trend of using ephemeral storage to solve this issue, it's
> a major improvement.

Yeah, the problem is that a couple of seconds of data loss means different things to different people.

> 
> > (considering the time required for setting up and maintaining the
> > extra caching layer on each vm, unless you work for free ;-)
> 
> Couldn't agree more there.
> 
> I am just so surprised that the openstack community hasn't looked to
> resolve this issue. Ephemeral storage is a HUGE compromise unless you
> have built failure tolerance into every aspect of your application, but
> many people use openstack as a general purpose devstack.
> 
> (Jason pointed out his blueprint but I guess it's at least a year or 2
> away -
> http://tracker.ceph.com/projects/ceph/wiki/Rbd_-_ordered_crash-consistent_write-back_caching_extension)
> 
> I see articles discussing the idea such as this one
> 
> http://www.sebastien-han.fr/blog/2014/06/10/ceph-cache-pool-tiering-scalable-cache/
> 
> but no real straightforward, validated setup instructions.
> 
> Thanks
> 
> Daniel
> 
> 
> -----Original Message-----
> From: Van Leeuwen, Robert [mailto:rovanleeuwen@xxxxxxxx]
> Sent: 16 March 2016 08:11
> To: Jason Dillaman <dillaman@xxxxxxxxxx>; Daniel Niasoff 
> <daniel@xxxxxxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  Local SSD cache for ceph on each compute node.
> 
> > Indeed, well understood.
> >
> > As a shorter term workaround, if you have control over the VMs, you
> > could always just slice out an LVM volume from local SSD/NVMe and pass
> > it through to the guest.  Within the guest, use dm-cache (or similar)
> > to add a cache front-end to your RBD volume.
> 
> If you do this you need to set up your cache as read-cache only.
> Caching writes would be bad because a hypervisor failure would result
> in loss of the cache, which pretty much guarantees inconsistent data
> on the ceph volume.
> Also live-migration will become problematic compared to running
> everything from ceph since you will also need to migrate the
> local-storage.
> 
> The question will be whether adding more RAM (== more read cache) would
> not be more convenient and cheaper in the end (considering the time
> required for setting up and maintaining the extra caching layer on each
> vm, unless you work for free ;-). Also reads from ceph are pretty fast
> compared to the biggest bottleneck: (small) sync writes.
> So it is debatable how much performance you would win, except for some
> use-cases with lots of reads on very large data sets which are also
> very latency sensitive.
> 
> Cheers,
> Robert van Leeuwen
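
For what it's worth, Jason's LVM + dm-cache suggestion above would look
something like the following inside the guest using lvmcache - the device
names and sizes are only placeholders, and it uses writethrough mode so
that, per Robert's point, writes are never held only in the local cache:

  # assume /dev/vdb is the RBD-backed data disk and /dev/vdc is the LV
  # sliced from the local SSD and passed through to the guest
  pvcreate /dev/vdb /dev/vdc
  vgcreate vgdata /dev/vdb /dev/vdc

  # origin LV on the RBD-backed disk
  lvcreate -n data -l 100%PVS vgdata /dev/vdb

  # cache data and metadata LVs on the SSD slice
  lvcreate -n cache0 -L 20G vgdata /dev/vdc
  lvcreate -n cache0meta -L 64M vgdata /dev/vdc
  lvconvert --type cache-pool --poolmetadata vgdata/cache0meta vgdata/cache0

  # attach the cache in writethrough mode: reads are cached on the SSD,
  # writes are only acknowledged once they hit the RBD
  lvconvert --type cache --cachemode writethrough \
      --cachepool vgdata/cache0 vgdata/data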

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


