Re: Local SSD cache for ceph on each compute node.

Stuart Longland <stuartl@xxxxxxxxxx> · Sun, 27 Mar 2016 19:10:33 +1000

Hi all,
On 16/03/16 18:11, Van Leeuwen, Robert wrote:
>> Indeed, well understood.
>>
>> As a shorter term workaround, if you have control over the VMs, you could always just slice out an LVM volume from local SSD/NVMe and pass it through to the guest.  Within the guest, use dm-cache (or similar) to add a cache front-end to your RBD volume.  
> 
> If you do this, you need to set up your cache as a read-only cache.
> Caching writes would be bad, because a hypervisor failure would result in loss of the cache, which pretty much guarantees inconsistent data on the Ceph volume.
> Also, live migration becomes problematic compared to running everything from Ceph, since you would also need to migrate the local storage.
> 
> The question is whether adding more RAM (== more read cache) would be more convenient and cheaper in the end
> (considering the time required for setting up and maintaining the extra caching layer on each VM, unless you work for free ;-).
> Also, reads from Ceph are pretty fast compared to the biggest bottleneck: (small) sync writes.
> So it is debatable how much performance you would gain, except in some use cases with lots of reads on very large data sets that are also very latency-sensitive.
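
A toy model may help make the point about write caching concrete. The Python below is not dm-cache or FlashCache. It is a hypothetical cache class sitting in front of a dict that stands in for the RBD volume, and it assumes the background flusher writes dirty blocks in block-number order and gets interrupted by the hypervisor failure.

```python
class BackingStore:
    """Stands in for the RBD volume: it survives a hypervisor failure."""
    def __init__(self):
        self.blocks = {}


class CachedVolume:
    """Toy block cache in front of a backing store (not dm-cache itself)."""
    def __init__(self, backing, write_back):
        self.backing = backing
        self.write_back = write_back
        self.cache = {}     # block number -> data held on the local SSD
        self.dirty = set()  # blocks in the cache not yet written to the backing store

    def write(self, block, data):
        self.cache[block] = data
        if self.write_back:
            self.dirty.add(block)              # acknowledged before reaching Ceph
        else:
            self.backing.blocks[block] = data  # write-through: Ceph updated first

    def read(self, block):
        # Reads are served from the SSD when possible, else fetched and cached.
        if block not in self.cache:
            self.cache[block] = self.backing.blocks.get(block)
        return self.cache[block]

    def flush(self, limit=None):
        # Hypothetical flusher: writes dirty blocks in block-number order.
        for block in sorted(self.dirty)[:limit]:
            self.backing.blocks[block] = self.cache[block]
            self.dirty.discard(block)


def simulate(write_back):
    ceph = BackingStore()
    vol = CachedVolume(ceph, write_back)
    vol.write(5, "file contents")     # guest writes the data block first...
    vol.write(1, "inode -> block 5")  # ...then the metadata that points at it
    vol.flush(limit=1)                # flusher is interrupted after one block
    # Hypervisor fails here: vol.cache and vol.dirty (the local SSD) are lost.
    return ceph.blocks


print("write-through:", simulate(write_back=False))
print("write-back:   ", simulate(write_back=True))
```

In write-through mode the surviving volume holds every acknowledged write, so losing the SSD only loses cached copies. In write-back mode it is left with metadata pointing at a data block that never arrived, which is the inconsistency described above.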

I've been following this discussion from a distance for a while, and have
personally experimented with introducing VM-local caching.

In our set-up, we have three storage nodes, each running a monitor and
two OSDs.  The machines have two gigabit Ethernet cards: one
public-facing, the other on a private network for storage cluster
communications.  The machines themselves are Core i3s with 8GB RAM.  The
cluster is set to put one replica on each of the three nodes.
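
Putting one replica on each node corresponds to a pool size of 3 and a CRUSH rule whose failure domain is the host (the default replicated rule's "step chooseleaf firstn 0 type host"). The sketch below is a toy stand-in, not CRUSH: it ranks a hypothetical node1..node3 / osd.0..osd.5 layout with SHA-256 to show the invariant that no two replicas of a placement group share a host, and that primaries (the first OSD listed) rotate across hosts, which is what spreads the load.

```python
import hashlib

# Hypothetical layout matching the set-up described above: 3 hosts, 2 OSDs each.
HOSTS = {
    "node1": ["osd.0", "osd.1"],
    "node2": ["osd.2", "osd.3"],
    "node3": ["osd.4", "osd.5"],
}


def _score(pg_id, item):
    """Deterministic pseudo-random rank for an item, per placement group."""
    return hashlib.sha256(f"{pg_id}:{item}".encode()).hexdigest()


def place(pg_id, hosts=HOSTS, size=3):
    """Return `size` OSDs for a placement group, at most one per host.

    The first OSD acts as the primary. Toy stand-in for a CRUSH rule with
    failure domain 'host'; real CRUSH uses weighted bucket algorithms.
    """
    if size > len(hosts):
        raise ValueError("not enough hosts for one replica per host")
    chosen = sorted(hosts, key=lambda h: _score(pg_id, h))[:size]
    return [min(hosts[h], key=lambda osd: _score(pg_id, osd)) for h in chosen]


if __name__ == "__main__":
    for pg in range(4):
        print(pg, place(pg))
```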

The thought was that the load would be spread across the three nodes,
so we should get close to SATA-1 speeds in ideal cases.
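
A back-of-envelope check of that expectation, as a minimal sketch under stated assumptions: gigabit links at their raw 125 MB/s (ignoring TCP/IP and Ceph protocol overhead), SATA-1's 1.5 Gbit/s line rate with 8b/10b encoding (150 MB/s of payload), primaries spread evenly across the nodes, and full-duplex links. Disk throughput, journals and latency are ignored entirely.

```python
GBE_MB_S = 1000 / 8             # 1 Gbit/s Ethernet = 125 MB/s before overhead
SATA1_MB_S = 1500 * 8 / 10 / 8  # 1.5 Gbit/s line rate, 8b/10b -> 150 MB/s payload


def ceilings(nodes=3, replicas=3, link_mb_s=GBE_MB_S):
    """Aggregate bandwidth ceilings in MB/s, assuming evenly spread primaries."""
    # Reads: each node serves 1/nodes of the traffic over its public link.
    read = nodes * link_mb_s
    # Writes: each node receives X/nodes from clients on its public link...
    write_public = nodes * link_mb_s
    # ...and its primaries forward (replicas - 1) copies on the private link,
    # so per-node private transmit is (replicas - 1) * X / nodes <= link.
    if replicas > 1:
        write_private = nodes * link_mb_s / (replicas - 1)
    else:
        write_private = float("inf")
    return {
        "read": read,
        "write": min(write_public, write_private),
        "single_client": link_mb_s,  # a client limited to one gigabit link
    }


print(ceilings(), SATA1_MB_S)
```

Under those assumptions a client on a single gigabit link tops out around 125 MB/s, a little under SATA-1's 150 MB/s, while replication traffic on the private network caps aggregate writes at roughly 187.5 MB/s. None of this helps the small sync writes mentioned above, which are bound by round-trip latency rather than bandwidth.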

I tried running without any sort of caching other than what the kernel
RBD driver / librbd provided out of the box.  Virtual machines run on
KVM, managed by OpenNebula.  With VirtIO storage, things were a little
better, but when trying out some Hyper-V images, the VMs choked on disk
I/O.

I did some research, and after seeing a post by Sébastien Han regarding
the use of FlashCache with Ceph, I tried patching OpenNebula to support
this caching scheme.

http://dev.opennebula.org/issues/2827

It worked by slicing off a bit of SSD using LVM, mapping the RBD using
the in-kernel driver, then setting up FlashCache to combine the two.
There was a bit of cache pre-seeding done too.
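
The three steps above can be sketched as a dry-run command builder. This is not the OpenNebula patch from the issue linked above. The volume group, pool, image and device names are hypothetical, the /dev/rbd/<pool>/<image> path assumes the udev symlinks shipped with Ceph, and the write-through ("thru") default is an assumption following the read-only caching advice quoted earlier. The cache pre-seeding step is left out because its details aren't given here.

```python
import shlex

FLASHCACHE_MODES = ("back", "thru", "around")


def flashcache_plan(vm_id, pool, image, ssd_vg="ssd", cache_size="10G", mode="thru"):
    """Return argv lists that put an SSD cache in front of an RBD image.

    All names are hypothetical. Nothing is executed here; the caller
    decides whether and how to run the commands.
    """
    if mode not in FLASHCACHE_MODES:
        raise ValueError(f"unknown flashcache mode: {mode}")
    lv = f"cache-{vm_id}"
    ssd_dev = f"/dev/{ssd_vg}/{lv}"
    rbd_dev = f"/dev/rbd/{pool}/{image}"  # assumes Ceph's udev symlink rules
    cache_name = f"fc-{vm_id}"            # appears as /dev/mapper/<cache_name>
    return [
        # 1. Slice a bit of SSD using LVM.
        ["lvcreate", "-L", cache_size, "-n", lv, ssd_vg],
        # 2. Map the RBD image with the in-kernel driver.
        ["rbd", "map", f"{pool}/{image}"],
        # 3. Combine the two: SSD slice as cache, RBD device as backing disk.
        ["flashcache_create", "-p", mode, cache_name, ssd_dev, rbd_dev],
    ]


for cmd in flashcache_plan(42, "rbd", "vm-42-disk"):
    print(shlex.join(cmd))
```

Each argv list could be handed to subprocess.run(cmd, check=True) once reviewed; keeping the builder separate from execution makes it easy to inspect what would run on a hypervisor before anything touches its disks.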

This worked, and gave big speed improvements, but my implementation in
OpenNebula is a bit of a house of cards.  I've since bought my own
hardware and intend to look into this at home:

https://hackaday.io/project/10529-solar-powered-cloud-computing

Something built into Ceph's librbd or kernel driver would be fantastic,
though, as it would then be usable by OpenStack, libvirt, etc.
-- 
     _ ___             Stuart Longland - Systems Engineer
\  /|_) |                           T: +61 7 3535 9619
 \/ | \ |     38b Douglas Street    F: +61 7 3535 9699
   SYSTEMS    Milton QLD 4064       http://www.vrt.com.au
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com