I’d rather like to see this implemented at the hypervisor level, i.e. in QEMU, so we can have a common layer for all the storage backends, although this is less portable...

> On 17 Mar 2016, at 11:00, Nick Fisk <nick@xxxxxxxxxx> wrote:
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Daniel Niasoff
>> Sent: 16 March 2016 21:02
>> To: Nick Fisk <nick@xxxxxxxxxx>; 'Van Leeuwen, Robert' <rovanleeuwen@xxxxxxxx>; 'Jason Dillaman' <dillaman@xxxxxxxxxx>
>> Cc: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: Local SSD cache for ceph on each compute node.
>>
>> Hi Nick,
>>
>> Your solution requires manual configuration for each VM and cannot be
>> set up as part of an automated OpenStack deployment.
>
> Absolutely, potentially flaky as well.
>
>> It would be really nice if it was a hypervisor-based setting as opposed
>> to a VM-based setting.
>
> Yes, I can't wait until we can just specify "rbd_cache_device=/dev/ssd" in
> the ceph.conf and get it to write to that instead. Ideally Ceph would also
> provide some sort of lightweight replication for the cache devices, but
> otherwise an iSCSI SSD farm or switched SAS could be used so that the
> caching device is not tied to one physical host.
>
>> Thanks
>>
>> Daniel
>>
>> -----Original Message-----
>> From: Nick Fisk [mailto:nick@xxxxxxxxxx]
>> Sent: 16 March 2016 08:59
>> To: Daniel Niasoff <daniel@xxxxxxxxxxxxxx>; 'Van Leeuwen, Robert' <rovanleeuwen@xxxxxxxx>; 'Jason Dillaman' <dillaman@xxxxxxxxxx>
>> Cc: ceph-users@xxxxxxxxxxxxxx
>> Subject: RE: Local SSD cache for ceph on each compute node.
>>
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Daniel Niasoff
>>> Sent: 16 March 2016 08:26
>>> To: Van Leeuwen, Robert <rovanleeuwen@xxxxxxxx>; Jason Dillaman <dillaman@xxxxxxxxxx>
>>> Cc: ceph-users@xxxxxxxxxxxxxx
>>> Subject: Re: Local SSD cache for ceph on each compute node.
>>>
>>> Hi Robert,
>>>
>>>> Caching writes would be bad because a hypervisor failure would result in
>>>> loss of the cache, which pretty much guarantees inconsistent data on
>>>> the ceph volume.
>>>> Also live-migration will become problematic compared to running
>>>> everything from ceph since you will also need to migrate the local storage.
>>
>> I tested a solution using iSCSI for the cache devices. Each VM was using
>> flashcache with a combination of an iSCSI LUN from an SSD and an RBD. This
>> gets around the problem of moving things around, or of the hypervisor
>> going down. It's not local caching, but the write latency is at least 10x
>> lower than the RBD. Note I tested it, I didn't put it into production :-)
>>
>>> My understanding of how a writeback cache should work is that it should
>>> only take a few seconds for writes to be streamed onto the network, and
>>> it is focussed on resolving the speed issue of small sync writes. The
>>> writes would be bundled into larger writes that are not time sensitive.
>>>
>>> So there is potential for a few seconds of data loss, but compared to the
>>> current trend of using ephemeral storage to solve this issue, it's a
>>> major improvement.
>>
>> Yeah, the problem is that a couple of seconds of data loss means different
>> things to different people.
>>
>>>> (considering the time required for setting up and maintaining the extra
>>>> caching layer on each vm, unless you work for free ;-)
>>>
>>> Couldn't agree more there.
>>> I am just so surprised that the OpenStack community hasn't looked to
>>> resolve this issue. Ephemeral storage is a HUGE compromise unless you
>>> have built failure handling into every aspect of your application, but
>>> many people use OpenStack as a general-purpose dev stack.
>>>
>>> (Jason pointed out his blueprint, but I guess it's at least a year or two away -
>>> http://tracker.ceph.com/projects/ceph/wiki/Rbd_-_ordered_crash-consistent_write-back_caching_extension)
>>>
>>> I see articles discussing the idea, such as this one:
>>>
>>> http://www.sebastien-han.fr/blog/2014/06/10/ceph-cache-pool-tiering-scalable-cache/
>>>
>>> but no real straightforward, validated setup instructions.
>>>
>>> Thanks
>>>
>>> Daniel
>>>
>>> -----Original Message-----
>>> From: Van Leeuwen, Robert [mailto:rovanleeuwen@xxxxxxxx]
>>> Sent: 16 March 2016 08:11
>>> To: Jason Dillaman <dillaman@xxxxxxxxxx>; Daniel Niasoff <daniel@xxxxxxxxxxxxxx>
>>> Cc: ceph-users@xxxxxxxxxxxxxx
>>> Subject: Re: Local SSD cache for ceph on each compute node.
>>>
>>>> Indeed, well understood.
>>>>
>>>> As a shorter-term workaround, if you have control over the VMs, you could
>>>> always just slice out an LVM volume from local SSD/NVMe and pass it
>>>> through to the guest. Within the guest, use dm-cache (or similar) to add
>>>> a cache front-end to your RBD volume.
>>>
>>> If you do this you need to set up your cache as read-cache only.
>>> Caching writes would be bad because a hypervisor failure would result in
>>> loss of the cache, which pretty much guarantees inconsistent data on the
>>> ceph volume.
>>> Also live-migration will become problematic compared to running
>>> everything from ceph since you will also need to migrate the local storage.
>>>
>>> The question will be whether adding more RAM (== more read cache) would
>>> not be more convenient and cheaper in the end
>>> (considering the time required for setting up and maintaining the extra
>>> caching layer on each vm, unless you work for free ;-). Also, reads from
>>> ceph are pretty fast compared to the biggest bottleneck: (small) sync
>>> writes. So it is debatable how much performance you would win, except for
>>> some use cases with lots of reads on very large data sets which are also
>>> very latency sensitive.
>>>
>>> Cheers,
>>> Robert van Leeuwen

Cheers.
––––
Sébastien Han
Senior Cloud Architect

"Always give 100%. Unless you're giving blood."

Mail: seb@xxxxxxxxxx
Address: 11 bis, rue Roquépine - 75008 Paris
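For anyone wanting to try the flashcache-over-iSCSI arrangement Nick describes above, a rough sketch of the per-VM setup might look like the following. The iSCSI target name, portal address, device paths and mount point are placeholders, cache sizing and tuning are omitted, and the flashcache module and tools are assumed to be installed in the guest; treat it as an outline, not a validated recipe.

# Inside the guest. All names and device paths below are examples.
# 1. Attach the SSD-backed iSCSI LUN exported by the SSD farm.
iscsiadm -m discovery -t sendtargets -p 192.0.2.10
iscsiadm -m node -T iqn.2016-03.example:vm1-cache -p 192.0.2.10 --login
# Assume the LUN shows up as /dev/sdb and the RBD-backed disk as /dev/vdb.

# 2. Layer flashcache over the RBD disk in writeback mode. Writeback is
#    what cuts the latency of small sync writes, and also why losing the
#    cache device is dangerous, as discussed in the thread above.
flashcache_create -p back cached_vdb /dev/sdb /dev/vdb

# 3. Use the resulting device-mapper device instead of the raw RBD disk.
mkfs.xfs /dev/mapper/cached_vdb
mount /dev/mapper/cached_vdb /srv/data

Because the cache LUN lives on the network rather than on the hypervisor, the cache is not lost if that hypervisor dies and the VM can still be moved, which is the point of this variant.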
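Similarly, a minimal sketch of the read-only caching approach Jason and Robert discuss (an LVM slice of the local SSD passed through to the guest, with dm-cache inside the guest kept in writethrough mode so no dirty data can be stranded on the hypervisor) might look like this, here using LVM's lvmcache front-end to dm-cache. Volume group names, sizes and device paths are illustrative only.

# On the hypervisor: carve a slice off the local SSD volume group and
# attach it to the guest (e.g. as an extra virtio disk via libvirt/Nova).
lvcreate -L 20G -n vm1-cache vg_ssd

# Inside the guest: /dev/vdb is the RBD-backed data disk, /dev/vdc the
# passed-through SSD slice (names are examples).
pvcreate /dev/vdb /dev/vdc
vgcreate vg_data /dev/vdb /dev/vdc

# Data LV on the RBD-backed PV, cache pool on the SSD-backed PV.
lvcreate -n data -l 100%PVS vg_data /dev/vdb
lvcreate --type cache-pool -L 18G -n cpool vg_data /dev/vdc

# Attach the cache pool in writethrough mode: reads are cached on the SSD,
# but every write is acknowledged only after it has reached the RBD volume,
# so losing the hypervisor or its SSD cannot leave the Ceph image inconsistent.
lvconvert --type cache --cachepool vg_data/cpool --cachemode writethrough vg_data/data

mkfs.xfs /dev/vg_data/data
mount /dev/vg_data/data /srv/data

As Robert notes, whether this wins much over simply adding RAM for page cache depends on the workload; it mainly helps read-heavy, latency-sensitive working sets that are too large to keep in memory.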