Bcache / Enhanceio with osds

I've done a bit of testing with EnhanceIO on my cluster and I can see a definite improvement in read performance for cached data. With large block size IO (1M and 4M), read throughput for cached data is around 3-4 times what the cluster managed before EnhanceIO.

I've done a concurrent test, running a single "dd if=/dev/vda of=/dev/null bs=1M iflag=direct" instance (and the same with bs=4M) in each of 20 VMs spread across 4 host servers. Prior to EnhanceIO I was getting around 30-35MB/s per guest VM regardless of how many times I ran the test. With EnhanceIO (from the second run onwards, once the cache was warm) I was hitting over 130MB/s per VM. I've not seen any lag in the performance of other VMs while using EnhanceIO, unlike the considerable lag without it. SSD disk utilisation was not hitting much over 60%.
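
For reference, the concurrent test was driven along these lines; a minimal sketch, assuming the guests are reachable over ssh as vm01..vm20 (hypothetical hostnames) and using the dd command quoted above:

    # Kick off the same direct-read dd in all 20 guests at once and keep only
    # dd's throughput summary line (dd prints it to stderr).
    # Set BS=1M or BS=4M for the large-block runs, BS=4k for the small-block run.
    BS=4M
    for vm in vm{01..20}; do
        ssh root@"$vm" "dd if=/dev/vda of=/dev/null bs=$BS iflag=direct 2>&1 | tail -n1" &
    done
    wait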

The small block size (4K) performance hasn't changed with EnhanceIO, which makes me think the OSDs themselves are the limiting factor at small block sizes. I wasn't getting much over 2-3MB/s per guest VM.

By contrast, when I tried the Firefly cache pool on the same hardware, the cluster performed significantly worse. The whole cluster was under a lot more load, throughput dropped to around 12-15MB/s, and the other guest VMs became very slow. The SSDs were 100% utilised throughout the test, with the majority of the IO being writes.
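
For context, the cache pool was set up along the standard Firefly cache tiering lines, roughly as below; the pool names are placeholders and this is the generic recipe rather than my exact settings:

    # Put an SSD-backed pool ("rbd-cache", placeholder name) in front of the
    # data pool ("rbd", placeholder name) as a writeback cache tier.
    ceph osd tier add rbd rbd-cache
    ceph osd tier cache-mode rbd-cache writeback
    ceph osd tier set-overlay rbd rbd-cache
    # The cache pool needs a hit set so Ceph can track which objects are hot.
    ceph osd pool set rbd-cache hit_set_type bloom
    ceph osd pool set rbd-cache hit_set_count 1
    ceph osd pool set rbd-cache hit_set_period 3600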

I admit these tests shouldn't be treated as definitive, thorough benchmarks of the Ceph cluster, as this is a live cluster with disk IO activity outside the test VMs. The background load is not much (300-500 IO/s), mainly reads. However, it still indicates that there is room for improvement in Ceph's cache pool implementation. Looking at my results, I think the Ceph cache pool is missing a lot of read cache hits, which causes the cache OSDs to write a lot of data. With EnhanceIO I was getting well over a 50% read hit ratio, and the main activity on the SSDs was read IO, unlike with the Ceph cache pool.

Outside of the tests, I've left EnhanceIO running on the OSD servers. It has been a few days now and the hit ratio on the OSDs is around 8-11%, which seems a bit low. I was wondering if I should change EnhanceIO's block size from the default 4K to 2K. Given Ceph's 4M object size, I am not sure this would help the hit ratio. Does anyone have an idea?
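
For reference, this is roughly how I check the hit ratio, and how a 2K cache could be recreated; the cache and device names are placeholders and the eio_cli flags should be double-checked against your build, so treat it as a sketch:

    # Per-cache counters live under /proc/enhanceio/<cache_name>/
    # (field names can vary slightly between versions).
    grep -i 'read' /proc/enhanceio/osd1_cache/stats

    # Changing the block size means dropping and recreating the cache; only the
    # SSD cache contents are lost, the backing OSD disk is untouched.
    # (mode/policy options omitted here; re-add whatever you already use)
    eio_cli delete -c osd1_cache
    eio_cli create -d /dev/sdb -s /dev/sdg1 -b 2048 -c osd1_cache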

Andrei 
----- Original Message -----

> From: "Mark Nelson" <mark.nelson at inktank.com>
> To: "Robert LeBlanc" <robert at leblancnet.us>, "Mark Nelson"
> <mark.nelson at inktank.com>
> Cc: ceph-users at lists.ceph.com
> Sent: Monday, 22 September, 2014 10:49:42 PM
> Subject: Re: Bcache / Enhanceio with osds

> Likely it won't since the OSD is already coalescing journal writes.
> FWIW, I ran through a bunch of tests using seekwatcher and blktrace at
> 4k, 128k, and 4m IO sizes on a 4 OSD cluster (3x replication) to get a
> feel for what the IO patterns are like for the dm-cache developers. I
> included both the raw blktrace data and seekwatcher graphs here:

> http://nhm.ceph.com/firefly_blktrace/

> There are some interesting patterns but they aren't too easy to spot
> (I don't know why Chris decided to use blue and green by default!)

> Mark

> On 09/22/2014 04:32 PM, Robert LeBlanc wrote:
> > We are still in the middle of testing things, but so far we have had
> > more improvement with SSD journals than with the OSDs cached with
> > bcache (five OSDs fronted by one SSD). We still have yet to test if
> > adding a bcache layer in addition to the SSD journals provides any
> > additional improvements.
> >
> > Robert LeBlanc
> >
> > On Sun, Sep 14, 2014 at 6:13 PM, Mark Nelson
> > <mark.nelson at inktank.com> wrote:
> >
> > On 09/14/2014 05:11 PM, Andrei Mikhailovsky wrote:
> >
> > Hello guys,
> >
> > I was wondering if anyone uses, or has done any testing with, bcache
> > or enhanceio caching in front of Ceph OSDs?
> >
> > I've got a small cluster of 2 OSD servers, 16 OSDs in total and 4 SSDs
> > for journals. I've recently purchased four additional SSDs to be used
> > for a Ceph cache pool, but I've found the performance of guest VMs to
> > be slower with the cache pool in many benchmarks. The write
> > performance has slightly improved, but the read performance has
> > suffered a lot (as much as 60% in some tests).
> >
> > Therefore, I am planning to scrap the cache pool (at least until it
> > matures) and use either bcache or enhanceio instead.
> >
> >
> > We're actually looking at dm-cache a bit right now (and talking to
> > some of the developers about the challenges they are facing, to help
> > improve our own cache tiering). No meaningful benchmarks of dm-cache
> > yet though. Bcache, enhanceio, and flashcache all look interesting
> > too. Regarding the cache pool: we've got a couple of ideas that
> > should help improve performance, especially for reads. There are
> > definitely advantages to keeping the cache local to the node though.
> > I think some form of local node caching could be pretty useful going
> > forward.
> >
> >
> > Thanks
> >
> > Andrei
> >
> >

> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com