On Fri, 3 Oct 2014 11:24:38 +0100 (BST) Andrei Mikhailovsky wrote:

> From: "Christian Balzer" <chibi@xxxxxxx>
> To: ceph-users@xxxxxxxxxxxxxx
> Sent: Friday, 3 October, 2014 2:06:48 AM
> Subject: Re: ceph, ssds, hdds, journals and caching
>
> > On Thu, 2 Oct 2014 21:54:54 +0100 (BST) Andrei Mikhailovsky wrote:
> > > Hello Cephers,
> > >
> > > I am a bit lost on the best ways of using ssds and hdds for a ceph
> > > cluster which uses rbd + kvm for guest vms.
> > >
> > > At the moment I've got 2 osd servers which currently have 8 hdd osds
> > > (max 16 bays) each and 4 ssd disks. Currently, I am using 2 ssds for
> > > osd journals and I've got 2x512GB ssds spare, which are waiting to
> > > be utilised. I am running Ubuntu 12.04 with the 3.13 kernel from
> > > Ubuntu 14.04 and the latest firefly release.
> > >
> > In case you're planning to add more HDDs to those nodes, the obvious
> > use case for those SSDs would be additional journals.
>
> From what I've seen so far, the two ssds that I currently use for
> journaling are happy serving 8 osds and I do not have much load on them.
> Having more osds per server might change that though, you are right. But
> at the moment I was hoping to improve the read performance, especially
> for small block sizes, hence I was thinking of adding the caching layer.
>
> > Also depending on your use case, a kernel newer than 3.13 (which also
> > is not getting any upstream updates/support) might be a good idea.
>
> Yes, indeed. I am considering the latest supported kernels from the
> Ubuntu team.
>
> > > I've tried to use the ceph cache pool tier and the results were not
> > > good. My small cluster slowed down by quite a bit and I've disabled
> > > the cache tier altogether.
> > >
> > Yeah, this feature is clearly a case of "wait for the next major
> > release or the one after that and try again".
>
> Anyone know if the latest 0.80.6 firefly improves the cache behaviour?
> I've seen a bunch of changes in the cache tiering, however, I am not
> sure if these are addressing the stability of the tier or its
> efficiency?
>
Not a Ceph developer, but I think these were bug fixes for the most part.
I wouldn't expect major (invasive code changes) improvements before a
future release (and with future I mean probably the next one over).

> > > My question is how would one utilise the ssds in the best manner to
> > > achieve a good performance boost compared to a pure hdd setup?
> > > Should I enable block level caching (likes of bcache or similar)
> > > using all my ssds and not bother with ssd journals? Should I keep
> > > the journals on two ssds and utilise the remaining two ssds for
> > > bcache? Or is there a better alternative?
> > >
> > This has all been discussed very recently here and the results were
> > inconclusive at best. In some cases reads were improved, but for
> > writes it was potentially worse than normal Ceph journals.
> >
> > Have you monitored your storage nodes (I keep recommending atop for
> > this) during a high load time? If your SSDs are becoming the
> > bottleneck and not the actual disks (doubtful, but verify), more
> > journals.
>
> I am monitoring my ceph cluster with Zabbix and I do not have a
> significant load on the servers at all.
>
While I doubt you're hitting any particular bottlenecks on your storage
servers, I don't think Zabbix (very limited experience with it, so I
might be wrong) monitors everything, nor does it do so at a sufficiently
high frequency to show what is going on during a peak or an fio test
from a client. Thus my suggestion to stare at it live with atop (on all
nodes).

> My biggest concern is the single thread performance of vms. From what I
> can see, this is the main downside of ceph. On average, I am not getting
> much over 35-40MB/s per thread in cold data reads. This is compared with
> a single hdd read performance of 150-160MB/s. Having about 1/4 of the
> raw device performance is a bit worrying, especially compared with what
> I've read. I should be getting about 1/2 of the raw drive performance
> for a single thread, but I am not. My hope was that with a caching tier
> I could increase it.
>
Have a look at:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-April/028552.html

Your numbers look very much like mine before increasing the read_ahead
buffer.
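For reference, this is the sort of tweak I mean; the device name (vda)
and the 4096 value below are just examples, so adjust them for your VMs
and verify against your own workload:

    # inside a VM, check the current read-ahead of the RBD-backed disk
    cat /sys/block/vda/queue/read_ahead_kb

    # bump it up (as root) for a quick before/after comparison
    echo 4096 > /sys/block/vda/queue/read_ahead_kb

    # to make it stick across reboots, a udev rule along these lines
    # (untested sketch) should do:
    # SUBSYSTEM=="block", KERNEL=="vd[a-z]", ACTION=="add|change", \
    #   ATTR{queue/read_ahead_kb}="4096"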
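And to see what a single client thread actually gets before and after a
change like that, a simple sequential fio read from inside a VM is
enough; /dev/vdb here is just a placeholder for a scratch RBD volume,
point it at whatever test device you have (a read like this is
non-destructive, but don't flip it to writes on a disk you care about):

    fio --name=single-thread-read --filename=/dev/vdb --ioengine=libaio \
        --direct=1 --rw=read --bs=4M --numjobs=1 --iodepth=1 \
        --runtime=60 --time_based --group_reporting

Run it against cold data (or drop the caches first) if you want numbers
comparable to your 35-40MB/s figure.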
> > Other than that, maybe create a 1TB (usable space) SSD pool for
> > guests with special speed requirements...
>
> I am planning to do this for the database volumes, however, from what
> I've read so far, there are performance bottlenecks and the current
> stable firefly is not optimised for ssds. I've not tried it myself, but
> it doesn't look like having a dedicated ssd pool will bring a
> significant increase in performance.
>
It will be faster than HDDs and also has future potential for
improvement, but don't expect miracles indeed. If in doubt, just test
it. ^^

Christian

> Has anyone tried using bcache or dm-cache with ceph? Any tips on how to
> integrate it? From what I've read so far, they require you to format
> the existing hdd, which is not feasible if you have an existing live
> cluster.
>
> Cheers

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com