Designing a cluster with ceph and benchmark (ceph vs ext4)

On Sat, 24 May 2014 13:14:42 +0100 Pieter Koorts wrote:

> If looking for a DRBD alternative and not wanting to use CephFS, is it
> not possible to just use something like OCFS2 or GFS on top of an RBD
> block device, with all worker nodes accessing it via GFS or OCFS2
> (obviously with write-through mode)?
> 
> Would this method not present some advantages over DRBD?
> 
If the size of your data exceeds what can sensibly be put on two nodes,
maybe.

But again, going from a local FS like Ext4 to OCFS2 or GFS will be painful
enough on its own. And for RBD to perform on par with DRBD you need to spend
a LOT more on nodes, disks (and SSDs) and, most importantly, on a
high-performance network with the associated expensive switches (as opposed
to the direct interconnects DRBD can use).
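
For the record, the RBD+OCFS2 stack itself is easy enough to assemble;
roughly like this, with made-up pool/image names, and keeping in mind that
OCFS2 additionally needs its own o2cb cluster membership configured on
every node:

  rbd create sharedpool/mailstore --size 102400   # 100 GB image (size in MB)
  rbd map sharedpool/mailstore                    # shows up as /dev/rbd0 or similar
  mkfs.ocfs2 -L mailstore /dev/rbd0               # format once, from one node
  mount -t ocfs2 /dev/rbd0 /srv/mail              # map + mount on every worker node

The setup isn't the expensive part, making it perform is.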

> DRBD has its uses and will never go away but it does have limited
> scalability in the general sense.
> 
It certainly does, but until you hit those scalability limits it is hard
to beat. 
Given the one use case of the OP (and myself), mailbox servers, the fact
that reads are local is a tremendous benefit.
And with proxies like Dovecot or Perdition there isn't any real
scalability issue either.
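
The simplest illustration of such a proxy with Dovecot is a static passdb
that forwards every login to a fixed backend (in practice the backend host
comes from SQL/LDAP per user; the hostname below is made up):

  passdb {
    driver = static
    args = proxy=y host=mbx01.example.com nopassword=y
  }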

Christian

> Pieter
> 
> 
> On 24 May 2014, at 06:43, Christian Balzer <chibi at gol.com> wrote:
> 
> > 
> > Hello,
> > 
> > On Fri, 23 May 2014 15:41:23 -0300 Listas at Adminlinux wrote:
> > 
> >> Hi !
> >> 
> >> I have failover clusters for some applications, generally with 2
> >> members configured with Ubuntu + DRBD + Ext4. For example, my IMAP
> >> cluster works fine with ~50k email accounts and my HTTP cluster
> >> hosts ~2k sites.
> >> 
> > My mailbox servers are also multiple DRBD-based cluster pairs. 
> > For performance in fully redundant storage there isn't anything
> > better (in the OSS, generic-hardware space at least).
> > 
> >> See design here: http://adminlinux.com.br/cluster_design.txt
> >> 
> >> I would like to provide load balancing instead of just failover, so
> >> I would like to use a distributed filesystem architecture. As we
> >> know, Ext4 isn't a distributed filesystem, so I wish to use Ceph in
> >> my clusters.
> >> 
> > You will find that all cluster/distributed filesystems have severe
> > performance shortcomings when compared to something like Ext4.
> > 
> > On top of that, CephFS isn't ready for production as the MDS isn't HA.
> > 
> > A potential middle way might be to use Ceph/RBD volumes formatted with
> > Ext4. That doesn't give you shared access, but it will allow you to
> > separate storage and compute nodes, so when one compute node becomes
> > busy, you can mount that volume from a more powerful compute node instead.
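> > 
> > Roughly, with made-up pool/image names and with only one node ever
> > mounting the volume at a time (Ext4 is not cluster-aware):
> > 
> >   # on the busy node: stop the service, then release the volume
> >   umount /srv/mail
> >   rbd unmap /dev/rbd0
> >   # on the more powerful node: attach the same image and carry on
> >   rbd map rbdpool/mailstore
> >   mount /dev/rbd0 /srv/mail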
> > 
> > That all said, I can't see any way or reason to replace my mailbox
> > DRBD clusters with Ceph in the foreseeable future.
> > To get similar performance/reliability to DRBD I would have to spend
> > 3-4 times the money.
> > 
> > Where Ceph/RBD works well is in situations where you can't fit the
> > compute needs into a storage node (as required with DRBD) and where
> > you want to access things from multiple compute nodes, primarily for
> > migration purposes.
> > In short, as a shared storage for VMs.
> > 
> >> Any suggestions for design of the cluster with Ubuntu+Ceph?
> >> 
> >> I built a simple cluster of 2 servers to test simultaneous reading
> >> and writing with Ceph. My conf:
> >> http://adminlinux.com.br/ceph_conf.txt
> >> 
> > Again, CephFS isn't ready for production, but other than that I know
> > very little about it as I don't use it.
> > However, your version of Ceph is severely outdated; you really should
> > be looking at something more recent to rule out that you're
> > experiencing long-fixed bugs. The same goes for your entire setup and
> > kernel.
> > 
> > Also, Ceph only starts to perform decently with many OSDs (disks) and
> > with the journals on SSDs instead of on the same disks as the data.
> > Think of DRBD's activity log with internal metadata, but with MUCH
> > more impact.
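> > 
> > If you do test with SSD journals, it is just a couple of lines in
> > ceph.conf (device paths made up, one journal partition per OSD):
> > 
> >   [osd]
> >       osd journal size = 10240      # journal size in MB, i.e. 10 GB
> >   [osd.0]
> >       osd journal = /dev/sdg1       # partition on the SSD, data stays on the spinner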
> > 
> > Regards,
> > 
> > Christian
> >> But in my simultaneous benchmarks I found errors in reading and
> >> writing. I ran "iozone -t 5 -r 4k -s 2m" simultaneously on both
> >> servers in the cluster. The performance was poor and there were
> >> errors like this:
> >> 
> >> Error in file: Found "0" Expecting "6d6d6d6d6d6d6d6d" addr b6600000
> >> Error in file: Position 1060864
> >> Record # 259 Record size 4 kb
> >> where b6600000 loop 0
> >> 
> >> Performance graphs of benchmark:
> >> http://adminlinux.com.br/ceph_bench.html
> >> 
> >> Can you help me find what I did wrong?
> >> 
> >> Thanks !
> >> 
> > 
> > 
> > -- 
> > Christian Balzer        Network/Systems Engineer                
> > chibi at gol.com   	Global OnLine Japan/Fusion Communications
> > http://www.gol.com/
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users at lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi at gol.com   	Global OnLine Japan/Fusion Communications
http://www.gol.com/

