Designing a cluster with ceph and benchmark (ceph vs ext4)

chibi@xxxxxxx (Christian Balzer) · Sat, 24 May 2014 14:43:11 +0900

Hello,

On Fri, 23 May 2014 15:41:23 -0300 Listas at Adminlinux wrote:

> Hi !
> 
> I have failover clusters for some applications. Generally with 2 members 
> configured with Ubuntu + Drbd + Ext4. For example, my IMAP cluster works 
> fine with ~ 50k email accounts and my HTTP cluster hosts ~2k sites.
> 
My mailbox servers are also multiple DRBD-based cluster pairs.
For performance in fully redundant storage there isn't anything better
(in the OSS, generic hardware section at least).

> See design here: http://adminlinux.com.br/cluster_design.txt
> 
> I would like to provide load balancing instead of just failover. So, I 
> would like to use a distributed architecture of the filesystem. As we 
> know, Ext4 isn't a distributed filesystem. So I wish to use Ceph in my 
> clusters.
>
You will find that all cluster/distributed filesystems have severe
performance shortcomings when compared to something like Ext4.

On top of that, CephFS isn't ready for production as the MDS isn't HA.

A potential middle way might be to use Ceph/RBD volumes formatted in Ext4.
That doesn't give you shared access, but it will allow you to separate
storage and compute nodes, so when one compute node becomes busy, you can
mount that volume from a more powerful compute node instead.
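
To make that middle way concrete, here is a rough sketch of the command
sequence involved. It only builds and prints the commands, it does not run
them. The pool "rbd", the image "mailstore", the size and the mount point
"/srv/mail" are made-up names for illustration, and the /dev/rbd/<pool>/<image>
path assumes the udev rules shipped with Ceph are installed.

```python
import shlex


def rbd_ext4_plan(pool, image, size_mb, mountpoint):
    """Return the command lists for each phase; nothing is executed here."""
    spec = f"{pool}/{image}"
    # Assumption: Ceph's udev rules create this symlink on "rbd map".
    dev = f"/dev/rbd/{pool}/{image}"
    return {
        # Done once, from any node with access to the cluster.
        "create_once": [
            ["rbd", "create", "--size", str(size_mb), spec],
            ["rbd", "map", spec],
            ["mkfs.ext4", dev],
            ["rbd", "unmap", dev],
        ],
        # Ext4 is not a cluster filesystem: release the volume on the
        # busy node before any other node mounts it.
        "detach_old_node": [
            ["umount", mountpoint],
            ["rbd", "unmap", dev],
        ],
        "attach_new_node": [
            ["rbd", "map", spec],
            ["mount", dev, mountpoint],
        ],
    }


# Hypothetical names and size (in MB), purely for illustration.
plan = rbd_ext4_plan("rbd", "mailstore", 102400, "/srv/mail")
for phase, commands in plan.items():
    print(f"# {phase}")
    for cmd in commands:
        print(shlex.join(cmd))
```

The important part is the ordering: the volume has to be unmounted and
unmapped on the old node before it is mapped and mounted on the new one,
since Ext4 on RBD must never be mounted on two nodes at once.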

That all said, I can't see any way or reason to replace my mailbox DRBD
clusters with Ceph in the foreseeable future.
To get similar performance/reliability to DRBD I would have to spend 3-4
times the money.

Where Ceph/RBD works well is in situations where you can't fit the compute
needs into a storage node (as required with DRBD) and where you want to
access things from multiple compute nodes, primarily for migration
purposes.
In short, as shared storage for VMs.

> Any suggestions for design of the cluster with Ubuntu+Ceph?
> 
> I built a simple cluster of 2 servers to test simultaneous reading and 
> writing with Ceph. My conf:  http://adminlinux.com.br/ceph_conf.txt
> 
Again, CephFS isn't ready for production, but other than that I know very
little about it as I don't use it.
However, your version of Ceph is severely outdated; you really should be
looking at something more recent to rule out that you're experiencing
long-fixed bugs. The same goes for your entire setup and kernel.

Also Ceph only starts to perform decently with many OSDs (disks) and
the journals on SSDs instead of being on the same disk.
Think DRBD AL metadata-internal, but with MUCH more impact.

Regards,

Christian
> But in my simultaneous benchmarks I found errors in reading and writing. I 
> ran "iozone -t 5 -r 4k -s 2m" simultaneously on both servers in the 
> cluster. The performance was poor and had errors like this:
> 
> Error in file: Found ?0? Expecting ?6d6d6d6d6d6d6d6d? addr b6600000
> Error in file: Position 1060864
> Record # 259 Record size 4 kb
> where b6600000 loop 0
> 
> Performance graphs of benchmark: http://adminlinux.com.br/ceph_bench.html
> 
> Can you help me find what I did wrong?
> 
> Thanks !
> 
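
For reading the iozone error quoted above, a small sanity check of the
numbers may help. It uses only the values from the quoted output and command
line ("-r 4k", "-s 2m", record 259, position 1060864). That records are
counted from zero is my inference from the arithmetic, not something taken
from the iozone documentation.

```python
RECORD_SIZE = 4 * 1024        # "-r 4k" / "Record size 4 kb"
FILE_SIZE = 2 * 1024 * 1024   # "-s 2m"

record = 259                  # "Record # 259"
position = 1060864            # "Position 1060864"

# Byte offset of the failing record, counting records from zero.
offset = record * RECORD_SIZE
records_per_file = FILE_SIZE // RECORD_SIZE

# The value iozone expected, decoded from the hex string in the error.
expected = bytes.fromhex("6d6d6d6d6d6d6d6d")

print(offset == position)     # True
print(records_per_file)       # 512
print(expected)               # b'mmmmmmmm'
```

So the reported position is exactly the start of record 259 out of 512, about
halfway through the 2 MB file, and the read returned 0 where eight bytes of
0x6d were expected.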

-- 
Christian Balzer        Network/Systems Engineer                
chibi at gol.com   	Global OnLine Japan/Fusion Communications
http://www.gol.com/