Ceph 0.94 (and lower) performance on >1 hosts?

Hi,

 

As I explained in various previous threads, I’m having a hard time getting the most out of my test Ceph cluster.

I’m benching things with rados bench.
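For reference, a typical invocation looks like this (the pool name, duration and thread count here are illustrative, not necessarily the exact values I used):

    rados bench -p testpool 60 write -t 32 --no-cleanup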

All Ceph hosts are on the same 10Gbit/s switch.

 

Basically, I know I can get about 1GB/s of disk write performance per host when I bench with dd (hundreds of dd threads) plus iperf at 10Gbit/s inbound and 10Gbit/s outbound, all at the same time.

I can also get 2GB/s or even more if I don’t bench the network at the same time, so yes, there is a bottleneck somewhere between the disks and the network. I can’t identify which one it is, but it’s not relevant for what follows anyway.

(Dell R510 + MD1200 + PERC H700 + PERC H800 here, if anyone has hints about this strange bottleneck though…)
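For the curious, the combined load test looks roughly like this (paths, file counts and hostnames are illustrative):

    # disk load: many parallel dd writers
    for i in $(seq 1 200); do dd if=/dev/zero of=/data/bench.$i bs=1M count=1024 oflag=direct & done

    # network load at the same time: iperf in both directions
    iperf -s                          # on the host under test
    iperf -c <host-under-test> -t 60  # from a remote host (inbound)
    iperf -c <remote-host> -t 60      # from the host under test (outbound)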

 

My hosts are each connected through a single 10Gbit/s link for now.

 

My problem is the following. Please note I see the same kind of poor performance with replicated pools...

When testing EC pools, I ended up putting a 4+1 pool on a single node in order to track down the Ceph bottleneck.
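To allow 4+1 on a single node, the erasure-code profile needs its failure domain lowered to the OSD level; I created the pool along these lines (profile and pool names are illustrative, hammer-era syntax):

    ceph osd erasure-code-profile set ec41 k=4 m=1 ruleset-failure-domain=osd
    ceph osd pool create ecpool 256 256 erasure ec41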

On that node, I can get approximately 420MB/s of write performance using rados bench, but that’s fair enough, since the dstat output shows the real data throughput on the disks is about 800+MB/s (the Ceph journal effect, I presume: each write hits the journal first and then the data partition, roughly doubling the disk traffic).

 

I tested Ceph on my other standalone nodes: I can also get around 420MB/s, since they’re identical.

I’m testing things with 5 clients, each on its own 10Gbit/s link and each running rados bench.
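Concretely, all clients fire the same command at the same time, something like (client hostnames and parameters illustrative):

    for c in client{1..5}; do
      ssh $c 'rados bench -p testpool 60 write -t 32 --no-cleanup' &
    done
    wait

(rados bench prefixes its object names with the client hostname and PID, so concurrent runs don’t collide.)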

 

But what I really don’t get is the following:

 

- With 1 host: throughput is 420MB/s.
- With 2 hosts: I get 640MB/s. That’s surely not 2x420MB/s.
- With 5 hosts: I get around 1375MB/s. That’s far from the expected 2GB/s (5x420MB/s).

 

The network is never maxed out, nor are the disks or CPUs.

The per-host throughput I see with rados bench seems to match the dstat throughput.

It’s as if each additional host were only capable of adding 220MB/s of throughput. Compare this to the 1GB/s they are each capable of (420MB/s once journal double-writes are accounted for)…
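For completeness, the per-host figures come from watching each OSD node with something like (flags illustrative):

    dstat -cdn 1    # CPU, disk and network throughput, 1s resolution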

 

I’m therefore wondering what could possibly be so wrong with my setup?

Why would adding hosts impact performance so much?

 

On the hardware side, I have Broadcom BCM57711 10-Gigabit PCIe cards.

I know, not perfect, but not THAT bad either…?

 

Any hint would be greatly appreciated !

 

Thanks

Frederic Schaer

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
