On 09/12/13 17:07, Greg Poirier wrote:
Hi. So, I have a test cluster made up of ludicrously overpowered machines with nothing but SSDs in them. Bonded 10Gbps NICs (802.3ad, layer 2+3 xmit hash policy, confirmed ~19.8 Gbps throughput with 32+ threads). I'm running rados bench, and I am currently getting less than 1 MB/s throughput:

    sudo rados -N `hostname` bench 600 write -b 4096 -p volumes --no-cleanup -t 32 > bench_write_4096_volumes_1_32.out 2>&1

The journals are colocated on the same disks, so I'm not expecting optimum throughput, but previous tests on spinning disks have shown reasonable speeds (23 MB/s, 4000-6000 IOPS) as opposed to the 150-450 IOPS I'm currently getting.

    ceph_deploy@ssd-1001:~$ sudo ceph -s
      cluster 4167d5f2-2b9e-4bde-a653-f24af68a45f8
       health HEALTH_WARN clock skew detected on mon.ssd-1003
       monmap e1: 3 mons at {ssd-1001=10.20.69.101:6789/0,ssd-1002=10.20.69.102:6789/0,ssd-1003=10.20.69.103:6789/0}, election epoch 20, quorum 0,1,2 ssd-1001,ssd-1002,ssd-1003
       osdmap e344: 33 osds: 33 up, 33 in
        pgmap v10600: 1650 pgs, 6 pools, 289 MB data, 74029 objects
               466 GB used, 17621 GB / 18088 GB avail
                   1650 active+clean
      client io 1263 kB/s wr, 315 op/s

    ceph_deploy@ssd-1001:~$ sudo ceph osd tree
    # id    weight  type name       up/down reweight
    -1      30.03   root default
    -2      10.01           host ssd-1001
    0       0.91                    osd.0   up      1
    1       0.91                    osd.1   up      1
    2       0.91                    osd.2   up      1
    3       0.91                    osd.3   up      1
    4       0.91                    osd.4   up      1
    5       0.91                    osd.5   up      1
    6       0.91                    osd.6   up      1
    7       0.91                    osd.7   up      1
    8       0.91                    osd.8   up      1
    9       0.91                    osd.9   up      1
    10      0.91                    osd.10  up      1
    -3      10.01           host ssd-1002
    11      0.91                    osd.11  up      1
    12      0.91                    osd.12  up      1
    13      0.91                    osd.13  up      1
    14      0.91                    osd.14  up      1
    15      0.91                    osd.15  up      1
    16      0.91                    osd.16  up      1
    17      0.91                    osd.17  up      1
    18      0.91                    osd.18  up      1
    19      0.91                    osd.19  up      1
    20      0.91                    osd.20  up      1
    21      0.91                    osd.21  up      1
    -4      10.01           host ssd-1003
    22      0.91                    osd.22  up      1
    23      0.91                    osd.23  up      1
    24      0.91                    osd.24  up      1
    25      0.91                    osd.25  up      1
    26      0.91                    osd.26  up      1
    27      0.91                    osd.27  up      1
    28      0.91                    osd.28  up      1
    29      0.91                    osd.29  up      1
    30      0.91                    osd.30  up      1
    31      0.91                    osd.31  up      1
    32      0.91                    osd.32  up      1

The clock skew error can safely be ignored; it's something like 2-3 ms of skew, and I just haven't bothered configuring away the warning. This is with a newly-created pool, after deleting the last pool used for testing.

Any suggestions on where to start debugging?
I'd suggest testing the components separately: try to rule out NIC (and switch) issues and SSD performance issues, then, once you're sure the individual pieces all go fast on their own, test how Ceph performs again.
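For example, something along these lines (a rough sketch; the hostname, file path, and sizes are placeholders for your environment, and fio should be pointed at a scratch file or unused partition since it writes data):

    # Network throughput between two OSD hosts (start "iperf -s" on the peer first)
    iperf -c ssd-1002 -P 32 -t 30

    # Raw 4k random-write performance of one SSD with direct + sync I/O
    fio --name=randwrite --filename=/tmp/fio.test --size=4G \
        --rw=randwrite --bs=4k --direct=1 --sync=1 \
        --iodepth=32 --ioengine=libaio --runtime=60 --time_based

If the sync write numbers are poor, that would go a long way toward explaining slow journal performance, since the OSD journal issues synchronous writes and some SSDs handle those far worse than buffered writes.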
What make and model of SSD? I'd check that the firmware is up to date (it sometimes makes a huge difference). I'm also wondering if you might get better performance by having (say) 7 OSDs per host and using 4 of the SSDs as journals for them.
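Something like the following for checking the drives, and ceph-deploy's host:data:journal form if you try the separate-journal layout - both are illustrative, so check the device names and exact syntax against your versions:

    # Show SSD model and firmware revision for each drive
    sudo smartctl -i /dev/sda | egrep 'Model|Firmware'

    # Prepare an OSD on sdb with its journal on a different SSD (sdi)
    ceph-deploy osd prepare ssd-1001:sdb:sdi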
Cheers,
Mark