On 09/12/13 17:07, Greg Poirier wrote:
Hi. So, I have a test cluster made up of ludicrously overpowered machines with nothing but SSDs in them. Bonded 10Gbps NICs (802.3ad, layer 2+3 xmit hash policy, confirmed ~19.8 Gbps throughput with 32+ threads). I'm running rados bench, and I am currently getting less than 1 MB/s throughput:

    sudo rados -N `hostname` bench 600 write -b 4096 -p volumes --no-cleanup -t 32 > bench_write_4096_volumes_1_32.out 2>&1

The journals are colocated on the same disks, so I'm not expecting optimum throughput, but previous tests on spinning disks have shown reasonable speeds (23 MB/s, 4000-6000 IOPS) as opposed to the 150-450 IOPS I'm currently getting.

    ceph_deploy@ssd-1001:~$ sudo ceph -s
      cluster 4167d5f2-2b9e-4bde-a653-f24af68a45f8
       health HEALTH_WARN clock skew detected on mon.ssd-1003
       monmap e1: 3 mons at {ssd-1001=10.20.69.101:6789/0,ssd-1002=10.20.69.102:6789/0,ssd-1003=10.20.69.103:6789/0}, election epoch 20, quorum 0,1,2 ssd-1001,ssd-1002,ssd-1003
       osdmap e344: 33 osds: 33 up, 33 in
        pgmap v10600: 1650 pgs, 6 pools, 289 MB data, 74029 objects
               466 GB used, 17621 GB / 18088 GB avail
                   1650 active+clean
      client io 1263 kB/s wr, 315 op/s

    ceph_deploy@ssd-1001:~$ sudo ceph osd tree
    # id    weight  type name       up/down reweight
    -1      30.03   root default
    -2      10.01           host ssd-1001
    0       0.91                    osd.0   up      1
    1       0.91                    osd.1   up      1
    2       0.91                    osd.2   up      1
    3       0.91                    osd.3   up      1
    4       0.91                    osd.4   up      1
    5       0.91                    osd.5   up      1
    6       0.91                    osd.6   up      1
    7       0.91                    osd.7   up      1
    8       0.91                    osd.8   up      1
    9       0.91                    osd.9   up      1
    10      0.91                    osd.10  up      1
    -3      10.01           host ssd-1002
    11      0.91                    osd.11  up      1
    12      0.91                    osd.12  up      1
    13      0.91                    osd.13  up      1
    14      0.91                    osd.14  up      1
    15      0.91                    osd.15  up      1
    16      0.91                    osd.16  up      1
    17      0.91                    osd.17  up      1
    18      0.91                    osd.18  up      1
    19      0.91                    osd.19  up      1
    20      0.91                    osd.20  up      1
    21      0.91                    osd.21  up      1
    -4      10.01           host ssd-1003
    22      0.91                    osd.22  up      1
    23      0.91                    osd.23  up      1
    24      0.91                    osd.24  up      1
    25      0.91                    osd.25  up      1
    26      0.91                    osd.26  up      1
    27      0.91                    osd.27  up      1
    28      0.91                    osd.28  up      1
    29      0.91                    osd.29  up      1
    30      0.91                    osd.30  up      1
    31      0.91                    osd.31  up      1
    32      0.91                    osd.32  up      1

The clock skew error can safely be ignored; it's something like 2-3 ms of skew, and I just haven't bothered configuring away the warning. This is with a newly-created pool, after deleting the last pool used for testing.

Any suggestions on where to start debugging?
I'd suggest testing the components separately: try to rule out NIC (and switch) issues and SSD performance issues, then, once you're sure the individual pieces all go fast on their own, test how Ceph performs again.
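For example, something along these lines (a rough sketch; the hostname, file path, and sizes are placeholders for your environment, and fio should be pointed at a scratch file or unused partition since it writes data):

    # Network throughput between two OSD hosts (start "iperf -s" on the peer first)
    iperf -c ssd-1002 -P 32 -t 30

    # Raw 4k random-write performance of one SSD with direct + sync I/O
    fio --name=randwrite --filename=/tmp/fio.test --size=4G \
        --rw=randwrite --bs=4k --direct=1 --sync=1 \
        --iodepth=32 --ioengine=libaio --runtime=60 --time_based

If the sync write numbers are poor, that would go a long way toward explaining slow journal performance, since the OSD journal issues synchronous writes and some SSDs handle those far worse than buffered writes.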
What make and model of SSD? I'd check that the firmware is up to date (it sometimes makes a huge difference). I'm also wondering if you might get better performance by having (say) 7 OSDs per host and using 4 of the SSDs as journals for them.
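Something like the following for checking the drives, and ceph-deploy's host:data:journal form if you try the separate-journal layout - both are illustrative, so check the device names and exact syntax against your versions:

    # Show SSD model and firmware revision for each drive
    sudo smartctl -i /dev/sda | egrep 'Model|Firmware'

    # Prepare an OSD on sdb with its journal on a different SSD (sdi)
    ceph-deploy osd prepare ssd-1001:sdb:sdi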
Cheers,
Mark