Hello German
Could you remind me what type of bus/adapter your HDDs are connected to?
Julien
Hi Mark, I've already made those changes, but the performance is almost the same. I ran another test with dd and the results were the same (I used all of the 73GB disks for the OSDs and put each journal on its OSD device). I also noticed that the network is running at 1Gb:
ceph@ceph-node04:~$ sudo rbd -m 10.1.1.151 -p ceph-cloud --size 102400 create rbdCloud -k /etc/ceph/ceph.client.admin.keyring
ceph@ceph-node04:~$ sudo rbd map -m 10.1.1.151 rbdCloud --pool ceph-cloud --id admin -k /etc/ceph/ceph.client.admin.keyring
ceph@ceph-node04:~$ sudo mkdir /mnt/rbdCloud
ceph@ceph-node04:~$ sudo mkfs.xfs -l size=64m,lazy-count=1 -f /dev/rbd/ceph-cloud/rbdCloud
log stripe unit (4194304 bytes) is too large (maximum is 256KiB)
log stripe unit adjusted to 32KiB
meta-data=/dev/rbd/ceph-cloud/rbdCloud isize=256    agcount=17, agsize=1637376 blks
         =                             sectsz=512   attr=2, projid32bit=0
data     =                             bsize=4096   blocks=26214400, imaxpct=25
         =                             sunit=1024   swidth=1024 blks
naming   =version 2                    bsize=4096   ascii-ci=0
log      =internal log                 bsize=4096   blocks=16384, version=2
         =                             sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                         extsz=4096   blocks=0, rtextents=0
ceph@ceph-node04:~$
ceph@ceph-node04:~$ sudo mount /dev/rbd/ceph-cloud/rbdCloud /mnt/rbdCloud
ceph@ceph-node04:~$ cd /mnt/rbdCloud
ceph@ceph-node04:/mnt/rbdCloud$
ceph@ceph-node04:/mnt/rbdCloud$ for i in 1 2 3 4; do sudo dd if=/dev/zero of=a bs=1M count=1000 conv=fdatasync; done
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10.2545 s, 102 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10.0554 s, 104 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10.2352 s, 102 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10.1197 s, 104 MB/s
ceph@ceph-node04:/mnt/rbdCloud$

OSD tree:
ceph@ceph-node05:~/ceph-cluster-prd$ sudo ceph osd tree
# id    weight   type name       up/down  reweight
-1      3.43     root default
-2      0.6299       host ceph-node01
12      0.06999          osd.12  up       1
13      0.06999          osd.13  up       1
14      0.06999          osd.14  up       1
15      0.06999          osd.15  up       1
16      0.06999          osd.16  up       1
17      0.06999          osd.17  up       1
18      0.06999          osd.18  up       1
19      0.06999          osd.19  up       1
20      0.06999          osd.20  up       1
-3      0.6999       host ceph-node02
22      0.06999          osd.22  up       1
23      0.06999          osd.23  up       1
24      0.06999          osd.24  up       1
25      0.06999          osd.25  up       1
26      0.06999          osd.26  up       1
27      0.06999          osd.27  up       1
28      0.06999          osd.28  up       1
29      0.06999          osd.29  up       1
30      0.06999          osd.30  up       1
31      0.06999          osd.31  up       1
-4      0.6999       host ceph-node03
32      0.06999          osd.32  up       1
33      0.06999          osd.33  up       1
34      0.06999          osd.34  up       1
35      0.06999          osd.35  up       1
36      0.06999          osd.36  up       1
37      0.06999          osd.37  up       1
38      0.06999          osd.38  up       1
39      0.06999          osd.39  up       1
40      0.06999          osd.40  up       1
41      0.06999          osd.41  up       1
-5      0.6999       host ceph-node04
0       0.06999          osd.0   up       1
1       0.06999          osd.1   up       1
2       0.06999          osd.2   up       1
3       0.06999          osd.3   up       1
4       0.06999          osd.4   up       1
5       0.06999          osd.5   up       1
6       0.06999          osd.6   up       1
7       0.06999          osd.7   up       1
8       0.06999          osd.8   up       1
9       0.06999          osd.9   up       1
-6      0.6999       host ceph-node05
10      0.06999          osd.10  up       1
11      0.06999          osd.11  up       1
42      0.06999          osd.42  up       1
43      0.06999          osd.43  up       1
44      0.06999          osd.44  up       1
45      0.06999          osd.45  up       1
46      0.06999          osd.46  up       1
47      0.06999          osd.47  up       1
48      0.06999          osd.48  up       1
49      0.06999          osd.49  up       1
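For reference, those dd numbers are already close to what a single gigabit link can carry (roughly 110-115 MB/s of usable payload), so a raw network test would show whether the 1Gb link itself is the ceiling. A minimal sketch with iperf (assuming it is installed on the nodes; the IP is the same storage-network address used with -m in the rbd commands above):

    # On ceph-node01, start a listener:
    iperf -s
    # From the client node, run a 10-second test against it:
    iperf -c 10.1.1.151 -t 10

A healthy gigabit link should report on the order of 940 Mbit/s.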
Any ideas?
Thanks in advance,

--- Original message ---
Subject: Re: Cluster Performance very Poor
From: Mark Nelson <mark.nelson@xxxxxxxxxxx>
To: <ceph-users@xxxxxxxxxxxxxx>
Date: Friday, 27/12/2013 15:39
On 12/27/2013 12:19 PM, German Anders wrote:
Hi Cephers,
I've run a rados bench to measure the throughput of the cluster, and found that the performance is really poor:
The setup is the following:
OS: Ubuntu 12.10 Server, 64-bit
ceph-node01 (mon)      10.77.0.101   ProLiant BL460c G7   32GB   8 x 2 GHz   10.1.1.151   D2200sb Storage Blade (Firmware: 2.30)
ceph-node02 (mon)      10.77.0.102   ProLiant BL460c G7   64GB   8 x 2 GHz   10.1.1.152   D2200sb Storage Blade (Firmware: 2.30)
ceph-node03 (mon)      10.77.0.103   ProLiant BL460c G6   32GB   8 x 2 GHz   10.1.1.153   D2200sb Storage Blade (Firmware: 2.30)
ceph-node04            10.77.0.104   ProLiant BL460c G7   32GB   8 x 2 GHz   10.1.1.154   D2200sb Storage Blade (Firmware: 2.30)
ceph-node05 (deploy)   10.77.0.105   ProLiant BL460c G6   32GB   8 x 2 GHz   10.1.1.155   D2200sb Storage Blade (Firmware: 2.30)
If your servers have controllers with writeback cache, please make sure it is enabled as that will likely help.
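For example, on the HP Smart Array controller built into the D2200sb, something along these lines shows the current cache settings; a sketch, assuming the hpacucli utility is installed (newer firmware bundles ship it as hpssacli, and field names can vary):

    # Read-only: dump the controller configuration and look at the cache-related fields
    sudo hpacucli ctrl all show config detail | grep -i cache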
ceph-node01:
/dev/sda  73G   (OSD)
/dev/sdb  73G   (OSD)
/dev/sdc  73G   (OSD)
/dev/sdd  73G   (OSD)
/dev/sde  73G   (OSD)
/dev/sdf  73G   (OSD)
/dev/sdg  73G   (OSD)
/dev/sdh  73G   (OSD)
/dev/sdi  73G   (OSD)
/dev/sdj  73G   (Journal)
/dev/sdk  500G  (OSD)
/dev/sdl  500G  (OSD)
/dev/sdn  146G  (Journal)
ceph-node02:
/dev/sda  73G   (OSD)
/dev/sdb  73G   (OSD)
/dev/sdc  73G   (OSD)
/dev/sdd  73G   (OSD)
/dev/sde  73G   (OSD)
/dev/sdf  73G   (OSD)
/dev/sdg  73G   (OSD)
/dev/sdh  73G   (OSD)
/dev/sdi  73G   (OSD)
/dev/sdj  73G   (Journal)
/dev/sdk  500G  (OSD)
/dev/sdl  500G  (OSD)
/dev/sdn  146G  (Journal)
ceph-node03:
/dev/sda  73G   (OSD)
/dev/sdb  73G   (OSD)
/dev/sdc  73G   (OSD)
/dev/sdd  73G   (OSD)
/dev/sde  73G   (OSD)
/dev/sdf  73G   (OSD)
/dev/sdg  73G   (OSD)
/dev/sdh  73G   (OSD)
/dev/sdi  73G   (OSD)
/dev/sdj  73G   (Journal)
/dev/sdk  500G  (OSD)
/dev/sdl  500G  (OSD)
/dev/sdn  73G   (Journal)
ceph-node04:
/dev/sda  73G   (OSD)
/dev/sdb  73G   (OSD)
/dev/sdc  73G   (OSD)
/dev/sdd  73G   (OSD)
/dev/sde  73G   (OSD)
/dev/sdf  73G   (OSD)
/dev/sdg  73G   (OSD)
/dev/sdh  73G   (OSD)
/dev/sdi  73G   (OSD)
/dev/sdj  73G   (Journal)
/dev/sdk  500G  (OSD)
/dev/sdl  500G  (OSD)
/dev/sdn  146G  (Journal)
ceph-node05:
/dev/sda  73G   (OSD)
/dev/sdb  73G   (OSD)
/dev/sdc  73G   (OSD)
/dev/sdd  73G   (OSD)
/dev/sde  73G   (OSD)
/dev/sdf  73G   (OSD)
/dev/sdg  73G   (OSD)
/dev/sdh  73G   (OSD)
/dev/sdi  73G   (OSD)
/dev/sdj  73G   (Journal)
/dev/sdk  500G  (OSD)
/dev/sdl  500G  (OSD)
/dev/sdn  73G   (Journal)
Am I correct in assuming that you've put the journals for every disk in each node on two spinning disks? This is going to be quite slow, because Ceph does a full write of the data to the journal for every real write. The general solution is either to use SSDs for the journals (preferably multiple fast SSDs with high write endurance, each carrying only 3-6 OSD journals), or to put each journal on a partition of its own data disk.
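If you go the co-located route, a minimal sketch with the stock ceph-deploy workflow of this era (hostname and device below are examples; adjust to your layout, and note that zapping destroys the disk's contents):

    # Wipe the example device and let ceph-disk create both partitions on it:
    ceph-deploy disk zap ceph-node01:sdb
    ceph-deploy osd create ceph-node01:sdb
    # With e.g. "osd journal size = 5120" (MB) in ceph.conf, a 5GB journal
    # partition is carved out of sdb and the rest becomes the OSD data partition.

Because no separate journal device is named, the journal partition ends up on the same disk as the data.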
And the OSD tree is:
root@ceph-node03:/home/ceph# ceph osd tree
# id    weight   type name       up/down  reweight
-1      7.27     root default
-2      1.15         host ceph-node01
12      0.06999          osd.12  up       1
13      0.06999          osd.13  up       1
14      0.06999          osd.14  up       1
15      0.06999          osd.15  up       1
16      0.06999          osd.16  up       1
17      0.06999          osd.17  up       1
18      0.06999          osd.18  up       1
19      0.06999          osd.19  up       1
20      0.06999          osd.20  up       1
21      0.45             osd.21  up       1
22      0.06999          osd.22  up       1
-3      1.53         host ceph-node02
23      0.06999          osd.23  up       1
24      0.06999          osd.24  up       1
25      0.06999          osd.25  up       1
26      0.06999          osd.26  up       1
27      0.06999          osd.27  up       1
28      0.06999          osd.28  up       1
29      0.06999          osd.29  up       1
30      0.06999          osd.30  up       1
31      0.06999          osd.31  up       1
32      0.45             osd.32  up       1
33      0.45             osd.33  up       1
-4      1.53         host ceph-node03
34      0.06999          osd.34  up       1
35      0.06999          osd.35  up       1
36      0.06999          osd.36  up       1
37      0.06999          osd.37  up       1
38      0.06999          osd.38  up       1
39      0.06999          osd.39  up       1
40      0.06999          osd.40  up       1
41      0.06999          osd.41  up       1
42      0.06999          osd.42  up       1
43      0.45             osd.43  up       1
44      0.45             osd.44  up       1
-5      1.53         host ceph-node04
0       0.06999          osd.0   up       1
1       0.06999          osd.1   up       1
2       0.06999          osd.2   up       1
3       0.06999          osd.3   up       1
4       0.06999          osd.4   up       1
5       0.06999          osd.5   up       1
6       0.06999          osd.6   up       1
7       0.06999          osd.7   up       1
8       0.06999          osd.8   up       1
9       0.45             osd.9   up       1
10      0.45             osd.10  up       1
-6      1.53         host ceph-node05
11      0.06999          osd.11  up       1
45      0.06999          osd.45  up       1
46      0.06999          osd.46  up       1
47      0.06999          osd.47  up       1
48      0.06999          osd.48  up       1
49      0.06999          osd.49  up       1
50      0.06999          osd.50  up       1
51      0.06999          osd.51  up       1
52      0.06999          osd.52  up       1
53      0.45             osd.53  up       1
54      0.45             osd.54  up       1
Based on this, it appears your 500GB drives are weighted much higher than the 73GB drives. This will help even data distribution out, but unfortunately will cause the system to be slower if all of the OSDs are in the same pool. What this does is cause the 500GB drives to get a higher proportion of the writes than the other drives, but those drives are almost certainly no faster than the other ones. Because there is a limited number of outstanding IOs you can have (due to memory constraints), eventually all outstanding IOs will be waiting on the 500GB disks while the 73GB disks mostly sit around waiting for work.
What I'd suggest doing is putting all of your 73GB disks in one pool and your 500GB disks in another pool. I suspect that if you do that and put your journals on the first partition of each data disk, you'll see some improvement in your benchmark results.
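As a rough sketch of the pool split (names below are just examples and exact commands can vary a bit by release; the idea is a second CRUSH root holding only the 500GB OSDs, with its own rule and pool):

    # Create a separate CRUSH root and a per-host bucket for the big drives:
    ceph osd crush add-bucket big root
    ceph osd crush add-bucket ceph-node01-big host
    ceph osd crush move ceph-node01-big root=big
    # Relocate a 500GB OSD (osd.21 in your tree) under the new bucket:
    ceph osd crush set osd.21 0.45 root=big host=ceph-node01-big
    # Rule that only draws from the "big" root, and a pool bound to it:
    ceph osd crush rule create-simple big_rule big host
    ceph osd pool create cloud-big 512 512
    ceph osd pool set cloud-big crush_ruleset <id>   # id from "ceph osd crush rule dump"

Repeat the bucket/relocation steps for the 500GB OSDs on the other hosts; the 73GB OSDs stay under the default root and keep serving the existing pool.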
And the result:
root@ceph-node03:/home/ceph# rados bench -p ceph-cloud 20 write -t 10
 Maintaining 10 concurrent writes of 4194304 bytes for up to 20 seconds or 0 objects
 Object prefix: benchmark_data_ceph-node03_29727
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      10        30        20   79.9465        80  0.159295  0.378849
     2      10        52        42   83.9604        88  0.719616  0.430293
     3      10        74        64   85.2991        88  0.487685  0.412956
     4      10        97        87   86.9676        92  0.351122  0.418814
     5      10       123       113   90.3679       104  0.317011  0.418876
     6      10       147       137   91.3012        96  0.562112  0.418178
     7      10       172       162   92.5398       100  0.691045  0.413416
     8      10       197       187    93.469       100  0.459424  0.415459
     9      10       222       212   94.1915       100  0.798889  0.416093
    10      10       248       238   95.1697       104  0.440002  0.415609
    11      10       267       257   93.4252        76   0.48959   0.41531
    12      10       289       279   92.9707        88  0.524622  0.420145
    13      10       313       303   93.2016        96   1.02104  0.423955
    14      10       336       326   93.1136        92  0.477328  0.420684
    15      10       359       349    93.037        92  0.591118  0.418589
    16      10       383       373   93.2204        96  0.600392  0.421916
    17      10       407       397   93.3812        96  0.240166  0.419829
    18      10       431       421    93.526        96  0.746706  0.420971
    19      10       457       447   94.0757       104  0.237565  0.419025
2013-12-27 13:13:21.817874 min lat: 0.101352 max lat: 1.81426 avg lat: 0.418242
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    20      10       480       470   93.9709        92  0.489254  0.418242
 Total time run:         20.258064
Total writes made:       481
Write size:              4194304
Bandwidth (MB/sec):      94.975

Stddev Bandwidth:        21.7799
Max bandwidth (MB/sec):  104
Min bandwidth (MB/sec):  0
Average Latency:         0.420573
Stddev Latency:          0.226378
Max latency:             1.81426
Min latency:             0.101352
root@ceph-node03:/home/ceph#
Thanks in advance,
Best regards,
*German Anders*
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com