On Fri, 27 Dec 2013, German Anders wrote:
> Hi Mark,
> I've already made those changes, but the performance is almost the same. I
> ran another test with a dd statement and the results were the same (I used
> all of the 73GB disks for the OSDs and also put the journal inside the OSD
> device). I also noticed that the network is at Gb:

Wait... this is a 1Gbps network?  And you're getting around 100 MB/sec
from a single client?  That is about right given what the client NIC is
capable of.

sage

> ceph@ceph-node04:~$ sudo rbd -m 10.1.1.151 -p ceph-cloud --size 102400 create rbdCloud -k /etc/ceph/ceph.client.admin.keyring
> ceph@ceph-node04:~$ sudo rbd map -m 10.1.1.151 rbdCloud --pool ceph-cloud --id admin -k /etc/ceph/ceph.client.admin.keyring
> ceph@ceph-node04:~$ sudo mkdir /mnt/rbdCloud
> ceph@ceph-node04:~$ sudo mkfs.xfs -l size=64m,lazy-count=1 -f /dev/rbd/ceph-cloud/rbdCloud
> log stripe unit (4194304 bytes) is too large (maximum is 256KiB)
> log stripe unit adjusted to 32KiB
> meta-data=/dev/rbd/ceph-cloud/rbdCloud isize=256    agcount=17, agsize=1637376 blks
>          =                             sectsz=512   attr=2, projid32bit=0
> data     =                             bsize=4096   blocks=26214400, imaxpct=25
>          =                             sunit=1024   swidth=1024 blks
> naming   =version 2                    bsize=4096   ascii-ci=0
> log      =internal log                 bsize=4096   blocks=16384, version=2
>          =                             sectsz=512   sunit=8 blks, lazy-count=1
> realtime =none                         extsz=4096   blocks=0, rtextents=0
> ceph@ceph-node04:~$
> ceph@ceph-node04:~$ sudo mount /dev/rbd/ceph-cloud/rbdCloud /mnt/rbdCloud
> ceph@ceph-node04:~$ cd /mnt/rbdCloud
> ceph@ceph-node04:/mnt/rbdCloud$
> ceph@ceph-node04:/mnt/rbdCloud$ for i in 1 2 3 4; do sudo dd if=/dev/zero of=a bs=1M count=1000 conv=fdatasync; done
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 10.2545 s, 102 MB/s
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 10.0554 s, 104 MB/s
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 10.2352 s, 102 MB/s
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 10.1197 s, 104 MB/s
> ceph@ceph-node04:/mnt/rbdCloud$
>
> OSD tree:
>
> ceph@ceph-node05:~/ceph-cluster-prd$ sudo ceph osd tree
> # id    weight  type name       up/down reweight
> -1      3.43    root default
> -2      0.6299          host ceph-node01
> 12      0.06999                 osd.12  up      1
> 13      0.06999                 osd.13  up      1
> 14      0.06999                 osd.14  up      1
> 15      0.06999                 osd.15  up      1
> 16      0.06999                 osd.16  up      1
> 17      0.06999                 osd.17  up      1
> 18      0.06999                 osd.18  up      1
> 19      0.06999                 osd.19  up      1
> 20      0.06999                 osd.20  up      1
> -3      0.6999          host ceph-node02
> 22      0.06999                 osd.22  up      1
> 23      0.06999                 osd.23  up      1
> 24      0.06999                 osd.24  up      1
> 25      0.06999                 osd.25  up      1
> 26      0.06999                 osd.26  up      1
> 27      0.06999                 osd.27  up      1
> 28      0.06999                 osd.28  up      1
> 29      0.06999                 osd.29  up      1
> 30      0.06999                 osd.30  up      1
> 31      0.06999                 osd.31  up      1
> -4      0.6999          host ceph-node03
> 32      0.06999                 osd.32  up      1
> 33      0.06999                 osd.33  up      1
> 34      0.06999                 osd.34  up      1
> 35      0.06999                 osd.35  up      1
> 36      0.06999                 osd.36  up      1
> 37      0.06999                 osd.37  up      1
> 38      0.06999                 osd.38  up      1
> 39      0.06999                 osd.39  up      1
> 40      0.06999                 osd.40  up      1
> 41      0.06999                 osd.41  up      1
> -5      0.6999          host ceph-node04
> 0       0.06999                 osd.0   up      1
> 1       0.06999                 osd.1   up      1
> 2       0.06999                 osd.2   up      1
> 3       0.06999                 osd.3   up      1
> 4       0.06999                 osd.4   up      1
> 5       0.06999                 osd.5   up      1
> 6       0.06999                 osd.6   up      1
> 7       0.06999                 osd.7   up      1
> 8       0.06999                 osd.8   up      1
> 9       0.06999                 osd.9   up      1
> -6      0.6999          host ceph-node05
> 10      0.06999                 osd.10  up      1
> 11      0.06999                 osd.11  up      1
> 42      0.06999                 osd.42  up      1
> 43      0.06999                 osd.43  up      1
> 44      0.06999                 osd.44  up      1
> 45      0.06999                 osd.45  up      1
> 46      0.06999                 osd.46  up      1
> 47      0.06999                 osd.47  up      1
> 48      0.06999                 osd.48  up      1
> 49      0.06999                 osd.49  up      1
>
> Any ideas?
>
> Thanks in advance,
>
> German Anders
>
>
> --- Original message ---
> Subject: Re: Cluster Performance very Poor
> From: Mark Nelson <mark.nelson@xxxxxxxxxxx>
> To: <ceph-users@xxxxxxxxxxxxxx>
> Date: Friday, 27/12/2013 15:39
>
> On 12/27/2013 12:19 PM, German Anders wrote:
> Hi Cephers,
>
> I've run a rados bench to measure the throughput of the cluster,
> and found that the performance is really poor:
>
> The setup is the following:
>
> OS: Ubuntu 12.10 Server 64 bits
>
> ceph-node01(mon)     10.77.0.101   ProLiant BL460c G7   32GB   8 x 2 GHz
>                      10.1.1.151    D2200sb Storage Blade (Firmware: 2.30)
> ceph-node02(mon)     10.77.0.102   ProLiant BL460c G7   64GB   8 x 2 GHz
>                      10.1.1.152    D2200sb Storage Blade (Firmware: 2.30)
> ceph-node03(mon)     10.77.0.103   ProLiant BL460c G6   32GB   8 x 2 GHz
>                      10.1.1.153    D2200sb Storage Blade (Firmware: 2.30)
> ceph-node04          10.77.0.104   ProLiant BL460c G7   32GB   8 x 2 GHz
>                      10.1.1.154    D2200sb Storage Blade (Firmware: 2.30)
> ceph-node05(deploy)  10.77.0.105   ProLiant BL460c G6   32GB   8 x 2 GHz
>                      10.1.1.155    D2200sb Storage Blade (Firmware: 2.30)
>
> If your servers have controllers with writeback cache, please make sure
> it is enabled as that will likely help.
>
> ceph-node01:
>
> /dev/sda  73G  (OSD)
> /dev/sdb  73G  (OSD)
> /dev/sdc  73G  (OSD)
> /dev/sdd  73G  (OSD)
> /dev/sde  73G  (OSD)
> /dev/sdf  73G  (OSD)
> /dev/sdg  73G  (OSD)
> /dev/sdh  73G  (OSD)
> /dev/sdi  73G  (OSD)
> /dev/sdj  73G  (Journal)
> /dev/sdk  500G (OSD)
> /dev/sdl  500G (OSD)
> /dev/sdn  146G (Journal)
>
> ceph-node02:
>
> /dev/sda  73G  (OSD)
> /dev/sdb  73G  (OSD)
> /dev/sdc  73G  (OSD)
> /dev/sdd  73G  (OSD)
> /dev/sde  73G  (OSD)
> /dev/sdf  73G  (OSD)
> /dev/sdg  73G  (OSD)
> /dev/sdh  73G  (OSD)
> /dev/sdi  73G  (OSD)
> /dev/sdj  73G  (Journal)
> /dev/sdk  500G (OSD)
> /dev/sdl  500G (OSD)
> /dev/sdn  146G (Journal)
>
> ceph-node03:
>
> /dev/sda  73G  (OSD)
> /dev/sdb  73G  (OSD)
> /dev/sdc  73G  (OSD)
> /dev/sdd  73G  (OSD)
> /dev/sde  73G  (OSD)
> /dev/sdf  73G  (OSD)
> /dev/sdg  73G  (OSD)
> /dev/sdh  73G  (OSD)
> /dev/sdi  73G  (OSD)
> /dev/sdj  73G  (Journal)
> /dev/sdk  500G (OSD)
> /dev/sdl  500G (OSD)
> /dev/sdn  73G  (Journal)
>
> ceph-node04:
>
> /dev/sda  73G  (OSD)
> /dev/sdb  73G  (OSD)
> /dev/sdc  73G  (OSD)
> /dev/sdd  73G  (OSD)
> /dev/sde  73G  (OSD)
> /dev/sdf  73G  (OSD)
> /dev/sdg  73G  (OSD)
> /dev/sdh  73G  (OSD)
> /dev/sdi  73G  (OSD)
> /dev/sdj  73G  (Journal)
> /dev/sdk  500G (OSD)
> /dev/sdl  500G (OSD)
> /dev/sdn  146G (Journal)
>
> ceph-node05:
>
> /dev/sda  73G  (OSD)
> /dev/sdb  73G  (OSD)
> /dev/sdc  73G  (OSD)
> /dev/sdd  73G  (OSD)
> /dev/sde  73G  (OSD)
> /dev/sdf  73G  (OSD)
> /dev/sdg  73G  (OSD)
> /dev/sdh  73G  (OSD)
> /dev/sdi  73G  (OSD)
> /dev/sdj  73G  (Journal)
> /dev/sdk  500G (OSD)
> /dev/sdl  500G (OSD)
> /dev/sdn  73G  (Journal)
>
> Am I correct in assuming that you've put all of your journals for every
> disk in each node on two spinning disks?  This is going to be quite
> slow, because Ceph does a full write of the data to the journal for
> every real write.  The general solution is to either use SSDs for
> journals (preferably multiple fast SSDs with high write endurance and
> only 3-6 OSD journals each), or put the journals on a partition on the
> data disk.
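A concrete way to act on that journal suggestion, sketched with ceph-deploy (this assumes the ceph-deploy release in use accepts the host:data[:journal] syntax, that the disk can be wiped, and that any existing OSD on it has already been removed from the cluster; host and device names are just taken from the listing above):

    # wipe the disk and recreate the OSD; with no journal device given,
    # ceph-deploy puts the journal on a partition of the same data disk
    ceph-deploy disk zap ceph-node01:sda
    ceph-deploy osd create ceph-node01:sda

    # the current layout corresponds to pointing every OSD at a partition
    # on one of the shared spinning journal disks, e.g.:
    #   ceph-deploy osd create ceph-node01:sda:/dev/sdj1

With one journal partition per data disk, you avoid having all of the OSDs on a node funnel their journal writes through the two shared spinning journal disks (/dev/sdj and /dev/sdn).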
>
> And the OSD tree is:
>
> root@ceph-node03:/home/ceph# ceph osd tree
> # id    weight  type name       up/down reweight
> -1      7.27    root default
> -2      1.15            host ceph-node01
> 12      0.06999                 osd.12  up      1
> 13      0.06999                 osd.13  up      1
> 14      0.06999                 osd.14  up      1
> 15      0.06999                 osd.15  up      1
> 16      0.06999                 osd.16  up      1
> 17      0.06999                 osd.17  up      1
> 18      0.06999                 osd.18  up      1
> 19      0.06999                 osd.19  up      1
> 20      0.06999                 osd.20  up      1
> 21      0.45                    osd.21  up      1
> 22      0.06999                 osd.22  up      1
> -3      1.53            host ceph-node02
> 23      0.06999                 osd.23  up      1
> 24      0.06999                 osd.24  up      1
> 25      0.06999                 osd.25  up      1
> 26      0.06999                 osd.26  up      1
> 27      0.06999                 osd.27  up      1
> 28      0.06999                 osd.28  up      1
> 29      0.06999                 osd.29  up      1
> 30      0.06999                 osd.30  up      1
> 31      0.06999                 osd.31  up      1
> 32      0.45                    osd.32  up      1
> 33      0.45                    osd.33  up      1
> -4      1.53            host ceph-node03
> 34      0.06999                 osd.34  up      1
> 35      0.06999                 osd.35  up      1
> 36      0.06999                 osd.36  up      1
> 37      0.06999                 osd.37  up      1
> 38      0.06999                 osd.38  up      1
> 39      0.06999                 osd.39  up      1
> 40      0.06999                 osd.40  up      1
> 41      0.06999                 osd.41  up      1
> 42      0.06999                 osd.42  up      1
> 43      0.45                    osd.43  up      1
> 44      0.45                    osd.44  up      1
> -5      1.53            host ceph-node04
> 0       0.06999                 osd.0   up      1
> 1       0.06999                 osd.1   up      1
> 2       0.06999                 osd.2   up      1
> 3       0.06999                 osd.3   up      1
> 4       0.06999                 osd.4   up      1
> 5       0.06999                 osd.5   up      1
> 6       0.06999                 osd.6   up      1
> 7       0.06999                 osd.7   up      1
> 8       0.06999                 osd.8   up      1
> 9       0.45                    osd.9   up      1
> 10      0.45                    osd.10  up      1
> -6      1.53            host ceph-node05
> 11      0.06999                 osd.11  up      1
> 45      0.06999                 osd.45  up      1
> 46      0.06999                 osd.46  up      1
> 47      0.06999                 osd.47  up      1
> 48      0.06999                 osd.48  up      1
> 49      0.06999                 osd.49  up      1
> 50      0.06999                 osd.50  up      1
> 51      0.06999                 osd.51  up      1
> 52      0.06999                 osd.52  up      1
> 53      0.45                    osd.53  up      1
> 54      0.45                    osd.54  up      1
>
> Based on this, it appears your 500GB drives are weighted much higher
> than the 73GB drives.  This will help even out the data distribution,
> but unfortunately it will cause the system to be slower if all of the
> OSDs are in the same pool.  What this does is cause the 500GB drives to
> get a higher proportion of the writes than the other drives, but those
> drives are almost certainly no faster than the other ones.  Because
> there is a limited number of outstanding IOs you can have (due to
> memory constraints), eventually all outstanding IOs will be waiting on
> the 500GB disks while the 73GB disks mostly sit around waiting for work.
>
> What I'd suggest doing is putting all of your 73GB disks in one pool
> and your 500GB disks in another pool.  I suspect that if you do that
> and put your journals on the first partition of each disk, you'll see
> some improvement in your benchmark results.
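One possible way to do that pool split with the standard tools (a rough sketch only: the root, rule, and pool names are invented, the 0.45-weight OSDs have to be moved under the new root by hand in the decompiled map, and `ceph osd crush rule create-simple` may not exist on older releases, in which case the rule can be written directly in the map):

    # export and decompile the CRUSH map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # edit crushmap.txt: add a second root (e.g. "slow") whose host buckets
    # contain only the 500GB OSDs, then recompile and inject it
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new

    # add a rule that only selects from the new root, create a pool for the
    # 500GB drives (the PG count here is arbitrary), and point the pool at
    # that rule (ruleset id from "ceph osd crush rule dump")
    ceph osd crush rule create-simple slow_rule slow host
    ceph osd pool create slow-pool 512 512
    ceph osd pool set slow-pool crush_ruleset <ruleset id>

The existing ceph-cloud pool can then stay on the default root, which would contain only the 73GB drives.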
>
> And the result:
>
> root@ceph-node03:/home/ceph# rados bench -p ceph-cloud 20 write -t 10
>  Maintaining 10 concurrent writes of 4194304 bytes for up to 20 seconds or 0 objects
>  Object prefix: benchmark_data_ceph-node03_29727
>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>      0       0         0         0         0         0         -         0
>      1      10        30        20   79.9465        80  0.159295  0.378849
>      2      10        52        42   83.9604        88  0.719616  0.430293
>      3      10        74        64   85.2991        88  0.487685  0.412956
>      4      10        97        87   86.9676        92  0.351122  0.418814
>      5      10       123       113   90.3679       104  0.317011  0.418876
>      6      10       147       137   91.3012        96  0.562112  0.418178
>      7      10       172       162   92.5398       100  0.691045  0.413416
>      8      10       197       187    93.469       100  0.459424  0.415459
>      9      10       222       212   94.1915       100  0.798889  0.416093
>     10      10       248       238   95.1697       104  0.440002  0.415609
>     11      10       267       257   93.4252        76   0.48959   0.41531
>     12      10       289       279   92.9707        88  0.524622  0.420145
>     13      10       313       303   93.2016        96   1.02104  0.423955
>     14      10       336       326   93.1136        92  0.477328  0.420684
>     15      10       359       349    93.037        92  0.591118  0.418589
>     16      10       383       373   93.2204        96  0.600392  0.421916
>     17      10       407       397   93.3812        96  0.240166  0.419829
>     18      10       431       421    93.526        96  0.746706  0.420971
>     19      10       457       447   94.0757       104  0.237565  0.419025
> 2013-12-27 13:13:21.817874 min lat: 0.101352 max lat: 1.81426 avg lat: 0.418242
>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>     20      10       480       470   93.9709        92  0.489254  0.418242
>  Total time run:         20.258064
> Total writes made:      481
> Write size:             4194304
> Bandwidth (MB/sec):     94.975
>
> Stddev Bandwidth:       21.7799
> Max bandwidth (MB/sec): 104
> Min bandwidth (MB/sec): 0
> Average Latency:        0.420573
> Stddev Latency:         0.226378
> Max latency:            1.81426
> Min latency:            0.101352
> root@ceph-node03:/home/ceph#
>
> Thanks in advance,
>
> Best regards,
>
> *German Anders*
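Sage's point is easy to check with arithmetic: 1 Gbit/s is 125 MB/s of raw bandwidth, and after Ethernet and TCP overhead a single stream typically tops out around 110-118 MB/s, so both the ~100 MB/s dd runs and the ~95 MB/s rados bench above are already close to the wire speed of one link. A quick way to confirm that the network, not Ceph, is the ceiling is a raw TCP test between the client and one of the OSD nodes (assuming iperf is installed on both ends; the address is just the 10.1.1.x one used for the monitor in the rbd commands above):

    # on ceph-node01
    iperf -s
    # on the client (ceph-node04)
    iperf -c 10.1.1.151 -t 30

If that also reports roughly 110-115 MB/s, the only ways to get more from a single client are NIC bonding, a faster link (10GbE), or driving the cluster from several clients in parallel.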
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com