Re: Cluster Performance very Poor

Hi Sage,
           Thanks for the quick answer. Yes, that's right, but if I run a rados bench test with, say, 10 concurrent threads, shouldn't I be getting better performance? I'm seeing the same numbers. Is there another command I can use to test performance in this situation?
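
(For illustration only; the pool name and thread counts below are just examples, not taken from this thread.) Since a single 1GbE client tops out around 100 MB/s no matter how many threads it runs, one way to see more aggregate throughput is to run rados bench with more threads from two or more client hosts at the same time, e.g.:

rados bench -p ceph-cloud 60 write -t 32 --no-cleanup
rados bench -p ceph-cloud 60 seq -t 32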

Thanks a lot!

Best regards,

German Anders

--- Original message ---
Subject: Re: Cluster Performance very Poor
From: Sage Weil <sage@xxxxxxxxxxx>
To: German Anders <ganders@xxxxxxxxxxxx>
Cc: Mark Nelson <mark.nelson@xxxxxxxxxxx>, <ceph-users@xxxxxxxxxxxxxx>
Date: Friday, 27/12/2013 20:19

On Fri, 27 Dec 2013, German Anders wrote:
Hi Mark,
            I've already made those changes, but the performance is almost the same. I ran another test with dd and the results were the same (I used all of the 73GB disks for the OSDs and put the journal on the OSD device itself). I also noticed that the network is 1 Gb:

Wait... this is a 1Gbps network? And you're getting around 100 MB/sec
from a single client? That is about right given what the client NIC is
capable of.
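
(Back-of-the-envelope figures, as a sanity check: 1 Gbps is roughly 125 MB/s raw, and after Ethernet/IP/TCP overhead a single stream usually tops out around 110-118 MB/s, so ~100 MB/s from dd is already close to line rate.) If in doubt, something like iperf between the client and one of the ceph nodes will confirm the raw link speed; the address below is just the monitor IP used in the rbd commands further down:

# on one of the ceph nodes
iperf -s
# on the client
iperf -c 10.1.1.151 -t 10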

sage



ceph@ceph-node04:~$ sudo rbd -m 10.1.1.151 -p ceph-cloud --size 102400
create rbdCloud -k /etc/ceph/ceph.client.admin.keyring
ceph@ceph-node04:~$ sudo rbd map -m 10.1.1.151 rbdCloud --pool ceph-cloud
--id admin -k /etc/ceph/ceph.client.admin.keyring
ceph@ceph-node04:~$ sudo mkdir /mnt/rbdCloud
ceph@ceph-node04:~$ sudo mkfs.xfs -l size=64m,lazy-count=1 -f
/dev/rbd/ceph-cloud/rbdCloud
log stripe unit (4194304 bytes) is too large (maximum is 256KiB)
log stripe unit adjusted to 32KiB
meta-data=/dev/rbd/ceph-cloud/rbdCloud isize=256    agcount=17, agsize=1637376 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=1024   swidth=1024 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
ceph@ceph-node04:~$
ceph@ceph-node04:~$ sudo mount /dev/rbd/ceph-cloud/rbdCloud /mnt/rbdCloud
ceph@ceph-node04:~$ cd /mnt/rbdCloud
ceph@ceph-node04:/mnt/rbdCloud$
ceph@ceph-node04:/mnt/rbdCloud$ for i in 1 2 3 4; do sudo dd if=/dev/zero
of=a bs=1M count=1000 conv=fdatasync; done
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10.2545 s, 102 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10.0554 s, 104 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10.2352 s, 102 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10.1197 s, 104 MB/s
ceph@ceph-node04:/mnt/rbdCloud$

OSD tree:

ceph@ceph-node05:~/ceph-cluster-prd$ sudo ceph osd tree
# id    weight    type name    up/down    reweight
-1    3.43    root default
-2    0.6299        host ceph-node01
12    0.06999            osd.12    up    1   
13    0.06999            osd.13    up    1   
14    0.06999            osd.14    up    1   
15    0.06999            osd.15    up    1   
16    0.06999            osd.16    up    1   
17    0.06999            osd.17    up    1   
18    0.06999            osd.18    up    1   
19    0.06999            osd.19    up    1   
20    0.06999            osd.20    up    1   
-3    0.6999        host ceph-node02
22    0.06999            osd.22    up    1   
23    0.06999            osd.23    up    1   
24    0.06999            osd.24    up    1   
25    0.06999            osd.25    up    1   
26    0.06999            osd.26    up    1   
27    0.06999            osd.27    up    1   
28    0.06999            osd.28    up    1   
29    0.06999            osd.29    up    1   
30    0.06999            osd.30    up    1   
31    0.06999            osd.31    up    1   
-4    0.6999        host ceph-node03
32    0.06999            osd.32    up    1   
33    0.06999            osd.33    up    1   
34    0.06999            osd.34    up    1   
35    0.06999            osd.35    up    1   
36    0.06999            osd.36    up    1   
37    0.06999            osd.37    up    1   
38    0.06999            osd.38    up    1   
39    0.06999            osd.39    up    1   
40    0.06999            osd.40    up    1   
41    0.06999            osd.41    up    1   
-5    0.6999        host ceph-node04
0    0.06999            osd.0    up    1   
1    0.06999            osd.1    up    1   
2    0.06999            osd.2    up    1   
3    0.06999            osd.3    up    1   
4    0.06999            osd.4    up    1   
5    0.06999            osd.5    up    1   
6    0.06999            osd.6    up    1   
7    0.06999            osd.7    up    1   
8    0.06999            osd.8    up    1   
9    0.06999            osd.9    up    1   
-6    0.6999        host ceph-node05
10    0.06999            osd.10    up    1   
11    0.06999            osd.11    up    1   
42    0.06999            osd.42    up    1   
43    0.06999            osd.43    up    1   
44    0.06999            osd.44    up    1   
45    0.06999            osd.45    up    1   
46    0.06999            osd.46    up    1   
47    0.06999            osd.47    up    1   
48    0.06999            osd.48    up    1   
49    0.06999            osd.49    up    1


Any ideas?

Thanks in advance,

German Anders

        --- Original message ---
        Subject: Re: Cluster Performance very Poor
        From: Mark Nelson <mark.nelson@xxxxxxxxxxx>
        To: <ceph-users@xxxxxxxxxxxxxx>
        Date: Friday, 27/12/2013 15:39

        On 12/27/2013 12:19 PM, German Anders wrote:
                  Hi Cephers,

                       I've run a rados bench to measure the throughput of the cluster, and found that the performance is really poor:

              The setup is the following:

              OS: Ubuntu 12.10 Server, 64-bit


              ceph-node01(mon)    10.77.0.101  ProLiant BL460c G7  32GB  8 x 2 GHz
                                  10.1.1.151   D2200sb Storage Blade (Firmware: 2.30)
              ceph-node02(mon)    10.77.0.102  ProLiant BL460c G7  64GB  8 x 2 GHz
                                  10.1.1.152   D2200sb Storage Blade (Firmware: 2.30)
              ceph-node03(mon)    10.77.0.103  ProLiant BL460c G6  32GB  8 x 2 GHz
                                  10.1.1.153   D2200sb Storage Blade (Firmware: 2.30)
              ceph-node04         10.77.0.104  ProLiant BL460c G7  32GB  8 x 2 GHz
                                  10.1.1.154   D2200sb Storage Blade (Firmware: 2.30)
              ceph-node05(deploy) 10.77.0.105  ProLiant BL460c G6  32GB  8 x 2 GHz
                                  10.1.1.155   D2200sb Storage Blade (Firmware: 2.30)


        If your servers have controllers with writeback cache, please make sure it is enabled, as that will likely help.
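
        As a sketch (assuming these blades use HP Smart Array controllers with hpacucli installed, which the thread doesn't state), the controller cache state can be inspected with:

        # controller and battery/cache status
        hpacucli ctrl all show status
        # detailed config, including cache ratio and whether the write cache is enabled
        hpacucli ctrl all show config detail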


              ceph-node01:

                     /dev/sda 73G (OSD)
                     /dev/sdb 73G (OSD)
                     /dev/sdc 73G (OSD)
                     /dev/sdd 73G (OSD)
                     /dev/sde 73G (OSD)
                     /dev/sdf 73G (OSD)
                     /dev/sdg 73G (OSD)
                     /dev/sdh 73G (OSD)
                     /dev/sdi 73G (OSD)
                     /dev/sdj 73G (Journal)
                     /dev/sdk 500G (OSD)
                     /dev/sdl 500G (OSD)
                     /dev/sdn 146G (Journal)

              ceph-node02:

                     /dev/sda 73G (OSD)
                     /dev/sdb 73G (OSD)
                     /dev/sdc 73G (OSD)
                     /dev/sdd 73G (OSD)
                     /dev/sde 73G (OSD)
                     /dev/sdf 73G (OSD)
                     /dev/sdg 73G (OSD)
                     /dev/sdh 73G (OSD)
                     /dev/sdi 73G (OSD)
                     /dev/sdj 73G (Journal)
                     /dev/sdk 500G (OSD)
                     /dev/sdl 500G (OSD)
                     /dev/sdn 146G (Journal)

              ceph-node03:

                     /dev/sda 73G (OSD)
                     /dev/sdb 73G (OSD)
                     /dev/sdc 73G (OSD)
                     /dev/sdd 73G (OSD)
                     /dev/sde 73G (OSD)
                     /dev/sdf 73G (OSD)
                     /dev/sdg 73G (OSD)
                     /dev/sdh 73G (OSD)
                     /dev/sdi 73G (OSD)
                     /dev/sdj 73G (Journal)
                     /dev/sdk 500G (OSD)
                     /dev/sdl 500G (OSD)
                     /dev/sdn 73G (Journal)

              ceph-node04:

                     /dev/sda 73G (OSD)
                     /dev/sdb 73G (OSD)
                     /dev/sdc 73G (OSD)
                     /dev/sdd 73G (OSD)
                     /dev/sde 73G (OSD)
                     /dev/sdf 73G (OSD)
                     /dev/sdg 73G (OSD)
                     /dev/sdh 73G (OSD)
                     /dev/sdi 73G (OSD)
                     /dev/sdj 73G (Journal)
                     /dev/sdk 500G (OSD)
                     /dev/sdl 500G (OSD)
                     /dev/sdn 146G (Journal)

              ceph-node05:

                     /dev/sda 73G (OSD)
                     /dev/sdb 73G (OSD)
                     /dev/sdc 73G (OSD)
                     /dev/sdd 73G (OSD)
                     /dev/sde 73G (OSD)
                     /dev/sdf 73G (OSD)
                     /dev/sdg 73G (OSD)
                     /dev/sdh 73G (OSD)
                     /dev/sdi 73G (OSD)
                     /dev/sdj 73G (Journal)
                     /dev/sdk 500G (OSD)
                     /dev/sdl 500G (OSD)
                     /dev/sdn 73G (Journal)


        Am I correct in assuming that you've put all of the journals for every disk in each node on two spinning disks? This is going to be quite slow, because Ceph does a full write of the data to the journal for every real write. The general solution is either to use SSDs for the journals (preferably multiple fast SSDs with high write endurance and only 3-6 OSD journals each), or to put each journal on a partition of its own data disk.
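
        As a rough sketch of the second option (device names here are placeholders, not your actual layout): with the ceph-deploy tooling of that era, creating an OSD without naming a journal device puts the journal on a partition of the same disk, while naming an SSD partition puts it there instead:

        # journal lands on a partition of sdb itself
        ceph-deploy osd create ceph-node01:sdb
        # or point the journal at a partition on a dedicated SSD
        ceph-deploy osd create ceph-node01:sdb:/dev/sdm1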


              And the OSD tree is:

              root@ceph-node03:/home/ceph# ceph osd tree
              # id weight type name up/down reweight
              -1 7.27 root default
              -2 1.15 host ceph-node01
              12 0.06999 osd.12 up 1
              13 0.06999 osd.13 up 1
              14 0.06999 osd.14 up 1
              15 0.06999 osd.15 up 1
              16 0.06999 osd.16 up 1
              17 0.06999 osd.17 up 1
              18 0.06999 osd.18 up 1
              19 0.06999 osd.19 up 1
              20 0.06999 osd.20 up 1
              21 0.45 osd.21 up 1
              22 0.06999 osd.22 up 1
              -3 1.53 host ceph-node02
              23 0.06999 osd.23 up 1
              24 0.06999 osd.24 up 1
              25 0.06999 osd.25 up 1
              26 0.06999 osd.26 up 1
              27 0.06999 osd.27 up 1
              28 0.06999 osd.28 up 1
              29 0.06999 osd.29 up 1
              30 0.06999 osd.30 up 1
              31 0.06999 osd.31 up 1
              32 0.45 osd.32 up 1
              33 0.45 osd.33 up 1
              -4 1.53 host ceph-node03
              34 0.06999 osd.34 up 1
              35 0.06999 osd.35 up 1
              36 0.06999 osd.36 up 1
              37 0.06999 osd.37 up 1
              38 0.06999 osd.38 up 1
              39 0.06999 osd.39 up 1
              40 0.06999 osd.40 up 1
              41 0.06999 osd.41 up 1
              42 0.06999 osd.42 up 1
              43 0.45 osd.43 up 1
              44 0.45 osd.44 up 1
              -5 1.53 host ceph-node04
              0 0.06999 osd.0 up 1
              1 0.06999 osd.1 up 1
              2 0.06999 osd.2 up 1
              3 0.06999 osd.3 up 1
              4 0.06999 osd.4 up 1
              5 0.06999 osd.5 up 1
              6 0.06999 osd.6 up 1
              7 0.06999 osd.7 up 1
              8 0.06999 osd.8 up 1
              9 0.45 osd.9 up 1
              10 0.45 osd.10 up 1
              -6 1.53 host ceph-node05
              11 0.06999 osd.11 up 1
              45 0.06999 osd.45 up 1
              46 0.06999 osd.46 up 1
              47 0.06999 osd.47 up 1
              48 0.06999 osd.48 up 1
              49 0.06999 osd.49 up 1
              50 0.06999 osd.50 up 1
              51 0.06999 osd.51 up 1
              52 0.06999 osd.52 up 1
              53 0.45 osd.53 up 1
              54 0.45 osd.54 up 1


        Based on this, it appears your 500GB drives are weighted much higher than the 73GB drives. This will help even out data distribution, but unfortunately it will make the system slower if all of the OSDs are in the same pool. The 500GB drives get a higher proportion of the writes than the other drives, but they are almost certainly no faster than the other ones. Because there is a limited number of outstanding IOs you can have (due to memory constraints), eventually all outstanding IOs end up waiting on the 500GB disks while the 73GB disks mostly sit around waiting for work.

        What I'd suggest doing is putting all of your 73GB disks in one pool and your 500GB disks in another pool. I suspect that if you do that and put your journals on the first partition of each disk, you'll see some improvement in your benchmark results.
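
        A minimal sketch of that two-pool layout (bucket, rule and pool names are made up, and the exact CLI varies a bit between releases; the same thing can always be done by editing the CRUSH map with crushtool):

        # second CRUSH root and a per-host bucket for the 500GB drives
        ceph osd crush add-bucket big root
        ceph osd crush add-bucket ceph-node01-big host
        ceph osd crush move ceph-node01-big root=big
        # move one of the 500GB OSDs into it, keeping its weight
        ceph osd crush set osd.21 0.45 root=big host=ceph-node01-big
        # rule and pool that only place data on the new root
        ceph osd crush rule create-simple big-rule big host
        ceph osd pool create ceph-cloud-big 512 512
        ceph osd pool set ceph-cloud-big crush_ruleset <rule-id>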



              And the result:

              root@ceph-node03:/home/ceph# rados bench -p
              ceph-cloud 20 write -t 10
                  Maintaining 10 concurrent writes of 4194304
              bytes for up to 20 seconds
              or 0 objects
                  Object prefix: benchmark_data_ceph-node03_29727
                    sec Cur ops started finished avg MB/s cur MB/s
              last lat avg lat
                      0 0 0 0 0 0 - 0
                      1 10 30 20 79.9465 80 0.159295 0.378849
                      2 10 52 42 83.9604 88 0.719616 0.430293
                      3 10 74 64 85.2991 88 0.487685 0.412956
                      4 10 97 87 86.9676 92 0.351122 0.418814
                      5 10 123 113 90.3679 104 0.317011 0.418876
                      6 10 147 137 91.3012 96 0.562112 0.418178
                      7 10 172 162 92.5398 100 0.691045 0.413416
                      8 10 197 187 93.469 100 0.459424 0.415459
                      9 10 222 212 94.1915 100 0.798889 0.416093
                     10 10 248 238 95.1697 104 0.440002 0.415609
                     11 10 267 257 93.4252 76 0.48959 0.41531
                     12 10 289 279 92.9707 88 0.524622 0.420145
                     13 10 313 303 93.2016 96 1.02104 0.423955
                     14 10 336 326 93.1136 92 0.477328 0.420684
                     15 10 359 349 93.037 92 0.591118 0.418589
                     16 10 383 373 93.2204 96 0.600392 0.421916
                     17 10 407 397 93.3812 96 0.240166 0.419829
                     18 10 431 421 93.526 96 0.746706 0.420971
                     19 10 457 447 94.0757 104 0.237565 0.419025
              2013-12-27 13:13:21.817874min lat: 0.101352 max lat:
              1.81426 avg lat:
              0.418242
                    sec Cur ops started finished avg MB/s cur MB/s
              last lat avg lat
                     20 10 480 470 93.9709 92 0.489254 0.418242
                  Total time run: 20.258064
              Total writes made: 481
              Write size: 4194304
              Bandwidth (MB/sec): 94.975

              Stddev Bandwidth: 21.7799
              Max bandwidth (MB/sec): 104
              Min bandwidth (MB/sec): 0
              Average Latency: 0.420573
              Stddev Latency: 0.226378
              Max latency: 1.81426
              Min latency: 0.101352
              root@ceph-node03:/home/ceph#

              Thanks in advance,

              Best regards,

              German Anders













_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
