Re: Cluster Performance very Poor

Hi Sage,
           Thanks for the quick answer. Yes, that's right, but if I run a rados bench test with, say, 10 concurrent threads, shouldn't I be getting better performance? I'm seeing the same numbers. Is there another command I can use to test performance in this situation?
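
(For illustration only; the pool name and thread counts below are just examples, not taken from this thread.) Since a single 1GbE client tops out around 100 MB/s no matter how many threads it runs, one way to see more aggregate throughput is to run rados bench with more threads from two or more client hosts at the same time, e.g.:

rados bench -p ceph-cloud 60 write -t 32 --no-cleanup
rados bench -p ceph-cloud 60 seq -t 32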

Thanks a lot!

Best regards,

German Anders

--- Original message ---
Subject: Re: Cluster Performance very Poor
From: Sage Weil <sage@xxxxxxxxxxx>
To: German Anders <ganders@xxxxxxxxxxxx>
Cc: Mark Nelson <mark.nelson@xxxxxxxxxxx>, <ceph-users@xxxxxxxxxxxxxx>
Date: Friday, 27/12/2013 20:19

On Fri, 27 Dec 2013, German Anders wrote:
Hi Mark,
            I've already made those changes, but the performance is almost the same. I ran another test with dd and the results were the same (I used all of the 73GB disks for the OSDs and put the journal on the OSD device itself). I also noticed that the network is 1 Gb:

Wait... this is a 1Gbps network? And you're getting around 100 MB/sec
from a single client? That is about right given what the client NIC is
capable of.
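
(Back-of-the-envelope figures, as a sanity check: 1 Gbps is roughly 125 MB/s raw, and after Ethernet/IP/TCP overhead a single stream usually tops out around 110-118 MB/s, so ~100 MB/s from dd is already close to line rate.) If in doubt, something like iperf between the client and one of the ceph nodes will confirm the raw link speed; the address below is just the monitor IP used in the rbd commands further down:

# on one of the ceph nodes
iperf -s
# on the client
iperf -c 10.1.1.151 -t 10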

sage



ceph@ceph-node04:~$ sudo rbd -m 10.1.1.151 -p ceph-cloud --size 102400
create rbdCloud -k /etc/ceph/ceph.client.admin.keyring
ceph@ceph-node04:~$ sudo rbd map -m 10.1.1.151 rbdCloud --pool ceph-cloud
--id admin -k /etc/ceph/ceph.client.admin.keyring
ceph@ceph-node04:~$ sudo mkdir /mnt/rbdCloud
ceph@ceph-node04:~$ sudo mkfs.xfs -l size=64m,lazy-count=1 -f
/dev/rbd/ceph-cloud/rbdCloud
log stripe unit (4194304 bytes) is too large (maximum is 256KiB)
log stripe unit adjusted to 32KiB
meta-data=/dev/rbd/ceph-cloud/rbdCloud isize=256    agcount=17, agsize=1637376 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=1024   swidth=1024 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
ceph@ceph-node04:~$
ceph@ceph-node04:~$ sudo mount /dev/rbd/ceph-cloud/rbdCloud /mnt/rbdCloud
ceph@ceph-node04:~$ cd /mnt/rbdCloud
ceph@ceph-node04:/mnt/rbdCloud$
ceph@ceph-node04:/mnt/rbdCloud$ for i in 1 2 3 4; do sudo dd if=/dev/zero
of=a bs=1M count=1000 conv=fdatasync; done
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10.2545 s, 102 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10.0554 s, 104 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10.2352 s, 102 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10.1197 s, 104 MB/s
ceph@ceph-node04:/mnt/rbdCloud$

OSD tree:

ceph@ceph-node05:~/ceph-cluster-prd$ sudo ceph osd tree
# id    weight    type name    up/down    reweight
-1    3.43    root default
-2    0.6299        host ceph-node01
12    0.06999            osd.12    up    1   
13    0.06999            osd.13    up    1   
14    0.06999            osd.14    up    1   
15    0.06999            osd.15    up    1   
16    0.06999            osd.16    up    1   
17    0.06999            osd.17    up    1   
18    0.06999            osd.18    up    1   
19    0.06999            osd.19    up    1   
20    0.06999            osd.20    up    1   
-3    0.6999        host ceph-node02
22    0.06999            osd.22    up    1   
23    0.06999            osd.23    up    1   
24    0.06999            osd.24    up    1   
25    0.06999            osd.25    up    1   
26    0.06999            osd.26    up    1   
27    0.06999            osd.27    up    1   
28    0.06999            osd.28    up    1   
29    0.06999            osd.29    up    1   
30    0.06999            osd.30    up    1   
31    0.06999            osd.31    up    1   
-4    0.6999        host ceph-node03
32    0.06999            osd.32    up    1   
33    0.06999            osd.33    up    1   
34    0.06999            osd.34    up    1   
35    0.06999            osd.35    up    1   
36    0.06999            osd.36    up    1   
37    0.06999            osd.37    up    1   
38    0.06999            osd.38    up    1   
39    0.06999            osd.39    up    1   
40    0.06999            osd.40    up    1   
41    0.06999            osd.41    up    1   
-5    0.6999        host ceph-node04
0    0.06999            osd.0    up    1   
1    0.06999            osd.1    up    1   
2    0.06999            osd.2    up    1   
3    0.06999            osd.3    up    1   
4    0.06999            osd.4    up    1   
5    0.06999            osd.5    up    1   
6    0.06999            osd.6    up    1   
7    0.06999            osd.7    up    1   
8    0.06999            osd.8    up    1   
9    0.06999            osd.9    up    1   
-6    0.6999        host ceph-node05
10    0.06999            osd.10    up    1   
11    0.06999            osd.11    up    1   
42    0.06999            osd.42    up    1   
43    0.06999            osd.43    up    1   
44    0.06999            osd.44    up    1   
45    0.06999            osd.45    up    1   
46    0.06999            osd.46    up    1   
47    0.06999            osd.47    up    1   
48    0.06999            osd.48    up    1   
49    0.06999            osd.49    up    1


Any ideas?

Thanks in advance,

German Anders

        --- Original message ---
        Subject: Re: Cluster Performance very Poor
        From: Mark Nelson <mark.nelson@xxxxxxxxxxx>
        To: <ceph-users@xxxxxxxxxxxxxx>
        Date: Friday, 27/12/2013 15:39

        On 12/27/2013 12:19 PM, German Anders wrote:
                  Hi Cephers,

                       I've run a rados bench to measure the throughput of the cluster, and found that the performance is really poor:

              The setup is the following:

              OS: Ubuntu 12.10 Server, 64-bit


              ceph-node01(mon)    10.77.0.101  ProLiant BL460c G7  32GB  8 x 2 GHz
                                  10.1.1.151   D2200sb Storage Blade (Firmware: 2.30)
              ceph-node02(mon)    10.77.0.102  ProLiant BL460c G7  64GB  8 x 2 GHz
                                  10.1.1.152   D2200sb Storage Blade (Firmware: 2.30)
              ceph-node03(mon)    10.77.0.103  ProLiant BL460c G6  32GB  8 x 2 GHz
                                  10.1.1.153   D2200sb Storage Blade (Firmware: 2.30)
              ceph-node04         10.77.0.104  ProLiant BL460c G7  32GB  8 x 2 GHz
                                  10.1.1.154   D2200sb Storage Blade (Firmware: 2.30)
              ceph-node05(deploy) 10.77.0.105  ProLiant BL460c G6  32GB  8 x 2 GHz
                                  10.1.1.155   D2200sb Storage Blade (Firmware: 2.30)


        If your servers have controllers with writeback cache, please make sure it is enabled, as that will likely help.
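
        As a sketch (assuming these blades use HP Smart Array controllers with hpacucli installed, which the thread doesn't state), the controller cache state can be inspected with:

        # controller and battery/cache status
        hpacucli ctrl all show status
        # detailed config, including cache ratio and whether the write cache is enabled
        hpacucli ctrl all show config detail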


              ceph-node01:

                     /dev/sda 73G (OSD)
                     /dev/sdb 73G (OSD)
                     /dev/sdc 73G (OSD)
                     /dev/sdd 73G (OSD)
                     /dev/sde 73G (OSD)
                     /dev/sdf 73G (OSD)
                     /dev/sdg 73G (OSD)
                     /dev/sdh 73G (OSD)
                     /dev/sdi 73G (OSD)
                     /dev/sdj 73G (Journal)
                     /dev/sdk 500G (OSD)
                     /dev/sdl 500G (OSD)
                     /dev/sdn 146G (Journal)

              ceph-node02:

                     /dev/sda 73G (OSD)
                     /dev/sdb 73G (OSD)
                     /dev/sdc 73G (OSD)
                     /dev/sdd 73G (OSD)
                     /dev/sde 73G (OSD)
                     /dev/sdf 73G (OSD)
                     /dev/sdg 73G (OSD)
                     /dev/sdh 73G (OSD)
                     /dev/sdi 73G (OSD)
                     /dev/sdj 73G (Journal)
                     /dev/sdk 500G (OSD)
                     /dev/sdl 500G (OSD)
                     /dev/sdn 146G (Journal)

              ceph-node03:

                     /dev/sda 73G (OSD)
                     /dev/sdb 73G (OSD)
                     /dev/sdc 73G (OSD)
                     /dev/sdd 73G (OSD)
                     /dev/sde 73G (OSD)
                     /dev/sdf 73G (OSD)
                     /dev/sdg 73G (OSD)
                     /dev/sdh 73G (OSD)
                     /dev/sdi 73G (OSD)
                     /dev/sdj 73G (Journal)
                     /dev/sdk 500G (OSD)
                     /dev/sdl 500G (OSD)
                     /dev/sdn 73G (Journal)

              ceph-node04:

                     /dev/sda 73G (OSD)
                     /dev/sdb 73G (OSD)
                     /dev/sdc 73G (OSD)
                     /dev/sdd 73G (OSD)
                     /dev/sde 73G (OSD)
                     /dev/sdf 73G (OSD)
                     /dev/sdg 73G (OSD)
                     /dev/sdh 73G (OSD)
                     /dev/sdi 73G (OSD)
                     /dev/sdj 73G (Journal)
                     /dev/sdk 500G (OSD)
                     /dev/sdl 500G (OSD)
                     /dev/sdn 146G (Journal)

              ceph-node05:

                     /dev/sda 73G (OSD)
                     /dev/sdb 73G (OSD)
                     /dev/sdc 73G (OSD)
                     /dev/sdd 73G (OSD)
                     /dev/sde 73G (OSD)
                     /dev/sdf 73G (OSD)
                     /dev/sdg 73G (OSD)
                     /dev/sdh 73G (OSD)
                     /dev/sdi 73G (OSD)
                     /dev/sdj 73G (Journal)
                     /dev/sdk 500G (OSD)
                     /dev/sdl 500G (OSD)
                     /dev/sdn 73G (Journal)


        Am I correct in assuming that you've put all of the journals for every disk in each node on two spinning disks? This is going to be quite slow, because Ceph does a full write of the data to the journal for every real write. The general solution is either to use SSDs for the journals (preferably multiple fast SSDs with high write endurance and only 3-6 OSD journals each), or to put each journal on a partition of its own data disk.
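
        As a rough sketch of the second option (device names here are placeholders, not your actual layout): with the ceph-deploy tooling of that era, creating an OSD without naming a journal device puts the journal on a partition of the same disk, while naming an SSD partition puts it there instead:

        # journal lands on a partition of sdb itself
        ceph-deploy osd create ceph-node01:sdb
        # or point the journal at a partition on a dedicated SSD
        ceph-deploy osd create ceph-node01:sdb:/dev/sdm1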


              And the OSD tree is:

              root@ceph-node03:/home/ceph# ceph osd tree
              # id weight type name up/down reweight
              -1 7.27 root default
              -2 1.15 host ceph-node01
              12 0.06999 osd.12 up 1
              13 0.06999 osd.13 up 1
              14 0.06999 osd.14 up 1
              15 0.06999 osd.15 up 1
              16 0.06999 osd.16 up 1
              17 0.06999 osd.17 up 1
              18 0.06999 osd.18 up 1
              19 0.06999 osd.19 up 1
              20 0.06999 osd.20 up 1
              21 0.45 osd.21 up 1
              22 0.06999 osd.22 up 1
              -3 1.53 host ceph-node02
              23 0.06999 osd.23 up 1
              24 0.06999 osd.24 up 1
              25 0.06999 osd.25 up 1
              26 0.06999 osd.26 up 1
              27 0.06999 osd.27 up 1
              28 0.06999 osd.28 up 1
              29 0.06999 osd.29 up 1
              30 0.06999 osd.30 up 1
              31 0.06999 osd.31 up 1
              32 0.45 osd.32 up 1
              33 0.45 osd.33 up 1
              -4 1.53 host ceph-node03
              34 0.06999 osd.34 up 1
              35 0.06999 osd.35 up 1
              36 0.06999 osd.36 up 1
              37 0.06999 osd.37 up 1
              38 0.06999 osd.38 up 1
              39 0.06999 osd.39 up 1
              40 0.06999 osd.40 up 1
              41 0.06999 osd.41 up 1
              42 0.06999 osd.42 up 1
              43 0.45 osd.43 up 1
              44 0.45 osd.44 up 1
              -5 1.53 host ceph-node04
              0 0.06999 osd.0 up 1
              1 0.06999 osd.1 up 1
              2 0.06999 osd.2 up 1
              3 0.06999 osd.3 up 1
              4 0.06999 osd.4 up 1
              5 0.06999 osd.5 up 1
              6 0.06999 osd.6 up 1
              7 0.06999 osd.7 up 1
              8 0.06999 osd.8 up 1
              9 0.45 osd.9 up 1
              10 0.45 osd.10 up 1
              -6 1.53 host ceph-node05
              11 0.06999 osd.11 up 1
              45 0.06999 osd.45 up 1
              46 0.06999 osd.46 up 1
              47 0.06999 osd.47 up 1
              48 0.06999 osd.48 up 1
              49 0.06999 osd.49 up 1
              50 0.06999 osd.50 up 1
              51 0.06999 osd.51 up 1
              52 0.06999 osd.52 up 1
              53 0.45 osd.53 up 1
              54 0.45 osd.54 up 1


        Based on this, it appears your 500GB drives are weighted much higher than the 73GB drives. This will help even out data distribution, but unfortunately it will make the system slower if all of the OSDs are in the same pool. The 500GB drives get a higher proportion of the writes than the other drives, but they are almost certainly no faster than the other ones. Because there is a limited number of outstanding IOs you can have (due to memory constraints), eventually all outstanding IOs end up waiting on the 500GB disks while the 73GB disks mostly sit around waiting for work.

        What I'd suggest doing is putting all of your 73GB disks in one pool and your 500GB disks in another pool. I suspect that if you do that and put your journals on the first partition of each disk, you'll see some improvement in your benchmark results.
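
        A minimal sketch of that two-pool layout (bucket, rule and pool names are made up, and the exact CLI varies a bit between releases; the same thing can always be done by editing the CRUSH map with crushtool):

        # second CRUSH root and a per-host bucket for the 500GB drives
        ceph osd crush add-bucket big root
        ceph osd crush add-bucket ceph-node01-big host
        ceph osd crush move ceph-node01-big root=big
        # move one of the 500GB OSDs into it, keeping its weight
        ceph osd crush set osd.21 0.45 root=big host=ceph-node01-big
        # rule and pool that only place data on the new root
        ceph osd crush rule create-simple big-rule big host
        ceph osd pool create ceph-cloud-big 512 512
        ceph osd pool set ceph-cloud-big crush_ruleset <rule-id>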



              And the result:

              root@ceph-node03:/home/ceph# rados bench -p
              ceph-cloud 20 write -t 10
                  Maintaining 10 concurrent writes of 4194304
              bytes for up to 20 seconds
              or 0 objects
                  Object prefix: benchmark_data_ceph-node03_29727
                    sec Cur ops started finished avg MB/s cur MB/s
              last lat avg lat
                      0 0 0 0 0 0 - 0
                      1 10 30 20 79.9465 80 0.159295 0.378849
                      2 10 52 42 83.9604 88 0.719616 0.430293
                      3 10 74 64 85.2991 88 0.487685 0.412956
                      4 10 97 87 86.9676 92 0.351122 0.418814
                      5 10 123 113 90.3679 104 0.317011 0.418876
                      6 10 147 137 91.3012 96 0.562112 0.418178
                      7 10 172 162 92.5398 100 0.691045 0.413416
                      8 10 197 187 93.469 100 0.459424 0.415459
                      9 10 222 212 94.1915 100 0.798889 0.416093
                     10 10 248 238 95.1697 104 0.440002 0.415609
                     11 10 267 257 93.4252 76 0.48959 0.41531
                     12 10 289 279 92.9707 88 0.524622 0.420145
                     13 10 313 303 93.2016 96 1.02104 0.423955
                     14 10 336 326 93.1136 92 0.477328 0.420684
                     15 10 359 349 93.037 92 0.591118 0.418589
                     16 10 383 373 93.2204 96 0.600392 0.421916
                     17 10 407 397 93.3812 96 0.240166 0.419829
                     18 10 431 421 93.526 96 0.746706 0.420971
                     19 10 457 447 94.0757 104 0.237565 0.419025
              2013-12-27 13:13:21.817874min lat: 0.101352 max lat:
              1.81426 avg lat:
              0.418242
                    sec Cur ops started finished avg MB/s cur MB/s
              last lat avg lat
                     20 10 480 470 93.9709 92 0.489254 0.418242
                  Total time run: 20.258064
              Total writes made: 481
              Write size: 4194304
              Bandwidth (MB/sec): 94.975

              Stddev Bandwidth: 21.7799
              Max bandwidth (MB/sec): 104
              Min bandwidth (MB/sec): 0
              Average Latency: 0.420573
              Stddev Latency: 0.226378
              Max latency: 1.81426
              Min latency: 0.101352
              root@ceph-node03:/home/ceph#

              Thanks in advance,

              Best regards,

              German Anders













_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
