What could the possible solutions be? Updating CentOS to 6.3? As for the issue with writes to lots of disks at the same time, I think running dd on all the disks in parallel would be a good test! :) (A rough sketch of such a test, plus a note on PG sizing, is at the bottom of this mail, below the quoted thread.)

2012/11/4 Mark Nelson <mark.nelson@xxxxxxxxxxx>:
> On 11/04/2012 07:18 AM, Aleksey Samarin wrote:
>>
>> Well, i create ceph cluster with 2 osd ( 1 osd per node), 2 mon, 2 mds.
>> here is what I did:
>> ceph osd pool create bench
>> ceph osd tell \* bench
>> rados -p bench bench 30 write --no-cleanup
>> output:
>>
>> Maintaining 16 concurrent writes of 4194304 bytes for at least 30 seconds.
>> Object prefix: benchmark_data_host01_11635
>> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
>> 0 0 0 0 0 0 - 0
>> 1 16 16 0 0 0 - 0
>> 2 16 37 21 41.9911 42 0.139005 1.08941
>> 3 16 53 37 49.3243 64 0.754114 1.09392
>> 4 16 75 59 58.9893 88 0.284647 0.914221
>> 5 16 89 73 58.3896 56 0.072228 0.881008
>> 6 16 95 79 52.6575 24 1.56959 0.961477
>> 7 16 111 95 54.2764 64 0.046105 1.08791
>> 8 16 128 112 55.9906 68 0.035714 1.04594
>> 9 16 150 134 59.5457 88 0.046298 1.04415
>> 10 16 166 150 59.9901 64 0.048635 0.986384
>> 11 16 176 160 58.1723 40 0.727784 0.988408
>> 12 16 206 190 63.3231 120 0.28869 0.946624
>> 13 16 225 209 64.2976 76 1.34472 0.919464
>> 14 16 263 247 70.5605 152 0.070926 0.90046
>> 15 16 295 279 74.3887 128 0.041517 0.830466
>> 16 16 315 299 74.7388 80 0.296037 0.841527
>> 17 16 333 317 74.5772 72 0.286097 0.849558
>> 18 16 340 324 71.9891 28 0.295084 0.83922
>> 19 16 343 327 68.8317 12 1.46948 0.845797
>> 2012-11-04 17:14:52.090941 min lat: 0.035714 max lat: 2.64841 avg lat: 0.861539
>> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
>> 20 16 378 362 72.389 140 0.566232 0.861539
>> 21 16 400 384 73.1313 88 0.038835 0.857785
>> 22 16 404 388 70.5344 16 0.801216 0.857002
>> 23 16 413 397 69.0327 36 0.062256 0.86376
>> 24 16 428 412 68.6543 60 0.042583 0.89389
>> 25 16 450 434 69.4277 88 0.383877 0.905833
>> 26 16 472 456 70.1415 88 0.269878 0.898023
>> 27 16 472 456 67.5437 0 - 0.898023
>> 28 16 512 496 70.8448 80 0.056798 0.891163
>> 29 16 530 514 70.8843 72 1.20653 0.898112
>> 30 16 542 526 70.1212 48 0.744383 0.890733
>> Total time run: 30.174151
>> Total writes made: 543
>> Write size: 4194304
>> Bandwidth (MB/sec): 71.982
>>
>> Stddev Bandwidth: 38.318
>> Max bandwidth (MB/sec): 152
>> Min bandwidth (MB/sec): 0
>> Average Latency: 0.889026
>> Stddev Latency: 0.677425
>> Max latency: 2.94467
>> Min latency: 0.035714
>>
>
> Much better for 1 disk per node! I suspect that lack of syncfs is hurting
> you, or perhaps some other issue with writes to lots of disks at the same
> time.
>
>
>>
>> 2012/11/4 Aleksey Samarin <nrg3tik@xxxxxxxxx>:
>>>
>>> Ok!
>>> Well, I'll take these tests and write about the results.
>>>
>>> btw,
>>> disks are the same, as some may be faster than others?
>>>
>>> 2012/11/4 Gregory Farnum <greg@xxxxxxxxxxx>:
>>>>
>>>> That's only nine — where are the other three? If you have three slow
>>>> disks that could definitely cause the troubles you're seeing.
>>>>
>>>> Also, what Mark said about sync versus syncfs.
>>>>
>>>> On Sun, Nov 4, 2012 at 1:26 PM, Aleksey Samarin <nrg3tik@xxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> It`s ok!
>>>>>
>>>>> Output:
>>>>>
>>>>> 2012-11-04 16:19:23.195891 osd.0 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 11.441035 sec at 91650 KB/sec
>>>>> 2012-11-04 16:19:24.981631 osd.1 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.225048 sec at 79287 KB/sec
>>>>> 2012-11-04 16:19:25.672896 osd.2 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.917157 sec at 75344 KB/sec
>>>>> 2012-11-04 16:19:28.058517 osd.21 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 16.453375 sec at 63730 KB/sec
>>>>> 2012-11-04 16:19:28.715552 osd.22 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 17.108887 sec at 61288 KB/sec
>>>>> 2012-11-04 16:19:23.440054 osd.23 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 11.834639 sec at 88602 KB/sec
>>>>> 2012-11-04 16:19:24.023650 osd.24 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 12.418276 sec at 84438 KB/sec
>>>>> 2012-11-04 16:19:24.617514 osd.25 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.011955 sec at 80585 KB/sec
>>>>> 2012-11-04 16:19:25.148613 osd.26 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.541710 sec at 77433 KB/sec
>>>>>
>>>>> All the best.
>>>>>
>>>>> 2012/11/4 Gregory Farnum <greg@xxxxxxxxxxx>:
>>>>>>
>>>>>> [Sorry for the blank email; I missed!]
>>>>>> On Sun, Nov 4, 2012 at 1:04 PM, Aleksey Samarin <nrg3tik@xxxxxxxxx>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi!
>>>>>>> This command? ceph tell osd \* bench
>>>>>>> Output: tell target 'osd' not a valid entity name
>>>>>>
>>>>>> I guess it's "ceph osd tell \* bench". Try that one. :)
>>>>>>
>>>>>>> Well, i did pool by command ceph osd pool create bench2 120
>>>>>>> This output of rados -p bench2 bench 30 write --no-cleanup
>>>>>>>
>>>>>>> rados -p bench2 bench 30 write --no-cleanup
>>>>>>>
>>>>>>> Maintaining 16 concurrent writes of 4194304 bytes for at least 30 seconds.
>>>>>>> Object prefix: benchmark_data_host01_5827
>>>>>>> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
>>>>>>> 0 0 0 0 0 0 - 0
>>>>>>> 1 16 29 13 51.9885 52 0.489268 0.186749
>>>>>>> 2 16 52 36 71.9866 92 1.87226 0.711888
>>>>>>> 3 16 57 41 54.657 20 0.089697 0.697821
>>>>>>> 4 16 60 44 43.9923 12 1.61868 0.765361
>>>>>>> 5 16 60 44 35.1941 0 - 0.765361
>>>>>>> 6 16 60 44 29.3285 0 - 0.765361
>>>>>>> 7 16 60 44 25.1388 0 - 0.765361
>>>>>>> 8 16 61 45 22.4964 1 5.89643 0.879384
>>>>>>> 9 16 62 46 20.4412 4 6.0234 0.991211
>>>>>>> 10 16 62 46 18.3971 0 - 0.991211
>>>>>>> 11 16 63 47 17.0883 2 8.79749 1.1573
>>>>>>> 12 16 63 47 15.6643 0 - 1.1573
>>>>>>> 13 16 63 47 14.4593 0 - 1.1573
>>>>>>> 14 16 63 47 13.4266 0 - 1.1573
>>>>>>> 15 16 63 47 12.5315 0 - 1.1573
>>>>>>> 16 16 63 47 11.7483 0 - 1.1573
>>>>>>> 17 16 63 47 11.0572 0 - 1.1573
>>>>>>> 18 16 63 47 10.4429 0 - 1.1573
>>>>>>> 19 16 63 47 9.89331 0 - 1.1573
>>>>>>> 2012-11-04 15:58:15.473733 min lat: 0.036475 max lat: 8.79749 avg lat: 1.1573
>>>>>>> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
>>>>>>> 20 16 63 47 9.39865 0 - 1.1573
>>>>>>> 21 16 63 47 8.95105 0 - 1.1573
>>>>>>> 22 16 63 47 8.54419 0 - 1.1573
>>>>>>> 23 16 63 47 8.17271 0 - 1.1573
>>>>>>> 24 16 63 47 7.83218 0 - 1.1573
>>>>>>> 25 16 63 47 7.5189 0 - 1.1573
>>>>>>> 26 16 63 47 7.22972 0 - 1.1573
>>>>>>> 27 16 81 65 9.62824 4.5 0.076456 4.9428
>>>>>>> 28 16 118 102 14.5693 148 0.427273 4.34095
>>>>>>> 29 16 119 103 14.2049 4 1.57897 4.31414
>>>>>>> 30 16 132 116 15.4645 52 2.25424 4.01492
>>>>>>> 31 16 133 117 15.0946 4 0.974652 3.98893
>>>>>>> 32 16 133 117 14.6229 0 - 3.98893
>>>>>>> Total time run: 32.575351
>>>>>>> Total writes made: 133
>>>>>>> Write size: 4194304
>>>>>>> Bandwidth (MB/sec): 16.331
>>>>>>>
>>>>>>> Stddev Bandwidth: 31.8794
>>>>>>> Max bandwidth (MB/sec): 148
>>>>>>> Min bandwidth (MB/sec): 0
>>>>>>> Average Latency: 3.91583
>>>>>>> Stddev Latency: 7.42821
>>>>>>> Max latency: 25.24
>>>>>>> Min latency: 0.036475
>>>>>>>
>>>>>>> Im think problem not in pg. This output of ceph pg dump > http://pastebin.com/BqLsyMBC
>>>>>>
>>>>>> Well, that did improve it a bit; but yes, I think there's something
>>>>>> else going on. Just wanted to verify. :)
>>>>>>
>>>>>>>
>>>>>>> I have still no idea.
>>>>>>>
>>>>>>> All the best. Alex
>>>>>>>
>>>>>>>
>>>>>>> 2012/11/4 Gregory Farnum <greg@xxxxxxxxxxx>:
>>>>>>>>
>>>>>>>> On Sun, Nov 4, 2012 at 10:58 AM, Aleksey Samarin <nrg3tik@xxxxxxxxx>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi all
>>>>>>>>>
>>>>>>>>> Im planning use ceph for cloud storage.
>>>>>>>>> My test setup is 2 servers connected via infiniband 40Gb, 6x2Tb disks per node.
>>>>>>>>> Centos 6.2
>>>>>>>>> Ceph 0.52 from http://ceph.com/rpms/el6/x86_64
>>>>>>>>> This is my config http://pastebin.com/Pzxafnsm
>>>>>>>>> journal on tmpfs
>>>>>>>>> well, im create bench pool and test it:
>>>>>>>>> ceph osd pool create bench
>>>>>>>>> rados -p bench bench 30 write
>>>>>>>>>
>>>>>>>>> Total time run: 43.258228
>>>>>>>>> Total writes made: 151
>>>>>>>>> Write size: 4194304
>>>>>>>>> Bandwidth (MB/sec): 13.963
>>>>>>>>> Stddev Bandwidth: 26.307
>>>>>>>>> Max bandwidth (MB/sec): 128
>>>>>>>>> Min bandwidth (MB/sec): 0
>>>>>>>>> Average Latency: 4.48605
>>>>>>>>> Stddev Latency: 8.17709
>>>>>>>>> Max latency: 29.7957
>>>>>>>>> Min latency: 0.039435
>>>>>>>>>
>>>>>>>>> when i do rados -p bench bench 30 seq
>>>>>>>>> Total time run: 20.626935
>>>>>>>>> Total reads made: 275
>>>>>>>>> Read size: 4194304
>>>>>>>>> Bandwidth (MB/sec): 53.328
>>>>>>>>> Average Latency: 1.19754
>>>>>>>>> Max latency: 7.0215
>>>>>>>>> Min latency: 0.011647
>>>>>>>>>
>>>>>>>>> I tested the single drive via dd if=/dev/zero of=/mnt/hdd2/testfile bs=1024k count=20000
>>>>>>>>> result: 158 MB/sec
>>>>>>>>>
>>>>>>>>> Anyone can tell me why such a weak performance? Maybe I missed
>>>>>>>>> something?
>>>>>>>>
>>>>>>>> Can you run "ceph tell osd \* bench" and report the results? (It'll go
>>>>>>>> to the "central log" which you can keep an eye on if you run "ceph -w"
>>>>>>>> in another terminal.)
>>>>>>>> I think you also didn't create your bench pool correctly; it probably
>>>>>>>> only has 8 PGs which is not going to perform very well with your disk
>>>>>>>> count. Try "ceph pool create bench2 120" and run the benchmark against
>>>>>>>> that pool. The extra number at the end tells it to create 120
>>>>>>>> placement groups.
>>>>>>>> -Greg
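
P.S. Here is a rough sketch of the parallel dd test I mean above. It is only
a sketch: it assumes the six data disks on each node are mounted at
/mnt/hdd1 .. /mnt/hdd6, so substitute the real OSD data paths.

#!/bin/bash
# Write to every data disk at the same time, one dd per disk.
# The mount points are assumptions -- replace them with the real ones.
DISKS="/mnt/hdd1 /mnt/hdd2 /mnt/hdd3 /mnt/hdd4 /mnt/hdd5 /mnt/hdd6"

for d in $DISKS; do
    # conv=fdatasync makes dd flush to disk before it reports, so the
    # page cache does not inflate the number. 4 GB per disk.
    dd if=/dev/zero of=$d/ddtest bs=1M count=4096 conv=fdatasync 2> $d/ddtest.log &
done
wait

for d in $DISKS; do
    echo "$d: $(tail -n 1 $d/ddtest.log)"   # last line of dd output has the MB/s
    rm -f $d/ddtest $d/ddtest.log
done

If every stream still reports something near the ~158 MB/s a single dd gets,
the disks and controller can sustain simultaneous writes and the problem is
more likely the sync-versus-syncfs behaviour Mark mentioned; if the per-disk
numbers collapse, the controller or bus is the limit.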
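
Also, on the placement groups: Greg's 120 was just a quick improvement over
the default 8. The usual rule of thumb (a general Ceph guideline, not
something from this thread) is about 100 PGs per OSD divided by the number
of replicas, so with 12 OSDs and 2 replicas something like:

# hypothetical follow-up pool: 12 OSDs * 100 / 2 replicas ~= 600 PGs
ceph osd pool create bench3 600
rados -p bench3 bench 30 write --no-cleanup

The pool name bench3 is made up for the example; the point is only that more
PGs let the 16 concurrent writes spread over more of the disks.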