Thanks for your reply! It was easier to switch from RHEL to Ubuntu.
Now everything is fast and stable! :) If you're interested, I can
attach logs.

All the best, Alex!

2012/11/16 Gregory Farnum <greg@xxxxxxxxxxx>:
> On Sun, Nov 4, 2012 at 7:13 AM, Aleksey Samarin <nrg3tik@xxxxxxxxx> wrote:
>> What are the possible solutions?
>> Update CentOS to 6.3?
>
> From what I've heard, the RHEL libc doesn't support the syncfs syscall
> (even though the kernel does have it). :( So you'd need to make sure
> the kernel supports it, then build a custom glibc, and then make sure
> your Ceph software is built to use it.
>
>> About the issue with writes to lots of disks, I think parallel dd
>> commands would be a good test! :)
>
> Yes — it really looks like maybe some of your disks are much slower
> than the others. Try benchmarking each disk individually, one at a
> time, and then in groups. I suspect you'll see a problem below the
> Ceph layers.
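A sketch of that per-disk test, for anyone following along. The
/mnt/hdd1../mnt/hdd6 mount points are hypothetical; substitute the real
data-disk mounts. Note that oflag=direct bypasses the page cache, so dd
measures the disk rather than RAM:

#!/bin/sh
# Write 2 GB to each data disk, first one at a time, then all at once.
# A disk that looks fine alone but collapses in the parallel run points
# at a shared controller/expander bottleneck rather than the disk itself.
DISKS="/mnt/hdd1 /mnt/hdd2 /mnt/hdd3 /mnt/hdd4 /mnt/hdd5 /mnt/hdd6"

for m in $DISKS; do
    echo "== $m alone =="
    dd if=/dev/zero of="$m/ddtest" bs=1M count=2048 oflag=direct 2>&1 | tail -n1
done

echo "== all disks in parallel =="
for m in $DISKS; do
    dd if=/dev/zero of="$m/ddtest" bs=1M count=2048 oflag=direct 2>&1 | tail -n1 &
done
wait

for m in $DISKS; do rm -f "$m/ddtest"; done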
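And on the syncfs point above, a quick way to check what a box actually
has (this assumes gcc is installed; the syncfs syscall landed in
mainline Linux 2.6.39 and the glibc wrapper in glibc 2.14, which is why
RHEL 6's glibc 2.12 is the sticking point even when the kernel call is
there):

uname -r                    # kernel version
getconf GNU_LIBC_VERSION    # glibc version; 2.14+ ships a syncfs() wrapper

# The definitive test: does this glibc export syncfs() at link time?
echo 'int main(void) { return syncfs(0); }' | \
    gcc -x c -D_GNU_SOURCE -include unistd.h - -o /tmp/syncfs_check \
    && echo "glibc exports syncfs()" \
    || echo "no syncfs() wrapper in this glibc"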
>> 2012/11/4 Mark Nelson <mark.nelson@xxxxxxxxxxx>:
>>> On 11/04/2012 07:18 AM, Aleksey Samarin wrote:
>>>>
>>>> Well, I created a Ceph cluster with 2 OSDs (1 OSD per node), 2 MONs,
>>>> and 2 MDSes. Here is what I did:
>>>> ceph osd pool create bench
>>>> ceph osd tell \* bench
>>>> rados -p bench bench 30 write --no-cleanup
>>>> Output:
>>>>
>>>> Maintaining 16 concurrent writes of 4194304 bytes for at least 30 seconds.
>>>> Object prefix: benchmark_data_host01_11635
>>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>>     0       0         0         0         0         0         -         0
>>>>     1      16        16         0         0         0         -         0
>>>>     2      16        37        21   41.9911        42  0.139005   1.08941
>>>>     3      16        53        37   49.3243        64  0.754114   1.09392
>>>>     4      16        75        59   58.9893        88  0.284647  0.914221
>>>>     5      16        89        73   58.3896        56  0.072228  0.881008
>>>>     6      16        95        79   52.6575        24   1.56959  0.961477
>>>>     7      16       111        95   54.2764        64  0.046105   1.08791
>>>>     8      16       128       112   55.9906        68  0.035714   1.04594
>>>>     9      16       150       134   59.5457        88  0.046298   1.04415
>>>>    10      16       166       150   59.9901        64  0.048635  0.986384
>>>>    11      16       176       160   58.1723        40  0.727784  0.988408
>>>>    12      16       206       190   63.3231       120   0.28869  0.946624
>>>>    13      16       225       209   64.2976        76   1.34472  0.919464
>>>>    14      16       263       247   70.5605       152  0.070926   0.90046
>>>>    15      16       295       279   74.3887       128  0.041517  0.830466
>>>>    16      16       315       299   74.7388        80  0.296037  0.841527
>>>>    17      16       333       317   74.5772        72  0.286097  0.849558
>>>>    18      16       340       324   71.9891        28  0.295084   0.83922
>>>>    19      16       343       327   68.8317        12   1.46948  0.845797
>>>> 2012-11-04 17:14:52.090941 min lat: 0.035714 max lat: 2.64841 avg lat: 0.861539
>>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>>    20      16       378       362    72.389       140  0.566232  0.861539
>>>>    21      16       400       384   73.1313        88  0.038835  0.857785
>>>>    22      16       404       388   70.5344        16  0.801216  0.857002
>>>>    23      16       413       397   69.0327        36  0.062256   0.86376
>>>>    24      16       428       412   68.6543        60  0.042583   0.89389
>>>>    25      16       450       434   69.4277        88  0.383877  0.905833
>>>>    26      16       472       456   70.1415        88  0.269878  0.898023
>>>>    27      16       472       456   67.5437         0         -  0.898023
>>>>    28      16       512       496   70.8448        80  0.056798  0.891163
>>>>    29      16       530       514   70.8843        72   1.20653  0.898112
>>>>    30      16       542       526   70.1212        48  0.744383  0.890733
>>>> Total time run:         30.174151
>>>> Total writes made:      543
>>>> Write size:             4194304
>>>> Bandwidth (MB/sec):     71.982
>>>>
>>>> Stddev Bandwidth:       38.318
>>>> Max bandwidth (MB/sec): 152
>>>> Min bandwidth (MB/sec): 0
>>>> Average Latency:        0.889026
>>>> Stddev Latency:         0.677425
>>>> Max latency:            2.94467
>>>> Min latency:            0.035714
>>>>
>>>
>>> Much better for 1 disk per node! I suspect that the lack of syncfs is
>>> hurting you, or perhaps some other issue with writes to lots of disks
>>> at the same time.
>>>
>>>> 2012/11/4 Aleksey Samarin <nrg3tik@xxxxxxxxx>:
>>>>>
>>>>> OK!
>>>>> Well, I'll run these tests and write back with the results.
>>>>>
>>>>> By the way, the disks are all identical; how could some be faster
>>>>> than others?
>>>>>
>>>>> 2012/11/4 Gregory Farnum <greg@xxxxxxxxxxx>:
>>>>>>
>>>>>> That's only nine — where are the other three? If you have three slow
>>>>>> disks that could definitely cause the troubles you're seeing.
>>>>>>
>>>>>> Also, what Mark said about sync versus syncfs.
>>>>>>
>>>>>> On Sun, Nov 4, 2012 at 1:26 PM, Aleksey Samarin <nrg3tik@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> It's OK!
>>>>>>>
>>>>>>> Output:
>>>>>>>
>>>>>>> 2012-11-04 16:19:23.195891 osd.0  [INF] bench: wrote 1024 MB in blocks of 4096 KB in 11.441035 sec at 91650 KB/sec
>>>>>>> 2012-11-04 16:19:24.981631 osd.1  [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.225048 sec at 79287 KB/sec
>>>>>>> 2012-11-04 16:19:25.672896 osd.2  [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.917157 sec at 75344 KB/sec
>>>>>>> 2012-11-04 16:19:28.058517 osd.21 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 16.453375 sec at 63730 KB/sec
>>>>>>> 2012-11-04 16:19:28.715552 osd.22 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 17.108887 sec at 61288 KB/sec
>>>>>>> 2012-11-04 16:19:23.440054 osd.23 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 11.834639 sec at 88602 KB/sec
>>>>>>> 2012-11-04 16:19:24.023650 osd.24 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 12.418276 sec at 84438 KB/sec
>>>>>>> 2012-11-04 16:19:24.617514 osd.25 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.011955 sec at 80585 KB/sec
>>>>>>> 2012-11-04 16:19:25.148613 osd.26 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.541710 sec at 77433 KB/sec
>>>>>>>
>>>>>>> All the best.
>>>>>>>
>>>>>>> 2012/11/4 Gregory Farnum <greg@xxxxxxxxxxx>:
>>>>>>>>
>>>>>>>> [Sorry for the blank email; I missed!]
>>>>>>>> On Sun, Nov 4, 2012 at 1:04 PM, Aleksey Samarin <nrg3tik@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> Hi!
>>>>>>>>> This command? ceph tell osd \* bench
>>>>>>>>> Output: tell target 'osd' not a valid entity name
>>>>>>>>
>>>>>>>> I guess it's "ceph osd tell \* bench". Try that one. :)
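For reference, the working combination is the two commands exactly as
given in this thread; the per-OSD bench results land in the central log:

ceph -w                  # terminal 1: follow the central cluster log
ceph osd tell \* bench   # terminal 2: every OSD writes 1 GB and reports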
>>>>>>>>> Well, I created the pool with ceph osd pool create bench2 120.
>>>>>>>>> This is the output of rados -p bench2 bench 30 write --no-cleanup:
>>>>>>>>>
>>>>>>>>> Maintaining 16 concurrent writes of 4194304 bytes for at least 30 seconds.
>>>>>>>>> Object prefix: benchmark_data_host01_5827
>>>>>>>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>>>>>>>     0       0         0         0         0         0         -         0
>>>>>>>>>     1      16        29        13   51.9885        52  0.489268  0.186749
>>>>>>>>>     2      16        52        36   71.9866        92   1.87226  0.711888
>>>>>>>>>     3      16        57        41    54.657        20  0.089697  0.697821
>>>>>>>>>     4      16        60        44   43.9923        12   1.61868  0.765361
>>>>>>>>>     5      16        60        44   35.1941         0         -  0.765361
>>>>>>>>>     6      16        60        44   29.3285         0         -  0.765361
>>>>>>>>>     7      16        60        44   25.1388         0         -  0.765361
>>>>>>>>>     8      16        61        45   22.4964         1   5.89643  0.879384
>>>>>>>>>     9      16        62        46   20.4412         4    6.0234  0.991211
>>>>>>>>>    10      16        62        46   18.3971         0         -  0.991211
>>>>>>>>>    11      16        63        47   17.0883         2   8.79749    1.1573
>>>>>>>>>    12      16        63        47   15.6643         0         -    1.1573
>>>>>>>>>    13      16        63        47   14.4593         0         -    1.1573
>>>>>>>>>    14      16        63        47   13.4266         0         -    1.1573
>>>>>>>>>    15      16        63        47   12.5315         0         -    1.1573
>>>>>>>>>    16      16        63        47   11.7483         0         -    1.1573
>>>>>>>>>    17      16        63        47   11.0572         0         -    1.1573
>>>>>>>>>    18      16        63        47   10.4429         0         -    1.1573
>>>>>>>>>    19      16        63        47   9.89331         0         -    1.1573
>>>>>>>>> 2012-11-04 15:58:15.473733 min lat: 0.036475 max lat: 8.79749 avg lat: 1.1573
>>>>>>>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>>>>>>>    20      16        63        47   9.39865         0         -    1.1573
>>>>>>>>>    21      16        63        47   8.95105         0         -    1.1573
>>>>>>>>>    22      16        63        47   8.54419         0         -    1.1573
>>>>>>>>>    23      16        63        47   8.17271         0         -    1.1573
>>>>>>>>>    24      16        63        47   7.83218         0         -    1.1573
>>>>>>>>>    25      16        63        47    7.5189         0         -    1.1573
>>>>>>>>>    26      16        63        47   7.22972         0         -    1.1573
>>>>>>>>>    27      16        81        65   9.62824       4.5  0.076456    4.9428
>>>>>>>>>    28      16       118       102   14.5693       148  0.427273   4.34095
>>>>>>>>>    29      16       119       103   14.2049         4   1.57897   4.31414
>>>>>>>>>    30      16       132       116   15.4645        52   2.25424   4.01492
>>>>>>>>>    31      16       133       117   15.0946         4  0.974652   3.98893
>>>>>>>>>    32      16       133       117   14.6229         0         -   3.98893
>>>>>>>>> Total time run:         32.575351
>>>>>>>>> Total writes made:      133
>>>>>>>>> Write size:             4194304
>>>>>>>>> Bandwidth (MB/sec):     16.331
>>>>>>>>>
>>>>>>>>> Stddev Bandwidth:       31.8794
>>>>>>>>> Max bandwidth (MB/sec): 148
>>>>>>>>> Min bandwidth (MB/sec): 0
>>>>>>>>> Average Latency:        3.91583
>>>>>>>>> Stddev Latency:         7.42821
>>>>>>>>> Max latency:            25.24
>>>>>>>>> Min latency:            0.036475
>>>>>>>>>
>>>>>>>>> I think the problem is not in the PGs. The output of ceph pg dump
>>>>>>>>> is at http://pastebin.com/BqLsyMBC
>>>>>>>>
>>>>>>>> Well, that did improve it a bit; but yes, I think there's something
>>>>>>>> else going on. Just wanted to verify. :)
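A quick way to confirm the PG counts being discussed here is to read the
pool lines out of the OSD map (the output format is from the 0.5x era
and may differ on other versions):

ceph osd dump | grep '^pool'    # each pool line includes its pg_num and pgp_num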
>>>>>>>>>
>>>>>>>>> I still have no idea.
>>>>>>>>>
>>>>>>>>> All the best, Alex
>>>>>>>>>
>>>>>>>>> 2012/11/4 Gregory Farnum <greg@xxxxxxxxxxx>:
>>>>>>>>>>
>>>>>>>>>> On Sun, Nov 4, 2012 at 10:58 AM, Aleksey Samarin <nrg3tik@xxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi all!
>>>>>>>>>>>
>>>>>>>>>>> I'm planning to use Ceph for cloud storage.
>>>>>>>>>>> My test setup is 2 servers connected via 40Gb InfiniBand, with
>>>>>>>>>>> 6x2TB disks per node.
>>>>>>>>>>> CentOS 6.2
>>>>>>>>>>> Ceph 0.52 from http://ceph.com/rpms/el6/x86_64
>>>>>>>>>>> This is my config: http://pastebin.com/Pzxafnsm
>>>>>>>>>>> The journal is on tmpfs.
>>>>>>>>>>> Well, I created a bench pool and tested it:
>>>>>>>>>>> ceph osd pool create bench
>>>>>>>>>>> rados -p bench bench 30 write
>>>>>>>>>>>
>>>>>>>>>>> Total time run:         43.258228
>>>>>>>>>>> Total writes made:      151
>>>>>>>>>>> Write size:             4194304
>>>>>>>>>>> Bandwidth (MB/sec):     13.963
>>>>>>>>>>> Stddev Bandwidth:       26.307
>>>>>>>>>>> Max bandwidth (MB/sec): 128
>>>>>>>>>>> Min bandwidth (MB/sec): 0
>>>>>>>>>>> Average Latency:        4.48605
>>>>>>>>>>> Stddev Latency:         8.17709
>>>>>>>>>>> Max latency:            29.7957
>>>>>>>>>>> Min latency:            0.039435
>>>>>>>>>>>
>>>>>>>>>>> When I run rados -p bench bench 30 seq:
>>>>>>>>>>> Total time run:        20.626935
>>>>>>>>>>> Total reads made:      275
>>>>>>>>>>> Read size:             4194304
>>>>>>>>>>> Bandwidth (MB/sec):    53.328
>>>>>>>>>>> Average Latency:       1.19754
>>>>>>>>>>> Max latency:           7.0215
>>>>>>>>>>> Min latency:           0.011647
>>>>>>>>>>>
>>>>>>>>>>> I tested a single drive via dd if=/dev/zero of=/mnt/hdd2/testfile
>>>>>>>>>>> bs=1024k count=20000
>>>>>>>>>>> Result: 158 MB/sec
>>>>>>>>>>>
>>>>>>>>>>> Can anyone tell me why the performance is so weak? Maybe I missed
>>>>>>>>>>> something?
>>>>>>>>>>
>>>>>>>>>> Can you run "ceph tell osd \* bench" and report the results? (It'll
>>>>>>>>>> go to the "central log", which you can keep an eye on if you run
>>>>>>>>>> "ceph -w" in another terminal.)
>>>>>>>>>> I think you also didn't create your bench pool correctly; it
>>>>>>>>>> probably only has 8 PGs, which is not going to perform very well
>>>>>>>>>> with your disk count. Try "ceph osd pool create bench2 120" and run
>>>>>>>>>> the benchmark against that pool. The extra number at the end tells
>>>>>>>>>> it to create 120 placement groups.
>>>>>>>>>> -Greg
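A note on picking that placement-group number: a commonly cited rule of
thumb from the Ceph docs is roughly (100 x number of OSDs) / replica
count, rounded up to a power of two. A worked example for this 12-OSD,
2-replica setup follows; the pool name and final value are illustrative,
not from the thread:

# 12 OSDs * 100 / 2 replicas = 600, rounded up to the next power of two
ceph osd pool create bench3 1024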