Thanks for your reply! It was easier to switch from RHEL to Ubuntu.
Now everything is fast and stable! :) If you're interested, I can
attach logs.

All the best, Alex!

2012/11/16 Gregory Farnum <greg@xxxxxxxxxxx>:
> On Sun, Nov 4, 2012 at 7:13 AM, Aleksey Samarin <nrg3tik@xxxxxxxxx> wrote:
>> What are the possible solutions?
>> Update CentOS to 6.3?
>
> From what I've heard, the RHEL libc doesn't support the syncfs syscall
> (even though the kernel does have it). :( So you'd need to make sure
> the kernel supports it, then build a custom glibc, and then make sure
> your Ceph software is built to use it.
>
>> About the issue with writes to lots of disks, I think parallel dd
>> commands would be a good test! :)
>
> Yes — it really looks like maybe some of your disks are much slower
> than the others. Try benchmarking each disk individually, one at a
> time, and then in groups. I suspect you'll see a problem below the
> Ceph layers.
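A sketch of that per-disk test, for anyone following along. The
/mnt/hdd1../mnt/hdd6 mount points are hypothetical; substitute the real
data-disk mounts. Note that oflag=direct bypasses the page cache, so dd
measures the disk rather than RAM:

#!/bin/sh
# Write 2 GB to each data disk, first one at a time, then all at once.
# A disk that looks fine alone but collapses in the parallel run points
# at a shared controller/expander bottleneck rather than the disk itself.
DISKS="/mnt/hdd1 /mnt/hdd2 /mnt/hdd3 /mnt/hdd4 /mnt/hdd5 /mnt/hdd6"

for m in $DISKS; do
    echo "== $m alone =="
    dd if=/dev/zero of="$m/ddtest" bs=1M count=2048 oflag=direct 2>&1 | tail -n1
done

echo "== all disks in parallel =="
for m in $DISKS; do
    dd if=/dev/zero of="$m/ddtest" bs=1M count=2048 oflag=direct 2>&1 | tail -n1 &
done
wait

for m in $DISKS; do rm -f "$m/ddtest"; done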
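And on the syncfs point above, a quick way to check what a box actually
has (this assumes gcc is installed; the syncfs syscall landed in
mainline Linux 2.6.39 and the glibc wrapper in glibc 2.14, which is why
RHEL 6's glibc 2.12 is the sticking point even when the kernel call is
there):

uname -r                    # kernel version
getconf GNU_LIBC_VERSION    # glibc version; 2.14+ ships a syncfs() wrapper

# The definitive test: does this glibc export syncfs() at link time?
echo 'int main(void) { return syncfs(0); }' | \
    gcc -x c -D_GNU_SOURCE -include unistd.h - -o /tmp/syncfs_check \
    && echo "glibc exports syncfs()" \
    || echo "no syncfs() wrapper in this glibc"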
>> 2012/11/4 Mark Nelson <mark.nelson@xxxxxxxxxxx>:
>>> On 11/04/2012 07:18 AM, Aleksey Samarin wrote:
>>>>
>>>> Well, I created a Ceph cluster with 2 OSDs (1 OSD per node), 2 MONs,
>>>> and 2 MDSes. Here is what I did:
>>>> ceph osd pool create bench
>>>> ceph osd tell \* bench
>>>> rados -p bench bench 30 write --no-cleanup
>>>> Output:
>>>>
>>>> Maintaining 16 concurrent writes of 4194304 bytes for at least 30 seconds.
>>>> Object prefix: benchmark_data_host01_11635
>>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>>     0       0         0         0         0         0         -         0
>>>>     1      16        16         0         0         0         -         0
>>>>     2      16        37        21   41.9911        42  0.139005   1.08941
>>>>     3      16        53        37   49.3243        64  0.754114   1.09392
>>>>     4      16        75        59   58.9893        88  0.284647  0.914221
>>>>     5      16        89        73   58.3896        56  0.072228  0.881008
>>>>     6      16        95        79   52.6575        24   1.56959  0.961477
>>>>     7      16       111        95   54.2764        64  0.046105   1.08791
>>>>     8      16       128       112   55.9906        68  0.035714   1.04594
>>>>     9      16       150       134   59.5457        88  0.046298   1.04415
>>>>    10      16       166       150   59.9901        64  0.048635  0.986384
>>>>    11      16       176       160   58.1723        40  0.727784  0.988408
>>>>    12      16       206       190   63.3231       120   0.28869  0.946624
>>>>    13      16       225       209   64.2976        76   1.34472  0.919464
>>>>    14      16       263       247   70.5605       152  0.070926   0.90046
>>>>    15      16       295       279   74.3887       128  0.041517  0.830466
>>>>    16      16       315       299   74.7388        80  0.296037  0.841527
>>>>    17      16       333       317   74.5772        72  0.286097  0.849558
>>>>    18      16       340       324   71.9891        28  0.295084   0.83922
>>>>    19      16       343       327   68.8317        12   1.46948  0.845797
>>>> 2012-11-04 17:14:52.090941 min lat: 0.035714 max lat: 2.64841 avg lat: 0.861539
>>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>>    20      16       378       362    72.389       140  0.566232  0.861539
>>>>    21      16       400       384   73.1313        88  0.038835  0.857785
>>>>    22      16       404       388   70.5344        16  0.801216  0.857002
>>>>    23      16       413       397   69.0327        36  0.062256   0.86376
>>>>    24      16       428       412   68.6543        60  0.042583   0.89389
>>>>    25      16       450       434   69.4277        88  0.383877  0.905833
>>>>    26      16       472       456   70.1415        88  0.269878  0.898023
>>>>    27      16       472       456   67.5437         0         -  0.898023
>>>>    28      16       512       496   70.8448        80  0.056798  0.891163
>>>>    29      16       530       514   70.8843        72   1.20653  0.898112
>>>>    30      16       542       526   70.1212        48  0.744383  0.890733
>>>> Total time run:         30.174151
>>>> Total writes made:      543
>>>> Write size:             4194304
>>>> Bandwidth (MB/sec):     71.982
>>>>
>>>> Stddev Bandwidth:       38.318
>>>> Max bandwidth (MB/sec): 152
>>>> Min bandwidth (MB/sec): 0
>>>> Average Latency:        0.889026
>>>> Stddev Latency:         0.677425
>>>> Max latency:            2.94467
>>>> Min latency:            0.035714
>>>>
>>>
>>> Much better for 1 disk per node! I suspect that the lack of syncfs is
>>> hurting you, or perhaps some other issue with writes to lots of disks
>>> at the same time.
>>>
>>>> 2012/11/4 Aleksey Samarin <nrg3tik@xxxxxxxxx>:
>>>>>
>>>>> OK!
>>>>> Well, I'll run these tests and write back with the results.
>>>>>
>>>>> By the way, the disks are all identical; how could some be faster
>>>>> than others?
>>>>>
>>>>> 2012/11/4 Gregory Farnum <greg@xxxxxxxxxxx>:
>>>>>>
>>>>>> That's only nine — where are the other three? If you have three slow
>>>>>> disks that could definitely cause the troubles you're seeing.
>>>>>>
>>>>>> Also, what Mark said about sync versus syncfs.
>>>>>>
>>>>>> On Sun, Nov 4, 2012 at 1:26 PM, Aleksey Samarin <nrg3tik@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> It's OK!
>>>>>>>
>>>>>>> Output:
>>>>>>>
>>>>>>> 2012-11-04 16:19:23.195891 osd.0  [INF] bench: wrote 1024 MB in blocks of 4096 KB in 11.441035 sec at 91650 KB/sec
>>>>>>> 2012-11-04 16:19:24.981631 osd.1  [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.225048 sec at 79287 KB/sec
>>>>>>> 2012-11-04 16:19:25.672896 osd.2  [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.917157 sec at 75344 KB/sec
>>>>>>> 2012-11-04 16:19:28.058517 osd.21 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 16.453375 sec at 63730 KB/sec
>>>>>>> 2012-11-04 16:19:28.715552 osd.22 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 17.108887 sec at 61288 KB/sec
>>>>>>> 2012-11-04 16:19:23.440054 osd.23 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 11.834639 sec at 88602 KB/sec
>>>>>>> 2012-11-04 16:19:24.023650 osd.24 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 12.418276 sec at 84438 KB/sec
>>>>>>> 2012-11-04 16:19:24.617514 osd.25 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.011955 sec at 80585 KB/sec
>>>>>>> 2012-11-04 16:19:25.148613 osd.26 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.541710 sec at 77433 KB/sec
>>>>>>>
>>>>>>> All the best.
>>>>>>>
>>>>>>> 2012/11/4 Gregory Farnum <greg@xxxxxxxxxxx>:
>>>>>>>>
>>>>>>>> [Sorry for the blank email; I missed!]
>>>>>>>> On Sun, Nov 4, 2012 at 1:04 PM, Aleksey Samarin <nrg3tik@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> Hi!
>>>>>>>>> This command? ceph tell osd \* bench
>>>>>>>>> Output: tell target 'osd' not a valid entity name
>>>>>>>>
>>>>>>>> I guess it's "ceph osd tell \* bench". Try that one. :)
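For reference, the working combination is the two commands exactly as
given in this thread; the per-OSD bench results land in the central log:

ceph -w                  # terminal 1: follow the central cluster log
ceph osd tell \* bench   # terminal 2: every OSD writes 1 GB and reports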
>>>>>>>>> Well, I created the pool with ceph osd pool create bench2 120.
>>>>>>>>> This is the output of rados -p bench2 bench 30 write --no-cleanup:
>>>>>>>>>
>>>>>>>>> Maintaining 16 concurrent writes of 4194304 bytes for at least 30 seconds.
>>>>>>>>> Object prefix: benchmark_data_host01_5827
>>>>>>>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>>>>>>>     0       0         0         0         0         0         -         0
>>>>>>>>>     1      16        29        13   51.9885        52  0.489268  0.186749
>>>>>>>>>     2      16        52        36   71.9866        92   1.87226  0.711888
>>>>>>>>>     3      16        57        41    54.657        20  0.089697  0.697821
>>>>>>>>>     4      16        60        44   43.9923        12   1.61868  0.765361
>>>>>>>>>     5      16        60        44   35.1941         0         -  0.765361
>>>>>>>>>     6      16        60        44   29.3285         0         -  0.765361
>>>>>>>>>     7      16        60        44   25.1388         0         -  0.765361
>>>>>>>>>     8      16        61        45   22.4964         1   5.89643  0.879384
>>>>>>>>>     9      16        62        46   20.4412         4    6.0234  0.991211
>>>>>>>>>    10      16        62        46   18.3971         0         -  0.991211
>>>>>>>>>    11      16        63        47   17.0883         2   8.79749    1.1573
>>>>>>>>>    12      16        63        47   15.6643         0         -    1.1573
>>>>>>>>>    13      16        63        47   14.4593         0         -    1.1573
>>>>>>>>>    14      16        63        47   13.4266         0         -    1.1573
>>>>>>>>>    15      16        63        47   12.5315         0         -    1.1573
>>>>>>>>>    16      16        63        47   11.7483         0         -    1.1573
>>>>>>>>>    17      16        63        47   11.0572         0         -    1.1573
>>>>>>>>>    18      16        63        47   10.4429         0         -    1.1573
>>>>>>>>>    19      16        63        47   9.89331         0         -    1.1573
>>>>>>>>> 2012-11-04 15:58:15.473733 min lat: 0.036475 max lat: 8.79749 avg lat: 1.1573
>>>>>>>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>>>>>>>    20      16        63        47   9.39865         0         -    1.1573
>>>>>>>>>    21      16        63        47   8.95105         0         -    1.1573
>>>>>>>>>    22      16        63        47   8.54419         0         -    1.1573
>>>>>>>>>    23      16        63        47   8.17271         0         -    1.1573
>>>>>>>>>    24      16        63        47   7.83218         0         -    1.1573
>>>>>>>>>    25      16        63        47    7.5189         0         -    1.1573
>>>>>>>>>    26      16        63        47   7.22972         0         -    1.1573
>>>>>>>>>    27      16        81        65   9.62824       4.5  0.076456    4.9428
>>>>>>>>>    28      16       118       102   14.5693       148  0.427273   4.34095
>>>>>>>>>    29      16       119       103   14.2049         4   1.57897   4.31414
>>>>>>>>>    30      16       132       116   15.4645        52   2.25424   4.01492
>>>>>>>>>    31      16       133       117   15.0946         4  0.974652   3.98893
>>>>>>>>>    32      16       133       117   14.6229         0         -   3.98893
>>>>>>>>> Total time run:         32.575351
>>>>>>>>> Total writes made:      133
>>>>>>>>> Write size:             4194304
>>>>>>>>> Bandwidth (MB/sec):     16.331
>>>>>>>>>
>>>>>>>>> Stddev Bandwidth:       31.8794
>>>>>>>>> Max bandwidth (MB/sec): 148
>>>>>>>>> Min bandwidth (MB/sec): 0
>>>>>>>>> Average Latency:        3.91583
>>>>>>>>> Stddev Latency:         7.42821
>>>>>>>>> Max latency:            25.24
>>>>>>>>> Min latency:            0.036475
>>>>>>>>>
>>>>>>>>> I think the problem is not in the PGs. The output of ceph pg dump
>>>>>>>>> is at http://pastebin.com/BqLsyMBC
>>>>>>>>
>>>>>>>> Well, that did improve it a bit; but yes, I think there's something
>>>>>>>> else going on. Just wanted to verify. :)
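A quick way to confirm the PG counts being discussed here is to read the
pool lines out of the OSD map (the output format is from the 0.5x era
and may differ on other versions):

ceph osd dump | grep '^pool'    # each pool line includes its pg_num and pgp_num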
>>>>>>>>>
>>>>>>>>> I still have no idea.
>>>>>>>>>
>>>>>>>>> All the best, Alex
>>>>>>>>>
>>>>>>>>> 2012/11/4 Gregory Farnum <greg@xxxxxxxxxxx>:
>>>>>>>>>>
>>>>>>>>>> On Sun, Nov 4, 2012 at 10:58 AM, Aleksey Samarin <nrg3tik@xxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi all!
>>>>>>>>>>>
>>>>>>>>>>> I'm planning to use Ceph for cloud storage.
>>>>>>>>>>> My test setup is 2 servers connected via 40Gb InfiniBand, with
>>>>>>>>>>> 6x2TB disks per node.
>>>>>>>>>>> CentOS 6.2
>>>>>>>>>>> Ceph 0.52 from http://ceph.com/rpms/el6/x86_64
>>>>>>>>>>> This is my config: http://pastebin.com/Pzxafnsm
>>>>>>>>>>> The journal is on tmpfs.
>>>>>>>>>>> Well, I created a bench pool and tested it:
>>>>>>>>>>> ceph osd pool create bench
>>>>>>>>>>> rados -p bench bench 30 write
>>>>>>>>>>>
>>>>>>>>>>> Total time run:         43.258228
>>>>>>>>>>> Total writes made:      151
>>>>>>>>>>> Write size:             4194304
>>>>>>>>>>> Bandwidth (MB/sec):     13.963
>>>>>>>>>>> Stddev Bandwidth:       26.307
>>>>>>>>>>> Max bandwidth (MB/sec): 128
>>>>>>>>>>> Min bandwidth (MB/sec): 0
>>>>>>>>>>> Average Latency:        4.48605
>>>>>>>>>>> Stddev Latency:         8.17709
>>>>>>>>>>> Max latency:            29.7957
>>>>>>>>>>> Min latency:            0.039435
>>>>>>>>>>>
>>>>>>>>>>> When I run rados -p bench bench 30 seq:
>>>>>>>>>>> Total time run:        20.626935
>>>>>>>>>>> Total reads made:      275
>>>>>>>>>>> Read size:             4194304
>>>>>>>>>>> Bandwidth (MB/sec):    53.328
>>>>>>>>>>> Average Latency:       1.19754
>>>>>>>>>>> Max latency:           7.0215
>>>>>>>>>>> Min latency:           0.011647
>>>>>>>>>>>
>>>>>>>>>>> I tested a single drive via dd if=/dev/zero of=/mnt/hdd2/testfile
>>>>>>>>>>> bs=1024k count=20000
>>>>>>>>>>> Result: 158 MB/sec
>>>>>>>>>>>
>>>>>>>>>>> Can anyone tell me why the performance is so weak? Maybe I missed
>>>>>>>>>>> something?
>>>>>>>>>>
>>>>>>>>>> Can you run "ceph tell osd \* bench" and report the results? (It'll
>>>>>>>>>> go to the "central log", which you can keep an eye on if you run
>>>>>>>>>> "ceph -w" in another terminal.)
>>>>>>>>>> I think you also didn't create your bench pool correctly; it
>>>>>>>>>> probably only has 8 PGs, which is not going to perform very well
>>>>>>>>>> with your disk count. Try "ceph osd pool create bench2 120" and run
>>>>>>>>>> the benchmark against that pool. The extra number at the end tells
>>>>>>>>>> it to create 120 placement groups.
>>>>>>>>>> -Greg
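A note on picking that placement-group number: a commonly cited rule of
thumb from the Ceph docs is roughly (100 x number of OSDs) / replica
count, rounded up to a power of two. A worked example for this 12-OSD,
2-replica setup follows; the pool name and final value are illustrative,
not from the thread:

# 12 OSDs * 100 / 2 replicas = 600, rounded up to the next power of two
ceph osd pool create bench3 1024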