On Sun, Nov 4, 2012 at 7:13 AM, Aleksey Samarin <nrg3tik@xxxxxxxxx> wrote:
> What may be possible solutions?
> Update centos to 6.3?

From what I've heard the RHEL libc doesn't support the syncfs syscall
(even though the kernel does have it). :( So you'd need to make sure the
kernel supports it and then build a custom glibc, and then make sure your
Ceph software is built to use it.

> About issue with writes to lots of disk, i think parallel dd command
> will be good as test! :)

Yes — it really looks like maybe some of your disks are much slower than
the others. Try benchmarking each individually one-at-a-time, and then in
groups. I suspect you'll see a problem below the Ceph layers.
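Something along these lines would cover both checks. This is only a
sketch: the /mnt/hdd1 .. /mnt/hdd6 mount points are placeholders for
wherever your OSD data disks actually live, /lib64/libc.so.6 assumes
64-bit CentOS, and if I remember right the syncfs wrapper only showed up
around glibc 2.14 (the syscall itself around kernel 2.6.39), so the first
two commands just confirm what you really have:

# Does the installed glibc export a syncfs wrapper, and which kernel is this?
nm -D /lib64/libc.so.6 | grep -i syncfs
uname -r

# Write ~4 GB to each data disk one at a time (mount points are placeholders).
# conv=fdatasync makes dd flush before reporting, so the number is honest.
for d in /mnt/hdd1 /mnt/hdd2 /mnt/hdd3 /mnt/hdd4 /mnt/hdd5 /mnt/hdd6; do
    echo "== $d =="
    dd if=/dev/zero of=$d/ddtest bs=1M count=4096 conv=fdatasync 2>&1 | tail -1
    rm -f $d/ddtest
done

# Then hit all of them at once and see whether throughput collapses.
for d in /mnt/hdd1 /mnt/hdd2 /mnt/hdd3 /mnt/hdd4 /mnt/hdd5 /mnt/hdd6; do
    dd if=/dev/zero of=$d/ddtest bs=1M count=4096 conv=fdatasync 2>&1 | tail -1 &
done
wait
rm -f /mnt/hdd?/ddtest

If one or two disks come back much slower on their own, or the parallel
run drops far below six times the single-disk number, the problem is
sitting below Ceph.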
> 2012/11/4 Mark Nelson <mark.nelson@xxxxxxxxxxx>:
>> On 11/04/2012 07:18 AM, Aleksey Samarin wrote:
>>>
>>> Well, i create ceph cluster with 2 osd ( 1 osd per node), 2 mon, 2 mds.
>>> here is what I did:
>>> ceph osd pool create bench
>>> ceph osd tell \* bench
>>> rados -p bench bench 30 write --no-cleanup
>>> output:
>>>
>>> Maintaining 16 concurrent writes of 4194304 bytes for at least 30 seconds.
>>> Object prefix: benchmark_data_host01_11635
>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>     0       0         0         0         0         0         -         0
>>>     1      16        16         0         0         0         -         0
>>>     2      16        37        21   41.9911        42  0.139005   1.08941
>>>     3      16        53        37   49.3243        64  0.754114   1.09392
>>>     4      16        75        59   58.9893        88  0.284647  0.914221
>>>     5      16        89        73   58.3896        56  0.072228  0.881008
>>>     6      16        95        79   52.6575        24   1.56959  0.961477
>>>     7      16       111        95   54.2764        64  0.046105   1.08791
>>>     8      16       128       112   55.9906        68  0.035714   1.04594
>>>     9      16       150       134   59.5457        88  0.046298   1.04415
>>>    10      16       166       150   59.9901        64  0.048635  0.986384
>>>    11      16       176       160   58.1723        40  0.727784  0.988408
>>>    12      16       206       190   63.3231       120   0.28869  0.946624
>>>    13      16       225       209   64.2976        76   1.34472  0.919464
>>>    14      16       263       247   70.5605       152  0.070926   0.90046
>>>    15      16       295       279   74.3887       128  0.041517  0.830466
>>>    16      16       315       299   74.7388        80  0.296037  0.841527
>>>    17      16       333       317   74.5772        72  0.286097  0.849558
>>>    18      16       340       324   71.9891        28  0.295084   0.83922
>>>    19      16       343       327   68.8317        12   1.46948  0.845797
>>> 2012-11-04 17:14:52.090941 min lat: 0.035714 max lat: 2.64841 avg lat: 0.861539
>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>    20      16       378       362    72.389       140  0.566232  0.861539
>>>    21      16       400       384   73.1313        88  0.038835  0.857785
>>>    22      16       404       388   70.5344        16  0.801216  0.857002
>>>    23      16       413       397   69.0327        36  0.062256   0.86376
>>>    24      16       428       412   68.6543        60  0.042583   0.89389
>>>    25      16       450       434   69.4277        88  0.383877  0.905833
>>>    26      16       472       456   70.1415        88  0.269878  0.898023
>>>    27      16       472       456   67.5437         0         -  0.898023
>>>    28      16       512       496   70.8448        80  0.056798  0.891163
>>>    29      16       530       514   70.8843        72   1.20653  0.898112
>>>    30      16       542       526   70.1212        48  0.744383  0.890733
>>> Total time run:         30.174151
>>> Total writes made:      543
>>> Write size:             4194304
>>> Bandwidth (MB/sec):     71.982
>>>
>>> Stddev Bandwidth:       38.318
>>> Max bandwidth (MB/sec): 152
>>> Min bandwidth (MB/sec): 0
>>> Average Latency:        0.889026
>>> Stddev Latency:         0.677425
>>> Max latency:            2.94467
>>> Min latency:            0.035714
>>
>> Much better for 1 disk per node! I suspect that lack of syncfs is
>> hurting you, or perhaps some other issue with writes to lots of disks
>> at the same time.
>>
>>> 2012/11/4 Aleksey Samarin <nrg3tik@xxxxxxxxx>:
>>>> Ok!
>>>> Well, I'll take these tests and write about the results.
>>>>
>>>> btw,
>>>> disks are the same, as some may be faster than others?
>>>>
>>>> 2012/11/4 Gregory Farnum <greg@xxxxxxxxxxx>:
>>>>> That's only nine — where are the other three? If you have three slow
>>>>> disks that could definitely cause the troubles you're seeing.
>>>>>
>>>>> Also, what Mark said about sync versus syncfs.
>>>>>
>>>>> On Sun, Nov 4, 2012 at 1:26 PM, Aleksey Samarin <nrg3tik@xxxxxxxxx> wrote:
>>>>>> It`s ok!
>>>>>>
>>>>>> Output:
>>>>>>
>>>>>> 2012-11-04 16:19:23.195891 osd.0 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 11.441035 sec at 91650 KB/sec
>>>>>> 2012-11-04 16:19:24.981631 osd.1 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.225048 sec at 79287 KB/sec
>>>>>> 2012-11-04 16:19:25.672896 osd.2 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.917157 sec at 75344 KB/sec
>>>>>> 2012-11-04 16:19:28.058517 osd.21 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 16.453375 sec at 63730 KB/sec
>>>>>> 2012-11-04 16:19:28.715552 osd.22 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 17.108887 sec at 61288 KB/sec
>>>>>> 2012-11-04 16:19:23.440054 osd.23 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 11.834639 sec at 88602 KB/sec
>>>>>> 2012-11-04 16:19:24.023650 osd.24 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 12.418276 sec at 84438 KB/sec
>>>>>> 2012-11-04 16:19:24.617514 osd.25 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.011955 sec at 80585 KB/sec
>>>>>> 2012-11-04 16:19:25.148613 osd.26 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.541710 sec at 77433 KB/sec
>>>>>>
>>>>>> All the best.
>>>>>>
>>>>>> 2012/11/4 Gregory Farnum <greg@xxxxxxxxxxx>:
>>>>>>> [Sorry for the blank email; I missed!]
>>>>>>> On Sun, Nov 4, 2012 at 1:04 PM, Aleksey Samarin <nrg3tik@xxxxxxxxx> wrote:
>>>>>>>> Hi!
>>>>>>>> This command? ceph tell osd \* bench
>>>>>>>> Output: tell target 'osd' not a valid entity name
>>>>>>>
>>>>>>> I guess it's "ceph osd tell \* bench". Try that one. :)
>>>>>>>
>>>>>>>> Well, i did pool by command ceph osd pool create bench2 120
>>>>>>>> This output of rados -p bench2 bench 30 write --no-cleanup
>>>>>>>>
>>>>>>>> rados -p bench2 bench 30 write --no-cleanup
>>>>>>>>
>>>>>>>> Maintaining 16 concurrent writes of 4194304 bytes for at least 30 seconds.
>>>>>>>> Object prefix: benchmark_data_host01_5827
>>>>>>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>>>>>>     0       0         0         0         0         0         -         0
>>>>>>>>     1      16        29        13   51.9885        52  0.489268  0.186749
>>>>>>>>     2      16        52        36   71.9866        92   1.87226  0.711888
>>>>>>>>     3      16        57        41    54.657        20  0.089697  0.697821
>>>>>>>>     4      16        60        44   43.9923        12   1.61868  0.765361
>>>>>>>>     5      16        60        44   35.1941         0         -  0.765361
>>>>>>>>     6      16        60        44   29.3285         0         -  0.765361
>>>>>>>>     7      16        60        44   25.1388         0         -  0.765361
>>>>>>>>     8      16        61        45   22.4964         1   5.89643  0.879384
>>>>>>>>     9      16        62        46   20.4412         4    6.0234  0.991211
>>>>>>>>    10      16        62        46   18.3971         0         -  0.991211
>>>>>>>>    11      16        63        47   17.0883         2   8.79749    1.1573
>>>>>>>>    12      16        63        47   15.6643         0         -    1.1573
>>>>>>>>    13      16        63        47   14.4593         0         -    1.1573
>>>>>>>>    14      16        63        47   13.4266         0         -    1.1573
>>>>>>>>    15      16        63        47   12.5315         0         -    1.1573
>>>>>>>>    16      16        63        47   11.7483         0         -    1.1573
>>>>>>>>    17      16        63        47   11.0572         0         -    1.1573
>>>>>>>>    18      16        63        47   10.4429         0         -    1.1573
>>>>>>>>    19      16        63        47   9.89331         0         -    1.1573
>>>>>>>> 2012-11-04 15:58:15.473733 min lat: 0.036475 max lat: 8.79749 avg lat: 1.1573
>>>>>>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>>>>>>    20      16        63        47   9.39865         0         -    1.1573
>>>>>>>>    21      16        63        47   8.95105         0         -    1.1573
>>>>>>>>    22      16        63        47   8.54419         0         -    1.1573
>>>>>>>>    23      16        63        47   8.17271         0         -    1.1573
>>>>>>>>    24      16        63        47   7.83218         0         -    1.1573
>>>>>>>>    25      16        63        47    7.5189         0         -    1.1573
>>>>>>>>    26      16        63        47   7.22972         0         -    1.1573
>>>>>>>>    27      16        81        65   9.62824       4.5  0.076456    4.9428
>>>>>>>>    28      16       118       102   14.5693       148  0.427273   4.34095
>>>>>>>>    29      16       119       103   14.2049         4   1.57897   4.31414
>>>>>>>>    30      16       132       116   15.4645        52   2.25424   4.01492
>>>>>>>>    31      16       133       117   15.0946         4  0.974652   3.98893
>>>>>>>>    32      16       133       117   14.6229         0         -   3.98893
>>>>>>>> Total time run:         32.575351
>>>>>>>> Total writes made:      133
>>>>>>>> Write size:             4194304
>>>>>>>> Bandwidth (MB/sec):     16.331
>>>>>>>>
>>>>>>>> Stddev Bandwidth:       31.8794
>>>>>>>> Max bandwidth (MB/sec): 148
>>>>>>>> Min bandwidth (MB/sec): 0
>>>>>>>> Average Latency:        3.91583
>>>>>>>> Stddev Latency:         7.42821
>>>>>>>> Max latency:            25.24
>>>>>>>> Min latency:            0.036475
>>>>>>>>
>>>>>>>> Im think problem not in pg. This output of ceph pg dump > http://pastebin.com/BqLsyMBC
>>>>>>>
>>>>>>> Well, that did improve it a bit; but yes, I think there's something
>>>>>>> else going on. Just wanted to verify. :)
>>>>>>>
>>>>>>>> I have still no idea.
>>>>>>>>
>>>>>>>> All the best. Alex
>>>>>>>>
>>>>>>>> 2012/11/4 Gregory Farnum <greg@xxxxxxxxxxx>:
>>>>>>>>> On Sun, Nov 4, 2012 at 10:58 AM, Aleksey Samarin <nrg3tik@xxxxxxxxx> wrote:
>>>>>>>>>> Hi all
>>>>>>>>>>
>>>>>>>>>> Im planning use ceph for cloud storage.
>>>>>>>>>> My test setup is 2 servers connected via infiniband 40Gb, 6x2Tb
>>>>>>>>>> disks per node.
>>>>>>>>>> Centos 6.2
>>>>>>>>>> Ceph 0.52 from http://ceph.com/rpms/el6/x86_64
>>>>>>>>>> This is my config http://pastebin.com/Pzxafnsm
>>>>>>>>>> journal on tmpfs
>>>>>>>>>> well, im create bench pool and test it:
>>>>>>>>>> ceph osd pool create bench
>>>>>>>>>> rados -p bench bench 30 write
>>>>>>>>>>
>>>>>>>>>> Total time run:         43.258228
>>>>>>>>>> Total writes made:      151
>>>>>>>>>> Write size:             4194304
>>>>>>>>>> Bandwidth (MB/sec):     13.963
>>>>>>>>>> Stddev Bandwidth:       26.307
>>>>>>>>>> Max bandwidth (MB/sec): 128
>>>>>>>>>> Min bandwidth (MB/sec): 0
>>>>>>>>>> Average Latency:        4.48605
>>>>>>>>>> Stddev Latency:         8.17709
>>>>>>>>>> Max latency:            29.7957
>>>>>>>>>> Min latency:            0.039435
>>>>>>>>>>
>>>>>>>>>> when i do rados -p bench bench 30 seq
>>>>>>>>>> Total time run:        20.626935
>>>>>>>>>> Total reads made:      275
>>>>>>>>>> Read size:             4194304
>>>>>>>>>> Bandwidth (MB/sec):    53.328
>>>>>>>>>> Average Latency:       1.19754
>>>>>>>>>> Max latency:           7.0215
>>>>>>>>>> Min latency:           0.011647
>>>>>>>>>>
>>>>>>>>>> I tested the single drive via dd if=/dev/zero of=/mnt/hdd2/testfile bs=1024k count=20000
>>>>>>>>>> result: 158 MB/sec
>>>>>>>>>>
>>>>>>>>>> Anyone can tell me why such a weak performance? Maybe I missed
>>>>>>>>>> something?
>>>>>>>>>
>>>>>>>>> Can you run "ceph tell osd \* bench" and report the results? (It'll go
>>>>>>>>> to the "central log" which you can keep an eye on if you run "ceph -w"
>>>>>>>>> in another terminal.)
>>>>>>>>> I think you also didn't create your bench pool correctly; it probably
>>>>>>>>> only has 8 PGs which is not going to perform very well with your disk
>>>>>>>>> count. Try "ceph pool create bench2 120" and run the benchmark against
>>>>>>>>> that pool. The extra number at the end tells it to create 120
>>>>>>>>> placement groups.
>>>>>>>>> -Greg
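For completeness, confirming the PG-count guess above is quick: "ceph osd
dump" prints pg_num for every pool, though the exact output format varies
a bit between releases, so treat this as a sketch rather than exact output:

# List every pool with its placement-group count; a pool created without
# an explicit count will show the small default mentioned above.
ceph osd dump | grep pg_num

# Create the benchmark pool with 120 PGs and re-run the write test against it.
ceph osd pool create bench2 120
rados -p bench2 bench 30 write --no-cleanup

With only a handful of PGs the benchmark objects map onto just a few OSDs,
so the extra placement groups are what let the writes spread across all
the disks.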