Hi Yehuda,
Thanks for the fast reply. I disabled the ops log using
'rgw enable ops log = false'
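In ceph.conf that amounts to something like the following (the section name is just an example; it is whatever client section the gateway runs under):
# section name below is only an example
[client.radosgw.gateway]
    rgw enable ops log = false
    # level 2 is what produces the per-request lines below
    debug rgw = 2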
# cat radosgw.log |grep put_obj |grep status=200
...
2013-02-17 13:51:37.415981 7f54167f4700 2 req 613:0.327243:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object611:put_obj:http status=200
2013-02-17 13:51:37.431779 7f54137ee700 2 req 614:0.318996:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object612:put_obj:http status=200
2013-02-17 13:51:37.447688 7f53f37ae700 2 req 615:0.319085:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object613:put_obj:http status=200
2013-02-17 13:51:37.460531 7f53fbfbf700 2 req 581:0.887859:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object579:put_obj:http status=200
2013-02-17 13:51:37.468215 7f5411feb700 2 req 616:0.326575:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object614:put_obj:http status=200
2013-02-17 13:51:37.480233 7f54267fc700 2 req 617:0.335292:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object615:put_obj:http status=200
2013-02-17 13:51:37.503042 7f54147f0700 2 req 618:0.330277:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object616:put_obj:http status=200
2013-02-17 13:51:37.519647 7f5413fef700 2 req 619:0.306762:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object617:put_obj:http status=200
2013-02-17 13:51:37.520274 7f5427fff700 2 req 620:0.307374:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object618:put_obj:http status=200
...
If I read this correctly, requests take about 0.32 secs on average. Again, if I'm looking at things the right way, that works out to roughly 376 secs of cumulative request time for the 1175 requests, and dividing that by 20 parallel requests gives about 18.8 secs, which means that for a 20 sec test most of the time is spent on the radosgw <-> rados side.
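For reference, a rough way to average the per-request times over the whole log (assuming the same line format as the excerpt above, where the time is the second ':'-separated part of the 'req <id>:<time>:...' field) is something like:
# grep put_obj radosgw.log | grep 'status=200' | awk '{ split($6, t, ":"); sum += t[2]; n++ } END { if (n) printf "%d reqs, avg %.3f s\n", n, sum/n }'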
On Sun, Feb 17, 2013 at 11:39 PM, Yehuda Sadeh <yehuda@xxxxxxxxxxx> wrote:
On Sun, Feb 17, 2013 at 12:53 PM, Erdem Agaoglu <erdem.agaoglu@xxxxxxxxx> wrote:
> Hi all,
>
> We have just deployed our cluster and wanted to get started immediately by
> loading our current files, but hit some rocks along the way.
>
> We are currently trying to upload millions of 10-20 kB files, but we were
> not able to get past 40-50 PUTs/s. Googling through the archives, I found
> the cause might be the default pg_num of 8 for .rgw.buckets. I confirmed it
> using 'rados bench': while the data pool with 64 pgs could push 1000
> writes/s, .rgw.buckets was capable of only 50 writes/s. Assured that was
> the problem, I deleted the .rgw pools to recreate them with larger pg_nums.
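> For reference, recreating a pool with a higher pg count is roughly the
> following (the pg_num/pgp_num values here are only illustrative):
>
> # ceph osd pool create .rgw.buckets 256 256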
>
> Now I am able to push .rgw.buckets with XXX writes/s using 'rados bench'.
>
> # rados -p .rgw.buckets bench 20 write -t 20 -b 20480
> Maintaining 20 concurrent writes of 20480 bytes for at least 20 seconds.
> Object prefix: benchmark_data_ceph-10_30068
> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> 0 0 0 0 0 0 - 0
> 1 20 2817 2797 54.6103 54.6289 0.003993 0.00708019
> 2 20 4626 4606 44.9699 35.332 0.005707 0.00857852
> 3 20 5247 5227 34.0229 12.1289 0.004025 0.0112324
> 4 20 6234 6214 30.336 19.2773 0.004463 0.0127949
> 5 20 7611 7591 29.6468 26.8945 0.161584 0.012928
> 6 20 8669 8649 28.1491 20.6641 0.006752 0.0138092
> 7 20 9758 9738 27.166 21.2695 0.002992 0.0143627
> 8 20 10672 10652 26.0014 17.8516 0.003206 0.0148701
> 9 20 11607 11587 25.1411 18.2617 0.010047 0.0155073
> 10 20 12593 12573 24.5526 19.2578 0.011297 0.0157349
> 11 20 13732 13712 24.3426 22.2461 0.002604 0.0160289
> 12 20 14707 14687 23.9005 19.043 0.003153 0.0163188
> 13 20 15764 15744 23.6498 20.6445 0.018784 0.0164889
> 14 20 16570 16550 23.0848 15.7422 0.00304 0.0168921
> 15 20 17397 17377 22.6224 16.1523 0.003808 0.0171995
> 16 20 18288 18268 22.296 17.4023 0.002723 0.0175055
> 17 20 19357 19337 22.2124 20.8789 0.003635 0.017552
> 18 20 20252 20232 21.9493 17.4805 0.003274 0.0177607
> 19 20 21392 21372 21.9657 22.2656 0.003191 0.0177641
> 2013-02-17 12:48:20.025013 min lat: 0.002303 max lat: 0.395696 avg lat: 0.0176627
> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> 20 20 22607 22587 22.0537 23.7305 0.005424 0.0176627
> Total time run: 20.131108
> Total writes made: 22608
> Write size: 20480
> Bandwidth (MB/sec): 21.934
>
> Stddev Bandwidth: 10.0135
> Max bandwidth (MB/sec): 54.6289
> Min bandwidth (MB/sec): 0
> Average Latency: 0.0177993
> Stddev Latency: 0.0296493
> Max latency: 0.395696
> Min latency: 0.002303
>
>
> But using rest-bench, it didn't make much difference:
>
> # rest-bench \
>> --api-host=myhost.com \
>> --access-key=AAA \
>> --secret=SSS \
>> --protocol=http \
>> --uri_style=path \
>> --bucket=mybucket \
>> --seconds=20 \
>> --concurrent-ios=20 \
>> --block-size=20480 \
>> write
> host=myhost.com
> Maintaining 20 concurrent writes of 20480 bytes for at least 20 seconds.
> Object prefix: benchmark_data_ceph-10_30174
> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> 0 3 3 0 0 0 - 0
> 1 20 86 66 1.28862 1.28906 0.327494 0.248925
> 2 20 146 126 1.23016 1.17188 0.331882 0.289053
> 3 20 206 186 1.21068 1.17188 0.303186 0.300404
> 4 20 266 246 1.20093 1.17188 0.327556 0.311229
> 5 20 324 304 1.18727 1.13281 0.279916 0.315768
> 6 20 386 366 1.19118 1.21094 0.324231 0.31818
> 7 20 443 423 1.18003 1.11328 0.312167 0.321635
> 8 20 503 483 1.17898 1.17188 0.347861 0.324332
> 9 20 561 541 1.17381 1.13281 0.29931 0.327285
> 10 20 622 602 1.17555 1.19141 0.299793 0.326244
> 11 20 677 657 1.16632 1.07422 0.280473 0.328129
> 12 20 735 715 1.16352 1.13281 0.311044 0.330388
> 13 20 793 773 1.16114 1.13281 0.324021 0.330745
> 14 20 855 835 1.16469 1.21094 0.299689 0.331978
> 15 20 913 893 1.16255 1.13281 0.287512 0.331909
> 16 20 974 954 1.16434 1.19141 0.279736 0.331314
> 17 20 1027 1007 1.15674 1.03516 0.374434 0.333145
> 18 20 1076 1056 1.14563 0.957031 0.328377 0.337489
> 19 20 1130 1110 1.14084 1.05469 0.376122 0.338493
> 2013-02-17 12:50:31.520161 min lat: 0.031979 max lat: 1.12062 avg lat: 0.340584
> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> 20 20 1174 1154 1.12676 0.859375 0.343657 0.340584
> Total time run: 20.473862
> Total writes made: 1175
> Write size: 20480
> Bandwidth (MB/sec): 1.121
>
> Stddev Bandwidth: 0.263018
> Max bandwidth (MB/sec): 1.28906
> Min bandwidth (MB/sec): 0
> Average Latency: 0.347583
> Stddev Latency: 0.128529
> Max latency: 1.60713
> Min latency: 0.031979
>
>
> I tried disabling the rgw logs and the apache logs and increasing the rgw
> thread pool size, with no luck. Is there something I am missing?
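> For what it's worth, the thread pool change was a ceph.conf setting along
> these lines:
> # the value below is only an example
> rgw thread pool size = 200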
>
What version are you running? Are the ops logs disabled? How did you
disable logs (did you do 'debug rgw = 0'?)
You can try isolating the issue by looking at the radosgw logs (debug
rgw = 2). Look at each put_obj request completion; it will dump the
total time it took to complete. That will give a hint whether the
problem is on the radosgw <-> rados side or on the apache <-> radosgw
side. There could also be an issue of the client starting a new
connection for every new request (rest-bench <-> apache).
Another thing to look at would be the radosgw perf counters, which you
can do by connecting to the radosgw admin socket (ceph --admin-daemon
<path-to-admin-socket> help).
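For reference, dumping the counters looks roughly like this (the socket
path below is only an example; it depends on how the admin socket is
configured for your gateway):
# ceph --admin-daemon /var/run/ceph/radosgw.asok perf dump   # socket path is just an example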
Yehuda
--
erdem agaoglu
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com