On Sun, Feb 17, 2013 at 12:53 PM, Erdem Agaoglu <erdem.agaoglu@xxxxxxxxx> wrote:
> Hi all,
>
> We have just deployed our cluster and wanted to get started immediately by
> loading our current files, but hit some rocks on the way.
>
> We are trying to upload millions of 10-20kB files, but we were not able to
> get past 40-50 PUTs/s. Googling through the archives, I found the cause
> might be the default pg_num of 8 for .rgw.buckets. I confirmed it using
> 'rados bench': while the data pool with 64 pgs could push 1000 writes/s,
> .rgw.buckets was capable of 50 writes/s. Assured that was the problem, I
> deleted the .rgw pools to recreate them with larger pg_nums.
>
> Now I am able to push .rgw.buckets with XXX writes/s using 'rados bench':
>
> # rados -p .rgw.buckets bench 20 write -t 20 -b 20480
> Maintaining 20 concurrent writes of 20480 bytes for at least 20 seconds.
> Object prefix: benchmark_data_ceph-10_30068
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>     0       0         0         0         0         0         -          0
>     1      20      2817      2797   54.6103   54.6289  0.003993  0.00708019
>     2      20      4626      4606   44.9699    35.332  0.005707  0.00857852
>     3      20      5247      5227   34.0229   12.1289  0.004025  0.0112324
>     4      20      6234      6214    30.336   19.2773  0.004463  0.0127949
>     5      20      7611      7591   29.6468   26.8945  0.161584  0.012928
>     6      20      8669      8649   28.1491   20.6641  0.006752  0.0138092
>     7      20      9758      9738    27.166   21.2695  0.002992  0.0143627
>     8      20     10672     10652   26.0014   17.8516  0.003206  0.0148701
>     9      20     11607     11587   25.1411   18.2617  0.010047  0.0155073
>    10      20     12593     12573   24.5526   19.2578  0.011297  0.0157349
>    11      20     13732     13712   24.3426   22.2461  0.002604  0.0160289
>    12      20     14707     14687   23.9005    19.043  0.003153  0.0163188
>    13      20     15764     15744   23.6498   20.6445  0.018784  0.0164889
>    14      20     16570     16550   23.0848   15.7422   0.00304  0.0168921
>    15      20     17397     17377   22.6224   16.1523  0.003808  0.0171995
>    16      20     18288     18268    22.296   17.4023  0.002723  0.0175055
>    17      20     19357     19337   22.2124   20.8789  0.003635  0.017552
>    18      20     20252     20232   21.9493   17.4805  0.003274  0.0177607
>    19      20     21392     21372   21.9657   22.2656  0.003191  0.0177641
> 2013-02-17 12:48:20.025013 min lat: 0.002303 max lat: 0.395696 avg lat: 0.0176627
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>    20      20     22607     22587   22.0537   23.7305  0.005424  0.0176627
> Total time run:         20.131108
> Total writes made:      22608
> Write size:             20480
> Bandwidth (MB/sec):     21.934
>
> Stddev Bandwidth:       10.0135
> Max bandwidth (MB/sec): 54.6289
> Min bandwidth (MB/sec): 0
> Average Latency:        0.0177993
> Stddev Latency:         0.0296493
> Max latency:            0.395696
> Min latency:            0.002303
>
> But using rest-bench, it didn't make much difference:
>
> # rest-bench \
>> --api-host=myhost.com \
>> --access-key=AAA \
>> --secret=SSS \
>> --protocol=http \
>> --uri_style=path \
>> --bucket=mybucket \
>> --seconds=20 \
>> --concurrent-ios=20 \
>> --block-size=20480 \
>> write
> host=myhost.com
> Maintaining 20 concurrent writes of 20480 bytes for at least 20 seconds.
> Object prefix: benchmark_data_ceph-10_30174
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>     0       3         3         0         0         0         -         0
>     1      20        86        66   1.28862   1.28906  0.327494  0.248925
>     2      20       146       126   1.23016   1.17188  0.331882  0.289053
>     3      20       206       186   1.21068   1.17188  0.303186  0.300404
>     4      20       266       246   1.20093   1.17188  0.327556  0.311229
>     5      20       324       304   1.18727   1.13281  0.279916  0.315768
>     6      20       386       366   1.19118   1.21094  0.324231  0.31818
>     7      20       443       423   1.18003   1.11328  0.312167  0.321635
>     8      20       503       483   1.17898   1.17188  0.347861  0.324332
>     9      20       561       541   1.17381   1.13281  0.29931   0.327285
>    10      20       622       602   1.17555   1.19141  0.299793  0.326244
>    11      20       677       657   1.16632   1.07422  0.280473  0.328129
>    12      20       735       715   1.16352   1.13281  0.311044  0.330388
>    13      20       793       773   1.16114   1.13281  0.324021  0.330745
>    14      20       855       835   1.16469   1.21094  0.299689  0.331978
>    15      20       913       893   1.16255   1.13281  0.287512  0.331909
>    16      20       974       954   1.16434   1.19141  0.279736  0.331314
>    17      20      1027      1007   1.15674   1.03516  0.374434  0.333145
>    18      20      1076      1056   1.14563  0.957031  0.328377  0.337489
>    19      20      1130      1110   1.14084   1.05469  0.376122  0.338493
> 2013-02-17 12:50:31.520161 min lat: 0.031979 max lat: 1.12062 avg lat: 0.340584
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>    20      20      1174      1154   1.12676  0.859375  0.343657  0.340584
> Total time run:         20.473862
> Total writes made:      1175
> Write size:             20480
> Bandwidth (MB/sec):     1.121
>
> Stddev Bandwidth:       0.263018
> Max bandwidth (MB/sec): 1.28906
> Min bandwidth (MB/sec): 0
> Average Latency:        0.347583
> Stddev Latency:         0.128529
> Max latency:            1.60713
> Min latency:            0.031979
>
> I tried disabling the rgw logs and the apache logs, and increasing the rgw
> thread pool size, with no luck. Is there something I am missing?

What version are you running? Are the ops logs disabled? How did you
disable the logs (did you set 'debug rgw = 0'?)

You can try isolating the issue by looking at the radosgw logs (debug
rgw = 2). Look at each put_obj request completion; it'll dump the total
time it took to complete. That'll give a hint as to whether the problem
is on the radosgw <-> rados side, or on the apache <-> radosgw side.

There could also be an issue of the client starting a new connection
for every new request (rest-bench <-> apache).

Another thing to look at would be the radosgw perf counters, which you
can get by connecting to the radosgw admin socket (ceph --admin-daemon
<path-to-admin-socket> help).

Yehuda
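P.S. A few concrete sketches for the knobs above. First, on the pool
recreation step from the quoted message: if radosgw is left to recreate
.rgw.buckets on its own, it comes back with the default 8 PGs, so create
it by hand first (64 PGs here just to match the data pool in the quoted
test; size pg_num to your OSD count):

    # precreate the bucket data pool with a sane pg_num
    ceph osd pool create .rgw.buckets 64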
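To make sure both the ops log and the gateway's debug logging are really
off, something like this in ceph.conf should do it, followed by a radosgw
restart (the [client.radosgw.gateway] section name is an assumption; use
whatever your gateway instance is called):

    [client.radosgw.gateway]
        rgw enable ops log = false    ; don't log each request to the ops log
        debug rgw = 0                 ; silence rgw debug output
        debug ms = 0                  ; silence messenger debug output as well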
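If the per-request connection setup turns out to be the cost, checking
that the Apache side is allowed to reuse connections is a cheap
experiment (illustrative values; these go in the main config or the
vhost):

    KeepAlive On
    MaxKeepAliveRequests 500
    KeepAliveTimeout 5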
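And to pull the perf counters from the admin socket (the socket path
below is a guess; check the 'admin socket' setting in your ceph.conf,
and if your version doesn't know 'perf dump', 'help' will list the
equivalent command):

    # list the commands the socket understands
    ceph --admin-daemon /var/run/ceph/radosgw.asok help
    # dump all perf counters as JSON
    ceph --admin-daemon /var/run/ceph/radosgw.asok perf dump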