Re: radosgw performance

Hi Yehuda,

Thanks for the fast reply.
We are running 0.56.3 on Ubuntu 12.04.

I disabled the ops log using
'rgw enable ops log = false'
but I missed 'debug rgw = 0', so what I did instead was simply 'log file = /dev/null' :)
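
For reference, the logging-related bits of my ceph.conf now look roughly like this (the section name is just how my gateway happens to be named):

  [client.radosgw.gateway]
      rgw enable ops log = false
      debug rgw = 0
      log file = /dev/null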

Anyway, I tried 'debug rgw = 0' but it didn't change anything. To see what's going on in rgw, I set 'debug rgw = 2' and reran the test.

# cat radosgw.log |grep put_obj |grep status=200
...
2013-02-17 13:51:37.415981 7f54167f4700  2 req 613:0.327243:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object611:put_obj:http status=200
2013-02-17 13:51:37.431779 7f54137ee700  2 req 614:0.318996:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object612:put_obj:http status=200
2013-02-17 13:51:37.447688 7f53f37ae700  2 req 615:0.319085:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object613:put_obj:http status=200
2013-02-17 13:51:37.460531 7f53fbfbf700  2 req 581:0.887859:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object579:put_obj:http status=200
2013-02-17 13:51:37.468215 7f5411feb700  2 req 616:0.326575:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object614:put_obj:http status=200
2013-02-17 13:51:37.480233 7f54267fc700  2 req 617:0.335292:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object615:put_obj:http status=200
2013-02-17 13:51:37.503042 7f54147f0700  2 req 618:0.330277:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object616:put_obj:http status=200
2013-02-17 13:51:37.519647 7f5413fef700  2 req 619:0.306762:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object617:put_obj:http status=200
2013-02-17 13:51:37.520274 7f5427fff700  2 req 620:0.307374:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object618:put_obj:http status=200
...


If I read this correctly, requests take about 0.32 secs on average. Again, if I'm looking at things the right way, that would add up to roughly 376 secs of request time for the 1175 requests, and dividing that by 20 parallel requests gives about 18.8 secs, which means for a 20-sec test almost all of the time is spent on the radosgw <-> rados side.
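
Rough numbers, in case I'm miscounting:

  1175 requests * 0.32 s/request  ~= 376 s of summed request time
  376 s / 20 concurrent requests  ~= 18.8 s of wall-clock time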

I'm not sure how to check apache <-> radosgw, or whether that's even relevant. But I can say there is no problem between rest-client <-> apache, as I get the same throughput with a bunch of other clients.
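
The closest I can think of for the apache side is timing a trivial request through the whole stack, something like (the hostname is just my test setup):

  # curl -s -o /dev/null -w '%{time_total}\n' http://myhost.com/

but I'm not sure how much that really isolates apache <-> radosgw from the rest.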

I noticed that earlier, but rgw does not have any admin socket; I guess I'm running the defaults in this respect. Is there an option I'm missing to enable the admin socket?
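
If an explicit option is what's missing, I guess it would be something like this (socket path and section name are just examples from my setup):

  [client.radosgw.gateway]
      admin socket = /var/run/ceph/radosgw.asok

  # ceph --admin-daemon /var/run/ceph/radosgw.asok perf dump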

Since I was able to increase the rados pool performance but not the radosgw performance, the only thing that makes sense to me is that I forgot to delete some configuration or leftover state when I deleted the original .rgw.* pools. All I did was 'ceph osd pool delete'; is there something more to clean up?
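
For the record, what I did for each of the .rgw pools was roughly this (the pg count is just what I picked; the exact syntax may differ on 0.56):

  # ceph osd pool delete .rgw.buckets
  # ceph osd pool create .rgw.buckets 64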


On Sun, Feb 17, 2013 at 11:39 PM, Yehuda Sadeh <yehuda@xxxxxxxxxxx> wrote:
On Sun, Feb 17, 2013 at 12:53 PM, Erdem Agaoglu <erdem.agaoglu@xxxxxxxxx> wrote:
> Hi all,
>
> We have just deployed our cluster and wanted to get started immediately by
> loading our current files but hit some rocks on the way.
>
> We are trying to upload millions of 10-20 kB files, but we have not been
> able to get past 40-50 PUTs/s. Googling through the archives, I found the cause
> might be the default pg_num of 8 for .rgw.buckets. I confirmed this using
> 'rados bench': while the data pool with 64 pgs could push 1000 writes/s,
> .rgw.buckets was capable of only 50 writes/s. Assured that was the problem, I
> deleted the .rgw pools to recreate them with larger pg_nums.
>
> Now I am able to push .rgw.buckets with XXX writes/s using 'rados bench'.
>
> # rados -p .rgw.buckets bench 20 write -t 20 -b 20480
>  Maintaining 20 concurrent writes of 20480 bytes for at least 20 seconds.
>  Object prefix: benchmark_data_ceph-10_30068
>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>      0       0         0         0         0         0         -         0
>      1      20      2817      2797   54.6103   54.6289  0.003993 0.00708019
>      2      20      4626      4606   44.9699    35.332  0.005707 0.00857852
>      3      20      5247      5227   34.0229   12.1289  0.004025 0.0112324
>      4      20      6234      6214    30.336   19.2773  0.004463 0.0127949
>      5      20      7611      7591   29.6468   26.8945  0.161584  0.012928
>      6      20      8669      8649   28.1491   20.6641  0.006752 0.0138092
>      7      20      9758      9738    27.166   21.2695  0.002992 0.0143627
>      8      20     10672     10652   26.0014   17.8516  0.003206 0.0148701
>      9      20     11607     11587   25.1411   18.2617  0.010047 0.0155073
>     10      20     12593     12573   24.5526   19.2578  0.011297 0.0157349
>     11      20     13732     13712   24.3426   22.2461  0.002604 0.0160289
>     12      20     14707     14687   23.9005    19.043  0.003153 0.0163188
>     13      20     15764     15744   23.6498   20.6445  0.018784 0.0164889
>     14      20     16570     16550   23.0848   15.7422   0.00304 0.0168921
>     15      20     17397     17377   22.6224   16.1523  0.003808 0.0171995
>     16      20     18288     18268    22.296   17.4023  0.002723 0.0175055
>     17      20     19357     19337   22.2124   20.8789  0.003635  0.017552
>     18      20     20252     20232   21.9493   17.4805  0.003274 0.0177607
>     19      20     21392     21372   21.9657   22.2656  0.003191 0.0177641
> 2013-02-17 12:48:20.025013 min lat: 0.002303 max lat: 0.395696 avg lat: 0.0176627
>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>     20      20     22607     22587   22.0537   23.7305  0.005424 0.0176627
>  Total time run:         20.131108
> Total writes made:      22608
> Write size:             20480
> Bandwidth (MB/sec):     21.934
>
> Stddev Bandwidth:       10.0135
> Max bandwidth (MB/sec): 54.6289
> Min bandwidth (MB/sec): 0
> Average Latency:        0.0177993
> Stddev Latency:         0.0296493
> Max latency:            0.395696
> Min latency:            0.002303
>
>
> But using rest-bench, it didn't make much difference:
>
> # rest-bench \
>> --api-host=myhost.com \
>> --access-key=AAA \
>> --secret=SSS \
>> --protocol=http \
>> --uri_style=path \
>> --bucket=mybucket \
>> --seconds=20 \
>> --concurrent-ios=20 \
>> --block-size=20480 \
>> write
> host=myhost.com
>  Maintaining 20 concurrent writes of 20480 bytes for at least 20 seconds.
>  Object prefix: benchmark_data_ceph-10_30174
>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>      0       3         3         0         0         0         -         0
>      1      20        86        66   1.28862   1.28906  0.327494  0.248925
>      2      20       146       126   1.23016   1.17188  0.331882  0.289053
>      3      20       206       186   1.21068   1.17188  0.303186  0.300404
>      4      20       266       246   1.20093   1.17188  0.327556  0.311229
>      5      20       324       304   1.18727   1.13281  0.279916  0.315768
>      6      20       386       366   1.19118   1.21094  0.324231   0.31818
>      7      20       443       423   1.18003   1.11328  0.312167  0.321635
>      8      20       503       483   1.17898   1.17188  0.347861  0.324332
>      9      20       561       541   1.17381   1.13281   0.29931  0.327285
>     10      20       622       602   1.17555   1.19141  0.299793  0.326244
>     11      20       677       657   1.16632   1.07422  0.280473  0.328129
>     12      20       735       715   1.16352   1.13281  0.311044  0.330388
>     13      20       793       773   1.16114   1.13281  0.324021  0.330745
>     14      20       855       835   1.16469   1.21094  0.299689  0.331978
>     15      20       913       893   1.16255   1.13281  0.287512  0.331909
>     16      20       974       954   1.16434   1.19141  0.279736  0.331314
>     17      20      1027      1007   1.15674   1.03516  0.374434  0.333145
>     18      20      1076      1056   1.14563  0.957031  0.328377  0.337489
>     19      20      1130      1110   1.14084   1.05469  0.376122  0.338493
> 2013-02-17 12:50:31.520161 min lat: 0.031979 max lat: 1.12062 avg lat: 0.340584
>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>     20      20      1174      1154   1.12676  0.859375  0.343657  0.340584
>  Total time run:         20.473862
> Total writes made:      1175
> Write size:             20480
> Bandwidth (MB/sec):     1.121
>
> Stddev Bandwidth:       0.263018
> Max bandwidth (MB/sec): 1.28906
> Min bandwidth (MB/sec): 0
> Average Latency:        0.347583
> Stddev Latency:         0.128529
> Max latency:            1.60713
> Min latency:            0.031979
>
>
> I tried disabling rgw logs and apache logs, and increasing the rgw thread pool
> size, but nothing changed. Is there something I am missing?
>

What version are you running? Are the ops logs disabled? How did you
disable the logs (did you set 'debug rgw = 0')?

You can try isolating the issue by looking at the radosgw logs (debug
rgw = 2). Look at each put_obj request completion; it'll dump the
total time it took to complete. That'll give a hint as to whether the
problem is on the radosgw <-> rados side or on the apache <-> radosgw
side. There could also be an issue of the client starting a new
connection for every request (rest-bench <-> apache).

Another thing to look at would be the radosgw perf counters, which you
can do by connecting to the radosgw admin socket (ceph --admin-daemon
<path-to-admin-socket> help).

Yehuda



--
erdem agaoglu
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
