Re: radosgw performance

On Sun, Feb 17, 2013 at 12:53 PM, Erdem Agaoglu <erdem.agaoglu@xxxxxxxxx> wrote:
> Hi all,
>
> We have just deployed our cluster and wanted to get started immediately by
> loading our current files but hit some rocks on the way.
>
> We are trying to upload millions of 10-20kB files, but we have not been
> able to get past 40-50 PUTs/s. Googling through the archives, I found the
> cause might be the default pg_num of 8 for .rgw.buckets. I confirmed it
> using 'rados bench': while the data pool with 64 pgs could push 1000
> writes/s, .rgw.buckets managed only 50 writes/s. Assured that was the
> problem, I deleted the .rgw pools to recreate them with larger pg_nums.
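>
> Roughly along these lines (the pg count here is illustrative, and the
> exact delete-confirmation syntax varies by Ceph version):
>
> # ceph osd pool delete .rgw.buckets .rgw.buckets --yes-i-really-really-mean-it
> # ceph osd pool create .rgw.buckets 64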
>
> Now I am able to push .rgw.buckets to XXX writes/s using 'rados bench'.
>
> # rados -p .rgw.buckets bench 20 write -t 20 -b 20480
>  Maintaining 20 concurrent writes of 20480 bytes for at least 20 seconds.
>  Object prefix: benchmark_data_ceph-10_30068
>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>      0       0         0         0         0         0         -         0
>      1      20      2817      2797   54.6103   54.6289  0.003993 0.00708019
>      2      20      4626      4606   44.9699    35.332  0.005707 0.00857852
>      3      20      5247      5227   34.0229   12.1289  0.004025 0.0112324
>      4      20      6234      6214    30.336   19.2773  0.004463 0.0127949
>      5      20      7611      7591   29.6468   26.8945  0.161584  0.012928
>      6      20      8669      8649   28.1491   20.6641  0.006752 0.0138092
>      7      20      9758      9738    27.166   21.2695  0.002992 0.0143627
>      8      20     10672     10652   26.0014   17.8516  0.003206 0.0148701
>      9      20     11607     11587   25.1411   18.2617  0.010047 0.0155073
>     10      20     12593     12573   24.5526   19.2578  0.011297 0.0157349
>     11      20     13732     13712   24.3426   22.2461  0.002604 0.0160289
>     12      20     14707     14687   23.9005    19.043  0.003153 0.0163188
>     13      20     15764     15744   23.6498   20.6445  0.018784 0.0164889
>     14      20     16570     16550   23.0848   15.7422   0.00304 0.0168921
>     15      20     17397     17377   22.6224   16.1523  0.003808 0.0171995
>     16      20     18288     18268    22.296   17.4023  0.002723 0.0175055
>     17      20     19357     19337   22.2124   20.8789  0.003635  0.017552
>     18      20     20252     20232   21.9493   17.4805  0.003274 0.0177607
>     19      20     21392     21372   21.9657   22.2656  0.003191 0.0177641
> 2013-02-17 12:48:20.025013 min lat: 0.002303 max lat: 0.395696 avg lat: 0.0176627
>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>     20      20     22607     22587   22.0537   23.7305  0.005424 0.0176627
>  Total time run:         20.131108
> Total writes made:      22608
> Write size:             20480
> Bandwidth (MB/sec):     21.934
>
> Stddev Bandwidth:       10.0135
> Max bandwidth (MB/sec): 54.6289
> Min bandwidth (MB/sec): 0
> Average Latency:        0.0177993
> Stddev Latency:         0.0296493
> Max latency:            0.395696
> Min latency:            0.002303
>
>
> But using rest-bench, the change didn't make much difference:
>
> # rest-bench \
>> --api-host=myhost.com \
>> --access-key=AAA \
>> --secret=SSS \
>> --protocol=http \
>> --uri_style=path \
>> --bucket=mybucket \
>> --seconds=20 \
>> --concurrent-ios=20 \
>> --block-size=20480 \
>> write
> host=myhost.com
>  Maintaining 20 concurrent writes of 20480 bytes for at least 20 seconds.
>  Object prefix: benchmark_data_ceph-10_30174
>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>      0       3         3         0         0         0         -         0
>      1      20        86        66   1.28862   1.28906  0.327494  0.248925
>      2      20       146       126   1.23016   1.17188  0.331882  0.289053
>      3      20       206       186   1.21068   1.17188  0.303186  0.300404
>      4      20       266       246   1.20093   1.17188  0.327556  0.311229
>      5      20       324       304   1.18727   1.13281  0.279916  0.315768
>      6      20       386       366   1.19118   1.21094  0.324231   0.31818
>      7      20       443       423   1.18003   1.11328  0.312167  0.321635
>      8      20       503       483   1.17898   1.17188  0.347861  0.324332
>      9      20       561       541   1.17381   1.13281   0.29931  0.327285
>     10      20       622       602   1.17555   1.19141  0.299793  0.326244
>     11      20       677       657   1.16632   1.07422  0.280473  0.328129
>     12      20       735       715   1.16352   1.13281  0.311044  0.330388
>     13      20       793       773   1.16114   1.13281  0.324021  0.330745
>     14      20       855       835   1.16469   1.21094  0.299689  0.331978
>     15      20       913       893   1.16255   1.13281  0.287512  0.331909
>     16      20       974       954   1.16434   1.19141  0.279736  0.331314
>     17      20      1027      1007   1.15674   1.03516  0.374434  0.333145
>     18      20      1076      1056   1.14563  0.957031  0.328377  0.337489
>     19      20      1130      1110   1.14084   1.05469  0.376122  0.338493
> 2013-02-17 12:50:31.520161 min lat: 0.031979 max lat: 1.12062 avg lat: 0.340584
>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>     20      20      1174      1154   1.12676  0.859375  0.343657  0.340584
>  Total time run:         20.473862
> Total writes made:      1175
> Write size:             20480
> Bandwidth (MB/sec):     1.121
>
> Stddev Bandwidth:       0.263018
> Max bandwidth (MB/sec): 1.28906
> Min bandwidth (MB/sec): 0
> Average Latency:        0.347583
> Stddev Latency:         0.128529
> Max latency:            1.60713
> Min latency:            0.031979
>
>
> I tried disabling the rgw logs and the apache logs and increasing the rgw
> thread pool size, with no luck. Is there something I am missing?
>

What version are you running? Are the ops logs disabled? How did you
disable the logs (did you set 'debug rgw = 0')?
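
For reference, the relevant settings would look something like this in
ceph.conf (the section name is just an example; use whatever your
gateway runs as):

[client.radosgw.gateway]
    # turn off rgw debug logging
    debug rgw = 0
    # disable the per-request ops log
    rgw enable ops log = false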

You can try isolating the issue by looking at the radosgw logs (debug
rgw = 2). Look at each put_obj request completion; it will dump the
total time the request took to complete. That will give a hint whether
the problem is on the radosgw<->rados side or on the apache<->radosgw
side. There could also be an issue of the client starting a new
connection for every request (rest-bench <-> apache).
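
For example (the socket and log paths below are assumptions for a
fairly default setup; adjust them to yours):

  # one way, restart required: set 'debug rgw = 2' in ceph.conf under the
  # gateway section; or at runtime, if your build supports 'config set':
  ceph --admin-daemon /var/run/ceph/ceph-client.radosgw.gateway.asok config set debug_rgw 2
  # then watch the put_obj completions and their timings in the gateway log
  tail -f /var/log/ceph/radosgw.log | grep put_obj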

Another thing to look at would be the radosgw perf counters, which you
can inspect by connecting to the radosgw admin socket (ceph
--admin-daemon <path-to-admin-socket> help).
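
Something like this (again, the socket path is an assumption):

  # list the commands the socket supports
  ceph --admin-daemon /var/run/ceph/ceph-client.radosgw.gateway.asok help
  # dump the current perf counters
  ceph --admin-daemon /var/run/ceph/ceph-client.radosgw.gateway.asok perf dump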

Yehuda
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

