Thanks for the response, Yehuda. [ more text below ]

On 2014-08-05 05:33, Yehuda Sadeh wrote:
> On Fri, Aug 1, 2014 at 9:49 AM, Osier Yang <agedosier at gmail.com> wrote:
>> [ correct the URL ]
>>
>>
>> On 2014-08-02 00:42, Osier Yang wrote:
>>> Hi, list,
>>>
>>> I managed to set up radosgw in a testing environment over the last few
>>> days to see if it's stable/mature enough for production use. In the
>>> meanwhile, I tried to read the source code of radosgw to understand how
>>> it actually manages the underlying storage.
>>>
>>> The testing result shows that the write performance to a bucket is not
>>> good. As far as I understood from the code, this is because there is
>>> only *one* bucket index object for a single bucket, which is not nice in
>>> principle. Moreover, requests to the whole bucket could be blocked if
>>> the corresponding bucket index object happens to be in the recovering or
>>> backfilling process. This is not acceptable in production use. Although
>>> I saw Guang Yang did some work (the prototype patches [1]) to try to
>>> resolve the problem with bucket index sharding, I'm not quite confident
>>> it solves the problem at the root: radosgw is still trying to manage
>>> millions or billions of objects in one bucket with the index, so I'm a
>>> bit worried about it even if index sharding is supported.
>>>
>>> Another problem I encountered is: when I upgraded radosgw to the latest
>>> version (Firefly), radosgw-admin works well, read requests work well
>>> too, but all write requests fail. Note that I didn't make any changes to
>>> the config files, which suggests there is a compatibility problem (a
>>> client in the new version fails to talk with a ceph cluster in the old
>>> version).
>>> The error looks like:
>>>
>>> 2014-07-31 10:13:10.045921 7fdb40ddd700 0 ERROR: can't read user header: ret=-95
>>> 2014-07-31 10:13:10.045930 7fdb40ddd700 0 ERROR: sync_user() failed, user=osier ret=-95
>>> 2014-07-31 17:00:56.075066 7fe514fe6780 0 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process radosgw, pid 19974
>>> 2014-07-31 17:00:56.197659 7fe514fe6780 0 framework: fastcgi
>>> 2014-07-31 17:00:56.197666 7fe514fe6780 0 starting handler: fastcgi
>>> 2014-07-31 17:00:56.198941 7fe4f8ff9700 0 ERROR: FCGX_Accept_r returned -9
>>> 2014-07-31 17:00:56.211176 7fe4f9ffb700 0 ERROR: can't read user header: ret=-95
>>> 2014-07-31 17:00:56.211197 7fe4f9ffb700 0 ERROR: sync_user() failed, user=Bob Dylon ret=-95
>>> 2014-07-31 17:00:56.212306 7fe4f9ffb700 0 ERROR: can't read user header: ret=-95
>>> 2014-07-31 17:00:56.212325 7fe4f9ffb700 0 ERROR: sync_user() failed, user=osier ret=-95

> Did you upgrade the osds? Did you restart the osds after upgrade?

No, I didn't upgrade the osds, and didn't restart them. What I did was
simply run a newer-version radosgw against the ceph cluster, which is
still running the old version. So it sounds like using a newer radosgw
requires newer osds?

> >>> With these two experiences, I was starting to wonder whether radosgw
>>> is stable/mature enough yet. It seems that DreamHost is the only one
>>> running radosgw as a public service, though from googling there appear
>>> to be use cases in private environments. I have no way to demonstrate
>>> whether it's stable and mature enough for production use except by
>>> trying to understand how it works; however, I guess everybody knows it
>>> is too hard to go back once a distributed system is already in
>>> production use. So I'm asking here to see if I could get some advice/
>>> thoughts/suggestions from those who already run radosgw in production.
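A side note on the error code above, in case it helps with diagnosis: radosgw logs negated Linux errno values, so `ret=-95` is errno 95, `EOPNOTSUPP` ("Operation not supported") on Linux. That would be consistent with a newer client issuing an OSD operation the older OSDs don't implement, though I can't confirm which operation from these logs alone. A quick way to decode such values:

```python
import errno
import os

# radosgw logs negative errnos: "ret=-95" means errno 95.
# On Linux, errno 95 is EOPNOTSUPP ("Operation not supported").
ret = -95
print(os.strerror(-ret))   # human-readable message for errno 95
print(errno.EOPNOTSUPP)    # 95 on Linux (value differs on other platforms)
```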
>>>
>>> In case the mail is already long/boring enough, I'm summarizing my
>>> questions here:
>>>
>>> 1) Is radosgw stable/mature enough for production use?

> We consider it stable and mature for production use.

>>> 2) How does it behave in performance (especially on writing) in practice?

> Different use cases and patterns have different performance
> characteristics. As you mentioned, objects going to the same bucket
> will contend on the bucket index. In the future we will be able to
> shard that and it will mitigate the problem a bit. Other ideas are to
> drop the bucket index altogether for use cases where object listing is
> not really needed.

Bucket listing is important for us too; disabling it would just make me
crazy. :-)

I did the performance testing yesterday, and I'd like to share the
results here:

1) Testing environment

ceph cluster: 3 nodes, 1 monitor and 2 osds on each node. These 3 nodes
are relatively cheap PCs (I don't even want to mention their CPU and
memory specs here).
ceph version: Emperor
radosgw version: Emperor too (I didn't manage to successfully test with
a newer radosgw).
radosgw instance: 1; VM; memory/4G; CPU/1
Client: VM; memory/1G; CPU/1
Internal network bandwidth: 1G

I executed the testing commands on the "client vm". Since the testing
environment is far worse than a production environment, the absolute
numbers are somewhat meaningless, so I tested with both "rest-bench"
against radosgw and "rados bench" directly; the comparison tells how
much performance is eaten up by radosgw (mainly the single bucket index
object).
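On the index sharding Yehuda mentions above: the idea is to spread a bucket's index entries across several RADOS index objects keyed by a hash of the object name, so concurrent writes no longer serialize on one object. A minimal sketch of that mapping, assuming hypothetical names and an arbitrary shard count (not the actual radosgw implementation):

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count; the real feature makes this configurable


def bucket_index_shard(bucket_id: str, object_name: str) -> str:
    """Map an object name to one of NUM_SHARDS index object names.

    With a single index object (NUM_SHARDS == 1), every write to the
    bucket contends on that one RADOS object; with sharding, index
    updates spread across shards and contend far less.
    """
    h = int(hashlib.md5(object_name.encode()).hexdigest(), 16)
    return f".dir.{bucket_id}.{h % NUM_SHARDS}"


# Two objects usually land on different shard objects, so their
# index updates no longer serialize on a single OSD object.
print(bucket_index_shard("default.1234.1", "photo-0001.jpg"))
print(bucket_index_shard("default.1234.1", "photo-0002.jpg"))
```

The mapping must be deterministic (listing has to merge the same shards every write touched), which is also why recovery/backfill of any one shard still stalls the fraction of writes hashing to it.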
2) radosgw config (only the options which *might* affect performance are listed)

rgw thread pool size = 1000
rgw enable usage log = true
rgw usage log tick interval = 30
rgw usage log flush threshold = 1024
rgw usage max shards = 32
rgw usage max user shards = 1
# Operation logs are disabled
#rgw enable ops log = true
#rgw ops log rados = true
#debug rgw = 20
#debug ms = 1
rgw cache lru size = 100000  # the default is 10000

3) Testing commands:

root at testing-bob:~# rados --cluster=s3test0 -p osier_test bench 20 write -b $size -t 50

root at testing-bob:~# rest-bench --api-host=testing-s3gw0 --access-key=L6K3FF1OOXO4EY1FH9RF --secret="/pYIF3jc3NSkVCWPklSM+BIf7IVr74MSnSvbc4Ac" --protocol=http --uri_style=path --bucket=bob0 --seconds=20 --concurrent-ios=50 --block-size=$size --show-time write

As the commands above show, I'm testing with 50 concurrent threads for
20 seconds with both "rest-bench" and "rados bench".

4) Testing result

NOTE:
* The data sizes I'm using for testing are: 10Bytes; 1KiB; 10KiB;
  100KiB; 500KiB; 1MiB; 5MiB; 20MiB; and 40MiB.
* Pay attention to the "finished" operations instead of "Total writes
  made", and to the total time, since "rest-bench" doesn't always end
  within 20 seconds.
* As the testing result shows: for small data writes, the maximum
  performance lost with radosgw is nearly 80%, compared with writing
  via rados directly.
* As the data size grows, the performance loss decreases; the best
  result is a 50% loss. However, when the data size grows past some
  point (see the 40MiB testing result), we can see clearly that the
  writes are serialized, and the performance loss is huge then.
* In summary, the single bucket index object affects the performance a lot.
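For readers who want to reproduce the loss percentages in the notes above, they follow from each run's effective throughput (finished writes divided by total run time); a small helper, using the 10KiB totals reported below as an example:

```python
def loss_pct(rados_writes: int, rados_secs: float,
             rgw_writes: int, rgw_secs: float) -> float:
    """Percentage of write throughput lost when going through radosgw,
    comparing effective ops/sec of a rados bench run and a rest-bench run."""
    rados_ops = rados_writes / rados_secs
    rgw_ops = rgw_writes / rgw_secs
    return 100.0 * (1.0 - rgw_ops / rados_ops)


# 10KiB case, totals taken from the benchmark output below:
#   rados:   5076 writes in 20.132932 s
#   radosgw:  833 writes in 23.028529 s
loss = loss_pct(5076, 20.132932, 833, 23.028529)
print(f"10KiB write throughput lost behind radosgw: {loss:.1f}%")
```

Note this compares op rates, not bandwidth; for a fixed write size the two ratios are the same.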
[ 10Bytes ]

== rados ==
2014-08-04 22:34:04.151231 min lat: 0.008503 max lat: 1.72593 avg lat: 0.197445
 sec Cur ops   started  finished    avg MB/s     cur MB/s  last lat   avg lat
  20      50      4944      4894  0.00233314  0.000705719  0.250708  0.197445
Total time run:         20.646440
Total writes made:      4945
Write size:             10
Bandwidth (MB/sec):     0.002
Stddev Bandwidth:       0.00130826
Max bandwidth (MB/sec): 0.00367165
Min bandwidth (MB/sec): 0
Average Latency:        0.208751
Stddev Latency:         0.223234
Max latency:            1.72593
Min latency:            0.008503

== radosgw ==
2014-08-04 23:51:41.327835 min lat: 0.055295 max lat: 3.28076 avg lat: 1.20371
2014-08-04 23:51:41.327835  sec Cur ops   started  finished     avg MB/s     cur MB/s  last lat  avg lat
2014-08-04 23:51:41.327835   20      50       849       799  0.000380758  0.000324249   1.51293  1.20371
2014-08-04 23:51:42.328136   21      50       850       800  0.000363086  9.53674e-06   3.06026  1.20603
2014-08-04 23:51:43.328345   22      50       850       800  0.000346588            0         -  1.20603
2014-08-04 23:51:44.328556   23      50       850       800  0.000331524            0         -  1.20603
2014-08-04 23:51:45.328769   24      50       850       800  0.000317716            0         -  1.20603
2014-08-04 23:51:46.328989   25      50       850       800  0.000305011            0         -  1.20603
2014-08-04 23:51:47.329214   26      49       850       801   0.00029365  1.90735e-06   6.33663  1.21244
2014-08-04 23:51:48.329488
Total time run:         26.759887
Total writes made:      850
Write size:             10
Bandwidth (MB/sec):     0.000
Stddev Bandwidth:       0.000185797
Max bandwidth (MB/sec): 0.000543594
Min bandwidth (MB/sec): 0
Average Latency:        1.56037
Stddev Latency:         1.54032
Max latency:            9.433
Min latency:            0.055295

[ 1KiB ]

== rados ==
2014-08-04 22:38:12.770177 min lat: 0.006787 max lat: 2.03145 avg lat: 0.191323
 sec Cur ops   started  finished  avg MB/s   cur MB/s  last lat   avg lat
  20      50      5196      5146  0.251217  0.0810547  0.951518  0.191323
Total time run:         20.694827
Total writes made:      5197
Write size:             1024
Bandwidth (MB/sec):     0.245
Stddev Bandwidth:       0.209637
Max bandwidth (MB/sec): 0.999023
Min bandwidth (MB/sec): 0
Average Latency:        0.199098
Stddev Latency:         0.263302
Max latency:            2.03145
Min latency:            0.006787

== radosgw ==
2014-08-04 22:39:16.448663 min lat: 0.058305 max lat: 5.84678 avg lat: 1.55047
2014-08-04 22:39:16.448663  sec Cur ops   started  finished   avg MB/s    cur MB/s  last lat  avg lat
2014-08-04 22:39:16.448663   20      50       666       616  0.0300611   0.0449219  0.611505  1.55047
2014-08-04 22:39:17.448976   21      45       667       622  0.0289088  0.00585938   1.15587  1.54779
2014-08-04 22:39:18.449173   22      45       667       622  0.0275952           0         -  1.54779
2014-08-04 22:39:19.449449
Total time run:         22.759544
Total writes made:      667
Write size:             1024
Bandwidth (MB/sec):     0.029
Stddev Bandwidth:       0.0214045
Max bandwidth (MB/sec): 0.0722656
Min bandwidth (MB/sec): 0
Average Latency:        1.69066
Stddev Latency:         1.11007
Max latency:            5.89554
Min latency:            0.058305

[ 10KiB ]

== rados ==
2014-08-04 22:41:01.756019 min lat: 0.003658 max lat: 1.33439 avg lat: 0.1979
 sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat  avg lat
  20      50      5075      5025   2.45309   2.94922   0.13717   0.1979
Total time run:         20.132932
Total writes made:      5076
Write size:             10240
Bandwidth (MB/sec):     2.462
Stddev Bandwidth:       1.57143
Max bandwidth (MB/sec): 6.03516
Min bandwidth (MB/sec): 0
Average Latency:        0.198299
Stddev Latency:         0.206342
Max latency:            1.33439
Min latency:            0.003658

== radosgw ==
2014-08-04 23:59:08.103196 min lat: 0.081382 max lat: 4.20567 avg lat: 1.21503
2014-08-04 23:59:08.103196  sec Cur ops   started  finished  avg MB/s   cur MB/s  last lat  avg lat
2014-08-04 23:59:08.103196   20      50       832       782  0.381632   0.273438   1.16601  1.21503
2014-08-04 23:59:09.103480   21      47       833       786  0.365322  0.0390625   3.06067  1.21989
2014-08-04 23:59:10.103669   22      45       833       788   0.34961  0.0195312   1.74456  1.22298
2014-08-04 23:59:11.103882   23      45       833       788  0.334413          0         -  1.22298
2014-08-04 23:59:12.104123
Total time run:         23.028529
Total writes made:      833
Write size:             10240
Bandwidth (MB/sec):     0.353
Stddev Bandwidth:       0.189902
Max bandwidth (MB/sec): 0.546875
Min bandwidth (MB/sec): 0
Average Latency:        1.36834
Stddev Latency:         0.93226
Max latency:            5.58653
Min latency:            0.081382

[ 100KiB ]

== rados ==
2014-08-04 22:43:12.878546 min lat: 0.00586 max lat: 1.92724 avg lat: 0.215224
 sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
  20      50      4600      4550    22.212   19.3359  0.111257  0.215224
Total time run:         20.668664
Total writes made:      4601
Write size:             102400
Bandwidth (MB/sec):     21.739
Stddev Bandwidth:       13.2295
Max bandwidth (MB/sec): 60.0586
Min bandwidth (MB/sec): 0
Average Latency:        0.224462
Stddev Latency:         0.226179
Max latency:            1.92724
Min latency:            0.00586

== radosgw ==
2014-08-04 23:54:52.136557 min lat: 0.121387 max lat: 5.76303 avg lat: 1.35267
2014-08-04 23:54:52.136557  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat  avg lat
2014-08-04 23:54:52.136557   20      50       699       649   3.16721  0.341797   2.90547  1.35267
2014-08-04 23:54:53.136860   21      44       700       656   3.04896  0.683594    3.0446  1.37228
2014-08-04 23:54:54.137117
Total time run:         21.477508
Total writes made:      700
Write size:             102400
Bandwidth (MB/sec):     3.183
Stddev Bandwidth:       2.13648
Max bandwidth (MB/sec): 7.03125
Min bandwidth (MB/sec): 0
Average Latency:        1.5243
Stddev Latency:         1.20656
Max latency:            5.76303
Min latency:            0.121387

[ 500KiB ]

== rados ==
2014-08-04 22:45:55.736963 min lat: 0.028845 max lat: 3.00344 avg lat: 0.386006
 sec Cur ops   started  finished  avg MB/s   cur MB/s  last lat   avg lat
  20      50      1816      1766   43.1048          0         -  0.386006
  21      50      1816      1766   41.0521          0         -  0.386006
  22      50      1816      1766   39.1861          0         -  0.386006
  23      50      1816      1766   37.4825          0         -  0.386006
  24      50      1817      1767   35.9411  0.0697545   11.6738  0.392395
Total time run:         24.548976
Total writes made:      1817
Write size:             512000
Bandwidth (MB/sec):     36.140
Stddev Bandwidth:       34.3547
Max bandwidth (MB/sec): 80.0781
Min bandwidth (MB/sec): 0
Average Latency:        0.675502
Stddev Latency:         1.76753
Max latency:            12.673
Min latency:            0.028845

== radosgw ==
2014-08-04 22:46:51.406535 min lat: 0.199692 max lat: 6.7487 avg lat: 1.95867
2014-08-04 22:46:51.406535  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat  avg lat
2014-08-04 22:46:51.406535   20      50       490       440   10.7355   6.83594   4.42465  1.95867
2014-08-04 22:46:52.406851   21      50       491       441   10.2476  0.488281   4.87118  1.96528
2014-08-04 22:46:53.407047   22      50       491       441   9.78203         0         -  1.96528
2014-08-04 22:46:54.407243   23      50       491       441   9.35689         0         -  1.96528
2014-08-04 22:46:55.407438   24      50       491       441   8.96716         0         -  1.96528
2014-08-04 22:46:56.407725
Total time run:         24.562034
Total writes made:      491
Write size:             512000
Bandwidth (MB/sec):     9.761
Stddev Bandwidth:       7.0541
Max bandwidth (MB/sec): 23.9258
Min bandwidth (MB/sec): 0
Average Latency:        2.48753
Stddev Latency:         2.01542
Max latency:            10.482
Min latency:            0.199692

[ 1MiB ]

== rados ==
2014-08-04 22:48:05.348669 min lat: 0.059332 max lat: 5.30887 avg lat: 0.895115
 sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
  20      50      1099      1049   51.2076   11.7188   3.94854  0.895115
Total time run:         20.426123
Total writes made:      1100
Write size:             1024000
Bandwidth (MB/sec):     52.590
Stddev Bandwidth:       32.4589
Max bandwidth (MB/sec): 87.8906
Min bandwidth (MB/sec): 0
Average Latency:        0.927253
Stddev Latency:         0.945177
Max latency:            5.30887
Min latency:            0.059332

== radosgw ==
2014-08-05 00:01:57.779506 min lat: 0.291824 max lat: 11.4166 avg lat: 3.84634
2014-08-05 00:01:57.779506  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat  avg lat
2014-08-05 00:01:57.779506   20      50       266       216   10.5389         0         -  3.84634
2014-08-05 00:01:58.779846   21      50       266       216   10.0373         0         -  3.84634
2014-08-05 00:01:59.780046   22      49       267       218   9.66998  0.651042   5.49671  3.86582
2014-08-05 00:02:00.780275   23      49       267       218   9.24974         0         -  3.86582
2014-08-05 00:02:01.780540
Total time run:         23.846098
Total writes made:      267
Write size:             1024000
Bandwidth (MB/sec):     10.934
Stddev Bandwidth:       8.56096
Max bandwidth (MB/sec): 35.1562
Min bandwidth (MB/sec): 0
Average Latency:        4.44906
Stddev Latency:         2.73214
Max latency:            11.4166
Min latency:            0.291824

[ 5MiB ]

== rados ==
2014-08-04 22:50:20.362025 min lat: 1.95706 max lat: 8.57053 avg lat: 3.77608
 sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat  avg lat
  20      49       290       241   58.8244   107.422   2.18729  3.77608
  21      17       291       274   63.6944   161.133   1.15035  3.50209
  22      17       291       274   60.7992         0         -  3.50209
  23      15       291       276   58.5804   4.88281   3.83261  3.50451
Total time run:         23.359546
Total writes made:      291
Write size:             5120000
Bandwidth (MB/sec):     60.827
Stddev Bandwidth:       52.3299
Max bandwidth (MB/sec): 161.133
Min bandwidth (MB/sec): 0
Average Latency:        3.51905
Stddev Latency:         1.97284
Max latency:            8.57053
Min latency:            0.995771

== radosgw ==
2014-08-05 00:03:36.219328 min lat: 4.10734 max lat: 16.7196 avg lat: 8.66002
2014-08-05 00:03:36.219328  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat  avg lat
2014-08-05 00:03:36.219328   20      50       138        88   21.4675   9.76562   7.59011  8.66002
2014-08-05 00:03:37.219679   21      50       138        88   20.4456         0         -  8.66002
2014-08-05 00:03:38.219884   22      50       139        89   19.7386   2.44141   12.5832  8.70411
2014-08-05 00:03:39.220072   23      50       139        89   18.8808         0         -  8.70411
2014-08-05 00:03:40.220275   24      50       139        89   18.0945         0         -  8.70411
2014-08-05 00:03:41.220484   25      50       139        89   17.3711         0         -  8.70411
2014-08-05 00:03:42.220688   26      44       139        95   17.8293   7.32422   8.48089  8.92694
2014-08-05 00:03:43.220929
Total time run:         26.178957
Total writes made:      139
Write size:             5120000
Bandwidth (MB/sec):     25.926
Stddev Bandwidth:       18.0919
Max bandwidth (MB/sec): 68.3594
Min bandwidth (MB/sec): 0
Average Latency:        9.39576
Stddev Latency:         3.19278
Max latency:            18.8883
Min latency:            4.10734

[ 20MiB ]

== rados ==
2014-08-04 22:52:59.584809 min lat: 9999 max lat: 0 avg lat: 0
 sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat  avg lat
  20      34        34         0         0         0         -        0
  21      36        36         0         0         0         -        0
  22      38        38         0         0         0         -        0
  23      39        39         0         0         0         -        0
  24      41        41         0         0         0         -        0
  25      42        42         0         0         0         -        0
  26      42        42         0         0         0         -        0
  27      43        43         0         0         0         -        0
  28      46        46         0         0         0         -        0
  29      48        48         0         0         0         -        0
  30      41        50         9   5.81711   5.85938   28.2801  29.4592
  31       4        50        46   28.7793   722.656   3.37337  18.5771
  32       1        50        49   29.7045   58.5938   3.48328  17.6497
Total time run:         32.246896
Total writes made:      50
Write size:             20480000
Bandwidth (MB/sec):     30.284
Stddev Bandwidth:       125.863
Max bandwidth (MB/sec): 722.656
Min bandwidth (MB/sec): 0
Average Latency:        17.3433
Stddev Latency:         9.29014
Max latency:            30.1162
Min latency:            2.33119

== radosgw ==
2014-08-04 22:54:03.379562 min lat: 13.3435 max lat: 19.4876 avg lat: 17.8951
2014-08-04 22:54:03.379562  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat  avg lat
2014-08-04 22:54:03.379562   20      49        64        15   14.5565   136.719   19.4876  17.8951
2014-08-04 22:54:04.379954   21      50        65        15   13.8672         0         -  17.8951
2014-08-04 22:54:05.380137   22      50        65        15   13.2404         0         -  17.8951
2014-08-04 22:54:06.380336   23      50        65        15   12.6677         0         -  17.8951
2014-08-04 22:54:07.380551   24      50        65        15   12.1426         0         -  17.8951
2014-08-04 22:54:08.380742   25      50        65        15   11.6593         0         -  17.8951
2014-08-04 22:54:09.380915   26      50        65        15   11.2129         0         -  17.8951
2014-08-04 22:54:10.381107   27      50        65        15   10.7995         0         -  17.8951
2014-08-04 22:54:11.381314   28      50        65        15   10.4155         0         -  17.8951
2014-08-04 22:54:12.381502   29      50        65        15   10.0579         0         -  17.8951
2014-08-04 22:54:13.381701   30      48        65        17   11.0205   3.90625   29.8248  19.2985
2014-08-04 22:54:14.381900   31      47        65        18   11.2938   19.5312   16.1941  19.1261
2014-08-04 22:54:15.382104   32      47        65        18   10.9422         0         -  19.1261
2014-08-04 22:54:16.538449
Total time run:         32.852659
Total writes made:      65
Write size:             20480000
Bandwidth (MB/sec):     38.643
Stddev Bandwidth:       24.8996
Max bandwidth (MB/sec): 136.719
Min bandwidth (MB/sec): 0
Average Latency:        24.9568
Stddev Latency:         8.35865
Max latency:            32.807
Min latency:            13.2119

[ 40MiB ]

== rados ==
2014-08-04 22:56:17.539373 min lat: 9999 max lat: 0 avg lat: 0
 sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat  avg lat
  40      49        49         0         0         0         -        0
  41      49        49         0         0         0         -        0
  42       1        50        49   45.5638   45.5729   3.07695  21.9831
Total time run:         42.303319
Total writes made:      50
Write size:             40960000
Bandwidth (MB/sec):     46.170
Stddev Bandwidth:       6.9498
Max bandwidth (MB/sec): 45.5729
Min bandwidth (MB/sec): 0
Average Latency:        21.6088
Stddev Latency:         12.1152
Max latency:            41.2854
Min latency:            3.07695

== radosgw ==
2014-08-04 23:04:17.740359 min lat: 9999 max lat: 0 avg lat: 0
2014-08-04 23:04:17.740359  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat  avg lat
2014-08-04 23:04:17.740359   40      50        50         0         0         0         -        0
2014-08-04 23:04:18.740650   41      50        50         0         0         0         -        0
2014-08-04 23:04:19.740852   42      50        50         0         0         0         -        0
2014-08-04 23:04:20.741063   43      50        50         0         0         0         -        0
2014-08-04 23:04:21.741239   44      50        50         0         0         0         -        0
2014-08-04 23:04:22.742059   45      49        51         2   1.73223   1.73611   44.3911  44.3332
2014-08-04 23:04:23.742235   46      49        51         2   1.69465         0         -  44.3332
2014-08-04 23:04:24.742429   47      49        51         2   1.65866         0         -  44.3332
2014-08-04 23:04:25.742675
Total time run:         47.742303
Total writes made:      51
Write size:             40960000
Bandwidth (MB/sec):     41.728
Stddev Bandwidth:       0.250586
Max bandwidth (MB/sec): 1.73611
Min bandwidth (MB/sec): 0
Average Latency:        46.4062
Stddev Latency:         6.17721
Max latency:            47.7395
Min latency:            3.36783

Regards,
Osier