Right. I misread the log. Note the following request: 2.547155 7f036ffc7700 1 -- 10.102.4.11:0/1009401 --> 10.102.4.14:6809/7428 -- osd_op(client.78418684.0:27514711 .bucket.meta.<CNT-UUID-FINDME>:default.78418684.122043 [call version.read,getxattrs,stat] 5.3b7d1197 ack+read e16034) v4 -- ?+0 0x7f026802e2c0 con 0x7f040c055ca0 ... 7.619750 7f041ddf4700 1 -- 10.102.4.11:0/1009401 <== osd.32 10.102.4.14:6809/7428 208252 ==== osd_op_reply(27514711 .bucket.meta.<CNT-UUID-FINDME>:default.78418684.122043 [call,getxattrs,stat] v0'0 uv6371 ondisk = 0) v6 ==== 338+0+336 (3685145659 0 4232894755) 0x7f00e430f540 con 0x7f040c055ca0 It took more than 5 seconds to complete. Understanding why is really the key here. You can try looking at the osds (turn on debug ms = 1 there too), and see if it was actually taking that long, or was just waiting in the messenger queues due to throttling. If that's not the case then you'll have to dig deeper into the osd. Yehuda On Tue, Oct 7, 2014 at 10:19 PM, Daniel Schneller <daniel.schneller@xxxxxxxxxxxxxxxx> wrote: > Hi! > > By looking at these logs it seems that there are only 8 pgs on the > .rgw pool, if this is correct then you may want to change that > considering your workload. > > > Thanks. See out pg_num configuration below. We had already suspected > that the 1600 that we had previously (48 OSDs * 100 / triple redundancy) > were not ideal, so we increased the .rgw.buckets pool to 2048. > > The number of objects and their size was in an earlier email, but for > completeness I will put them up once again. > > Any other ideas where to look? > > ============== > for i in $(rados df | awk '{ print $1 }' | grep '^\.'); do > echo $i; echo -n " - “; > ceph osd pool get $i pg_num; > echo -n " - “; > ceph osd pool get $i pgp_num; > done > > .intent-log > - pg_num: 1600 > - pgp_num: 1600 > .log > - pg_num: 1600 > - pgp_num: 1600 > .rgw > - pg_num: 1600 > - pgp_num: 1600 > .rgw.buckets > - pg_num: 2048 > - pgp_num: 2048 > .rgw.buckets.index > - pg_num: 1600 > - pgp_num: 1600 > .rgw.control > - pg_num: 1600 > - pgp_num: 1600 > .rgw.gc > - pg_num: 1600 > - pgp_num: 1600 > .rgw.root > - pg_num: 100 > - pgp_num: 100 > .usage > - pg_num: 1600 > - pgp_num: 1600 > .users > - pg_num: 1600 > - pgp_num: 1600 > .users.email > - pg_num: 1600 > - pgp_num: 1600 > .users.swift > - pg_num: 1600 > - pgp_num: 1600 > .users.uid > - pg_num: 1600 > - pgp_num: 1600 > =============== > > > .rgw > > ========================= > > KB: 1,966,932 > > objects: 9,094,552 > > rd: 195,747,645 > > rd KB: 153,585,472 > > wr: 30,191,844 > > wr KB: 10,751,065 > > > .rgw.buckets > > ========================= > > KB: 2,038,313,855 > > objects: 22,088,103 > > rd: 5,455,123 > > rd KB: 408,416,317 > > wr: 149,377,728 > > wr KB: 1,882,517,472 > > > .rgw.buckets.index > > ========================= > > KB: 0 > > objects: 5,374,376 > > rd: 267,996,778 > > rd KB: 262,626,106 > > wr: 107,142,891 > > wr KB: 0 > > > .rgw.control > > ========================= > > KB: 0 > > objects: 8 > > rd: 0 > > rd KB: 0 > > wr: 0 > > wr KB: 0 > > > .rgw.gc > > ========================= > > KB: 0 > > objects: 32 > > rd: 5,554,407 > > rd KB: 5,713,942 > > wr: 8,355,934 > > wr KB: 0 > > > .rgw.root > > ========================= > > KB: 1 > > objects: 3 > > rd: 524 > > rd KB: 346 > > wr: 3 > wr KB: 3 > > > > Daniel > > On 08 Oct 2014, at 01:03, Yehuda Sadeh <yehuda@xxxxxxxxxx> wrote: > > This operation stalled quite a bit, seems that it was waiting for the osd: > > 2.547155 7f036ffc7700 1 -- 10.102.4.11:0/1009401 --> > 
10.102.4.14:6809/7428 -- osd_op(client.78418684.0:27514711 > .bucket.meta.<CNT-UUID-FINDME>:default.78418684.122043 [call > version.read,getxattrs,stat] 5.3b7d1197 ack+read e16034) v4 -- ?+0 > 0x7f026802e2c0 con 0x7f040c055ca0 > ... > 7.619750 7f041ddf4700 1 -- 10.102.4.11:0/1009401 <== osd.32 > 10.102.4.14:6809/7428 208252 ==== osd_op_reply(27514711 > .bucket.meta.<CNT-UUID-FINDME>:default.78418684.122043 > [call,getxattrs,stat] v0'0 uv6371 ondisk = 0) v6 ==== 338+0+336 > (3685145659 0 4232894755) 0x7f00e430f540 con 0x7f040c055ca0 > > By looking at these logs it seems that there are only 8 pgs on the > .rgw pool, if this is correct then you may want to change that > considering your workload. > > Yehuda > > > On Tue, Oct 7, 2014 at 3:46 PM, Daniel Schneller > <daniel.schneller@xxxxxxxxxxxxxxxx> wrote: > > Hi! > > Sorry, I must have missed the enabling of that debug module. > However, the test setup has been the same all the time - > I only have the one test-application :) > > But maybe I phrased it a bit ambiguously when I wrote > > It then continuously created containers - both empty > and such with 10 objects of around 100k random data in them. > > > 100 kilobytes is the size of a single object, of which we create 10 > per container. The container gets created first, without any > objects, naturally, then 10 objects are added. One of these objects > is called “version”, the rest have generated names with a fixed > prefix and appended 1-9. The version object is the one I picked > for the example logs I sent earlier. > > I hope this makes the setup clearer. > > Attached you will find the (now more extensive) logs for the outliers > again. As you did not say that I garbled the logs, I assume the > pre-processing was OK, so I have prepared the new data in a similar > fashion, marking the relevant request with <CNT-UUID-FINDME>. > > I have not removed any lines in between the beginning of the > “interesting” request and its completion to keep all the network > traffic log intact. Due to the increased verbosity, I will not post > the logs inline, but only attach them gzipped. > > As before, should the full data set be needed, I can provide > an archived version. > > > > > Thanks for your support! > Daniel > > > > > On 07 Oct 2014, at 22:45, Yehuda Sadeh <yehuda@xxxxxxxxxx> wrote: > > The logs here don't include the messenger (debug ms = 1). It's hard to > tell what going on from looking at the outliers. Also, in your > previous mail you described a different benchmark, you tested writing > large number of objects into a single bucket, whereas in this test > you're testing multiple bucket creations, which have a completely > different characteristics. > > > On Tue, Oct 7, 2014 at 1:26 PM, Daniel Schneller > <daniel.schneller@xxxxxxxxxxxxxxxx> wrote: > > Hi! > > I have re-run our test as follows: > > * 4 Rados Gateways, on 4 baremetal machines which have > a total of 48 spinning rust OSDs. > > * Benchmark run on a virtual machine talking to HAProxy > which balances the requests across the 4 Rados GWs. > > * Three instances of the benchmark run in parallel. Each > instance creates 1000 containers, puts 11 objects into > each container. Once they have all been created, each > instances deletes its own containers again. > > I configured one of the radosgws with the debug levels > you requested. The tests produced quite an amount of data > (approx. 1GB of text), so I took the liberty to > pre-process that a bit. 
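Pre-processing of this kind -- pairing every osd_op with its osd_op_reply by
transaction id to find the slow ones -- can be scripted directly against the
"debug ms = 1" output. The following is only a sketch, not the exact commands
used here: the log file name and threshold are placeholders, and it assumes the
reduced format shown in these excerpts, where the first field is a bare seconds
timestamp.

==============================

#!/bin/bash
# Pair osd_op requests with their osd_op_reply lines in a "debug ms = 1"
# client log and report any operation slower than THRESHOLD seconds.
# Assumes the first field is already a plain seconds timestamp, as in the
# pre-processed excerpts in this thread; with reduced timestamps a pair that
# crosses a minute boundary will show a bogus delta.
LOG=radosgw.debug.log      # placeholder file name
THRESHOLD=1.0

awk -v thr="$THRESHOLD" '
  / --> / && /osd_op\(client\./ {
      # tid is the number after the last colon in the client.X.0:TID token
      if (match($0, /osd_op\(client\.[0-9]+\.[0-9]+:[0-9]+/)) {
          n = split(substr($0, RSTART, RLENGTH), a, ":")
          start[a[n]] = $1
          req[a[n]]   = $0
      }
  }
  / <== / && /osd_op_reply\(/ {
      if (match($0, /osd_op_reply\([0-9]+/)) {
          tid = substr($0, RSTART + 13, RLENGTH - 13)
          if (tid in start) {
              dt = $1 - start[tid]
              if (dt >= thr)
                  printf "tid %s took %.3fs\n  %s\n", tid, dt, req[tid]
              delete start[tid]; delete req[tid]
          }
      }
  }
' "$LOG"

==============================

The 5-second outlier quoted at the top of this thread (request at 2.547155,
reply at 7.619750) is exactly the kind of pair this flags; pointed at an OSD
log with "debug ms = 1" enabled as well, the same pairing shows whether the
time is spent on the OSD itself or waiting in the messenger queues.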
> > In this run we landed at around 1.2s per container > created (including the objects in them) on average. > This was slightly better than the 1.6s we witnessed > before, but that test ran for much longer, so there might > have been some queue-up effect. > > Interestingly enough the average is actually somewhat > misleading. The logs below show the creation of one > object in a container each, one being the fastest of this > benchmark (at least on the debug-enabled radosgw), one > being the slowest. > > The fastest one was completed in 0.013s, the longest one > took 4.93s(!). > > The attached logs are cleaned up so that they each show > just a single request and replaced longish, but constant > information with placeholders. Our container names are > of the form “stresstest-xxxxxxxxxxx” which I shortened > to “<CNT-UUID>” for brevity. Also, I removed the redundant > prefix (date, hour, minute of day). > > The column before the log level looked like a thread-id. > As I focused on a single request, I removed all the lines > that did not match the same id, replacing the actual value > with “<ID>”. That makes the logs much easier to read and > understand. > > Just in case I might have removed too much information > for the logs to be useful, the complete log is available > in BZIP2 compressed form for download. Just let me know > if you need it, then I will provide a link via direct email. > > To me it seems like there might indeed be a contention > issue. It would be interesting to know, if this is correct > and if there are any settings that we could adjust to > alleviate the issue. > > Daniel > > ============================== > > ➜ ~ cat rados_shortest.txt > 21.431185 <ID> 20 QUERY_STRING=page=swift¶ms=/v1/<CNT-UUID>/version > 21.431187 <ID> 20 REMOTE_ADDR=10.102.8.140 > 21.431188 <ID> 20 REMOTE_PORT=44007 > 21.431189 <ID> 20 REQUEST_METHOD=PUT > 21.431190 <ID> 20 REQUEST_SCHEME=https > 21.431191 <ID> 20 REQUEST_URI=/swift/v1/<CNT-UUID>/version > 21.431192 <ID> 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi > 21.431193 <ID> 20 SCRIPT_NAME=/swift/v1/<CNT-UUID>/version > 21.431194 <ID> 20 > SCRIPT_URI=https://localhost:8405/swift/v1/<CNT-UUID>/version > 21.431195 <ID> 20 SCRIPT_URL=/swift/v1/<CNT-UUID>/version > 21.431196 <ID> 20 SERVER_ADDR=10.102.9.11 > 21.431197 <ID> 20 SERVER_ADMIN=[no address given] > 21.431198 <ID> 20 SERVER_NAME=localhost > 21.431199 <ID> 20 SERVER_PORT=8405 > 21.431200 <ID> 20 SERVER_PORT_SECURE=443 > 21.431201 <ID> 20 SERVER_PROTOCOL=HTTP/1.1 > 21.431202 <ID> 20 SERVER_SIGNATURE= > 21.431203 <ID> 20 SERVER_SOFTWARE=Apache/2.4.7 (Ubuntu) > 21.431205 <ID> 1 ====== starting new request req=0x7f038c019e90 ===== > 21.431219 <ID> 2 req 980641:0.000015::PUT > /swift/v1/<CNT-UUID>/version::initializing > 21.431259 <ID> 10 ver=v1 first=<CNT-UUID> req=version > 21.431265 <ID> 10 s->object=version s->bucket=<CNT-UUID> > 21.431269 <ID> 2 req 980641:0.000065:swift:PUT > /swift/v1/<CNT-UUID>/version::getting op > 21.431274 <ID> 2 req 980641:0.000070:swift:PUT > /swift/v1/<CNT-UUID>/version:put_obj:authorizing > 21.431321 <ID> 20 get_obj_state: rctx=0x7f030800b720 > obj=.users.swift:documentstore:swift state=0x7f03080f31e8 s->prefetch_data=0 > 21.431332 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit > 21.431338 <ID> 20 get_obj_state: s->obj_tag was set empty > 21.431344 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit > 21.431369 <ID> 20 get_obj_state: rctx=0x7f030800b720 > obj=.users.uid:documentstore state=0x7f03080f31e8 s->prefetch_data=0 > 21.431374 <ID> 
10 cache get: name=.users.uid+documentstore : hit > 21.431378 <ID> 20 get_obj_state: s->obj_tag was set empty > 21.431382 <ID> 10 cache get: name=.users.uid+documentstore : hit > 21.431401 <ID> 10 swift_user=documentstore:swift > 21.431416 <ID> 20 build_token > token=13000000646f63756d656e7473746f72653a737769667406a4b2ba3999f8a84f45355438d8ff17 > 21.431467 <ID> 2 req 980641:0.000262:swift:PUT > /swift/v1/<CNT-UUID>/version:put_obj:reading permissions > 21.431493 <ID> 20 get_obj_state: rctx=0x7f03837ed250 obj=.rgw:<CNT-UUID> > state=0x7f03080f31e8 s->prefetch_data=0 > 21.431508 <ID> 10 cache get: name=.rgw+<CNT-UUID> : type miss (requested=22, > cached=19) > 21.433081 <ID> 10 cache put: name=.rgw+<CNT-UUID> > 21.433106 <ID> 10 removing entry: > name=.rgw+stresstest-ab9ee3e2-dcf5-4a5b-ab40-931d94c7784038242 from cache > LRU > 21.433114 <ID> 10 moving .rgw+<CNT-UUID> to cache LRU end > 21.433120 <ID> 20 get_obj_state: s->obj_tag was set empty > 21.433122 <ID> 20 Read xattr: user.rgw.idtag > 21.433124 <ID> 20 Read xattr: user.rgw.manifest > 21.433129 <ID> 10 cache get: name=.rgw+<CNT-UUID> : hit > 21.433141 <ID> 20 rgw_get_bucket_info: bucket instance: > <CNT-UUID>(@{i=.rgw.buckets.index}.rgw.buckets[default.78418684.118911]) > 21.433148 <ID> 20 reading from > .rgw:.bucket.meta.<CNT-UUID>:default.78418684.118911 > 21.433169 <ID> 20 get_obj_state: rctx=0x7f03837ed250 > obj=.rgw:.bucket.meta.<CNT-UUID>:default.78418684.118911 > state=0x7f0308005778 s->prefetch_data=0 > 21.433185 <ID> 10 cache get: > name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 : type miss > (requested=22, cached=19) > 21.434632 <ID> 10 cache put: > name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 > 21.434650 <ID> 10 moving > .rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 to cache LRU end > 21.434657 <ID> 10 updating xattr: name=user.rgw.acl bl.length()=177 > 21.434664 <ID> 20 get_obj_state: s->obj_tag was set empty > 21.434667 <ID> 20 Read xattr: user.rgw.acl > 21.434668 <ID> 20 Read xattr: user.rgw.idtag > 21.434669 <ID> 20 Read xattr: user.rgw.manifest > 21.434675 <ID> 10 cache get: > name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 : hit > 21.434701 <ID> 15 Read AccessControlPolicy<AccessControlPolicy > xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>documentstore</ID><DisplayName>Document > Store</DisplayName></Owner><AccessControlList><Grant><Grantee > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" > xsi:type="CanonicalUser"><ID>documentstore</ID><DisplayName>Document > Store</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy> > 21.434710 <ID> 2 req 980641:0.003506:swift:PUT > /swift/v1/<CNT-UUID>/version:put_obj:init op > 21.434716 <ID> 2 req 980641:0.003512:swift:PUT > /swift/v1/<CNT-UUID>/version:put_obj:verifying op mask > 21.434719 <ID> 20 required_mask= 2 user.op_mask=7 > 21.434720 <ID> 2 req 980641:0.003516:swift:PUT > /swift/v1/<CNT-UUID>/version:put_obj:verifying op permissions > 21.434723 <ID> 5 Searching permissions for uid=documentstore mask=50 > 21.434725 <ID> 5 Found permission: 15 > 21.434726 <ID> 5 Searching permissions for group=1 mask=50 > 21.434727 <ID> 5 Permissions for group not found > 21.434729 <ID> 5 Searching permissions for group=2 mask=50 > 21.434730 <ID> 5 Permissions for group not found > 21.434731 <ID> 5 Getting permissions id=documentstore owner=documentstore > perm=2 > 21.434732 <ID> 10 uid=documentstore requested perm (type)=2, policy perm=2, > user_perm_mask=2, acl perm=2 > 
21.434734 <ID> 2 req 980641:0.003530:swift:PUT > /swift/v1/<CNT-UUID>/version:put_obj:verifying op params > 21.434737 <ID> 2 req 980641:0.003533:swift:PUT > /swift/v1/<CNT-UUID>/version:put_obj:executing > 21.434850 <ID> 20 get_obj_state: rctx=0x7f03837ed250 obj=<CNT-UUID>:version > state=0x7f03080c5378 s->prefetch_data=0 > 21.436144 <ID> 10 setting object write_tag=default.78418684.980641 > 21.444613 <ID> 2 req 980641:0.013408:swift:PUT > /swift/v1/<CNT-UUID>/version:put_obj:http status=201 > > ============================== > > ➜ ~ cat rados_longest.txt > 31.886128 <ID> 20 QUERY_STRING=page=swift¶ms=/v1/<CNT-UUID>/version > 31.886129 <ID> 20 REMOTE_ADDR=10.102.8.140 > 31.886130 <ID> 20 REMOTE_PORT=46714 > 31.886131 <ID> 20 REQUEST_METHOD=PUT > 31.886132 <ID> 20 REQUEST_SCHEME=https > 31.886134 <ID> 20 REQUEST_URI=/swift/v1/<CNT-UUID>/version > 31.886135 <ID> 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi > 31.886136 <ID> 20 SCRIPT_NAME=/swift/v1/<CNT-UUID>/version > 31.886137 <ID> 20 > SCRIPT_URI=https://localhost:8405/swift/v1/<CNT-UUID>/version > 31.886138 <ID> 20 SCRIPT_URL=/swift/v1/<CNT-UUID>/version > 31.886139 <ID> 20 SERVER_ADDR=10.102.9.11 > 31.886140 <ID> 20 SERVER_ADMIN=[no address given] > 31.886141 <ID> 20 SERVER_NAME=localhost > 31.886143 <ID> 20 SERVER_PORT=8405 > 31.886144 <ID> 20 SERVER_PORT_SECURE=443 > 31.886145 <ID> 20 SERVER_PROTOCOL=HTTP/1.1 > 31.886146 <ID> 20 SERVER_SIGNATURE= > 31.886147 <ID> 20 SERVER_SOFTWARE=Apache/2.4.7 (Ubuntu) > 31.886148 <ID> 1 ====== starting new request req=0x7f038c024c50 ===== > 31.886162 <ID> 2 req 983095:0.000013::PUT > /swift/v1/<CNT-UUID>/version::initializing > 31.886195 <ID> 10 ver=v1 first=<CNT-UUID> req=version > 31.886200 <ID> 10 s->object=version s->bucket=<CNT-UUID> > 31.886203 <ID> 2 req 983095:0.000055:swift:PUT > /swift/v1/<CNT-UUID>/version::getting op > 31.886208 <ID> 2 req 983095:0.000060:swift:PUT > /swift/v1/<CNT-UUID>/version:put_obj:authorizing > 31.886242 <ID> 20 get_obj_state: rctx=0x7f02b0062770 > obj=.users.swift:documentstore:swift state=0x7f02b007ac18 s->prefetch_data=0 > 31.886250 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit > 31.886255 <ID> 20 get_obj_state: s->obj_tag was set empty > 31.886260 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit > 31.886297 <ID> 20 get_obj_state: rctx=0x7f02b0062770 > obj=.users.uid:documentstore state=0x7f02b007ac18 s->prefetch_data=0 > 31.886303 <ID> 10 cache get: name=.users.uid+documentstore : hit > 31.886308 <ID> 20 get_obj_state: s->obj_tag was set empty > 31.886312 <ID> 10 cache get: name=.users.uid+documentstore : hit > 31.886329 <ID> 10 swift_user=documentstore:swift > 31.886343 <ID> 20 build_token > token=13000000646f63756d656e7473746f72653a737769667410f2006ed65dbbaa4f453554207a0f1f > 31.886393 <ID> 2 req 983095:0.000245:swift:PUT > /swift/v1/<CNT-UUID>/version:put_obj:reading permissions > 31.886419 <ID> 20 get_obj_state: rctx=0x7f0378fd8250 obj=.rgw:<CNT-UUID> > state=0x7f02b007ac18 s->prefetch_data=0 > 31.886430 <ID> 10 cache get: name=.rgw+<CNT-UUID> : type miss (requested=22, > cached=19) > 36.746327 <ID> 10 cache put: name=.rgw+<CNT-UUID> > 36.746404 <ID> 10 moving .rgw+<CNT-UUID> to cache LRU end > 36.746426 <ID> 20 get_obj_state: s->obj_tag was set empty > 36.746431 <ID> 20 Read xattr: user.rgw.idtag > 36.746433 <ID> 20 Read xattr: user.rgw.manifest > 36.746452 <ID> 10 cache get: name=.rgw+<CNT-UUID> : hit > 36.746481 <ID> 20 rgw_get_bucket_info: bucket instance: > <CNT-UUID>(@{i=.rgw.buckets.index}.rgw.buckets[default.78418684.119116]) 
> 36.746491 <ID> 20 reading from > .rgw:.bucket.meta.<CNT-UUID>:default.78418684.119116 > 36.746549 <ID> 20 get_obj_state: rctx=0x7f0378fd8250 > obj=.rgw:.bucket.meta.<CNT-UUID>:default.78418684.119116 > state=0x7f02b00ce638 s->prefetch_data=0 > 36.746585 <ID> 10 cache get: > name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 : type miss > (requested=22, cached=19) > 36.747938 <ID> 10 cache put: > name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 > 36.747955 <ID> 10 moving > .rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 to cache LRU end > 36.747963 <ID> 10 updating xattr: name=user.rgw.acl bl.length()=177 > 36.747972 <ID> 20 get_obj_state: s->obj_tag was set empty > 36.747975 <ID> 20 Read xattr: user.rgw.acl > 36.747977 <ID> 20 Read xattr: user.rgw.idtag > 36.747978 <ID> 20 Read xattr: user.rgw.manifest > 36.747985 <ID> 10 cache get: > name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 : hit > 36.748025 <ID> 15 Read AccessControlPolicy<AccessControlPolicy > xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>documentstore</ID><DisplayName>Document > Store</DisplayName></Owner><AccessControlList><Grant><Grantee > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" > xsi:type="CanonicalUser"><ID>documentstore</ID><DisplayName>Document > Store</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy> > 36.748037 <ID> 2 req 983095:4.861888:swift:PUT > /swift/v1/<CNT-UUID>/version:put_obj:init op > 36.748043 <ID> 2 req 983095:4.861895:swift:PUT > /swift/v1/<CNT-UUID>/version:put_obj:verifying op mask > 36.748046 <ID> 20 required_mask= 2 user.op_mask=7 > 36.748050 <ID> 2 req 983095:4.861902:swift:PUT > /swift/v1/<CNT-UUID>/version:put_obj:verifying op permissions > 36.748054 <ID> 5 Searching permissions for uid=documentstore mask=50 > 36.748056 <ID> 5 Found permission: 15 > 36.748058 <ID> 5 Searching permissions for group=1 mask=50 > 36.748060 <ID> 5 Permissions for group not found > 36.748061 <ID> 5 Searching permissions for group=2 mask=50 > 36.748063 <ID> 5 Permissions for group not found > 36.748064 <ID> 5 Getting permissions id=documentstore owner=documentstore > perm=2 > 36.748066 <ID> 10 uid=documentstore requested perm (type)=2, policy perm=2, > user_perm_mask=2, acl perm=2 > 36.748069 <ID> 2 req 983095:4.861921:swift:PUT > /swift/v1/<CNT-UUID>/version:put_obj:verifying op params > 36.748072 <ID> 2 req 983095:4.861924:swift:PUT > /swift/v1/<CNT-UUID>/version:put_obj:executing > 36.748200 <ID> 20 get_obj_state: rctx=0x7f0378fd8250 obj=<CNT-UUID>:version > state=0x7f02b0042618 s->prefetch_data=0 > 36.802077 <ID> 10 setting object write_tag=default.78418684.983095 > 36.818727 <ID> 2 req 983095:4.932579:swift:PUT > /swift/v1/<CNT-UUID>/version:put_obj:http status=201 > > ============================== > > > -- > Daniel Schneller > Mobile Development Lead > > CenterDevice GmbH | Merscheider Straße 1 > | 42699 Solingen > tel: +49 1754155711 | Deutschland > daniel.schneller@xxxxxxxxxxxxxxxx | www.centerdevice.com > > > > > On 06 Oct 2014, at 19:26, Yehuda Sadeh <yehuda@xxxxxxxxxx> wrote: > > It'd be interesting to see which rados operation is slowing down the > requests. Can you provide a log dump of a request (with 'debug rgw = > 20', and 'debug ms = 1'). This might give us a better idea as to > what's going on. > > Thanks, > Yehuda > > On Mon, Oct 6, 2014 at 10:05 AM, Daniel Schneller > <daniel.schneller@xxxxxxxxxxxxxxxx> wrote: > > Hi again! 
> > We have done some tests regarding the limits of storing lots and > lots of buckets through Rados Gateway into Ceph. > > Our test used a single user for which we removed the default max > buckets limit. It then continuously created containers - both empty > and such with 10 objects of around 100k random data in them. > > With 3 parallel processes we saw relatively consistent time of > about 500-700ms per such container. > > This kept steady until we reached approx. 3 million containers > after which the time per insert sharply went up to currently > around 1600ms and rising. Due to some hiccups with network > equipment the tests were aborted a few times, but then resumed without > deleting any of the previous runs created containers, so the actual > number might be 2.8 or 3.2 million, but still in that ballpark. > We aborted the test here. > > Judging by the advice given earlier (see quoted mail below) that > we might hit a limit on some per-user data structures, we created > another user account, removed its max-bucket limit as well and > restarted the benchmark with that one, _expecting_ the times to be > down to the original range of 500-700ms. > > However, what we are seeing is that the times stay at the 1600ms > and higher levels even for that fresh account. > > Here is the output of `rados df`, reformatted to fit the email. > clones, degraded and unfound were 0 in all cases and have been > left out for clarity: > > .rgw > ========================= > KB: 1,966,932 > objects: 9,094,552 > rd: 195,747,645 > rd KB: 153,585,472 > wr: 30,191,844 > wr KB: 10,751,065 > > .rgw.buckets > ========================= > KB: 2,038,313,855 > objects: 22,088,103 > rd: 5,455,123 > rd KB: 408,416,317 > wr: 149,377,728 > wr KB: 1,882,517,472 > > .rgw.buckets.index > ========================= > KB: 0 > objects: 5,374,376 > rd: 267,996,778 > rd KB: 262,626,106 > wr: 107,142,891 > wr KB: 0 > > .rgw.control > ========================= > KB: 0 > objects: 8 > rd: 0 > rd KB: 0 > wr: 0 > wr KB: 0 > > .rgw.gc > ========================= > KB: 0 > objects: 32 > rd: 5,554,407 > rd KB: 5,713,942 > wr: 8,355,934 > wr KB: 0 > > .rgw.root > ========================= > KB: 1 > objects: 3 > rd: 524 > rd KB: 346 > wr: 3 > wr KB: 3 > > > We would very much like to understand what is going on here > in order to decide if Rados Gateway is a viable option to base > our production system on (where we expect similar counts > as in the benchmark), or if we need to investigate using librados > directly which we would like to avoid if possible. > > Any advice on what configuration parameters to check or > which additional information to provide to analyze this would be > very much welcome. > > Cheers, > Daniel > > > -- > Daniel Schneller > Mobile Development Lead > > CenterDevice GmbH | Merscheider Straße 1 > | 42699 Solingen > tel: +49 1754155711 | Deutschland > daniel.schneller@xxxxxxxxxxxxxxxx | www.centerdevice.com > > > > > On 10 Sep 2014, at 19:42, Gregory Farnum <greg@xxxxxxxxxxx> wrote: > > On Wednesday, September 10, 2014, Daniel Schneller > <daniel.schneller@xxxxxxxxxxxxxxxx> wrote: > > > On 09 Sep 2014, at 21:43, Gregory Farnum <greg@xxxxxxxxxxx> wrote: > > > Yehuda can talk about this with more expertise than I can, but I think > it should be basically fine. By creating so many buckets you're > decreasing the effectiveness of RGW's metadata caching, which means > > the initial lookup in a particular bucket might take longer. > > > Thanks for your thoughts. 
With “initial lookup in a particular bucket” > do you mean accessing any of the objects in a bucket? If we directly > access the object (not enumerating the buckets content), would that > still be an issue? > Just trying to understand the inner workings a bit better to make > more educated guesses :) > > > > When doing an object lookup, the gateway combines the "bucket ID" with a > mangled version of the object name to try and do a read out of RADOS. It > first needs to get that bucket ID though -- it will cache an the bucket > name->ID mapping, but if you have a ton of buckets there could be enough > entries to degrade the cache's effectiveness. (So, you're more likely to pay > that extra disk access lookup.) > > > > > The big concern is that we do maintain a per-user list of all their > buckets — which is stored in a single RADOS object — so if you have an > extreme number of buckets that RADOS object could get pretty big and > become a bottleneck when creating/removing/listing the buckets. You > > > Alright. Listing buckets is no problem, that we don’t do. Can you > say what “pretty big” would be in terms of MB? How much space does a > bucket record consume in there? Based on that I could run a few numbers. > > > > Uh, a kilobyte per bucket? You could look it up in the source (I'm on my > phone) but I *believe* the bucket name is allowed to be larger than the rest > combined... > More particularly, though, if you've got a single user uploading documents, > each creating a new bucket, then those bucket creates are going to serialize > on this one object. > -Greg > > > > > should run your own experiments to figure out what the limits are > there; perhaps you have an easy way of sharding up documents into > different users. > > > Good advice. We can do that per distributor (an org unit in our > software) to at least compartmentalize any potential locking issues > in this area to that single entity. Still, there would be quite > a lot of buckets/objects per distributor, so some more detail on > the above items would be great. > > Thanks a lot! > > > Daniel > > > > > -- > Software Engineer #42 @ http://inktank.com | http://ceph.com > > > > _______________________________________________ > ceph-users mailing list > ceph-users@xxxxxxxxxxxxxx > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > _______________________________________________ > ceph-users mailing list > ceph-users@xxxxxxxxxxxxxx > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > > > _______________________________________________ > ceph-users mailing list > ceph-users@xxxxxxxxxxxxxx > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
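A closing note on Greg's suggestion of sharding buckets across several users:
below is a minimal sketch of what that could look like on the client side. The
user names (documentstore-0 .. documentstore-7) and the shard count are made
up for illustration; any stable hash of the distributor id would do.

==============================

#!/bin/bash
# Spread bucket creation over several RGW users so that it does not all
# serialize on a single per-user bucket-list object.
# NUM_USER_SHARDS and the documentstore-N naming are hypothetical.
NUM_USER_SHARDS=8

user_for_distributor() {
    local distributor_id="$1"
    # md5sum gives a stable hex digest; use the first 8 hex digits as an integer
    local hash_hex
    hash_hex=$(printf '%s' "$distributor_id" | md5sum | cut -c1-8)
    echo "documentstore-$(( 0x$hash_hex % NUM_USER_SHARDS ))"
}

# Always returns the same user for a given distributor, so all of that
# distributor's containers stay under one account while the overall load is
# spread over NUM_USER_SHARDS per-user bucket lists.
user_for_distributor "distributor-4711"

==============================

radosgw-admin bucket list --uid=<user> shows the buckets each such user has
accumulated, which makes it easy to check that the sharding stays balanced.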