Re: max_bucket limit -- safe to disable?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This operation stalled quite a bit, seems that it was waiting for the osd:

2.547155 7f036ffc7700  1 -- 10.102.4.11:0/1009401 -->
10.102.4.14:6809/7428 -- osd_op(client.78418684.0:27514711
.bucket.meta.<CNT-UUID-FINDME>:default.78418684.122043 [call
version.read,getxattrs,stat] 5.3b7d1197 ack+read e16034) v4 -- ?+0
0x7f026802e2c0 con 0x7f040c055ca0
...
7.619750 7f041ddf4700  1 -- 10.102.4.11:0/1009401 <== osd.32
10.102.4.14:6809/7428 208252 ==== osd_op_reply(27514711
.bucket.meta.<CNT-UUID-FINDME>:default.78418684.122043
[call,getxattrs,stat] v0'0 uv6371 ondisk = 0) v6 ==== 338+0+336
(3685145659 0 4232894755) 0x7f00e430f540 con 0x7f040c055ca0

By looking at these logs it seems that there are only 8 pgs on the
.rgw pool, if this is correct then you may want to change that
considering your workload.

Yehuda


On Tue, Oct 7, 2014 at 3:46 PM, Daniel Schneller
<daniel.schneller@xxxxxxxxxxxxxxxx> wrote:
> Hi!
>
> Sorry, I must have missed the enabling of that debug module.
> However, the test setup has been the same all the time -
> I only have the one test-application :)
>
> But maybe I phrased it a bit ambiguously when I wrote
>
>> It then continuously created containers - both empty
>> and such with 10 objects of around 100k random data in them.
>
> 100 kilobytes is the size of a single object, of which we create 10
> per container. The container gets created first, without any
> objects, naturally, then 10 objects are added. One of these objects
> is called “version”, the rest have generated names with a fixed
> prefix and appended 1-9. The version object is the one I picked
> for the example logs I sent earlier.
>
> I hope this makes the setup clearer.
>
> Attached you will find the (now more extensive) logs for the outliers
> again. As you did not say that I garbled the logs, I assume the
> pre-processing was OK, so I have prepared the new data in a similar
> fashion, marking the relevant request with <CNT-UUID-FINDME>.
>
> I have not removed any lines in between the beginning of the
> “interesting” request and its completion to keep all the network
> traffic log intact. Due to the increased verbosity, I will not post
> the logs inline, but only attach them gzipped.
>
> As before, should the full data set be needed, I can provide
> an archived version.
>
>
>
>
> Thanks for your support!
> Daniel
>
>
>
>
>> On 07 Oct 2014, at 22:45, Yehuda Sadeh <yehuda@xxxxxxxxxx> wrote:
>>
>> The logs here don't include the messenger (debug ms = 1). It's hard to
>> tell what going on from looking at the outliers. Also, in your
>> previous mail you described a different benchmark, you tested writing
>> large number of objects into a single bucket, whereas in this test
>> you're testing multiple bucket creations, which have a completely
>> different characteristics.
>>
>>
>> On Tue, Oct 7, 2014 at 1:26 PM, Daniel Schneller
>> <daniel.schneller@xxxxxxxxxxxxxxxx> wrote:
>>> Hi!
>>>
>>> I have re-run our test as follows:
>>>
>>> * 4 Rados Gateways, on 4 baremetal machines which have
>>>  a total of 48 spinning rust OSDs.
>>>
>>> * Benchmark run on a virtual machine talking to HAProxy
>>>  which balances the requests across the 4 Rados GWs.
>>>
>>> * Three instances of the benchmark run in parallel. Each
>>>  instance creates 1000 containers, puts 11 objects into
>>>  each container. Once they have all been created, each
>>>  instances deletes its own containers again.
>>>
>>> I configured one of the radosgws with the debug levels
>>> you requested. The tests produced quite an amount of data
>>> (approx. 1GB of text), so I took the liberty to
>>> pre-process that a bit.
>>>
>>> In this run we landed at around 1.2s per container
>>> created (including the objects in them) on average.
>>> This was slightly better than the 1.6s we witnessed
>>> before, but that test ran for much longer, so there might
>>> have been some queue-up effect.
>>>
>>> Interestingly enough the average is actually somewhat
>>> misleading. The logs below show the creation of one
>>> object in a container each, one being the fastest of this
>>> benchmark (at least on the debug-enabled radosgw), one
>>> being the slowest.
>>>
>>> The fastest one was completed in 0.013s, the longest one
>>> took 4.93s(!).
>>>
>>> The attached logs are cleaned up so that they each show
>>> just a single request and replaced longish, but constant
>>> information with placeholders. Our container names are
>>> of the form “stresstest-xxxxxxxxxxx” which I shortened
>>> to “<CNT-UUID>” for brevity. Also, I removed the redundant
>>> prefix (date, hour, minute of day).
>>>
>>> The column before the log level looked like a thread-id.
>>> As I focused on a single request, I removed all the lines
>>> that did not match the same id, replacing the actual value
>>> with “<ID>”. That makes the logs much easier to read and
>>> understand.
>>>
>>> Just in case I might have removed too much information
>>> for the logs to be useful, the complete log is available
>>> in BZIP2 compressed form for download. Just let me know
>>> if you need it, then I will provide a link via direct email.
>>>
>>> To me it seems like there might indeed be a contention
>>> issue. It would be interesting to know, if this is correct
>>> and if there are any settings that we could adjust to
>>> alleviate the issue.
>>>
>>> Daniel
>>>
>>> ==============================
>>>
>>> ➜  ~  cat rados_shortest.txt
>>> 21.431185 <ID> 20 QUERY_STRING=page=swift&params=/v1/<CNT-UUID>/version
>>> 21.431187 <ID> 20 REMOTE_ADDR=10.102.8.140
>>> 21.431188 <ID> 20 REMOTE_PORT=44007
>>> 21.431189 <ID> 20 REQUEST_METHOD=PUT
>>> 21.431190 <ID> 20 REQUEST_SCHEME=https
>>> 21.431191 <ID> 20 REQUEST_URI=/swift/v1/<CNT-UUID>/version
>>> 21.431192 <ID> 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi
>>> 21.431193 <ID> 20 SCRIPT_NAME=/swift/v1/<CNT-UUID>/version
>>> 21.431194 <ID> 20
>>> SCRIPT_URI=https://localhost:8405/swift/v1/<CNT-UUID>/version
>>> 21.431195 <ID> 20 SCRIPT_URL=/swift/v1/<CNT-UUID>/version
>>> 21.431196 <ID> 20 SERVER_ADDR=10.102.9.11
>>> 21.431197 <ID> 20 SERVER_ADMIN=[no address given]
>>> 21.431198 <ID> 20 SERVER_NAME=localhost
>>> 21.431199 <ID> 20 SERVER_PORT=8405
>>> 21.431200 <ID> 20 SERVER_PORT_SECURE=443
>>> 21.431201 <ID> 20 SERVER_PROTOCOL=HTTP/1.1
>>> 21.431202 <ID> 20 SERVER_SIGNATURE=
>>> 21.431203 <ID> 20 SERVER_SOFTWARE=Apache/2.4.7 (Ubuntu)
>>> 21.431205 <ID>  1 ====== starting new request req=0x7f038c019e90 =====
>>> 21.431219 <ID>  2 req 980641:0.000015::PUT
>>> /swift/v1/<CNT-UUID>/version::initializing
>>> 21.431259 <ID> 10 ver=v1 first=<CNT-UUID> req=version
>>> 21.431265 <ID> 10 s->object=version s->bucket=<CNT-UUID>
>>> 21.431269 <ID>  2 req 980641:0.000065:swift:PUT
>>> /swift/v1/<CNT-UUID>/version::getting op
>>> 21.431274 <ID>  2 req 980641:0.000070:swift:PUT
>>> /swift/v1/<CNT-UUID>/version:put_obj:authorizing
>>> 21.431321 <ID> 20 get_obj_state: rctx=0x7f030800b720
>>> obj=.users.swift:documentstore:swift state=0x7f03080f31e8 s->prefetch_data=0
>>> 21.431332 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit
>>> 21.431338 <ID> 20 get_obj_state: s->obj_tag was set empty
>>> 21.431344 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit
>>> 21.431369 <ID> 20 get_obj_state: rctx=0x7f030800b720
>>> obj=.users.uid:documentstore state=0x7f03080f31e8 s->prefetch_data=0
>>> 21.431374 <ID> 10 cache get: name=.users.uid+documentstore : hit
>>> 21.431378 <ID> 20 get_obj_state: s->obj_tag was set empty
>>> 21.431382 <ID> 10 cache get: name=.users.uid+documentstore : hit
>>> 21.431401 <ID> 10 swift_user=documentstore:swift
>>> 21.431416 <ID> 20 build_token
>>> token=13000000646f63756d656e7473746f72653a737769667406a4b2ba3999f8a84f45355438d8ff17
>>> 21.431467 <ID>  2 req 980641:0.000262:swift:PUT
>>> /swift/v1/<CNT-UUID>/version:put_obj:reading permissions
>>> 21.431493 <ID> 20 get_obj_state: rctx=0x7f03837ed250 obj=.rgw:<CNT-UUID>
>>> state=0x7f03080f31e8 s->prefetch_data=0
>>> 21.431508 <ID> 10 cache get: name=.rgw+<CNT-UUID> : type miss (requested=22,
>>> cached=19)
>>> 21.433081 <ID> 10 cache put: name=.rgw+<CNT-UUID>
>>> 21.433106 <ID> 10 removing entry:
>>> name=.rgw+stresstest-ab9ee3e2-dcf5-4a5b-ab40-931d94c7784038242 from cache
>>> LRU
>>> 21.433114 <ID> 10 moving .rgw+<CNT-UUID> to cache LRU end
>>> 21.433120 <ID> 20 get_obj_state: s->obj_tag was set empty
>>> 21.433122 <ID> 20 Read xattr: user.rgw.idtag
>>> 21.433124 <ID> 20 Read xattr: user.rgw.manifest
>>> 21.433129 <ID> 10 cache get: name=.rgw+<CNT-UUID> : hit
>>> 21.433141 <ID> 20 rgw_get_bucket_info: bucket instance:
>>> <CNT-UUID>(@{i=.rgw.buckets.index}.rgw.buckets[default.78418684.118911])
>>> 21.433148 <ID> 20 reading from
>>> .rgw:.bucket.meta.<CNT-UUID>:default.78418684.118911
>>> 21.433169 <ID> 20 get_obj_state: rctx=0x7f03837ed250
>>> obj=.rgw:.bucket.meta.<CNT-UUID>:default.78418684.118911
>>> state=0x7f0308005778 s->prefetch_data=0
>>> 21.433185 <ID> 10 cache get:
>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 : type miss
>>> (requested=22, cached=19)
>>> 21.434632 <ID> 10 cache put:
>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911
>>> 21.434650 <ID> 10 moving
>>> .rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 to cache LRU end
>>> 21.434657 <ID> 10 updating xattr: name=user.rgw.acl bl.length()=177
>>> 21.434664 <ID> 20 get_obj_state: s->obj_tag was set empty
>>> 21.434667 <ID> 20 Read xattr: user.rgw.acl
>>> 21.434668 <ID> 20 Read xattr: user.rgw.idtag
>>> 21.434669 <ID> 20 Read xattr: user.rgw.manifest
>>> 21.434675 <ID> 10 cache get:
>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 : hit
>>> 21.434701 <ID> 15 Read AccessControlPolicy<AccessControlPolicy
>>> xmlns="http://s3.amazonaws.com/doc/2006-03-01/";><Owner><ID>documentstore</ID><DisplayName>Document
>>> Store</DisplayName></Owner><AccessControlList><Grant><Grantee
>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
>>> xsi:type="CanonicalUser"><ID>documentstore</ID><DisplayName>Document
>>> Store</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
>>> 21.434710 <ID>  2 req 980641:0.003506:swift:PUT
>>> /swift/v1/<CNT-UUID>/version:put_obj:init op
>>> 21.434716 <ID>  2 req 980641:0.003512:swift:PUT
>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op mask
>>> 21.434719 <ID> 20 required_mask= 2 user.op_mask=7
>>> 21.434720 <ID>  2 req 980641:0.003516:swift:PUT
>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op permissions
>>> 21.434723 <ID>  5 Searching permissions for uid=documentstore mask=50
>>> 21.434725 <ID>  5 Found permission: 15
>>> 21.434726 <ID>  5 Searching permissions for group=1 mask=50
>>> 21.434727 <ID>  5 Permissions for group not found
>>> 21.434729 <ID>  5 Searching permissions for group=2 mask=50
>>> 21.434730 <ID>  5 Permissions for group not found
>>> 21.434731 <ID>  5 Getting permissions id=documentstore owner=documentstore
>>> perm=2
>>> 21.434732 <ID> 10  uid=documentstore requested perm (type)=2, policy perm=2,
>>> user_perm_mask=2, acl perm=2
>>> 21.434734 <ID>  2 req 980641:0.003530:swift:PUT
>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op params
>>> 21.434737 <ID>  2 req 980641:0.003533:swift:PUT
>>> /swift/v1/<CNT-UUID>/version:put_obj:executing
>>> 21.434850 <ID> 20 get_obj_state: rctx=0x7f03837ed250 obj=<CNT-UUID>:version
>>> state=0x7f03080c5378 s->prefetch_data=0
>>> 21.436144 <ID> 10 setting object write_tag=default.78418684.980641
>>> 21.444613 <ID>  2 req 980641:0.013408:swift:PUT
>>> /swift/v1/<CNT-UUID>/version:put_obj:http status=201
>>>
>>> ==============================
>>>
>>> ➜  ~  cat rados_longest.txt
>>> 31.886128 <ID> 20 QUERY_STRING=page=swift&params=/v1/<CNT-UUID>/version
>>> 31.886129 <ID> 20 REMOTE_ADDR=10.102.8.140
>>> 31.886130 <ID> 20 REMOTE_PORT=46714
>>> 31.886131 <ID> 20 REQUEST_METHOD=PUT
>>> 31.886132 <ID> 20 REQUEST_SCHEME=https
>>> 31.886134 <ID> 20 REQUEST_URI=/swift/v1/<CNT-UUID>/version
>>> 31.886135 <ID> 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi
>>> 31.886136 <ID> 20 SCRIPT_NAME=/swift/v1/<CNT-UUID>/version
>>> 31.886137 <ID> 20
>>> SCRIPT_URI=https://localhost:8405/swift/v1/<CNT-UUID>/version
>>> 31.886138 <ID> 20 SCRIPT_URL=/swift/v1/<CNT-UUID>/version
>>> 31.886139 <ID> 20 SERVER_ADDR=10.102.9.11
>>> 31.886140 <ID> 20 SERVER_ADMIN=[no address given]
>>> 31.886141 <ID> 20 SERVER_NAME=localhost
>>> 31.886143 <ID> 20 SERVER_PORT=8405
>>> 31.886144 <ID> 20 SERVER_PORT_SECURE=443
>>> 31.886145 <ID> 20 SERVER_PROTOCOL=HTTP/1.1
>>> 31.886146 <ID> 20 SERVER_SIGNATURE=
>>> 31.886147 <ID> 20 SERVER_SOFTWARE=Apache/2.4.7 (Ubuntu)
>>> 31.886148 <ID>  1 ====== starting new request req=0x7f038c024c50 =====
>>> 31.886162 <ID>  2 req 983095:0.000013::PUT
>>> /swift/v1/<CNT-UUID>/version::initializing
>>> 31.886195 <ID> 10 ver=v1 first=<CNT-UUID> req=version
>>> 31.886200 <ID> 10 s->object=version s->bucket=<CNT-UUID>
>>> 31.886203 <ID>  2 req 983095:0.000055:swift:PUT
>>> /swift/v1/<CNT-UUID>/version::getting op
>>> 31.886208 <ID>  2 req 983095:0.000060:swift:PUT
>>> /swift/v1/<CNT-UUID>/version:put_obj:authorizing
>>> 31.886242 <ID> 20 get_obj_state: rctx=0x7f02b0062770
>>> obj=.users.swift:documentstore:swift state=0x7f02b007ac18 s->prefetch_data=0
>>> 31.886250 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit
>>> 31.886255 <ID> 20 get_obj_state: s->obj_tag was set empty
>>> 31.886260 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit
>>> 31.886297 <ID> 20 get_obj_state: rctx=0x7f02b0062770
>>> obj=.users.uid:documentstore state=0x7f02b007ac18 s->prefetch_data=0
>>> 31.886303 <ID> 10 cache get: name=.users.uid+documentstore : hit
>>> 31.886308 <ID> 20 get_obj_state: s->obj_tag was set empty
>>> 31.886312 <ID> 10 cache get: name=.users.uid+documentstore : hit
>>> 31.886329 <ID> 10 swift_user=documentstore:swift
>>> 31.886343 <ID> 20 build_token
>>> token=13000000646f63756d656e7473746f72653a737769667410f2006ed65dbbaa4f453554207a0f1f
>>> 31.886393 <ID>  2 req 983095:0.000245:swift:PUT
>>> /swift/v1/<CNT-UUID>/version:put_obj:reading permissions
>>> 31.886419 <ID> 20 get_obj_state: rctx=0x7f0378fd8250 obj=.rgw:<CNT-UUID>
>>> state=0x7f02b007ac18 s->prefetch_data=0
>>> 31.886430 <ID> 10 cache get: name=.rgw+<CNT-UUID> : type miss (requested=22,
>>> cached=19)
>>> 36.746327 <ID> 10 cache put: name=.rgw+<CNT-UUID>
>>> 36.746404 <ID> 10 moving .rgw+<CNT-UUID> to cache LRU end
>>> 36.746426 <ID> 20 get_obj_state: s->obj_tag was set empty
>>> 36.746431 <ID> 20 Read xattr: user.rgw.idtag
>>> 36.746433 <ID> 20 Read xattr: user.rgw.manifest
>>> 36.746452 <ID> 10 cache get: name=.rgw+<CNT-UUID> : hit
>>> 36.746481 <ID> 20 rgw_get_bucket_info: bucket instance:
>>> <CNT-UUID>(@{i=.rgw.buckets.index}.rgw.buckets[default.78418684.119116])
>>> 36.746491 <ID> 20 reading from
>>> .rgw:.bucket.meta.<CNT-UUID>:default.78418684.119116
>>> 36.746549 <ID> 20 get_obj_state: rctx=0x7f0378fd8250
>>> obj=.rgw:.bucket.meta.<CNT-UUID>:default.78418684.119116
>>> state=0x7f02b00ce638 s->prefetch_data=0
>>> 36.746585 <ID> 10 cache get:
>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 : type miss
>>> (requested=22, cached=19)
>>> 36.747938 <ID> 10 cache put:
>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116
>>> 36.747955 <ID> 10 moving
>>> .rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 to cache LRU end
>>> 36.747963 <ID> 10 updating xattr: name=user.rgw.acl bl.length()=177
>>> 36.747972 <ID> 20 get_obj_state: s->obj_tag was set empty
>>> 36.747975 <ID> 20 Read xattr: user.rgw.acl
>>> 36.747977 <ID> 20 Read xattr: user.rgw.idtag
>>> 36.747978 <ID> 20 Read xattr: user.rgw.manifest
>>> 36.747985 <ID> 10 cache get:
>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 : hit
>>> 36.748025 <ID> 15 Read AccessControlPolicy<AccessControlPolicy
>>> xmlns="http://s3.amazonaws.com/doc/2006-03-01/";><Owner><ID>documentstore</ID><DisplayName>Document
>>> Store</DisplayName></Owner><AccessControlList><Grant><Grantee
>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
>>> xsi:type="CanonicalUser"><ID>documentstore</ID><DisplayName>Document
>>> Store</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
>>> 36.748037 <ID>  2 req 983095:4.861888:swift:PUT
>>> /swift/v1/<CNT-UUID>/version:put_obj:init op
>>> 36.748043 <ID>  2 req 983095:4.861895:swift:PUT
>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op mask
>>> 36.748046 <ID> 20 required_mask= 2 user.op_mask=7
>>> 36.748050 <ID>  2 req 983095:4.861902:swift:PUT
>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op permissions
>>> 36.748054 <ID>  5 Searching permissions for uid=documentstore mask=50
>>> 36.748056 <ID>  5 Found permission: 15
>>> 36.748058 <ID>  5 Searching permissions for group=1 mask=50
>>> 36.748060 <ID>  5 Permissions for group not found
>>> 36.748061 <ID>  5 Searching permissions for group=2 mask=50
>>> 36.748063 <ID>  5 Permissions for group not found
>>> 36.748064 <ID>  5 Getting permissions id=documentstore owner=documentstore
>>> perm=2
>>> 36.748066 <ID> 10  uid=documentstore requested perm (type)=2, policy perm=2,
>>> user_perm_mask=2, acl perm=2
>>> 36.748069 <ID>  2 req 983095:4.861921:swift:PUT
>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op params
>>> 36.748072 <ID>  2 req 983095:4.861924:swift:PUT
>>> /swift/v1/<CNT-UUID>/version:put_obj:executing
>>> 36.748200 <ID> 20 get_obj_state: rctx=0x7f0378fd8250 obj=<CNT-UUID>:version
>>> state=0x7f02b0042618 s->prefetch_data=0
>>> 36.802077 <ID> 10 setting object write_tag=default.78418684.983095
>>> 36.818727 <ID>  2 req 983095:4.932579:swift:PUT
>>> /swift/v1/<CNT-UUID>/version:put_obj:http status=201
>>>
>>> ==============================
>>>
>>>
>>> --
>>> Daniel Schneller
>>> Mobile Development Lead
>>>
>>> CenterDevice GmbH                  | Merscheider Straße 1
>>>                                   | 42699 Solingen
>>> tel: +49 1754155711                | Deutschland
>>> daniel.schneller@xxxxxxxxxxxxxxxx  | www.centerdevice.com
>>>
>>>
>>>
>>>
>>> On 06 Oct 2014, at 19:26, Yehuda Sadeh <yehuda@xxxxxxxxxx> wrote:
>>>
>>> It'd be interesting to see which rados operation is slowing down the
>>> requests. Can you provide a log dump of a request (with 'debug rgw =
>>> 20', and 'debug ms = 1'). This might give us a better idea as to
>>> what's going on.
>>>
>>> Thanks,
>>> Yehuda
>>>
>>> On Mon, Oct 6, 2014 at 10:05 AM, Daniel Schneller
>>> <daniel.schneller@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> Hi again!
>>>
>>> We have done some tests regarding the limits of storing lots and
>>> lots of buckets through Rados Gateway into Ceph.
>>>
>>> Our test used a single user for which we removed the default max
>>> buckets limit. It then continuously created containers - both empty
>>> and such with 10 objects of around 100k random data in them.
>>>
>>> With 3 parallel processes we saw relatively consistent time of
>>> about   500-700ms    per such container.
>>>
>>> This kept steady until we reached approx. 3 million containers
>>> after which the time per insert sharply went up to currently
>>> around   1600ms   and rising. Due to some hiccups with network
>>> equipment the tests were aborted a few times, but then resumed without
>>> deleting any of the previous runs created containers, so the actual
>>> number might be 2.8 or 3.2 million, but still in that ballpark.
>>> We aborted the test here.
>>>
>>> Judging by the advice given earlier (see quoted mail below) that
>>> we might hit a limit on some per-user data structures, we created
>>> another user account, removed its max-bucket limit as well and
>>> restarted the benchmark with that one, _expecting_ the times to be
>>> down to the original range of 500-700ms.
>>>
>>> However, what we are seeing is that the times stay at the   1600ms
>>> and higher levels even for that fresh account.
>>>
>>> Here is the output of `rados df`, reformatted to fit the email.
>>> clones, degraded and unfound were 0 in all cases and have been
>>> left out for clarity:
>>>
>>> .rgw
>>> =========================
>>>      KB:     1,966,932
>>> objects:     9,094,552
>>>      rd:   195,747,645
>>>   rd KB:   153,585,472
>>>      wr:    30,191,844
>>>   wr KB:    10,751,065
>>>
>>> .rgw.buckets
>>> =========================
>>>      KB: 2,038,313,855
>>> objects:    22,088,103
>>>      rd:     5,455,123
>>>   rd KB:   408,416,317
>>>      wr:   149,377,728
>>>   wr KB: 1,882,517,472
>>>
>>> .rgw.buckets.index
>>> =========================
>>>      KB:             0
>>> objects:     5,374,376
>>>      rd:   267,996,778
>>>   rd KB:   262,626,106
>>>      wr:   107,142,891
>>>   wr KB:             0
>>>
>>> .rgw.control
>>> =========================
>>>      KB:             0
>>> objects:             8
>>>      rd:             0
>>>   rd KB:             0
>>>      wr:             0
>>>   wr KB:             0
>>>
>>> .rgw.gc
>>> =========================
>>>      KB:             0
>>> objects:            32
>>>      rd:     5,554,407
>>>   rd KB:     5,713,942
>>>      wr:     8,355,934
>>>   wr KB:             0
>>>
>>> .rgw.root
>>> =========================
>>>      KB:             1
>>> objects:             3
>>>      rd:           524
>>>   rd KB:           346
>>>      wr:             3
>>>   wr KB:             3
>>>
>>>
>>> We would very much like to understand what is going on here
>>> in order to decide if Rados Gateway is a viable option to base
>>> our production system on (where we expect similar counts
>>> as in the benchmark), or if we need to investigate using librados
>>> directly which we would like to avoid if possible.
>>>
>>> Any advice on what configuration parameters to check or
>>> which additional information to provide to analyze this would be
>>> very much welcome.
>>>
>>> Cheers,
>>> Daniel
>>>
>>>
>>> --
>>> Daniel Schneller
>>> Mobile Development Lead
>>>
>>> CenterDevice GmbH                  | Merscheider Straße 1
>>>                                 | 42699 Solingen
>>> tel: +49 1754155711                | Deutschland
>>> daniel.schneller@xxxxxxxxxxxxxxxx  | www.centerdevice.com
>>>
>>>
>>>
>>>
>>> On 10 Sep 2014, at 19:42, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>
>>> On Wednesday, September 10, 2014, Daniel Schneller
>>> <daniel.schneller@xxxxxxxxxxxxxxxx> wrote:
>>>
>>>
>>> On 09 Sep 2014, at 21:43, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>
>>>
>>> Yehuda can talk about this with more expertise than I can, but I think
>>> it should be basically fine. By creating so many buckets you're
>>> decreasing the effectiveness of RGW's metadata caching, which means
>>>
>>> the initial lookup in a particular bucket might take longer.
>>>
>>>
>>> Thanks for your thoughts. With “initial lookup in a particular bucket”
>>> do you mean accessing any of the objects in a bucket? If we directly
>>> access the object (not enumerating the buckets content), would that
>>> still be an issue?
>>> Just trying to understand the inner workings a bit better to make
>>> more educated guesses :)
>>>
>>>
>>>
>>> When doing an object lookup, the gateway combines the "bucket ID" with a
>>> mangled version of the object name to try and do a read out of RADOS. It
>>> first needs to get that bucket ID though -- it will cache an the bucket
>>> name->ID mapping, but if you have a ton of buckets there could be enough
>>> entries to degrade the cache's effectiveness. (So, you're more likely to pay
>>> that extra disk access lookup.)
>>>
>>>
>>>
>>>
>>> The big concern is that we do maintain a per-user list of all their
>>> buckets — which is stored in a single RADOS object — so if you have an
>>> extreme number of buckets that RADOS object could get pretty big and
>>> become a bottleneck when creating/removing/listing the buckets. You
>>>
>>>
>>> Alright. Listing buckets is no problem, that we don’t do. Can you
>>> say what “pretty big” would be in terms of MB? How much space does a
>>> bucket record consume in there? Based on that I could run a few numbers.
>>>
>>>
>>>
>>> Uh, a kilobyte per bucket? You could look it up in the source (I'm on my
>>> phone) but I *believe* the bucket name is allowed to be larger than the rest
>>> combined...
>>> More particularly, though, if you've got a single user uploading documents,
>>> each creating a new bucket, then those bucket creates are going to serialize
>>> on this one object.
>>> -Greg
>>>
>>>
>>>
>>>
>>> should run your own experiments to figure out what the limits are
>>> there; perhaps you have an easy way of sharding up documents into
>>> different users.
>>>
>>>
>>> Good advice. We can do that per distributor (an org unit in our
>>> software) to at least compartmentalize any potential locking issues
>>> in this area to that single entity. Still, there would be quite
>>> a lot of buckets/objects per distributor, so some more detail on
>>> the above items would be great.
>>>
>>> Thanks a lot!
>>>
>>>
>>> Daniel
>>>
>>>
>>>
>>>
>>> --
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>
>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux