Re: max_bucket limit -- safe to disable?

Hi!

I have re-run our test as follows:

* 4 Rados Gateways on 4 bare-metal machines which have
  a total of 48 spinning-rust OSDs.

* Benchmark run on a virtual machine talking to HAProxy
  which balances the requests across the 4 Rados GWs.

* Three instances of the benchmark run in parallel. Each
  instance creates 1000 containers and puts 11 objects into
  each container. Once they have all been created, each
  instance deletes its own containers again (see the sketch
  below).
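
For reference, here is a minimal sketch of what one benchmark
instance does, expressed with the stock `swift` CLI (our actual
tool is custom, and authentication via the usual ST_AUTH/ST_USER/
ST_KEY environment variables is assumed):

  # one benchmark instance, minimal sketch -- not our actual tool
  for i in $(seq 1 1000); do
      CNT="stresstest-$(uuidgen)"
      swift post "$CNT"                          # create the container
      for n in $(seq 1 11); do
          head -c 102400 /dev/urandom > obj.dat  # ~100k of random data
          swift upload "$CNT" obj.dat --object-name "obj-$n"
      done
      echo "$CNT" >> created.txt
  done
  # once everything has been created, delete our own containers
  while read -r CNT; do swift delete "$CNT"; done < created.txt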

I configured one of the radosgws with the debug levels
you requested. The tests produced quite an amount of data
(approx. 1GB of text), so I took the liberty of
pre-processing it a bit.
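
For the record, that boils down to the following snippet in
ceph.conf on the debug-enabled gateway (the exact client section
name depends on what the radosgw instance is called, ours differs):

  [client.radosgw.gateway]
      debug rgw = 20
      debug ms = 1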

In this run we landed at around 1.2s per container
created (including its objects) on average. This was
slightly better than the 1.6s we witnessed before, but
that test ran for much longer, so there might have been
some queue-up effect.

Interestingly enough, the average is somewhat misleading.
The logs below each show the creation of a single object
in a container: one is the fastest request of this
benchmark (at least on the debug-enabled radosgw), the
other the slowest.

The fastest one completed in 0.013s, the slowest one
took 4.93s(!).

The attached logs have been cleaned up so that each shows
just a single request, with longish but constant
information replaced by placeholders. Our container names
are of the form “stresstest-xxxxxxxxxxx”, which I shortened
to “<CNT-UUID>” for brevity. I also removed the redundant
line prefix (date, hour, and minute of day).

The column before the log level looks like a thread ID.
Since I focused on a single request, I removed all lines
that did not match that ID and replaced the actual value
with “<ID>”. That makes the logs much easier to read and
understand.
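
In shell terms the clean-up was roughly this (thread ID, container
name, timestamp prefix and file name are placeholders, the real
values differ per request):

  CNT="stresstest-..."   # container name of the request in question
  grep ' <thread-id> ' client.radosgw.log \
    | sed -e 's/<thread-id>/<ID>/' \
          -e "s/$CNT/<CNT-UUID>/g" \
          -e 's/^2014-10-07 22:43://' > rados_longest.txt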

In case I removed too much information for the logs to be
useful, the complete log is available for download in
bzip2-compressed form. Just let me know if you need it and
I will provide a link via direct email.

To me it seems there might indeed be a contention issue.
It would be interesting to know whether this is correct
and whether there are any settings we could adjust to
alleviate it.

Daniel

==============================

➜  cat rados_shortest.txt 
21.431185 <ID> 20 QUERY_STRING=page=swift&params=/v1/<CNT-UUID>/version
21.431187 <ID> 20 REMOTE_ADDR=10.102.8.140
21.431188 <ID> 20 REMOTE_PORT=44007
21.431189 <ID> 20 REQUEST_METHOD=PUT
21.431190 <ID> 20 REQUEST_SCHEME=https
21.431191 <ID> 20 REQUEST_URI=/swift/v1/<CNT-UUID>/version
21.431192 <ID> 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi
21.431193 <ID> 20 SCRIPT_NAME=/swift/v1/<CNT-UUID>/version
21.431195 <ID> 20 SCRIPT_URL=/swift/v1/<CNT-UUID>/version
21.431196 <ID> 20 SERVER_ADDR=10.102.9.11
21.431197 <ID> 20 SERVER_ADMIN=[no address given]
21.431198 <ID> 20 SERVER_NAME=localhost
21.431199 <ID> 20 SERVER_PORT=8405
21.431200 <ID> 20 SERVER_PORT_SECURE=443
21.431201 <ID> 20 SERVER_PROTOCOL=HTTP/1.1
21.431202 <ID> 20 SERVER_SIGNATURE=
21.431203 <ID> 20 SERVER_SOFTWARE=Apache/2.4.7 (Ubuntu)
21.431205 <ID>  1 ====== starting new request req=0x7f038c019e90 =====
21.431219 <ID>  2 req 980641:0.000015::PUT /swift/v1/<CNT-UUID>/version::initializing
21.431259 <ID> 10 ver=v1 first=<CNT-UUID> req=version
21.431265 <ID> 10 s->object=version s->bucket=<CNT-UUID>
21.431269 <ID>  2 req 980641:0.000065:swift:PUT /swift/v1/<CNT-UUID>/version::getting op
21.431274 <ID>  2 req 980641:0.000070:swift:PUT /swift/v1/<CNT-UUID>/version:put_obj:authorizing
21.431321 <ID> 20 get_obj_state: rctx=0x7f030800b720 obj=.users.swift:documentstore:swift state=0x7f03080f31e8 s->prefetch_data=0
21.431332 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit
21.431338 <ID> 20 get_obj_state: s->obj_tag was set empty
21.431344 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit
21.431369 <ID> 20 get_obj_state: rctx=0x7f030800b720 obj=.users.uid:documentstore state=0x7f03080f31e8 s->prefetch_data=0
21.431374 <ID> 10 cache get: name=.users.uid+documentstore : hit
21.431378 <ID> 20 get_obj_state: s->obj_tag was set empty
21.431382 <ID> 10 cache get: name=.users.uid+documentstore : hit
21.431401 <ID> 10 swift_user=documentstore:swift
21.431416 <ID> 20 build_token token=13000000646f63756d656e7473746f72653a737769667406a4b2ba3999f8a84f45355438d8ff17
21.431467 <ID>  2 req 980641:0.000262:swift:PUT /swift/v1/<CNT-UUID>/version:put_obj:reading permissions
21.431493 <ID> 20 get_obj_state: rctx=0x7f03837ed250 obj=.rgw:<CNT-UUID> state=0x7f03080f31e8 s->prefetch_data=0
21.431508 <ID> 10 cache get: name=.rgw+<CNT-UUID> : type miss (requested=22, cached=19)
21.433081 <ID> 10 cache put: name=.rgw+<CNT-UUID>
21.433106 <ID> 10 removing entry: name=.rgw+stresstest-ab9ee3e2-dcf5-4a5b-ab40-931d94c7784038242 from cache LRU
21.433114 <ID> 10 moving .rgw+<CNT-UUID> to cache LRU end
21.433120 <ID> 20 get_obj_state: s->obj_tag was set empty
21.433122 <ID> 20 Read xattr: user.rgw.idtag
21.433124 <ID> 20 Read xattr: user.rgw.manifest
21.433129 <ID> 10 cache get: name=.rgw+<CNT-UUID> : hit
21.433141 <ID> 20 rgw_get_bucket_info: bucket instance: <CNT-UUID>(@{i=.rgw.buckets.index}.rgw.buckets[default.78418684.118911])
21.433148 <ID> 20 reading from .rgw:.bucket.meta.<CNT-UUID>:default.78418684.118911
21.433169 <ID> 20 get_obj_state: rctx=0x7f03837ed250 obj=.rgw:.bucket.meta.<CNT-UUID>:default.78418684.118911 state=0x7f0308005778 s->prefetch_data=0
21.433185 <ID> 10 cache get: name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 : type miss (requested=22, cached=19)
21.434632 <ID> 10 cache put: name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911
21.434650 <ID> 10 moving .rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 to cache LRU end
21.434657 <ID> 10 updating xattr: name=user.rgw.acl bl.length()=177
21.434664 <ID> 20 get_obj_state: s->obj_tag was set empty
21.434667 <ID> 20 Read xattr: user.rgw.acl
21.434668 <ID> 20 Read xattr: user.rgw.idtag
21.434669 <ID> 20 Read xattr: user.rgw.manifest
21.434675 <ID> 10 cache get: name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 : hit
21.434701 <ID> 15 Read AccessControlPolicy<AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>documentstore</ID><DisplayName>Document Store</DisplayName></Owner><AccessControlList><Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="CanonicalUser"><ID>documentstore</ID><DisplayName>Document Store</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
21.434710 <ID>  2 req 980641:0.003506:swift:PUT /swift/v1/<CNT-UUID>/version:put_obj:init op
21.434716 <ID>  2 req 980641:0.003512:swift:PUT /swift/v1/<CNT-UUID>/version:put_obj:verifying op mask
21.434719 <ID> 20 required_mask= 2 user.op_mask=7
21.434720 <ID>  2 req 980641:0.003516:swift:PUT /swift/v1/<CNT-UUID>/version:put_obj:verifying op permissions
21.434723 <ID>  5 Searching permissions for uid=documentstore mask=50
21.434725 <ID>  5 Found permission: 15
21.434726 <ID>  5 Searching permissions for group=1 mask=50
21.434727 <ID>  5 Permissions for group not found
21.434729 <ID>  5 Searching permissions for group=2 mask=50
21.434730 <ID>  5 Permissions for group not found
21.434731 <ID>  5 Getting permissions id=documentstore owner=documentstore perm=2
21.434732 <ID> 10  uid=documentstore requested perm (type)=2, policy perm=2, user_perm_mask=2, acl perm=2
21.434734 <ID>  2 req 980641:0.003530:swift:PUT /swift/v1/<CNT-UUID>/version:put_obj:verifying op params
21.434737 <ID>  2 req 980641:0.003533:swift:PUT /swift/v1/<CNT-UUID>/version:put_obj:executing
21.434850 <ID> 20 get_obj_state: rctx=0x7f03837ed250 obj=<CNT-UUID>:version state=0x7f03080c5378 s->prefetch_data=0
21.436144 <ID> 10 setting object write_tag=default.78418684.980641
21.444613 <ID>  2 req 980641:0.013408:swift:PUT /swift/v1/<CNT-UUID>/version:put_obj:http status=201

==============================

➜  cat rados_longest.txt 
31.886128 <ID> 20 QUERY_STRING=page=swift&params=/v1/<CNT-UUID>/version
31.886129 <ID> 20 REMOTE_ADDR=10.102.8.140
31.886130 <ID> 20 REMOTE_PORT=46714
31.886131 <ID> 20 REQUEST_METHOD=PUT
31.886132 <ID> 20 REQUEST_SCHEME=https
31.886134 <ID> 20 REQUEST_URI=/swift/v1/<CNT-UUID>/version
31.886135 <ID> 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi
31.886136 <ID> 20 SCRIPT_NAME=/swift/v1/<CNT-UUID>/version
31.886138 <ID> 20 SCRIPT_URL=/swift/v1/<CNT-UUID>/version
31.886139 <ID> 20 SERVER_ADDR=10.102.9.11
31.886140 <ID> 20 SERVER_ADMIN=[no address given]
31.886141 <ID> 20 SERVER_NAME=localhost
31.886143 <ID> 20 SERVER_PORT=8405
31.886144 <ID> 20 SERVER_PORT_SECURE=443
31.886145 <ID> 20 SERVER_PROTOCOL=HTTP/1.1
31.886146 <ID> 20 SERVER_SIGNATURE=
31.886147 <ID> 20 SERVER_SOFTWARE=Apache/2.4.7 (Ubuntu)
31.886148 <ID>  1 ====== starting new request req=0x7f038c024c50 =====
31.886162 <ID>  2 req 983095:0.000013::PUT /swift/v1/<CNT-UUID>/version::initializing
31.886195 <ID> 10 ver=v1 first=<CNT-UUID> req=version
31.886200 <ID> 10 s->object=version s->bucket=<CNT-UUID>
31.886203 <ID>  2 req 983095:0.000055:swift:PUT /swift/v1/<CNT-UUID>/version::getting op
31.886208 <ID>  2 req 983095:0.000060:swift:PUT /swift/v1/<CNT-UUID>/version:put_obj:authorizing
31.886242 <ID> 20 get_obj_state: rctx=0x7f02b0062770 obj=.users.swift:documentstore:swift state=0x7f02b007ac18 s->prefetch_data=0
31.886250 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit
31.886255 <ID> 20 get_obj_state: s->obj_tag was set empty
31.886260 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit
31.886297 <ID> 20 get_obj_state: rctx=0x7f02b0062770 obj=.users.uid:documentstore state=0x7f02b007ac18 s->prefetch_data=0
31.886303 <ID> 10 cache get: name=.users.uid+documentstore : hit
31.886308 <ID> 20 get_obj_state: s->obj_tag was set empty
31.886312 <ID> 10 cache get: name=.users.uid+documentstore : hit
31.886329 <ID> 10 swift_user=documentstore:swift
31.886343 <ID> 20 build_token token=13000000646f63756d656e7473746f72653a737769667410f2006ed65dbbaa4f453554207a0f1f
31.886393 <ID>  2 req 983095:0.000245:swift:PUT /swift/v1/<CNT-UUID>/version:put_obj:reading permissions
31.886419 <ID> 20 get_obj_state: rctx=0x7f0378fd8250 obj=.rgw:<CNT-UUID> state=0x7f02b007ac18 s->prefetch_data=0
31.886430 <ID> 10 cache get: name=.rgw+<CNT-UUID> : type miss (requested=22, cached=19)
36.746327 <ID> 10 cache put: name=.rgw+<CNT-UUID>
36.746404 <ID> 10 moving .rgw+<CNT-UUID> to cache LRU end
36.746426 <ID> 20 get_obj_state: s->obj_tag was set empty
36.746431 <ID> 20 Read xattr: user.rgw.idtag
36.746433 <ID> 20 Read xattr: user.rgw.manifest
36.746452 <ID> 10 cache get: name=.rgw+<CNT-UUID> : hit
36.746481 <ID> 20 rgw_get_bucket_info: bucket instance: <CNT-UUID>(@{i=.rgw.buckets.index}.rgw.buckets[default.78418684.119116])
36.746491 <ID> 20 reading from .rgw:.bucket.meta.<CNT-UUID>:default.78418684.119116
36.746549 <ID> 20 get_obj_state: rctx=0x7f0378fd8250 obj=.rgw:.bucket.meta.<CNT-UUID>:default.78418684.119116 state=0x7f02b00ce638 s->prefetch_data=0
36.746585 <ID> 10 cache get: name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 : type miss (requested=22, cached=19)
36.747938 <ID> 10 cache put: name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116
36.747955 <ID> 10 moving .rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 to cache LRU end
36.747963 <ID> 10 updating xattr: name=user.rgw.acl bl.length()=177
36.747972 <ID> 20 get_obj_state: s->obj_tag was set empty
36.747975 <ID> 20 Read xattr: user.rgw.acl
36.747977 <ID> 20 Read xattr: user.rgw.idtag
36.747978 <ID> 20 Read xattr: user.rgw.manifest
36.747985 <ID> 10 cache get: name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 : hit
36.748025 <ID> 15 Read AccessControlPolicy<AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>documentstore</ID><DisplayName>Document Store</DisplayName></Owner><AccessControlList><Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="CanonicalUser"><ID>documentstore</ID><DisplayName>Document Store</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
36.748037 <ID>  2 req 983095:4.861888:swift:PUT /swift/v1/<CNT-UUID>/version:put_obj:init op
36.748043 <ID>  2 req 983095:4.861895:swift:PUT /swift/v1/<CNT-UUID>/version:put_obj:verifying op mask
36.748046 <ID> 20 required_mask= 2 user.op_mask=7
36.748050 <ID>  2 req 983095:4.861902:swift:PUT /swift/v1/<CNT-UUID>/version:put_obj:verifying op permissions
36.748054 <ID>  5 Searching permissions for uid=documentstore mask=50
36.748056 <ID>  5 Found permission: 15
36.748058 <ID>  5 Searching permissions for group=1 mask=50
36.748060 <ID>  5 Permissions for group not found
36.748061 <ID>  5 Searching permissions for group=2 mask=50
36.748063 <ID>  5 Permissions for group not found
36.748064 <ID>  5 Getting permissions id=documentstore owner=documentstore perm=2
36.748066 <ID> 10  uid=documentstore requested perm (type)=2, policy perm=2, user_perm_mask=2, acl perm=2
36.748069 <ID>  2 req 983095:4.861921:swift:PUT /swift/v1/<CNT-UUID>/version:put_obj:verifying op params
36.748072 <ID>  2 req 983095:4.861924:swift:PUT /swift/v1/<CNT-UUID>/version:put_obj:executing
36.748200 <ID> 20 get_obj_state: rctx=0x7f0378fd8250 obj=<CNT-UUID>:version state=0x7f02b0042618 s->prefetch_data=0
36.802077 <ID> 10 setting object write_tag=default.78418684.983095
36.818727 <ID>  2 req 983095:4.932579:swift:PUT /swift/v1/<CNT-UUID>/version:put_obj:http status=201

==============================


-- 
Daniel Schneller
Mobile Development Lead
 
CenterDevice GmbH                  | Merscheider Straße 1
                                   | 42699 Solingen
tel: +49 1754155711                | Deutschland



On 06 Oct 2014, at 19:26, Yehuda Sadeh <yehuda@xxxxxxxxxx> wrote:

It'd be interesting to see which rados operation is slowing down the
requests. Can you provide a log dump of a request (with 'debug rgw =
20' and 'debug ms = 1')? This might give us a better idea as to
what's going on.

Thanks,
Yehuda

On Mon, Oct 6, 2014 at 10:05 AM, Daniel Schneller
<daniel.schneller@xxxxxxxxxxxxxxxx> wrote:
Hi again!

We have done some tests regarding the limits of storing lots and
lots of buckets through Rados Gateway into Ceph.

Our test used a single user for which we removed the default max
buckets limit. It then continuously created containers - both empty
ones and ones with 10 objects of around 100k of random data each.

With 3 parallel processes we saw relatively consistent times of
about 500-700ms per such container.

This kept steady until we reached approx. 3 million containers,
after which the time per insert went up sharply to currently
around 1600ms and rising. Due to some hiccups with network
equipment the tests were aborted a few times, but then resumed
without deleting the containers created by previous runs, so the
actual number might be 2.8 or 3.2 million, but still in that
ballpark. We aborted the test here.

Judging by the advice given earlier (see quoted mail below) that
we might hit a limit on some per-user data structures, we created
another user account, removed its max-buckets limit as well and
restarted the benchmark with that one, _expecting_ the times to be
back down in the original range of 500-700ms.

However, what we are seeing is that the times stay at the 1600ms
and higher levels even for that fresh account.

Here is the output of `rados df`, reformatted to fit the email.
The clones, degraded, and unfound columns were 0 in all cases and
have been left out for clarity:

.rgw
=========================
      KB:     1,966,932
 objects:     9,094,552
      rd:   195,747,645
   rd KB:   153,585,472
      wr:    30,191,844
   wr KB:    10,751,065

.rgw.buckets
=========================
      KB: 2,038,313,855
 objects:    22,088,103
      rd:     5,455,123
   rd KB:   408,416,317
      wr:   149,377,728
   wr KB: 1,882,517,472

.rgw.buckets.index
=========================
      KB:             0
 objects:     5,374,376
      rd:   267,996,778
   rd KB:   262,626,106
      wr:   107,142,891
   wr KB:             0

.rgw.control
=========================
      KB:             0
 objects:             8
      rd:             0
   rd KB:             0
      wr:             0
   wr KB:             0

.rgw.gc
=========================
      KB:             0
 objects:            32
      rd:     5,554,407
   rd KB:     5,713,942
      wr:     8,355,934
   wr KB:             0

.rgw.root
=========================
      KB:             1
 objects:             3
      rd:           524
   rd KB:           346
      wr:             3
   wr KB:             3


We would very much like to understand what is going on here
in order to decide whether Rados Gateway is a viable option to
base our production system on (where we expect similar counts
as in the benchmark), or whether we need to investigate using
librados directly, which we would like to avoid if possible.

Any advice on what configuration parameters to check or
which additional information to provide to analyze this would be
very much welcome.

Cheers,
Daniel


--
Daniel Schneller
Mobile Development Lead

CenterDevice GmbH                  | Merscheider Straße 1
                                   | 42699 Solingen
tel: +49 1754155711                | Deutschland
daniel.schneller@xxxxxxxxxxxxxxxx  | www.centerdevice.com




On 10 Sep 2014, at 19:42, Gregory Farnum <greg@xxxxxxxxxxx> wrote:

On Wednesday, September 10, 2014, Daniel Schneller
<daniel.schneller@xxxxxxxxxxxxxxxx> wrote:

On 09 Sep 2014, at 21:43, Gregory Farnum <greg@xxxxxxxxxxx> wrote:


Yehuda can talk about this with more expertise than I can, but I think
it should be basically fine. By creating so many buckets you're
decreasing the effectiveness of RGW's metadata caching, which means
the initial lookup in a particular bucket might take longer.


Thanks for your thoughts. With “initial lookup in a particular bucket”
do you mean accessing any of the objects in a bucket? If we directly
access an object (not enumerating the bucket's contents), would that
still be an issue?
Just trying to understand the inner workings a bit better to make
more educated guesses :)


When doing an object lookup, the gateway combines the "bucket ID" with a
mangled version of the object name to try and do a read out of RADOS. It
first needs to get that bucket ID though -- it will cache the bucket
name->ID mapping, but if you have a ton of buckets there could be enough
entries to degrade the cache's effectiveness. (So, you're more likely to pay
that extra disk access lookup.)




The big concern is that we do maintain a per-user list of all their
buckets — which is stored in a single RADOS object — so if you have an
extreme number of buckets that RADOS object could get pretty big and
become a bottleneck when creating/removing/listing the buckets. You


Alright. Listing buckets is no problem; we don't do that. Can you
say what "pretty big" would be in terms of MB? How much space does a
bucket record consume in there? Based on that I could run a few numbers.


Uh, a kilobyte per bucket? You could look it up in the source (I'm on my
phone) but I *believe* the bucket name is allowed to be larger than the rest
combined...
More particularly, though, if you've got a single user uploading documents,
each creating a new bucket, then those bucket creates are going to serialize
on this one object.
-Greg




should run your own experiments to figure out what the limits are
there; perhaps you have an easy way of sharding up documents into
different users.


Good advice. We can do that per distributor (an org unit in our
software) to at least compartmentalize any potential locking issues
in this area to that single entity - see the sketch below. Still,
there would be quite a lot of buckets/objects per distributor, so
some more detail on the above items would be great.
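
For illustration, something like this is what I have in mind - a
minimal sketch, with a made-up naming scheme, and the bucket limit
lifted the same way as for our test users:

  # one RGW user per distributor, so the per-user bucket list (and
  # any serialization on it) stays confined to that org unit
  DIST="acme"   # hypothetical distributor id
  radosgw-admin user create --uid="docs-$DIST" \
      --display-name="Document Store ($DIST)"
  radosgw-admin user modify --uid="docs-$DIST" --max-buckets=0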

Thanks a lot!


Daniel



--
Software Engineer #42 @ http://inktank.com | http://ceph.com



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


