It'd be interesting to see which rados operation is slowing down the
requests. Can you provide a log dump of a request (with 'debug rgw = 20'
and 'debug ms = 1')? This might give us a better idea as to what's
going on.

Thanks,
Yehuda

On Mon, Oct 6, 2014 at 10:05 AM, Daniel Schneller
<daniel.schneller@xxxxxxxxxxxxxxxx> wrote:
> Hi again!
>
> We have done some tests regarding the limits of storing lots and
> lots of buckets through Rados Gateway into Ceph.
>
> Our test used a single user for which we removed the default max
> buckets limit. It then continuously created containers - both empty
> ones and ones containing 10 objects of around 100 KB of random data
> each.
>
> With 3 parallel processes we saw relatively consistent times of
> about 500-700 ms per container.
>
> This remained steady until we reached approx. 3 million containers,
> after which the time per insert rose sharply to around 1600 ms and
> kept climbing. Due to some hiccups with network equipment the tests
> were aborted a few times, but then resumed without deleting the
> containers created by the previous runs, so the actual number might
> be 2.8 or 3.2 million, but still in that ballpark. We aborted the
> test here.
>
> Judging by the advice given earlier (see quoted mail below) that we
> might hit a limit on some per-user data structures, we created
> another user account, removed its max-buckets limit as well and
> restarted the benchmark with that one, _expecting_ the times to be
> back down to the original range of 500-700 ms.
>
> However, what we are seeing is that the times stay at 1600 ms and
> above even for that fresh account.
>
> Here is the output of `rados df`, reformatted to fit the email.
> Clones, degraded and unfound were 0 in all cases and have been
> left out for clarity:
>
> .rgw
> =========================
> KB:            1,966,932
> objects:       9,094,552
> rd:          195,747,645
> rd KB:       153,585,472
> wr:           30,191,844
> wr KB:        10,751,065
>
> .rgw.buckets
> =========================
> KB:        2,038,313,855
> objects:      22,088,103
> rd:            5,455,123
> rd KB:       408,416,317
> wr:          149,377,728
> wr KB:     1,882,517,472
>
> .rgw.buckets.index
> =========================
> KB:                    0
> objects:       5,374,376
> rd:          267,996,778
> rd KB:       262,626,106
> wr:          107,142,891
> wr KB:                 0
>
> .rgw.control
> =========================
> KB:                    0
> objects:               8
> rd:                    0
> rd KB:                 0
> wr:                    0
> wr KB:                 0
>
> .rgw.gc
> =========================
> KB:                    0
> objects:              32
> rd:            5,554,407
> rd KB:         5,713,942
> wr:            8,355,934
> wr KB:                 0
>
> .rgw.root
> =========================
> KB:                    1
> objects:               3
> rd:                  524
> rd KB:               346
> wr:                    3
> wr KB:                 3
>
>
> We would very much like to understand what is going on here in order
> to decide whether Rados Gateway is a viable option to base our
> production system on (where we expect counts similar to those in the
> benchmark), or whether we need to investigate using librados
> directly, which we would like to avoid if possible.
>
> Any advice on which configuration parameters to check or which
> additional information to provide to analyze this would be very much
> welcome.
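>
> For reference, the per-process creation loop looks roughly like the
> sketch below (illustrative only -- boto3 stands in for the actual
> client, and endpoint, credentials and bucket naming are placeholders;
> three such processes ran in parallel):
>
>     import os
>     import time
>     import uuid
>
>     import boto3
>
>     # Placeholder endpoint and credentials -- adjust to the local
>     # RGW setup.
>     s3 = boto3.client(
>         "s3",
>         region_name="us-east-1",
>         endpoint_url="http://rgw.example.com:7480",
>         aws_access_key_id="ACCESS_KEY",
>         aws_secret_access_key="SECRET_KEY",
>     )
>
>     count = 0
>     while True:
>         count += 1
>         bucket = "bench-{}".format(uuid.uuid4().hex)
>         start = time.time()
>         s3.create_bucket(Bucket=bucket)
>         # Alternate between empty containers and containers holding
>         # 10 objects of ~100 KB random data each.
>         if count % 2 == 0:
>             for i in range(10):
>                 s3.put_object(Bucket=bucket,
>                               Key="obj-{}".format(i),
>                               Body=os.urandom(100 * 1024))
>         elapsed_ms = (time.time() - start) * 1000
>         print("{} {:.0f} ms".format(bucket, elapsed_ms))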
>
> Cheers,
> Daniel
>
>
> --
> Daniel Schneller
> Mobile Development Lead
>
> CenterDevice GmbH                  | Merscheider Straße 1
>                                    | 42699 Solingen
> tel: +49 1754155711                | Deutschland
> daniel.schneller@xxxxxxxxxxxxxxxx  | www.centerdevice.com
>
>
>
>
> On 10 Sep 2014, at 19:42, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>
> On Wednesday, September 10, 2014, Daniel Schneller
> <daniel.schneller@xxxxxxxxxxxxxxxx> wrote:
>>
>> On 09 Sep 2014, at 21:43, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>>
>> Yehuda can talk about this with more expertise than I can, but I think
>> it should be basically fine. By creating so many buckets you're
>> decreasing the effectiveness of RGW's metadata caching, which means
>> the initial lookup in a particular bucket might take longer.
>>
>>
>> Thanks for your thoughts. With “initial lookup in a particular bucket”
>> do you mean accessing any of the objects in a bucket? If we directly
>> access the object (not enumerating the bucket's contents), would that
>> still be an issue?
>> Just trying to understand the inner workings a bit better to make
>> more educated guesses :)
>
>
> When doing an object lookup, the gateway combines the "bucket ID" with a
> mangled version of the object name to try and do a read out of RADOS. It
> first needs to get that bucket ID though -- it will cache the bucket
> name->ID mapping, but if you have a ton of buckets there could be enough
> entries to degrade the cache's effectiveness. (So, you're more likely to
> pay that extra disk access lookup.)
>
>>
>>
>> The big concern is that we do maintain a per-user list of all their
>> buckets — which is stored in a single RADOS object — so if you have an
>> extreme number of buckets that RADOS object could get pretty big and
>> become a bottleneck when creating/removing/listing the buckets. You
>>
>>
>> Alright. Listing buckets is no problem, that we don't do. Can you
>> say what "pretty big" would be in terms of MB? How much space does a
>> bucket record consume in there? Based on that I could run a few numbers.
>
>
> Uh, a kilobyte per bucket? You could look it up in the source (I'm on my
> phone) but I *believe* the bucket name is allowed to be larger than the
> rest combined...
> More particularly, though, if you've got a single user uploading
> documents, each creating a new bucket, then those bucket creates are
> going to serialize on this one object.
> -Greg
>
>>
>>
>> should run your own experiments to figure out what the limits are
>> there; perhaps you have an easy way of sharding up documents into
>> different users.
>>
>>
>> Good advice. We can do that per distributor (an org unit in our
>> software) to at least compartmentalize any potential locking issues
>> in this area to that single entity. Still, there would be quite
>> a lot of buckets/objects per distributor, so some more detail on
>> the above items would be great.
>>
>> Thanks a lot!
>>
>>
>> Daniel
>
>
>
> --
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
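
P.S. For anyone reproducing this: one way to capture the log dump Yehuda
asks for at the top is to raise the debug levels in the gateway's client
section of ceph.conf and restart the radosgw process before replaying a
slow request. A minimal sketch (the section name and log path depend on
how the local radosgw instance is set up):

    [client.radosgw.gateway]
        debug rgw = 20
        debug ms = 1
        log file = /var/log/ceph/client.radosgw.gateway.log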