max_bucket limit -- safe to disable?

greg@xxxxxxxxxxx (Gregory Farnum) · Wed, 10 Sep 2014 10:42:54 -0700

On Wednesday, September 10, 2014, Daniel Schneller <
daniel.schneller at centerdevice.com> wrote:

> On 09 Sep 2014, at 21:43, Gregory Farnum <greg at inktank.com> wrote:
>
>
> Yehuda can talk about this with more expertise than I can, but I think
> it should be basically fine. By creating so many buckets you're
> decreasing the effectiveness of RGW's metadata caching, which means
> the initial lookup in a particular bucket might take longer.
>
>
> Thanks for your thoughts. With "initial lookup in a particular bucket"
> do you mean accessing any of the objects in a bucket? If we directly
> access the object (not enumerating the bucket's content), would that
> still be an issue?
> Just trying to understand the inner workings a bit better to make
> more educated guesses :)
>

When doing an object lookup, the gateway combines the "bucket ID" with a
mangled version of the object name and uses the result to read from RADOS.
It first needs to get that bucket ID, though -- it caches the bucket
name->ID mapping, but if you have a ton of buckets there could be enough
entries to degrade the cache's effectiveness. (So you're more likely to
pay for that extra disk lookup.)
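
To get a feel for the caching effect described above, here is a rough
sketch of a toy LRU cache for the bucket name->ID mapping. It is not RGW's
actual code: the class name, the capacity of 1000 entries, the uniform
access pattern, and the "id-" placeholder are all assumptions made for
illustration only.

```python
import random
from collections import OrderedDict


class BucketIdCache:
    """Toy LRU cache for bucket name -> bucket ID (hypothetical, not RGW code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, bucket_name):
        if bucket_name in self.entries:
            # Cache hit: mark the entry as most recently used.
            self.entries.move_to_end(bucket_name)
            self.hits += 1
            return self.entries[bucket_name]
        # Cache miss: this stands in for the extra read needed to fetch the ID.
        self.misses += 1
        bucket_id = "id-" + bucket_name
        self.entries[bucket_name] = bucket_id
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry.
            self.entries.popitem(last=False)
        return bucket_id


def hit_rate(num_buckets, capacity=1000, requests=20000, seed=0):
    """Fraction of lookups served from cache under uniform random access."""
    rng = random.Random(seed)
    cache = BucketIdCache(capacity)
    for _ in range(requests):
        cache.lookup("bucket-%d" % rng.randrange(num_buckets))
    return cache.hits / requests


if __name__ == "__main__":
    for n in (500, 5000, 50000):
        print(n, "buckets -> hit rate", round(hit_rate(n), 3))
```

With fewer buckets than cache slots, almost every lookup is a hit; once the
bucket count is well past the cache size, most lookups miss. Real access
patterns are rarely uniform, so treat this as a lower bound on how well the
cache might do, not a prediction.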

>
>
> The big concern is that we do maintain a per-user list of all their
> buckets -- which is stored in a single RADOS object -- so if you have an
> extreme number of buckets that RADOS object could get pretty big and
> become a bottleneck when creating/removing/listing the buckets. You
>
>
> Alright. Listing buckets is no problem, that we don't do. Can you
> say what "pretty big" would be in terms of MB? How much space does a
> bucket record consume in there? Based on that I could run a few numbers.
>

Uh, a kilobyte per bucket? You could look it up in the source (I'm on my
phone) but I *believe* the bucket name is allowed to be larger than the
rest combined...
More particularly, though, if you've got a single user uploading documents,
each creating a new bucket, then those bucket creates are going to
serialize on this one object.
-Greg
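
For running the numbers, a back-of-the-envelope sketch follows. It takes the
unverified "a kilobyte per bucket" guess above as its only input; the
function name and the even split of buckets across users are assumptions
for illustration, not anything taken from the RGW source.

```python
# Assumption: roughly 1 KiB per bucket entry, per the unverified guess above.
BYTES_PER_BUCKET = 1024


def bucket_list_size_mib(num_buckets, num_users=1,
                         bytes_per_bucket=BYTES_PER_BUCKET):
    """Approximate size (MiB) of one user's bucket-list object.

    Assumes buckets are spread evenly across num_users users.
    """
    buckets_per_user = num_buckets / num_users
    return buckets_per_user * bytes_per_bucket / (1024 * 1024)


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        one_user = bucket_list_size_mib(n)
        hundred_users = bucket_list_size_mib(n, num_users=100)
        print(f"{n:>9} buckets: ~{one_user:.1f} MiB for one user, "
              f"~{hundred_users:.2f} MiB each across 100 users")
```

Under that guess, a million buckets under one user comes to just under
1 GiB in a single object, while spreading them over 100 users brings each
list down to about 10 MiB. Size is only part of it: bucket creates still
serialize per user, so sharding across users helps on both counts.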

>
>
> should run your own experiments to figure out what the limits are
> there; perhaps you have an easy way of sharding up documents into
> different users.
>
>
> Good advice. We can do that per distributor (an org unit in our
> software) to at least compartmentalize any potential locking issues
> in this area to that single entity. Still, there would be quite
> a lot of buckets/objects per distributor, so some more detail on
> the above items would be great.
>
> Thanks a lot!
>
>
> Daniel
>

-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com