On Wednesday, September 10, 2014, Daniel Schneller
<daniel.schneller at centerdevice.com> wrote:

> On 09 Sep 2014, at 21:43, Gregory Farnum <greg at inktank.com> wrote:
>
> > Yehuda can talk about this with more expertise than I can, but I think
> > it should be basically fine. By creating so many buckets you're
> > decreasing the effectiveness of RGW's metadata caching, which means
> > the initial lookup in a particular bucket might take longer.
>
> Thanks for your thoughts. With "initial lookup in a particular bucket"
> do you mean accessing any of the objects in a bucket? If we directly
> access the object (not enumerating the bucket's content), would that
> still be an issue?
> Just trying to understand the inner workings a bit better to make
> more educated guesses :)

When doing an object lookup, the gateway combines the "bucket ID" with a
mangled version of the object name to try and do a read out of RADOS. It
first needs to get that bucket ID though -- it will cache the bucket
name->ID mapping, but if you have a ton of buckets there could be enough
entries to degrade the cache's effectiveness. (So, you're more likely to
pay that extra disk access lookup.)

> > The big concern is that we do maintain a per-user list of all their
> > buckets -- which is stored in a single RADOS object -- so if you have an
> > extreme number of buckets that RADOS object could get pretty big and
> > become a bottleneck when creating/removing/listing the buckets. You
>
> Alright. Listing buckets is no problem, that we don't do. Can you
> say what "pretty big" would be in terms of MB? How much space does a
> bucket record consume in there? Based on that I could run a few numbers.

Uh, a kilobyte per bucket? You could look it up in the source (I'm on my
phone) but I *believe* the bucket name is allowed to be larger than the
rest combined...
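To run a few numbers on that guess, here is a rough back-of-the-envelope
sketch. It assumes about 1 KB per bucket entry, which is only the
unverified estimate given in this thread and not a measured value; the
function name and the sample bucket counts are made up for illustration.

```python
# Back-of-the-envelope estimate of the per-user bucket list size.
# ASSUMPTION: ~1 KB per bucket entry, the unverified guess from this thread.
ASSUMED_BYTES_PER_BUCKET = 1024


def estimate_bucket_list_mb(num_buckets, bytes_per_bucket=ASSUMED_BYTES_PER_BUCKET):
    """Return the estimated size in MiB of one user's bucket list object."""
    return num_buckets * bytes_per_bucket / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical bucket counts, chosen only to show the scaling.
    for n in (10_000, 100_000, 1_000_000):
        print(f"{n:>9} buckets -> ~{estimate_bucket_list_mb(n):.1f} MiB")
```

Under that assumption the size grows linearly, so a million buckets under
one user would put the list on the order of a gigabyte. The real per-entry
size would have to be checked in the source or measured.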
More particularly, though, if you've got a single user uploading
documents, each creating a new bucket, then those bucket creates are going
to serialize on this one object.
-Greg

> > should run your own experiments to figure out what the limits are
> > there; perhaps you have an easy way of sharding up documents into
> > different users.
>
> Good advice. We can do that per distributor (an org unit in our
> software) to at least compartmentalize any potential locking issues
> in this area to that single entity. Still, there would be quite
> a lot of buckets/objects per distributor, so some more detail on
> the above items would be great.
>
> Thanks a lot!
>
> Daniel

--
Software Engineer #42 @ http://inktank.com | http://ceph.com
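The sharding idea from the thread, spreading documents over several users
so that bucket creates serialize only on each user's own bucket list,
might be sketched roughly as follows. The user naming scheme, the optional
hash-based sub-sharding within a distributor, and the function name are
all hypothetical; this only computes a name and does not call any RGW API.

```python
import hashlib


def rgw_user_for(distributor_id, document_id, shards_per_distributor=1):
    """Pick a (hypothetical) RGW user name for a document.

    Each distributor gets its own user. Optionally, its documents are
    spread over several users with a stable hash, so that no single
    per-user bucket list object takes all of the bucket creates.
    """
    if shards_per_distributor < 1:
        raise ValueError("shards_per_distributor must be >= 1")
    # SHA-256 is used only as a stable, evenly distributed hash here.
    digest = hashlib.sha256(document_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:8], "big") % shards_per_distributor
    return f"dist-{distributor_id}-s{shard}"


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    print(rgw_user_for("acme", "doc-0001"))
    print(rgw_user_for("acme", "doc-0001", shards_per_distributor=8))
```

With one shard per distributor this is simply one user per distributor, as
proposed above. Whether further sharding is needed is something only an
experiment against a real cluster can tell.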