It'd be interesting to see which rados operation is slowing down the
requests. Can you provide a log dump of a request (with 'debug rgw = 20'
and 'debug ms = 1')? This might give us a better idea as to what's
going on.

Thanks,
Yehuda

On Mon, Oct 6, 2014 at 10:05 AM, Daniel Schneller
<daniel.schneller@xxxxxxxxxxxxxxxx> wrote:
> Hi again!
>
> We have done some tests regarding the limits of storing lots and
> lots of buckets through Rados Gateway into Ceph.
>
> Our test used a single user for which we removed the default max
> buckets limit. It then continuously created containers - both empty
> ones and ones containing 10 objects of around 100 KB of random data
> each.
>
> With 3 parallel processes we saw relatively consistent times of
> about 500-700 ms per container.
>
> This remained steady until we reached approx. 3 million containers,
> after which the time per insert rose sharply to around 1600 ms and
> kept climbing. Due to some hiccups with network equipment the tests
> were aborted a few times, but then resumed without deleting the
> containers created by the previous runs, so the actual number might
> be 2.8 or 3.2 million, but still in that ballpark. We aborted the
> test here.
>
> Judging by the advice given earlier (see quoted mail below) that we
> might hit a limit on some per-user data structures, we created
> another user account, removed its max-buckets limit as well and
> restarted the benchmark with that one, _expecting_ the times to be
> back down to the original range of 500-700 ms.
>
> However, what we are seeing is that the times stay at 1600 ms and
> above even for that fresh account.
>
> Here is the output of `rados df`, reformatted to fit the email.
> Clones, degraded and unfound were 0 in all cases and have been
> left out for clarity:
>
> .rgw
> =========================
> KB:            1,966,932
> objects:       9,094,552
> rd:          195,747,645
> rd KB:       153,585,472
> wr:           30,191,844
> wr KB:        10,751,065
>
> .rgw.buckets
> =========================
> KB:        2,038,313,855
> objects:      22,088,103
> rd:            5,455,123
> rd KB:       408,416,317
> wr:          149,377,728
> wr KB:     1,882,517,472
>
> .rgw.buckets.index
> =========================
> KB:                    0
> objects:       5,374,376
> rd:          267,996,778
> rd KB:       262,626,106
> wr:          107,142,891
> wr KB:                 0
>
> .rgw.control
> =========================
> KB:                    0
> objects:               8
> rd:                    0
> rd KB:                 0
> wr:                    0
> wr KB:                 0
>
> .rgw.gc
> =========================
> KB:                    0
> objects:              32
> rd:            5,554,407
> rd KB:         5,713,942
> wr:            8,355,934
> wr KB:                 0
>
> .rgw.root
> =========================
> KB:                    1
> objects:               3
> rd:                  524
> rd KB:               346
> wr:                    3
> wr KB:                 3
>
>
> We would very much like to understand what is going on here in order
> to decide whether Rados Gateway is a viable option to base our
> production system on (where we expect counts similar to those in the
> benchmark), or whether we need to investigate using librados
> directly, which we would like to avoid if possible.
>
> Any advice on which configuration parameters to check or which
> additional information to provide to analyze this would be very much
> welcome.
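>
> For reference, the per-process creation loop looks roughly like the
> sketch below (illustrative only -- boto3 stands in for the actual
> client, and endpoint, credentials and bucket naming are placeholders;
> three such processes ran in parallel):
>
>     import os
>     import time
>     import uuid
>
>     import boto3
>
>     # Placeholder endpoint and credentials -- adjust to the local
>     # RGW setup.
>     s3 = boto3.client(
>         "s3",
>         region_name="us-east-1",
>         endpoint_url="http://rgw.example.com:7480",
>         aws_access_key_id="ACCESS_KEY",
>         aws_secret_access_key="SECRET_KEY",
>     )
>
>     count = 0
>     while True:
>         count += 1
>         bucket = "bench-{}".format(uuid.uuid4().hex)
>         start = time.time()
>         s3.create_bucket(Bucket=bucket)
>         # Alternate between empty containers and containers holding
>         # 10 objects of ~100 KB random data each.
>         if count % 2 == 0:
>             for i in range(10):
>                 s3.put_object(Bucket=bucket,
>                               Key="obj-{}".format(i),
>                               Body=os.urandom(100 * 1024))
>         elapsed_ms = (time.time() - start) * 1000
>         print("{} {:.0f} ms".format(bucket, elapsed_ms))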
>
> Cheers,
> Daniel
>
>
> --
> Daniel Schneller
> Mobile Development Lead
>
> CenterDevice GmbH                  | Merscheider Straße 1
>                                    | 42699 Solingen
> tel: +49 1754155711                | Deutschland
> daniel.schneller@xxxxxxxxxxxxxxxx  | www.centerdevice.com
>
>
>
>
> On 10 Sep 2014, at 19:42, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>
> On Wednesday, September 10, 2014, Daniel Schneller
> <daniel.schneller@xxxxxxxxxxxxxxxx> wrote:
>>
>> On 09 Sep 2014, at 21:43, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>>
>> Yehuda can talk about this with more expertise than I can, but I think
>> it should be basically fine. By creating so many buckets you're
>> decreasing the effectiveness of RGW's metadata caching, which means
>> the initial lookup in a particular bucket might take longer.
>>
>>
>> Thanks for your thoughts. With “initial lookup in a particular bucket”
>> do you mean accessing any of the objects in a bucket? If we directly
>> access the object (not enumerating the bucket's contents), would that
>> still be an issue?
>> Just trying to understand the inner workings a bit better to make
>> more educated guesses :)
>
>
> When doing an object lookup, the gateway combines the "bucket ID" with a
> mangled version of the object name to try and do a read out of RADOS. It
> first needs to get that bucket ID though -- it will cache the bucket
> name->ID mapping, but if you have a ton of buckets there could be enough
> entries to degrade the cache's effectiveness. (So, you're more likely to
> pay that extra disk access lookup.)
>
>>
>>
>> The big concern is that we do maintain a per-user list of all their
>> buckets — which is stored in a single RADOS object — so if you have an
>> extreme number of buckets that RADOS object could get pretty big and
>> become a bottleneck when creating/removing/listing the buckets. You
>>
>>
>> Alright. Listing buckets is no problem, that we don't do. Can you
>> say what "pretty big" would be in terms of MB? How much space does a
>> bucket record consume in there? Based on that I could run a few numbers.
>
>
> Uh, a kilobyte per bucket? You could look it up in the source (I'm on my
> phone) but I *believe* the bucket name is allowed to be larger than the
> rest combined...
> More particularly, though, if you've got a single user uploading
> documents, each creating a new bucket, then those bucket creates are
> going to serialize on this one object.
> -Greg
>
>>
>>
>> should run your own experiments to figure out what the limits are
>> there; perhaps you have an easy way of sharding up documents into
>> different users.
>>
>>
>> Good advice. We can do that per distributor (an org unit in our
>> software) to at least compartmentalize any potential locking issues
>> in this area to that single entity. Still, there would be quite
>> a lot of buckets/objects per distributor, so some more detail on
>> the above items would be great.
>>
>> Thanks a lot!
>>
>>
>> Daniel
>
>
>
> --
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
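
P.S. For anyone reproducing this: one way to capture the log dump Yehuda
asks for at the top is to raise the debug levels in the gateway's client
section of ceph.conf and restart the radosgw process before replaying a
slow request. A minimal sketch (the section name and log path depend on
how the local radosgw instance is set up):

    [client.radosgw.gateway]
        debug rgw = 20
        debug ms = 1
        log file = /var/log/ceph/client.radosgw.gateway.log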