On Mon, Oct 21, 2013 at 2:26 AM, Dominik Mostowiec <dominikmostowiec@xxxxxxxxx> wrote:
> Hi,
> Thanks for your response.
>
>> That is definitely the obvious next step, but it's a non-trivial
>> amount of work and hasn't yet been started on by anybody. This is
>> probably a good subject for a CDS blueprint!
> But we want to split our big bucket into smaller ones. We want to
> shard it in front of radosgw.
> Do you think this is a good way to work around this problem
> (big index issues)?

Oh, yes, this is a good workaround. Sorry, I misread your initial post
and thought you were discussing sharding the bucket index itself,
rather than sharding across buckets in the application. :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

>
> Regards
> Dominik
>
>
>
> 2013/10/18 Gregory Farnum <greg@xxxxxxxxxxx>:
>> On Fri, Oct 18, 2013 at 4:01 AM, Dominik Mostowiec
>> <dominikmostowiec@xxxxxxxxx> wrote:
>>> Hi,
>>> I plan to shard my largest bucket because of deep-scrubbing issues
>>> (when the PG that holds this bucket's index is deep-scrubbed, many
>>> slow requests appear and the OSD's memory use grows; after the
>>> latest scrub it grew to 9G).
>>>
>>> I'm trying to find out why a large bucket index causes issues when
>>> it is scrubbed.
>>> On a test cluster:
>>> radosgw-admin bucket stats --bucket=test1-XX
>>> { "bucket": "test1-XX",
>>>   "pool": ".rgw.buckets",
>>>   "index_pool": ".rgw.buckets",
>>>   "id": "default.4211.2",
>>>   ...
>>>
>>> I guess the index is in the object .dir.default.4211.2 (pool: .rgw.buckets).
>>>
>>> rados -p .rgw.buckets get .dir.default.4211.2 -
>>> <empty>
>>>
>>> But:
>>> rados -p .rgw.buckets listomapkeys .dir.default.4211.2
>>> test_file_2.txt
>>> test_file_2_11.txt
>>> test_file_3.txt
>>> test_file_4.txt
>>> test_file_5.txt
>>>
>>> I guess the list of files is stored in leveldb, not in one large file.
>>> 'omap' files are stored in {osd_dir}/current/omap/; the largest file
>>> I found in this directory (on production) is 8.8M.
>>>
>>> I'm a little confused.
>>>
>>> How is the list of files (for a bucket) stored?
>>
>> The index is stored as a bunch of omap entries in a single object.
>>
>>> If the list of objects in a bucket is split across many small files
>>> in leveldb, then a large bucket (with many files) should not cause
>>> higher latency when PUTting a new object.
>>
>> That's not quite how it works. Leveldb has a custom storage format in
>> which it stores sets of keys based on both time of update and the
>> value of the key, so the size of the individual files in its directory
>> has no correlation to the number or size of any given set of entries.
>>
>>> Scrubbing also should not be a problem, I think...
>>
>> The problem you're running into is that scrubbing is done on an
>> object-by-object basis, and so the OSD is reading all of the keys
>> associated with that object out of leveldb, and processing them all
>> at once. This number can be very much larger than the 8MB file you've
>> found in the leveldb directory, as discussed above.
>>
>>> What do you think about using sharding to split big buckets into
>>> smaller ones to avoid the problems with big indexes?
>>
>> That is definitely the obvious next step, but it's a non-trivial
>> amount of work and hasn't yet been started on by anybody. This is
>> probably a good subject for a CDS blueprint!
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
>
> --
> Regards
> Dominik
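[Editor's note] The application-side workaround discussed above (sharding across buckets before requests reach radosgw) amounts to hashing each object key to one of N smaller buckets, so every client deterministically picks the same shard for a given key. A minimal sketch, assuming an S3-style client where the application controls the bucket name; the function name, shard count, and naming scheme are illustrative, not from the thread:

```python
import hashlib

NUM_SHARDS = 16  # illustrative; size this so each bucket's index stays small


def shard_bucket(base_bucket: str, object_key: str) -> str:
    """Deterministically map an object key to one of NUM_SHARDS buckets.

    md5 (rather than Python's hash()) keeps the mapping stable across
    processes and hosts, so every writer and reader agrees on which
    shard holds a given key.
    """
    digest = hashlib.md5(object_key.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % NUM_SHARDS
    return "%s-%02d" % (base_bucket, shard)


# Every PUT/GET then uses the sharded bucket name instead of one big bucket:
# bucket = shard_bucket("test1", "test_file_2.txt")
```

One caveat this sketch glosses over: listing "all objects in the bucket" now requires listing every shard bucket and merging the results in the application.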
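[Editor's note] To make concrete why Greg's scrubbing point matters for the workaround: each bucket index is a single RADOS object, and scrub reads all of that object's omap keys at once, so the quantity to bound is keys per index object. A back-of-envelope sizing sketch, assuming the key hash spreads objects evenly across shards (the cap value is illustrative, not from the thread):

```python
def shards_needed(total_objects: int, max_keys_per_index: int) -> int:
    """Smallest shard count that keeps every bucket's index object
    under max_keys_per_index omap entries, assuming even distribution."""
    # Ceiling division without importing math.
    return -(-total_objects // max_keys_per_index)


# e.g. 50 million objects, capping each index at 1 million omap keys:
# shards_needed(50_000_000, 1_000_000) -> 50
```

The right cap depends on how much omap data one OSD can scrub without triggering slow requests, which the thread suggests finding empirically.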