Hi,
Thanks for your response.

> That is definitely the obvious next step, but it's a non-trivial
> amount of work and hasn't yet been started on by anybody. This is
> probably a good subject for a CDS blueprint!

We want to split our big bucket into smaller ones, sharding it in front
of radosgw. Do you think this is a good workaround for the problems
caused by a large index?

Regards
Dominik

2013/10/18 Gregory Farnum <greg@xxxxxxxxxxx>:
> On Fri, Oct 18, 2013 at 4:01 AM, Dominik Mostowiec
> <dominikmostowiec@xxxxxxxxx> wrote:
>> Hi,
>> I plan to shard my largest bucket because of deep-scrubbing issues:
>> when the PG that holds this bucket's index is deep-scrubbed, many
>> slow requests appear and the OSD grows in memory (after the latest
>> scrub it grew to 9 GB).
>>
>> I am trying to find out why a large bucket index causes problems when
>> it is scrubbed. On a test cluster:
>>
>> radosgw-admin bucket stats --bucket=test1-XX
>> { "bucket": "test1-XX",
>>   "pool": ".rgw.buckets",
>>   "index_pool": ".rgw.buckets",
>>   "id": "default.4211.2",
>>   ...
>>
>> I guess the index is in the object .dir.default.4211.2 (pool: .rgw.buckets).
>>
>> rados -p .rgw.buckets get .dir.default.4211.2 -
>> <empty>
>>
>> But:
>> rados -p .rgw.buckets listomapkeys .dir.default.4211.2
>> test_file_2.txt
>> test_file_2_11.txt
>> test_file_3.txt
>> test_file_4.txt
>> test_file_5.txt
>>
>> So I guess the list of files is stored in leveldb, not in one large
>> file. The 'omap' files live in {osd_dir}/current/omap/; the largest
>> file I found in this directory (on production) is 8.8 MB.
>>
>> I'm a little confused.
>>
>> How is the list of files for a bucket stored?
>
> The index is stored as a bunch of omap entries in a single object.
>
>> If the list of objects in a bucket is split across many small files
>> in leveldb, then a large bucket (with many files) should not cause
>> higher latency when PUTting a new object.
>
> That's not quite how it works.
> Leveldb has a custom storage format in which it stores sets of keys
> based on both time of update and the value of the key, so the size of
> the individual files in its directory has no correlation to the
> number or size of any given set of entries.
>
>> Scrubbing also should not be a problem, I think...
>
> The problem you're running into is that scrubbing is done on an
> object-by-object basis, and so the OSD is reading all of the keys
> associated with that object out of leveldb, and processing them, at
> once. This can be very much larger than the 8 MB file you've found in
> the leveldb directory, as discussed above.
>
>> What do you think about using sharding to split big buckets into
>> smaller ones to avoid the problems with big indexes?
>
> That is definitely the obvious next step, but it's a non-trivial
> amount of work and hasn't yet been started on by anybody. This is
> probably a good subject for a CDS blueprint!
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com

--
Regards
Dominik
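[Editor's note: since RGW had no built-in bucket index sharding at the time, the "shard before radosgw" idea discussed above amounts to hashing each object key to one of N fixed buckets on the client side. A minimal sketch follows; the shard count, the `{base}-{NN}` bucket naming scheme, and the `shard_bucket` helper are illustrative assumptions, not anything specified in the thread.]

```python
import hashlib

NUM_SHARDS = 64  # assumption: pick a count that keeps each index comfortably small


def shard_bucket(base_bucket: str, object_key: str,
                 num_shards: int = NUM_SHARDS) -> str:
    """Map an object key to one of num_shards buckets.

    Uses a stable hash of the key, so both writers and readers can
    recompute the same bucket name without any lookup table.
    """
    digest = hashlib.md5(object_key.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % num_shards
    return f"{base_bucket}-{shard:02d}"


# Every PUT and GET for a key goes to the same shard bucket,
# e.g. somewhere in test1-00 .. test1-63.
print(shard_bucket("test1", "test_file_2.txt"))
```

With a uniform hash, each shard's index holds roughly 1/N of the keys, so the per-object omap set that deep scrub must read at once shrinks proportionally. Listing the whole logical bucket, however, then requires merging listings from all N shard buckets.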