> You shouldn't run into any issues except the scrubbing on a large
> index object.

Great !!

> There's not a great way to get around that right now; sorry. :(

Ok. Thanks for your help. (A rough sketch of the cross-bucket sharding
we plan is at the bottom of this mail.)

--
Regards
Dominik

2013/10/21 Gregory Farnum <greg@xxxxxxxxxxx>:
> You shouldn't run into any issues except the scrubbing on a large
> index object. There's not a great way to get around that right now;
> sorry. :(
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Mon, Oct 21, 2013 at 1:44 PM, Dominik Mostowiec
> <dominikmostowiec@xxxxxxxxx> wrote:
>> Hi,
>> Thanks, now I'm sure what to do.
>>
>> Is there maybe another way (other than turning off deep-scrubbing) to
>> avoid the issues caused by large indexes?
>>
>> We now have ~15m objects in the largest bucket.
>> In the short term (after sharding) we want to put 100m more objects there.
>> Are there any other limitations in Ceph that could affect us?
>>
>> --
>> Regards
>> Dominik
>>
>>
>> 2013/10/21 Gregory Farnum <greg@xxxxxxxxxxx>:
>>> On Mon, Oct 21, 2013 at 2:26 AM, Dominik Mostowiec
>>> <dominikmostowiec@xxxxxxxxx> wrote:
>>>> Hi,
>>>> Thanks for your response.
>>>>
>>>>> That is definitely the obvious next step, but it's a non-trivial
>>>>> amount of work and hasn't yet been started on by anybody. This is
>>>>> probably a good subject for a CDS blueprint!
>>>> But we want to split our big bucket into smaller ones, and do the
>>>> sharding in front of radosgw.
>>>> Do you think this is a good workaround for this problem
>>>> (the big-index issues)?
>>>
>>> Oh, yes, this is a good workaround.
>>> Sorry, I misread your initial post and thought you were discussing
>>> sharding the bucket index itself, rather than sharding across buckets
>>> in the application. :)
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>
>>>
>>>>
>>>> Regards
>>>> Dominik
>>>>
>>>>
>>>>
>>>> 2013/10/18 Gregory Farnum <greg@xxxxxxxxxxx>:
>>>>> On Fri, Oct 18, 2013 at 4:01 AM, Dominik Mostowiec
>>>>> <dominikmostowiec@xxxxxxxxx> wrote:
>>>>>> Hi,
>>>>>> I plan to shard my largest bucket because of deep-scrubbing issues
>>>>>> (when the PG that the index for this bucket is stored on is
>>>>>> deep-scrubbed, many slow requests appear and the OSD grows in
>>>>>> memory - after the latest scrub it grew to 9G).
>>>>>>
>>>>>> I'm trying to find out why a large bucket index causes issues when
>>>>>> it is scrubbed.
>>>>>> On a test cluster:
>>>>>> radosgw-admin bucket stats --bucket=test1-XX
>>>>>> { "bucket": "test1-XX",
>>>>>>   "pool": ".rgw.buckets",
>>>>>>   "index_pool": ".rgw.buckets",
>>>>>>   "id": "default.4211.2",
>>>>>>   ...
>>>>>>
>>>>>> I guess the index is in the object .dir.default.4211.2 (pool: .rgw.buckets).
>>>>>>
>>>>>> rados -p .rgw.buckets get .dir.default.4211.2 -
>>>>>> <empty>
>>>>>>
>>>>>> But:
>>>>>> rados -p .rgw.buckets listomapkeys .dir.default.4211.2
>>>>>> test_file_2.txt
>>>>>> test_file_2_11.txt
>>>>>> test_file_3.txt
>>>>>> test_file_4.txt
>>>>>> test_file_5.txt
>>>>>>
>>>>>> I guess the list of files is stored in leveldb, not in one large file.
>>>>>> 'omap' files are stored in {osd_dir}/current/omap/; the largest file
>>>>>> I found in this directory (in production) is 8.8M.
>>>>>>
>>>>>> I'm a little confused.
>>>>>>
>>>>>> How is the list of files (for a bucket) stored?
>>>>>
>>>>> The index is stored as a bunch of omap entries in a single object.
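>>>>> If you want to see how many entries yours really has, the rados tool
>>>>> can read the omap directly; for example, against the index object you
>>>>> identified above:
>>>>>
>>>>> # one omap key per object in the bucket:
>>>>> rados -p .rgw.buckets listomapkeys .dir.default.4211.2 | wc -l
>>>>> # the raw key/value pairs the index keeps for each object:
>>>>> rados -p .rgw.buckets listomapvals .dir.default.4211.2 | less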
>>>>>
>>>>>> If the list of objects in a bucket is split across many small files
>>>>>> in leveldb, then a large bucket (with many files) should not cause
>>>>>> higher latency when PUTting a new object.
>>>>>
>>>>> That's not quite how it works. Leveldb has a custom storage format in
>>>>> which it stores sets of keys based on both time of update and the
>>>>> value of the key, so the size of the individual files in its directory
>>>>> has no correlation to the number or size of any given set of entries.
>>>>>
>>>>>> Scrubbing also should not be a problem, I think...
>>>>>
>>>>> The problem you're running into is that scrubbing is done on an
>>>>> object-by-object basis, and so the OSD is reading all of the keys
>>>>> associated with that object out of leveldb, and processing them, at
>>>>> once. This number can be very much larger than the 8MB file you've
>>>>> found in the leveldb directory, as discussed above.
>>>>>
>>>>>> What do you think about using sharding to split big buckets into
>>>>>> smaller ones, to avoid the problems with big indexes?
>>>>>
>>>>> That is definitely the obvious next step, but it's a non-trivial
>>>>> amount of work and hasn't yet been started on by anybody. This is
>>>>> probably a good subject for a CDS blueprint!
>>>>> -Greg
>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>
>>>>
>>>>
>>>> --
>>>> Regards
>>>> Dominik
>>
>>
>>
>> --
>> Regards
>> Dominik
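
For the archive: here is a rough sketch of the cross-bucket sharding we
have in mind. It is only an illustration - the bucket names, the shard
count, and the use of s3cmd are examples, and all the target buckets
would be pre-created:

NUM_SHARDS=64
key="images/2013/10/photo-0001.jpg"
# first two hex digits of md5(key) give 0..255; reduce to a stable shard id
shard=$(( 0x$(printf '%s' "$key" | md5sum | cut -c1-2) % NUM_SHARDS ))
# every writer and reader computes the bucket the same way, so a given
# key always lives in exactly one of the shard buckets
s3cmd put ./photo-0001.jpg "s3://bigbucket-${shard}/${key}"

This keeps each per-bucket index small, so no single index object has
to hold all 100m+ entries.

--
Regards
Dominik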