Re: Efficiently scan all objects in a rados pool/namespace

Dong Yuan <yuandong1222@xxxxxxxxx> · Thu, 17 Oct 2013 13:06:57 +0800

On 17 October 2013 12:25, Amit Tiwary <tiwaryamt@xxxxxxxxx> wrote:
> Sage Weil <sage <at> inktank.com> writes:
>
>>
>> On Wed, 16 Oct 2013, Amit Tiwary wrote:
>> > We are using ceph version 0.56.6, librados C++ APIs and have more than
>> > 750 million objects in a single pool. Objects are named as
>> > "domain-name_file-name".
>> >
>> > We are unable to ascertain in what order objects are listed with the
>> > command "rados -p poolname ls". They are neither ordered on objectname,
>> > nor size or mtime.
>> >
>> > Q1) Is there any way we can control the way objects are scanned/listed
>> > in a pool with the below librados c++ code? We are interested in getting
>> > list of objects sorted or grouped by object name
>> >     librados::ObjectIterator it = ioctx.objects_begin();
>> >     for (; it != ioctx.objects_end(); ++it)
>> >         ...
>>
>> They are ordered by hash(object name).
>>
>> > Q2) In near future, if we upgrade and make use of namespaces (i.e make
>> > domain-name as namespace and store all objects of a particular domain in
>> > that namespace); would scanning of objects in a namespace be efficient
>> > than current scenario where we have to scan the entire pool to fetch all
>> > objects?
>>
>> The namespaces do not improve object listing efficiency; it is still
>> O(size of the pool).
>>
>> > Q3) Do you have any other recommendations on top of your mind that can
>> > improve time required to scan all objects of pool/namespace?
>>
>> If you need efficient queries by name prefix (or whatever else) you need
>> to maintain some sort of separate index.  Radosgw does this for the S3
>> interface by using key/value objects for each bucket.  The kvstore class
>> implements a btree on top of such objects to provide improved scalability.
>>
>> Hope that helps!
>> sage
>
>
> Thanks Yuan and Sage. I feel namespace level stats would definitely be a
> good thing to have in future releases.
>
> I have been lately following threads regarding optimal number of pools and I
> understand that increasing the number of pools increases memory footprint
> and can become a bottleneck.
> Does the number of objects or the amount of data in each pool (i.e. if the
> number of objects or data is highly skewed across pools) have any impact on
> the performance of Ceph?

In my opinion, there is no impact.

With the default CRUSH map, each pool is distributed over all OSDs.
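
On the listing question quoted above: since objects come back in
hash(object name) order, a prefix query such as "all objects of one domain"
needs a separate index that is kept sorted by name. Below is a minimal sketch
of that idea only. NameIndex, list_prefix and example_listing are hypothetical
names, the std::set is an in-memory stand-in for whatever persistent
key/value storage you would really use, and nothing here is a librados call.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a name index. A real index would be
// stored persistently (for example in key/value objects, as radosgw does per
// bucket) and updated alongside each object write and delete.
class NameIndex {
public:
    // Record an object name when the object is written.
    void add(const std::string& name) { names_.insert(name); }

    // Drop the name when the object is removed.
    void remove(const std::string& name) { names_.erase(name); }

    // Return all names that start with `prefix`, in sorted order.
    // lower_bound() finds the first candidate in O(log n); the loop then
    // visits only the matching names rather than every object in the pool.
    std::vector<std::string> list_prefix(const std::string& prefix) const {
        std::vector<std::string> out;
        for (auto it = names_.lower_bound(prefix);
             it != names_.end() &&
             it->compare(0, prefix.size(), prefix) == 0;
             ++it) {
            out.push_back(*it);
        }
        return out;
    }

private:
    std::set<std::string> names_;  // ordered by name, not by hash
};

// Usage sketch with made-up object names in the "domain-name_file-name" form.
std::vector<std::string> example_listing() {
    NameIndex idx;
    idx.add("example.org_b.txt");
    idx.add("example.com_c.txt");
    idx.add("example.com_a.txt");
    return idx.list_prefix("example.com_");
}
```

The cost of a prefix listing then depends on the number of matching names
plus a logarithmic lookup, not on the size of the pool. The price is that the
index has to be kept consistent with the objects themselves.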

>
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Dong Yuan
Email:yuandong1222@xxxxxxxxx