Re: Stat speed for objects in ceph

Iain Buclaw <ibuclaw@xxxxxxxxx> · Wed, 21 Sep 2016 17:05:05 +0200

On 21 September 2016 at 02:57, Haomai Wang <haomai@xxxxxxxx> wrote:
> On Wed, Sep 21, 2016 at 2:41 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>>
>>> On 20 September 2016 at 20:30, Haomai Wang <haomai@xxxxxxxx> wrote:
>>>
>>>
>>> On Wed, Sep 21, 2016 at 2:26 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>>> >
>>> >> On 20 September 2016 at 19:27, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>> >>
>>> >>
>>> >> In librados getting a stat is basically equivalent to reading a small
>>> >> object; there's not an index or anything so FileStore needs to descend its
>>> >> folder hierarchy. If looking at metadata for all the objects in the system
>>> >> efficiently is important you'll want to layer an index in somewhere.
>>> >> -Greg
>>> >>
>>> >
>>> > Should we expect an improvement here with BlueStore vs FileStore? That would basically be a RocksDB lookup on the OSD, right?
>>>
>>> Yes, BlueStore will be much better, since it indexes each onode (like
>>> an inode) in RocksDB. Although that is fast enough, constructing the
>>> object still has some cost; if you only want to check whether an
>>> object exists, we may need a more lightweight interface.
>>>
>>
>> It's rados_stat() that would be called; that is the way to check whether an object exists. If I remember the BlueStore architecture correctly, it would be a lookup in RocksDB with all the information in there.
>
> Exactly, but compared to a database query, this lookup is still heavy.
> Constructing each onode requires fetching many keys and building it inline.
> Of course, it is still one of the cheaper calls among the rados interfaces.
>

From some preliminary tests, I've noted that BlueStore is considerably
quicker than FileStore at millions of random small-file I/O operations.
But this is only with around 1/25th of the data we are holding.

So is having an index pool the only way to get faster lookup speeds?
I don't think one really suits my use case: with billions of objects
being held, I don't think maintaining such an index would be any
quicker than what rados_stat() is capable of achieving already.
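
For reference, a minimal sketch of the existence check being discussed,
written against the Python librados binding rather than the C
rados_stat() call. Ioctx.stat() and rados.ObjectNotFound come from that
binding; object_exists() and FakeIoctx are hypothetical names used only
for illustration, and the fallback exception class is there only so the
sketch loads on a machine without Ceph installed.

```python
try:
    import rados  # Python binding shipped with Ceph
    ObjectNotFound = rados.ObjectNotFound
except ImportError:
    # Assumption: no Ceph client libraries here, so use a stand-in exception.
    class ObjectNotFound(Exception):
        pass


def object_exists(ioctx, name):
    """Return True if stat() on the object succeeds, False if it is missing."""
    try:
        ioctx.stat(name)  # the real binding returns (size, mtime)
    except ObjectNotFound:
        return False
    return True


class FakeIoctx:
    """Hypothetical in-memory stand-in for rados.Ioctx, for dry runs only."""

    def __init__(self, objects):
        self._objects = dict(objects)  # object name -> bytes

    def stat(self, name):
        if name not in self._objects:
            raise ObjectNotFound(name)
        return len(self._objects[name]), None  # (size, mtime placeholder)


if __name__ == "__main__":
    io = FakeIoctx({"obj-1": b"payload"})
    print(object_exists(io, "obj-1"), object_exists(io, "obj-2"))
```

Against a real cluster, the ioctx would instead come from
rados.Rados(conffile=...) followed by connect() and open_ioctx() on the
pool in question. Every call is still one round trip to the OSD holding
the object.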

In any case, these clients maintain and validate the data that's
stored, so they would inherently assume that any index is wrong.
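
To illustrate why an untrusted index saves nothing here, a rough sketch
of what checking one would involve. It assumes a hypothetical index that
maps object names to expected sizes; find_stale_entries() and FakeIoctx
are made-up names, and only Ioctx.stat() and rados.ObjectNotFound are
taken from the Python librados binding.

```python
try:
    import rados  # Python binding shipped with Ceph
    ObjectNotFound = rados.ObjectNotFound
except ImportError:
    # Assumption: no Ceph client libraries here, so use a stand-in exception.
    class ObjectNotFound(Exception):
        pass


def find_stale_entries(ioctx, index):
    """Compare index entries (name -> expected size) with what stat() reports.

    Returns a dict mapping each disagreeing name to a short reason.
    """
    stale = {}
    for name, expected_size in index.items():
        try:
            size, _mtime = ioctx.stat(name)  # one stat per index entry
        except ObjectNotFound:
            stale[name] = "missing"
            continue
        if size != expected_size:
            stale[name] = "size %d != indexed %d" % (size, expected_size)
    return stale


class FakeIoctx:
    """Hypothetical in-memory stand-in for rados.Ioctx, for dry runs only."""

    def __init__(self, sizes):
        self._sizes = dict(sizes)  # object name -> size in bytes

    def stat(self, name):
        if name not in self._sizes:
            raise ObjectNotFound(name)
        return self._sizes[name], None  # (size, mtime placeholder)


if __name__ == "__main__":
    pool = FakeIoctx({"a": 3, "b": 10})
    print(find_stale_entries(pool, {"a": 3, "b": 7, "c": 1}))
```

The loop issues one stat per index entry, so a client that refuses to
trust the index ends up doing the same per-object work as one that has
no index at all.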

--
Iain Buclaw
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com