On 21 September 2016 at 02:57, Haomai Wang <haomai@xxxxxxxx> wrote:
> On Wed, Sep 21, 2016 at 2:41 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>>
>>> On 20 September 2016 at 20:30, Haomai Wang <haomai@xxxxxxxx> wrote:
>>>
>>> On Wed, Sep 21, 2016 at 2:26 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>>> >
>>> >> On 20 September 2016 at 19:27, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>> >>
>>> >> In librados getting a stat is basically equivalent to reading a small
>>> >> object; there's not an index or anything so FileStore needs to descend its
>>> >> folder hierarchy. If looking at metadata for all the objects in the system
>>> >> efficiently is important you'll want to layer an index in somewhere.
>>> >> -Greg
>>> >>
>>> >
>>> > Should we expect a improvement here with BlueStore vs FileStore? That would basically be a RocksDB lookup on the OSD, right?
>>>
>>> Yes, bluestore will be much better since it has indexed on Onode(like
>>> inode) in rocksdb. Although it's fast enough, it also cost some on
>>> construct object, if you only want to check object existence, we may
>>> need a more lightweight interface
>>>
>>
>> It's rados_stat() which would be called, that is the way to check if a object exists. If I remember the BlueStore architecture correctly it would be a lookup in RocksDB with all the information in there.
>
> Exactly, but compared to database query, this lookup is still heavy.
> Each onode construct need to get lots of keys and do inline construct.
> Of course, it's a cheaper one in all rados interfaces.
>

From some preliminary tests, I've noted that BlueStore is considerably quicker than FileStore at millions of random small-file I/O operations. But that was with only around 1/25th of the data we are holding. So is having an index pool the only way to get faster lookup speeds?
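
For reference, the kind of existence check being discussed can be sketched roughly as follows. This is only an illustration, not our actual client code: object_exists(), FakeIoctx and FakeNotFound are hypothetical names, and the fake pool handle exists purely so the sketch runs without a cluster. The only assumption about the real handle is that it has a stat(name) method that raises a "not found" exception for a missing object.

```python
class FakeNotFound(Exception):
    """Hypothetical stand-in for the store's 'object not found' error."""


class FakeIoctx:
    """Hypothetical stand-in for a pool handle, so this runs without a cluster."""

    def __init__(self, objects):
        # Maps object name -> (size, mtime), mimicking what a stat returns.
        self._objects = objects

    def stat(self, name):
        if name not in self._objects:
            raise FakeNotFound(name)
        return self._objects[name]


def object_exists(ioctx, name, not_found):
    """Return True if ioctx.stat(name) succeeds, False if it raises not_found.

    Any other exception (I/O error, timeout, ...) is deliberately left to
    propagate, so that a failed lookup is never mistaken for "missing".
    """
    try:
        ioctx.stat(name)
    except not_found:
        return False
    return True


if __name__ == "__main__":
    pool = FakeIoctx({"obj-1": (4096, 0)})
    for key in ("obj-1", "obj-2"):
        print(key, object_exists(pool, key, FakeNotFound))
```

With the python-rados bindings, I would expect to pass an ioctx obtained from open_ioctx() and rados.ObjectNotFound as the exception type, but treat that wiring as an assumption to check against the installed version. Either way, each check is still one stat round trip to the OSD.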
I don't think an index is really suited to my use case. With billions of objects being held, I don't think maintaining such an index would be any quicker than what rados_stat() already achieves. In any case, these clients maintain and validate the data that's stored, so they would inherently assume that any index is wrong.

-- 
Iain Buclaw