Re: Stat speed for objects in ceph

Wido den Hollander <wido@xxxxxxxx> · Tue, 20 Sep 2016 20:41:43 +0200 (CEST)

> On 20 September 2016 at 20:30, Haomai Wang <haomai@xxxxxxxx> wrote:
> 
> 
> On Wed, Sep 21, 2016 at 2:26 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
> >
> >> On 20 September 2016 at 19:27, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> >>
> >>
> >> In librados getting a stat is basically equivalent to reading a small
> >> object; there's not an index or anything so FileStore needs to descend its
> >> folder hierarchy. If looking at metadata for all the objects in the system
> >> efficiently is important you'll want to layer an index in somewhere.
> >> -Greg
> >>
> >
> > Should we expect an improvement here with BlueStore vs FileStore? That would basically be a RocksDB lookup on the OSD, right?
> 
> Yes, BlueStore will be much better since it indexes the onode (like an
> inode) in RocksDB. Although that is fast enough, it still costs something
> to construct the object; if you only want to check object existence, we
> may need a more lightweight interface.
> 

rados_stat() is what would be called; that is the way to check whether an object exists. If I remember the BlueStore architecture correctly, it would be a lookup in RocksDB, with all the information stored there.
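
A rough sketch of such an existence check, for illustration only. It assumes the Python rados binding, where (as far as I know) Ioctx.stat() raises rados.ObjectNotFound for a missing object. The names object_exists and FakeIoctx are made up here, and the fake only exists so the snippet runs without a cluster:

```python
def object_exists(ioctx, name, not_found_exc):
    """Return True if stat() succeeds, False if the object is missing.

    ioctx is anything with a stat(name) method, e.g. a rados.Ioctx.
    not_found_exc is the exception type raised for a missing object
    (assumed to be rados.ObjectNotFound with the Python rados binding).
    """
    try:
        ioctx.stat(name)  # a stat is a read op answered by the OSD
    except not_found_exc:
        return False
    return True


class FakeIoctx:
    """Hypothetical in-memory stand-in so the sketch runs without Ceph."""

    def __init__(self, names):
        self._names = set(names)

    def stat(self, name):
        if name not in self._names:
            raise KeyError(name)
        return (0, None)  # placeholder for (size, mtime)


if __name__ == "__main__":
    ioctx = FakeIoctx(["obj-a", "obj-b"])
    print(object_exists(ioctx, "obj-a", KeyError))
    print(object_exists(ioctx, "obj-c", KeyError))
```

Against a real cluster you would pass an Ioctx opened from a rados.Rados connection and rados.ObjectNotFound as the exception type. Every call is still a full stat on the OSD, so this does not make the check any cheaper.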

Wido

> >
> > Wido
> >
> >> On Tuesday, September 20, 2016, Iain Buclaw <ibuclaw@xxxxxxxxx> wrote:
> >>
> >> > Hi,
> >> >
> >> > As a general observation, the speed of calling stat() on any object in
> >> > ceph is relatively slow.  I'm probably getting a rate of about 10K per
> >> > second using AIO, and even then it is really *really* bursty, to the
> >> > point where there could be 5 seconds of activity going in one
> >> > direction, then the callback thread wakes up and processes all queued
> >> > completions in a single blast.
> >> >
> >> > At our current rate with more than 1 billion objects in a pool, it
> >> > looks like checking the existence of every object would take around
> >> > 19-24 hours to complete.
> >> >
> >> > Granted that our starting point before beginning some migrations to
> >> > Ceph was around 1 hour to check the existence of every object, this is
> >> > something of a concern.  Are there any ways via librados to improve
> >> > the throughput of processing objects?
> >> >
> >> > Adding more instances or sharding work doesn't seem to increase the
> >> > overall throughput at all.  And cache won't help either, there is no
> >> > determinism in what's accessed, and given the size of the pool OS
> >> > filesystem cache is useless anyway.
> >> >
> >> > Thanks,
> >> > --
> >> > Iain Buclaw
> >> >
> >> > *(p < e ? p++ : p) = (c & 0x0f) + '0';
> >> > _______________________________________________
> >> > ceph-users mailing list
> >> > ceph-users@xxxxxxxxxxxxxx
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >