Re: Stat speed for objects in ceph

Wido den Hollander <wido@xxxxxxxx> · Wed, 21 Sep 2016 17:29:47 +0200 (CEST)

> On 21 September 2016 at 17:23, Iain Buclaw <ibuclaw@xxxxxxxxx> wrote:
> 
> 
> On 20 September 2016 at 19:27, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> > In librados getting a stat is basically equivalent to reading a small
> > object; there's not an index or anything so FileStore needs to descend its
> > folder hierarchy. If looking at metadata for all the objects in the system
> > efficiently is important you'll want to layer an index in somewhere.
> > -Greg
> >
> 
> Yeah, that's not particularly good to hear.  Is this slowness also
> inherent in list_nobjects too?  It looks like I can iterate all
> objects at a rate no faster than 25K per second.  No chance at
> speeding this up either by having two or more instances starting at
> different pg offsets.
> 

RADOS has no index of objects. An object's location is calculated from its name rather than looked up. So when listing objects you basically have to go to the primary OSD of every PG in the pool and ask it which objects that PG holds.
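
As a rough illustration of why a lookup by name is cheap while a listing has to touch every PG, here is a toy model in Python. It is only a sketch: `PG_NUM`, `pg_for`, `lookup` and `list_all` are made-up names, and `zlib.crc32` with a plain modulo stands in for Ceph's real hashing and placement, which work differently in detail.

```python
import zlib

PG_NUM = 8  # hypothetical pool with 8 placement groups


def pg_for(name, pg_num=PG_NUM):
    # Stand-in for Ceph's real object-name hash; crc32 is for illustration only.
    return zlib.crc32(name.encode()) % pg_num


def lookup(pgs, name):
    # The PG is computed from the name, so exactly one PG is consulted.
    return name in pgs[pg_for(name)]


def list_all(pgs):
    # No global index exists, so a full listing must walk every PG in turn.
    for pg_id in sorted(pgs):
        yield from sorted(pgs[pg_id])


# Populate the toy pool with 100 objects.
pgs = {i: set() for i in range(PG_NUM)}
for n in ("obj-%d" % i for i in range(100)):
    pgs[pg_for(n)].add(n)

print(lookup(pgs, "obj-42"), len(list(list_all(pgs))))
```

In this model `lookup` does constant work regardless of pool size, while `list_all` grows with the number of PGs and objects, which is the shape of the cost described above.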

> For this particular operation, it's only looking for orphaned objects.
> This wouldn't be needed if a mechanism for TTLs existed and set on all
> objects.  But that would mean finding out how RGW gets away with it,
> and I assume with another very large index and actively keeping track
> of all set destruction times.
> 

RGW indeed keeps an index, but that limits the number of objects you can store in a single bucket. Yes, bucket sharding helps, but the limit for an RGW bucket is still way lower than for a RADOS pool.
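
To make the trade-off concrete, here is a minimal Python sketch of a sharded index that tracks an expiry time per object. It is not how RGW implements its bucket index; `ShardedIndex`, its methods and the crc32-based shard choice are all assumptions made up for the example.

```python
import zlib
from collections import defaultdict


class ShardedIndex:
    """Toy index: each shard is a dict of object name -> expiry time."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        self.shards = defaultdict(dict)

    def _shard(self, key):
        # Hypothetical shard choice; a real system would use its own hash.
        return zlib.crc32(key.encode()) % self.num_shards

    def put(self, key, expires_at):
        self.shards[self._shard(key)][key] = expires_at

    def expired(self, now):
        # Finding expired objects only reads the index, not the objects.
        return sorted(k for s in self.shards.values()
                      for k, t in s.items() if t <= now)

    def largest_shard(self):
        return max((len(s) for s in self.shards.values()), default=0)


idx = ShardedIndex(num_shards=4)
for i in range(1000):
    idx.put("obj-%d" % i, expires_at=i)

print(idx.expired(now=2), idx.largest_shard())
```

More shards keep each shard smaller, but every write still has to update the index, and the index as a whole keeps growing with the object count. That is the cost you take on in exchange for cheap listing.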

Wido

> -- 
> Iain Buclaw
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com