Re: Stat speed for objects in ceph

On Wed, Sep 21, 2016 at 2:41 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>
>> Op 20 september 2016 om 20:30 schreef Haomai Wang <haomai@xxxxxxxx>:
>>
>>
>> On Wed, Sep 21, 2016 at 2:26 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>> >
>> >> Op 20 september 2016 om 19:27 schreef Gregory Farnum <gfarnum@xxxxxxxxxx>:
>> >>
>> >>
>> >> In librados, getting a stat is basically equivalent to reading a small
>> >> object; there's no index or anything, so FileStore needs to descend its
>> >> folder hierarchy. If looking at metadata for all the objects in the system
>> >> efficiently is important, you'll want to layer an index in somewhere.
>> >> -Greg
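
A minimal sketch of one way to layer such an index in, assuming the pool is
already open as an ioctx: keep each object name as an omap key on a dedicated
index object that is updated alongside every write. The index object name
"object-index" and the helper below are made up for illustration.

    #include <rados/librados.h>
    #include <stddef.h>

    /* Hypothetical helper: record that `oid` exists by adding it as an
     * omap key on a dedicated index object.  `io` is an already-open
     * ioctx for the pool; "object-index" is a made-up name. */
    int index_add(rados_ioctx_t io, const char *oid)
    {
        const char *keys[1] = { oid };
        const char *vals[1] = { "" };   /* the key alone is enough */
        size_t lens[1] = { 0 };

        rados_write_op_t op = rados_create_write_op();
        rados_write_op_omap_set(op, keys, vals, lens, 1);
        int r = rados_write_op_operate(op, io, "object-index", NULL, 0);
        rados_release_write_op(op);
        return r;   /* 0 on success, negative errno on failure */
    }

A lookup against the index object (for example with
rados_read_op_omap_get_vals_by_keys()) then answers the existence question
without touching the data objects; for a pool of this size the index would
probably need to be sharded across several index objects rather than kept on
a single one.
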
>> >>
>> >
>> > Should we expect an improvement here with BlueStore vs FileStore? That would basically be a RocksDB lookup on the OSD, right?
>>
>> Yes, BlueStore will be much better since it keeps an index of onodes (like
>> inodes) in RocksDB. Although that lookup is fast enough, it still pays some
>> cost to construct the object; if you only want to check object existence,
>> we may need a more lightweight interface.
>>
>
> It's rados_stat() which would be called; that is the way to check if an object exists. If I remember the BlueStore architecture correctly, it would be a lookup in RocksDB with all the information in there.

Exactly, but compared to a plain database query, this lookup is still heavy.
Constructing each onode still needs to fetch a lot of keys and assemble the
structure inline. Of course, it's one of the cheaper calls among all the
rados interfaces.
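
As a minimal sketch, that rados_stat() existence check looks roughly like the
following; it assumes the default ceph.conf and keyring are readable, and the
pool name "data" and object id "myobject" are made up for illustration.

    #include <rados/librados.h>
    #include <errno.h>
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        rados_t cluster;
        rados_ioctx_t io;
        uint64_t size;
        time_t mtime;

        /* Connect using the default ceph.conf and keyring locations. */
        if (rados_create(&cluster, NULL) < 0 ||
            rados_conf_read_file(cluster, NULL) < 0 ||
            rados_connect(cluster) < 0) {
            fprintf(stderr, "failed to connect to cluster\n");
            return 1;
        }
        if (rados_ioctx_create(cluster, "data", &io) < 0) {
            fprintf(stderr, "failed to open pool\n");
            rados_shutdown(cluster);
            return 1;
        }

        /* rados_stat() returns 0 if the object exists, -ENOENT if not. */
        int r = rados_stat(io, "myobject", &size, &mtime);
        if (r == 0)
            printf("exists: %llu bytes\n", (unsigned long long)size);
        else if (r == -ENOENT)
            printf("does not exist\n");
        else
            fprintf(stderr, "stat failed: %d\n", r);

        rados_ioctx_destroy(io);
        rados_shutdown(cluster);
        return 0;
    }

Link with -lrados; the -ENOENT return from rados_stat() is the
"object does not exist" case.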

>
> Wido
>
>> >
>> > Wido
>> >
>> >> On Tuesday, September 20, 2016, Iain Buclaw <ibuclaw@xxxxxxxxx> wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > As a general observation, the speed of calling stat() on any object in
>> >> > Ceph is relatively slow.  I'm probably getting a rate of about 10K per
>> >> > second using AIO, and even then it is really *really* bursty, to the
>> >> > point where there can be 5 seconds of activity going in one
>> >> > direction before the callback thread wakes up and processes all queued
>> >> > completions in a single blast.
>> >> >
>> >> > At our current rate with more than 1 billion objects in a pool, it's
>> >> > looking like checking the existence of every object would take
>> >> > around 19-24 hours to complete.
>> >> >
>> >> > Granted that our starting point before beginning some migrations to
>> >> > Ceph was around 1 hour to check the existence of every object, this is
>> >> > something of a concern.  Are there any ways via librados to improve
>> >> > the throughput of processing objects?
>> >> >
>> >> > Adding more instances or sharding the work doesn't seem to increase the
>> >> > overall throughput at all.  And caching won't help either: there is no
>> >> > determinism in what's accessed, and given the size of the pool the OS
>> >> > filesystem cache is useless anyway.
>> >> >
>> >> > Thanks,
>> >> > --
>> >> > Iain Buclaw
>> >> >
>> >> > *(p < e ? p++ : p) = (c & 0x0f) + '0';
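
As a minimal sketch of the AIO pattern described above, assuming an
already-open ioctx and a made-up "obj-N" naming scheme, one way to keep a
window of rados_aio_stat() operations in flight is:

    #include <rados/librados.h>
    #include <errno.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>

    #define WINDOW 128   /* stat operations kept in flight; tune as needed */

    /* Hypothetical helper: check the existence of `count` objects named
     * obj-0 .. obj-(count-1), keeping up to WINDOW async stats in flight. */
    void check_objects(rados_ioctx_t io, int count)
    {
        rados_completion_t comps[WINDOW];
        char oids[WINDOW][32];
        uint64_t sizes[WINDOW];
        time_t mtimes[WINDOW];

        for (int start = 0; start < count; start += WINDOW) {
            int batch = (count - start < WINDOW) ? count - start : WINDOW;

            /* Issue a batch of async stats without waiting in between. */
            for (int i = 0; i < batch; i++) {
                snprintf(oids[i], sizeof(oids[i]), "obj-%d", start + i);
                rados_aio_create_completion(NULL, NULL, NULL, &comps[i]);
                rados_aio_stat(io, oids[i], comps[i], &sizes[i], &mtimes[i]);
            }

            /* Reap the whole batch. */
            for (int i = 0; i < batch; i++) {
                rados_aio_wait_for_complete(comps[i]);
                int r = rados_aio_get_return_value(comps[i]);
                if (r == -ENOENT)
                    printf("%s: missing\n", oids[i]);
                else if (r < 0)
                    fprintf(stderr, "%s: stat failed (%d)\n", oids[i], r);
                rados_aio_release(comps[i]);
            }
        }
    }

The window size is just a tuning knob; a larger window keeps more requests
outstanding against the OSDs at the cost of extra completions held in memory.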
>> >> >
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


