On Sun, Nov 20, 2011 at 3:03 PM, Yehuda Sadeh Weinraub <yehudasa@xxxxxxxxx> wrote:
> On Sun, Nov 20, 2011 at 9:49 AM, Leander Yu <leander.yu@xxxxxxxxx> wrote:
>> Hi all,
>> I found that after 0.37 radosgw has some fundamental changes that
>> put all objects into the .rgw.buckets pool.
>> From the release notes this seems to be for better scalability, but I
>> wonder how this change improves scalability. Based on our tests, a
>> simple list-bucket command via s3cmd takes more than 10 seconds when
>> the bucket holds more than 10k objects. Is this normal, or is it a
>> potential bug?
>
> The scaling issue that was solved was the ability to increase the
> number of buckets, whereas you're now hitting a different issue that
> relates to the number of objects per bucket. The problem is the
> inefficient implementation of the rados tmap (trivial map), where
> every read/write of the directory index requires reading the entire
> object, which does not scale well. We are going to replace tmap with
> a not-so-trivial map that will scale much better (feature #1571 in
> the ceph tracker, currently planned for 0.39).
>
> I verified that this is in fact the issue. The problem with listing
> objects using s3cmd is that it requests the data in chunks of 1000,
> which means that going through 10k objects requires reading the
> entire directory off disk (on the osd side) 10 times.

I wouldn't expect this to be so slow, though — presumably the directory
object is in cache, so all it's doing is some memory copies after the
first read off disk?
-Greg
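
For context, a rough back-of-envelope sketch of the cost Yehuda describes,
as a hypothetical Python model (the function name and the numbers are
illustrative, not radosgw code): because the tmap keeps the whole bucket
index in a single object, every 1000-key page of a listing re-reads the
full index, so the total work grows quadratically with bucket size.

# Rough cost model (illustrative only, not radosgw code): with a tmap
# bucket index, each LIST page of page_size keys forces a read of the
# whole directory object on the OSD side.

def index_entries_read(num_objects, page_size=1000):
    """Total index entries read to list a bucket of num_objects keys."""
    pages = -(-num_objects // page_size)   # ceil(num_objects / page_size)
    return pages * num_objects             # full index read per page

for n in (1000, 10000, 100000):
    print(f"{n} objects -> {index_entries_read(n)} index entries read")

# 10,000 objects: 10 pages x 10,000 entries = 100,000 entries read,
# i.e. the directory object is fetched 10 times, matching the report.

A keyed map that can serve just the requested range per page would make
each page cost roughly page_size entries instead of the whole index,
which is presumably the point of the planned tmap replacement.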