On Sun, Nov 20, 2011 at 3:03 PM, Yehuda Sadeh Weinraub <yehudasa@xxxxxxxxx> wrote:
> On Sun, Nov 20, 2011 at 9:49 AM, Leander Yu <leander.yu@xxxxxxxxx> wrote:
>> Hi all,
>> I found that after 0.37 radosgw has some fundamental changes that
>> put all objects into the .rgw.buckets pool.
>> From the release notes this seems to be for better scalability, but I
>> wonder how this change improves scalability. Based on our tests, a
>> simple list-bucket command via s3cmd takes more than 10 seconds when
>> the bucket holds more than 10k objects. Is this normal, or is it a
>> potential bug?
>
> The scaling issue that was solved was the ability to increase the
> number of buckets, whereas you're now hitting a different issue that
> relates to the number of objects per bucket. The problem is the
> inefficient implementation of the rados tmap (trivial map), where
> every read/write of the directory index requires reading the entire
> object, which does not scale well. We are going to replace tmap with
> a not-so-trivial map that will scale much better (feature #1571 in
> the ceph tracker, currently planned for 0.39).
>
> I verified that this is in fact the issue. The problem with listing
> objects using s3cmd is that it requests the data in chunks of 1000,
> which means that going through 10k objects requires reading the
> entire directory off disk (on the osd side) 10 times.

I wouldn't expect this to be so slow, though — presumably the directory
object is in cache, so all it's doing is some memory copies after the
first read off disk?
-Greg
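
For context, a rough back-of-envelope sketch of the cost Yehuda describes,
as a hypothetical Python model (the function name and the numbers are
illustrative, not radosgw code): because the tmap keeps the whole bucket
index in a single object, every 1000-key page of a listing re-reads the
full index, so the total work grows quadratically with bucket size.

# Rough cost model (illustrative only, not radosgw code): with a tmap
# bucket index, each LIST page of page_size keys forces a read of the
# whole directory object on the OSD side.

def index_entries_read(num_objects, page_size=1000):
    """Total index entries read to list a bucket of num_objects keys."""
    pages = -(-num_objects // page_size)   # ceil(num_objects / page_size)
    return pages * num_objects             # full index read per page

for n in (1000, 10000, 100000):
    print(f"{n} objects -> {index_entries_read(n)} index entries read")

# 10,000 objects: 10 pages x 10,000 entries = 100,000 entries read,
# i.e. the directory object is fetched 10 times, matching the report.

A keyed map that can serve just the requested range per page would make
each page cost roughly page_size entries instead of the whole index,
which is presumably the point of the planned tmap replacement.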