On Sun, Nov 20, 2011 at 9:49 AM, Leander Yu <leander.yu@xxxxxxxxx> wrote:
> Hi all,
> I found that after 0.37 radosgw has some fundamental changes that
> put all objects into the .rgw.buckets pool. The release notes say
> this is for better scalability, but I wonder how this change
> improves scalability. Based on our tests, a simple list-bucket
> command via s3cmd takes more than 10 seconds when the bucket holds
> more than 10k objects. Is this normal, or is it a potential bug?

The scaling issue that was solved was the ability to increase the
number of buckets; you're now hitting a different issue that relates
to the number of objects per bucket. The problem is the inefficient
implementation of the rados tmap (trivial map): every read/write of
the directory index requires reading the entire index object, which
does not scale. We are going to replace tmap with a not-so-trivial
map that will scale much better (feature #1571 in the ceph tracker,
currently planned for 0.39).

I verified that this is in fact the issue. The problem with listing
objects using s3cmd is that it requests the data in chunks of 1000,
which means that going through 10k objects requires reading the
entire directory off disk (on the osd side) 10 times.

> I haven't fully understood the radosgw code, but it seems that when
> you list a bucket, it has to list all objects in .rgw.buckets and
> filter out the objects that don't belong to the bucket id? If my
> understanding is correct, then it makes sense for s3cmd to take 10
> seconds to list a bucket, since "rados -p .rgw.buckets ls" took
> about 7 seconds in my case.

I think you misread it. The old implementation did have to list the
entire pool and then filter the results, but there was a 1:1 mapping
between pools and buckets. The new implementation issues an rgw
class operation that reads the directory index.

Yehuda
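
To make the tmap cost concrete, here is a conceptual sketch in Python
of the difference between a tmap-style index (one serialized blob)
and a keyed, omap-style index. This is not the librados API; every
name in it, including the flat encoding in encode_map/decode_map, is
hypothetical.

    # Conceptual sketch only -- NOT the librados API.
    import struct

    def encode_map(entries):
        # Hypothetical flat encoding: 4-byte length, key, 4-byte
        # length, value, repeated for every entry.
        blob = b''
        for key, val in entries.items():
            blob += struct.pack('>I', len(key)) + key
            blob += struct.pack('>I', len(val)) + val
        return blob

    def decode_map(blob):
        entries, off = {}, 0
        while off < len(blob):
            klen, = struct.unpack_from('>I', blob, off); off += 4
            key = blob[off:off + klen]; off += klen
            vlen, = struct.unpack_from('>I', blob, off); off += 4
            entries[key] = blob[off:off + vlen]; off += vlen
        return entries

    def tmap_lookup(read_object, name):
        # tmap-style: the whole serialized map is read and decoded
        # just to find one entry, so every index operation costs
        # O(bucket size).
        blob = read_object()        # reads the ENTIRE index object
        return decode_map(blob).get(name)

    def omap_lookup(read_key, name):
        # omap-style: the OSD fetches a single key from its
        # key/value store; cost is independent of bucket size.
        return read_key(name)

    blob = encode_map({b'obj1': b'meta1', b'obj2': b'meta2'})
    print(tmap_lookup(lambda: blob, b'obj2'))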
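
A rough sketch of the chunked listing that s3cmd performs, written
with boto. The endpoint, credentials, and bucket name are
placeholders; the point is the marker-based pagination in batches of
1000 keys, each of which forces the OSD to re-read the whole tmap
index.

    import boto
    import boto.s3.connection

    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',        # placeholder
        aws_secret_access_key='SECRET_KEY',    # placeholder
        host='radosgw.example.com',            # placeholder endpoint
        calling_format=boto.s3.connection.OrdinaryCallingFormat())

    bucket = conn.get_bucket('test-bucket')    # placeholder bucket

    marker = ''
    total = 0
    while True:
        # Each request returns at most 1000 keys; for 10k objects
        # this loop issues 10 requests, and with tmap each one makes
        # the OSD read the whole directory index object again.
        chunk = bucket.get_all_keys(max_keys=1000, marker=marker)
        total += len(chunk)
        if not chunk.is_truncated:
            break
        marker = chunk[-1].name
    print('%d objects' % total)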
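
And a sketch, using the python-rados bindings, of what a pool-wide
enumeration like "rados -p .rgw.buckets ls" does. The filter
predicate is hypothetical; the new listing path avoids this scan
entirely by issuing a single rgw class operation against the bucket's
directory index object.

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('.rgw.buckets')
    try:
        # Enumerating the pool touches every object in it; with many
        # objects this alone can take seconds, which matches the ~7s
        # seen for "rados -p .rgw.buckets ls" above.
        names = [obj.key for obj in ioctx.list_objects()
                 if not obj.key.startswith('.')]  # hypothetical filter
        print('%d objects' % len(names))
    finally:
        ioctx.close()
        cluster.shutdown()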