Re: radosgw bucket listing (s3 ls s3://$bucketname) slow with ~2 billion objects

The main problem with efficiently listing many-sharded buckets is the requirement to provide entries in sorted order. This means that each HTTP request has to fetch ~1000 entries from every shard, combine them into sorted order, and throw out the leftovers. The next request to continue the listing will advance its position slightly, but still ends up fetching many of the same entries from each shard. As the number of shards increases, these shard listings overlap more and more, and performance falls off.
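To make the over-fetch concrete, here's a rough Python sketch of what an ordered listing over a sharded index has to do for each page. It's illustrative only, not actual RGW code, and the shard/object counts are made up:

import heapq
import zlib

NUM_SHARDS = 32   # bucket index shards (made-up number)
PAGE = 1000       # entries returned per listing request

# Build a fake bucket index: keys are spread across shards by a hash of the
# name, and each shard keeps its own entries in sorted order.
shards = [[] for _ in range(NUM_SHARDS)]
for key in sorted(f"obj-{i:07d}" for i in range(100000)):
    shards[zlib.crc32(key.encode()) % NUM_SHARDS].append(key)

def list_ordered_page(shards, marker=""):
    # Return one globally sorted page of keys after `marker`, and count
    # how many index entries had to be fetched to produce it.
    fetched = 0
    candidates = []
    for shard in shards:
        # Every shard must contribute up to PAGE entries past the marker,
        # because any of them could fall within the global first PAGE.
        part = [k for k in shard if k > marker][:PAGE]
        fetched += len(part)
        candidates.append(part)
    # Merge the per-shard results and throw away everything past PAGE.
    return list(heapq.merge(*candidates))[:PAGE], fetched

page, fetched = list_ordered_page(shards)
print(f"returned {len(page)} keys, fetched {fetched} index entries")
page2, fetched2 = list_ordered_page(shards, marker=page[-1])
print(f"returned {len(page2)} keys, fetched {fetched2} index entries")

With 32 shards and 1000-entry pages, each request touches on the order of 32,000 index entries to return 1,000 keys, and the next request repeats most of that work just past the marker. Scale the shard count up into the tens of thousands and the wasted work dominates.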

Eric Ivancich recently added S3 and Swift extensions for unordered bucket listing in https://github.com/ceph/ceph/pull/21026 (for mimic). That allows radosgw to list each shard separately and avoid the step that throws away the extra entries. If your application can tolerate unsorted listings, that could be a big help without having to resort to indexless buckets.
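If you want to try it from boto3 against a mimic gateway, something like the sketch below should work. Treat it as a sketch under assumptions: I believe the S3-side knob is the allow-unordered=true query parameter, but the endpoint, credentials, and the botocore before-call hook used to append it here are my own workaround, not anything shipped with the feature:

import boto3

def allow_unordered(params, **kwargs):
    # params['url'] is the fully built request URL at 'before-call' time,
    # so the extra query parameter is included when the request is signed.
    sep = '&' if '?' in params['url'] else '?'
    params['url'] += sep + 'allow-unordered=true'

s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:7480',   # hypothetical RGW endpoint
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)
s3.meta.events.register('before-call.s3.ListObjects', allow_unordered)

# Pages come back without the cross-shard merge, but the keys are NOT
# globally sorted across the listing.
paginator = s3.get_paginator('list_objects')
for page in paginator.paginate(Bucket='bucketname'):
    for obj in page.get('Contents', []):
        print(obj['Key'])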


On 05/01/2018 11:09 AM, Robert Stanford wrote:

 I second the indexless bucket suggestion.  The downside is that you can't use bucket features like object expiration in that case.

On Tue, May 1, 2018 at 10:02 AM, David Turner <drakonstein@xxxxxxxxx> wrote:
Any time I'm using shared storage like S3 or cephfs/nfs/gluster/etc, the absolute rule that I refuse to break is to never rely on a directory listing to know where objects/files are.  You should be maintaining a database of some sort or a deterministic naming scheme.  The only time a full listing of a directory should be required is if you suspect your tooling is orphaning files and you want to clean them up.  If I had someone with a bucket with 2B objects, I would force them to use an indexless bucket.
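A rough sketch of what I mean by a deterministic naming scheme (the names and fields below are just an example):

import hashlib

def object_key(user_id: str, upload_id: str) -> str:
    # Derive the S3 key from identifiers the application already has.
    # A short hash prefix also spreads keys evenly across the index shards.
    prefix = hashlib.sha256(f"{user_id}/{upload_id}".encode()).hexdigest()[:8]
    return f"{prefix}/{user_id}/{upload_id}"

# The application records (user_id, upload_id) in its own database; a later
# read is a direct GET on a key it can recompute, with no ListObjects call.
print(object_key("user-1234", "2018-05-01T15:02:00Z"))

The database is the source of truth for what exists; the bucket never needs to be listed to find anything.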

That's me, though.  I'm sure there are other ways to manage a bucket like that, but it sounds awful.

On Tue, May 1, 2018 at 10:10 AM Robert Stanford <rstanford8896@xxxxxxxxx> wrote:

 Listing will always take forever when using a high shard number, AFAIK.  That's the tradeoff for sharding.  Are those 2B objects all in one bucket?  How does your read and write performance compare to a bucket with a lower object count (thousands), at that shard number?

On Tue, May 1, 2018 at 7:59 AM, Katie Holly <8ld3jg4d@xxxxxx> wrote:
One of our radosgw buckets has grown a lot in size: `radosgw-admin bucket stats --bucket=$bucketname` reports a total of 2,110,269,538 objects, with the bucket index sharded across 32768 shards. Listing the root context of the bucket with `s3 ls s3://$bucketname` takes more than an hour, which is the hard time-to-first-byte limit on our nginx reverse proxy, and the aws-cli times out long before that limit is hit.

The software we use supports sharding the data across multiple S3 buckets, but before I go ahead and enable this: has anyone ever had that many objects in a single RGW bucket, and if so, how did you solve the problem of RGW taking a long time to read the full index?

--
Best regards

Katie Holly




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
