Re: radosgw bucket listing (s3 ls s3://$bucketname) slow with ~2 billion objects

The main problem with efficiently listing many-sharded buckets is the requirement to provide entries in sorted order. This means that each HTTP request has to fetch ~1000 entries from every shard, combine them into sorted order, and throw out the leftovers. The next request to continue the listing will advance its position slightly, but still ends up fetching many of the same entries from each shard. As the number of shards increases, these shard listings overlap more and more, and performance falls off.
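To make the over-fetch concrete, here's a rough Python sketch of what an ordered listing over a sharded index has to do for each page. It's illustrative only, not actual RGW code, and the shard/object counts are made up:

import heapq
import zlib

NUM_SHARDS = 32   # bucket index shards (made-up number)
PAGE = 1000       # entries returned per listing request

# Build a fake bucket index: keys are spread across shards by a hash of the
# name, and each shard keeps its own entries in sorted order.
shards = [[] for _ in range(NUM_SHARDS)]
for key in sorted(f"obj-{i:07d}" for i in range(100000)):
    shards[zlib.crc32(key.encode()) % NUM_SHARDS].append(key)

def list_ordered_page(shards, marker=""):
    # Return one globally sorted page of keys after `marker`, and count
    # how many index entries had to be fetched to produce it.
    fetched = 0
    candidates = []
    for shard in shards:
        # Every shard must contribute up to PAGE entries past the marker,
        # because any of them could fall within the global first PAGE.
        part = [k for k in shard if k > marker][:PAGE]
        fetched += len(part)
        candidates.append(part)
    # Merge the per-shard results and throw away everything past PAGE.
    return list(heapq.merge(*candidates))[:PAGE], fetched

page, fetched = list_ordered_page(shards)
print(f"returned {len(page)} keys, fetched {fetched} index entries")
page2, fetched2 = list_ordered_page(shards, marker=page[-1])
print(f"returned {len(page2)} keys, fetched {fetched2} index entries")

With 32 shards and 1000-entry pages, each request touches on the order of 32,000 index entries to return 1,000 keys, and the next request repeats most of that work just past the marker. Scale the shard count up into the tens of thousands and the wasted work dominates.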

Eric Ivancich recently added S3 and Swift extensions for unordered bucket listing in https://github.com/ceph/ceph/pull/21026 (for mimic). That allows radosgw to list each shard separately and avoid the step that throws away the extra entries. If your application can tolerate unsorted listings, that could be a big help without having to resort to indexless buckets.
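If you want to try it from boto3 against a mimic gateway, something like the sketch below should work. Treat it as a sketch under assumptions: I believe the S3-side knob is the allow-unordered=true query parameter, but the endpoint, credentials, and the botocore before-call hook used to append it here are my own workaround, not anything shipped with the feature:

import boto3

def allow_unordered(params, **kwargs):
    # params['url'] is the fully built request URL at 'before-call' time,
    # so the extra query parameter is included when the request is signed.
    sep = '&' if '?' in params['url'] else '?'
    params['url'] += sep + 'allow-unordered=true'

s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:7480',   # hypothetical RGW endpoint
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)
s3.meta.events.register('before-call.s3.ListObjects', allow_unordered)

# Pages come back without the cross-shard merge, but the keys are NOT
# globally sorted across the listing.
paginator = s3.get_paginator('list_objects')
for page in paginator.paginate(Bucket='bucketname'):
    for obj in page.get('Contents', []):
        print(obj['Key'])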


On 05/01/2018 11:09 AM, Robert Stanford wrote:

 I second the indexless bucket suggestion.  The downside is that you can't use bucket features like object expiration in that case.

On Tue, May 1, 2018 at 10:02 AM, David Turner <drakonstein@xxxxxxxxx> wrote:
Any time I'm using shared storage like S3 or cephfs/nfs/gluster/etc, the absolute rule that I refuse to break is to never rely on a directory listing to know where objects/files are.  You should be maintaining a database of some sort or a deterministic naming scheme.  The only time a full listing of a directory should be required is if you suspect your tooling is orphaning files and you want to clean them up.  If I had someone with a bucket with 2B objects, I would force them to use an indexless bucket.
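A rough sketch of what I mean by a deterministic naming scheme (the names and fields below are just an example):

import hashlib

def object_key(user_id: str, upload_id: str) -> str:
    # Derive the S3 key from identifiers the application already has.
    # A short hash prefix also spreads keys evenly across the index shards.
    prefix = hashlib.sha256(f"{user_id}/{upload_id}".encode()).hexdigest()[:8]
    return f"{prefix}/{user_id}/{upload_id}"

# The application records (user_id, upload_id) in its own database; a later
# read is a direct GET on a key it can recompute, with no ListObjects call.
print(object_key("user-1234", "2018-05-01T15:02:00Z"))

The database is the source of truth for what exists; the bucket never needs to be listed to find anything.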

That's me, though.  I'm sure there are other ways to manage a bucket like that, but it sounds awful.

On Tue, May 1, 2018 at 10:10 AM Robert Stanford <rstanford8896@xxxxxxxxx> wrote:

 Listing will always take forever when using a high shard number, AFAIK.  That's the tradeoff for sharding.  Are those 2B objects all in one bucket?  How does your read and write performance compare to a bucket with a lower object count (thousands), at that shard number?

On Tue, May 1, 2018 at 7:59 AM, Katie Holly <8ld3jg4d@xxxxxx> wrote:
One of our radosgw buckets has grown a lot in size: `radosgw-admin bucket stats --bucket=$bucketname` reports a total of 2,110,269,538 objects, with the bucket index sharded across 32768 shards. Listing the root context of the bucket with `s3 ls s3://$bucketname` takes more than an hour, which is the hard time-to-first-byte limit on our nginx reverse proxy, and the aws-cli times out long before that limit is hit.

The software we use supports sharding the data across multiple S3 buckets, but before I go ahead and enable this: has anyone ever had that many objects in a single RGW bucket, and if so, how did you solve the problem of RGW taking a long time to read the full index?

--
Best regards

Katie Holly




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
