Re: RGW listing slower on nominally faster setup

Stefan;

I can't find it, but I seem to remember a discussion in this mailing list that sharded RGW performance is significantly better if the shard count is a power of 2, so you might try increasing shards to 64.
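If you go that route, something like the following is the general idea (just a sketch; the bucket name is a placeholder, and manual resharding of a bucket participating in multi-site sync has caveats on Nautilus/Octopus, so check the documentation for your version first):

    # check current shard counts and per-shard fill
    radosgw-admin bucket limit check

    # reshard a bucket to a power-of-2 shard count
    radosgw-admin bucket reshard --bucket=<bucket-name> --num-shards=64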

Also, you might look at the OSD logs while a listing is running, to see if anything there illuminates the problem for you.
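Something along these lines (a sketch, assuming the default log locations; remember to turn the debug level back down afterwards, since it generates a lot of output):

    # temporarily raise OSD logging, trigger a listing, and watch an index OSD's log
    ceph tell osd.* config set debug_osd 10/10
    tail -f /var/log/ceph/ceph-osd.<id>.log

    # restore the default level when done
    ceph tell osd.* config set debug_osd 1/5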

You said: "2 x SATA SSDs for RGW index pool," but do you have the zone's index pool running on a rule which only targets SSDs, or only targets those SSDs?  Are you running your RGW multi-site?  Are you running replication for RGW in multi-site?

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International, Inc.
DHilsbos@xxxxxxxxxxxxxx 
www.PerformAir.com


-----Original Message-----
From: Stefan Wild [mailto:swild@xxxxxxxxxxxxx] 
Sent: Wednesday, June 10, 2020 6:05 PM
To: ceph-users@xxxxxxx
Subject:  RGW listing slower on nominally faster setup

Hi everyone,

We are currently transitioning from a temporary machine to our production hardware. Since we're starting with under 200 TB raw storage, we are currently on only 1–2 physical machines per cluster, eventually in 3 zones. The temporary machine is undersized even for that, with an older single 6-core CPU and spinning disks only. As of now, that "cluster-of-one" is running on Nautilus and has 3 buckets with 98K, 1.1M, and 1.4M objects, respectively, for a total of 9.1 TB. As we're expecting these to grow to around 5M objects each and they will be in a multisite configuration, I went with 50 shards per bucket.

Listing "directories" via S3 is somewhat slow (sometimes to the point of read timeouts) but mostly bearable. After the new production setup (dual 8-core/16-thread Xeon Silvers, 2 x SATA SSDs for RGW index pool, on Octopus, with enough free memory to easily fit all bucket indexes multiple times) synced successfully, listings via S3 always time out on the RGW on that machine/zone.

As soon as I trigger a single listing via S3 (even on the 98K-object bucket), reads go up to a sustained 300–500 MB/s and 20–50K IOPS on the bucket index pool for several hours. The RGW debug log is flooded with lines like this:

{"log":"debug 2020-06-08T19:31:08.315+0000 7f83d704c700  1 RGWRados::Bucket::List::list_objects_ordered INFO ordered bucket listing requires read #1\n","stream":"stdout","time":"2020-06-08T19:31:08.317198682Z"}

I get that sharded RGW indexes (and listing objects in S3 buckets in general) are not very efficient, but after getting somewhat decent results on slower hardware and an older Ceph version, I wasn't expecting the nominally much better setup to be orders of magnitude slower.

Any help or pointers would be greatly appreciated.

Thank you,
Stefan


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



