Re: Performance issues RGW (S3)

sinan@xxxxxxxx · Tue, 11 Jun 2024 21:42:36 +0200

On 2024-06-11 01:01, Anthony D'Atri wrote:
To be clear, you don't need more nodes.  You can add RGWs to the ones 
you already have.  You have 12 OSD nodes - why not put an RGW on 
each?

Might be an option, I just don't like the idea of hosting multiple 
components on the same nodes. But I'll consider it.

I really don't like mixing mon/mgr with other components because of 
coupled failure domains, and past experience with mon misbehavior, but 
many people do that.  ymmv.  With a bunch of RGWs, none of them needs to 
grow to consume significant resources, and it can be difficult to get 
a single RGW daemon to really use all of a dedicated node.

I am not sure adding more RGWs will increase the performance.

I just tested with 1 and with 2 RGWs:

1 RGW:
Client 1 -> RGW Node A = 150-250 objects/s

2 RGWs, both clients running simultaneously:
Client 1 -> RGW Node A = 60-120 objects/s
Client 2 -> RGW Node B = 60-120 objects/s
Together that makes 150-250 objects/s.

So performance-wise it does not matter whether I use 1 or 2 RGW nodes.

Client 1 -> HAProxy -> 3 RGWs = 150-250 objects/s.
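
For reference, the arithmetic behind the comparison above as a minimal Python sketch. The ranges are the ones reported in this mail; the helper name `combined` is only for illustration. Summing the two concurrent per-client ranges lands in roughly the same band as the single-RGW run, which is why a second RGW does not look like it helps.

```python
# Per-client ranges in objects/s, copied from the tests above.
single_rgw = (150, 250)
two_rgws = [(60, 120), (60, 120)]  # Client 1 -> Node A, Client 2 -> Node B


def combined(ranges):
    """Sum the low ends and the high ends of concurrent per-client ranges."""
    return (sum(lo for lo, _ in ranges), sum(hi for _, hi in ranges))


total = combined(two_rgws)
print(f"1 RGW : {single_rgw[0]}-{single_rgw[1]} objects/s")
print(f"2 RGWs: {total[0]}-{total[1]} objects/s combined")
```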

There are still serializations in the OSD and PG code.  You have 
240 OSDs, does your index pool have *at least* 256 PGs?
The index pool, like the data pool, has 256 PGs.
To be clear, that means whatever.rgw.buckets.index ?

No, sorry, my bad. .index is 32 and .data is 256.

Oh, yeah. Does `ceph osd df` show you at the far right like 4-5 PG 
replicas on each OSD?  You want (IMHO) to end up with 100-200, keeping 
each pool's pg_num to a power of 2 ideally.

No, my RBD pool is larger. My average PGs per OSD is around 60-70.

Assuming all your pools span all OSDs, I suggest at a minimum 256 for 
.index and 8192 for .data, assuming you have only RGW pools.  And I would 
be inclined to try 512 / 8192.  Assuming your other minor pools are at 
32, I'd bump .log and .non-ec to 128 or 256 as well.

If you have RBD or other pools colocated, those numbers would change.

^ The above assumes disabling the autoscaler.
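
To see how those pg_num suggestions relate to the 100-200 PG replicas per OSD target, here is a rough Python sketch. It rests on assumptions that are not stated in this thread: every pool is replicated with size=3 (an EC pool would place k+m shards per PG instead), every pool spans all 240 OSDs, and RBD or other colocated pools are ignored. The short pool names are the abbreviations used above, not real pool names.

```python
NUM_OSDS = 240       # from this thread
ASSUMED_SIZE = 3     # assumption: replicated pools with size=3


def pg_replicas_per_osd(pools, num_osds=NUM_OSDS, size=ASSUMED_SIZE):
    """Average PG replicas per OSD if all pools span all OSDs evenly."""
    return sum(pools.values()) * size / num_osds


def is_power_of_two(n):
    """True if n is a positive power of two, as recommended for pg_num."""
    return n > 0 and n & (n - 1) == 0


# Abbreviated pool names as used in this thread; pg_num values from above.
suggested = {".index": 256, ".data": 8192, ".log": 128, ".non-ec": 128}
original = {".index": 32, ".data": 256}

for label, pools in (("original", original), ("suggested", suggested)):
    assert all(is_power_of_two(n) for n in pools.values())
    print(f"{label}: {pg_replicas_per_osd(pools):.1f} PG replicas per OSD")
```

Under these assumptions the suggested values land just above 100 per OSD from the RGW pools alone, so a cluster that also carries a large RBD pool would need a smaller .data value to stay within the same budget.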

I bumped my .data pool from 256 to 1024 and .index from 32 to 128. I also 
doubled the .non-ec and .log pools. Performance-wise I don't see any 
improvement. If I had seen a 10-20% improvement, I would definitely have 
increased them to 512 / 8192.
With a 0.5MB object size I am still limited to about 150-250 
objects/s.

The disks aren't saturated. The wr await is mostly around 1ms and does 
not get higher when benchmarking with S3.
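
As a sanity check on the numbers above, a small Python sketch that converts the object rate into bandwidth. The object size, object rates and OSD count come from this thread; the replication factor of 3 is an assumption (it is not stated here), and the per-OSD figure assumes writes spread evenly over all 240 OSDs.

```python
OBJECT_SIZE_MB = 0.5   # from this thread
NUM_OSDS = 240         # from this thread
ASSUMED_SIZE = 3       # assumption: 3x replication on the data pool


def client_mb_per_s(objects_per_s, object_size_mb=OBJECT_SIZE_MB):
    """Client-side payload bandwidth for a given object rate."""
    return objects_per_s * object_size_mb


def backend_mb_per_s_per_osd(objects_per_s, size=ASSUMED_SIZE,
                             num_osds=NUM_OSDS):
    """Average write bandwidth per OSD after replication, evenly spread."""
    return client_mb_per_s(objects_per_s) * size / num_osds


for rate in (150, 250):
    print(f"{rate} objects/s -> {client_mb_per_s(rate):.0f} MB/s client side, "
          f"{backend_mb_per_s_per_osd(rate):.2f} MB/s per OSD")
```

If these assumptions hold, each OSD sees only a couple of MB/s of payload writes, which fits the observation that the disks are not saturated.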

Any other suggestions, or does anyone else have ideas?
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx