On 2024-06-11 01:01, Anthony D'Atri wrote:
To be clear, you don't need more nodes. You can add RGWs to the ones
you already have. You have 12 OSD nodes - why not put an RGW on
each?
That might be an option; I just don't like the idea of hosting multiple
components on the same nodes. But I'll consider it.
I really don't like mixing mon/mgr with other components because of
coupled failure domains and past experience with mon misbehavior, but
many people do that. YMMV. With a bunch of RGWs, none of them needs to
grow to consume significant resources, and it can be difficult to get
a single RGW daemon to really use all of a dedicated node.
I am not sure adding more RGWs will increase performance.
I just tested with 1 and with 2 RGWs:
Client 1 -> RGW Node A = 150-250 objects/s
Client 1 -> RGW Node A = 60-120 objects/s, and simultaneously Client 2 ->
RGW Node B = 60-120 objects/s. Together that makes 150-250 objects/s.
So performance-wise it does not matter whether I use 1 or 2 RGW nodes.
Client 1 -> HAProxy -> 3 RGWs = 150-250 objects/s.
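To make the "no scaling" observation concrete, here is a minimal sketch that reduces the measured ranges to their midpoints (a simplification; real runs fluctuate inside those ranges) and computes aggregate throughput plus efficiency relative to perfectly linear scaling. The helper names are hypothetical and only for illustration.

```python
# Midpoints of the ranges reported in this thread, in objects/s.
# Keys: number of RGWs in use; values: per-client rates running at the same time.
def midpoint(lo: float, hi: float) -> float:
    return (lo + hi) / 2


measurements = {
    1: [midpoint(150, 250)],                    # Client 1 -> RGW Node A
    2: [midpoint(60, 120), midpoint(60, 120)],  # Clients 1 and 2 -> RGW A and B in parallel
    3: [midpoint(150, 250)],                    # Client 1 -> HAProxy -> 3 RGWs
}


def scaling_efficiency(data: dict[int, list[float]]) -> dict[int, tuple[float, float]]:
    """Return {n_rgw: (aggregate objects/s, efficiency vs. linear scaling)}."""
    baseline = sum(data[1])
    result = {}
    for n_rgw, rates in sorted(data.items()):
        aggregate = sum(rates)
        # An efficiency of 1.0 would mean throughput grows linearly with RGW count.
        result[n_rgw] = (aggregate, aggregate / (n_rgw * baseline))
    return result


for n_rgw, (aggregate, eff) in scaling_efficiency(measurements).items():
    print(f"{n_rgw} RGW(s): {aggregate:.0f} objects/s, efficiency {eff:.2f}")
```

A roughly flat aggregate with falling efficiency suggests the limit sits behind the RGWs rather than in them, which is where the discussion of OSD and PG serialization picks up.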
There is still some serialization in the OSD and PG code. You have
240 OSDs; does your index pool have *at least* 256 PGs?
The index pool, like the data pool, has 256 PGs.
To be clear, that means whatever.rgw.buckets.index?
No, sorry, my bad: .index is 32 and .data is 256.
Oh, yeah. Does `ceph osd df` show something like 4-5 PG replicas per
OSD in the far-right column? IMHO you want to end up with 100-200,
ideally keeping each pool's pg_num a power of 2.
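The per-OSD figure is roughly the sum over pools of pg_num times the pool's size (replica count, or k+m for erasure coding), divided by the number of OSDs, assuming every pool spans all OSDs. A minimal sketch with the RGW pool values from this thread; the size of 3 for both pools is an assumption, since the thread does not state it, and the RBD pool is deliberately left out.

```python
def pg_replicas_per_osd(pools: dict[str, tuple[int, int]], num_osds: int) -> float:
    """Average PG replicas per OSD, assuming every pool spans all OSDs.

    `pools` maps a pool name to (pg_num, size), where size is the replica
    count of a replicated pool or k+m of an erasure-coded pool.
    """
    return sum(pg_num * size for pg_num, size in pools.values()) / num_osds


# From this thread: 240 OSDs, .index at 32 PGs, .data at 256 PGs.
# Assumption: both pools are 3x replicated (not stated in the thread).
rgw_pools = {".index": (32, 3), ".data": (256, 3)}
print(f"{pg_replicas_per_osd(rgw_pools, 240):.1f}")
```

Under these assumptions the RGW pools contribute only a few PG replicas per OSD; the 60-70 that `ceph osd df` reports would then come mostly from the larger RBD pool.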
No, my RBD pool is larger. My average is around 60-70 PGs per OSD.
Assuming all your pools span all OSDs and you have only RGW pools, I
suggest at a minimum 256 for .index and 8192 for .data. I would be
inclined to try 512 / 8192. Assuming your other minor pools are at 32,
I'd bump .log and .non-ec to 128 or 256 as well.
If you have RBD or other pools colocated, those numbers would change.
^ The above assumes the autoscaler is disabled.
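One way to arrive at numbers like these is to pick a target number of PG replicas per OSD, solve for pg_num, and round to the nearest power of 2. A minimal sketch, assuming a target of about 100 per OSD, a 3x replicated data pool, and 240 OSDs; the helper names are hypothetical, and this is not the autoscaler's actual algorithm.

```python
import math


def nearest_power_of_two(x: float) -> int:
    """Round x to the nearest power of 2 (ties go to the lower one)."""
    if x <= 1:
        return 1
    lower = 2 ** math.floor(math.log2(x))
    upper = lower * 2
    return lower if (x - lower) <= (upper - x) else upper


def suggest_pg_num(target_per_osd: float, num_osds: int, size: int, share: float = 1.0) -> int:
    """pg_num so that this pool contributes ~target_per_osd * share PG replicas per OSD."""
    return nearest_power_of_two(target_per_osd * num_osds * share / size)


# Assumptions: ~100 PG replicas per OSD for the data pool, size 3, 240 OSDs.
data_pg = suggest_pg_num(100, 240, 3)
index_pg = 512  # the larger of the two .index values suggested above
total_per_osd = (data_pg + index_pg) * 3 / 240
print(data_pg, f"{total_per_osd:.1f}")
```

With these assumptions the data pool lands on 8192 and the two RGW pools together sit near the low end of the 100-200 range. Setting a value by hand would be done with `ceph osd pool set <pool> pg_num <n>`, with that pool's `pg_autoscale_mode` set to `off` so the autoscaler does not adjust it afterwards.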
I bumped my .data pool from 256 to 1024 and .index from 32 to 128. I
also doubled the .non-ec and .log pools. Performance-wise I don't see
any improvement. If I saw a 10-20% improvement, I would definitely
increase them to 512 / 8192.
With a 0.5 MB object size I am still limited to about 150-250
objects/s.
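Converting the object rate into bandwidth shows how little data is actually moving. A minimal sketch, assuming decimal MB, a 3x replicated data pool (not stated in the thread), and writes spread evenly over all 240 OSDs; it ignores index and metadata writes.

```python
OBJECT_SIZE_MB = 0.5  # object size used in the S3 benchmark in this thread
NUM_OSDS = 240        # from earlier in the thread
REPLICA_SIZE = 3      # assumption: 3x replicated data pool


def client_mb_s(objects_per_s: float) -> float:
    """Client-side write bandwidth in MB/s for a given object rate."""
    return objects_per_s * OBJECT_SIZE_MB


def per_osd_mb_s(objects_per_s: float) -> float:
    """Average backend write bandwidth per OSD, spreading all replicas evenly."""
    return client_mb_s(objects_per_s) * REPLICA_SIZE / NUM_OSDS


for rate in (150, 250):
    print(f"{rate} objects/s: {client_mb_s(rate):.0f} MB/s client, "
          f"{per_osd_mb_s(rate):.2f} MB/s per OSD")
```

That works out to roughly 75-125 MB/s at the client and only a couple of MB/s per OSD, which is consistent with the disks not being saturated and hints that per-operation latency or serialization, rather than bandwidth, is the limit.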
The disks aren't saturated. The write await is mostly around 1 ms and
does not rise when benchmarking with S3.
Any other suggestions, from you or anyone else?
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx